This workshop will be held in rooms L401-3.

## Morning session: 9:00am-12:30pm

9:00-9:25 **Yoshua Bengio: Introduction**. An overview of representation learning methods and the challenges we face. This talk will include a technical overview of RBMs and autoencoders, in order to make the subsequent talks more accessible.

9:25-10:05 **Invited talk: Grégoire Montavon: “Deep Learning of Molecular Electronic Properties in Chemical Compound Space”** Modern DFT-based quantum chemistry methods are enabling the systematic screening of the chemical compound space for particular molecular electronic properties, with important applications to drug design and novel materials discovery. Yet, current methods are often application-specific, and their many hyperparameters have unpredictable effects on the model’s accuracy. On the other hand, machine learning offers a principled way of minimizing the risk of the prediction problem and can automatically extract the problem-relevant subspace from the input representation. In this talk, I will present our first attempt to build a machine learning model of quantum chemistry in the context of predicting molecular electronic properties from raw molecular geometries. Our machine learning model is trained on a controlled testbed of small organic molecules taken from the GDB-13 database. The resulting “quantum machine” achieves an error rate that competes in some cases with DFT-based methods, at a marginal computational cost.

10:05-10:30 **Dumitru Erhan: Overview of the Black Box challenge**. One of the challenges highlighted by this workshop is the challenge of learning with a new or unknown modality. For this challenge, we hosted a Kaggle competition using obfuscated data, so that human knowledge of the data domain was not helpful. The winner of the challenge, David Thaler, was not able to attend, so Dumitru summarizes his methods here, along with the method of two other top-scoring Kagglers, Dong-Hyun Lee and Lukasz Romaszko.

10:30-10:50 **Coffee Break**

10:50-11:30 **Invited talk: Ruslan Salakhutdinov: “Annealing Between Distributions by Averaging Moments”**

Many powerful Monte Carlo techniques for estimating partition functions, such as annealed importance sampling (AIS), are based on sampling from a sequence of intermediate distributions which interpolate between a tractable initial distribution and an intractable target distribution. The near-universal practice is to use geometric averages of the initial and target distributions, but alternative paths can perform substantially better. We present a novel sequence of intermediate distributions for exponential families: averaging the moments of the initial and target distributions. We derive an asymptotically optimal piecewise linear schedule for the moments path and show that it performs at least as well as geometric averages with a linear schedule. Moment averaging performs well empirically at estimating partition functions of restricted Boltzmann machines (RBMs), which form the building blocks of many deep learning models.

Joint work with Roger Grosse and Chris Maddison
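To make the annealing idea above concrete, here is a minimal sketch of annealed importance sampling (AIS) with the standard geometric-averages path that the talk takes as its baseline. This is an assumed illustration, not the authors' code: it estimates the normalizing constant of a one-dimensional Gaussian kernel by annealing from a standard normal along a linear schedule.

```python
import numpy as np

def ais_log_z(mu=2.0, sigma=0.5, n_chains=2000, n_steps=200, seed=0):
    """AIS estimate of log Z for the unnormalized target
    f_T(x) = exp(-(x - mu)^2 / (2 sigma^2)), annealing from the
    standard-normal kernel f_0 along the geometric path
    f_b(x) = f_0(x)^(1-b) * f_T(x)^b with a linear schedule in b."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_steps + 1)

    def log_f0(x):   # initial kernel (standard normal, unnormalized)
        return -0.5 * x ** 2

    def log_fT(x):   # target kernel (unnormalized)
        return -0.5 * ((x - mu) / sigma) ** 2

    x = rng.standard_normal(n_chains)   # exact samples from f_0
    log_w = np.zeros(n_chains)          # log importance weights
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # accumulate log [ f_{b1}(x) / f_{b0}(x) ] at the current state
        log_w += (b1 - b0) * (log_fT(x) - log_f0(x))
        # one Metropolis transition that leaves f_{b1} invariant
        log_p = lambda y: (1.0 - b1) * log_f0(y) + b1 * log_fT(y)
        prop = x + 0.5 * rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < log_p(prop) - log_p(x)
        x = np.where(accept, prop, x)
    # Z_T ~= Z_0 * mean(w), with Z_0 = sqrt(2*pi) for the f_0 kernel
    m = log_w.max()
    return 0.5 * np.log(2 * np.pi) + m + np.log(np.mean(np.exp(log_w - m)))
```

The talk's contribution replaces the geometric path inside the loop with intermediate distributions obtained by averaging the moments of the initial and target distributions; for exponential families this changes which `f_b` the transition operator targets, while the weight-accumulation structure stays the same.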

11:30-12:30 **Poster Session**

## Afternoon session: 2:00pm-6:00pm

2:00-2:10 **Ian Goodfellow: Overview of Facial Expression Recognition and Multimodal Learning challenges.** The two remaining challenges were both won by Yichuan Tang, so we combine their presentations. In the Facial Expression Recognition challenge, the organizers introduced a new facial expression recognition dataset developed by Pierre-Luc Carrier and Aaron Courville. This challenge offered the chance to compare the performance of representation learning algorithms against algorithms that use hand-engineered features on a real-world task. In the Multimodal Learning challenge, competitors were encouraged to learn to represent text and images in the same representation space via a contest to match images with their textual descriptions.

2:10-2:40 **Invited talk: Yichuan Tang: “Deep learning with support vector machines.”** The winner of the Facial Expression Recognition and Multimodal Learning challenges describes his methods.

2:40-3:20 **Invited talk: Ilya Sutskever: “Learning Control Laws with Recurrent Neural Networks”**

3:20-3:50 **Contributed talk: Nitish Srivastava: “Discriminative Transfer Learning with Tree-based Priors”.** This paper proposes a way of improving classification performance for classes which have very few training examples. The key idea is to discover classes which are similar and transfer knowledge among them. Our method organizes the classes into a tree hierarchy. The tree structure can be used to impose a prior over classification parameters. We show that these priors can be combined with discriminative models such as deep neural networks. Our method benefits from the power of discriminative training of deep neural networks, while at the same time using tree-based priors over classification parameters. We also propose an algorithm for learning the underlying tree structure. This gives the model some flexibility to tune the tree so that it is pertinent to the task being solved. We show that the model can transfer knowledge across related classes using fixed trees. Moreover, it can learn new, meaningful trees, usually leading to improved performance. Our method achieves state-of-the-art classification results on the CIFAR-100 image dataset and the MIR Flickr multimodal dataset.
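The core idea of the tree-based prior can be sketched in a few lines: each class's weight vector is drawn from a Gaussian centered on its superclass parameter, so rare classes borrow statistical strength from related ones. The function names and the simple shrinkage update below are my illustration of this style of hierarchical prior, not the paper's exact algorithm.

```python
import numpy as np

def map_smooth(W_ml, parent, n_superclasses, lam=1.0):
    """One round of MAP smoothing toward a two-level class tree.

    W_ml   : (K, D) per-class maximum-likelihood weight vectors
    parent : length-K array mapping each class to its superclass index
    lam    : prior strength pulling children toward their parent

    Model assumption (illustrative): w_k ~ N(theta_parent(k), I/lam).
    """
    theta = np.zeros((n_superclasses, W_ml.shape[1]))
    # superclass parameter = mean of its children's weights
    for s in range(n_superclasses):
        theta[s] = W_ml[parent == s].mean(axis=0)
    # shrink each class's weights toward its superclass mean
    W_map = (W_ml + lam * theta[parent]) / (1.0 + lam)
    return W_map, theta
```

With `lam=1.0`, siblings are pulled halfway toward their shared superclass mean; a class with few training examples (and hence a noisy `W_ml` row) ends up dominated by the better-estimated superclass parameter, which is the transfer effect the abstract describes.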

3:50-4:10 **Coffee break**

4:10-4:40 **Invited talk: Yichuan Tang: A New Learning Algorithm for Stochastic Feedforward Neural Networks** Having won two contests, Yichuan won two talks. We invite him back to talk about new research not applied to the contests: Multilayer perceptrons (MLPs), or artificial neural nets, are popular models used for non-linear regression and classification tasks. As regressors, MLPs model the conditional distribution of the output variables y given the input variables x. However, this predictive distribution is assumed to be unimodal (e.g. a Normal distribution). For tasks such as structured prediction problems, the conditional distribution should be multi-modal, i.e. a one-to-many mapping. By turning the hidden variables in an MLP into stochastic nodes rather than deterministic ones, Sigmoid Belief Nets can induce a rich multimodal distribution in the output space. However, learning Sigmoid Belief Nets is very slow, and modeling real-valued data is difficult. In this paper, we propose a stochastic feedforward network whose hidden layers have both deterministic and stochastic variables. A new Generalized EM training procedure using importance sampling allows us to efficiently learn complicated conditional distributions. We demonstrate the superiority of our model over conditional Restricted Boltzmann Machines and Mixture Density Networks on 3 synthetic datasets and on modeling facial expressions. Moreover, we show that latent features of our model improve classification, and we provide additional qualitative results on color images.
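The key architectural point above, a hidden layer that mixes deterministic and stochastic units so that p(y|x) can be multimodal, can be sketched as follows. This is my own minimal illustration (the weight names and single-layer setup are assumptions, not the paper's architecture): resampling the Bernoulli units while holding x fixed produces a sample-based approximation to the multimodal predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predictive_samples(x, W_det, W_sto, V, n_samples=100):
    """Sample y given x from a hidden layer with both deterministic
    (tanh) and stochastic (Bernoulli) units; repeated samples of the
    stochastic units trace out a possibly multimodal p(y | x)."""
    h_det = np.tanh(W_det @ x)        # deterministic hidden activations
    p_h = sigmoid(W_sto @ x)          # Bernoulli firing probabilities
    ys = []
    for _ in range(n_samples):
        h_sto = (rng.random(p_h.shape) < p_h).astype(float)
        ys.append(V @ np.concatenate([h_det, h_sto]))
    return np.array(ys)
```

For example, with one stochastic unit firing with probability 0.5 and an output weight of 2 on that unit, the sampled outputs split between two distinct values, exactly the one-to-many behavior a deterministic MLP cannot express. The Generalized EM procedure in the talk then weights such samples by importance sampling during training.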

4:40-5:30 **Invited talk: Arthur Szlam: “Some variations on K-Subspaces.”** I will discuss two variations of the K-subspaces algorithm. The first allows the algorithm to ignore gross corruptions of the input; for this we will need a brief digression on robust PCA. The second variation retools the algorithm as a nonlinear regressor, using a tree-based encoder.
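For readers unfamiliar with the baseline the talk varies, here is a minimal K-subspaces sketch (my own illustration, not Szlam's code; subspaces are taken through the origin for simplicity): alternate between assigning each point to the subspace with the smallest projection residual and refitting each subspace from the top singular vectors of its assigned points.

```python
import numpy as np

def k_subspaces(X, k=2, dim=1, n_iter=20, init_labels=None, seed=0):
    """Lloyd-style alternation for K-subspaces clustering.

    X : (n, d) data; returns (labels, bases) where each basis is a
    (d, dim) orthonormal matrix spanning one fitted subspace.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labels = rng.integers(0, k, size=n) if init_labels is None else init_labels
    bases = [np.linalg.qr(rng.standard_normal((d, dim)))[0] for _ in range(k)]
    for _ in range(n_iter):
        # refit step: top-`dim` principal directions of each cluster
        for j in range(k):
            pts = X[labels == j]
            if len(pts) >= dim:
                _, _, vt = np.linalg.svd(pts, full_matrices=False)
                bases[j] = vt[:dim].T
        # assign step: residual of projecting each point onto each subspace
        resid = np.stack(
            [np.linalg.norm(X - (X @ B) @ B.T, axis=1) for B in bases],
            axis=1)
        labels = resid.argmin(axis=1)
    return labels, bases
```

The talk's first variation replaces the plain SVD in the refit step with a robust-PCA-style fit so that grossly corrupted entries do not drag the subspace; the second reuses the assignment machinery inside a tree-based encoder to obtain a nonlinear regressor.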

5:30-6:00 **Panel Discussion: Building representations for the future!** Our speakers throughout the day discuss questions from the audience as well as topics prepared by the organizers.