Your models in Pylearn2

Who should read this

We recommend you spend some time with Pylearn2 and read some of our other tutorials before starting with this minimalistic technique. If you are completely new to Pylearn2, have a look at the softmax regression tutorial.

Pylearn2 is great for many things; we’ll highlight two here.

  • It allows you to experiment with new ideas without much implementation overhead. The library was built to be modular, and it aims to be usable without an extensive knowledge of the codebase. Writing a new model from scratch is usually pretty fast once you know what to do and where to look.
  • It has an interface (YAML) that allows one to decouple implementation from experimental choices, enabling experiments to be constructed in a light and readable fashion.

Obviously, there is always a trade-off between being user-friendly and being flexible, and Pylearn2 is no exception. For instance, users looking for a way to work with sequential data might have a harder time getting started (although we’re working to make this experience better).

In this post, we will assume that you have built a regression or classification model with Theano and that the training data can be cast into two matrices, one for training examples and one for training targets. People with different requirements may need to work a little more (e.g. by figuring out how to put their data inside Pylearn2). This tutorial contains useful information for anyone interested in porting a model to Pylearn2.

How is Pylearn2 used?

While many researchers use Pylearn2 as their primary research tool, this doesn’t necessarily mean they know or use every feature in Pylearn2. In fact, you can prototype new models in a very Theano-like fashion: write a model as a big monolithic block of hard-coded Theano expressions, and wrap that up in the minimal amount of code necessary to be able to plug a model into Pylearn2. This bare minimum is what we’ll explain here.

The resulting model may be hard to extend, but it represents a good starting point. As you explore new ideas and change the code, you can gradually make it more flexible: a hard-coded input dimension gets factored out as a constructor argument, functions being composed are separated into layers, etc.

Our point: it is alright to stick to the bare minimum when developing a model for Pylearn2. Your code probably won’t satisfy any other use cases than your own, but this is something that you can change gradually as you go. There’s no need to overcomplicate things when you start.

The bare minimum

Let’s look at that bare minimum. It involves writing exactly two subclasses:

  • One subclass of pylearn2.costs.cost.Cost
  • One subclass of pylearn2.models.model.Model

Need more than that? Nope. That’s it! Let’s have a look.

It all starts with a cost expression

In the scenario we’re describing, your model maps an input to an output, the output is compared with some ground truth using some measure of dissimilarity, and the parameters of the model are changed to reduce this measure using gradient information.

It is therefore natural that the object that interfaces between the model and the training algorithm represents a cost. The base class for this object is pylearn2.costs.cost.Cost and does three main things:

  • It describes what data it needs to perform its duty and how it should be formatted.
  • It computes the cost expression by feeding the input to the model and receiving its output.
  • It differentiates the cost expression with respect to the model parameters and returns the gradients to the training algorithm.

What’s nice about Cost is that, if you follow the guidelines we’re about to describe, you only have to worry about the cost expression: the gradient part is handled entirely by the Cost base class, and a very useful mixin class, DefaultDataSpecsMixin, is defined to handle the data description part (more about that when we look at the Model subclass).

Let’s look at how the subclass should look:

from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin


class MyCostSubclass(DefaultDataSpecsMixin, Cost):
    # Here it is assumed that we are doing supervised learning
    supervised = True

    def expr(self, model, data, **kwargs):
        space, source = self.get_data_specs(model)
        space.validate(data)

        inputs, targets = data
        outputs = model.some_method_for_outputs(inputs)
        loss = None  # Placeholder: replace with a loss measure involving outputs and targets
        return loss

The supervised class attribute is used by DefaultDataSpecsMixin to know how to specify the data requirements. If it is set to True, the cost will expect to receive inputs and targets, and if it is set to False, the cost will expect to receive inputs only. In the example, it is assumed that we are doing supervised learning, so we set supervised to True.

The first two lines of expr do some basic input checking and should always be included at the beginning of your expr method. Without going too much into detail, space.validate(data) will make sure that the data you get is the data you requested (e.g. if you do supervised learning, you need an input tensor variable and a target tensor variable). How to determine “what you need” will be covered when we look at the Model subclass.

In the supervised case, data is a tuple containing the inputs as the first element and the targets as the second element.

We then get the model output by calling its some_method_for_outputs method, whose name and behaviour are really for you to decide, as long as your Cost subclass knows which method to call on the model.

Finally, we compute some loss measure on outputs and targets and return that as the cost expression.

Note that things don’t have to be exactly like this. For instance, you could ask the model to have a method that takes inputs and targets as arguments and returns the loss directly, and that would be perfectly fine. All you need is some way to make your Model and Cost subclasses work together to produce a cost expression in the end.
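To make the two options concrete, here is a minimal sketch that uses plain Python stand-ins rather than Pylearn2 or Theano. ToyModel, CostUsingOutputs and CostDelegatingToModel are hypothetical names invented for this illustration, and a squared error stands in for whatever loss you would actually use. Both costs produce the same value; they only differ in where the loss computation lives.

```python
# Toy stand-ins (NOT Pylearn2 classes) showing two ways for a cost object
# and a model object to cooperate in producing a loss value.


class ToyModel(object):
    def __init__(self, w):
        self.w = w

    def predict(self, inputs):
        # A one-parameter linear "model"
        return [self.w * x for x in inputs]

    def loss(self, inputs, targets):
        # Variant 2 support: the model computes the loss itself
        outputs = self.predict(inputs)
        errors = [(o - t) ** 2 for o, t in zip(outputs, targets)]
        return sum(errors) / len(errors)


class CostUsingOutputs(object):
    # Variant 1: ask the model for outputs, compute the loss in the cost
    def expr(self, model, data):
        inputs, targets = data
        outputs = model.predict(inputs)
        errors = [(o - t) ** 2 for o, t in zip(outputs, targets)]
        return sum(errors) / len(errors)


class CostDelegatingToModel(object):
    # Variant 2: hand inputs and targets to the model, return its loss
    def expr(self, model, data):
        inputs, targets = data
        return model.loss(inputs, targets)


model = ToyModel(w=2.0)
data = ([1.0, 2.0, 3.0], [2.0, 4.0, 7.0])
loss_a = CostUsingOutputs().expr(model, data)
loss_b = CostDelegatingToModel().expr(model, data)
```

In a real Pylearn2 cost, the values involved would be symbolic Theano expressions rather than Python floats, but the division of labour is the same.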

Defining the model

Now it’s time to make things more concrete by writing the model itself. The model will be a subclass of pylearn2.models.model.Model, which is responsible for the following:

  • Defining what its parameters are
  • Defining what its data requirements are
  • Doing something with the input to produce an output

As is the case with Cost, the Model base class does many useful things on its own, provided you set the appropriate instance attributes. Let’s have a look at a subclass example:

from pylearn2.models.model import Model

class MyModelSubclass(Model):
    def __init__(self, *args, **kwargs):
        super(MyModelSubclass, self).__init__()

        # Some parameter initialization using *args and **kwargs
        # ...
        self._params = [
            # List of all the model parameters
        ]

        # Placeholder: replace with an instance of a `pylearn2.space.Space` subclass
        self.input_space = None
        # This one is necessary only for supervised learning
        # Placeholder: replace with an instance of a `pylearn2.space.Space` subclass
        self.output_space = None

    def some_method_for_outputs(self, inputs):
        # Some computation involving the inputs
        raise NotImplementedError()

The first thing you should do if you’re overriding the constructor is call the superclass’ constructor. Pylearn2 checks for that and will scold you if you don’t.

You should then initialize your model parameters as shared variables: Pylearn2 will build an updates dictionary for your model variables using gradients returned by your cost. Protip: the `pylearn2.utils.sharedX` function initializes a shared variable with the value and an optional name you provide. This allows your code to be GPU-compatible without putting too much thought into it. For instance, a weights matrix can be initialized this way:

import numpy
from pylearn2.utils import sharedX

self.W = sharedX(numpy.random.normal(size=(size1, size2)), 'W')

Put all your parameters in a list as the _params instance attribute. The Model superclass defines a get_params method which returns self._params for you, and that is the method that is called to get the model parameters when Cost is computing the gradients.

Your Model subclass should also describe the data format it expects as inputs (self.input_space) and the data format of the model’s output (self.output_space), which is required only if you’re doing supervised learning. These attributes should be instances of pylearn2.space.Space (and generally are instances of pylearn2.space.VectorSpace, a subclass of pylearn2.space.Space used to represent batches of vectors). Broadly, this mechanism allows for automatic conversion between different data formats (e.g. if your targets are stored as integer indexes in the dataset but are required to be one-hot encoded by the model).

The some_method_for_outputs method is really where all the magic happens. Remember, the name of the method doesn’t really matter, as long as your Cost subclass knows that it’s the one it has to call. This method expects a tensor variable as input and returns a symbolic expression involving the input and its parameters. What happens in between is up to you, and this is where you can put all the Theano code you could possibly hope for, just like you would do in pure Theano scripts.

Examples

Let’s demonstrate these ideas by writing two models, one which does supervised learning and one which does unsupervised learning.

The data you train these models on is up to you, as long as it is represented in a matrix of features (each row being an example) and a matrix of targets (where each row is a target for an example). Obviously this second matrix is only required for supervised learning. While this is not the only way to store data in Pylearn2, it is probably the most common method, so we will use it in the remainder of this discussion.

For the purposes of this tutorial, we will train models on the venerable MNIST dataset, which you can download at:

wget http://deeplearning.net/data/mnist/mnist.pkl.gz

To make things easier to manipulate, we will decompress the archive (for instance with gunzip mnist.pkl.gz) and split its contents into six different files:

python -c "from pylearn2.utils import serial; \
          data = serial.load('mnist.pkl'); \
          serial.save('mnist_train_X.pkl', data[0][0]); \
          serial.save('mnist_train_y.pkl', data[0][1].reshape((-1, 1))); \
          serial.save('mnist_valid_X.pkl', data[1][0]); \
          serial.save('mnist_valid_y.pkl', data[1][1].reshape((-1, 1))); \
          serial.save('mnist_test_X.pkl', data[2][0]); \
          serial.save('mnist_test_y.pkl', data[2][1].reshape((-1, 1)))"
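If the indexing in that command looks opaque, the following NumPy-only sketch mimics the layout of the unpickled object: a tuple of three (features, labels) pairs for the training, validation and test sets. The tiny random arrays and the fake_split helper are made up for the illustration; only the indexing and the reshape mirror the command above.

```python
import numpy as np

rng = np.random.RandomState(0)


def fake_split(n):
    # Hypothetical stand-in for one MNIST split:
    # an (n, 784) feature matrix and a flat vector of n digit labels
    features = rng.rand(n, 784).astype("float32")
    labels = rng.randint(0, 10, size=n)
    return features, labels


# Same nesting as the real file: (train, valid, test)
data = (fake_split(5), fake_split(3), fake_split(2))

train_X = data[0][0]
# reshape((-1, 1)) turns the flat label vector into a column matrix,
# so that each row is the target of one example
train_y = data[0][1].reshape((-1, 1))
```

Here train_X has one row per example and train_y has one single-element row per example, which is the matrix-of-features and matrix-of-targets layout described earlier.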

Supervised learning using logistic regression

Let’s keep things simple by porting to Pylearn2 the Hello World! of supervised learning: logistic regression. For a refresher, we suggest that you first read the deeplearning.net tutorial on logistic regression. Here is what we need to do:

  • Implement the negative log-likelihood (NLL) loss in our Cost subclass
  • Initialize the model parameters W and b
  • Implement the model’s logistic regression output

Let’s start with the Cost subclass:

import theano.tensor as T
from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin


class LogisticRegressionCost(DefaultDataSpecsMixin, Cost):
    supervised = True

    def expr(self, model, data, **kwargs):
        space, source = self.get_data_specs(model)
        space.validate(data)

        inputs, targets = data
        outputs = model.logistic_regression(inputs)
        loss = -(targets * T.log(outputs)).sum(axis=1)
        return loss.mean()

We assumed our model has a logistic_regression method which accepts a batch of examples and computes the logistic regression output. We will implement that method in just a moment. We also computed the loss as the average negative log-likelihood of the targets given the logistic regression output, as described in the deeplearning.net tutorial. Also, notice how we set supervised to True.
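As a sanity check on the loss expression, here is the same computation carried out with NumPy on a tiny hand-made batch instead of symbolic Theano variables. The numbers are arbitrary and chosen only for the illustration. With one-hot targets, the sum over axis 1 simply picks out the negative log-probability assigned to the correct class.

```python
import numpy as np

# Two examples, three classes; each row of outputs sums to 1
outputs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1]])
# One-hot targets: the first example is class 0, the second is class 2
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])

# Same expression as in LogisticRegressionCost.expr, with numpy in place of T
per_example = -(targets * np.log(outputs)).sum(axis=1)
loss = per_example.mean()
```

The second example contributes far more to the loss than the first, because the model assigned a probability of only 0.1 to its correct class.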

Now for the Model subclass:

import numpy
import theano.tensor as T
from pylearn2.models.model import Model
from pylearn2.space import VectorSpace
from pylearn2.utils import sharedX


class LogisticRegression(Model):
    def __init__(self, nvis, nclasses):
        super(LogisticRegression, self).__init__()

        self.nvis = nvis
        self.nclasses = nclasses

        W_value = numpy.random.uniform(size=(self.nvis, self.nclasses))
        self.W = sharedX(W_value, 'W')
        b_value = numpy.zeros(self.nclasses)
        self.b = sharedX(b_value, 'b')
        self._params = [self.W, self.b]

        self.input_space = VectorSpace(dim=self.nvis)
        self.output_space = VectorSpace(dim=self.nclasses)

    def logistic_regression(self, inputs):
        return T.nnet.softmax(T.dot(inputs, self.W) + self.b)

The model’s constructor receives the dimensionality of the input and the number of classes. It initializes the weights matrix and the bias vector with sharedX. It also sets its input space to an instance of VectorSpace of the dimensionality of the input (meaning it expects the input to be a batch of examples which are all vectors of size nvis) and its output space to an instance of VectorSpace of dimension nclasses (meaning it produces an output corresponding to a batch of probability vectors, one element for each possible class).

The logistic_regression method does pretty much what you would expect: it returns a linear transformation of the input followed by a softmax non-linearity.
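To get a feel for the shapes involved, here is a NumPy sketch of the same forward pass on random data. It is an illustration only: the softmax helper below is written by hand for this example (with the usual max-subtraction for numerical stability) and is not Theano’s implementation.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax; subtracting the row maximum avoids overflow
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.RandomState(0)
nvis, nclasses, batch_size = 784, 10, 4

W = rng.uniform(size=(nvis, nclasses))   # same shape as self.W
b = np.zeros(nclasses)                   # same shape as self.b
X = rng.uniform(size=(batch_size, nvis)) # a batch of row vectors

# Linear transformation followed by a softmax non-linearity
probs = softmax(X.dot(W) + b)
```

Each row of probs is a probability vector over the 10 classes, which matches the VectorSpace of dimension nclasses declared as the output space.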

How about we give it a try? Save those two code snippets in a single file (e.g. log_reg.py) and save the following in log_reg.yaml:

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
        X: !pkl: 'mnist_train_X.pkl',
        y: !pkl: 'mnist_train_y.pkl',
        y_labels: 10,
    },
    model: !obj:log_reg.LogisticRegression {
        nvis: 784,
        nclasses: 10,
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 200,
        learning_rate: 1e-3,
        monitoring_dataset: {
            'train' : *train,
            'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
                X: !pkl: 'mnist_valid_X.pkl',
                y: !pkl: 'mnist_valid_y.pkl',
                y_labels: 10,
            },
            'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
                X: !pkl: 'mnist_test_X.pkl',
                y: !pkl: 'mnist_test_y.pkl',
                y_labels: 10,
            },
        },
        cost: !obj:log_reg.LogisticRegressionCost {},
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 15
        },
    },
}

Run the following command:

python -c "from pylearn2.utils import serial; \
           train_obj = serial.load_train_file('log_reg.yaml'); \
           train_obj.main_loop()"

Congratulations, you just implemented your first model in Pylearn2!

(By the way, the targets you used to initialize the `DenseDesignMatrix` instances were column matrices of integer labels, yet your model expects to receive one-hot encoded vectors. This works because Pylearn2 does the conversion for you via the `data_specs` mechanism, which is why specifying the model’s `input_space` and `output_space` is important.)
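For readers unfamiliar with one-hot encoding, the following NumPy sketch shows the kind of transformation involved. It is not Pylearn2’s own conversion code, and to_one_hot is a hypothetical helper written for this illustration.

```python
import numpy as np


def to_one_hot(labels, num_classes):
    # Accepts a column matrix (or flat vector) of integer labels and
    # returns one row per example with a single 1 at the label's index
    labels = np.asarray(labels).reshape(-1)
    one_hot = np.zeros((labels.shape[0], num_classes), dtype="float32")
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return one_hot


y = np.array([[3], [0], [2]])       # column matrix of labels
Y = to_one_hot(y, num_classes=4)    # shape (3, 4)
```

Taking the argmax of each row of Y recovers the original labels, which is what the monitoring code later in this tutorial relies on.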

Unsupervised learning using an autoencoder

Let’s now have a look at an unsupervised learning example: an autoencoder with tied weights. Once again, we recommend that you read the deeplearning.net tutorial. Here’s what we’ll do:

  • Implement the binary cross-entropy reconstruction loss in our Cost subclass
  • Initialize the model parameters W and b
  • Implement the model’s reconstruction logic

Let’s start again with the Cost subclass:

import theano.tensor as T
from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin


class AutoencoderCost(DefaultDataSpecsMixin, Cost):
    supervised = False

    def expr(self, model, data, **kwargs):
        space, source = self.get_data_specs(model)
        space.validate(data)

        X = data
        X_hat = model.reconstruct(X)
        loss = -(X * T.log(X_hat) + (1 - X) * T.log(1 - X_hat)).sum(axis=1)
        return loss.mean()

We assumed our model has a reconstruct method which encodes and decodes its input. We also computed the loss as the average binary cross-entropy between the input and its reconstruction. This time, however, we set supervised to False.
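Here is the same cross-entropy expression evaluated with NumPy on a tiny made-up batch, as a quick numerical check. The values are arbitrary. Note that the expression is only well defined when the reconstruction lies strictly between 0 and 1, which the sigmoid output of the model guarantees.

```python
import numpy as np

# Two examples with two binary "pixels" each
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
# Reconstructions, strictly inside (0, 1)
X_hat = np.array([[0.9, 0.2],
                  [0.4, 0.5]])

# Same expression as in AutoencoderCost.expr, with numpy in place of T
per_example = -(X * np.log(X_hat) + (1 - X) * np.log(1 - X_hat)).sum(axis=1)
loss = per_example.mean()
```

For each pixel, only one of the two terms is active: log(X_hat) when the pixel is 1 and log(1 - X_hat) when it is 0.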

Now for the Model subclass:

import numpy
import theano.tensor as T
from pylearn2.models.model import Model
from pylearn2.space import VectorSpace
from pylearn2.utils import sharedX


class Autoencoder(Model):
    def __init__(self, nvis, nhid):
        super(Autoencoder, self).__init__()

        self.nvis = nvis
        self.nhid = nhid

        W_value = numpy.random.uniform(size=(self.nvis, self.nhid))
        self.W = sharedX(W_value, 'W')
        b_value = numpy.zeros(self.nhid)
        self.b = sharedX(b_value, 'b')
        c_value = numpy.zeros(self.nvis)
        self.c = sharedX(c_value, 'c')
        self._params = [self.W, self.b, self.c]

        self.input_space = VectorSpace(dim=self.nvis)

    def reconstruct(self, X):
        h = T.tanh(T.dot(X, self.W) + self.b)
        return T.nnet.sigmoid(T.dot(h, self.W.T) + self.c)

The constructor looks quite similar to the logistic regression example, except that this time we don’t need to specify the model’s output space.

The reconstruct method simply encodes and decodes its input.
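The NumPy sketch below mirrors that computation on random data, to make the role of the tied weights explicit: the decoder reuses the transpose of the encoder’s weight matrix, so W is the only weight matrix in the model. The small dimensions and the initialization range are arbitrary choices for this illustration and are not taken from the model above.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


rng = np.random.RandomState(0)
nvis, nhid, batch_size = 6, 3, 4

W = rng.uniform(-0.1, 0.1, size=(nvis, nhid))  # shared by encoder and decoder
b = np.zeros(nhid)                             # hidden bias
c = np.zeros(nvis)                             # visible bias
X = rng.uniform(size=(batch_size, nvis))

h = np.tanh(X.dot(W) + b)          # encode: (batch_size, nhid)
X_hat = sigmoid(h.dot(W.T) + c)    # decode with the transposed weights
```

The reconstruction has the same shape as the input and every entry lies in (0, 1), which is what the binary cross-entropy loss requires.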

Let’s try to train it. Save the two code snippets in a single file (e.g. autoencoder.py), then save the following in autoencoder.yaml:

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
        X: !pkl: 'mnist_train_X.pkl',
    },
    model: !obj:autoencoder.Autoencoder {
        nvis: 784,
        nhid: 200,
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 200,
        learning_rate: 1e-3,
        monitoring_dataset: {
            'train' : *train,
            'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
                X: !pkl: 'mnist_valid_X.pkl',
            },
            'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
                X: !pkl: 'mnist_test_X.pkl',
            },
        },
        cost: !obj:autoencoder.AutoencoderCost {},
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 15
        },
    },
}

Run the following command:

python -c "from pylearn2.utils import serial; \
           train_obj = serial.load_train_file('autoencoder.yaml'); \
           train_obj.main_loop()"

What have we gained?

At this point you might be thinking “There’s still boilerplate code to write; what have we gained?”

The answer is that we gained access to the plethora of scripts, model parts, costs and training algorithms which are built into Pylearn2. You don’t have to reinvent the wheel anymore when you wish to train using SGD and momentum. If you want to switch from SGD to BGD, then Pylearn2 makes this as simple as changing the training algorithm description in your YAML file.

As we pointed out earlier, this demonstrates only the bare minimum needed to implement a model in Pylearn2. Nothing prevents you from digging deeper in the codebase and overriding some methods to gain new functionalities.

Here’s an example of how a few more lines of code can do a lot for you in Pylearn2.

Monitoring various quantities during training

Let’s monitor the classification error of our logistic regression classifier.

To do so, you will have to override Model’s get_monitoring_data_specs and get_monitoring_channels methods. The former specifies what data the model needs for its monitoring and in which format it should be provided. The latter does the actual monitoring by returning an OrderedDict mapping string identifiers to the quantities to monitor.

Let’s look at how it’s done. Add the following to LogisticRegression:

# Keeps things compatible for Python 2.6
from theano.compat.python2x import OrderedDict
from pylearn2.space import CompositeSpace


class LogisticRegression(Model):
    # (Your previous code)

    def get_monitoring_data_specs(self):
        space = CompositeSpace([self.get_input_space(),
                                self.get_target_space()])
        source = (self.get_input_source(), self.get_target_source())
        return (space, source)

    def get_monitoring_channels(self, data):
        space, source = self.get_monitoring_data_specs()
        space.validate(data)

        X, y = data
        y_hat = self.logistic_regression(X)
        error = T.neq(y.argmax(axis=1), y_hat.argmax(axis=1)).mean()

        return OrderedDict([('error', error)])

The content of get_monitoring_data_specs may look cryptic at first. Data specs are described in the Pylearn2 documentation. All you really need to know is that this is the standard way in Pylearn2 to request a tuple whose first element represents features and whose second element represents targets.

The content of get_monitoring_channels should look more familiar. We start by checking data just as in the Cost subclasses’ implementation of expr, and we separate data into features and targets. We then get predictions by calling logistic_regression and compute the average error the standard way. We return an OrderedDict mapping ‘error’ to the Theano expression for the classification error.
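The error expression can be checked by hand with NumPy on a small made-up batch. The arrays below are invented for the illustration; np.not_equal plays the role of T.neq.

```python
import numpy as np

# One-hot targets for the labels 0, 2, 1, 1
y = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 1, 0]])
# Predicted class probabilities; the row-wise argmax is 0, 1, 1, 1
y_hat = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.7, 0.2],
                  [0.3, 0.4, 0.3]])

# Fraction of examples whose predicted class differs from the target class
error = np.not_equal(y.argmax(axis=1), y_hat.argmax(axis=1)).mean()
```

Only the second example is misclassified, so the error is one out of four.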

If you launch training again using

python -c "from pylearn2.utils import serial; \
           train_obj = serial.load_train_file('log_reg.yaml'); \
           train_obj.main_loop()"

then you’ll see the classification error being displayed with the other monitored quantities.

What’s next?

The examples given in this tutorial are obviously very simplistic and could be easily replaced by existing parts of Pylearn2. However, they show a path that one can take to implement arbitrary ideas in Pylearn2.

In order to avoid reinventing the wheel, it is often useful to dig into Pylearn2’s codebase to see what has already been implemented. For example, the VAE framework relies on the MLP framework to represent the mapping from inputs to conditional distribution parameters.

While it is often desirable to reuse code, the inherent difficulty of this depends on your knowledge of Pylearn2, and also how similar your model is to what is already implemented. You should never feel ashamed to dump Theano code inside a Model subclass’ method like we showed here. The modularity of your code can be improved gradually, and at your own pace. In the meantime you can still benefit from Pylearn2’s features, like human-readable descriptions of experiments, automatic monitoring of various quantities, easily-interchangeable training algorithms, and so on.