Table Of Contents

Previous topic

Making the double type

Next topic

Views and inplace operations

This Page

Making arithmetic Ops on double

Now that we have a double type, we have yet to use it to perform computations. We’ll start by defining multiplication.

Op’s contract

An Op (gof.Op) is any object which defines the following methods:

make_node(*inputs)

This method is responsible for creating output Variables of a suitable Type to serve as the outputs of this Op’s application. This method should put these outputs into an Apply instance, and return the Apply instance.

This method creates an Apply node representing the application of the Op on the inputs provided. If the Op cannot be applied on these inputs, it must raise an appropriate exception.

The inputs of the Apply instance returned by this call must be ordered correctly: a subsequent self.make_node(*apply.inputs) must produce something equivalent to the first apply.

default_output

Default: None

If this member variable is an integer, then the default implementation of __call__ will return node.outputs[self.default_output], where node was returned by make_node. Otherwise, the entire list of outputs will be returned.

__call__(*inputs)

Syntactic shortcut to make_node which returns the output Variables of the Op.

Default: this is done for you by Op.

perform(node, inputs, output_storage)

This method computes the function associated to this Op. The node is an Apply node created by the Op’s make_node method, inputs is a list of references to data to operate on, and output_storage is a list of storage cells where the variables of the computation must be put. More specifically:

  • node: This is a reference to an Apply node which was previously obtained via mul‘s make_node method. It is typically not used in simple Ops, but it contains symbolic information that could be required for complex Ops.

  • inputs: This is a list of data.

  • output_storage: This is a list of storage cells. A storage cell is a one-element list. It is forbidden to change the length of the list(s) contained in output_storage. There is one storage cell for each output of the Op.

    The data you put in output_storage must match the type of the symbolic output. This is a situation where the node argument can come in handy.

    A function Mode may allow output_storage elements to persist between evaluations, or it may reset output_storage cells to hold a value of None. This feature can allow perform to reuse memory between calls, for example.

This method must be determined by the inputs. That is to say, if it is evaluated once on inputs A and returned B, then if ever inputs C, equal to A, are presented again, then outputs equal to B must be returned again.

You must be careful about aliasing outputs to inputs, and making modifications to any of the inputs. See Views and inplace operations before writing a perform implementation that does either of these things.

__eq__(other)

other is also an Op.

Returning True here is a promise to the optimization system that the other Op will produce exactly the same graph effects (from perform) as this one, given identical inputs. This means it will produce the same output values, it will destroy the same inputs (same destroy_map), and will alias outputs to the same inputs (same view_map). For more details, see Views and inplace operations.

__hash__()

If two Op instances compare equal, then they must return the same hash value.

Equally important, this hash value must not change during the lifetime of self. Op instances should be immutable in this sense.

__ne__(other)
Default: (not (self==other))
grad(inputs, output_gradients)

Optional.

If the Op you are defining is differentiable, you can define its gradient symbolically in this method.

Both the inputs and output_gradients will be Variables. This method must return a list containing one Variable (or None) for each input. Each returned Variable represents the gradient with respect to that input given the symbolic gradients with respect to each output.

If the output is not differentiable with respect to any inputs, then this method should be defined to return [None for i in inputs].

If this method is not defined, then Theano assumes it has been forgotten. Symbolic differentiation will fail on a graph that includes this Op.

It is important to understand that this is not meant to return the gradient of the Op’s output but rather the gradient of some other criterion C with respect to the Op’s input.

If the outputs of your op are [ f_1, ... f_n], then output_derivatives gives [ grad_{f_1} C, grad_{f_2} C, ... , grad_{f_n} C ] If the inputs of your op are [x_1, ..., x_n], then your Op.grad should return [ grad_{x_1} C, grad_{x_2} C, ..., grad_{x_n} C ]

where (grad_{y} z)_i = partial z / partial y_i (and i can have any number of dimensions) (note: in the case where i is 2 dimensional, this definition of grad is different from the standard mathematical definition of the gradient of a scalar with respect to a matrix, where you transpose the indices)

At a bare minimum, a new Op must define make_node and perform, which have no defaults.

For more details, including the interface for providing a C implementation of perform(), refer to the documentation for Op.

Defining an Op: mul

We’ll define multiplication as a binary operation, even though a multiplication Op could take an arbitrary number of arguments.

First, we’ll instantiate a mul Op:

from theano import gof
mul = gof.Op()

make_node

This function must take as many arguments as the operation we are defining is supposed to take as inputs—in this example that would be two. This function ensures that both inputs have the double type. Since multiplying two doubles yields a double, this function makes an Apply node with an output Variable of type double.

def make_node(x, y):
    if x.type != double or y.type != double:
        raise TypeError('mul only works on doubles')
    return gof.Apply(mul, [x, y], [double()])
mul.make_node = make_node

The first two lines make sure that both inputs are Variables of the double type that we created in the previous section. We would not want to multiply two arbitrary types, it would not make much sense (and we’d be screwed when we implement this in C!)

The last line is the meat of the definition. There we create an Apply node representing the application of Op mul to inputs x and y, giving a Variable instance of type double as the output.

Note

Theano relies on the fact that if you call the make_node method of Apply’s first argument on the inputs passed as the Apply’s second argument, the call will not fail and the returned Apply instance will be equivalent. This is how graphs are copied.

perform

This code actually computes the function. In our example, the data in inputs will be instances of Python’s built-in type float because this is the type that double.filter() will always return, per our own definition. output_storage will contain a single storage cell for the multiplication’s variable.

def perform(node, inputs, output_storage):
    x, y = inputs[0], inputs[1]
    z = output_storage[0]
    z[0] = x * y
mul.perform = perform

Here, z is a list of one element. By default, z == [None].

Note

It is possible that z does not contain None. If it contains anything else, Theano guarantees that whatever it contains is what perform put there the last time it was called with this particular storage. Furthermore, Theano gives you permission to do whatever you want with z‘s contents, chiefly reusing it or the memory allocated for it. More information can be found in the Op documentation.

Warning

We gave z the Theano type double in make_node, which means that a Python float must be put there. You should not put, say, an int in z[0] because Theano assumes Ops handle typing properly.

Trying out our new Op

In the following code, we use our new Op:

>>> x, y = double('x'), double('y')
>>> z = mul(x, y)
>>> f = theano.function([x, y], z)
>>> f(5, 6)
30.0
>>> f(5.6, 6.7)
37.519999999999996

Note that there is an implicit call to double.filter() on each argument, so if we give integers as inputs they are magically casted to the right type. Now, what if we try this?

>>> x = double('x')
>>> z = mul(x, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u/breuleuo/hg/theano/theano/gof/op.py", line 207, in __call__
  File "<stdin>", line 2, in make_node
AttributeError: 'int' object has no attribute 'type'

Automatic Constant Wrapping

Well, OK. We’d like our Op to be a bit more flexible. This can be done by modifying make_node to accept Python int or float as x and/or y:

def make_node(x, y):
    if isinstance(x, (int, float)):
        x = gof.Constant(double, x)
    if isinstance(y, (int, float)):
        y = gof.Constant(double, y)
    if x.type != double or y.type != double:
        raise TypeError('mul only works on doubles')
    return gof.Apply(mul, [x, y], [double()])
mul.make_node = make_node

Whenever we pass a Python int or float instead of a Variable as x or y, make_node will convert it to Constant for us. gof.Constant is a Variable we statically know the value of.

>>> x = double('x')
>>> z = mul(x, 2)
>>> f = theano.function([x], z)
>>> f(10)
20.0
>>> f(3.4)
6.7999999999999998

Now the code works the way we want it to.

Note

Most Theano Ops follow this convention of up-casting literal make_node arguments to Constants. This makes typing expressions more natural. If you do not want a constant somewhere in your graph, you have to pass a Variable (like double('x') here).

Final version

The above example is pedagogical. When you define other basic arithmetic operations add, sub and div, code for make_node can be shared between these Ops. Here is revised implementation of these four arithmetic operators:

from theano import gof

class BinaryDoubleOp(gof.Op):

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __eq__(self, other):
        return type(self) == type(other) and (self.name == other.name) and (self.fn == other.fn)

    def __hash__(self):
        return hash(type(self)) ^ hash(self.name) ^ hash(self.fn)

    def make_node(self, x, y):
        if isinstance(x, (int, float)):
            x = gof.Constant(double, x)
        if isinstance(y, (int, float)):
            y = gof.Constant(double, y)
        if x.type != double or y.type != double:
            raise TypeError('%s only works on doubles' % self.name)
        return gof.Apply(self, [x, y], [double()])

    def perform(self, node, (x, y), (z, )):
        z[0] = self.fn(x, y)

    def __str__(self):
        return self.name

add = BinaryDoubleOp(name = 'add',
                     fn = lambda x, y: x + y)

sub = BinaryDoubleOp(name = 'sub',
                     fn = lambda x, y: x - y)

mul = BinaryDoubleOp(name = 'mul',
                     fn = lambda x, y: x * y)

div = BinaryDoubleOp(name = 'div',
                     fn = lambda x, y: x / y)

Instead of working directly on an instance of Op, we create a subclass of Op that we can parametrize. All the operations we define are binary. They all work on two inputs with type double. They all return a single Variable of type double. Therefore, make_node does the same thing for all these operations, except for the Op reference self passed as first argument to Apply. We define perform using the function fn passed in the constructor.

This design is a flexible way to define basic operations without duplicating code. The same way a Type subclass represents a set of structurally similar types (see previous section), an Op subclass represents a set of structurally similar operations: operations that have the same input/output types, operations that only differ in one small detail, etc. If you see common patterns in several Ops that you want to define, it can be a good idea to abstract out what you can. Remember that an Op is just an object which satisfies the contract described above on this page and that you should use all the tools at your disposal to create these objects as efficiently as possible.

Exercise: Make a generic DoubleOp, where the number of arguments can also be given as a parameter.