Here’s another straightforward example, though a bit more elaborate than adding two numbers together. Let’s say that you want to compute the logistic curve, which is given by:

A plot of the logistic function, with x on the x-axis and s(x) on the y-axis.
You want to compute the function elementwise on matrices of doubles, which means that you want to apply this function to each individual element of the matrix.
Well, what you do is this:
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> logistic = function([x], s)
>>> logistic([[0, 1], [-1, -2]])
array([[ 0.5 , 0.73105858],
[ 0.26894142, 0.11920292]])
The reason logistic is performed elementwise is because all of its operations—division, addition, exponentiation, and division—are themselves elementwise operations.
It is also the case that:

We can verify that this alternate form produces the same values:
>>> s2 = (1 + T.tanh(x / 2)) / 2
>>> logistic2 = function([x], s2)
>>> logistic2([[0, 1], [-1, -2]])
array([[ 0.5 , 0.73105858],
[ 0.26894142, 0.11920292]])
Theano supports functions with multiple outputs. For example, we can compute the elementwise difference, absolute difference, and squared difference between two matrices a and b at the same time:
>>> a, b = T.dmatrices('a', 'b')
>>> diff = a - b
>>> abs_diff = abs(diff)
>>> diff_squared = diff**2
>>> f = function([a, b], [diff, abs_diff, diff_squared])
Note
dmatrices produces as many outputs as names that you provide. It’s a shortcut for allocating symbolic variables that we will often use in the tutorials.
When we use the function, it will return the three variables (the printing was reformatted for readability):
>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
[array([[ 1., 0.],
[-1., -2.]]),
array([[ 1., 0.],
[ 1., 2.]]),
array([[ 1., 0.],
[ 1., 4.]])]
Now let’s use Theano for a slightly more sophisticated task: create a
function which computes the derivative of some expression y with
respect to its parameter x. For instance, we can compute the
gradient of
with respect to
. Note that:
.
Here is code to compute this gradient:
>>> x = T.dscalar('x')
>>> y = x**2
>>> gy = T.grad(y, x)
>>> pp(gy) # print out the gradient prior to optimization
'((fill((x ** 2), 1.0) * 2) * (x ** (2 - 1)))'
>>> f = function([x], gy)
>>> f(4)
array(8.0)
>>> f(94.2)
array(188.40000000000001)
In the example above, we can see from pp(gy) that we are computing the correct symbolic gradient. fill((x ** 2), 1.0) means to make a matrix of the same shape as x ** 2 and fill it with 1.0.
Note
The optimizer simplifies the symbolic gradient expression. You can see this by digging inside the internal properties of the compiled function.
pp(f.maker.env.outputs[0])
'(2.0 * x)'
After optimization there is only one Apply node left in the graph, which doubles the input.
We can also compute the gradient of complex expressions such as the
logistic function defined above. It turns out that the derivative of the
logistic is:
.
A plot of the gradient of the logistic function, with x on the x-axis
and
on the y-axis.
>>> x = T.dmatrix('x')
>>> s = 1 / (1 + T.exp(-x))
>>> gs = T.grad(s, x)
>>> dlogistic = function([x], gs)
>>> dlogistic([[0, 1], [-1, -2]])
array([[ 0.25 , 0.19661193],
[ 0.19661193, 0.10499359]])
The resulting function computes the gradient of its first argument with respect to the second. In this way, Theano can be used for automatic differentiation.
Note
The second argument of T.grad can be a list, in which case the output is also a list. The order in both list is important, element i of the output list is the gradient of the first argument of T.grad with respect to the i-th element of the list given as second argument. The first arguement of T.grad has to be a scalar (a tensor of size 1). For more information on the semantics of the arguments of T.grad and details about the implementation, see this.
Let’s say you want to define a function that adds two numbers, except that if you only provide one number, the other input is assumed to be one. You can do it like this:
>>> x, y = T.dscalars('x', 'y')
>>> z = x + y
>>> f = function([x, Param(y, default=1)], z)
>>> f(33)
array(34.0)
>>> f(33, 2)
array(35.0)
This makes use of the Param class which allows you to specify properties of your function’s parameters with greater detail. Here we give a default value of 1 for y by creating a Param instance with its default field set to 1.
Inputs with default values must follow inputs without default values (like python’s functions). There can be multiple inputs with default values. These parameters can be set positionally or by name, as in standard Python:
>>> x, y, w = T.dscalars('x', 'y', 'w')
>>> z = (x + y) * w
>>> f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)
>>> f(33)
array(68.0)
>>> f(33, 2)
array(70.0)
>>> f(33, 0, 1)
array(33.0)
>>> f(33, w_by_name=1)
array(34.0)
>>> f(33, w_by_name=1, y=0)
array(33.0)
Note
Param does not know the name of the local variables y and w that are passed as arguments. The symbolic variable objects have name attributes (set by dscalars in the example above) and these are the names of the keyword parameters in the functions that we build. This is the mechanism at work in Param(y, default=1). In the case of Param(w, default=2, name='w_by_name'), we override the symbolic variable’s name attribute with a name to be used for this function.
Because in Theano you first express everything symbolically and afterwards compile this expression to get functions, using pseudo-random numbers is not as straightforward as it is in numpy, though also not too complicated.
The way to think about putting randomness into Theano’s computations is to put random variables in your graph. Theano will allocate a numpy RandomStream object (a random number generator) for each such variable, and draw from it as necessary. I’ll call this sort of sequence of random numbers a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well.
Here’s a brief example. The setup code is:
from theano.tensor.shared_randomstreams import RandomStreams
srng = RandomStreams(seed=234)
rv_u = srng.uniform((2,2))
rv_n = srng.normal((2,2))
f = function([], rv_u)
g = function([], rv_n, no_default_updates=True) #Not updating rv_n.rng
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
Here, ‘rv_u’ represents a random stream of 2x2 matrices of draws from a uniform distribution. Likewise, ‘rv_n’ represenents a random stream of 2x2 matrices of draws from a normal distribution. The distributions that are implemented are defined in RandomStreams.
Now let’s use these things. If we call f(), we get random uniform numbers. The internal state of the random number generator is automatically updated, so we get different random numbers every time.
>>> f_val0 = f()
>>> f_val1 = f() #different numbers from f_val0
When we add the extra argument no_default_updates=True to function (as in g), then the random number generator state is not affected by calling the returned function. So for example, calling g multiple times will return the same numbers.
>>> g_val0 = g() # different numbers from f_val0 and f_val1
>>> g_val0 = g() # same numbers as g_val0 !!!
An important remark is that a random variable is drawn at most once during any single function execution. So the nearly_zeros function is guaranteed to return approximately 0 (except for rounding error) even though the rv_u random variable appears three times in the output expression.
>>> nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
Random variables can be seeded individually or collectively.
You can seed just one random variable by seeding or assigning to the .rng.value attribute.
>>> rv_u.rng.value.seed(89234) # seeds the generator for rv_u
You can also seed all of the random variables allocated by a RandomStreams object by that object’s seed method. This seed will be used to seed a temporary random number generator, that will in turn generate seeds for each of the random variables.
>>> srng.seed(902340) # seeds rv_u and rv_n with different seeds each
As usual for shared variables, the random number generators used for random variables are common between functions. So our nearly_zeros function will update the state of the generators used in function f above.
For example:
>>> state_after_v0 = rv_u.rng.value.get_state()
>>> nearly_zeros() # this affects rv_u's generator
>>> v1 = f()
>>> rv_u.rng.value.set_state(state_after_v0)
>>> v2 = f() # v2 != v1