Theano is a Python library that lets you to define, optimize, and evaluate mathematical expressions, especially ones with multi-dimensional arrays (numpy.ndarray). Using Theano it is possible to attain speeds rivaling hand-crafted C implementations for problems involving large amounts of data. It can also surpass C on a CPU by many orders of magnitude by taking advantage of recent GPUs.

Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. It can also generate customized C code for many mathematical operations. This combination of CAS with optimizing compilation is particularly useful for tasks in which complicated mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations where many different expressions are each evaluated once Theano can minimize the amount of compilation/analysis overhead, but still provide symbolic features such as automatic differentiation.

Theano’s compiler applies many optimizations of varying complexity to these symbolic expressions. These optimizations include, but are not limited to:

- use of GPU for computations
- constant folding
- merging of similar subgraphs, to avoid redundant calculation
- arithmetic simplification (e.g.
`x*y/x -> y`,`--x -> x`) - inserting efficient BLAS operations (e.g.
`GEMM`) in a variety of contexts - using memory aliasing to avoid calculation
- using inplace operations wherever it does not interfere with aliasing
- loop fusion for elementwise sub-expressions
- improvements to numerical stability (e.g. and )
- for a complete list, see
*Optimizations*

Theano was written at the LISA lab to support rapid development of
efficient machine learning algorithms. Theano is
named after the Greek mathematician, who may have been Pythagoras’
wife. Theano is released under a BSD license (*link*).

Here is an example of how to use Theano. It doesn’t show off many of Theano’s features, but it illustrates concretely what Theano is.

```
import theano
from theano import tensor
# declare two symbolic floating-point scalars
a = tensor.dscalar()
b = tensor.dscalar()
# create a simple expression
c = a + b
# convert the expression into a callable object that takes (a,b)
# values as input and computes a value for c
f = theano.function([a,b], c)
# bind 1.5 to 'a', 2.5 to 'b', and evaluate 'c'
assert 4.0 == f(1.5, 2.5)
```

Theano is not a programming language in the normal sense because you write a program in Python that builds expressions for Theano. Still it is like a programming language in the sense that you have to

- declare variables (
`a,b`) and give their types - build expressions for how to put those variables together
- compile expression graphs to functions in order to use them for computation.

It is good to think of `theano.function` as the interface to a
compiler which builds a callable object from a purely symbolic graph.
One of theano’s most important features is that `theano.function`
can optimize a graph and even compile some or all of it into native
machine instructions.

Theano is a Python library and optimizing compiler for manipulating and evaluating expressions, especially matrix-valued ones. Manipulation of matrices is typically done using the numpy package, so what does Theano do that Python and numpy do not?

*execution speed optimizations*: Theano can use g++ or nvcc to compile parts your expression graph into CPU or GPU instructions, which run much faster than pure Python.*symbolic differentiation*: Theano can automatically build symbolic graphs for computing gradients.*stability optimizations*: Theano can recognize [some] numerically unstable expressions and compute them with more stable algorithms.

The closest Python package to Theano is sympy. Theano focuses more on tensor expressions than Sympy, and has more machinery for compilation. Sympy has more sophisticated algebra rules and can handle a wider variety of mathematical operations (such as series, limits, and integrals).

If numpy is to be compared to MATLAB and sympy to Mathematica, Theano is a sort of hybrid of the two which tries to combine the best of both worlds.

*Installing Theano*- Instructions to download and install Theano on your system.
*Tutorial*- Getting started with Theano’s basic features. Go here if you are new!
*Library Documentation*- Details of what Theano provides. It is recommended to go through
the
*Tutorial*first though.

A PDF version of the online documentation may be found here.

This is the vision we have for Theano. This is give people an idea of what to expect in the future of Theano, but we can’t promise to implement all of it. This should also help you to understand where Theano fits in relation to other computational tools.

Support tensor and sparse operations

Support linear algebra operations

- Graph Transformations
- Differentiation/higher order differentiation
- ‘R’ and ‘L’ differential operators
- Speed/memory optimizations
- Numerical stability optimizations

Can use many compiled languages, instructions sets: C/C++, CUDA, OpenCL, PTX, CAL, AVX, ...

Lazy evaluation

Loop

Parallel execution (SIMD, multi-core, multi-node on cluster, multi-node distributed)

Support all NumPy/basic SciPy functionality

Easy wrapping of library functions in Theano

Note: There is no short term plan to support multi-node computation.

Here is the state of that vision as of December 3th, 2013 (after Theano release 0.6):

- We support tensors using the numpy.ndarray object and we support many operations on them.
- We support sparse types by using the scipy.{csc,csr}_matrix object and support some operations on them.
- We have started implementing/wrapping more advanced linear algebra operations.
- We have many graph transformations that cover the 4 categories listed above.
- We can improve the graph transformation with better storage optimization
and instruction selection.
- Similar to auto-tuning during the optimization phase, but this doesn’t apply to only 1 op.
- Example of use: Determine if we should move computation to the GPU or not depending on the input size.
- Possible implementation note: allow Theano Variable in the fgraph to have more than 1 owner.

- We have a CUDA backend for tensors of type float32 only.
- Efforts have begun towards a generic GPU ndarray (GPU tensor) (started in the
compyte project)
- Move GPU backend outside of Theano (on top of PyCUDA/PyOpenCL)
- Will provide better support for GPU on Windows and use an OpenCL backend on CPU.

- Loops work, but not all related optimizations are currently done.
- The cvm linker allows lazy evaluation. It is the current default linker.
- How to have DebugMode check it? Right now, DebugMode checks the computation non-lazily.

- SIMD parallelism on the CPU comes from the compiler.
- Multi-core parallelism is only supported by Conv2d(not by default). If the external BLAS implementation supports it, there are also, gemm, gemv and ger that are parallelized.
- No multi-node support.
- Many, but not all NumPy functions/aliases are implemented. * https://github.com/Theano/Theano/issues/1080
- Wrapping an existing Python function in easy and documented.
- We know how to separate the shared variable memory storage location from its object type (tensor, sparse, dtype, broadcast flags), but we need to do it.

Discussion about Theano takes place in the theano-dev and theano-users mailing lists. People interested in development of Theano should check the former, while the latter is reserved for issues that concern the end users.

Questions, comments, praise, criticism as well as bug reports should be submitted to these mailing lists.

We welcome all kinds of contributions. If you have any questions regarding how to extend Theano, please feel free to ask on the theano-dev mailing list.