Using different compiling modes

Mode

Every time theano.function is called, the symbolic relationships between the input and output Theano variables are optimized and compiled. How this compilation occurs is controlled by the value of the mode parameter.

Theano defines the following modes by name:

  • 'FAST_COMPILE': Apply just a few graph optimizations, but use C implementations where possible.

  • 'FAST_RUN': Apply all optimizations, and use C implementations where possible.

  • 'DEBUG_MODE': Verify the correctness of all optimizations, and compare C and Python implementations. This mode can take much longer than the other modes, but can identify many kinds of problems.

  • 'PROFILE_MODE': Apply the same optimizations as FAST_RUN, but print some profiling information.

The default mode is typically FAST_RUN. It can be changed via the configuration variable config.mode, and overridden for an individual function by passing the mode keyword argument to theano.function.

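============  ==========================================================  ===================================================================================================
Short name    Full constructor                                            What does it do?
============  ==========================================================  ===================================================================================================
FAST_COMPILE  compile.mode.Mode(linker='c|py', optimizer='fast_compile')  C implementations where available, quick and cheap graph transformations.
FAST_RUN      compile.mode.Mode(linker='c|py', optimizer='fast_run')      C implementations where available, all available graph transformations.
DEBUG_MODE    compile.debugmode.DebugMode()                               Both implementations where available, all available graph transformations.
PROFILE_MODE  compile.profilemode.ProfileMode()                           C implementations where available, all available graph transformations, print profile information.
============  ==========================================================  ===================================================================================================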
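The table above can be read as a lookup from a short name to constructor settings, combined with the rule that an explicit mode argument overrides the configured default. The sketch below is plain Python for illustration only, not Theano's internal code: MODE_TABLE and resolve_mode are hypothetical names, and the DEBUG_MODE and PROFILE_MODE entries record only the constructor named in the table, since their default arguments are not spelled out here.

```python
# Illustrative only: a plain-Python view of the table above.
# MODE_TABLE and resolve_mode are hypothetical names, not part of Theano.
MODE_TABLE = {
    "FAST_COMPILE": {"constructor": "compile.mode.Mode",
                     "linker": "c|py", "optimizer": "fast_compile"},
    "FAST_RUN": {"constructor": "compile.mode.Mode",
                 "linker": "c|py", "optimizer": "fast_run"},
    "DEBUG_MODE": {"constructor": "compile.debugmode.DebugMode"},
    "PROFILE_MODE": {"constructor": "compile.profilemode.ProfileMode"},
}


def resolve_mode(mode=None, config_mode="FAST_RUN"):
    """Pick the mode name: an explicit argument overrides the configured default."""
    name = config_mode if mode is None else mode
    if name not in MODE_TABLE:
        raise ValueError("unknown mode name: %r" % (name,))
    return name, MODE_TABLE[name]


print(resolve_mode())                     # falls back to the configured default
print(resolve_mode(mode="FAST_COMPILE"))  # the explicit argument wins
```

Here config_mode stands in for the value of config.mode, and the mode argument stands in for the keyword passed to theano.function.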

Using DebugMode

While you should normally use the FAST_RUN or FAST_COMPILE mode, it is useful at first (especially when you are defining new kinds of expressions or new optimizations) to run your code using DebugMode (available via mode='DEBUG_MODE'). DebugMode is designed to run several self-checks and assertions that can help diagnose programming errors that would otherwise lead to incorrect output. Note that DEBUG_MODE is much slower than FAST_RUN or FAST_COMPILE, so use it only during development (not when you launch 1000 processes on a cluster!).

DebugMode is used as follows:

import theano
import theano.tensor as T

x = T.dvector('x')

f = theano.function([x], 10 * x, mode='DEBUG_MODE')

# x is a vector, so each call must pass a sequence, not a bare scalar
f([5])
f([0])
f([7])

If any problem is detected, DebugMode will raise an exception describing what went wrong, either at call time (f([5])) or at compile time (f = theano.function([x], 10 * x, mode='DEBUG_MODE')). These exceptions should not be ignored; talk to your local Theano guru or email the users list if you cannot make the exception go away.

Some kinds of errors can only be detected for certain combinations of input values. In the example above, there is no way to guarantee that a future call to, say, f([-1]) won’t cause a problem. DebugMode is not a silver bullet.
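To see why passing checks on a few inputs guarantees nothing about other inputs, consider a toy cross-check in plain Python. This is not how DebugMode is implemented: reference_impl, fast_impl, and checked_call are hypothetical names, and the bug in fast_impl is planted on purpose so that it only shows up for negative values.

```python
# Toy illustration only; none of these names exist in Theano.
def reference_impl(x):
    """Slow but trusted implementation of 10 * x."""
    return [10 * v for v in x]


def fast_impl(x):
    """'Optimized' implementation with a planted bug for negative values."""
    return [10 * abs(v) for v in x]


def checked_call(x):
    """Run both implementations and raise if their results differ."""
    expected = reference_impl(x)
    actual = fast_impl(x)
    if expected != actual:
        raise AssertionError("implementations disagree on input %r" % (x,))
    return actual


for value in ([5], [0], [7]):
    checked_call(value)  # all three checks pass

try:
    checked_call([-1])   # only this input exposes the planted bug
    mismatch_found = False
except AssertionError:
    mismatch_found = True
```

The first three calls succeed even though fast_impl is wrong, which is the situation described above: a check can only catch an error on inputs that actually trigger it.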

If you instantiate DebugMode using its constructor rather than the keyword 'DEBUG_MODE', you can configure its behaviour via constructor arguments; see the DebugMode documentation for details. The keyword version of DebugMode (which you get by using mode='DEBUG_MODE') is quite strict.

ProfileMode

Besides checking for errors, another important task is profiling your code. For this, Theano provides a special mode called ProfileMode, which is passed as an argument to theano.function. Using ProfileMode is a three-step process.

To make ProfileMode the default instead, set the Theano flag mode to PROFILE_MODE.
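One way to set the mode flag to PROFILE_MODE from inside a script is through the THEANO_FLAGS environment variable, which Theano reads when it is first imported. The sketch below assumes no other flags are already set in the environment (it overwrites them), and it leaves the import commented out so the snippet runs even where Theano is not installed.

```python
import os

# Must run before the first `import theano`; the flags are read at import time.
# Note: this overwrites any flags already present in THEANO_FLAGS.
os.environ["THEANO_FLAGS"] = "mode=PROFILE_MODE"

# import theano  # functions compiled after this point default to PROFILE_MODE
```

Setting the variable in the shell before launching the script has the same effect and avoids depending on import order.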

Creating a ProfileMode Instance

First create a ProfileMode instance.

>>> import theano
>>> from theano import ProfileMode
>>> profmode = ProfileMode(optimizer='fast_run', linker=theano.gof.OpWiseCLinker())

The ProfileMode constructor takes an optimizer and a linker as input. Which optimizer and linker to use depends on the application. For example, a user who wants to profile only the Python implementations should use gof.PerformLinker (or “py” for short). A user who wants to profile a graph using C implementations wherever possible should use gof.OpWiseCLinker (or “c|py”). For testing the speed of your code, we recommend the ‘fast_run’ optimizer and the gof.OpWiseCLinker linker.

Compiling your Graph with ProfileMode

Once the ProfileMode instance is created, simply compile your graph as you would normally, by specifying the mode parameter.

>>> # with functions
>>> f = theano.function([input1,input2],[output1], mode=profmode)
>>> # with modules
>>> m = theano.Module()
>>> minst = m.make(mode=profmode)

Retrieving Timing Information

Once your graph is compiled, simply run the program or operation you wish to profile, then call profmode.print_summary(). This will provide you with the desired timing information, indicating where your graph is spending most of its time.

This is best shown through an example. Let's use logistic regression. (Code for this example is in the file benchmark/regression/regression.py.)

Compiling the module with ProfileMode and calling profmode.print_summary() generates the following output:

"""
ProfileMode.print_summary()
---------------------------

local_time 0.0749197006226 (Time spent running thunks)
Apply-wise summary: <fraction of local_time spent at this position> (<Apply position>, <Apply Op name>)
        0.069   15      _dot22
        0.064   1       _dot22
        0.053   0       InplaceDimShuffle{x,0}
        0.049   2       InplaceDimShuffle{1,0}
        0.049   10      mul
        0.049   6       Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049   3       InplaceDimShuffle{x}
        0.049   4       InplaceDimShuffle{x,x}
        0.048   14      Sum{0}
        0.047   7       sub
        0.046   17      mul
        0.045   9       sqr
        0.045   8       Elemwise{sub}
        0.045   16      Sum
        0.044   18      mul
   ... (remaining 6 Apply instances account for 0.25 of the runtime)
Op-wise summary: <fraction of local_time spent on this kind of Op> <Op name>
        0.139   * mul
        0.134   * _dot22
        0.092   * sub
        0.085   * Elemwise{Sub{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1779f10>}}[(0, 0)]
        0.053   * InplaceDimShuffle{x,0}
        0.049   * InplaceDimShuffle{1,0}
        0.049   * Elemwise{ScalarSigmoid{output_types_preference=<theano.scalar.basic.transfer_type object at 0x171e650>}}[(0, 0)]
        0.049   * InplaceDimShuffle{x}
        0.049   * InplaceDimShuffle{x,x}
        0.048   * Sum{0}
        0.045   * sqr
        0.045   * Sum
        0.043   * Sum{1}
        0.042   * Elemwise{Mul{output_types_preference=<theano.scalar.basic.transfer_type object at 0x17a0f50>}}[(0, 1)]
        0.041   * Elemwise{Add{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1736a50>}}[(0, 0)]
        0.039   * Elemwise{Second{output_types_preference=<theano.scalar.basic.transfer_type object at 0x1736d90>}}[(0, 1)]
   ... (remaining 0 Ops account for 0.00 of the runtime)
(*) Op is running a c implementation

"""

The summary has two components. The first section, the Apply-wise summary, gives timing information for the worst-offending Apply nodes. These are the individual Op applications within your graph that take the longest to execute (so if you use dot twice, you will see two entries there). The second section, the Op-wise summary, groups together the execution times of all Apply nodes executing the same Op and shows the total execution time per Op (so if you use dot twice, you will see only one entry there, corresponding to the sum of the time spent in each of them).

Note that ProfileMode also shows which Ops were running a C implementation.
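The relationship between the two summaries can be reproduced with a few lines of plain Python. The sketch below is not ProfileMode's own code: apply_rows and op_wise are hypothetical names, and the rows are a handful of entries copied from the Apply-wise summary above.

```python
from collections import defaultdict

# (fraction of local_time, Apply position, Op name), copied from the output above.
apply_rows = [
    (0.069, 15, "_dot22"),
    (0.064, 1, "_dot22"),
    (0.049, 10, "mul"),
    (0.046, 17, "mul"),
    (0.044, 18, "mul"),
    (0.045, 9, "sqr"),
]


def op_wise(rows):
    """Sum the per-Apply fractions of every node that runs the same Op."""
    totals = defaultdict(float)
    for fraction, _position, op_name in rows:
        totals[op_name] += fraction
    # Largest total first, like the Op-wise summary.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for op_name, total in op_wise(apply_rows):
    print("%.3f   %s" % (total, op_name))
```

The three mul entries sum to 0.139, matching the Op-wise line for mul. The two _dot22 entries sum to 0.133 against the reported 0.134, a gap that is presumably due to rounding in the printed figures.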