Platforms: Unix, Windows
The symbolic gradient is usually computed with tensor.grad(), which offers a more convenient syntax for the common case of wanting the gradient of a scalar cost with respect to some expressions. The grad_sources_inputs() function does the underlying work and is more flexible, but it is also more awkward to use when tensor.grad() can do the job.
A gradient source is a pair (r, g_r), in which r is a Variable and g_r is a Variable holding the gradient with respect to r.
This function traverses the graph backward from the r sources, calling op.grad(...) for all ops with some non-None gradient on an output.
The op.grad(...) functions are called like this:
op.grad(op.inputs[:], [total_gradient(v) for v in op.outputs])
This call to op.grad should return a list or tuple: one symbolic gradient per input. If op has a single input, then op.grad should return a list or tuple of length 1.
For each input with respect to which op is not differentiable, it should return None instead of a Variable instance.
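As a rough illustration of this return convention, here is a minimal pure-Python sketch. The classes IntPower and Negate are hypothetical and are not part of the library, and plain floats stand in for symbolic Variables so that the example runs on its own.

```python
class IntPower:
    """Hypothetical op computing x ** n, where n is an integer exponent."""

    def grad(self, inputs, output_grads):
        x, n = inputs
        (g_out,) = output_grads
        # Chain rule: d(x ** n)/dx = n * x ** (n - 1), scaled by the
        # gradient arriving on the output.
        g_x = g_out * n * x ** (n - 1)
        # The op is treated as non-differentiable with respect to the
        # integer exponent, so that slot is None.
        return [g_x, None]


class Negate:
    """Hypothetical single-input op computing -x."""

    def grad(self, inputs, output_grads):
        (g_out,) = output_grads
        # A single input still yields a list of length 1.
        return [-g_out]


# One entry per input, in the same order as the inputs.
power_grads = IntPower().grad([3.0, 2], [1.0])
negate_grads = Negate().grad([5.0], [2.0])
```

In both cases the returned list has exactly as many entries as the op has inputs, which is what lets the caller pair each gradient with its input.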
If a source r receives a gradient from another source r2, then the effective gradient on r is the sum of both gradients.
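The backward traversal and the summing of gradients described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the library's implementation: the Var, Apply, Add, and Scale classes and the functions apply_op and grad_sources_inputs_sketch are hypothetical, every op has a single output, and floats stand in for symbolic gradients.

```python
class Var:
    """Hypothetical stand-in for a Variable; owner is the node that made it."""
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner


class Apply:
    """Hypothetical node applying an op to inputs, with a single output."""
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = list(inputs)
        self.outputs = [Var(type(op).__name__.lower() + "_out", owner=self)]


class Add:
    def grad(self, inputs, output_grads):
        (g,) = output_grads
        return [g, g]


class Scale:
    def __init__(self, k):
        self.k = k

    def grad(self, inputs, output_grads):
        (g,) = output_grads
        return [self.k * g]


def apply_op(op, *inputs):
    return Apply(op, inputs).outputs[0]


def grad_sources_inputs_sketch(sources):
    # Seed the mapping with the source gradients, summing duplicates.
    grads = {}
    for r, g_r in sources:
        grads[r] = grads.get(r, 0.0) + g_r

    # Collect the nodes reachable from the sources in topological order.
    order, seen = [], set()

    def visit(var):
        node = var.owner
        if node is None or id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    for r, _ in sources:
        visit(r)

    # Walk backward so each output's total gradient is known before
    # the op's grad method is called.
    for node in reversed(order):
        out_grads = [grads.get(v) for v in node.outputs]
        if all(g is None for g in out_grads):
            continue  # no gradient reaches this op
        in_grads = node.op.grad(node.inputs, out_grads)
        for inp, g in zip(node.inputs, in_grads):
            if g is not None:
                grads[inp] = grads.get(inp, 0.0) + g
    return grads


x = Var("x")
y = apply_op(Scale(3.0), x)   # y = 3 * x
z = apply_op(Add(), y, x)     # z = y + x
# y is itself a source and also receives a gradient from the source z.
grads = grad_sources_inputs_sketch([(z, 1.0), (y, 10.0)])
```

Here y ends up with the sum of its own source gradient and the gradient flowing back from z, and x accumulates contributions from both of the ops that consume it.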
| Parameters: | |
|---|---|
| Return type: | dictionary whose keys and values are of type Variable |
| Returns: | mapping from each Variable encountered in the backward traversal to its [total] gradient. |