# Module cost


Cost functions.

Note: All of these functions return one cost per example, so it is your job to apply `tensor.sum` over the individual example losses if you want a total loss.

## Classes

- `LogFactorial`: Compute log x!.
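As a hypothetical illustration of what `LogFactorial` computes (this is not the actual Op implementation, just the underlying identity), log(x!) can be evaluated stably via the log-gamma function, since log(x!) = lgamma(x + 1):

```python
import math

def log_factorial(x):
    """Return log(x!) via the log-gamma function.

    Numerically stable even for large x, where computing x! directly
    would overflow.
    """
    return math.lgamma(x + 1)
```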
## Functions

- `poissonlambda(unscaled_output, doclen, beta_scale)`: A continuous parameter lambda_i, which is the expected number of occurrences of word i in the document.

- `nlpoisson(target, output, beta_scale=1, axis=0, sumloss=True, zerothreshold=0)`: The negative log Poisson regression probability.
## Variables
undefined_gradient = `UndefinedGradientOp()`
scalar_logfactorial = `LogFactorial(scalar.upgrade_to_float, na...`
logfactorial = `tensor.Elemwise(scalar_logfactorial, name= 'log...`

Imports: T, tensor, scalar, numpy, gof

## Function Details

### poissonlambda(unscaled_output, doclen, beta_scale)


A continuous parameter lambda_i, which is the expected number of occurrences of word i in the document. Note that this must be positive, which is why Ranzato and Szummer (2008) use an exponential.

Yoshua: I don't like exponentials to guarantee positivity. softplus is numerically much better behaved (but you might want to try both to see if it makes a difference).

To Do: Maybe there are more sensible ways to set the beta_scale.
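A minimal NumPy sketch contrasting the two positivity transforms discussed above (the exponential from Ranzato and Szummer versus Yoshua's softplus suggestion). The exact way `doclen` and `beta_scale` combine into beta is an assumption here, following the docstring's statement that beta is proportional to document length:

```python
import numpy as np

def poisson_lambda_exp(unscaled_output, doclen, beta_scale):
    # Ranzato & Szummer (2008): exponential guarantees lambda > 0.
    beta = beta_scale * doclen  # assumed: beta proportional to doc length
    return beta * np.exp(unscaled_output)

def poisson_lambda_softplus(unscaled_output, doclen, beta_scale):
    # Softplus alternative: log(1 + e^x) is also positive but grows
    # linearly rather than exponentially, so it is numerically tamer.
    beta = beta_scale * doclen
    return beta * np.logaddexp(0.0, unscaled_output)
```

Both transforms map any real-valued `unscaled_output` to a strictly positive lambda; the only difference is how aggressively large outputs inflate the expected count.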

### nlpoisson(target, output, beta_scale=1, axis=0, sumloss=True, zerothreshold=0)

```
The negative log Poisson regression probability.
From Ranzato and Szummer (2008).

Output should be of the form Weight*code+bias, i.e. unsquashed.
NB this is different from the formulation in Salakhutdinov and Hinton
(2007), in which the output is softmax'ed and multiplied by the input
document length. That is also what Welling et al. (2005) do. It would
be useful to try the softmax, because it is better behaved.

There is a beta term that is proportional to document length. We
are not sure what beta scale is used by the authors. We use 1 as
the default, but this value might be inappropriate.
For numerical reasons, Yoshua recommends choosing beta such that
the lambda is expected to be around 1 for words that have a non-zero count.
So he would take:

beta = document_size / unique_words_per_document

I am not sure the above math is correct; I need to talk to him.

Yoshua notes that ``there is a x_i log(beta) term missing, if you
compare with eqn 2 (i.e., take the log). They did not include it in
eqn 3 because it does not depend on the parameters, so the gradient
wrt it would be 0. But if you really want log-likelihood it should
be included.'' If you want a true log-likelihood, you probably should
actually compute the derivative of the entire eqn 2.

Axis is the axis along which we sum the target values, to obtain
the document length.

If sumloss, we sum the loss along axis.

If zerothreshold is non-zero, we threshold the loss:
If this target dimension is zero and beta * tensor.exp(output)
< zerothreshold, let this loss be zero.

@todo: Include logfactorial term

```
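A NumPy sketch of the cost described above, under stated assumptions: beta is taken as `beta_scale` times the document length (the sum of target counts along `axis`), lambda is `beta * exp(output)` as in the exponential parameterization, and the log(x!) term is omitted, matching the module's own "@todo: Include logfactorial term". The function name and the `keepdims` broadcasting are mine, not the module's:

```python
import numpy as np

def nl_poisson(target, output, beta_scale=1, axis=0, sumloss=True):
    # Document length: total word count along the summation axis.
    doclen = target.sum(axis=axis, keepdims=True)
    beta = beta_scale * doclen           # assumed: beta ~ document length
    lam = beta * np.exp(output)          # expected count of each word
    # Negative log Poisson probability per word:
    #   -log P(x | lam) = lam - x * log(lam)  (+ log(x!), omitted here)
    loss = lam - target * np.log(lam)
    return loss.sum(axis=axis) if sumloss else loss
```

With `output` unsquashed (of the form Weight*code+bias) and `target` holding word counts, this returns one cost per example when `sumloss` is true, consistent with the module-level note that summing across examples is left to the caller.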

## Variable Details

### scalar_logfactorial

Value:
 ```LogFactorial(scalar.upgrade_to_float, name= 'scalar_logfactoral') ```

### logfactorial

Value:
 ```tensor.Elemwise(scalar_logfactorial, name= 'logfactorial') ```