nlpoisson(target,
output,
beta_scale=1,
axis=0,
sumloss=True,
zerothreshold=0)

The negative log Poisson regression probability.
From Ranzato and Szummer (2008).
Output should be of the form Weight*code + bias, i.e. unsquashed
(no squashing nonlinearity applied).
NB this differs from the formulation in Salakhutdinov and Hinton
(2007), in which the output is softmaxed and multiplied by the input
document length; Welling et al. (2005) do the same. It would be
worth trying the softmax variant, since it is better behaved numerically.
There is a beta term that is proportional to document length. We
do not know what beta scale the authors used; we default to 1,
but this value may be inappropriate.
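As a minimal sketch of the cost described above, here is a NumPy
version (the original is Theano code; the function body below is an
assumption based on this description, with the constant terms dropped
as discussed further down):

```python
import numpy as np

def nlpoisson(target, output, beta_scale=1.0, axis=0, sumloss=True):
    """Sketch of the negative log Poisson cost.

    `output` is the unsquashed pre-activation Weight*code + bias; the
    Poisson rate is lambda = beta * exp(output), with beta proportional
    to the document length (sum of target counts along `axis`).
    """
    doc_length = target.sum(axis=axis, keepdims=True)
    beta = beta_scale * doc_length
    # Negative log Poisson probability, dropping the x*log(beta) and
    # log(x!) terms that do not depend on the parameters:
    loss = beta * np.exp(output) - target * output
    return loss.sum(axis=axis) if sumloss else loss
```

For example, with `output` all zero the rate is simply beta, so each
entry contributes beta to the loss.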
For numerical reasons, Yoshua recommends choosing beta such that
lambda is expected to be around 1 for words that have a nonzero count.
So he would take:
beta = document_size / unique_words_per_document
I am not sure the above math is correct; I need to confirm with him.
Yoshua notes that ``there is a x_i log(beta) term missing, if you
compare with eqn 2 (i.e., take the log). They did not include it in
eqn 3 because it does not depend on the parameters, so the gradient
wrt it would be 0. But if you really want log-likelihood it should
be included.'' If you want the true log-likelihood, you should
probably compute the derivative of the entire eqn 2.
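A hypothetical sketch of the full negative log-likelihood, restoring
the x*log(beta) and log(x!) terms that the gradient-only cost drops
(the function name and body are illustrative, not the library's API):

```python
import numpy as np
from math import lgamma

def full_nlpoisson(target, output, beta_scale=1.0, axis=0):
    """Full Poisson NLL: lambda - x*log(lambda) + log(x!),
    with lambda = beta * exp(output)."""
    doc_length = target.sum(axis=axis, keepdims=True)
    beta = beta_scale * doc_length
    log_lambda = np.log(beta) + output              # log(beta * exp(output))
    log_fact = np.vectorize(lgamma)(target + 1.0)   # log(x!) via log-gamma
    loss = beta * np.exp(output) - target * log_lambda + log_fact
    return loss.sum(axis=axis)
```

Since the extra terms are constant in the parameters, this agrees with
the cheaper cost up to an additive constant, so gradients are identical.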
Axis is the axis along which we sum the target values to obtain
the document length.
If sumloss, we sum the loss along axis.
If zerothreshold is nonzero, we threshold the loss: wherever the
target is zero and beta * tensor.exp(output) < zerothreshold, that
loss entry is set to zero.
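The zerothreshold behaviour can be sketched as an elementwise mask
(a NumPy illustration; the helper name is made up for the example):

```python
import numpy as np

def threshold_loss(loss, target, output, beta, zerothreshold):
    """Zero out loss entries where the target count is zero and the
    predicted rate beta * exp(output) is already below zerothreshold."""
    mask = (target == 0) & (beta * np.exp(output) < zerothreshold)
    return np.where(mask, 0.0, loss)
```

This spares the model any penalty for words it already predicts as
(nearly) absent, which can reduce wasted gradient signal on the many
zero entries of a bag-of-words target.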
@todo: Include the log-factorial term
