Module pylearn.algorithms.sandbox.cost

Cost functions.


Note: All of these functions return one cost per example; it is your job to perform a tensor.sum over the individual example losses to obtain a scalar cost.
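For example, a minimal sketch of reducing the per-example losses to a single scalar cost (this assumes the tensor imported by this module is Theano's tensor module; the variable names are illustrative):

    import theano.tensor as tensor

    # Hypothetical per-example losses, e.g. what nlpoisson returns with sumloss=False.
    per_example_loss = tensor.vector('per_example_loss')

    # Reduce to a single scalar cost, as the note above requires.
    total_cost = tensor.sum(per_example_loss)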

Classes
UndefinedGradient
Raised by UndefinedGradientOp to indicate that the gradient is undefined mathematically.
UndefinedGradientOp
LogFactorial
Compute log x!.
Functions
 
poissonlambda(unscaled_output, doclen, beta_scale)
A continuous parameter lambda_i, which is the expected number of occurrences of word i in the document.
 
nlpoisson(target, output, beta_scale=1, axis=0, sumloss=True, zerothreshold=0)
The negative log Poisson regression probability.
Variables
  undefined_gradient = UndefinedGradientOp()
  scalar_logfactorial = LogFactorial(scalar.upgrade_to_float, na...
  logfactorial = tensor.Elemwise(scalar_logfactorial, name= 'log...

Imports: T, tensor, scalar, numpy, gof


Function Details

poissonlambda(unscaled_output, doclen, beta_scale)


A continuous parameter lambda_i, which is the expected number of occurrences of word i in the document. Note that lambda_i must be positive, which is why Ranzato and Szummer (2008) use an exponential.

Yoshua: I don't like exponentials to guarantee positivity. softplus is numerically much better behaved (but you might want to try both to see if it makes a difference).

To Do: There may be more sensible ways to set beta_scale.
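
As a rough numerical illustration (plain NumPy, not the module's symbolic implementation; the scaling beta = beta_scale * doclen is an assumption based on the nlpoisson notes below), lambda could be computed with either positivity transform:

    import numpy as np

    def poissonlambda_sketch(unscaled_output, doclen, beta_scale, use_softplus=False):
        # Assumed scaling: beta proportional to document length.
        beta = beta_scale * doclen
        if use_softplus:
            # softplus(x) = log(1 + exp(x)); positive and numerically better behaved.
            positive = np.logaddexp(0.0, unscaled_output)
        else:
            # Exponential, as in Ranzato and Szummer (2008).
            positive = np.exp(unscaled_output)
        return beta * positive  # lambda_i > 0 for every word i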

nlpoisson(target, output, beta_scale=1, axis=0, sumloss=True, zerothreshold=0)


The negative log Poisson regression probability.
From Ranzato and Szummer (2008).

Output should be of the form Weight*code+bias, i.e. unsquashed.
NB this is different from the formulation in Salakhutdinov and Hinton
(2007), in which the output is softmaxed and multiplied by the input
document length. That is also what Welling et al. (2005) do. It would
be useful to try the softmax, because it is better behaved.

There is a beta term that is proportional to document length. We
are not sure what beta_scale the authors used; we use 1 as the
default, but this value might be inappropriate.
For numerical reasons, Yoshua recommends choosing beta such that
lambda is expected to be around 1 for words that have a non-zero count.
So he would take:

  beta = document_size / unique_words_per_document

I am not sure the above math is correct; I need to confirm it with him.

Yoshua notes that "there is an x_i log(beta) term missing if you
compare with eqn. 2 (i.e., take the log). They did not include it in
eqn. 3 because it does not depend on the parameters, so the gradient
with respect to it would be 0. But if you really want the log-likelihood
it should be included." If you want a true log-likelihood, you should
probably compute the derivative of the entire eqn. 2.

Axis is the axis along which we sum the target values, to obtain
the document length.

If sumloss, we sum the loss along axis.

If zerothreshold is non-zero, we threshold the loss:
    if a target dimension is zero and beta * tensor.exp(output)
    < zerothreshold, that dimension's loss is set to zero.

To Do: Include the logfactorial term.

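A hedged NumPy sketch of the quantity described above (this is not the symbolic Op in this module; it assumes lambda = beta_scale * doclen * exp(output), and, unlike the current code, it includes the log(target!) term mentioned in the to-do, via gammaln):

    import numpy as np
    from scipy.special import gammaln  # gammaln(x + 1) == log(x!)

    def nlpoisson_sketch(target, output, beta_scale=1, axis=0, sumloss=True):
        doclen = target.sum(axis=axis, keepdims=True)  # document length from counts
        beta = beta_scale * doclen                     # assumed beta scaling
        lam = beta * np.exp(output)                    # expected word counts
        # Negative log Poisson probability per dimension:
        #   lambda - target * log(lambda) + log(target!)
        loss = lam - target * np.log(lam) + gammaln(target + 1)
        return loss.sum(axis=axis) if sumloss else loss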

Variables Details

scalar_logfactorial

Value:
LogFactorial(scalar.upgrade_to_float, name= 'scalar_logfactoral')

logfactorial

Value:
tensor.Elemwise(scalar_logfactorial, name= 'logfactorial')
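
For reference, log x! equals log Gamma(x + 1), so the Op's output can be checked numerically against SciPy's gammaln (a standalone sketch, not the Elemwise graph itself):

    import numpy as np
    from scipy.special import gammaln

    x = np.arange(5.0)            # 0, 1, 2, 3, 4
    log_fact = gammaln(x + 1.0)   # log(x!) = log(Gamma(x + 1))
    print(np.exp(log_fact))       # [ 1.  1.  2.  6. 24.]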