Package pylearn :: Package datasets :: Module dataset :: Class Dataset

type Dataset



Nested Classes
Obj
Class Variables
  data = None
SIMPLE REGRESSION / CLASSIFICATION
  train = None
  valid = None
  test = None
  n_classes = None
WHEN INPUTS ARE FIXED-SIZE GREYSCALE IMAGES
  img_shape = None
When the inputs `x` must be preprocessed, `preprocess` is a function that takes care of it.
  preprocess = None
TIMESERIES
Class Variable Details

data


SIMPLE REGRESSION / CLASSIFICATION
----------------------------------

In this setting, you are aiming to do vector classification or vector regression
where your train, valid and test sets fit in memory.
The convention is to put your data into numpy ndarray instances.  Put training data in the
`train` attribute, validation data in the `valid` attribute and test data in the `test`
attribute.
Each of those attributes should be an instance that defines at least two attributes: `x` for the
input matrix and `y` for the target matrix.  The `x` ndarray should be one example per
leading index (row for matrices).
The `y` ndarray should be one target per leading index (entry for vectors, row for matrices).
If `y` is a classification target, then it should be a vector with numpy dtype 'int32'.

If there are weights associated with different examples, then create a `weights` attribute whose
value is a vector with one floating-point value (typically double-precision) per example.

If the task is classification, then the classes should be mapped to the integers
0,1,...,N-1.
The number of classes (here, N) should be stored in the `n_classes` attribute.
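
The convention above can be sketched as follows. This is a minimal illustration, not
pylearn code: the nested `Obj` class is not documented here, so `types.SimpleNamespace`
is used as a stand-in container, and the sizes, the random data and the `make_split`
helper are all hypothetical. Attaching `weights` to each split is likewise an assumption.

```python
from types import SimpleNamespace

import numpy as np

rng = np.random.RandomState(0)
n_features, n_classes = 4, 3  # hypothetical sizes


def make_split(n_examples):
    """Build one split with an input matrix `x` and an int32 target vector `y`."""
    x = rng.rand(n_examples, n_features)  # one example per row
    # class labels are mapped to the integers 0 .. n_classes-1
    y = rng.randint(0, n_classes, size=n_examples).astype('int32')
    return SimpleNamespace(x=x, y=y)


# Stand-in for a Dataset instance (assumption: any attribute container works).
dataset = SimpleNamespace(
    train=make_split(100),
    valid=make_split(20),
    test=make_split(20),
    n_classes=n_classes,
)

# Optional per-example weights: one double-precision value per training example.
dataset.train.weights = np.ones(len(dataset.train.x), dtype='float64')
```

With this layout, `dataset.train.x[i]` and `dataset.train.y[i]` refer to the same example.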

Value:
None

n_classes


WHEN INPUTS ARE FIXED-SIZE GREYSCALE IMAGES
-------------------------------------------

In this setting we typically encode images as vectors, by enumerating the pixel values in
left-to-right, top-to-bottom order.  Pixel values should be floating-point, and
normalized to the range [0, 1].

The shape of the images should be recorded in the `img_shape` attribute as a tuple (rows,
cols).
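
A short sketch of this encoding, assuming the raw images arrive as 8-bit arrays of
shape (n_images, rows, cols). The 28x28 shape and the random pixel data are
hypothetical; only the flattening order, the [0, 1] scaling and the `img_shape` tuple
come from the description above.

```python
import numpy as np

img_shape = (28, 28)  # (rows, cols), hypothetical size

# Hypothetical raw data: 5 greyscale images with 8-bit pixel values.
raw = np.random.RandomState(0).randint(0, 256, size=(5,) + img_shape).astype('uint8')

# Row-major (C-order) reshape enumerates pixels left-to-right within a row,
# then moves to the next row, i.e. top-to-bottom.
x = raw.reshape(len(raw), -1).astype('float64') / 255.0

# img_shape is what lets a consumer recover the 2-D image from a row of x.
first_image = x[0].reshape(img_shape)
```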

Value:
None

img_shape

When the inputs `x` must be preprocessed, `preprocess` is a function that takes care of it. A cleaner (transparent) alternative would be for `x` to wrap the data intelligently.
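
The signature of this function is not specified here. As an illustration only, the
sketch below assumes it takes an ndarray of stored inputs and returns the ndarray to
feed to a model; the uint8-to-float conversion is a hypothetical use case.

```python
import numpy as np


def preprocess(x):
    """Hypothetical preprocessor: convert stored uint8 pixels to floats in [0, 1]."""
    return np.asarray(x, dtype='float64') / 255.0


# Inputs stored compactly as uint8 (assumption), converted on demand.
stored_x = np.array([[0, 128, 255], [64, 32, 16]], dtype='uint8')
model_input = preprocess(stored_x)
```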

Value:
None

preprocess


TIMESERIES
----------

When dealing with examples which are themselves timeseries, put each example timeseries in a
tensor and make a list of them.  Generally use tensors, and resort to lists or arrays
wherever different 

Value:
None
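
As a rough sketch of the timeseries layout, each example below is one ndarray and the
examples are collected in a plain list. The (time steps, features) axis order and the
differing lengths are assumptions made for illustration, not something the text above
specifies.

```python
import numpy as np

rng = np.random.RandomState(0)
n_features = 3          # hypothetical number of values per time step
lengths = [5, 8, 2]     # hypothetical: each example has its own length

# One tensor per example timeseries, gathered in a list.
series = [rng.rand(t, n_features) for t in lengths]
```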