Datasets

These datasets can be used for benchmarking deep learning algorithms:

Symbolic Music Datasets


Natural Images


Artificial Datasets

Faces


Text


Speech


Recommendation Systems

  • MovieLens: Two datasets available from http://www.grouplens.org. The first dataset has 100,000 ratings for 1682 movies by 943 users, subdivided into five disjoint subsets. The second dataset has about 1 million ratings for 3900 movies by 6040 users. 
  • Jester: This dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users.
  • Netflix Prize: Netflix released an anonymised version of their movie rating dataset; it consists of 100 million ratings, done by 480,000 users who have rated between 1 and all of the 17,770 movies.
  • Book-Crossing dataset: This dataset is from the Book-Crossing community, and contains 278,858 users providing 1,149,780 ratings about 271,379 books.

Misc