MovieLens 100k benchmark results

From RecSysWiki
Revision as of 07:10, 2 February 2013 by Zeno Gantner (talk | contribs)

This page is a first example of what a benchmark page in RecSysWiki could look like. It is a work in progress. Please contribute and comment.

Rationale: This is primarily meant to be a comparison between methods, not between tools. This is why we sort by method. At the same time, we state the version number and all input arguments for maximum reproducibility.

If there are two lines for one method, the first line contains the results with the random seed set to 1; the second line (or the only line, if there is just one) contains the average results over 5 runs with random initialization.
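The reporting convention above can be sketched as follows. This is only an illustration: `train_and_evaluate` is a hypothetical stand-in for one training and evaluation run, and the error values it returns are made up; it is not part of MyMediaLite or any other tool listed here.

```python
import random
import statistics


def train_and_evaluate(seed=None):
    """Hypothetical stand-in for one training + evaluation run.

    Returns a made-up error value that depends only on the random
    initialization, mimicking a randomly initialized model.
    """
    rng = random.Random(seed)  # seed=None seeds from system entropy
    return 0.92 + rng.uniform(-0.005, 0.005)


def report(n_runs=5):
    # First line of a table entry: one reproducible run with the seed set to 1.
    seeded = train_and_evaluate(seed=1)
    # Second line: mean over n_runs runs with random initialization.
    averaged = statistics.mean(train_and_evaluate() for _ in range(n_runs))
    return seeded, averaged


seeded, averaged = report()
print(f"seed=1:         {seeded:.4f}")
print(f"mean of 5 runs: {averaged:.4f}")
```

The seeded run can be reproduced exactly by anyone with the same software version and arguments, while the averaged figure gives a rough idea of how much the result depends on the initialization.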

Baseline Methods

Software          Method            5-fold CV  all-but-10  References
MyMediaLite 3.07  GlobalAverage     1.1256     1.1238
MyMediaLite 3.07  UserAverage       1.0437     1.0518
MyMediaLite 3.07  ItemAverage       1.0246     1.0453
MyMediaLite 3.07  UserItemBaseline  0.9413     0.9656
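For readers unfamiliar with the simplest baselines, here is a minimal sketch of what the names GlobalAverage, UserAverage and ItemAverage usually denote: predicting the mean of all training ratings, of the user's ratings, or of the item's ratings. Several things here are assumptions rather than facts from this page: the use of RMSE as the error measure (the tables do not name the metric), the fallback to the global average for unseen users or items, and the toy data. It is not MyMediaLite's code, and UserItemBaseline is not covered.

```python
import math
from collections import defaultdict


def fit_averages(train):
    """Compute global, per-user and per-item mean ratings.

    train: list of (user, item, rating) triples.
    """
    global_avg = sum(r for _, _, r in train) / len(train)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in train:
        by_user[u].append(r)
        by_item[i].append(r)
    user_avg = {u: sum(rs) / len(rs) for u, rs in by_user.items()}
    item_avg = {i: sum(rs) / len(rs) for i, rs in by_item.items()}
    return global_avg, user_avg, item_avg


def rmse(predict, test):
    """Root mean squared error of predict(user, item) on test triples."""
    squared = sum((predict(u, i) - r) ** 2 for u, i, r in test)
    return math.sqrt(squared / len(test))


# Toy data, invented for illustration only.
train = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4), ("u2", "c", 2)]
test = [("u1", "c", 3), ("u3", "a", 4)]

g, ua, ia = fit_averages(train)
predictors = {
    "GlobalAverage": lambda u, i: g,
    # Assumption: unseen users/items fall back to the global average.
    "UserAverage": lambda u, i: ua.get(u, g),
    "ItemAverage": lambda u, i: ia.get(i, g),
}
for name, predict in predictors.items():
    print(name, round(rmse(predict, test), 4))
```

On a real dataset the same `rmse` function would be applied to each held-out split, which is what makes these baselines a useful reference point for the more elaborate methods in the sections that follow.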

kNN-based Collaborative Filtering

Software          Method   5-fold CV  all-but-10  References
MyMediaLite 3.07  UserKNN  0.9283     0.9572
MyMediaLite 3.07  ItemKNN  0.9182     0.9445

Matrix Factorization

Software          Method                            5-fold CV  all-but-10  References
MyMediaLite 3.07  BiasedMatrixFactorization         0.9220     0.9475
MyMediaLite 3.07  SVDPlusPlus                       0.9112     0.9409
MyMediaLite 3.07  SigmoidUserAsymmetricFactorModel  0.8939     0.9232


Attribute-Aware Methods

Other Methods

Disclaimers

  • The results presented here come with no warranty whatsoever. Use at your own risk.
  • Most if not all results are self-reported by the implementations, which may contain bugs in their evaluation routines.
  • The results are not necessarily fair towards the compared methods and implementations. Some hyper-parameters may be overfitted to this dataset, while other methods could achieve much better results with more careful tuning.
  • MovieLens 100k is one of the oldest existing collaborative filtering datasets, and it dominated the literature for years because it was one of the few available datasets. Methods developed in that period may therefore have a certain bias towards this dataset. The dataset is also quite small by today's standards.