MovieLens: The Reliable Alternative
In previous posts I’ve expressed my concern about corporate interests impacting the integrity of movie recommender algorithms. IMDb is owned by Amazon. Rotten Tomatoes is owned by Fandango. Netflix is owned by, well, …Netflix. Criticker isn’t corporately owned but is partially funded by commercial advertising. Now I present to you MovieLens, which isn’t owned by a corporation and doesn’t advertise on its website. MovieLens is operated by GroupLens Research at the University of Minnesota. It exists for the benefit of students at the University who are researching predictive modeling. In other words, it exists to build the best possible recommender of movies that you will “really like.” There is no corporate bottom line. There is just the goal of building a better mousetrap.
So far, it’s done a pretty good job. My benchmark for movies that I will “really like” is 4 out of 5 stars, or 7.5 out of 10, or 75 out of 100, depending on the rating scale used. When I calculate the probability that I will “really like” a movie that meets that benchmark for each individual website, I get the following results:
| Website | Recommend Criteria | Probability I Will “Really Like” |
|---|---|---|
| Rotten Tomatoes | Certified Fresh | 89.0% |
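The probability in the table is just the share of a site’s recommended movies that cleared my “really like” benchmark. A minimal sketch of that calculation, where the movie titles, ratings, and recommendation list are made-up examples rather than my actual data:

```python
# Hypothetical data: my ratings on a 10-point scale, and a site's
# recommended titles (e.g. everything it marked Certified Fresh).
my_ratings = {"Movie A": 8.0, "Movie B": 6.5, "Movie C": 9.0, "Movie D": 7.0}
recommended = ["Movie A", "Movie B", "Movie C"]

BENCHMARK = 7.5  # "really like" = 7.5 or better out of 10

# P(I "really like" | site recommends) = hits among recommendations
hits = sum(1 for title in recommended if my_ratings[title] >= BENCHMARK)
probability = hits / len(recommended)  # 2 of 3 -> ~66.7%
```

Run that against a full ratings history for each website and you get one row of the table per site.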
MovieLens holds its own with Netflix and, unlike Netflix, its algorithm isn’t held hostage to corporate interests. And it’s free. All you have to do is click on the MovieLens link above, sign up, and begin rating movies you’ve seen. Even though MovieLens uses a five-star scale, you can enter half stars. You will, at times, be torn between whether you “really like” a movie or just “like” it. MovieLens lets you enter 3 1/2 stars for that situation.
I encourage you to use MovieLens. You can pat yourself on the back for making a contribution to science.
Geek Alert!! Geek Alert!!
If you look at the Movie Lists I updated yesterday, you may be puzzled that so many movies have the same probability. Each month I recalibrate the probabilities in my Bayesian model. I’m constantly experimenting to find the right balance between a model with many probability differences among movies but more uncertainty about their reliability, and a model with fewer probability differences among individual movies but more reliability. Too many probability groups create the risk of randomness creeping into the probabilities; the Bayesian model recognizes this and shifts the probabilities closer to the probability of a random movie selection. That’s what happened last month when I used 20 groups. When fewer groups are used, the larger groups that result are more credible and produce less randomness, so the model allows probabilities closer to the tendencies of each group. This month I went back to 5 groups, which produces more reliable probabilities but more movies sharing the same probability. Just in case you were wondering :)
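The shrinkage behavior described above can be sketched with a simple pseudo-count blend. This is not the post’s actual model; the function, its parameters, and the numbers are hypothetical, chosen only to show how small groups get pulled toward the random-selection rate while large groups keep their own tendency:

```python
# Hypothetical Bayesian-style shrinkage: blend a group's observed
# "really like" rate with the overall (random-selection) rate.
def shrunk_probability(group_likes, group_total, prior_rate, prior_weight=20):
    """Estimate a group's probability with shrinkage toward prior_rate.

    prior_weight acts like a pseudo-count of imaginary ratings at the
    prior rate: the fewer real ratings a group has, the closer its
    estimate sits to prior_rate (the random-selection probability).
    """
    return (group_likes + prior_weight * prior_rate) / (group_total + prior_weight)

# A large group keeps most of its observed tendency (80/100 = 0.80)...
big = shrunk_probability(group_likes=80, group_total=100, prior_rate=0.35)

# ...while a tiny group (4/5 = 0.80 observed) is pulled hard toward 0.35.
small = shrunk_probability(group_likes=4, group_total=5, prior_rate=0.35)
```

With 20 groups, each group is small, so every estimate drifts toward the prior; with 5 larger groups, the data dominates, at the cost of many movies landing in the same group and sharing one probability.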