Will I "Really Like" this Movie?

Navigating Movie Website Ratings to Select More Enjoyable Movies

If You Want to Watch “Really Like” Movies, Don’t Count on IMDB.

Today’s post is for those of you who want to get your “geek” on. As regular readers of these pages are aware, IMDB is the least reliable indicator of whether I will “really like” a given movie. As you might also be aware, I am constantly making adjustments to my forecasting algorithm for “really like” movies. I follow the practice of establishing probabilities for the movies in my database, measuring how effective those probabilities are at selecting “really like” movies, and revising the model to improve on the results. When that’s done, I start the process all over. Which brings me back to IMDB, the focus of today’s study.

My first step in measuring the effectiveness of IMDB at selecting “really like” movies is to rank the movies in the database by IMDB average rating and then divide the movies into ten groups of the same size. Here are my results:

IMDB Avg Rating     # of Movies     % “Really Like”
> 8.1               198             64.6%
7.8 to 8.1          198             60.6%
7.7 to 7.8          198             64.6%
7.5 to 7.7          198             58.6%
7.4 to 7.5          198             55.1%
7.2 to 7.4          198             52.5%
7.0 to 7.2          198             42.4%
6.8 to 7.0          198             39.4%
6.5 to 6.8          198             35.4%
< 6.5               197             11.7%
All Movies          1,979           48.5%
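For anyone who wants to run the same exercise on their own ratings, here is a minimal Python sketch of the grouping step. It assumes each movie is stored as a (rating, liked) pair, where liked is True for a “really like” movie. The function name group_hit_rates and the sample data are made up for illustration; this is not the actual code behind my database.

```python
from typing import List, Tuple

def group_hit_rates(movies: List[Tuple[float, bool]], n_groups: int = 10):
    """Rank (rating, liked) pairs by rating, split them into groups of
    near-equal size, and return (low, high, count, share_liked) per group,
    highest-rated group first."""
    ranked = sorted(movies, key=lambda m: m[0], reverse=True)
    # When the total doesn't divide evenly, the earliest groups get one extra movie.
    base, extra = divmod(len(ranked), n_groups)
    results = []
    start = 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        group = ranked[start:start + size]
        start += size
        if not group:
            continue  # fewer movies than groups
        liked = sum(1 for _, was_liked in group if was_liked)
        results.append((group[-1][0], group[0][0], len(group), liked / len(group)))
    return results

# Hypothetical four-movie sample, split into two groups.
sample = [(9.0, True), (6.0, False), (8.0, True), (7.0, False)]
for low, high, count, share in group_hit_rates(sample, n_groups=2):
    print(f"{low} to {high}: {count} movies, {share:.1%} really liked")
```

With 1,979 movies and ten groups, this split gives nine groups of 198 and one of 197, the same group sizes as in the table.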

There seems to be a correlation between IMDB rating and the probability of “really like” movies in the group. The problem is that the results suggest IMDB does a better job of identifying movies that you won’t “really like” than movies that you will. For example, when I’ve gone through the same exercise for Netflix and Movielens, the probability for the top 10% of ratings has been over 90% for each site, compared to 64.6% for IMDB.
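One rough way to see the lopsidedness is to measure how far the top and bottom groups sit from the overall 48.5%. The Python sketch below uses only the percentages from the table; the helper name points_from_base is my own invention for this illustration.

```python
BASE_RATE = 0.485     # "All Movies" row
TOP_GROUP = 0.646     # highest-rated tenth of the database
BOTTOM_GROUP = 0.117  # lowest-rated tenth of the database

def points_from_base(group_rate: float, base_rate: float = BASE_RATE) -> float:
    """Percentage-point gap between a group's rate and the overall rate."""
    return round((group_rate - base_rate) * 100, 1)

top_gap = points_from_base(TOP_GROUP)
bottom_gap = points_from_base(BOTTOM_GROUP)
print(f"Top group vs. all movies: {top_gap:+.1f} points")
print(f"Bottom group vs. all movies: {bottom_gap:+.1f} points")
```

The top group comes out about 16 points above the overall rate, while the bottom group is about 37 points below it. So a low IMDB rating moves the needle more than twice as far as a high one.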

With the graph displayed here, you can begin to picture the problem.

[Figure: IMDB Rating Graph]

The curve peaks at 7.4. On the low side of the peak, the curve looks like a classic bell curve, and there are enough ratings spread across that side to create significant probability differences among the groups. On the high side, the highest-rated movie, The Shawshank Redemption, has a 9.2 rating. The range between 7.4 and 9.2 is too narrow to create the kind of probability differences that would make IMDB a good predictor of “really like” movies. IMDB would probably work as a predictor of “really like” movies if IMDB voters rated average movies as a 5.0. Instead, an average movie is probably in the low 7s.

So, what is a good average IMDB rating to use for “really like” movies? Let’s simplify the data from above:

IMDB Avg Rating     # of Movies     % “Really Like”
7.7 or higher       636             62.7%
7.3 to 7.6          502             55.4%
7.2 or lower        841             33.7%
All Movies          1,979           48.5%

If we want to incrementally improve IMDB as a predictor of “really like” movies, we might set the bar at movies that are rated 7.7 or higher. I’m inclined to go in the opposite direction and utilize what IMDB does best: identifying which movies have a high probability of not being “really like” movies. By setting the IMDB recommendation threshold at 7.3, we are identifying better-than-average movies and relying on the other recommender websites to identify the “really like” movies.
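To put a number on what the 7.3 threshold buys, the two groups above the threshold can be pooled into one weighted average. This is a back-of-the-envelope Python sketch using the counts and percentages from the simplified table; pooled_rate is a name I made up for the illustration, and the results carry the rounding already present in the table.

```python
# (number of movies, share that were "really like") from the simplified table
TOP = (636, 0.627)     # rated 7.7 or higher
MIDDLE = (502, 0.554)  # rated 7.3 to 7.6
BOTTOM = (841, 0.337)  # everything below the 7.3 threshold

def pooled_rate(groups):
    """Weighted-average 'really like' rate across several groups."""
    total = sum(count for count, _ in groups)
    liked = sum(count * rate for count, rate in groups)
    return liked / total

recommended = pooled_rate([TOP, MIDDLE])
everything = pooled_rate([TOP, MIDDLE, BOTTOM])
print(f"Rated 7.3 or higher: {recommended:.1%}")
print(f"All movies: {everything:.1%}")
```

Movies at or above the threshold come in at a bit over 59%, against 33.7% for the movies below it. That is nowhere near the 90% range, but it is a useful first screen before the other sites weigh in.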

IMDB is one of the most utilized movie sites in the world. It has a tremendous amount of useful information. But if you want to select movies that you will “really like,” don’t count on IMDB.


