Does Critic Expertise on Rotten Tomatoes Overcome the Law of Large Numbers?
In the evolution of my “really like” movie algorithm, one of the difficulties I keep encountering is how to integrate Rotten Tomatoes ratings in a statistically significant way. Every time I try, I rediscover that its ratings are not as useful as those of the other websites that I use. It’s not that it has no use. To determine if a movie is worth seeing within a week of its release, you’ll be hard-pressed to find a better indicator. The problem is that most of the data for a particular movie is collected in that first week. Most of the critic reviews are completed close to the release date to provide moviegoers with guidance on the day a movie is released. After that first week, the critics are on to the next batch of new movies to review. With all of the other websites, the ratings continually get more reliable as more people see the movie and provide input. The data pool gets larger and the law of large numbers kicks in. With Rotten Tomatoes, there is very little data growth. Its value is based on the expertise of the critics and less on the law of large numbers.
The question becomes: what is the value of film critics’ expertise? It is actually pretty valuable. When Rotten Tomatoes slots movies into one of its three main rating buckets (Certified Fresh, Fresh, Rotten), it does create a statistically significant differentiation.
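To make the law of large numbers concrete, here is a minimal Python simulation. Everything in it is hypothetical: I assume a made-up movie that 70% of all possible viewers would rate positively, and the sample sizes are illustrative rather than taken from any website. It measures how much the observed positive share bounces around from sample to sample as the number of ratings grows.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical movie: assume 70% of all possible viewers would rate it positively.
TRUE_POSITIVE_RATE = 0.70

def observed_rate(num_ratings):
    """Share of positive ratings in one random sample of viewers."""
    positives = sum(random.random() < TRUE_POSITIVE_RATE for _ in range(num_ratings))
    return positives / num_ratings

def spread(num_ratings, trials=200):
    """Standard deviation of the observed rate across repeated samples."""
    rates = [observed_rate(num_ratings) for _ in range(trials)]
    return statistics.pstdev(rates)

# Illustrative sample sizes: a small pool of reviews versus larger crowds.
spreads = {n: spread(n) for n in (50, 500, 5000)}
for n, s in spreads.items():
    print(f"{n:>5} ratings: observed rate typically varies by about {s:.1%}")
```

The spread shrinks roughly with the square root of the number of ratings, which is why a rating built from a fixed, small pool of reviews stays noisier than one that keeps collecting input.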
|Rating||“Really Like” %|
Rotten Tomatoes is able to separate pretty well those movies I “really like” from those I don’t.
So what’s the problem? If we stick to Certified Fresh movies we’ll “really like” them 7 out of 10 times. That’s true. And, if I’m deciding on which new release to see in the movie theater, that’s really good. But, if I’m deciding what movie my wife and I should watch on Friday night movie night and our selection is from the movies on cable or our streaming service, we can do better.
Of the 1,998 movies I’ve seen in the last 15 years, 923 are Certified Fresh. Which of those movies am I most likely to “really like”? Based on the following table, I wouldn’t rely on the Rotten Tomatoes % Fresh number.
|Rating||% Fresh Range||“Really Like” %|
|Certified Fresh||96 to 100%||69.9%|
|Certified Fresh||93 to 95%||73.4%|
|Certified Fresh||89 to 92%||68.3%|
|Certified Fresh||85 to 88%||71.2%|
|Certified Fresh||80 to 84%||73.0%|
|Certified Fresh||74 to 79%||65.3%|
This grouping into six equal-sized buckets suggests that there isn’t much difference between a movie in my database that is 75% Fresh and one that is 100% Fresh. Now, it is entirely possible that there is an actual difference between 75% Fresh and 100% Fresh. It is possible that, with a larger database, my data would produce a less random pattern, one that might be statistically significant. For now, though, the data is what it is. Certified Fresh is predictive and the % Fresh part of the rating less so.
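A rough way to see why the table looks random is to estimate the margin of error for each bucket. The sketch below is in Python and rests on two assumptions: that the 923 Certified Fresh movies split evenly across the six buckets (about 154 each), and that a normal-approximation 95% interval is good enough for a sanity check. The function name `margin_of_error` is my own, not from any library.

```python
import math

CERTIFIED_FRESH_TOTAL = 923  # Certified Fresh movies in my database
NUM_BUCKETS = 6              # the six buckets in the table
# Assumption: each bucket holds roughly the same number of movies.
BUCKET_SIZE = CERTIFIED_FRESH_TOTAL / NUM_BUCKETS  # about 154

# "Really like" rates from the table, keyed by % Fresh range.
buckets = {
    "96 to 100%": 0.699,
    "93 to 95%": 0.734,
    "89 to 92%": 0.683,
    "85 to 88%": 0.712,
    "80 to 84%": 0.730,
    "74 to 79%": 0.653,
}

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

intervals = {}
for label, p in buckets.items():
    moe = margin_of_error(p, BUCKET_SIZE)
    intervals[label] = (p - moe, p + moe)
    print(f"{label:>10}: {p:.1%} +/- {moe:.1%}")

# Compare the lowest-rate bucket with the highest-rate bucket.
low = intervals["74 to 79%"]
high = intervals["93 to 95%"]
overlap = low[1] >= high[0]
print("Lowest and highest buckets overlap:", overlap)
```

Under those assumptions each bucket carries a margin of error of roughly 7 percentage points, so even the lowest and highest buckets have overlapping intervals. Overlapping intervals are only a rough heuristic, not a formal significance test, but they fit the reading that the differences among these buckets could be noise.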
Expertise can reduce the numbers needed for meaningful differentiation between what is Certified Fresh and what is Rotten. The law of large numbers, though, may be too demanding a hurdle for critic expertise to offer credible guidance much beyond that.