Last week in my post I spent some time leading you through my thought process in developing a Watch List. There were some loose threads in that article that I’ve been tugging at over the last week.
The first thread was the high “really like” probability that my algorithm assigned to two movies, Fight Club and Amelie, that I “really” didn’t like the first time I saw them. It bothered me to the point that I took another look at my algorithm. Without boring you with the details, I had an “aha” moment and was able to reengineer my algorithm in such a way that I can now develop a unique probability for each movie. Prior to this I was assigning the same probability to groups of movies with similar ratings. The result is a tighter range of probabilities clustered around the base probability. The base probability is defined as the probability that I would “really like” a movie randomly selected from the database. If you look at this week’s Watch List, you’ll notice that my top movie, The Untouchables, has a “really like” probability of 72.2%. In my revised algorithm, that is a high-probability movie. As my database gets larger, the range of the assigned probabilities will get wider.
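The post doesn’t publish the actual formula, but the behavior described, unique per-movie probabilities that cluster around the base probability and spread out as the database grows, is what you get from a standard credibility-weighted blend. The sketch below is purely illustrative; the function name, the constant `K`, and the 50% base probability are my assumptions, not the author’s numbers.

```python
# Illustrative sketch of a credibility-weighted "really like" probability.
# K is a hypothetical constant controlling how many ratings it takes
# before a movie's own data dominates the database-wide base rate.

def really_like_probability(p_movie, n_ratings, p_base=0.50, K=10_000):
    """Blend a movie's own signal with the base probability.

    The credibility weight z approaches 1 as ratings accumulate, so
    heavily rated movies keep their own probability, while sparsely
    rated movies stay clustered near the base probability.
    """
    z = n_ratings / (n_ratings + K)  # credibility weight in [0, 1)
    return z * p_movie + (1 - z) * p_base

# A heavily rated movie keeps most of its own signal:
print(round(really_like_probability(0.75, n_ratings=500_000), 3))  # 0.745
# A sparsely rated movie is pulled toward the 50% base probability:
print(round(really_like_probability(0.75, n_ratings=2_460), 3))    # 0.549
```

Under a blend like this, a growing database widens the extremes naturally: as each movie accumulates ratings, its probability drifts away from the base rate toward its own signal.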
One of the by-products of this change is that the rating assigned by Netflix is the most dominant driver of the final probability. This is as it should be. Netflix has by far the largest database of any I use. Because of this, it produces the most credible and reliable ratings of any of the rating websites. Which brings me back to Fight Club and Amelie. The probability for Fight Club went from 84.8% under the old formula to 50.8% under the new formula. Amelie went from 72.0% to 54.3%. On the other hand, a movie that I’m pretty confident I will like, Hacksaw Ridge, changed only slightly, from 71.5% to 69.6%.
Another thread I tugged at this week was in response to a question from one of the readers of this blog: why was Beauty and the Beast earning a low “really like” probability of 36.6% when I felt there was a high likelihood that I was going to “really like” it? The fact is that I saw the movie this past week, and it turned out to be a “really like” instant classic. I rated it 93 out of 100, which is a very high rating from me for a new movie. In my algorithm, new movies are underrated for two reasons. First, because they generate so few ratings in their early months (Netflix has only 2,460 ratings for Beauty and the Beast so far), the credibility of the movie’s own data is so small that the “really like” probability is driven by the Oscar performance part of the algorithm. Second, new movies haven’t been through the Oscar cycle yet, so their Oscar performance probability is that of a movie that didn’t earn an Oscar nomination, or 35.8%. This is why Beauty and the Beast was only at a 36.6% “really like” probability on my Watch List last week.
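The same credibility idea explains the new-release penalty numerically. As a sketch only: if a movie’s own data gets weight proportional to its rating count, then with just 2,460 ratings the Oscar-performance prior of 35.8% swamps everything else. The function name, the constant `K`, and the 70% ratings-based figure below are hypothetical values chosen for illustration.

```python
# Illustrative sketch: why a new release sits near the 35.8%
# no-Oscar-nomination prior. K and p_from_ratings are assumed
# values, not the blog's actual parameters.

def blended_probability(p_from_ratings, n_ratings, p_oscar, K=100_000):
    z = n_ratings / (n_ratings + K)  # credibility of the movie's own data
    return z * p_from_ratings + (1 - z) * p_oscar

# Beauty and the Beast: only 2,460 Netflix ratings and no Oscar
# cycle yet, so the no-nomination prior dominates the blend.
p = blended_probability(p_from_ratings=0.70, n_ratings=2_460, p_oscar=0.358)
print(round(p, 3))  # 0.366
```

With these assumed inputs the blend lands at 36.6%, matching the Watch List figure; the point is the mechanism, not the exact constants: until the ratings count grows and the movie goes through an Oscar cycle, the prior wins.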
I’ll leave you this week with a concern. As I mentioned above, Netflix is the cornerstone of my whole “really like” system. You can appreciate, then, my heart palpitations when it was announced a couple of weeks ago that Netflix is abandoning its five-star rating system in April. It is replacing it with a thumbs up or thumbs down rating with a percentage next to it, perhaps a little like Rotten Tomatoes. While I am keeping an open mind about the change, it has the potential to destroy the best movie recommender system in the business. If it does, I will be one “mad” movie man, and that’s not “crazy” mad.