Will I "Really Like" this Movie?

Navigating Movie Website Ratings to Select More Enjoyable Movies

Archive for the month “April, 2017”

“Really Like” Movies: Is That All There Is?

After scoring a movie that I’ve watched, one of my rituals is to read a critic’s review of the movie. If the movie is contemporaneous to Roger Ebert’s tenure as the world’s most read critic, he becomes my critic of choice. I choose Ebert, first of all, because he is a terrific writer. He has a way of seeing beyond the entertainment value of the movie and observing how it fits into the culture of the time. I also choose Ebert because I find that he “really likes” many of the movies I “really like”. He acts as a validator of my film taste.

The algorithm that I use to find “really like” movies to watch is also a validator. It sifts through a significant amount of data about a movie I’m considering and validates whether I’ll probably “really like” it or not based on how I’ve scored other movies. It guides me towards movies that will be “safe” to watch. That’s a good thing. Right? I guess so. Particularly, if my goal is to find a movie that will entertain me on a Friday night when I might want to escape the stress of the week.

But what if I want to experience more than a comfortable escape? What if I want to develop a more sophisticated movie palate? That won’t happen if I only watch movies that are “safe”. Is it possible that my algorithm is limiting my movie options by guiding me away from movies that might expand my taste? My algorithm suggests that because I “really liked” Rocky I & II, I’ll “really like” Rocky III as well. While that’s probably a true statement, the movie won’t surprise me. I’ll enjoy the movie because it is a variation of a comfortable and enjoyable formula.

By the same token, I don’t want to start watching a bunch of movies that I don’t “really like” in the name of expanding my comfort zone. I do, however, want to change the trajectory of my movie taste. In the end, perhaps it’s an algorithm design issue. Perhaps, I need to step back and define what I want my algorithm to do. It should be able to walk and chew gum at the same time.

I mentioned that I used Roger Ebert’s reviews because he seemed to “really like” many of the same movies that I “really liked”. It’s important to note that Roger Ebert “really liked” many more movies than I have over his lifetime. Many of those movies are outside my “really like” comfort zone. Perhaps I should aspire to “really like” the movies that Ebert did rather than find comfort that Ebert “really liked” the movies that I did.


Does Critic Expertise on Rotten Tomatoes Overcome the Law of Large Numbers?

In the evolution of my “really like” movie algorithm, one of the difficulties I keep encountering is how to integrate Rotten Tomatoes ratings in a statistically significant way. Every time I try, I rediscover that its ratings are not as useful as those of the other websites I use. It’s not that they have no use. To determine whether a movie is worth seeing within a week of its release, you’ll be hard-pressed to find a better indicator. The problem is that most of the data for a particular movie is counted in that first week. Most critic reviews are completed close to the release date to give moviegoers guidance on the day a movie opens. After that first week, the critics are on to the next batch of new movies to review. On all of the other websites, the ratings continually get better as more people see the movie and provide input. The data pool gets larger and the law of large numbers kicks in. With Rotten Tomatoes, there is very little data growth. Its value rests on the expertise of the critics and less on the law of large numbers.

The question becomes: what is the value of film critics’ expertise? It is actually pretty valuable. When Rotten Tomatoes slots movies into one of its three main rating buckets (Certified Fresh, Fresh, Rotten), it does create a statistically significant differentiation.

Rating            “Really Like” %
Certified Fresh   69.7%
Fresh             50.0%
Rotten            36.6%

Rotten Tomatoes is able to separate pretty well those movies I “really like” from those I don’t.
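The bucket percentages above are simple bookkeeping over my viewing history. A minimal sketch of that calculation, assuming a log of (bucket, did-I-“really-like”-it) pairs; the sample records below are hypothetical, not movies from my database:

```python
# Sketch: "really like" rate per Rotten Tomatoes bucket.
# The watch-log records are hypothetical illustrations.
from collections import defaultdict

def really_like_rate(watch_log):
    """watch_log: iterable of (rt_bucket, really_liked) pairs."""
    counts = defaultdict(lambda: [0, 0])   # bucket -> [liked, seen]
    for bucket, liked in watch_log:
        counts[bucket][0] += liked         # True counts as 1
        counts[bucket][1] += 1
    return {b: round(liked / seen, 3) for b, (liked, seen) in counts.items()}

log = [
    ("Certified Fresh", True), ("Certified Fresh", True),
    ("Certified Fresh", False), ("Fresh", True),
    ("Fresh", False), ("Rotten", False),
]
print(really_like_rate(log))
# → {'Certified Fresh': 0.667, 'Fresh': 0.5, 'Rotten': 0.0}
```

Run over the full database, the same tally produces the three bucket percentages in the table.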

So what’s the problem? If we stick to Certified Fresh movies, we’ll “really like” them 7 out of 10 times. That’s true. And if I’m deciding which new release to see in the movie theater, that’s really good. But if I’m deciding what movie my wife and I should watch on Friday night movie night, choosing from the movies on cable or our streaming service, we can do better.

Of the 1,998 movies I’ve seen in the last 15 years, 923 are Certified Fresh. Which of those movies am I most likely to “really like”? Based on the following table, I wouldn’t rely on the Rotten Tomatoes % Fresh number.

Rating            % Fresh Range   “Really Like” %
Certified Fresh   96 to 100%      69.9%
Certified Fresh   93 to 95%       73.4%
Certified Fresh   89 to 92%       68.3%
Certified Fresh   85 to 88%       71.2%
Certified Fresh   80 to 84%       73.0%
Certified Fresh   74 to 79%       65.3%

This grouping of six equal-size buckets suggests that there isn’t much difference between a movie in my database that is 75% Fresh and one that is 100% Fresh. Now, it is entirely possible that there is an actual difference between 75% Fresh and 100% Fresh. If my database were larger, the data might produce a less random pattern that would be statistically significant. For now, though, the data is what it is. Certified Fresh is predictive; the % Fresh part of the rating, less so.
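In fact, the bucket-to-bucket wobble is about what random noise predicts. A quick back-of-envelope check, assuming roughly 923 ÷ 6 ≈ 153 movies per bucket and the 69.7% Certified Fresh base rate quoted earlier:

```python
# Back-of-envelope: with 923 Certified Fresh movies split into six
# equal buckets (~153 each), how much random wobble should each
# bucket's "really like" percentage show? Uses the binomial
# standard error of a proportion.
import math

n_bucket = 923 // 6          # ≈ 153 movies per bucket
p = 0.697                    # overall Certified Fresh "really like" rate
se = math.sqrt(p * (1 - p) / n_bucket)
print(f"standard error per bucket: {se:.1%}")            # ≈ 3.7%
print(f"95% range: {p - 2*se:.1%} to {p + 2*se:.1%}")    # ≈ 62.3% to 77.1%
```

Every bucket in the table, from 65.3% to 73.4%, sits inside that noise band, which is why the % Fresh gradient carries no reliable signal at this sample size.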

Expertise can reduce the numbers needed for meaningful differentiation between what is Certified Fresh and what is Rotten. The law of large numbers, though, may be too daunting for credible guidance much beyond that.



Some Facts Are Not So Trivial

As I’ve mentioned before on these pages, I always pay a visit to the IMDB trivia link after watching a movie. Often I will find a fun but ultimately trivial fact, such as the one I discovered after viewing Beauty and the Beast. According to IMDB, Emma Watson was offered the Academy Award-winning role of Mia in La La Land but turned it down because she was committed to Beauty and the Beast. Coincidentally, the heretofore non-musical Ryan Gosling was offered the role of the Beast and turned it down because he was committed to that other musical, La La Land. You really can’t fault either of their decisions. Both movies have been huge successes.

On Tuesday I watched the “really like” 1935 film classic Mutiny on the Bounty. My visit to the trivia pages of this film unearthed facts that were more consequential than trivial. For example, the film was the first movie about historically factual events, with actors playing historically factual people, to win the Academy Award for Best Picture. The previous eight winners were all based on fiction. Real life became a viable source for great films, as the next two Best Picture winners, The Great Ziegfeld and The Life of Emile Zola, were also biographies. Interestingly, it would be another 25 years before another non-fictional film, Lawrence of Arabia, would win Best Picture.

Mutiny on the Bounty also has the distinction of being the only movie ever to have three actors, Clark Gable, Charles Laughton, and Franchot Tone, nominated for Best Actor. Everyone expected one of them to win. After splitting the votes amongst themselves, none of them did. Oscar officials vowed never to let that happen again. For the next Academy Awards in 1937, they created two new awards, for Actor and Actress in a Supporting Role. Since then, in only six other instances have two actors from the same movie been nominated for Best Actor.

Before leaving Mutiny on the Bounty, there is one more non-trivial fact to relate about the movie. The characters of Captain Bligh and First Mate Fletcher Christian grow to hate each other in the plot. To further that requisite hate in the movie, Irving Thalberg, one of the producers, purposely cast the overtly gay Charles Laughton as Bligh and the notorious homophobe Gable as Fletcher Christian. This crass manipulation of the actors’ prejudice seemed to have worked as the hate between the two men was evident on the set and clearly translated to the screen. This kind of morally corrupt behavior was not uncommon in the boardrooms of the Studio system in Hollywood at the time.

Some other older Best Picture winning films with facts, not trivial, but consequential to the film industry or the outside world include:

  • It Happened One Night, another Clark Gable classic, in 1935 became the first of only three films to win the Oscar “grand slam”. The other two were One Flew Over the Cuckoo’s Nest and The Silence of the Lambs. The Oscar “grand slam” is when a movie wins all five major awards: Best Picture, Director, Actor, Actress, and Screenplay.
  • Gone with the Wind, along with being the first Best Picture winner filmed in color, is the longest movie, at four hours, to win Best Picture. Hattie McDaniel became the first black actor to be nominated for and win an Oscar, for her role in the film.
  • In Casablanca, there is a scene where the locals drown out the Nazi song “Watch on the Rhine” with their singing of the “Marseillaise”. In that scene you can see tears running down the cheeks of many of the locals. For many of these extras the tears were real since they were actual refugees from Nazi tyranny. Ironically, many of the Nazis in the scene were also German Jews who had escaped Germany.
  • IMDB reveals how Ray Milland prepared for his 1946 award-winning portrayal of an alcoholic in The Lost Weekend: “Ray Milland actually checked himself into Bellevue Hospital with the help of resident doctors, in order to experience the horror of a drunk ward. Milland was given an iron bed and he was locked inside the “booze tank.” That night, a new arrival came into the ward screaming, an entrance which ignited the whole ward into hysteria. With the ward falling into bedlam, a robed and barefooted Milland escaped while the door was ajar and slipped out onto 34th Street where he tried to hail a cab. When a suspicious cop spotted him, Milland tried to explain, but the cop didn’t believe him, especially after he noticed the Bellevue insignia on his robe. The actor was dragged back to Bellevue where it took him a half-hour to explain his situation to the authorities before he was finally released.”
  • In the 1947 film Gentleman’s Agreement about anti-Semitism, according to IMDB, “The movie mentions three real people well-known for their racism and anti-Semitism at the time: Sen. Theodore Bilbo (D-Mississippi), who advocated sending all African-Americans back to Africa; Rep. John Rankin (D-Mississippi), who called columnist Walter Winchell “the little kike” on the floor of the House of Representatives; and leader of “Share Our Wealth” and “Christian Nationalist Crusade” Gerald L. K. Smith, who tried legal means to prevent Twentieth Century-Fox from showing the movie in Tulsa. He lost the case, but then sued Fox for $1,000,000. The case was thrown out of court in 1951.”

One of the definitions of “trivia” is “an inessential fact; trifle”. The fact that IMDB lists these items under its Trivia link does not make them trivia. The facts presented here either promoted creative growth in the film industry or made a significant statement about society. Some facts are not so trivial.




Sometimes When You Start To Go There You End Up Here

There are some weeks when I’m stumped as to what I should write about in this weekly trip to Mad Moviedom. Sometimes I’m in the middle of an interesting study that isn’t quite ready for publication. Sometimes an idea isn’t quite fully developed. Sometimes I have an idea but I find myself blocked as to how to present it. When I find myself in this position, one avenue always open to me is to create a quick study that might be halfway interesting.

This is where I found myself this week. I had ideas that weren’t ready to publish yet. So, my fallback study was going to be a quick study of which movie decades present the best “really like” viewing potential. Here are the results of my first pass at this:

“Really Like” Decades
Based on Number of “Really Like” Movies
As of April 6, 2017

Decade    Really Liked   Didn’t Really Like   Total   “Really Like” Probability
All       1,108          888                  1,996
2010’s    232            117                  349     60.9%
2000’s    363            382                  745     50.5%
1990’s    175            75                   250     62.0%
1980’s    97             60                   157     58.4%
1970’s    56             49                   105     54.5%
1960’s    60             55                   115     53.9%
1950’s    51             78                   129     46.6%
1940’s    55             43                   98      55.8%
1930’s    19             29                   48      46.9%

These results are mildly interesting. The 2010’s, 1990’s, 1980’s, and 1940’s are above-average decades for me. There is an unusually high number of movies in the sample that were released in the 2000’s. Remember that movies stay in my sample for 15 years from the year I last watched them. After 15 years they are removed from the sample and put into the pool of movies available to watch again. The good movies get watched again; the other movies are, hopefully, never seen again. Movies last seen after 2002 have not yet gone through this process of separating out the “really like” movies to be watched again and permanently weeding the didn’t “really like” movies from the sample. The contrast between the 2000’s and the 2010’s is a good measure of the impact of undisciplined versus disciplined movie selection.
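The decade grouping behind the table is straightforward to compute. A minimal sketch of that bookkeeping, assuming a list of (release year, did-I-“really-like”-it) records; the five sample records are hypothetical stand-ins for the full 1,996-movie database:

```python
# Sketch: group a viewing log by release decade and compute the raw
# "really like" share per decade. Sample records are hypothetical.
def decade_of(year):
    return f"{year // 10 * 10}'s"

def probability_by_decade(movies):
    """movies: list of (release_year, really_liked) pairs."""
    table = {}
    for year, liked in movies:
        d = decade_of(year)
        liked_n, total = table.get(d, (0, 0))
        table[d] = (liked_n + liked, total + 1)
    return {d: liked_n / total for d, (liked_n, total) in table.items()}

sample = [(2012, True), (2014, False), (1995, True), (1998, True), (1983, False)]
print(probability_by_decade(sample))
# 2010's: 0.5, 1990's: 1.0, 1980's: 0.0
```

Note this sketch returns the raw liked-to-total fraction per decade; the probability column in my table comes out of my algorithm rather than a straight division.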

As I’ve pointed out in recent posts, I’ve made some changes to my algorithm. One of the big changes I’ve made is that I’ve replaced the number of movies that are “really like” movies with the number of ratings for the movies that are “really like” movies. After doing my decade study based on number of movies, I realized I should have used the number of ratings method to be consistent with my new methodology. Here are the results based on the new methodology:

“Really Like” Decades
Based on Number of “Really Like” Ratings
As of April 6, 2017

Decade    Really Liked    Didn’t Really Like   Total           “Really Like” Probability
All       2,323,200,802   1,367,262,395        3,690,463,197
2010’s    168,271,890     166,710,270          334,982,160     57.1%
2000’s    1,097,605,373   888,938,968          1,986,544,341   56.6%
1990’s    610,053,403     125,896,166          735,949,569     70.8%
1980’s    249,296,289     111,352,418          360,648,707     65.3%
1970’s    85,940,966      25,372,041           111,313,007     67.7%
1960’s    57,485,708      15,856,076           73,341,784      68.0%
1950’s    28,157,933      23,398,131           51,556,064      59.5%
1940’s    17,003,848      5,220,590            22,224,438      67.4%
1930’s    9,385,392       4,517,735            13,903,127      64.6%

While the results are different, the big reveal was that 63.0% of the ratings are for “really like” movies while only 55.5% of the movies are “really like” movies. It starkly reinforces the impact of the law of large numbers. Movie website indicators of “really like” movies are more reliable when the number of ratings driving those indicators is larger. The following table illustrates this better:

“Really Like” Decades
Based on Average Number of “Really Like” Ratings per Movie
As of April 6, 2017

Decade    Really Liked   Didn’t Really Like   Total          % Difference
All       2,096,751.63   1,539,709.90         1,848,929.46   36.2%
2010’s    725,309.87     1,424,874.10         959,834.27     -49.1%
2000’s    3,023,706.26   2,327,065.36         2,666,502.47   29.9%
1990’s    3,486,019.45   1,678,615.55         2,943,798.28   107.7%
1980’s    2,570,064.84   1,855,873.63         2,297,125.52   38.5%
1970’s    1,534,660.11   517,796.76           1,060,123.88   196.4%
1960’s    958,095.13     288,292.29           637,754.64     232.3%
1950’s    552,116.33     299,976.04           399,659.41     84.1%
1940’s    309,160.87     121,409.07           226,779.98     154.6%
1930’s    493,968.00     155,783.97           289,648.48     217.1%

With the exception of the 2010’s, the average number of ratings per movie is larger for the “really like” movies. In fact, they are dramatically different for the decades prior to 2000. My educated guess is that the post-2000 years will end up fitting the pattern of the other decades once those years mature.
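The count-versus-ratings contrast is easy to reproduce. A minimal sketch with four hypothetical movies, showing how weighting by each movie’s number of ratings pushes the “really like” share up whenever the “really like” movies attract more ratings:

```python
# Sketch: "really like" share by movie count vs. weighted by number
# of ratings, mirroring the 55.5% vs 63.0% contrast in the post.
# The four movies below are hypothetical.
def really_like_shares(movies):
    """movies: list of (num_ratings, really_liked) pairs."""
    liked = [m for m in movies if m[1]]
    by_count = len(liked) / len(movies)
    by_ratings = sum(n for n, _ in liked) / sum(n for n, _ in movies)
    return by_count, by_ratings

movies = [(2_000_000, True), (1_500_000, True), (400_000, False), (100_000, False)]
by_count, by_ratings = really_like_shares(movies)
print(f"by movie count: {by_count:.1%}, by ratings: {by_ratings:.1%}")
# → by movie count: 50.0%, by ratings: 87.5%
```

Half the movies are “really like”, but they hold 87.5% of the ratings: the same asymmetry that separates my 55.5% count-based figure from the 63.0% ratings-based one.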

So what is the significance of this finding? It clearly suggests that waiting to decide whether to see a new movie until a sufficient number of ratings has come in will produce a more reliable result. The unanswered question is how many ratings are enough.

The finding also reinforces the need to have something like Oscar performance to act as a second measure of quality for movies that will never have “enough” ratings for a reliable result.

Finally, the path from “there to here” is not always found on a map.
