This post is just the outcome of my “research” into what I have watched over the past 2.5 years. I was interested in its statistics, and I was also looking to work out which movies and TV shows I should watch next. So here it is, my analysis. I just wrote a stupid post about it because, as Adam Savage has said:
The difference between screwing around and science is writing it down.
I took it quite literally 😉
Why is it that when one man builds a wall, the next man immediately needs to know what’s on the other side?
—Tyrion Lannister in George R.R. Martin’s A Game of Thrones
The data was exported from IMDb (sadly, I only started using IMDb to track my watchlist about 2.5 years ago, so the analysis is limited to that period).
Using Python, Pandas, Seaborn and Matplotlib, I analyzed my watching history. The code used can be found in this GitHub repo.
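For readers who want to try this on their own export, here is a minimal loading sketch, not the code from the repo. It assumes the IMDb ratings export is a CSV with columns such as `Title`, `Title Type`, `Your Rating`, `IMDb Rating`, `Runtime (mins)`, `Year`, `Genres`, `Num Votes` and `Directors`; those column names are an assumption, so check them against your own file. The helper `load_ratings` is a hypothetical name.

```python
import io

import pandas as pd

# Assumed numeric columns of an IMDb ratings export; verify against your own file.
NUMERIC_COLUMNS = ["Your Rating", "IMDb Rating", "Runtime (mins)", "Year", "Num Votes"]


def load_ratings(source) -> pd.DataFrame:
    """Read an IMDb-style ratings CSV (a path or a file-like object) into a DataFrame."""
    df = pd.read_csv(source)
    # Coerce numeric columns so blanks or odd values become NaN instead of strings.
    for col in NUMERIC_COLUMNS:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


if __name__ == "__main__":
    # Made-up placeholder rows standing in for a real export file.
    sample = io.StringIO(
        "Title,Title Type,Your Rating,IMDb Rating,Runtime (mins),Year,Genres,Num Votes,Directors\n"
        'Movie A,movie,9,8.1,120,1999,"Comedy, Drama",1000,Director X\n'
        "Series B,tvSeries,8,7.9,45,2010,Crime,500,\n"
    )
    ratings = load_ratings(sample)
    print(ratings[["Title", "Your Rating", "Year", "Genres"]])
```

With a real export you would pass a file path instead of the `StringIO` sample.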

In total I had rated 228 titles. Of these, 181 were movies (about 79.4%) and the remaining 47 were TV series (about 20.6%).
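A small sketch of how such a split can be computed with Pandas, assuming a `Title Type` column in the export (an assumption about the column name); `title_type_shares` is a hypothetical helper, and the demo rows are placeholders rather than my real history.

```python
import pandas as pd


def title_type_shares(df: pd.DataFrame, column: str = "Title Type") -> pd.DataFrame:
    """Count titles per type and express each count as a percentage of the total."""
    counts = df[column].value_counts()
    shares = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "percent": shares})


if __name__ == "__main__":
    # Placeholder data: 4 movies and 1 series, not the real watch history.
    demo = pd.DataFrame({"Title Type": ["movie", "movie", "tvSeries", "movie", "movie"]})
    print(title_type_shares(demo))
```

Applied to the counts quoted above, 181 movies out of 228 titles works out to 181 / 228 ≈ 79.4%, leaving about 20.6% for TV series.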
I am going to analyze my movie list first, since movies make up the bulk of my ratings. A single movie may belong to multiple genres: ‘(500) Days of Summer’ is tagged ‘Comedy, Drama, Romance’, and one of my favourite neo-noir films, ‘American Psycho’, is tagged ‘Comedy, Drama, Crime’. So I first unstacked the genres using Pandas and then plotted them with the versatile Matplotlib library. I also included a colorbar, which I drew with Edward Tufte’s rules for data visualization in mind.
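One way to do the unstacking (not necessarily the exact code in the repo) is `str.split` followed by `Series.explode`, which turns each title into one row per genre before counting. The `Genres` column name and its comma-separated format are assumptions about the export; `genre_counts` is a hypothetical helper. The demo uses the two genre strings quoted above.

```python
import pandas as pd


def genre_counts(df: pd.DataFrame, column: str = "Genres") -> pd.Series:
    """Split comma-separated genre strings into one row per (title, genre) and count them."""
    genres = (
        df[column]
        .dropna()
        .str.split(",")   # "Comedy, Drama, Romance" -> ["Comedy", " Drama", " Romance"]
        .explode()        # one row per genre, original index repeated
        .str.strip()      # drop the leading spaces left by the split
        .str.title()      # normalise case, e.g. "comedy" -> "Comedy"
    )
    return genres.value_counts()


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "Title": ["(500) Days of Summer", "American Psycho"],
            "Genres": ["Comedy, Drama, Romance", "comedy, drama, crime"],
        }
    )
    print(genre_counts(demo))
```

The resulting Series can go straight to Matplotlib, for example via `genre_counts(df).plot.barh()`.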

I also mapped the movies I have seen by their release year, which was also interesting:
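A sketch of the per-year count behind such a chart, assuming a numeric `Year` column; `titles_per_year` and `top_years` are hypothetical helper names. The demo uses six films from 1994 and 1999 with their actual release years.

```python
import pandas as pd


def titles_per_year(df: pd.DataFrame, column: str = "Year") -> pd.Series:
    """Number of rated titles per release year, in chronological order."""
    years = pd.to_numeric(df[column], errors="coerce").dropna().astype(int)
    return years.value_counts().sort_index()


def top_years(df: pd.DataFrame, n: int = 3, column: str = "Year") -> pd.Series:
    """The n release years with the most rated titles; ties keep chronological order."""
    per_year = titles_per_year(df, column)
    # mergesort is stable, so equal counts stay in year order
    return per_year.sort_values(ascending=False, kind="mergesort").head(n)


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "Title": ["Fight Club", "American Beauty", "The Matrix",
                      "The Shawshank Redemption", "Forrest Gump", "Pulp Fiction"],
            "Year": [1999, 1999, 1999, 1994, 1994, 1994],
        }
    )
    print(titles_per_year(demo))
    print(top_years(demo, n=2))
```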

I would say 1999 was a great year, as ‘Fight Club’, ‘American Beauty’ and ‘The Matrix’ were released, but 1994 was even better: it gave us ‘The Shawshank Redemption’, ‘Forrest Gump’ and ‘Pulp Fiction’. I would say Tarantino is spooky, and I have to watch him more often.
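Well, it seems I have “invested” ;=) a lot of time watching movies, so let’s look at the distribution of their screen time. The mean was 128.34 minutes with a standard deviation of 28 minutes. The distribution looks roughly normal (although not perfectly), so the empirical 68%-95%-99.7% rule should hold approximately: about 68% of the movies should run between roughly 100 and 156 minutes. Note that the central limit theorem does not make the individual runtimes normal; what it says is that, with a sample of 181 titles, the sample mean is approximately normally distributed, with a standard error of about 28/√181 ≈ 2.1 minutes.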
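To check how closely the runtimes actually follow the empirical rule, one can count the share of movies within 1, 2 and 3 standard deviations of the mean. This is a minimal standard-library sketch; `runtime_summary` is a hypothetical helper, and it uses the sample standard deviation (dividing by n - 1), which may differ slightly from whatever the repo computed.

```python
import math
import statistics


def runtime_summary(runtimes):
    """Mean, sample SD, standard error of the mean, and empirical-rule coverage."""
    n = len(runtimes)
    mean = statistics.mean(runtimes)
    sd = statistics.stdev(runtimes)   # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)           # spread of the *mean*, which is what the CLT describes
    coverage = {}
    for k in (1, 2, 3):
        inside = sum(1 for r in runtimes if abs(r - mean) <= k * sd)
        # For a normal distribution these would be roughly 0.68, 0.95 and 0.997.
        coverage[k] = inside / n
    return {"mean": mean, "sd": sd, "sem": sem, "coverage": coverage}


if __name__ == "__main__":
    # Placeholder runtimes in minutes, not the real 181 movies.
    print(runtime_summary([100, 110, 120, 130, 140]))
```

Fed the real runtime column, a `coverage[1]` close to 0.68 and a `coverage[2]` close to 0.95 would support reading the distribution as roughly normal.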

Next, I wanted to see how my ratings compare to the average IMDb rating: do I like the same movies and series as mainstream audiences, or do my tastes differ?
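A simple way to quantify this is the average gap between my rating and the IMDb rating, split by title type, together with the share of titles I rated 8 or higher. The sketch below assumes `Your Rating`, `IMDb Rating` and `Title Type` columns (assumed names); `rating_gap` is a hypothetical helper and the demo values are placeholders.

```python
import pandas as pd


def rating_gap(df: pd.DataFrame) -> pd.DataFrame:
    """Mean (my rating - IMDb rating) per title type, plus the share of titles rated 8+."""
    return (
        df.assign(
            gap=df["Your Rating"] - df["IMDb Rating"],
            rated_8_plus=df["Your Rating"] >= 8,
        )
        .groupby("Title Type")
        .agg(
            mean_gap=("gap", "mean"),               # > 0 means I am more generous than IMDb
            share_8_plus=("rated_8_plus", "mean"),  # mean of booleans = fraction True
            n=("gap", "size"),
        )
    )


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "Title Type": ["movie", "movie", "tvSeries"],
            "Your Rating": [9, 7, 8],
            "IMDb Rating": [8.5, 7.5, 8.0],
        }
    )
    print(rating_gap(demo))
```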

You can see that I rated most of the movies 8+. I would attribute this to the research I do before watching any movie 😉 or series.

Thanks to the Indiana Jones films, Spielberg was the director I watched most. Kubrick’s ‘The Shining’, ‘A Clockwork Orange’, ‘2001: A Space Odyssey’ and ‘Dr. Strangelove’, as well as Nolan’s ‘The Dark Knight’ trilogy, were all awesome. I have to see more of Quentin Tarantino, though. ‘The Before Trilogy’ and ‘Dazed and Confused’, directed by Richard Linklater, were great. This graph is leading me to another quarantine binge-watching trip.
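Counting directors works the same way as counting genres, because co-directed titles list several names in one field. This sketch assumes a comma-separated `Directors` column (an assumed name and format); `director_counts` is a hypothetical helper, and the demo rows use real director credits.

```python
import pandas as pd


def director_counts(df: pd.DataFrame, column: str = "Directors") -> pd.Series:
    """Count titles per director, giving each co-director of a title one entry."""
    return (
        df[column]
        .dropna()
        .str.split(",")
        .explode()
        .str.strip()
        .value_counts()   # sorted from most to least watched
    )


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "Title": ["Raiders of the Lost Ark", "Indiana Jones and the Last Crusade",
                      "The Shining", "The Matrix"],
            "Directors": ["Steven Spielberg", "Steven Spielberg",
                          "Stanley Kubrick", "Lana Wachowski, Lilly Wachowski"],
        }
    )
    print(director_counts(demo))
```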

Well, one thing I noticed is that all these films were great and all of them have a very high number of ratings. So I went on to look at their IMDb ratings. My assumption was that if so many people watched these films, they should have good IMDb ratings too. I wanted to check this for some of the top TV series in my list as well.

So, for both TV shows and movies, I fitted a linear regression model to look at the correlation between ‘Num of votes’ and ‘IMDb Rating’. The Pearson correlation was about 0.48, which indicates a moderate positive relationship. You can see it below:
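A minimal sketch of the numbers behind such a plot: `Series.corr` (Pearson by default) for the correlation and `numpy.polyfit` for a least-squares line. Seaborn's `regplot` draws a comparable fitted line with a confidence band. The `Num Votes` and `IMDb Rating` column names are assumptions, and `votes_rating_fit` is a hypothetical helper; it uses raw vote counts, as the text describes.

```python
import numpy as np
import pandas as pd


def votes_rating_fit(df: pd.DataFrame, x: str = "Num Votes", y: str = "IMDb Rating"):
    """Pearson r between vote count and rating, plus the least-squares line y = slope * x + intercept."""
    data = df[[x, y]].dropna()
    r = data[x].corr(data[y])                       # Pearson correlation by default
    slope, intercept = np.polyfit(data[x], data[y], deg=1)
    return r, slope, intercept


if __name__ == "__main__":
    # Placeholder values, not real IMDb data.
    demo = pd.DataFrame({"Num Votes": [1000, 5000, 20000, 80000],
                         "IMDb Rating": [6.9, 7.4, 7.8, 8.3]})
    print(votes_rating_fit(demo))
```

For context, r ≈ 0.48 means the fitted line accounts for only about r² ≈ 0.23 (23%) of the variance in IMDb ratings.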

To recap: I downloaded my watching history from IMDb, analyzed and visualized the data, and did some statistical analysis, all while listening to Bowie and The Doors over late-night coffee. I hope you will at least like my movie selections, if not this post; otherwise I will have to book you an appointment with Dr. Hannibal Lecter. And as Albert Einstein said:
Science is a wonderful thing if one does not have to earn one’s living at it.
Albert Einstein
In the next post, I will probably be back with another interesting dataset.
Till then :>