What I See, When I See

This post is the outcome of my “research” into what I have watched over the past 2.5 years. I was interested in the statistics, and I also wanted to figure out which movies and TV shows I should watch next. So here it is, my analysis. I wrote a silly post about it because, as Adam Savage has said:

The difference between screwing around and science is writing it down.

I took it quite literally 😉

Why is it that when one man builds a wall, the next man immediately needs to
know what’s on the other side?

—Tyrion Lannister in George R.R. Martin’s A Game of Thrones

The data comes from IMDb. Sadly, I started using IMDb to track my watchlist only about 2.5 years ago, hence the time period.

Using Python, Pandas, Seaborn and Matplotlib, I analyzed my watching history. The code can be found in this GitHub repo.

Movies and Shows by percentage in List

In total, I had rated 228 titles. Of these, 181 were movies, accounting for 80.4%, while the rest were TV series, accounting for nearly 19.6%.
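A split like this can be computed in one line with Pandas. The sketch below is only an illustration: the tiny DataFrame and the column name `Title Type` are assumptions standing in for the real export, not the code from the repo.

```python
import pandas as pd

# Toy stand-in for an IMDb ratings export. The column name "Title Type"
# and the rows below are assumptions for illustration only.
ratings = pd.DataFrame({
    "Title": ["Fight Club", "Breaking Bad", "Dunkirk", "Mr. Robot", "Logan"],
    "Title Type": ["movie", "tvSeries", "movie", "tvSeries", "movie"],
})

# Share of each title type, as a percentage of all rated titles.
type_share = (ratings["Title Type"].value_counts(normalize=True) * 100).round(1)
print(type_share)
```

The resulting Series can be passed straight to a pie or bar chart.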

I am going to analyze my movie list first, as movies make up the major portion of my ratings. A single movie may belong to multiple genres: “(500) Days of Summer” is listed under ‘Comedy, Drama, Romance’, and one of my favourite neo-noir films, American Psycho, under ‘Comedy, Drama, Crime’. So I first unstacked the genres using Pandas and then plotted them with the versatile Matplotlib library. I also included a colorbar, drawn with Edward Tufte’s rules for data visualization in mind.
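One way to give every genre its own row is sketched below. It uses `str.split` plus `explode` rather than a literal unstack, so it may differ from what the repo does; the column names `Genres` and `Genre` and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Toy data; the "Genres" column name and the rows are assumptions.
movies = pd.DataFrame({
    "Title": ["(500) Days of Summer", "American Psycho", "The Matrix"],
    "Genres": ["Comedy, Drama, Romance", "Comedy, Drama, Crime", "Action, Sci-Fi"],
})

# Split the comma-separated string into a list, then give each genre its own row.
genre_rows = movies.assign(Genre=movies["Genres"].str.split(", ")).explode("Genre")

# Count how many movies fall under each genre.
genre_counts = genre_rows["Genre"].value_counts()
print(genre_counts)
```

Because a movie appears once per genre, the counts add up to more than the number of movies.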

If you are a fan of sci-fi or an ardent drama fan, recommend me some.

I also plotted the movies I have seen by release year, which was interesting too.

Well, 2017 accounted for the most movies because of the release of ‘It’, ‘Wonder Woman’, ‘Justice League’, ‘Qarib Qarib Singlle’, ‘Logan’ and ‘Dunkirk’.

I would say 1999 was a great year, as ‘Fight Club’, ‘American Beauty’ and ‘The Matrix’ were released, but 1994 was better: it gave us ‘The Shawshank Redemption’, ‘Forrest Gump’ and ‘Pulp Fiction’. I would say Tarantino is spooky, and I have to watch him more often.

Well, it seems I have “invested” ;=) a lot of time watching movies, so let’s look at the distribution of movie runtimes. The mean was 128.34 minutes with a standard deviation of 28 minutes. The distribution looks roughly normal (although not completely), so the runtimes should approximately follow the empirical 68%-95%-99.7% rule. A large sample of 181 titles does not guarantee this by itself, though: the central limit theorem is about the distribution of sample means, not of the individual runtimes.
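Whether the empirical rule roughly holds is easy to check by counting. The sketch below is a minimal illustration: `share_within` is a hypothetical helper, and the runtimes are simulated from the mean and standard deviation quoted above, not taken from the real data.

```python
import numpy as np

def share_within(values, k):
    """Fraction of values lying within k sample standard deviations of the mean."""
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std(ddof=1)
    return float(np.mean(np.abs(values - mean) <= k * std))

# Simulated runtimes, NOT the real data: drawn from a normal distribution
# using the mean and standard deviation quoted in the post.
rng = np.random.default_rng(0)
runtimes = rng.normal(loc=128.34, scale=28, size=181)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(runtimes, k):.1%}")
```

On real runtimes, a long right tail (such as a single 300-minute film) would pull these shares away from 68%, 95% and 99.7%.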

Hmm, almost a normal distribution. This histogram shows the duration of the movies; the whopping 300-minute outlier is none other than “Gangs of Wasseypur”. Most movies are between 110 and 150 minutes, as expected.

Next, I wanted to see how my ratings compare to the average IMDb ratings: do I like a movie or series as much as mainstream audiences do, or does my taste differ?

I added some jitter to the scatter plot because otherwise the points would overlap too much to be useful. As you can see, my ratings and IMDb’s ratings do not agree all that closely, but there is a weak positive relationship: the Pearson correlation coefficient came out to be 0.225.
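Jitter and the correlation can be sketched as follows. The ratings are made up, and the column names `Your Rating` and `IMDb Rating` are assumptions; the point is that jitter only affects how the points are drawn, while the correlation is computed on the raw values.

```python
import numpy as np
import pandas as pd

# Made-up ratings for illustration; the column names are assumptions.
df = pd.DataFrame({
    "Your Rating": [8, 9, 8, 7, 10, 8, 9, 6],
    "IMDb Rating": [8.1, 8.8, 7.4, 7.9, 9.0, 7.2, 8.5, 7.0],
})

# Pearson correlation is computed on the raw values, not the jittered ones.
r = df["Your Rating"].corr(df["IMDb Rating"], method="pearson")

# Jitter: small uniform noise so that integer ratings do not overplot.
rng = np.random.default_rng(42)
jittered = df["Your Rating"] + rng.uniform(-0.2, 0.2, size=len(df))

print(f"Pearson r = {r:.3f}")
```

The `jittered` values would go on the scatter plot axis in place of the raw integer ratings.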

You can see I rated most of the movies 8+. I would attribute this to the research I do before watching any movie or series 😉.

I then went on to look at the directors whose work I have seen the most.

Thanks to the Indiana Jones films, Spielberg was my most-watched director. Kubrick’s ‘The Shining’, ‘A Clockwork Orange’, ‘2001: A Space Odyssey’ and ‘Dr. Strangelove’, and Nolan’s ‘The Dark Knight’ series were all awesome. I have to see more of Quentin Tarantino, though. ‘The Before Trilogy’ and ‘Dazed and Confused’, directed by Richard Linklater, were great. This graph is leading me to another quarantine binge-watching trip.

Some of the most-rated movies (in terms of the number of votes from IMDb users) in my library.

Well, one thing I noticed is that all these films were great and all of them have a very high number of votes. I then went on to check their IMDb ratings. My assumption was that if so many people watched these films, they should have good IMDb ratings too. I wanted to do this for some top TV series in my list too.

Well, Breaking Bad is awesome and so is Mr. Robot.

So, for both TV shows and movies, I fitted a linear regression to see the correlation between ‘Num of votes’ and ‘IMDb Rating’. The Pearson correlation was about 0.48, which indicates a moderate positive relationship. You can see it below:

The relationship is moderately strong, as this statistical model shows.
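For reference, a straight-line fit and its correlation can be sketched with NumPy alone. This stands in for whatever plotting helper produced the chart; the (votes, rating) pairs are made up and the variable names are assumptions.

```python
import numpy as np

# Made-up (votes, rating) pairs for illustration only.
num_votes = np.array(
    [50_000, 120_000, 400_000, 900_000, 1_500_000, 2_200_000], dtype=float
)
imdb_rating = np.array([7.1, 7.4, 7.9, 8.3, 8.6, 9.0])

# Least-squares straight line: rating ≈ slope * votes + intercept.
slope, intercept = np.polyfit(num_votes, imdb_rating, deg=1)

# Pearson correlation between the two variables.
r = np.corrcoef(num_votes, imdb_rating)[0, 1]

print(f"slope per million votes = {slope * 1e6:.2f}, r = {r:.2f}")
```

Vote counts tend to be heavily skewed, so fitting against the logarithm of the votes is a common alternative worth trying.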

To recap, I downloaded my watching history from IMDb, analyzed and visualized the data, and did some statistical analysis, all while listening to Bowie and the Doors and hanging on to night-time coffee. I hope you will at least like my movie selections, if not this post. Otherwise, I will have to book you an appointment with Dr. Hannibal Lecter. And as Albert Einstein said:

Science is a wonderful thing if one does not have to earn one’s living at it.

Albert Einstein

In the next post, I will probably be back with another interesting dataset.

Till then :>
