Evolution of Formula-1

If you torture the data long enough, it will confess.

~ Ronald Coase

This is my second blog post on data visualization, and this time it’s about Formula 1. I am going to comment on datasets related to Formula 1 racing. You can download the dataset from here. The dataset and code can be found here too.

“Race cars are neither ugly nor beautiful. They become beautiful when they win.”

Enzo Ferrari, Founder of Scuderia Ferrari

Some of the libraries used are Matplotlib, Seaborn, Plotly, Pandas and NumPy.

Formula One cars are the fastest regulated road-course racing cars in the world, and many famous constructors such as Ferrari, Mercedes, Red Bull, McLaren and Renault take part with their highly engineered cars.

Success in Formula 1 depends on both cars and drivers. But recent years have seen total domination by one team in any given season. It seems that if a team manages to find an engineering edge for its cars, it will win both the Drivers’ and the Constructors’ Championships regardless of how good the opposition drivers are. Ferrari, Brawn, Red Bull, now Mercedes: same story. There have been years when Ferrari reigned, and in more recent years first Red Bull and then Mercedes have won the championships.

Let’s see which constructors have won the most championships, both as a percentage and as an absolute count.
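Counting titles per constructor, as an absolute number and as a percentage, is essentially one `value_counts` call in pandas. The sketch below is not the code behind the plot: to stay self-contained it hard-codes only the 2010 to 2019 Constructors' Champions discussed in this post (Red Bull 2010 to 2013, Mercedes 2014 to 2019) instead of loading the full dataset.

```python
import pandas as pd

# Constructors' Champions 2010-2019, indexed by season (a small excerpt, not the full history).
champions = pd.Series(["Red Bull"] * 4 + ["Mercedes"] * 6, index=range(2010, 2020))

titles = champions.value_counts()                        # absolute number of titles
percent = champions.value_counts(normalize=True) * 100   # share of all titles, in %

summary = pd.DataFrame({"titles": titles, "percent": percent.round(1)})
print(summary)
```

With the full dataset, the same two calls apply unchanged to a Series holding one champion per season.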


The plot above shows the number of Constructors’ Championships won by each constructor. The next plot shows the winner of each year, which makes the picture more interesting.

As you can see, the last six championships all went to Mercedes, and those from 2010 to 2013 all went to Red Bull. This strongly suggests that the car and the constructor are the most important factors when it comes to winning a championship. Let’s look at one more graph.

Let’s look at this boxplot. It is a trivial one, and a boxplot is generally not a good choice in this situation, but it explains one thing well. From the 1980s to the mid-2000s, three major constructors (Ferrari, McLaren and Williams) won a large number of championships, and they won them alternately, i.e. it was a three-sided war in which all three won their fair share. After the mid-2000s, however, we see shorter runs of consecutive wins by different constructors such as Renault, Red Bull and now Mercedes. Since 2005, each era has been one-sided for a while, until a new winner emerges.
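The pattern of one-sided eras can also be checked without a plot by collapsing the season-by-season champions into runs of consecutive titles. Below is a minimal sketch using `itertools.groupby` on the Constructors' Champions from 2005 to 2019 (Renault 2005 to 2006, Ferrari 2007 to 2008, Brawn 2009, Red Bull 2010 to 2013, Mercedes 2014 to 2019). The helper name `title_streaks` is illustrative and not part of the original code.

```python
from itertools import groupby

# Constructors' Champions by season, 2005-2019.
champions = {
    2005: "Renault", 2006: "Renault",
    2007: "Ferrari", 2008: "Ferrari",
    2009: "Brawn",
    2010: "Red Bull", 2011: "Red Bull", 2012: "Red Bull", 2013: "Red Bull",
    2014: "Mercedes", 2015: "Mercedes", 2016: "Mercedes",
    2017: "Mercedes", 2018: "Mercedes", 2019: "Mercedes",
}


def title_streaks(champions: dict) -> list:
    """Collapse a {season: winner} mapping into (winner, first_season, last_season) runs.

    groupby only merges adjacent items, so the seasons are sorted first.
    The seasons are assumed to be contiguous (no missing years).
    """
    streaks = []
    for winner, seasons in groupby(sorted(champions), key=champions.get):
        seasons = list(seasons)
        streaks.append((winner, seasons[0], seasons[-1]))
    return streaks


for winner, first, last in title_streaks(champions):
    print(f"{winner}: {first}-{last} ({last - first + 1} titles)")
```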

Before moving on to the next topic, I will also add the plot of drivers with the most championship wins.

We can see that only a very limited number of drivers have won the championship since it started in 1950, and constructors play a major role in this.

How the speed of cars has changed over time

It will be interesting to see how cars have evolved over all these years, how their speed has changed and how different tracks affect speed.

Hmm, from the graph, it doesn’t seem that there has been much improvement in speed.

We can also see the three-year span of the short-lived Indian GP. The Italian GP recorded the fastest lap speeds, while the Singapore GP recorded the slowest.
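An average fastest-lap speed per Grand Prix and per season, the quantity compared across tracks here, could be computed roughly as follows. This is a sketch under assumptions: it presumes Ergast-style tables, namely a results table with `raceId` and `fastestLapSpeed` columns and a races table with `raceId`, `year` and `name`. Check these names against the actual download before using it.

```python
import pandas as pd


def mean_fastest_lap_speed(results: pd.DataFrame, races: pd.DataFrame) -> pd.DataFrame:
    """Average fastest-lap speed (km/h) per Grand Prix and season.

    Assumed columns: results has 'raceId' and 'fastestLapSpeed';
    races has 'raceId', 'year' and 'name'.
    """
    df = results.merge(races[["raceId", "year", "name"]], on="raceId", how="inner")
    # Non-numeric placeholders for missing data (e.g. '\N' in CSV exports) become NaN.
    df["fastestLapSpeed"] = pd.to_numeric(df["fastestLapSpeed"], errors="coerce")
    return (
        df.dropna(subset=["fastestLapSpeed"])
        .groupby(["name", "year"], as_index=False)["fastestLapSpeed"]
        .mean()
    )
```

The resulting long table (one row per race and season) can be pivoted with `pivot(index="year", columns="name", values="fastestLapSpeed")` for a line plot with one line per track.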

Let’s look at the top 10 fastest laps along with their drivers and constructors.

This information is slightly outdated: Kimi Räikkönen now holds the record for the fastest lap, at an average speed of 263.587 km/h (this graph doesn’t contain data from the 2018 and 2019 championships).
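Picking out the ten fastest laps is a job for `DataFrame.nlargest`. The sketch assumes a table that has already been joined to driver and constructor names, with a numeric `fastestLapSpeed` column. The column names `driver`, `constructor`, `name` and `year` are hypothetical.

```python
import pandas as pd


def top_fastest_laps(laps: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n laps with the highest speed, fastest first.

    Assumes a numeric 'fastestLapSpeed' column plus hypothetical
    'driver', 'constructor', 'name' (Grand Prix) and 'year' columns.
    """
    cols = ["driver", "constructor", "name", "year", "fastestLapSpeed"]
    return laps.nlargest(n, "fastestLapSpeed")[cols].reset_index(drop=True)
```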

Now, instead of championships, let’s look at the number of races won by each driver and constructor.
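Counting race wins means keeping only the first-place finishes and counting them per driver or constructor. A minimal sketch, assuming a results table where a `positionOrder` column holds the finishing position (1 = winner) and the column named by `by` holds the driver or constructor name; both column names are assumptions.

```python
import pandas as pd


def most_race_wins(results: pd.DataFrame, by: str, top: int = 30) -> pd.Series:
    """Number of race wins per driver or constructor, highest first.

    Assumes 'positionOrder' is the finishing position (1 = race winner)
    and `by` names the column identifying the driver or constructor.
    """
    winners = results[results["positionOrder"] == 1]
    return winners[by].value_counts().head(top)   # value_counts sorts descending
```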

Top 30 constructors by most races won (i.e. most first-place finishes up to 2017)

Well, we can see the list is led by Ferrari and McLaren, with Mercedes chasing them well; this would become more evident if the 2018 and 2019 data were included. The graph below is for drivers:

Schumacher and Hamilton!

Is there British superiority in F1?

In this plot, color encodes nationality (hover over the graph to see it). F1 may not be completely dominated by British constructors, but a good portion of it has been. This becomes more evident in the following graph:

Constructor nationality and number of race wins

This plot makes it clearer that British constructors have been more dominant than anyone else.
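Aggregating by nationality is a `groupby` followed by a sum. The sketch assumes a table with one row per constructor and hypothetical `nationality` and `wins` columns (for example, race-win counts joined to the constructors table); it is not the original plotting code.

```python
import pandas as pd


def wins_by_nationality(constructor_wins: pd.DataFrame) -> pd.Series:
    """Total race wins per constructor nationality, largest first.

    Assumes one row per constructor with hypothetical 'nationality' and 'wins' columns.
    """
    return (
        constructor_wins.groupby("nationality")["wins"]
        .sum()
        .sort_values(ascending=False)
    )
```

Dividing the result by its own sum, `s / s.sum()`, turns the counts into each nationality's share of all race wins.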

Next, I am going to analyze two density plots:

Density plot of wins for constructors. What does this plot mean?

This plot shows the distribution of race wins from 1950 to 2017. We can see that race winners come from a very narrow set of constructors. The distribution widens as we go from ‘Top 1’ to ‘Top 3’ to ‘Top 10’ finishes. Now let’s look at the same density plot for drivers:

The same explanation applies as for the plot above, but for drivers.
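The widening from ‘Top 1’ to ‘Top 10’ can also be put into numbers by counting how many distinct constructors (or drivers) have at least one finish within each cutoff. This is an alternative summary, not the density plot itself. It assumes a results table with a `positionOrder` column holding the finishing position; the helper name is illustrative.

```python
import pandas as pd


def distinct_finishers(results: pd.DataFrame, entity_col: str, cutoffs=(1, 3, 10)) -> dict:
    """For each cutoff k, count distinct entities with at least one finish in the top k.

    Assumes 'positionOrder' is the finishing position and `entity_col`
    identifies the constructor or driver.
    """
    return {
        k: int(results.loc[results["positionOrder"] <= k, entity_col].nunique())
        for k in cutoffs
    }
```

A count that grows quickly from k = 1 to k = 10 matches the narrow-then-wider shape described for both plots.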

So, that has been my analysis of the F1 championship. I would like to add one more plot, showing the most wins by driver, color-coded by nationality:

I stumbled across this dataset, and it was way more interesting and much larger than my previous movie dataset. After a day and a half of continuous tinkering, this is the result I got. I hope you like it.

In the next post, I will probably be back with another interesting dataset.

Till then 🙂

What I see, when I see

This post is just the outcome of my “research” into what I have watched over the past 2.5 years. I was interested in its statistics, and I also wanted to explore which movies and TV shows I should watch next. So here it is, my analysis. I just wrote a stupid post about it because Adam Savage has said:

The difference between screwing around and science is writing it down.

I took it quite literally 😉

Why is it that when one man builds a wall, the next man immediately needs to know what’s on the other side?

~ Tyrion Lannister in George R.R. Martin’s A Game of Thrones

The data comes from IMDb (sadly, I only started using IMDb to track my watchlist about 2.5 years ago, so the data is limited to that time period).

Using Python, Pandas, Seaborn and Matplotlib, I analyzed my watching history. The code can be found in this GitHub repo.

Movies and shows by percentage of the list

The total number of rated titles was 228. Of these, 181 were movies, accounting for 80.4%, while the remaining titles were TV series, accounting for nearly 19.6%.

I am going to analyze my movie list first, as it makes up the major portion of my ratings list. A single movie may belong to multiple genres: “(500) Days of Summer” belongs to ‘Comedy, Drama, Romance’, and one of my favourite neo-noir films, American Psycho, belongs to ‘Comedy, Drama, Crime’. So I first unstacked the genres using Pandas and then plotted them with the versatile Matplotlib library. I also included a colorbar, which I drew with Edward Tufte’s rules for data visualization in mind.

If you are a fan of sci-fi or an ardent drama fan, recommend me some.
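One way to unstack comma-separated genres in pandas is to split the string into a list and `explode` it into one row per (title, genre) pair; the author’s code may do this differently. The sketch assumes a `Genres` column as in IMDb’s ratings CSV export, and uses the two movies mentioned above as sample rows.

```python
import pandas as pd


def genre_counts(ratings: pd.DataFrame) -> pd.Series:
    """Count titles per genre when each row lists several comma-separated genres."""
    return (
        ratings["Genres"]
        .str.split(",")   # "Comedy, Drama, Romance" -> ["Comedy", " Drama", " Romance"]
        .explode()        # one row per (title, genre) pair
        .str.strip()      # drop the space left after each comma
        .value_counts()
    )


movies = pd.DataFrame({
    "Title": ["(500) Days of Summer", "American Psycho"],
    "Genres": ["Comedy, Drama, Romance", "Comedy, Drama, Crime"],
})
print(genre_counts(movies))
```

A movie with three genres therefore counts once in each of them, which is why genre totals can add up to more than the number of movies.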

I also mapped the movies I have seen by release year, which was interesting too:

2017 accounted for the most movies, thanks to the release of ‘It’, ‘Wonder Woman’, ‘Justice League’, ‘Qarib Qarib Single’, ‘Logan’ and ‘Dunkirk’.

I would say 1999 was a great year, as ‘Fight Club’, ‘American Beauty’ and ‘The Matrix’ were released, but 1994 was even better, giving us ‘The Shawshank Redemption’, ‘Forrest Gump’ and ‘Pulp Fiction’. I would say Tarantino is spooky, and I have to watch more of his films.

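Well, it seems I have “invested” ;=) a lot of time watching movies, so let’s look at the distribution of movie runtimes. The mean was 128.34 minutes with a standard deviation of 28 minutes. The distribution looks roughly normal (although not completely). If it were exactly normal, the empirical 68-95-99.7 rule would apply: about 68% of runtimes would fall within one standard deviation of the mean (roughly 100 to 156 minutes), 95% within two and 99.7% within three. Note that a large sample of 181 titles does not by itself guarantee this: the central limit theorem concerns the distribution of sample means, not of individual runtimes, so the large sample mainly makes the estimated mean and standard deviation more reliable.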

Hmm, almost a normal distribution. This histogram shows the duration of movies; the whopping 300-minute outlier is none other than “Gangs of Wasseypur”. Most movies run between 110 and 150 minutes, as expected.
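Whether the runtimes really follow the 68-95-99.7 rule can be checked directly rather than assumed: compare the observed share of runtimes within one, two and three standard deviations with the normal-distribution values, which follow from the error function as P(|Z| ≤ k) = erf(k/√2). A minimal sketch; the helper names are illustrative.

```python
import math

import numpy as np


def normal_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def empirical_coverage(runtimes, ks=(1, 2, 3)) -> dict:
    """Observed share of runtimes within k sample standard deviations of the sample mean."""
    x = np.asarray(runtimes, dtype=float)
    mean, std = x.mean(), x.std(ddof=1)
    return {k: float(np.mean(np.abs(x - mean) <= k * std)) for k in ks}


for k in (1, 2, 3):
    print(f"within {k} sd, normal distribution: {normal_coverage(k):.3f}")
```

With an IMDb ratings export, the input would be something like `movies["Runtime (mins)"].dropna()` (column name assumed); observed values far from the normal ones, especially at k = 3 with an outlier like a 300-minute film, show where the distribution departs from normal.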

Next, I want to see how my ratings compare with the average IMDb rating: do I like movies and series the same way mainstream audiences do, or do my tastes differ?

I added some jitter to the scatter plot because my ratings are whole numbers, so many points would otherwise sit on top of each other. As you can see, my ratings and IMDb’s ratings agree only weakly: the Pearson correlation coefficient came out to be 0.225, a weak positive correlation.

You can see I rated most of the movies 8 or higher; I would attribute this to the research I do before watching any movie or series 😉.
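Jitter only moves the plotted points; the correlation should be computed on the original ratings. The sketch below keeps the two steps separate. The noise amount of 0.2 and the helper names are illustrative choices, not taken from the original code.

```python
import numpy as np


def jitter(values, amount=0.2, seed=0):
    """Add uniform noise in [-amount, amount) so overlapping ratings separate visually."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-amount, amount, size=values.shape)


def pearson(x, y) -> float:
    """Pearson correlation coefficient, computed on the original (un-jittered) values."""
    return float(np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1])
```

With an IMDb ratings export, the inputs would be the `Your Rating` and `IMDb Rating` columns (names assumed from IMDb’s CSV export): plot `plt.scatter(jitter(mine), jitter(imdb, seed=1), alpha=0.5)`, but report `pearson(mine, imdb)`.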

I then looked at the directors whose work I have seen the most.

Thanks to the Indiana Jones films, Spielberg was the director I have seen most. Kubrick’s ‘The Shining’, ‘A Clockwork Orange’, ‘2001: A Space Odyssey’ and ‘Dr. Strangelove’, as well as Nolan’s ‘The Dark Knight’ trilogy, were all awesome. I have to see more of Quentin Tarantino, though. ‘The Before Trilogy’ and ‘Dazed and Confused’, directed by Richard Linklater, were great. This graph is leading me to another quarantine binge-watching trip.

Some of the most-rated movies (by number of IMDb user votes) in my library.

One thing I noticed is that all these films were great, and all of them have a very high number of ratings. I then analyzed their IMDb ratings. My assumption was that if so many people watched these films, they should have good IMDb ratings too. I wanted to do the same for some of the top TV series on my list.

Well, Breaking Bad is awesome, and so is Mr. Robot.

So, for both TV shows and movies, I fitted a linear regression model to look at the correlation between ‘Num of votes’ and ‘IMDb Rating’. The Pearson correlation was about 0.48, a moderate positive correlation. You can see it below:

The regression line shows this moderate positive relationship.
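A straight-line fit of rating against votes can be sketched with `numpy.polyfit`. One assumption here that may differ from the original analysis: vote counts span several orders of magnitude, so the sketch fits against log10(votes) by default, which will generally give a different r than a fit on raw counts. Pass `log_votes=False` for raw counts. The helper name is illustrative.

```python
import numpy as np


def fit_votes_vs_rating(num_votes, imdb_rating, log_votes=True):
    """Least-squares line rating = slope * x + intercept, plus Pearson r.

    x is log10(number of votes) when log_votes is True, otherwise the raw vote count.
    Returns (slope, intercept, r).
    """
    x = np.asarray(num_votes, dtype=float)
    if log_votes:
        x = np.log10(x)
    y = np.asarray(imdb_rating, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 polynomial = straight line
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r)
```

Seaborn’s `regplot` draws the same kind of least-squares line with a confidence band, which is a quick way to visualize the fit.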

To recap, I downloaded my watching history from IMDb, analyzed and visualized the data and did some statistical analysis, all while listening to Bowie and The Doors over late-night coffee. I hope you will at least like my movie selections, if not this post; otherwise I will have to book you an appointment with Dr. Hannibal Lecter. And as Albert Einstein said:

Science is a wonderful thing if one does not have to earn one’s living at it.

Albert Einstein

In the next post, I will probably be back with another interesting dataset.

Till then :>
