knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ipl) library(dplyr) library(ggplot2) library(purrr)
The ipl
package aims at supplying data and functions for analysis of data on the Indian Premier League (IPL). It includes datasets on individual balls played in IPL games between 2008 and 2020, on the games themselves, and on the top 100 IPL batsmen and the top 100 IPL bowlers across time. Furthermore, the package includes simple functions on analysis of individual batsmen and bowlers, and for combined analysis of IPL matches.
Cricket is a globally played sport, which extensively uses statistics for the analysis of games and individual players. Moreover, the IPL is a game that brings players from across the globe together, thus providing a unique perspective on how different cricketers play together. However, data providing information on these games and players, or functions which allow for statistical analyses of the same are not widely accessible. Right now, there is no consolidated place for all this data on the IPL. Thus, we wanted to create a package that makes this analysis of cricketers easier, since users do not have to worry about data wrangling, missing data, or important cricket statistics, and can, instead, begin their analyses on IPL.
Anyone interested in cricket, and specifically the Indian Premier League, who wants to use actual data and functions to think critically about the game, conduct analyses, and make inferences should use this package.
deliveries
: Ball-by-ball data of IPL matches played in 2008-2020 teams
: Winning team, overs bowled, runs made and wickets fallen for each match played by each IPL team in 2008-2020ipl
: More information on matches from 2008 to 2020batsman_100
: Information of top 100 batsmen of IPLbowlers_100
: Informayion of top 100 bowlers of IPLbat_avg
\~ 134,112 Bbat_max
\~ 113,136 Bbatsman_summary
\~ 130,616 Bbowler_score
\~ 81,216 Bbowler_summary
\~ 90,000 Bbowling_analysis
\~ 93,216 Bcents_halfcents
\~ 120,688 Bfours
\~ 87,072 Bovers_balls
\~ 90,600 Bovers
\~ 77,240 Bpartnership_runs
\~ 93,000 Bruns
\~ 76,984 Bsixes
\~ 87,072 strike_rate
\~ 118,520 Btoss_choice
\~ 87,016 Bwickets_taken
\~ 77,888 Bwinloss
\~ 120,768 Bfor example:
# The batting average of Virat Kohli and MS Dhoni in the years 2016 and 2017 can be computed by bat_avg(c("V Kohli", "MS Dhoni"), 2016:2017) # The maximum runs made by MS Dhoni in the year 2017 can be computed by bat_max("MS Dhoni", 2017) # The strike rate of AB de Villiers in the year 2015 can be computed by strike_rate("AB de Villiers", 2015) # The total distribution of the toss choice of Royal Challengers Bangalore can be computed using toss_choice("Royal Challengers Bangalore") # The list of bowlers with a bowling average above 40 can be retrieved using bowler_score(40) # The batting summary of Virate Kohli can be accessed by batsman_summary("V Kohli") # The runs made by each batting partnership by Delhi Daredevils in their match against Rajasthan Royals in 2008 can be found by partnership_runs(335984, "Delhi Daredevils") # The bowling analysis of Harbhajan Singh can be accessed by bowling_analysis("Harbhajan Singh") # The summary table of the wins and losses of Kolkata Knight Riders in 2012 can be found by winloss("Kolkata Knight Riders", 2012)
More details can be found on the documentation pages which can be called via: ?function_name
The first six rows of the deliveries
data set looks like
head(deliveries)
The first six rows of the teams
data set looks like
head(teams)
The first six rows of the bowlers_100
data set looks like
head(bowlers_100)
We can use this package to address the (non-exhaustive) list of questions:
bat_avg(c("KL Rahul", "RG Sharma", "V Kohli"), 2013:2020) %>% ggplot(aes(year, batting_avg, color = batsman)) + geom_line() + labs( title = "Batting average between 2013 and 2020", x = "Year", y = "Batting Average", color = "Batsman" ) + theme_linedraw()
my_df <- map_df(c("MS Dhoni", "V Kohli", "AB de Villiers"), batsman_summary)
ggplot(aes(max_runs, strike_rate), data = my_df) + geom_point() + labs( x = "Maximum Runs", y = "Strike Rate", title = "Strike Rate and Maximum Runs" ) + theme_linedraw()
my_df2 <- my_df[, -1]
my_lm <- lm(strike_rate ~ ., data = my_df2) summary(my_lm)
Note that this example is not accurate since it contains a selection bias as only three players are used in the data frame. To return a regression with accurate results, data on all the batsmen should be used.
map_df(c("Rahul Sharma", "Harbhajan Singh"), bowling_analysis)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.