knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ipl)
library(dplyr)
library(ggplot2)
library(purrr)

The ipl package aims at supplying data and functions for analysis of data on the Indian Premier League (IPL). It includes datasets on individual balls played in IPL games between 2008 and 2020, on the games themselves, and on the top 100 IPL batsmen and the top 100 IPL bowlers across time. Furthermore, the package includes simple functions on analysis of individual batsmen and bowlers, and for combined analysis of IPL matches.

Motivation

Cricket is a globally played sport, which extensively uses statistics for the analysis of games and individual players. Moreover, the IPL is a game that brings players from across the globe together, thus providing a unique perspective on how different cricketers play together. However, data providing information on these games and players, or functions which allow for statistical analyses of the same are not widely accessible. Right now, there is no consolidated place for all this data on the IPL. Thus, we wanted to create a package that makes this analysis of cricketers easier, since users do not have to worry about data wrangling, missing data, or important cricket statistics, and can, instead, begin their analyses on IPL.

Who should use this package?

Anyone interested in cricket, and specifically the Indian Premier League, who wants to use actual data and functions to think critically about the game, conduct analyses, and make inferences should use this package.

Datasets Included

Functions Included

for example:

# The batting average of Virat Kohli and MS Dhoni in the years 2016 and 2017 can be computed by
bat_avg(c("V Kohli", "MS Dhoni"), 2016:2017)

# The maximum runs made by MS Dhoni in the year 2017 can be computed by
bat_max("MS Dhoni", 2017)

# The strike rate of AB de Villiers in the year 2015 can be computed by
strike_rate("AB de Villiers", 2015)

# The total distribution of  the toss choice of Royal Challengers Bangalore can be computed using
toss_choice("Royal Challengers Bangalore")

# The list of bowlers with a bowling average above 40 can be retrieved using
bowler_score(40)

# The batting summary of Virate Kohli can be accessed by
batsman_summary("V Kohli")

# The runs made by each batting partnership by Delhi Daredevils in their match against Rajasthan Royals in 2008 can be found by
partnership_runs(335984, "Delhi Daredevils")

# The bowling analysis of Harbhajan Singh can be accessed by
bowling_analysis("Harbhajan Singh")

# The summary table of the wins and losses of Kolkata Knight Riders in 2012 can be found by
winloss("Kolkata Knight Riders", 2012)

More details can be found on the documentation pages which can be called via: ?function_name

What does the data look like

The first six rows of the deliveries data set looks like

head(deliveries)

The first six rows of the teams data set looks like

head(teams)

The first six rows of the bowlers_100 data set looks like

head(bowlers_100)

What can we do with this data?

We can use this package to address the (non-exhaustive) list of questions:

  1. Compare the batting averages of batsmen over time using a data visualization
  2. Compare strike rate and maximum runs of batsmen using a data visualization
  3. Run a regression to predict strike rate based on different batting statistics
  4. The bowling analysis of different bowlers can be compared

Example 1

bat_avg(c("KL Rahul", "RG Sharma", "V Kohli"), 2013:2020) %>% 
  ggplot(aes(year, batting_avg, color = batsman)) +
  geom_line() +
  labs(
    title = "Batting average between 2013 and 2020",
    x = "Year",
    y = "Batting Average",
    color = "Batsman"
  ) +
  theme_linedraw()

Example 2

my_df <- map_df(c("MS Dhoni", "V Kohli", "AB de Villiers"), batsman_summary)
ggplot(aes(max_runs, strike_rate), data = my_df) +
  geom_point() +
  labs(
    x = "Maximum Runs",
    y = "Strike Rate",
    title = "Strike Rate and Maximum Runs"
  ) +
  theme_linedraw()

Example 3

my_df2 <- my_df[, -1]
my_lm <- lm(strike_rate ~ ., data = my_df2)

summary(my_lm)

Note that this example is not accurate since it contains a selection bias as only three players are used in the data frame. To return a regression with accurate results, data on all the batsmen should be used.

Example 4

map_df(c("Rahul Sharma", "Harbhajan Singh"), bowling_analysis)


Swaha294/ipl documentation built on May 10, 2022, 3:23 p.m.