This example will focus on looking at SEC basketball players for 2017-2018. Data is accquired using the player_stats
function found in collegeballR
library(dplyr) library(ggplot2) library(ggrepel) library(data.table) library(collegeballR) library(rvest) sec_urls <- "http://stats.ncaa.org/team/inst_team_list?academic_year=2018&conf_id=911&division=1&sport_code=MBB" team_names <- read_html(sec_urls) %>% html_nodes(".css-panes a") %>% html_text() team_names <- trimws(team_names) teams <- team_mapping(2018,"MBB") sec <- teams %>% filter(team_name %in% team_names) %>% mutate( team_id = as.numeric(team_id) ) head(sec) sec_player_stats <- sec %>% mutate( team_id = as.numeric(team_id), player_df = purrr::map(team_id,player_stats,year=2018,sport="MBB") )
I'll look to incorporate a conference type function, just have to scale it out to all sports easily.
So there is an example above of using the player_stats
function within purrr to get data for all SEC teams in basketball.
Let's bring all the data into one dataframe for all teams.
sec_player_l <- sec_player_stats %>% select(player_df) %>% pull() sec_player_df <- rbindlist(sec_player_l,fill=T) colnames(sec_player_df)[c(19,23)] <- c("PPG","RPG") sec_player_df <- sec_player_df %>% inner_join(.,sec) %>% select(-G,-team_id)
Note to self, look at all sport player sites and see if Average is repeated more than once. It seems for basketball they are using Avg right after point and rebound data. It's sort of odd, I'll try and find a way to fix this.
sec_player_df <- sec_player_df %>% mutate( MP = as.numeric(gsub("\\:.*","",MP)), ) sec_player_df[,c("FGM","FGA","DRebs","Tot Reb")] <- sapply(sec_player_df[,c("FGM","FGA","DRebs","Tot Reb")],as.numeric) head(sec_player_df)
But for now the data is cleaned and team names are added back. Depending on the sport you select the data will look like this, with differing stats recorded. From this you can move on to creating advanced metrics, etc.
Let's calculate usage rate. First we need to calculate some team based stats, before calculating USG rate.
sec_p_df <- sec_player_df %>% group_by(team_name) %>% mutate( team_minutes = sum(MP,na.rm=T), team_FGA = sum(FGA,na.rm=T), team_FTA = sum(FTA,na.rm=T), team_TO = sum(TO,na.rm=T) ) %>% ungroup() %>% as.data.frame() sec_p_df[is.na(sec_p_df)] <- 0 sec_p_df <- sec_p_df %>% mutate( usg_rate = 100 * ((FGA + 0.475 * FTA + TO) * (team_minutes/5)) / (MP * (team_FGA + 0.475 * team_FTA + team_TO)) )
ind_df <- sec_p_df %>% filter(MP>100) labels <- ind_df %>% filter(usg_rate>25 & MP > 750) ggplot() + geom_point(data = ind_df, aes(y = MP, x = usg_rate, color = team_name)) + theme_bw(base_size=16) + geom_text_repel(data = labels, aes(x = usg_rate, y = MP, label = Player)) + facet_wrap(~team_name) + labs(x = "Usage Rate", y = "Minutes", caption = "@msubbaiah1", title = "SEC Basketball Usage Rate by team", subtitle = "2017 - 2018 season") + theme(legend.position = "none")
There is the basics of getting player data from collegeballR!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.