As of this writing, the 2019 NCAA® women's Division I Final Four® tournament is about to begin. There's a high degree of interest here in Oregon, because the Oregon Ducks are competing against Baylor, Notre Dame and UConn for the championship. So I've wrangled some data and put together an archetypal analysis.
Archetypal analysis [@Eugster2012] is a statistical technique that can be used to analyze athletes' performance by describing each player as a mixture of a few extreme, "archetypal" players. It operates as follows:
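In outline, the standard formulation of archetypal analysis can be written as a constrained least-squares problem. The notation below is assumed for illustration and is not taken from the dfstools code: $X$ is the $n \times m$ matrix of players by statistics, $Z$ is the $k \times m$ matrix of archetypes, $\alpha$ is $n \times k$ and $\beta$ is $k \times n$.

```latex
\begin{aligned}
\min_{\alpha,\,\beta}\quad & \mathrm{RSS} = \lVert X - \alpha Z \rVert_F^2,
  \qquad Z = \beta X, \\
\text{subject to}\quad & \alpha_{ij} \ge 0, \quad \sum_{j=1}^{k} \alpha_{ij} = 1
  \quad (i = 1, \dots, n), \\
& \beta_{ji} \ge 0, \quad \sum_{i=1}^{n} \beta_{ji} = 1
  \quad (j = 1, \dots, k).
\end{aligned}
```

Each row of $\alpha$ gives one player's weights on the $k$ archetypes, and each archetype is itself a weighted mixture of real players, which is why the archetypes tend to sit at the extremes of the data.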
We create the input data as follows:
library(dplyr) # get the pipe operator

# the minutes played are given in the form "800:45"
.parse_minutes <- function(text_minutes) {
  # split into exactly two pieces: minutes and seconds
  items <- stringr::str_split_fixed(text_minutes, ":", 2)
  return(as.numeric(items[, 1]) + as.numeric(items[, 2]) / 60.0)
}

# the height is given in the form "6-1"
.parse_height <- function(text_height) {
  # split into exactly two pieces: feet and inches
  items <- stringr::str_split_fixed(text_height, "-", 2)
  return(as.numeric(items[, 1]) + as.numeric(items[, 2]) / 12.0)
}

# raw data - just filter out some NAs
raw <- readr::read_delim(
  "~/Downloads/division_1_womens.tsv",
  "\t",
  escape_double = FALSE,
  trim_ws = TRUE
) %>%
  dplyr::filter(
    !is.na(minutes_played),
    !is.na(games_played),
    !is.na(class_year),
    !is.na(position)
  )
# clean - some fields have NA where there really should be a zero
cleaned <- raw %>%
  tidyr::replace_na(list(
    field_goals_made = 0,
    three_point_field_goals = 0,
    free_throws = 0,
    total_rebounds = 0,
    assists = 0,
    turnovers = 0,
    steals = 0,
    blocks = 0
  )) %>%
  dplyr::mutate(
    two_point_field_goals = field_goals_made - three_point_field_goals,
    player_height_ft = .parse_height(height),
    total_minutes = .parse_minutes(minutes_played)
  ) %>%
  dplyr::select(
    player_name,
    team_name,
    class_year,
    position,
    player_height = height,
    player_height_ft,
    games_played,
    total_rebounds,
    total_minutes,
    two_point_field_goals,
    three_point_field_goals,
    free_throws,
    assists,
    turnovers,
    steals,
    blocks
  )

# there are duplicate player names in this data set, so we add the team name
# in parentheses
cleaned$player_name <- paste0(cleaned$player_name, " (", cleaned$team_name, ")")
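As a quick sanity check of the two text formats handled above, here is a minimal sketch in base R. It uses `strsplit` instead of stringr so that it runs without extra packages. The helper name `parse_mixed` and the sample values are made up for illustration.

```r
# Convert "whole<sep>part" strings to a decimal number, where `base` is the
# number of parts per whole (60 seconds per minute, 12 inches per foot).
parse_mixed <- function(text, sep, base) {
  parts <- strsplit(text, sep, fixed = TRUE)
  whole <- as.numeric(vapply(parts, `[`, character(1), 1))
  part <- as.numeric(vapply(parts, `[`, character(1), 2))
  whole + part / base
}

# Hypothetical sample inputs in the same formats as the raw data.
minutes <- parse_mixed(c("800:45", "12:30"), ":", 60)
heights <- parse_mixed(c("6-1", "5-10"), "-", 12)

print(minutes) # decimal minutes
print(heights) # decimal feet
```

Here "800:45" becomes 800.75 minutes and "6-1" becomes 6 + 1/12 feet, which is the same arithmetic the two helpers perform.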
Normally we would search for the number of archetypes to use, typically three to seven for basketball. However, for simplicity we will use the default, three. This has some advantages in interpretation and visualization:
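One visualization that three archetypes make possible is a ternary (triangle) plot: because each player's three weights are non-negative and sum to one, every player maps to a point inside a triangle whose corners are the archetypes. The sketch below is an illustration in base R, not something dfstools is known to provide. The function name `ternary_xy` and the example weights are hypothetical.

```r
# Map archetype weights (one row per player, three columns summing to 1)
# to 2D coordinates inside an equilateral triangle.
ternary_xy <- function(alphas) {
  alphas <- as.matrix(alphas)
  # Allow a small tolerance, since rounded weights may not sum to exactly 1.
  stopifnot(ncol(alphas) == 3, all(abs(rowSums(alphas) - 1) < 1e-3))
  # Corners: first archetype at (0, 0), second at (1, 0),
  # third at (0.5, sqrt(3) / 2).
  cbind(
    x = alphas[, 2] + alphas[, 3] / 2,
    y = alphas[, 3] * sqrt(3) / 2
  )
}

# Hypothetical weights: three pure archetypes and one even mixture.
example_alphas <- rbind(
  pure_first = c(1, 0, 0),
  pure_second = c(0, 1, 0),
  pure_third = c(0, 0, 1),
  even_mix = c(1, 1, 1) / 3
)
xy <- ternary_xy(example_alphas)
print(xy)
```

Passing the result to `plot(xy, asp = 1)` would draw the points. Pure archetypes land on the corners and the even mixture lands at the centre of the triangle.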
We use the dfstools package [@Borasky2019] to do the calculations.
player_totals <- cleaned %>%
  dplyr::select(player_name, total_rebounds:blocks)
player_labels <- cleaned %>%
  dplyr::select(player_name:position, player_height_ft)
archetype_models <- dfstools::compute_archetypes(player_totals, player_labels)
player_alphas <- archetype_models[["player_alphas"]] %>%
  dplyr::arrange(Bench)
player_alphas[, 6:8] <- round(player_alphas[, 6:8], digits = 4)
DT::datatable(player_alphas)
Notes:
I've broken out the teams in the Final Four for exploration below.
Baylor <- player_alphas %>%
  dplyr::filter(team_name == "Baylor")
DT::datatable(Baylor)

NotreDame <- player_alphas %>%
  dplyr::filter(team_name == "Notre Dame")
DT::datatable(NotreDame)

UConn <- player_alphas %>%
  dplyr::filter(team_name == "UConn")
DT::datatable(UConn)

Oregon <- player_alphas %>%
  dplyr::filter(team_name == "Oregon")
DT::datatable(Oregon)
To wrap up, let's look at the totals of archetypal ratings for the teams.
column_sums <- dplyr::bind_rows(
  colSums(Baylor[, 6:7]),
  colSums(NotreDame[, 6:7]),
  colSums(UConn[, 6:7]),
  colSums(Oregon[, 6:7])
)
column_sums <- dplyr::bind_cols(
  tibble::enframe(
    c("Baylor", "Notre Dame", "UConn", "Oregon"),
    name = NULL,
    value = "Team"
  ),
  column_sums
)
column_sums$Total <- (column_sums[, 2] + column_sums[, 3]) %>%
  tibble::deframe()
column_sums <- column_sums %>%
  arrange(desc(Total))
DT::datatable(column_sums)
What this says is that Baylor has the equivalent of 1.866 Ciera Dillards and 3.286 Teaira McCowans, etc. The totals give the overall strength of the teams. It appears that Notre Dame is strongest overall, with Baylor being best at the rim and Oregon being best in three-point shooting.
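The team totals are simply column sums of the non-Bench archetype weights, so they can also be computed in one grouped step rather than by filtering each team separately. Here is a minimal sketch in base R. The data frame, the team labels and the column names `archetype_1` and `archetype_2` are made up for illustration and are not the real tournament values.

```r
# Hypothetical alpha table: one row per player, weights summing to 1.
alphas <- data.frame(
  team_name = c("Team A", "Team A", "Team B", "Team B", "Team B"),
  archetype_1 = c(0.70, 0.10, 0.25, 0.05, 0.40),
  archetype_2 = c(0.20, 0.60, 0.50, 0.15, 0.10),
  Bench = c(0.10, 0.30, 0.25, 0.80, 0.50)
)

# Sum the two non-Bench archetype weights within each team.
team_totals <- aggregate(
  cbind(archetype_1, archetype_2) ~ team_name,
  data = alphas,
  FUN = sum
)

# Overall strength is the sum of the two archetype totals; sort descending.
team_totals$Total <- team_totals$archetype_1 + team_totals$archetype_2
team_totals <- team_totals[order(-team_totals$Total), ]
print(team_totals)
```

In this toy data, a team total of 0.8 for `archetype_1` reads as "the equivalent of 0.8 of that archetypal player", the same interpretation as in the table above.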
Of course, coaching and strategy can even things up, and three-point shooting tends to add more value than rim protection in modern basketball. This promises to be an exciting tournament. And #GoDucks!