This vignette is an introduction to the data and capabilities in nwslR.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(nwslR)
library(tidyverse)
library(teamcolors)

About nwslR

nwslR is an R package that contains datasets and analysis functionality for the National Women's Soccer League (NWSL). Founded in 2013, the NWSL is the United States' top professional women's soccer league, featuring players from all over the world.

Until now, data about the league have been scattered across sources and often difficult to find. The goal of this package is to make it easier for fans and analysts to access and work with these data in one place.

About the Datasets and Functionality

The package's contents fall into three categories: ID tables, statistical tables, and functionality. Much of the data is stored directly in nwslR, and additional data can be scraped and analyzed with nwslR's built-in functions. Because the ID tables share keys (such as person_id and team_id) with the statistical tables, joining them lets you combine datasets and scraped results for analysis. The categories are as follows:

ID tables

Statistical Tables

Functionality

To make full use of nwslR, use the ID and statistical tables together. Here are two sample analyses:

Analysis 1: Built-in Datasets

We want to compare the goal-scoring of the past five Rookie of the Year award recipients: Danielle Colaprico (CRS), Raquel Rodríguez (NJ), Ashley Hatch (NCC), Imani Dorsey (NJ), and Bethany Balcer (SEA).

First, filter the award table to Rookie of the Year winners from 2015 onward, then join the result with the player table:

rookie_winners <- award %>%
  filter(
    award == "Rookie of the Year",
    season >= 2015
  )

rookie_winners <- left_join(rookie_winners, player, by = "person_id")

rookie_winners

The person_id values show that all of these athletes are field players: every ID is below 10000, the range used for field players.
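To confirm this programmatically rather than by eye, you can test every ID against the threshold. The sketch below uses a small hypothetical table in place of rookie_winners; the 10000 cutoff comes from the text above, while the player labels and IDs are invented for illustration.

```r
# Hypothetical stand-in for rookie_winners: labels and IDs are invented
rookie_winners_demo <- data.frame(
  player = c("Player A", "Player B", "Player C"),
  person_id = c(512L, 2048L, 7310L)
)

# Per the vignette, field-player person_id values are below 10000
field_player_cutoff <- 10000

# TRUE for each row whose ID falls in the field-player range
rookie_winners_demo$is_field_player <-
  rookie_winners_demo$person_id < field_player_cutoff

# Stop with an error if any winner falls outside the field-player range
stopifnot(all(rookie_winners_demo$is_field_player))
```

With the real data, the equivalent one-line check would be `stopifnot(all(rookie_winners$person_id < 10000))`.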

Next, join each winner to her statistics for the award season, matching on person_id and season.

rookie_stats <- rookie_winners %>%
  left_join(fieldplayer_overall_season_stats,
    by = c("person_id", "season")
  )

rookie_stats <- rookie_stats %>%
  select(player, season, team_id, gls)

rookie_stats

Next, join team_id to the franchise dataset to get each team's full name. The teamcolors package, which supplies the colors for the plot, identifies teams by full name rather than by ID, and it uses each team's most recent name and colors. Because Reign FC has since been renamed OL Reign, we recode "Reign FC" to "OL Reign" so that it matches.

rookie_team <- left_join(
  rookie_stats, franchise, 
  by = c("team_id", "season")
) %>%
  mutate(team_name = if_else(team_name == "Reign FC", "OL Reign", team_name))
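The recode only fixes names you already know about, so before plotting it can help to list any team names that still have no match in the color lookup. In this sketch, `known_names` is a hypothetical stand-in for the team names that teamcolors recognizes, and "Example FC" is an invented name used to show an unmatched entry.

```r
# Hypothetical stand-in for the team names that teamcolors recognizes
known_names <- c("OL Reign", "North Carolina Courage", "Portland Thorns FC")

# Illustrative team names as they might look after the franchise join
team_name <- c("Reign FC", "North Carolina Courage", "Example FC")

# Same recode as above: the old Reign name becomes "OL Reign"
team_name <- ifelse(team_name == "Reign FC", "OL Reign", team_name)

# Names that still have no entry in the color lookup
unmatched <- setdiff(unique(team_name), known_names)
unmatched
#> [1] "Example FC"
```

An empty result means every team in the plot should be found by the color scale; anything listed needs its own recode.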

Finally, plot each winner's goal total.

ggplot(rookie_team, aes(x = reorder(player, season), 
                        y = gls, fill = team_name)) +
  geom_bar(stat = "identity") +
  scale_fill_teams(2) +
  geom_text(aes(label = season), 
            position = position_dodge(width = 0.9), 
            vjust = -0.25) +
  labs(
    x = "Player",
    y = "Goals Scored",
    title = "Number of Goals Scored by Rookie of the Year",
    fill = "Team"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The plot shows that, of these Rookie of the Year winners, Ashley Hatch scored the most goals in her rookie season.

Analysis 2: Utilizing the Scraper

We want to know how many passes the Portland Thorns and North Carolina Courage made in each of their games during the 2017 season, the year the two teams met in the championship final.

First, select the relevant games from the game table, then pull each game's advanced team statistics with the get_adv_team_stats function.

# All 2017 games involving Portland or North Carolina, excluding the 2017-09-03 Chicago vs. North Carolina game, which has no available statistics
games_2017 <- game %>%
  filter(season == 2017,
         home_team %in% c("POR", "NC") | away_team %in% c("POR", "NC"),
         game_id != "chicago-red-stars-vs-north-carolina-courage-2017-09-03") 

stats_2017 <- map_df(games_2017$game_id, get_adv_team_stats)

stats_2017 <- stats_2017 %>%
  filter(team_id %in% c("POR", "NC")) %>%
  select(game_id, status, team_id, total_pass)

stats_2017_join <- left_join(stats_2017, game, by = "game_id") 
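Rather than excluding games without statistics by hand, as with the 2017-09-03 game, you could wrap the scraper so that a failed request is skipped. This is a sketch under assumptions: it supposes that get_adv_team_stats() signals an error when a game has no statistics (not confirmed here), and it uses a hypothetical `fake_scraper()` with invented values in place of the real function so that it runs offline.

```r
# Hypothetical stand-in for get_adv_team_stats(): errors for one game ID
fake_scraper <- function(game_id) {
  if (game_id == "game-without-stats") {
    stop("no statistics available for ", game_id)
  }
  data.frame(game_id = game_id, team_id = c("POR", "NC"),
             total_pass = c(400L, 350L))
}

# Return NULL instead of stopping when the scraper fails for a game
safely_scrape <- function(game_id, scraper) {
  tryCatch(
    scraper(game_id),
    error = function(e) {
      message("Skipping ", game_id, ": ", conditionMessage(e))
      NULL
    }
  )
}

game_ids <- c("game-1", "game-without-stats", "game-2")
results <- lapply(game_ids, safely_scrape, scraper = fake_scraper)

# rbind() drops NULL entries, so skipped games simply fall out
scraped_demo <- do.call(rbind, results)
```

In the tidyverse style used in this vignette, purrr's possibly() wraps a function the same way, e.g. `map_df(games_2017$game_id, possibly(get_adv_team_stats, otherwise = NULL))`. Listing exclusions explicitly, as the vignette does, has the advantage of documenting exactly which games are missing.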

To make the team names compatible with teamcolors, join the result to the franchise dataset.

stats_name <- left_join(stats_2017_join, franchise, 
                        by = c("team_id", "season"))
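A left join adds rows whenever the lookup table has more than one row per key, so it can be worth checking that franchise has a single row for each team_id and season, and that the join leaves the row count unchanged. The sketch below uses small hypothetical tables and base R's merge() in place of left_join(); the column names follow the vignette, but the values are invented.

```r
# Hypothetical stand-ins for stats_2017_join and franchise
passes_demo <- data.frame(
  game_id = c("g1", "g1", "g2"),
  team_id = c("POR", "NC", "POR"),
  season = 2017L,
  total_pass = c(400L, 350L, 420L)
)
franchise_demo <- data.frame(
  team_id = c("POR", "NC"),
  season = 2017L,
  team_name = c("Team P", "Team N")
)

# The lookup table should have exactly one row per (team_id, season) key
stopifnot(anyDuplicated(franchise_demo[c("team_id", "season")]) == 0)

# all.x = TRUE keeps every row of passes_demo, like dplyr::left_join()
joined <- merge(passes_demo, franchise_demo,
                by = c("team_id", "season"), all.x = TRUE)

# With unique keys, a left join cannot add rows
stopifnot(nrow(joined) == nrow(passes_demo))
```

If you use dplyr 1.1.0 or later, left_join() also accepts `relationship = "many-to-one"`, which raises an error when a row in the left table matches more than one row in the lookup.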

Now, plot each team's pass total by game date.

ggplot(stats_name, aes(x = game_date, y = total_pass, 
                       group = team_id, color = team_name)) +
  geom_line() +
  scale_color_teams(1) +
  scale_x_date(date_breaks = "2 weeks") +
  labs(
    x = "Date of Game",
    y = "Total Passes",
    title = "Total Number of Passes in Each Game",
    color = "Team"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The graph shows that Portland generally made more passes per game than North Carolina in 2017.



adror1/nwslR documentation built on Oct. 4, 2022, 3:06 a.m.