
License History & Summary Data

Once license data is stored in a standardized format, several functions can be applied in sequence to build the necessary summary data. The Southwick package dashtemplate (installable from the Data Server: E:/SA/Projects/R-Software/Southwick-packages/build_binary/dashtemplate_0.1.zip) includes the relevant functions. The workflow is documented in a GitHub repo (dashboard-template) so that it can be shared with state agency staff who choose to produce their own summary data. The dashtemplate package can't be installed from the GitHub repo because the package files (DESCRIPTION, etc.) aren't tracked there; they are stored only on the Data Server.

Summary Data States

Some states will produce their own summary data rather than sending license data. The sadashreg package includes functions for validating and organizing the summary data that we receive from these states.

Workflow Documentation

Information about the corresponding data structures and workflow is included in the salic package documentation.

Summary Data Request Documentation

State agencies that are considering building their own summary data are sent an 8-page Word document that details the required data. The relevant sections from that document are copied below.

Overview & Scope

Dashboards require summarized data that focus on customer-level dynamics (as opposed to license summaries, which typically look at license-level dynamics such as total sales, percent changes, etc.). The customer-level focus relies on a customer ID, which makes it possible to track trends in metrics such as churn and recruitment.

Data Pull Frequency: Twice per Year

Summary datasets should be sent to Southwick Associates twice per year for the duration of the project, corresponding to the full-year and mid-year dashboards.

Data Filter: Only customers aged 18-64 each year

To facilitate a consistent comparison across states, only customers aged 18-64 each year should be included in the summary results. For example, suppose a man aged 64 in 2015 buys a fishing license in both 2015 and 2016. He should be included in the summary statistics for 2015, but not in those for 2016, when he is 65. Youths and seniors are excluded because states vary in whether they issue licenses for these age groups. The age filter should be applied as a final step, after recruits and churn have been identified, so that recruitment/churn is still captured to the degree possible.
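The reason for filtering last can be shown with a minimal Python sketch. It is illustrative only and does not use the dashtemplate or salic functions. The record layout (customer ID, license year, age that year) and the helper names `flag_retained` and `apply_age_filter` are assumptions made for this example.

```python
# Hypothetical license history: (customer_id, license_year, age_that_year).
# The layout is illustrative, not the standardized format described above.
history = [
    ("A", 2015, 64), ("A", 2016, 65),  # the 64-year-old from the example
    ("B", 2015, 30), ("B", 2016, 31),
]

def flag_retained(rows):
    """Map (customer_id, year) -> True if the customer also held a license the next year."""
    held = {(cid, yr) for cid, yr, _ in rows}
    return {(cid, yr): (cid, yr + 1) in held for cid, yr, _ in rows}

def apply_age_filter(rows, min_age=18, max_age=64):
    """Keep only rows where the customer is within the age range in that year."""
    return [row for row in rows if min_age <= row[2] <= max_age]

# Filter last: retention is flagged on the full history, then rows are filtered.
retained = flag_retained(history)
in_scope = apply_age_filter(history)

# Filter first (not recommended): customer A's 2016 purchase is dropped
# before retention is flagged, so A wrongly appears to have lapsed.
retained_filter_first = flag_retained(apply_age_filter(history))
```

With the filter applied last, customer A is counted in 2015 and is correctly flagged as retained. With the filter applied first, the same customer looks like a lapsed angler.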

Metrics to Include

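Four summary statistics (metrics) are to be calculated by state agencies: participants, residents, recruits, and churn rate. A fifth metric, participation rate, is calculated by Southwick Associates (see Data Definitions).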

Dimensions by which to Summarize

Summary statistics are to be presented across several dimensions:

Requested Format

A single table of summary data (stored in a .csv or similar tabular data format) is needed for input to each iteration of the Data Dashboard. Summary tables should include all requested years of angler/hunter summary statistics stored in 7 columns:

Example: Total fishing participation by year

Data Definitions

Metrics

Five metrics of interest are used to characterize participation dynamics. Definitions with examples are included below.

  1. Participants – The total number of unique sportspersons (anglers or hunters) who purchased (or carried over) a license/permit each year.

  2. Residents – The number of participants who are also state residents. This will be used by Southwick Associates for calculating participation rate.

  3. Recruits (New Participants) – The number of new customers who purchased a license, defined as participants who bought a license in a given year but didn’t buy a license granting that permission in any of the previous 5 years. For example, a new fishing participant in 2015 may have bought a fishing license in 2009 (or earlier) but did not buy a fishing license at any time from 2010 to 2014.

  4. Churn Rate – This metric measures annual turnover in fishing or hunting, and only applies to the full-year time period. Of the customers who hold a license in a given year, churn is the percentage who fail to buy a license in the next year. For example, suppose 100,000 people held a fishing license in your state in 2017, and only 60,000 of them held a fishing license in the following year (2018). The churn rate for 2018 therefore equals 40,000 / 100,000 (40%).

  5. Participation Rate – The ratio of unique sportspersons (state residents only) to the state population for a given customer segment. Southwick Associates will calculate participation rate using state-supplied resident participant counts in conjunction with an in-house database of relevant US Census data (Southwick updates this database annually to reflect the most recent estimates).
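The recruit and churn definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the package implementation. The input `held_by_year` (a dict mapping each year to the set of customer IDs holding a license that year) and the function names are hypothetical.

```python
def count_recruits(held_by_year, year, lookback=5):
    """Count participants in `year` with no license in any of the previous `lookback` years."""
    prior = set()
    for y in range(year - lookback, year):
        prior |= held_by_year.get(y, set())
    return len(held_by_year.get(year, set()) - prior)

def churn_rate(held_by_year, year):
    """Share of the previous year's participants who did not hold a license in `year`."""
    previous = held_by_year.get(year - 1, set())
    if not previous:
        return None  # undefined when there is no prior-year base
    lapsed = previous - held_by_year.get(year, set())
    return len(lapsed) / len(previous)

# Hypothetical customer IDs by license year.
held = {
    2009: {"A"},
    2014: {"B"},
    2015: {"A", "B", "C"},
    2016: {"A", "D"},
}

recruits_2015 = count_recruits(held, 2015)  # A (last held in 2009) and C count; B does not
churn_2016 = churn_rate(held, 2016)         # B and C lapsed out of A, B, C

# The churn example from the definition: 100,000 holders in 2017, 60,000 of them in 2018.
example = {2017: set(range(100_000)), 2018: set(range(60_000))}
churn_2018 = churn_rate(example, 2018)
```

Here `churn_rate` is labeled by the later year, matching the definition in which the 2018 rate is based on 2017 license holders.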

Demographic Segments

Metrics are to be further summarized across three demographic dimensions, each with specified categories:

Treatment of Multi-year & Lifetime Licenses

Treatment of Missing Data

It is expected that certain customer characteristics (e.g., gender, age, residency) will be unknown for a small percentage of records. When estimating results by demographic segment, the demographic percentages should be scaled so that the segment counts sum to the total. For example, suppose 100,000 people buy fishing licenses, of whom 69% are male, 30% are female, and 1% have unknown gender. The estimated number of males should equal 100,000 * 69 / (69+30), and the estimated number of females should equal 100,000 * 30 / (69+30). As long as the percentage of missing values is small, this is likely a reasonable estimate.
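The scaling arithmetic can be written as a short Python sketch. The function name `scale_to_total` is hypothetical and is used only for illustration.

```python
def scale_to_total(total, known_shares):
    """Distribute `total` across the known categories in proportion to their shares."""
    known = sum(known_shares.values())
    return {cat: total * share / known for cat, share in known_shares.items()}

# The example above: 69% male, 30% female, 1% unknown (the unknown share is left out).
estimates = scale_to_total(100_000, {"male": 69, "female": 30})
# estimates["male"] is about 69,697 and estimates["female"] is about 30,303
```

The scaled estimates sum to the full 100,000, so segment counts remain consistent with the overall total.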



southwick-associates/salicprep documentation built on Oct. 6, 2020, 12:03 p.m.