Preparation of license data is similar to the requirements for the National/Regional Dashboard, with some additional complexity described below.
When data comes in for a new state, create template files using the R console:
# for initializing new template code:
lictemplate::new_project_individual("YY", "2019-q4")
## A new individual state dashboard has been initialized:
## E:/SA/Projects/Data-Dashboards/YY/2019-q4
# altenatively, for updating code from a previous period:
lictemplate::update_project("YY", "2020-q2", "2019-q4")
## An updated project has been initialized:
## E:/SA/Projects/Data-Dashboards/YY/2020-q2
Backup existing sqlite databases that will be overwritten:
lictemplate::data_backup("YY")
Open the new Rstudio project and setup the project package library:
renv::restore()
The workflow generally matches that of the national/regional dashboards, but with some additional requirements:
Individual state dashboards have been created since 2016, and the workflow has evolved over that time. For 2019-q4, I recommend rewriting dashboard code to reflect the current lictemplate workflow and following some conventions for organizing/documenting:
E:/SA/Projects/Data-Dashboads/[state]/README.txt
) to keep track of information useful when updating the workflow for new time periods../1-prep-license-data/latex_documentation/documentation.pdf
The individual dashboards use state-defined license years based on effective dates (unlike strict calendar-year sale dates as is done for the national/regional dashboards). These sometimes (but not always) follow calendar years, and some percentage of sales for a given license year will fall outside the relevant dates. As a consequence the "month" variable is defined based on an unbounded integer range: ..., -1 (previous Nov), 0 (previous Dec), 1 (Jan), 2 (Feb), ..., 12 (Dec), 13 (Jan), 14 (Feb), ...
Any particular state will have its own set of reasonable month ranges. For example:
Data should be structured in a set of sqlite databases (similar to national/regional but with some additions):
The individual dashboards include some additional data columns:
cust$county_fips
(produced using geocoding)cust$zip4dp
(temporary for checking for customer duplicates, produced using geocoding)lic$priv
for defining privilege permissions (deer hunting, trout fishing, etc.)lic$subtype
for permissions that represent a subset of customers holding a given privilege (e.g., spousal fishing, which tracks anglers who purchase a spouse license type).The sale$month
is modified to allow additional values: ..., -1, 0, 1, 2, ...
The national/regional workflow creates license history tables only as a temporary datasets in producing dashbord summaries. For individual state dashboards, these history tables are stored in a database, where each table corresponds to a given privilege (hunters, anglers, deer hunters, etc.).
A permission
table is written as part of building license histories for each permission. It is a convenience table that makes it easy to pull all license types (or sales) that contribute to a given permission. Example:
Customer county and delivery point (zip4dp) can be identified by geocoding customer addresses (details below). Note that this is a fairly time-consuming process; a much quicker (although potentially less accurate) approach is to use a zip-to-county crosswalk stored in E:/SA/Data-production/Data-Dashboards/_Shared/census.sqlite3
although this won't provide a zip4dp to aid in deduplication.
You can reference WI-2019-q4 for an example, where an R script is used to create the builk mailer import file: .code/1-prep-license-data/03-pull-geocode.R
Go to Address Quality >> Address Correction. Running through this process may take a while (e.g., overnight). A file larger than 1.5 million rows or so may need to be split into multiple files.
Run Export Wizard (Dashboard_county_append), saving three variables ("ZIP+4 + DPC", "County FIPS", "cust_id") in a text file.
Privilege/subtype permissions shouldn't need to be updated for existing states. You can access the logic that was used to identify these from the production database (license.sqlite3) in lic$priv
(for privileges) and lic$subtype
(for subtypes). License histories for permissions are stored in history.sqlite3.
# example: count WI deer hunters from 2015 to 2018
library(tidyverse)
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "E:/SA/Data-production/Data-Dashboards/WI/history.sqlite3")
deer <- tbl(con, "deer") %>% filter(year > 2015, year <= 2018) %>% collect()
dbDisconnect(con)
count(deer, year)
## A tibble: 3 x 2
# year n
# <int> <int>
#1 2016 640935
#2 2017 632154
#3 2018 620079
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.