R/usa_facts_data.R

Defines functions usa_facts_data

Documented in usa_facts_data

#' United States state and county level data from USAFacts
#'
#' A single dataset from USAFacts that has state and county
#' data included.
#'
#' @details
#'
#' From the USAFacts website:
#'
#' Methodology: This interactive feature aggregates data from the
#' Centers for Disease Control and Prevention (CDC), state- and
#' local-level public health agencies. County-level data is confirmed
#' by referencing state and local agencies directly.
#'
#' The data for all states was last updated on March 27, 2020, at 7:00
#' AM Pacific/10:00 AM Eastern Time. We've noted below when we last
#' checked data from the states.
#' 
#' The 21 cases confirmed on the Grand Princess cruise ship on March 5
#' and 6 are attributed to the state of California, but not to any
#' counties. The national numbers also include the 45 people with
#' coronavirus repatriated from the Diamond Princess.
#' 
#' USAFacts attempts to match each case with a county, but some cases
#' counted at the state level are not allocated to counties due to
#' lack of information.
#' 
#' Because of the frequency with which we are currently updating this
#' data, they may not reflect the exact numbers reported state and
#' local government organizations or the news media. Numbers may also
#' fluctuate as agencies update their own data. At present, we are
#' working on ensuring that we can provide this data with the most
#' up-to-date information possible.
#'
#' 
#' @importFrom dplyr bind_rows
#' @importFrom lubridate mdy
#' @importFrom readr read_csv cols
#'
#' @source \url{https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/}
#'
#' @note Uses https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/ as data
#' source, then modifies column names and munges to long form table.
#' 
#' @return
#'
#' a tidy data.table with columns:
#'
#'   - fips
#'   - county
#'   - state (two-letter abbreviation)
#'   - subset: `deaths` or `confirmed`
#'   - date: observation date
#'   - count: case count for that date on that local
#'
#' @section Licensing:
#'
#' From the folks at USAFacts
#' 
#' Want to use USAFacts county-level COVID-19 data? Download it
#' here. The data is available under a Creative Commons license. We
#' simply request that you cite USAFacts as the data provider and link
#' back to this page. Don’t forget to share what you've created with
#' the USAFacts data. Please tag @usafacts on social media and use the
#' hashtag #MadewithUSAFacts. We'll reshare the posts with the
#' data-loving community.
#' 
#' @examples
#' res = usa_facts_data()
#' colnames(res)
#' head(res)
#' dplyr::glimpse(res)
#' summary(res)
#' 
#' # dataset inclusion as of this build
#' max(res$date)
#' 
#'
#' @family data-import
#' @family case-tracking
#'  
#' @export
usa_facts_data = function() {
    confirmed_path = s2p_cached_url('https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv')
    confirmed = data.table::fread(confirmed_path)
    # This converts all the integer cols to integer,
    # even in the case of parsing errors. Uses data.table
    # idioms.
    for(col in colnames(confirmed)[5:ncol(confirmed)]) {
        set(confirmed, j=col, value=as.integer(confirmed[[col]]))
    }
    confirmed$subset = 'confirmed'
    deaths_path = s2p_cached_url('https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv')
    deaths = data.table::fread(deaths_path)
    # This converts all the integer cols to integer,
    # even in the case of parsing errors. Uses data.table
    # idioms.
    for(col in colnames(deaths)[5:ncol(deaths)]) {
        set(deaths, j=col, value=as.integer(deaths[[col]]))
    }
    deaths$subset = 'deaths'
    ret = dplyr::bind_rows(confirmed,deaths)
    colnames(ret)[2] = 'county'
    colnames(ret)[3] = 'state'
    colnames(ret)[1] = 'fips'
    ret = ret[-4]
    ret$fips = integer_to_fips(ret$fips)
    ret = tidyr::pivot_longer(ret,cols=-c("fips","state","county","subset"),names_to='date',values_to='count')
    ret$date = lubridate::mdy(ret$date,quiet = TRUE)
    ret
}
seandavi/sars2pack documentation built on May 13, 2022, 3:41 p.m.