knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(regcensusAPI)

Introduction

RegDataAPI is a client that connects to the RegData regulatory restrictions data by the Mercatus Center at George Mason University. RegData uses machine learning algorithms to quantify the number of regulatory restrictions in a jurisdiction. Currently, RegData is available for three countries - Australia, Canada, and the United States. In addition, there are regulatory restrictions data for jurisdictions (provinces in Canada and states in Australia and US) within these countries. You can find out more about RegData from http://www.quantgov.org.

This API client connects to the api located at at the QuantGov website. More advanced users who want to interact with the API directly can use the link above to pull data from the RegData API.

Structure of the API

The API organizes data around topics, which are then divided into series. Within each series, are values. Values are available by three sub-groups: agency, industry, and occupation. Presently, there are no series with occupation subgroup. However, these are available for future use. Topics broadly define the data available. For example, RegData for regulatory restrictions is falls under the broad topic "Regulatory Restrictions." Within Regulatory Restrictions topic, there are a number of series available. These include Total Restrictions, Total Wordcount, Total "Shall," etc.

A fundamental concept in RegData is the "document." In RegData, a set of documents represents a body of regulations for which we have produced regulatory restriction counts. For example, to produce restrictions data the US Federal government, RegData uses the Code of Federal Regulations (CFR) as the source documents. Within the CFR, RegData identifies a unit of regulation as the title-part combination. The CFR is organized into 50 titles, and within each title are parts, which could have subparts, but not always. Under the parts are sections. Determining this unit of analyses is critical for the context of the data produced by RegData. Producing regulatory restriction data for US states follows the same strategy but uses the state-specific regulatory code.

There are three helper functions to quickly view the list of topics, series available. To see the list of topics, series, agencies, jurisdictions, and finally the list of series and the years available for each. The list functions begin with list. For example, to view the list of topics call list_topics(). When an topic id parameter is supplied, the function returns the details about a specific topic.

#list_topics()

Each topic comprises one or more series. The list_series() function returns the list of all series when no series id is provided and the Under each topic are series. You can get the list of all series by calling the helper function list_series().

#list_series(id = 91)

You can request the series under a particular topic by passing the topic id as paramter to the list_series() function.

#list_series(by='all')

The list_series() function takes two parameters: the series id and the subset of data - one of "all", "jurisdiction", "topic", "agency", or "industry". When any of the subset options must be used with the id, where the id represents an id within the subset. For example, list_series(id=1,by='agencies') will return the list of series for agency with ID 1.

There are other helper functions that give you a tour around RegData. To see the jurisdictions with data in RegData, call list_jurisdiction(). This function returns the complete list in a list format.

#list_jurisdictions(jurisdictionID = 38)

Finally, the list_seriesyear() function returns a list of all seriesa nd the years with data available. The output from this function can serve as a reference for the valid values that can be passed to parameters in the get_values() function. The number of records returned is the unique combination of series and jurisdictions that are available in RegData.The function takes the optional argument jurisdiction id.

#list_seriesyear(jurisdictionID = 38)

Metadata

The get functions return the details about RegData metadata. These metadata are not included in the get_values() functions that will be described later.

Jurisdictions

Use the get_jurisdiction() function to return a data frame with all the jurisdictions. When you supply the jurisdiction ID parameter, the function returns the details of just that jurisdiction. Use the output from the get_jurisdiction() function to merge with data from the get_values() function.

get_jurisdictions(38)

Agencies

The get_agencies() function returns a data frame of all agencies with data in RegData. If an ID is supplied, the data frame returns the details about a single agency specified by the id. The data frame includes characteristics of the agencies. Currently, agency data are only available for federal RegData.

#get_agencies()

Use the value of the agency_id field when pulling values with the get_values() function.

Industries

The get_industries() function returns a data frame of industries with data in the API. Presently the only classification system available is the North American Industry Classification System (NAICS). NAICS is used for both countries in North America and Australia, even the latter uses the Australia and New Zealand Standard Industrial Classification (ANZSIC) system. Presently, industry regulations for Australia are based on the NAICS. RegData expands to other countries, the industry codes will be country specific as well as contain mapping to the Standard Industry Codes (SIC) system.

#get_industries(38)

Values

The get_values() function is the primary function for obtaining RegData. The function takes the following parameters (required fields are in bold and optional parameters are italicized):

  • _list of jurisdiction ids, using the 'c' function, for example, c(1,2)
  • list of series ids
  • list of years__
  • list of agencies
  • list of industries
  • dateIsRange - specify if the list of years provided for the parameter years is a range. Default is TRUE
  • In the example below, we are interested in the total number of restrictions and total number of words (get_topics(1)) for the US (get_jurisdictions(38)) for all years between 2010 and 2018.

    vals <-get_values(jurisdiction = 38, series = c(1,2), date = c(2010,2018))

    Values by Subgroup

    You can obtain data for any of the three subgroups for each series - agencies, industries, and occupations (when they become available).

    Values by Agencies

    To obtain the restrictions for a specific agency (or agencies), the series id supplied must be in the list of available series by agency. To recap, the list of available series for an agency is available via the list_series(id,by='agency') function, and the list of agencies with data is available via get_agencies() function.

    #Identify all agencies
    #get_agencies()
    
    #Find the available series for this agency, suppose agency_ids 81 and 84
    #list_series(id=91, by='agencies')
    
    ##Call the get_values() for this agency and series 91 and 92
    #get_values(jurisdiction = 38, series = c(91,92), date = c(1990,2018), agency = c(81,84))
    

    Values by Agency and Industry

    Some agency series may also have data by industry. For example, under the Total Restrictions topic, RegData includes the industry-relevant restrictions, which estimates the number of restrictions that apply to a given industry. These are available in both the main series - Total Restrictions, and the sub-group Restrictions by Agency.

    To pull industry-relevant restrictions for an agency, call get_agencies() with the industry variable. The industry variable is of type string, and valid values include the industry codes specified in the classification system obtained by calling the get_industries(jurisdiction) function.

    In the example below, the series 92 (Restrictions by Agency and Industry), we can request data for the two industries 111 and 33 by the following code snippet.

    #get_values(jurisdiction = 38, series = c(92), date = c(1990,2000), industry = c('111','33'), agency = 66)
    

    Merging with Metadata

    Once you pull the values by get_values, you can use various R packages to include the metadata.

    Suppose we want to attach the agency names and other agency characteristics to the data from the last code snippet. First be sure to pull the list of agencies into a separate data frame. Then merge with the values data frame. The key for matching the data will be the agency_id column.

    Using the dplyr package, we can merge the agency data with the values data as in the code snippet below. This code function assumes you have loaded the {r, results='asis'}

    require(dplyr) require(pander) agencies <- get_agencies() agency_by_industry <- get_values(jurisdiction = 38, series = c(92), date = c(1990,2000), industry = c('111','33'), agency = c(66,111))

    agency_restrictions_ind <- agency_by_industry %>%

    dplyr::left_join(agencies, by=c('agency_agency_id'='agency_id'))

    pander::pandoc.table(agency_restrictions_ind)



    QuantGov/regcensus-api-client-R documentation built on Jan. 4, 2022, 12:02 a.m.