knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of naicsmatch is to provide functions and crosswalks from and between NAICS sector codes
The North American Industry Classification system is the standard used by Federal statistical agencies in classifying businesses for the purposes of data collection, analysis, and publishing statistical data. see here
The NAICS code classification system is regularly updated, and so translating between different iterations can be necessary to do comparisons between different data sources. This package provides some functions to automate some of these translations using concordances downloaded from the U.S. Census and other agencies and data sources.
References:
You can install the development version from GitHub with:
# install.packages("devtools") devtools::install_github("jameelalsalam/naicsmatch")
This package provides concordances between different vintages of NAICS codes, and functions to facilitate converting data from one NAICS categorization to another.
library(tidyverse) library(naicsmatch) sum(ex_asm09$vos09) ex_asm09 ## basic example code
A sample of data from the 2009 Annual Survey of Manufacturers is included as ex_asm09
. This data was collected and published using the 2007 NAICS classification codes.
The included data set naics_2007_2012
is a concordance between the 2007 and 2012 classifications:
library(tidyverse)
naics_2007_2012
asm09_2012 <- ex_asm09 %>% left_join(naics_2007_2012, by = "naics_2007") sum(asm09_2012$vos09) asm09_2012
naics_2007_2012_wgt <- naics_2007_2012 %>% group_by(naics_2007) %>% summarize(wgt = 1 / n()) filter(naics_2007_2012_wgt, wgt != 1)
asm09_2012_v2 <- asm09_2012 %>% left_join(naics_2007_2012_wgt, by = c("naics_2007")) %>% mutate(vos09 = vos09 * wgt) sum(asm09_2012_v2$vos09) asm09_2012_v2
asm09_2012_v2 %>% naics_sankey() + labs( title = "Allocating Split Categories by 1 / N" )
asm09_2012_v3 <- asm09_2012_v2 %>% group_by(naics_2012) %>% summarize(vos09 = sum(vos09, na.rm = TRUE)) sum(asm09_2012_v3$vos09) asm09_2012_v3
A common application is to calculate weights from one dataset and apply them to another dataset, when putting them both into common NAICS classifications.
#ex_asm09 #ex_asm15 naics_2007_2012_datawgt <- naics_2007_2012 %>% left_join(ex_asm15, by = "naics_2012") %>% # TODO: weighting only works for positive values? filter(!is.na(vos15), vos15 > 0) %>% group_by(naics_2007, naics_2012) %>% summarize(vos15 = sum(vos15, na.rm = TRUE)) %>% group_by(naics_2007) %>% mutate(wgt = vos15 / sum(vos15, na.rm = TRUE)) naics_2007_2012_datawgt
asm09_2012_v4 <- asm09_2012 %>% left_join(naics_2007_2012_datawgt, by = c("naics_2007", "naics_2012")) %>% mutate(vos09 = vos09 * wgt) sum(asm09_2012_v4$vos09) asm09_2012_v4
asm09_2012_v4 %>% mutate(x = "naics_2012") %>% ggplot(aes(x = x, y = vos15, fill = naics_2012)) + geom_col() asm09_2012_v4 %>% naics_sankey() + labs( title = "Allocating Split Categories by Second Dataset" )
sum(asm09_2012_v4$vos09)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.