knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

naicsmatch

The goal of naicsmatch is to provide functions and crosswalks from and between NAICS sector codes

The North American Industry Classification system is the standard used by Federal statistical agencies in classifying businesses for the purposes of data collection, analysis, and publishing statistical data. see here

The NAICS code classification system is regularly updated, and so translating between different iterations can be necessary to do comparisons between different data sources. This package provides some functions to automate some of these translations using concordances downloaded from the U.S. Census and other agencies and data sources.

References:

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("jameelalsalam/naicsmatch")

Example

This package provides concordances between different vintages of NAICS codes, and functions to facilitate converting data from one NAICS categorization to another.

library(tidyverse)
library(naicsmatch)

sum(ex_asm09$vos09)

ex_asm09
## basic example code

A sample of data from the 2009 Annual Survey of Manufacturers is included as ex_asm09. This data was collected and published using the 2007 NAICS classification codes.

The included data set naics_2007_2012 is a concordance between the 2007 and 2012 classifications:

library(tidyverse)

naics_2007_2012
asm09_2012 <- ex_asm09 %>% left_join(naics_2007_2012, by = "naics_2007")

sum(asm09_2012$vos09)

asm09_2012

Weighting by 1 / N

naics_2007_2012_wgt <- naics_2007_2012 %>%
  group_by(naics_2007) %>%
  summarize(wgt = 1 / n())

filter(naics_2007_2012_wgt, wgt != 1)
asm09_2012_v2 <- asm09_2012 %>%
  left_join(naics_2007_2012_wgt, 
            by = c("naics_2007")) %>%
  mutate(vos09 = vos09 * wgt)

sum(asm09_2012_v2$vos09)

asm09_2012_v2
asm09_2012_v2 %>%
  naics_sankey() +
  labs(
    title = "Allocating Split Categories by 1 / N"
  )

Aggregation

asm09_2012_v3 <- asm09_2012_v2 %>%
  group_by(naics_2012) %>%
  summarize(vos09 = sum(vos09, na.rm = TRUE))

sum(asm09_2012_v3$vos09)

asm09_2012_v3

Weighting by Data

A common application is to calculate weights from one dataset and apply them to another dataset, when putting them both into common NAICS classifications.

#ex_asm09
#ex_asm15

naics_2007_2012_datawgt <- naics_2007_2012 %>%
  left_join(ex_asm15, by = "naics_2012") %>%

  # TODO: weighting only works for positive values?
  filter(!is.na(vos15), vos15 > 0) %>%

  group_by(naics_2007, naics_2012) %>%
  summarize(vos15 = sum(vos15, na.rm = TRUE)) %>%
  group_by(naics_2007) %>%

  mutate(wgt = vos15 / sum(vos15, na.rm = TRUE))

naics_2007_2012_datawgt
asm09_2012_v4 <- asm09_2012 %>%
  left_join(naics_2007_2012_datawgt, 
            by = c("naics_2007", "naics_2012")) %>%
  mutate(vos09 = vos09 * wgt)

sum(asm09_2012_v4$vos09)

asm09_2012_v4
asm09_2012_v4 %>%
  mutate(x = "naics_2012") %>%
  ggplot(aes(x = x, y = vos15, fill = naics_2012)) +
  geom_col()

asm09_2012_v4 %>%
  naics_sankey() +
  labs(
    title = "Allocating Split Categories by Second Dataset"
  )

Application and Check

sum(asm09_2012_v4$vos09)


jameelalsalam/naicsmatch documentation built on April 4, 2020, 11:33 a.m.