In this vignette we provide examples of how to use entrydatar to download specific files from the QCEW. In the second part we give rudimentary documentation of the features that are of principal interest when working with the data. For an overview of the QCEW, head over to the BLS website. The vignette is organized as follows: first examples of downloading the data, then a description of the different cuts of the QCEW.
To get ready to download data, we load some libraries that the package might have forgotten to call (the package works with all tables in the data.table format):
library(dplyr); library(data.table); library(entrydatar)
For example, if we are interested in downloading aggregate-level data we use cut 10. Please see the documentation below for a definition of what is available and the mapping of the different cuts of the QCEW.
dt_agg <- get_qcew_cut(
  data_cut   = 10,
  year_start = 1990,
  year_end   = 1993,
  path_data  = "~/Downloads/",
  write      = F)
as_tibble(dt_agg)
# A tibble: 16 x 42
# (output truncated: 16 quarterly national observations, 1990 Q1 through 1993 Q4, with identifiers such as area_fips, own_code, industry_code, agglvl_code, size_code, year and qtr, followed by employment, wage, location-quotient and over-the-year columns)
Note that the dataset can be large. Downloading most of the 3- and 4-digit industry cuts across counties and MSAs ends up with 57 million rows; saving it in the .rds format takes around 2 GB.
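If you prefer to keep such a pull around rather than re-downloading it, here is a minimal sketch with base R's saveRDS; the file name is only an illustration, and presumably the write argument of get_qcew_cut offers something similar.

# write the pull to disk so it does not have to be repeated
saveRDS(dt_agg, file = "~/Downloads/qcew_agg_1990_1993.rds")  # illustrative file name
# read it back in a later session
dt_agg <- readRDS("~/Downloads/qcew_agg_1990_1993.rds")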
On the other hand, for something more precise, say national size cuts at the 6-digit industry level, we would call cut 28:
dt_naics <- get_qcew_cut(
  data_cut   = 28,
  industry   = "naics",
  year_start = 1990,
  year_end   = 2015,
  path_data  = "~/Downloads/",
  write      = F)
as_tibble(dt_naics)
# A tibble: 208,667 x 47
# (output truncated: national size-class records by 6-digit NAICS industry, with the same identifier, employment, wage, location-quotient and over-the-year columns as above)
The BLS provides quarterly as well as annual averages, the latter being lighter. There is now an option to download the annual averages directly through the frequency argument. The default is quarterly, so everything should be backward compatible. Let's see first with the quarterly data (warning: it is slow):
dt_naics <- get_qcew_cut(
  data_cut   = 16,
  industry   = "naics",
  frequency  = "quarter",
  year_start = 2000,
  year_end   = 2005,
  path_data  = "~/Downloads/",
  write      = F)
as_tibble(dt_naics)
# A tibble: 18,028 x 42
# (output truncated: quarterly national records at the 4-digit NAICS level, 2000-2005)
Now let's download the data at the annual frequency only (similar to the size cuts from earlier):
dt_naics_year <- get_qcew_cut(
  data_cut   = 16,
  industry   = "naics",
  frequency  = "year",
  year_start = 2000,
  year_end   = 2005,
  path_data  = "~/Downloads/",
  write      = F)
as_tibble(dt_naics_year)
# A tibble: 4,507 x 38
# (output truncated: annual averages at the 4-digit NAICS level, 2000-2005, with annual employment, wage, location-quotient and over-the-year columns)
The data is still raw from the BLS download. We can clean it in two ways: a monthly dataset with employment by categories, and a quarterly dataset with only the number of firms. We also allow an option for aggregating both types. Note that this function works only for the quarterly version of the data.
dt_naics_tidy <- tidy_qcew(dt_naics, frequency = "month")
as_tibble(dt_naics_tidy)
# A tibble: 54,084 x 9
    year quarter industry_code area_fips own_code size_code agglvl_code month emplvl
   <dbl>   <dbl>         <dbl> <chr>        <dbl>     <dbl>       <dbl> <dbl>  <dbl>
 1  2000    1.00          1111 US000         5.00         0        16.0  1.00  21243
 2  2000    1.00          1111 US000         5.00         0        16.0  2.00  21759
 3  2000    1.00          1111 US000         5.00         0        16.0  3.00  23600
 4  2000    2.00          1111 US000         5.00         0        16.0  4.00  26142
 5  2000    2.00          1111 US000         5.00         0        16.0  5.00  27287
 6  2000    2.00          1111 US000         5.00         0        16.0  6.00  29355
 7  2000    3.00          1111 US000         5.00         0        16.0  7.00  39098
 8  2000    3.00          1111 US000         5.00         0        16.0  8.00  33418
 9  2000    3.00          1111 US000         5.00         0        16.0  9.00  30534
10  2000    4.00          1111 US000         5.00         0        16.0  10.0  30475
# ... with 54,074 more rows
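The call above returns the monthly employment panel. For the quarterly dataset with only the number of firms, a minimal sketch is below; the "quarter" value for the frequency argument is an assumption on my part, so check the tidy_qcew documentation for the exact value.

# assumed usage: quarterly firm counts instead of monthly employment
# ("quarter" is a guessed value for the frequency argument; see ?tidy_qcew)
dt_naics_firms <- tidy_qcew(dt_naics, frequency = "quarter")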
I included an option to specify a snapshot of the data taken through the Internet Archive Wayback Machine; thanks to Gabriel Chodorow-Reich for suggesting the idea [1]. Specify a suffix found from the Wayback Machine on the BLS website and add it as an argument to the download function:
# Enter the address directly from the html address
wayback_suffix <- "https://web.archive.org/web/20141101135821" # suffix for November 1st, 2014
# Or enter the exact date at which it has been crawled:
wayback_suffix <- 20141101135821
# execute adding the wayback option
dt_naics <- get_qcew_cut(
  data_cut    = 28,
  industry    = "naics",
  year_start  = 1990,
  year_end    = 2015,  # try not to ask for data from the future
  url_wayback = wayback_suffix,
  path_data   = "~/Downloads/",
  write       = F)
dt_naics
The table of cuts for NAICS-based industries is as follows. See the reference table and official BLS documentation for more details.
#--------------------------------------------------------------------------------------------------------------------------
#|                           | Geographic/Size  | National | National | CSA | MSA | Statewide | Statewide | County | MicroSA |
#|                           | LevelCode Format |          | by size  |     |     |           | by size   |        |         |
#|---------------------------|------------------|----------|----------|-----|-----|-----------|-----------|--------|---------|
#| Industry Ownership Level  |                  | 1x       | 2x       | 3x  | 4x  | 5x        | 6x        | 7x     | 8x      |
#|                           |                  |          |          |     |     |           |           |        |         |
#| Total, All Ownerships     | x0               | 10       | -        | 30  | 40  | 50        | -         | 70     | 80      |
#| Total, by Ownership       | x1               | 11       | 21       | -   | 41  | 51        | 61        | 71     | -       |
#| Domain, by Ownership      | x2               | 12       | 22       | -   | 42  | 52        | 62        | 72     | -       |
#| SuperSector, by Ownership | x3               | 13       | 23       | -   | 43  | 53        | 63        | 73     | -       |
#| Sector, by Ownership      | x4               | 14       | 24       | -   | 44  | 54        | 64        | 74     | -       |
#| 3-digit, by Ownership     | x5               | 15       | 25       | -   | 45  | 55        | -         | 75     | -       |
#| 4-digit, by Ownership     | x6               | 16       | 26       | -   | 46  | 56        | -         | 76     | -       |
#| 5-digit, by Ownership     | x7               | 17       | 27       | -   | 47  | 57        | -         | 77     | -       |
#| 6-digit, by Ownership     | x8               | 18       | 28       | -   | 48  | 58        | -         | 78     | -       |
#--------------------------------------------------------------------------------------------------------------------------
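To read the table: pick the industry detail in the rows and the geography in the columns; the cell gives the data_cut to request. As an illustration, here is a sketch (with placeholder years and path) for sector-level data by ownership in every county, which the table maps to cut 74.

# sector by ownership at the county level: cut 74 in the table above
dt_county_sector <- get_qcew_cut(
  data_cut   = 74,
  industry   = "naics",
  year_start = 2010,   # placeholder years
  year_end   = 2012,
  path_data  = "~/Downloads/",
  write      = F)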
Be careful, as the SIC aggregation levels are different from those for NAICS. See the reference table and official BLS documentation for more details.
The division is as follows:

- 01 to 11: National level; all sectors down to 4 digits; aggregated and by size classes
- 12 to 17: MSA level; all sectors down to 4 digits; aggregated (no split by size classes available)
- 18 to 25: State level; all sectors down to 4 digits; aggregated and by size classes (only some size splits are available)
- 26 to 31: County level; all sectors down to 4 digits; aggregated (no split by size classes available)

The data file layout gives a general view of what is available:
Some cuts are limited to an annual frequency (first quarter only) while some have a quarterly frequency (with monthly reports for employment):

- 21, 22, 23, 24, 25, 26, 27, 28
- 61, 62, 63, 64

For SIC: same as for NAICS, except that collection stops in 2000 and the size cuts start only in 1997.

- 7, 8, 9, 10, 11
- 24, 25
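To pull one of the SIC-based cuts, the natural counterpart to industry = "naics" would be industry = "sic"; that value is an assumption on my part, so check the package documentation. A sketch for a national-level SIC cut:

# assumed: SIC-based pull; data_cut = 1 is one of the national-level SIC codes (01 to 11)
dt_sic <- get_qcew_cut(
  data_cut   = 1,
  industry   = "sic",   # assumption: the SIC counterpart to "naics"
  year_start = 1990,
  year_end   = 2000,    # SIC collection stops in 2000
  path_data  = "~/Downloads/",
  write      = F)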
The BLS table can be misleading at times, and not all of the files are created equal. I am working on harmonizing all of the data pulls, but there seem to be a lot of exceptions. Here are warnings I found important:
- 01, 02, 03, 04, 05, 06, 13, 14, 15, 18, 19, 20, 21, 22, 23, 26, 27, 28, 29
- 01, 02, 03, 04, 05, 06, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 26, 27, 28, 29, 30, 31
- 16, 17 (MSA 3- and 4-digit SIC) and 30, 31 (County 3- and 4-digit SIC)

The table of contents to download the datasets directly:
The data file layout gives a general view of what is available:
Industry titles are standard in that case.
Ownership codes go from 0 to 5:

- 0 for Total Covered
- 5 for the Private sector
- 4 for International Government
- 3 for Local Government
- 2 for State Government
- 1 for Federal Government
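As an example of using these codes, one can restrict a download to the private sector by filtering on the own_code column; a minimal sketch using the dt_naics table from above:

# keep only private-sector records (own_code 5 in the list above)
dt_private <- dt_naics[own_code == 5]
# or equivalently with dplyr
dt_private <- dplyr::filter(dt_naics, own_code == 5)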
Area codes are structured somewhat like industry codes: 5 characters, with US000 aggregated over the total US. Then, for example, XXYYY can be split in two parts:

- XX represents the state as in Census codes (alphabetical order): 01 is Alabama and 02 is Alaska.
- YYY represents the county within the given state.

Matching YYY to counties:

- if YYY is 000, then it represents data aggregated at the state level, e.g. 01000;
- if YYY is 996, then it represents "Overseas Locations";
- if YYY is 997, then it represents "Multicounty, Not Statewide";
- if YYY is 998, then it represents "Out-of-State";
- if YYY is 999, then it represents "Unknown Or Undefined".

If the area code starts with C, then it represents subdivisions at the MSA level.
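As an illustration of this convention, here is a sketch that splits area_fips into its state and county parts and keeps state-level aggregates; it uses dt_agg from above only as a stand-in, so substitute any county-level pull (for example one of the 7x cuts).

library(data.table)
# split the 5-character area code into its state (XX) and county (YYY) parts
dt_area <- copy(dt_agg)   # stand-in: substitute a county-level pull here
dt_area[, state_code  := substr(area_fips, 1, 2)]
dt_area[, county_code := substr(area_fips, 3, 5)]
# state-level aggregates are the rows where the county part is "000"
dt_states <- dt_area[county_code == "000" & area_fips != "US000"]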
BLS documentation is available at the following links:
[1]: The Stata version of this code is on Gabe's website here.
(c) Erik Loualiche