We download data from the County Business Patterns (CBP). The data is available at an annual frequency for the period from 1986 to 2014.

The most detailed cut of the data reports employment and the number of businesses by size class, county (ZIP code since 1994) and industry (4 digits).

The data is retrieved directly from the Census website.

For example, to get the raw data:

library(entrydatar)
# Download the raw CBP data for 1990 and print it
dt_raw <- download_all_cbp(1990, 1990)
dt_raw

By default, the data is aggregated at the county level. I allow different levels of aggregation through the aggregation_level argument, whose values are detailed in a table below. Setting the argument explicitly looks like:

dt_county_level   <- download_all_cbp(1990, 1990,
                                      aggregation_level = "county")
dt_national_level <- download_all_cbp(1990, 1990,
                                      aggregation_level = "US")

Some more information about data layout and availability:


| Variable Name | Description                                                                    |
|---------------|--------------------------------------------------------------------------------|
| Variable      |                                                                                |
| emp           | Total Mid-March Employees                                                      |
| qp1           | First Quarter Payroll ($1,000)                                                 |
| ap            | Total Annual Payroll ($1,000)                                                  |
| est           | Total Number of Establishments                                                 |
|               |                                                                                |
| Size          |                                                                                |
| N1_4          | 1-4 Employee Size Class                                                        |
| N5_9          | 5-9 Employee Size Class                                                        |
| N10_19        | 10-19 Employee Size Class                                                      |
| N20_49        | 20-49 Employee Size Class                                                      |
| N50_99        | 50-99 Employee Size Class                                                      |
| N100_249      | 100-249 Employee Size Class                                                    |
| N250_499      | 250-499 Employee Size Class                                                    |
| N500_999      | 500-999 Employee Size Class                                                    |
| N1000         | 1000 or More Employee Size Class                                               |
| N1000_1       | 1000-1499 Employee Size Class                                                  |
| N1000_2       | 1500-2499 Employee Size Class                                                  |
| N1000_3       | 2500-4999 Employee Size Class                                                  |
| N1000_4       | 5000 or More Employee Size Class                                               |
|               |                                                                                |
| Flags         |                                                                                |
| empflag       | Size class replacing withheld data (emp and payroll are replaced by 0)         |
| fipstate      | FIPS State Code                                                                |
| fipscty       | FIPS County Code                                                               |
| censtate      | Census State Code                                                              |
| cencty        | Census County Code                                                             |
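Because qp1 and ap are reported in thousands of dollars, derived ratios need a rescaling step. The sketch below illustrates this on a toy data frame; the rows and the `area` label are made up for illustration and are not actual CBP figures, and the derived column names are hypothetical.

```r
# Toy rows for illustration only: the numbers are invented, not CBP data.
cbp_toy <- data.frame(
  area = c("A", "B"),      # hypothetical geography label
  emp  = c(1200, 450),     # total mid-March employees
  ap   = c(36000, 11250),  # total annual payroll, in $1,000
  est  = c(80, 30)         # total number of establishments
)

# Payroll is in thousands of dollars, so rescale before dividing by employment.
cbp_toy$pay_per_emp <- cbp_toy$ap * 1000 / cbp_toy$emp

# Average establishment size, in employees per establishment.
cbp_toy$emp_per_est <- cbp_toy$emp / cbp_toy$est

cbp_toy
```

The same two lines should apply to a downloaded table as long as it carries the emp, ap and est columns described above.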


The empflag variable is an informative flag about data suppression. It denotes the employment size class for data withheld to avoid disclosure (confidentiality) or withheld because the data do not meet publication standards. The size class is indicated by a letter from a to m. The lfo variable gives the legal form of organization of the firms considered. Definitions of the flags are in the following table:


| Data Flag                  | Description                            |
|----------------------------|----------------------------------------|
| Employment Flag            |                                        |
| EMPFLAG                    |                                        |
| a                          | 0-19                                   |
| b                          | 20-99                                  |
| c                          | 100-249                                |
| e                          | 250-499                                |
| f                          | 500-999                                |
| g                          | 1,000-2,499                            |
| h                          | 2,500-4,999                            |
| i                          | 5,000-9,999                            |
| j                          | 10,000-24,999                          |
| k                          | 25,000-49,999                          |
| l                          | 50,000-99,999                          |
| m                          | 100,000 or More                        |
|                            |                                        |
| Legal Form of Organization |                                        |
| LFO                        |                                        |
| -                          | All Establishments                     |
| C                          | Corporations                           |
| Z                          | S-Corporations                         |
| S                          | Sole Proprietorships                   |
| P                          | Partnerships                           |
| N                          | Non-Profits                            |
| G                          | Government                             |
| O                          | Other                                  |
|                            |                                        |
| Noise Flag                 |                                        |
| EMP_NF                     | Total Mid-March Employees Noise Flag   |
| QP1_NF                     | Total First Quarter Payroll Noise Flag |
| AP_NF                      | Total Annual Payroll Noise Flag        |
| G                          | 0 to < 2% noise (low noise)            |
| H                          | 2 to < 5% noise (medium noise)         |
| D                          | Withheld (see records layout)          |
| S                          | Withheld (see records layout)          |
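Since emp is replaced by 0 when a record is suppressed, the empflag letter is the only information left about employment for those rows. A minimal sketch of turning the letter into employment bounds follows. The lookup values are copied from the table above; the toy rows, the assumption that unflagged rows carry an empty string, and the names `empflag_ranges`, `emp_min` and `emp_max` are illustrative and not part of the package.

```r
# Employment range for each empflag letter, copied from the flag table.
empflag_ranges <- data.frame(
  empflag = c("a", "b", "c", "e", "f", "g", "h", "i", "j", "k", "l", "m"),
  emp_min = c(0, 20, 100, 250, 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000),
  emp_max = c(19, 99, 249, 499, 999, 2499, 4999, 9999, 24999, 49999, 99999, NA),
  stringsAsFactors = FALSE
)

# Toy rows (invented): emp is 0 wherever a flag is set.
# Assumption: unflagged rows have an empty string in empflag.
cbp_flag_toy <- data.frame(
  emp     = c(1200, 0, 0),
  empflag = c("", "b", "m"),
  stringsAsFactors = FALSE
)

# match() returns NA for rows without a recognised flag letter.
idx <- match(tolower(cbp_flag_toy$empflag), empflag_ranges$empflag)

# Unflagged rows keep their reported employment as both bounds;
# flagged rows take the bounds of their size class (open-ended for "m").
cbp_flag_toy$emp_min <- ifelse(is.na(idx), cbp_flag_toy$emp, empflag_ranges$emp_min[idx])
cbp_flag_toy$emp_max <- ifelse(is.na(idx), cbp_flag_toy$emp, empflag_ranges$emp_max[idx])

cbp_flag_toy
```

Keeping a lower and an upper bound, rather than imputing a single midpoint, leaves the choice of imputation rule to the analysis that uses the data.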


To download a different version of the dataset, the aggregation argument matches the data layout as follows:

| Aggregation Argument | Description            | Data layout |
|----------------------|------------------------|-------------|
| county               | county level (default) |             |
| US                   | national               | US          |
| MSA                  | metropolitan area      | MSA         |
| state                | state level            | State       |



For something a bit tidier, we gather the data by year, geography, industry and size. This is not yet implemented.

# dt_tidy <- entrydatar::tidy_cbp(dt_raw, 1990, 1990)
# dt_tidy

(c) Erik Loualiche



eloualiche/entrydatar documentation built on Aug. 20, 2023, 1:56 p.m.