README.md

concordance: Product Concordance

R build
status Build Status CRAN downloads CRAN status

Authors: Steven Liao (steven.liao@ucr.edu), In Song Kim (insong@mit.edu), Sayumi Miyano (smiyano@princeton.edu), Hao Zhang (hzhang3@mit.edu)

This R package provides a set of utilities for matching products in different classification codes used in international trade research. It currently supports concordance between the classifications below:

Support between the above and the below classifications will be offered soon:

Additionally, the package provides functions for:

Installation Instructions

concordance is available on CRAN and can be installed using:

install.packages("concordance")

You can install the most recent development version of concordance using the devtools package. First you have to install devtools using the following code. Note that you only have to do this once:

if(!require(devtools)) install.packages("devtools")

Then, load devtools and use the function install_github() to install concordance:

library(devtools)
install_github("insongkim/concordance", dependencies=TRUE)

If concordance was previously installed, please uninstall the package, restart R, and then reinstall the developer version following instructions above.

Citation

To cite concordance in publications use:

  Steven Liao, In Song Kim, Sayumi Miyano, Hao Zhang (2020). concordance: Product Concordance. 
  R package version 2.0.0. https://CRAN.R-project.org/package=concordance

A BibTeX entry for LaTeX users is:

  @Manual{,
    title = {concordance: Product Concordance},
    author = {Steven Liao and In Song Kim and Sayumi Miyano and Hao Zhang},
    year = {2020},
    note = {R package version 2.0.0},
    url = {https://CRAN.R-project.org/package=concordance},
  }

Usage Examples

Getting Product Description

The get_desc function allows users to look up the product description of different classification codes. The example below focuses on HS codes.

# load package
library(concordance)

# get product description
get_desc(sourcevar = c("120600", "854690"), origin = "HS5")
[1] "Oil seeds; sunflower seeds, whether or not broken" "Electrical insulators; other than of glass and ceramics"

Users can also input codes with different digits. For HS codes, 2, 4, 6-digits are supported. Note that users should always include leading zeroes in the codes (e.g. use HS code 010110 instead of 10110) -- results may be buggy otherwise.

get_desc(sourcevar = c("1206", "8546"), origin = "HS5")
[1] "Sunflower seeds; whether or not broken" "Electrical insulators of any material"
get_desc(sourcevar = c("12", "85"), origin = "HS5")
[1] "Oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit, industrial or medicinal plants; straw and fodder"     
[2] "Electrical machinery and equipment and parts thereof; sound recorders and reproducers; television image and sound recorders and reproducers, parts and accessories of such articles"

Getting Product Codes By Keywords (developer version only)

The get_product function allows users to look up product codes for which descriptions match user-specified keywords.

The function utilizes the function stringr::str_detect for pattern detection. The argument pattern takes specific string patterns to search for, origin indicates the classification system of focus, digits sets the number of digits of the output codes, type sets the type of pattern interpretation (e.g., "regex", "fixed", "coll", see ?str_detect for further details), and ignore.case decides whether to ignore case differences (TRUE by default). The example below returns manufacture-related NAICS codes.

manu.vec <- get_product(pattern = "manu", origin = "NAICS2017", digits = 4,
                        type = "regex", ignore.case = TRUE)
manu.vec
[1] "3111" "3113" "3114" "3115" "3118" "3119" "3121" "3122" "3152" "3159" "3162" "3169" "3212" "3219" "3222" "3241" "3251" "3252" "3253" "3254" "3255" "3256" "3259"
[24] "3261" "3262" "3271" "3272" "3273" "3274" "3279" "3311" "3312" "3322" "3323" "3324" "3325" "3326" "3327" "3329" "3331" "3332" "3333" "3334" "3335" "3336" "3339"
[47] "3341" "3342" "3343" "3344" "3345" "3346" "3351" "3352" "3353" "3359" "3361" "3362" "3363" "3364" "3365" "3369" "3371" "3372" "3379" "3391" "3399"

Users can double-check the product descriptions with get_desc.

get_desc(manu.vec, origin = "NAICS2017")
[1] "Animal Food Manufacturing"                                                                   
[2] "Sugar and Confectionery Product Manufacturing"                                               
[3] "Fruit and Vegetable Preserving and Specialty Food Manufacturing"                             
[4] "Dairy Product Manufacturing"                                                                 
[5] "Bakeries and Tortilla Manufacturing"                                                         
[6] "Other Food Manufacturing"                                                                    
[7] "Beverage Manufacturing"                                                                      
[8] "Tobacco Manufacturing"                                                                       
[9] "Cut and Sew Apparel Manufacturing"                                                           
[10] "Apparel Accessories and Other Apparel Manufacturing"
...

Concording Different Classification Codes

The concord function allows users to concord between different classification codes. The example below converts HS5 to NAICS2017 codes.

Users can choose to retain all matches for each input by setting all = TRUE. This option will also return the share of occurrences for each matched output among all matched outputs at the user-specified digit level.

# HS to NAICS
concord(sourcevar = c("120600", "854690"),
        origin = "HS5", destination = "NAICS2017",
        dest.digit = 6, all = TRUE)
$`120600`
$`120600`$match
[1] "111120"

$`120600`$weight
[1] 1


$`854690`
$`854690`$match
[1] "326199" "335932"

$`854690`$weight
[1] 0.5 0.5

Alternatively, users can simply obtain the matched output with the largest share of occurrences (the mode match) with all = FALSE (default). If the mode consists of multiple matches, the function will return the first matched output.

concord(sourcevar = c("120600", "854690"),
        origin = "HS5", destination = "NAICS2017",
        dest.digit = 6, all = FALSE)
[1] "111120" "326199"

Users can double-check the validity of the matches with get_desc.

# get product description of NAICS ouput
get_desc(sourcevar = c("111120", "326199"), origin = "NAICS2017")
[1] "Oilseed (except Soybean) Farming" "All Other Plastics Product Manufacturing"

More technically, the function works by matching an input code to the most fine-grained level of destination codes in our package (e.g., the 6-digit NAICS codes above) and then calculates the occurrence share of each matched code at the user-specified digit-level. Mode(s) can occur when users choose destination codes at a more aggregated level and multiple finer-grained matched codes belong to certain groups at that level.

We illustrate the above mechanics using HS5 code "8546" as an example. When users ask for 6-digit NAICS codes (the most fine-grained level available), HS5 code "8546" is matched to five NAICS codes: "327212", "327113", "327110", "326199", and "335932", with weights of 0.2 (1/5) each.

concord(sourcevar = "8546",
        origin = "HS5", destination = "NAICS",
        dest.digit = 6, all = TRUE)
$`8546`
$`8546`$match
[1] "327212" "327113" "327110" "326199" "335932"

$`8546`$weight
[1] 0.2 0.2 0.2 0.2 0.2

Instead, when users ask for 4-digit NAICS codes, HS5 code "8546" is matched to four NAICS codes: "3271", "3272", "3261", "3359". NAICS code "3271" gets a weight of 0.4 since it consists of two finer-grained matches "327113" and "327110" out of the 5 total matches (2/5).

concord(sourcevar = "8546",
        origin = "HS5", destination = "NAICS",
        dest.digit = 4, all = TRUE)
$`8546`
$`8546`$match
[1] "3271" "3272" "3261" "3359"

$`8546`$weight
[1] 0.4 0.2 0.2 0.2

Thus, when all = FALSE, the function will retain the matched code with the largest weight "3271".

concord(sourcevar = "8546",
        origin = "HS5", destination = "NAICS",
        dest.digit = 4, all = FALSE)
[1] "3271"

Getting Product Differentiation

Rauch (1999) classifies each SITC Rev. 2 industry according to three possible types:

The get_proddiff function concords users' input codes to SITC2 codes and then extracts the corresponding Rauch classifications.

There are two main options. First, users can set prop = "n", prop = "r", or prop = "w", in which case the function will return the proportion of "w", "r", or "n" in the resulting vector of Rauch indices.

# get the proportion of type "r"
get_proddiff(sourcevar = c("120600", "854690"), origin = "HS5", prop = "r")
120600 854690 
     1      0

If prop is not set to any of these, then the function returns, for each input code, a dataframe that summarizes all the frequencies and proportions of "w", "r", and "n".

get_proddiff(sourcevar = c("120600", "854690"), origin = "HS5", prop = "")
$`120600`
  rauch freq proportion
1     w    0          0
2     r    1          1
3     n    0          0

$`854690`
  rauch freq proportion
1     w    0          0
2     r    0          0
3     n    1          1

Second, users can choose Rauch's conservative classification with setting = CON (default). setting = LIB returns Rauch's liberal classification.

get_proddiff(sourcevar = c("120600", "854690"), origin = "HS5", setting = "LIB", prop = "")
$`120600`
  rauch freq proportion
1     w    1          1
2     r    0          0
3     n    0          0

$`854690`
  rauch freq proportion
1     w    0          0
2     r    0          0
3     n    1          1

Getting Product Elasticity

Broda and Weinstein (2006) estimate product-level import demand elasticities for 73 countries using HS0 3-digit codes.

The get_sigma function concords users' input codes to 3-digit HS0 codes and then extracts the corresponding product-level elasticities in the country selected by the user.

There are two main options. First, when give_avg = TRUE (default), each output element will be a simple average of all elasticities (of matched codes) in the corresponding vector.

get_sigma(sourcevar = c("120600", "854690"), origin = "HS5",
          country = "USA", give_avg = TRUE)
[1] 3.733456 1.233216

Users can also set give_avg = FALSE to obtain the full vector of elasticities for all matching codes of each element in the input vector. In this case, there were only one matches per input.

get_sigma(sourcevar = c("120600", "854690"), origin = "HS5",
          country = "USA", give_avg = FALSE)
$`120600`
$`120600`$elasticity
[1] 3.733456


$`854690`
$`854690`$elasticity
[1] 1.233216

Second, for the United States (only), Broda and Weinstein (2006) have also estimated elasticities based on more fine-grained 5-digit SITC3 codes. Users can obtain elasticities in the United States via this method with use_SITC = TRUE.

get_sigma(sourcevar = c("120600", "854690"), origin = "HS5",
          country = "USA", use_SITC = TRUE, give_avg = TRUE)
[1] 2.562991 1.345522

Getting Industry Upstreamness/Downstreamness (developer version only)

Building on Antras, Chor, Fally, Hillberry (2012), Antras and Chor (2018) estimate industry-level upstreamness/downstreamness for 2-digit ISIC3 codes in 40 countries (+ Rest of the World, RoW) between 1995 and 2011.

The get_upstream function concords users' input codes to 2-digit ISIC3 codes and then uses the corresponding codes as input to calculate weighted estimates of upstreamness or downstreamness in the country and year selected by the user.

The argument sourcevar sets the industry codes to look up, origin indicates the classification system of the input codes, country takes ISO 3-letter codes, year takes an integer between 1995 and 2011, and setting accepts one of the four available measures as defined in Antras and Chor (2018):

The example below returns the upstreamness ("GVC_Ui") of HS5 industries in the United States in 2011.

get_upstream(sourcevar = c("0101", "0301", "7014", "8420"), origin = "HS5",
             country = "USA", year = "2011",
             setting = "GVC_Ui", detailed = FALSE)
[1] 2.595109 2.595109 2.563818 1.795285

The argument detailed allows users to return more detailed industry-level GVC_Ui estimates following Antras, Chor, Fally, and Hillberry (2012). When set to TRUE, the function concords each element of the input vector to 6-digit BEA codes, and then calculates weighted average estimates of upstreamness (GVC_Ui). Note that such estimates only exist for USA in 2002, 2007, and 2012.

get_upstream(sourcevar = c("0101", "0301", "7014", "8420"), origin = "HS5",
             country = "USA", year = "2012",
             setting = "GVC_Ui", detailed = TRUE)
[1] 2.488410 2.488410 2.515886 1.588522

Getting Industry Intermediateness (developer version only)

The get_intermediate function calculates and returns the proportion of intermediate goods production in an industry based on product descriptions.

The function uses keywords ("part(s)", "intermediate", and "component") to identify intermediate-goods producing industries (at the most disaggregated level in the description data), and then calculates and returns the proportion these industries occupy among each input code. Larger values indicate higher levels of intermediateness in an industry.

For example, users can get the level/proportion of intermediate goods production in the 4-digit NAICS2017 industries below.

get_intermediate(sourcevar = c("3131", "3363"), origin = "NAICS2017")
[1] 0.0 0.5

Or the level/proportion of intermediate goods production in the 2-digit HS5 industries below.

get_intermediate(sourcevar = c("03", "84"), origin = "HS5")
[1] 0.0000000 0.1937984

References



insongkim/concordance documentation built on March 31, 2022, 8:57 p.m.