library(dplyr, warn.conflicts = F)
Updated: r Sys.time()
Reporters' and partners' codes are converted from character to numeric (integer) format.
Non-numeric trailing symbols are removed from HS codes, as they were found in reporters 646 and 208.
In the trade dataset received from UNSD, a modified M49 classification is used for reporters. For example, the code 842 is used for the United States, including Virgin Islands and Puerto Rico. Appearingly the US reports its trade flows including these territories.
For partners the standard M49 classification is used: the US is under #840, and the territories have their own codes. So in the column with reporters only #842 is presented. But in the column with partners you will find 840 (the US) and 850 (US Virgin Islands).
For further steps (conversion area codes into FAO classification and mirroring) we need to have identical codification in partners' and reporters' columns. Since we can not split trade flows of the reporter #842, we have to merge trade flows of partners #840 and #850 and assign partner code #842. Analogous steps are taken for France, Italy, Norway, Switzerland and US Minor Outlying Islands.
After reassigning of codes weight, other quantity and value are aggregated.
Conversion of area codes into FAO standard is required for further steps of mapping of HS commodity codes to FCL and applying country specific adjustments to quantities and commodities.
Conversion of reporters' codes is done with using of countrycode R-package with matching by country names. Exceptions are stored in convertComtradeM49ToFAO()
R-function.
Conversion of partners' codes is done with country profiles table from FAOSTAT package. Exceptions are stored in convertTLPartnerToFAO()
R-function.
TODO: to clarify should we use swsUtils::m492fs()
function for both columns.
With current procedure all reporters' codes are mapped to FAO list, but there is no assurance that all codes are converted properly.
As about partners, following M49 codes are not mapped to FAO classification: 473, 490, 527, 568, 577, 637, 711, 837, 838, 839, 899. All of them are group of countries, for example Other Oceania, nes or Southern African Customs Union. Currently these trade flows are dropped from further processing.
TODO: We could add quantities and values from these trade flows.
Every country uses its own extended HS codes. Mapping of country-specific HS codes to FCL is maintained by ESS FAO specialists. Links are stored in MDB database files, which are used by Jellyfish software (pre-SWS era). Processed table from such file looks like following.
data("hsfclmap2", package = "hsfclmap") hsfclmap2 %>% select_(~ - fclorig, ~ - mdbyear) %>% head
A link HS-to-FCL is represented not as one-to-one correspondence, but as a numeric range of HS codes on the left-hand side and a single FCL code on the right-hand side.
During the process of mapping for every record in the table we should find all HS codes which fall into the range and assign corresponding FCL code to all of them.
Such approach assumes all HS codes in a group (by year, reporter, flow) have identical lengths in Comtrade dataset and in MDB files. For example, HS code 23.02.20.00.00
should fall into interval 23.02.20 - 23.02.30
, because this interval includes all items under header 23, subheading 02 and subheadings numbered from 20 to 30. So more correct representation of the interval would be 23.02.[20-30].*
But in case of direct numeric representation such matching will fail: 2,302,200,000 is bigger than upper bound of the interval (230,230).
Some of the following issues are recorded in function manualCorrections()
. The function is applied every time HS-to-FCL links are exported.
TODO: In some cases it is better to extend upper-bound by zeros instead of 9. For example, in HS interval 040510000:04058999
field to_HS should be extended by 9, but in case of 0708900000:07089000
zeros should be used.
During data processing we found out that in some cases lengths of HS codes in Comtrade data and in MDB maps don't match. Full list of mismatched codes one can find in this Shiny application, what was designed specially for investigation of the problem.
The reason for this problem is not clear. In some cases UNSD uses other sources of trade data than FAO used before (e.g. members of the EU). To complete HS-to-FCL mapping we have to align HS-codes on both sides. Our current approach is the following:
For example, in Comtrade dataset Canada reports import of a commodity under HS code 10.01.10.00
in 2011. Length of the HS code is 8 digits. But corresponding MDB mapping table Canada is represented only by 10-digit HS codes. Manually we can find suitable interval of HS codes: from 10.01.10.00.00
to 10.01.10.00.00
which is mapped to FCL code 15. But mathematically 10011000
is not equal to 1001100000
. So to complete the operation we add two additional zeros to the HS code from the Comtrade dataset.
On the same year Indonesia reports export of HS 15.11.10.00.00
--- 10-digit code. MDB table provides us only with 9-digit codes. Suitable HS range would be this one: from 15.11.00.00.0
to 15.11.99.99.9
, which is linked with FCL #257. We extend lower-bound (from_HS) with zero and upper-bound (to_HS) with 9 and get interval 1511000000:1511999999
where the code of question (1511100000
) perfectly falls.
After converting of HS codes to FCL we get several identical trade flows per each year/reporter/partner/flow/FCL. At this stage we aggregate such trade flows.
TODO: During aggregation we lose information about missing values and quantities, because NAs are removed during summarizing. We could do missing values imputation on HS level.
Re-export and and re-import trade flows are added to corresponding export and import trade flows.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.