pu_translate: Translate between names of units and abbreviations

View source: R/political_units.R

pu_translate    R Documentation

Translate between names of units and abbreviations

Description

Convert names of political units to standardized abbreviations or convert standardized abbreviations to names.

Usage

pu_translate(
  x,
  superunit = NULL,
  fuzzy = T,
  reverse = F,
  ad_level = NULL,
  lang = "en",
  messages = 1,
  stringdist_params = NULL,
  standardize_name = F,
  add_parens = c("ISO3", "ISO2", "name")
)

Arguments

x

(chr vectr) A chr vector of names to abbreviate.

superunit

(chr vectr) Optional. A superunit whose subunits are the only ones being considered. Set to "world" if only sovereign countries are desired.

fuzzy

(lgl scalr) Whether to use fuzzy matching if no exact match exists.

reverse

(lgl scalr) Whether to translate from abbreviations to names.

ad_level

The administrative level of the unit: 0 = countries, 1 = first level (e.g. US states), 2 = second level (e.g. US counties). By default (NULL), all levels are searched, which may give bad results!

lang

(chr scalr) If translating back to names, which language to use.

messages

(num scalr) Whether to give helpful messages. 0 = none, 1 = some, 2 = lots.

stringdist_params

(list) If using fuzzy matching, a list of parameters to pass to the stringdist function.

standardize_name

(lgl scalr) If true, will translate names to abbreviations, and then back to names. This converts the names to the standard version in the dataset.

add_parens

(chr or NULL) Adds parenthesized versions of units. Useful for disambiguation.

Details

Superunits

Superunits are the abbreviations of the units one level above in the hierarchy. Countries have "world" as their superunit. One can supply multiple superunits, in which case all available translations will be used. They are not used in any order of preference, and collisions will result in an error as usual. If you have complex data of mixed-level units, it is probably easier to use a split-apply-combine approach, i.e., split the names into those that belong to the world and those that belong to each different country, then translate each subset and combine the results. This can be done e.g. using plyr::ddply().
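The split-apply-combine approach can be sketched roughly as follows. This is a minimal illustration in base R, not the package's own code: toy_codes is a made-up lookup table, toy_translate() is a hypothetical stand-in for a call such as pu_translate(x, superunit = su), and the column names name and superunit are assumptions about how the input data is organized. The stand-in is used only so that the sketch runs without the package installed.

```r
# Made-up lookup table standing in for the package's data (illustrative only;
# the abbreviations here are not taken from the package).
toy_codes <- data.frame(
  superunit = c("world", "world", "USA", "USA"),
  name      = c("Georgia", "Denmark", "Georgia", "Florida"),
  abbrev    = c("GEO", "DNK", "US-GA", "US-FL"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for pu_translate(x, superunit = su):
# look up names only among the subunits of one superunit.
toy_translate <- function(x, su) {
  tab <- toy_codes[toy_codes$superunit == su, ]
  tab$abbrev[match(x, tab$name)]
}

# Mixed-level input: each name is paired with the superunit it belongs to.
units <- data.frame(
  name      = c("Georgia", "Georgia", "Denmark", "Florida"),
  superunit = c("world", "USA", "world", "USA"),
  stringsAsFactors = FALSE
)

# Split by superunit, translate each subset, and write the results back
# into the original row order.
units$abbrev <- NA_character_
for (su in unique(units$superunit)) {
  rows <- units$superunit == su
  units$abbrev[rows] <- toy_translate(units$name[rows], su)
}

print(units)
```

Because every subset is translated against a single superunit, the two Georgias never collide. With the real function, the body of the loop would call pu_translate() instead of the stand-in.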

Languages

The list of translations is being slowly extended as I find the need to do so. Currently, there is good support for Danish and English. There is reasonable support for German and Italian. There is some support for Norwegian and Swedish, but mostly in the name->ISO direction.

ISO codes and dependencies

In many cases, it is clear whether a given unit belongs to some other unit. Florida clearly belongs to the USA, and Vietnam clearly belongs to the world. But what about Hong Kong and the US Virgin Islands? These are not ordinary first-level administrative units like the usual Chinese provinces or US states, but neither are they independent in the same way as Norway and Mexico are. Sometimes, these grey zone units have official ISO codes and sometimes not. I made the consistent call to place them under their respective sovereign countries, which in some cases can cause problems. Hong Kong (HKG), for instance, is listed under China but has its own ISO code. This means that it will not be translated when one uses the 'world' superunit. Unfortunately, there doesn't seem to be any obvious solution to these problems, so one will simply have to be careful when using lists of names that contain mixed-level units. Fortunately, the function throws useful messages to warn the user of these problems.

Value

A character vector.

Examples

pu_translate("Denmark")
pu_translate("DNK", reverse = TRUE)
pu_translate("DNK", reverse = TRUE, lang = "de")
# throws an error because there are multiple Georgias; try() lets the script continue
try(pu_translate("Georgia"))
# solve by restricting to a specific superunit
pu_translate("Georgia", superunit = "world")
# complex problems can happen with mixed-level units, e.g. Georgia (country) and Hong Kong (quasi-country under Chinese rule)
pu_translate(c("Hong Kong", "Georgia"), superunit = "world") # clearly wrong!
pu_translate(c("Hong Kong", "Georgia"), superunit = c("world", "CHN")) # right!
# duplicated names in Latin America
try(pu_translate("Cordoba")) # bad, multiple matches
pu_translate("Cordoba (ARG)") # works

Deleetdk/kirkegaard documentation built on March 26, 2024, 1:19 a.m.