knitr::opts_chunk$set(echo = TRUE)

Introduction

This package contains a dictionary for Vietnamese provinces names, admin2 names and admin3 names, useful to ensure consistency between different data packages using Vietnamese provinces, admin2 and/or admin3. It also contains dictionaries for the Camdodian, Lao and Thai admin1 names and Lao admin2. The package also contains the list names of the admin1 ordered by year. These lists contain for each of change, the name of the "new" admin1 in English. We also have a list by country (Vietnam, Laos, Cambodge, Thailand) of the history of change in administrative boundaries

devtools::install_github("choisy/dictionary")

Loading and attaching the dictionary package:

library(dictionary)

Usage examples

Dictionnary

We can use some examples to demonstrate our package:

admin3

We create a vector containing some admin3 names in Vietnamese:

communes_vn <- c("Xã Nghĩa Lợi", "Xã Nam Mẫu", "Xã Xuân Vinh",
                 "Thị trấn Thứ Ba", "Xã Hải Dương", "Xã Vĩnh Lộc A",
                 "Xã Nội Hoàng", "Xã Điện Trung", "Xã Hoàng Đồng", "Xã Nàn Sán")

We can use the character vector admin3 to translate the Vietnamese vector but first the vector should be encode in UNICODE to avoid encoding trouble. For that, we use the stringi package:

uni_communes_vn <- stringi::stri_escape_unicode(communes_vn)

And now, we can use the vector admin3

admin3_transl <- admin3[uni_communes_vn]

We obtain a character vector, with the original names as name and the translation as value:

str(admin3_transl)
admin3_transl

admin2 and admin1

Each dictionary is a names vector:

# For Vietnam:
str(vn_admin1)
str(vn_admin2)
# For Laos:
str(la_admin1)
str(la_admin2)
# For Cambodge:
str(kh_admin1)
# For Thailand:
str(th_admin1)

To use them, it's the same principle for the admin1 and the admin2: We create two vectors that needed to be translate:

districts_vn <- c("Huyện Tây Hòa", "Huyện Kim Thành", "Quận Liên Chiểu",
                  "Huyện Tương Dương", "Huyện Bình Đại",
                  "Thị xã Mường Lay", "Thành phố Vĩnh Long",
                  "Huyện Gia Viễn", "Huyện Văn Quan", "Thị xã Sông Công")
provinces_vn <- c("Ninh Bình", "Hà Nội", "An Giang", "Bình Phước", "Lào Cai",
                  "Phú Thọ", "Bắc Giang", "Vĩnh Phúc", "Bạc Liêu", "Hà Giang")

But, we use the character vector admin2 and admin1 contains in the package to translate the two vectors after changing the encoding to UNICODE:

uni_districts_vn <- stringi::stri_escape_unicode(districts_vn)
uni_provinces_vn <- stringi::stri_escape_unicode(provinces_vn)
admin2_transl <- vn_admin2[uni_districts_vn]
admin1_transl <- vn_admin1[uni_provinces_vn]

We obtain two character vectors, with the original names as name and the translation as value:

str(admin2_transl)
admin2_transl
str(admin1_transl)
admin1_transl

List of admin1 per year

It's a list object :

str(vn_admin1_year)

To see a particular year (for example 1997) :

vn_admin1_year$`1997`

History

The package contains also a list by country of the history of change in administrative boundaries:

str(th_history)

It's List of 2 lists containing 4 or 6 elements, depending of the event. If the history contains complex event, meaning some splits or merges events where the admin2 are detailed, two slots with these detailed are added :

For example, the Laos history contains complex event:

la_history

Function

In the package, the function match_pattern with the XX_admin1_year can be used to return the year of expression of the admin1 in a data frame.

For example, if we use a data frame containing data expressed by admin1:

data <- data.frame(
  admin1 = unique(grep("Xaisomboun", la_admin1, value = TRUE, invert = TRUE)),
  value = sample(1:3, length(unique(la_admin1)[-1]), T))
head(data)

We can now, used the match_pattern function:

match_pattern(data, "admin1", la_admin1_year)
# the data are expressed with the admin1 definition of 2006


choisy/dictionary documentation built on Aug. 21, 2019, 3:52 p.m.