knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  out.width = "100%"
)
library(ggplot2)
library(sf)
library(dplyr)
library(tidyr)
The tidycensuskr package provides easy access to South Korean census and socioeconomic statistics, along with corresponding geospatial boundary data. With this package, R users can query and visualize population, housing, economy, tax, and mortality data linked to administrative districts.
Load the package:
library(tidycensuskr)
tidycensuskr will work at its full potential with the companion data package tidycensuskr.sf, which contains the district boundaries of South Korea. The package can be installed from R-universe:
install.packages("tidycensuskr.sf", repos = "https://sigmafelix.r-universe.dev")
After installing the companion package, three RDS files with the 2010, 2015, and 2020 district boundaries can be located with system.file() and read with readRDS(). For example, the 2010 district boundaries can be loaded as follows:
fs10 <- system.file("extdata", "adm2_sf_2010.rds", package = "tidycensuskr.sf")
adm2_sf_2010 <- readRDS(fs10)
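system.file() returns an empty string when the requested package or file cannot be found, so it can help to check the path before calling readRDS(). The sketch below wraps both steps in a helper. The name load_boundaries() is hypothetical and not part of either package, and the file-name pattern adm2_sf_<year>.rds is an assumption extrapolated from the 2010 file name shown above.

```r
# Hypothetical helper (not part of tidycensuskr or tidycensuskr.sf).
# Assumes the files follow the pattern "adm2_sf_<year>.rds", as the 2010 file does.
load_boundaries <- function(year) {
  file_name <- sprintf("adm2_sf_%d.rds", as.integer(year))
  path <- system.file("extdata", file_name, package = "tidycensuskr.sf")
  # system.file() returns "" when the package or the file is missing
  if (!nzchar(path)) {
    stop("No boundary file found for year ", year,
         "; is tidycensuskr.sf installed?", call. = FALSE)
  }
  readRDS(path)
}
```

With the companion package installed, load_boundaries(2010) should return the same object as the readRDS() call above, while an unavailable year stops with an explicit message instead of a less informative readRDS() error.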
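South Korean census data is organized by three levels of administrative divisions: Si/Do (province level), Si/Gun/Gu (district level), and Eup/Myeon/Dong (town, township, and neighborhood level).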
The table below provides a rough comparison of administrative divisions across South Korea, the United States, the European Union, and the United Kingdom (England). While the correspondence is not exact, it can be helpful to understand the approximate levels when working with census or regional data.

| South Korea    | US                                         | EU (NUTS[^1]) | UK (England)                   |
|----------------|--------------------------------------------|---------------|--------------------------------|
| Si/Do          | State                                      | NUTS1         | Regions / Combined Authorities |
| Si/Gun/Gu      | County                                     | NUTS2         | County                         |
| Eup/Myeon/Dong | Townships / Towns / Census County Division | NUTS3         | Districts / Wards / Boroughs   |

[^1]: NUTS: Nomenclature of Territorial Units for Statistics, a geocode standard for referencing the subdivisions of countries for statistical purposes.

Because administrative boundaries and coding systems can vary across years and data sources, tidycensuskr standardizes administrative codes to allow consistent integration of statistics.
Currently, for 2020 data there are 250 Si-Gun-Gu and 17 Si-Do.
data(adm2_sf_2020)
print(length(unique(adm2_sf_2020$adm2_code)))
The package provides census and survey data through:
- The function anycensus() for querying subsets
- The built-in dataset censuskor in long format
anycensus()

The function anycensus() returns a tidy tibble with columns such as:
- year: year of the dataset
- adm1, adm1_code: Si-Do (province) level administrative unit name and its corresponding code
- adm2, adm2_code: Si-Gun-Gu (district) level administrative unit name and its corresponding code

Columns containing the values are added in wide form, one column per variable. The column adm2_code links census data directly to boundary files retrieved with load_districts().
df_2020 <- anycensus(year = 2020, type = "mortality", level = "adm2")
head(df_2020)
The function can also aggregate values to higher administrative units. By specifying level = "adm1" and providing an aggregation function, we obtain province-level (adm1) results that summarize across all districts.
df_2020_sido <- anycensus(
  year = 2020,
  type = "mortality",
  level = "adm1",
  aggregator = mean,
  na.rm = TRUE
)
head(df_2020_sido)
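To make the aggregation step concrete, the sketch below reproduces the idea with dplyr on a small made-up table. It assumes that aggregation amounts to applying the supplied function to the district values within each province; the package's internal implementation may differ. The table toy_adm2 and its values are illustrative only, not census data.

```r
library(dplyr)

# Made-up district-level values for illustration only (not real census data)
toy_adm2 <- tibble(
  adm1 = c("A", "A", "B", "B"),
  adm2 = c("a1", "a2", "b1", "b2"),
  rate = c(300, 340, NA, 280)
)

# Per-province mean of the district values, ignoring missing districts
toy_adm1 <- toy_adm2 %>%
  group_by(adm1) %>%
  summarise(rate = mean(rate, na.rm = TRUE), .groups = "drop")

toy_adm1
```

Note that a plain mean weights every district equally regardless of its population, so for rate variables the result is an average of district rates rather than the rate of the province as a whole.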
censuskor

You can also access the whole dataset directly with data(censuskor), which loads the built-in dataset in long form with the following columns:
- year: year of the dataset
- adm1, adm1_code: Si-Do (province) level administrative unit name and its corresponding code
- adm2, adm2_code: Si-Gun-Gu (district) level administrative unit name and its corresponding code
- type: Types of census or survey
- class1, class2: Classification variables providing further breakdowns
- unit: Measurement unit for the value
- value: The observed census value for the given combination of year, region, and category

data(censuskor)
head(censuskor)
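The relationship between the long and wide layouts can be sketched with tidyr::pivot_wider(). The example uses made-up rows that mimic the censuskor columns. It assumes that wide column names are built by joining class1, class2, and unit with underscores, which is inferred from column names such as all causes_male_p1p used elsewhere in this vignette and is not a documented rule.

```r
library(dplyr)
library(tidyr)

# Made-up rows that mimic the censuskor layout (values are illustrative only)
toy_long <- tibble(
  year = 2020,
  adm2_code = c(1, 1, 2, 2),
  type = "mortality",
  class1 = "all causes",
  class2 = c("male", "female", "male", "female"),
  unit = "p1p",
  value = c(350, 200, 400, 220)
)

# One row per district, one column per class1/class2/unit combination
toy_wide <- toy_long %>%
  pivot_wider(
    id_cols = c(year, adm2_code, type),
    names_from = c(class1, class2, unit),
    values_from = value
  )

names(toy_wide)
```

The long form is convenient for filtering and grouping across variables, while the wide form is easier to pass to plotting or modelling functions that expect one column per variable.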
Since anycensus() returns tidy data, visualization with ggplot2 is straightforward.
ggplot(df_2020, aes(x = `all causes_male_p1p`, y = `all causes_female_p1p`)) +
  geom_point() +
  labs(
    x = "Male mortality (per 100,000 population)",
    y = "Female mortality (per 100,000 population)",
    title = "Male vs. Female Age-standardized Mortality Rates in South Korea (2020)"
  ) +
  theme_minimal(base_size = 10)
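Because adm2_code is shared between the census tables and the boundary data, the same values can also be drawn as a map. The following is a minimal sketch, assuming that adm2_sf_2020 carries an adm2_code column (as used earlier in this vignette) and that the codes may be stored with different types in the two objects, which is why both are coerced to character before joining.

```r
library(tidycensuskr)
library(sf)
library(dplyr)
library(ggplot2)

data(adm2_sf_2020)
df_2020 <- anycensus(year = 2020, type = "mortality", level = "adm2")

# Assumption: code types may differ between the objects, so coerce both to character
map_2020 <- adm2_sf_2020 %>%
  mutate(adm2_code = as.character(adm2_code)) %>%
  left_join(
    df_2020 %>%
      mutate(adm2_code = as.character(adm2_code)) %>%
      select(adm2_code, `all causes_male_p1p`),
    by = "adm2_code"
  )

# Choropleth of male mortality by district
ggplot(map_2020) +
  geom_sf(aes(fill = `all causes_male_p1p`), color = NA) +
  scale_fill_viridis_c(name = "Male mortality\n(per 100,000)") +
  theme_minimal(base_size = 10)
```

Keeping the sf object on the left side of the join preserves the geometry column, so the result can be passed directly to geom_sf(). Districts without a matching census row keep their geometry and receive a missing fill value.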