clean_admin_names: Clean and Match Administrative Names

View source: R/clean_admin_names.R

clean_admin_namesR Documentation

Clean and Match Administrative Names

Description

This function takes administrative names and cleans them using various matching and string distance algorithms. It can also match the cleaned names with a base list provided by the user or fetched from 'GeoNames', which is a official repository of standard spellings of all foreign geographic names.

Usage

clean_admin_names(
  admin_names_to_clean,
  country_code,
  admin_level = "adm2",
  user_base_admin_names = NULL,
  user_base_only = FALSE,
  report_mode = FALSE
)

Arguments

admin_names_to_clean

A character vector of administrative names to clean.

country_code

sed if 'use_get_admin_names' is TRUE. A character string or numerical value of the country code (e.g., "KE"). This can be in various formats such as country name, ISO codes, UN codes, etc., see countrycode::codelist() for the full list of codes and naming conventions used.

admin_level

A character string indicating the administrative level (e.g., "adm2").

user_base_admin_names

A character of of administrative names that the use would like to use as reference. This is no necessary, downloaded 'GeoNames' will be used if missing.

user_base_only

A logical indicating whether to use only the user-provided base administrative names ('user_base_admin_names') for matching. If TRUE, 'country_code' and 'admin_names_to_clean' are not required. Default is FALSE.

report_mode

A logical indicating whether to return a detailed report. Default is FALSE.

Value

If 'report_mode' is set to TRUE, a data frame containing the original admin names and the matched and cleaned admin names with inormation of the source of data used to clean including the algorithm used, else a cleaned list of names is returned.

See Also

countrycode::codelist() for the full list of codes and naming conventions.

Examples

 
# Example with country code
base_names <- c(
  "Paris", "Marseille", "Lyon",
  "Toulouse", "Nice", "Nantes", "Strasbourg",
  "Montpellier", "Bordeaux", "Lille"
)

unclean_names <- c(
  "Pariis", "Marseill", "Lyone",
  "Toulous", "Niice", "Nantees", "Strasbourgh",
  "Montpeelier", "Bordeuax", "Lilie"
)

france_new <- clean_admin_names(
  country_code = "Fr",
  user_base_admin_names = base_names,
  admin_names_to_clean = unclean_names
)

print(france_new)



epiCleanr documentation built on Sept. 28, 2023, 5:07 p.m.