knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 4L, tibble.print_max = 4L)
set.seed(42)

library(DT)
library(tidygeocoder)
library(gt)
library(dplyr)

Overview

The supported geocoding services are shown in the table below. The value in the Method column is passed to the method argument of tidygeocoder functions such as geo() and reverse_geo() to select a geocoding service. Usage rate limitations are listed for the free tier of each service where applicable; many services offer higher rates with paid plans.

Note also that there are many other considerations when selecting a geocoding service, such as whether the service uses open source data with permissive licensing, how the service uses or stores your data, and whether there are restrictions on how you can use the data it provides. Refer to each service's documentation for details.

check_mark <- "\U2705" # Unicode U+2705 (white heavy check mark)

geocoder_summary_table <-
  tidygeocoder::api_info_reference %>%
    mutate(
      service = paste0(
        '[', method_display_name, '](', site_url, ')'
      ),
      batch_geocoding = ifelse(method %in% names(tidygeocoder:::batch_func_map), check_mark, ''),
      api_key_required = ifelse(method %in% tidygeocoder::api_key_reference[['method']], check_mark, ''),
      api_documentation = paste0(
        '[docs](', api_documentation_url, ')'
      )
    ) %>%
    left_join(tidygeocoder::min_time_reference %>% select(method, description), by = 'method') %>%
    select(service, method, api_key_required, batch_geocoding, usage_limitations = description, api_documentation) %>%
    mutate(across(method, function(x) stringr::str_c('`', x, '`'))) %>% # format method column
    tidyr::replace_na(list(usage_limitations = ''))

# Format column names
colnames(geocoder_summary_table) <- colnames(geocoder_summary_table) %>%
  stringr::str_replace_all('_', ' ') %>%
  stringr::str_to_title() %>%
  stringr::str_replace_all('Api', 'API')

geocoder_summary_table %>%
  knitr::kable()
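As a minimal sketch of how the Method column is used, the example below selects the osm service through the method argument of geo(). It sets no_query = TRUE so that no request is actually sent to the service; the query is only printed and the coordinates come back as NA. It also lists the services that need no API key, mirroring how the API Key Required column above is computed. The address is an arbitrary example.

```r
library(tidygeocoder)

# Methods absent from api_key_reference need no API key
# (the same rule used for the API Key Required column above)
no_key_methods <- setdiff(api_info_reference$method, api_key_reference$method)
print(no_key_methods)

# The method argument selects the geocoding service.
# no_query = TRUE prints the query without sending it, so lat/long are NA.
res <- geo(address = "New York, USA", method = "osm", no_query = TRUE)
print(res)
```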

Highlights:

Data Privacy

Due diligence must be exercised when geocoding sensitive data, because tidygeocoder uses third party web services to perform geocoding. In a healthcare context, sending patient or study subject addresses to a third party geocoding service can risk violating HIPAA and the privacy requirements of Institutional Review Boards (IRBs).

Further details on possible risks are described here. Refer to the documentation for your selected geocoding service (see the links above) for information on how your data will be used and stored.

Some options you could consider if the privacy of your data is a concern:

See the geo() and reverse_geo() documentation pages for more details on the parameters mentioned above.

Usage Notes

tidygeocoder::geo(address = "New York, USA", method = "arcgis",
  custom_query = list(token = "<API_KEY>"))
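Hard-coding a token in a script makes it easy to leak. A minimal alternative, assuming the token is stored in an environment variable whose name (ARCGIS_TOKEN here) is purely hypothetical, is to read it with Sys.getenv() when building custom_query. As with the example above, no_query = TRUE keeps this sketch from contacting the service.

```r
library(tidygeocoder)

# ARCGIS_TOKEN is a hypothetical variable name, set here only so the
# example runs; in practice define it in ~/.Renviron rather than in code
Sys.setenv(ARCGIS_TOKEN = "<API_KEY>")

res <- geo(
  address = "New York, USA",
  method = "arcgis",
  # custom_query adds API-specific parameters to the query
  custom_query = list(token = Sys.getenv("ARCGIS_TOKEN")),
  no_query = TRUE # print the query instead of sending it
)
```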

API Parameters

The api_parameter_reference dataset maps the API parameters for each geocoding service to a common set of "generic" parameters. In the table below, generic_name is the generic parameter name and api_name is the corresponding parameter name for the specified geocoding service (method). Refer to ?api_parameter_reference for more details.

api_parameter_reference %>%
  mutate(across(c(method, generic_name, api_name), as.factor)) %>%
  datatable(
    filter = 'top',
    rownames = FALSE,
    options = list(
      lengthMenu = c(5, 10, 15, 20, nrow(.)),
      pageLength = 10,
      autoWidth = TRUE
    )
  )
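To see how a single generic parameter translates across services, the reference dataset can also be filtered directly. The sketch below assumes the method, generic_name, and api_name columns shown in the table above and lists the service-specific name of the generic limit parameter.

```r
library(tidygeocoder)
library(dplyr)

# Service-specific parameter names behind the generic `limit` parameter
limit_names <- api_parameter_reference %>%
  filter(generic_name == "limit") %>%
  select(method, api_name)

print(limit_names)
```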

API Key Retrieval

API keys are read from environment variables. The name of the environment variable used for each service is stored in the api_key_reference dataset. See ?api_key_reference.

api_key_reference %>%
  gt() %>%
  opt_table_outline() %>%
  opt_table_lines() %>%
  tab_options(column_labels.font.weight = 'bold')
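As a minimal sketch, assuming the geocodio method and a placeholder key value, the environment variable name can be looked up in api_key_reference (column env_var) and set for the current R session. A persistent setup would usually define the variable in ~/.Renviron instead.

```r
library(tidygeocoder)

# Which environment variable does tidygeocoder read for the geocodio key?
env_var <- api_key_reference$env_var[api_key_reference$method == "geocodio"]

# Set a placeholder key for this session only (replace with a real key)
do.call(Sys.setenv, setNames(list("<YOUR_GEOCODIO_KEY>"), env_var))

Sys.getenv(env_var)
```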

Minimum Time Per Query

The minimum time (in seconds) required per query to comply with the usage policy of each geocoding service is stored in the min_time_reference dataset. See ?min_time_reference.

min_time_reference %>%
  gt() %>%
  opt_table_outline() %>%
  opt_table_lines() %>%
  tab_options(column_labels.font.weight = 'bold')
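Because each query must take at least min_time seconds, a lower bound on the run time of a job that sends one query per input is the number of queries multiplied by that value. A quick sketch of the arithmetic, assuming 250 addresses (an arbitrary count) and the osm method:

```r
library(tidygeocoder)

n_queries <- 250 # arbitrary example size
osm_min_time <- min_time_reference$min_time[min_time_reference$method == "osm"]

# Lower bound on total run time, in seconds and then minutes
min_seconds <- n_queries * osm_min_time
min_seconds / 60
```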

Links to the usage policies for each geocoding service:

cat(tidygeocoder:::get_api_usage_bullets(), sep = '\n')

Batch Query Size Limits

The maximum number of inputs (geographic coordinates or addresses) per batch query for each geocoding service is stored in the batch_limit_reference dataset. See ?batch_limit_reference.

batch_limit_reference %>%
  gt() %>%
  fmt_number(columns = 'batch_limit', decimals = 0) %>%
  opt_table_outline() %>%
  opt_table_lines() %>%
  tab_options(column_labels.font.weight = 'bold')
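When there are more inputs than a service's batch limit allows, they need to be sent in several batches. One minimal way to split them with base R is sketched below, assuming the census method (its limit is looked up rather than hard-coded) and a hypothetical vector of placeholder addresses.

```r
library(tidygeocoder)

addresses <- paste("Address", 1:25000) # hypothetical placeholder inputs
limit <- batch_limit_reference$batch_limit[batch_limit_reference$method == "census"]

# Assign each address to a chunk no larger than the batch limit
chunks <- split(addresses, ceiling(seq_along(addresses) / limit))

length(chunks)
lengths(chunks)
```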


jessecambon/tidygeocoder documentation built on Jan. 26, 2023, 4:03 p.m.