get_fips: Fix misspelled names and assign fips codes to US states and...

View source: R/get_fips.R

get_fips R Documentation

Fix misspelled names and assign fips codes to US states and counties

Description

Assigns fips codes to US states/territories or counties/county-equivalents. When a name is misspelled, the fips code and the correctly spelled name are assigned using approximate name matching.

Usage

get_fips(
  data,
  state_col = "stateProvince",
  county_col = "county",
  assign_counties = TRUE
)

Arguments

data

Data.frame containing a column with state names and (optionally) a column of county names.

state_col

Character string specifying state column name. Defaults to "stateProvince" (Darwin Core standard).

county_col

Character string specifying county column name. Defaults to "county" (Darwin Core standard).

assign_counties

Logical. If TRUE, fips codes are assigned for states AND counties. If FALSE, only state fips codes are assigned.

Details

Fips codes are assigned based on 2019 reference data from the US Census Bureau. County data includes counties and county-equivalents (e.g. parishes, boroughs, census areas, independent cities) for the District of Columbia, US territories, and all 50 US states.
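As a small illustration of how the codes fit together, the sketch below assumes the standard fips layout, in which a five-digit county code is the two-digit state code followed by a three-digit county code. It uses only base R and does not call the package; the variable names are chosen for illustration.

```r
# fips codes are zero-padded character strings, not numbers.
state_fips  <- sprintf("%02d", 27L)   # Minnesota
county_part <- sprintf("%03d", 3L)    # Anoka County within Minnesota

# A county fips code is the state code followed by the county part.
county_fips <- paste0(state_fips, county_part)
county_fips                           # "27003"

# The state code can be recovered from the first two characters.
substr(county_fips, 1, 2)             # "27"

# Converting to a number drops leading zeros, which is why the codes
# are best kept as character strings (Alabama's state code is "01").
as.integer("01001")                   # 1001
```

Keeping the codes as character strings also makes joins against other Census tables safer, since those tables typically store the codes with their leading zeros.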

When assigning fips codes, approximate name matching is used when names follow variable nomenclature (e.g. "Anoka", "Anoka Co.", "Anoka County") or are simply misspelled (e.g. "Florda"). See the "Value" section for the fields that help validate the quality of approximate matches.
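The general idea of approximate name matching can be sketched in base R. This is not the algorithm used by get_fips, which is not described here; it is a minimal illustration using edit distance (`adist`) and suffix stripping. The reference vector and both helper functions are hypothetical.

```r
# Hypothetical reference names; the package ships its own reference data.
ref_states <- c("Florida", "Georgia", "Minnesota")

# Illustrative helper (not part of fungarium): return the reference name
# with the smallest edit distance to the input, ignoring case.
closest_name <- function(x, ref) {
  d <- adist(tolower(x), tolower(ref))  # 1 x length(ref) distance matrix
  best <- which.min(d)
  list(match = ref[best], distance = d[1, best])
}

closest_name("Florda", ref_states)
# match is "Florida", one insertion away from the input

# Illustrative helper: strip a trailing "County" or "Co." so that
# variant spellings of the same county compare equal.
normalize_county <- function(x) {
  trimws(sub("[[:space:]]+(county|co\\.?)$", "", x, ignore.case = TRUE))
}

normalize_county(c("Anoka", "Anoka Co.", "Anoka County"))
# all three become "Anoka"
```

A real matcher also has to decide when the nearest name is too far away to accept, which is the role the match type and confidence fields play in the function's output.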

Only current county/county-equivalent names (as of December 2020) are used for assigning fips codes. Fungal records from counties that had substantially different names in the past, or from counties that no longer exist (e.g. Bedford City), may not have fips codes assigned. Circumventing this would require including all historical names for every county in the reference dataset, which has not yet been done.

Value

Returns input data.frame with the following output fields appended:

state_name

Character string. Matched state name.

state_fips

Character string. Two-digit state fips code.

state_matchtype

Character string. EXACT: a state name was matched exactly to the state listed in the fungal dataset; PARTIAL: a state name was matched partially to the state listed in the fungal dataset; MISPELLED: a state name was matched approximately to the misspelled state listed in the fungal dataset; NONE: a state name could not be matched to the state listed in the fungal dataset.

state_conf

Integer. Confidence score when a misspelled state name is approximately matched (0-100). Names with NONE, EXACT, or PARTIAL matchtypes all get a score of 100.

county_name

Character string. Matched county name.

county_fips

Character string. Five-digit county fips code.

county_matchtype

Character string. EXACT: a county name was matched exactly to the county listed in the fungal dataset; PARTIAL: a county name was matched partially to the county listed in the fungal dataset; MISPELLED: a county name was matched approximately to the misspelled county listed in the fungal dataset; NONE: a county name could not be matched to the county listed in the fungal dataset.

county_conf

Integer. Confidence score when a misspelled county name is approximately matched (0-100). Names with NONE, EXACT, or PARTIAL matchtypes all get a score of 100.
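One way to review match quality with these fields is sketched below. The data frame is built by hand to mirror the output columns listed above; its rows, the confidence value of 83, the NA values for the unmatched row, and the cutoff of 90 are all assumptions for illustration, not real output from get_fips.

```r
# Hypothetical rows mimicking the appended county fields.
res <- data.frame(
  county           = c("Anoka", "Anoka Co.", "Anokaa", "Unknown place"),
  county_name      = c("Anoka", "Anoka", "Anoka", NA),
  county_fips      = c("27003", "27003", "27003", NA),
  county_matchtype = c("EXACT", "PARTIAL", "MISPELLED", "NONE"),
  county_conf      = c(100L, 100L, 83L, 100L),
  stringsAsFactors = FALSE
)

# Tally how each name was matched.
table(res$county_matchtype)

# Approximate matches below an arbitrary confidence cutoff deserve a manual look.
needs_review <- res[res$county_matchtype == "MISPELLED" & res$county_conf < 90, ]

# Unmatched names must be selected by match type: NONE rows also carry
# a score of 100, so the confidence score alone does not identify them.
unmatched <- res[res$county_matchtype == "NONE", ]
```

The same pattern applies to the state fields (state_matchtype and state_conf).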

Examples

library(fungarium)

#import sample data set
data(agaricales)

#filter records for specific state
agaricales_mn <- agaricales[agaricales$stateProvince=="Minnesota",]

#fix misspelled counties and assign fips codes
agaricales_fips <- get_fips(agaricales_mn)


hjsimpso/fungarium documentation built on Aug. 23, 2023, 3:59 p.m.