match_misconduct: Find matches


View source: R/misconduct.R

Description

Finds which people in the pool match people in the known misconduct database and returns the potential matches for each. Remember that there may be both false positives (different people with similar or identical names) and false negatives (the same person recorded under names that are not similar enough to match, such as "Ann Doe" and "Nancy Doe").

Usage

match_misconduct(
  pool,
  misconduct_db,
  remove_odd_pool = TRUE,
  fraction_firstname_mismatch_allowed = 0.7
)

Arguments

pool

Vector of people to check against a known misconduct database.

misconduct_db

Tibble of people and associated information from a misconduct database (for example, the output of get_misconduct()).

remove_odd_pool

If TRUE, remove entries from the pool that are probably not real names, such as NA or "A.".

fraction_firstname_mismatch_allowed

Maximum fraction of letters that may differ between two first names for them to still count as a match, from 0 to 1.

Details

fraction_firstname_mismatch_allowed ranges from 0 to 1. At 0, every letter in both first names must match exactly; at 1, every letter can be different. The default value, 0.7, allows for some mismatch (for example, "Will" and "Billy"). Higher values lead to more false positives but fewer false negatives, though even the extremes still allow some of each.
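The sketch below illustrates how such a threshold could work. It is an assumption, not the package's internal code: it measures first-name mismatch as the Levenshtein edit distance (base R's utils::adist()) divided by the length of the longer name, which keeps the value between 0 and 1. The helper names first_name_mismatch_fraction() and is_first_name_match() are hypothetical, and match_misconduct() may use a different measure and give different values.

```r
# Hypothetical helpers, NOT the package's implementation.
# Assumption: mismatch fraction = edit distance / length of the longer name.
first_name_mismatch_fraction <- function(a, b) {
  a <- tolower(a)
  b <- tolower(b)
  # utils::adist() returns a matrix of generalized Levenshtein distances;
  # [1, 1] selects the single pair passed in.
  d <- utils::adist(a, b)[1, 1]
  d / max(nchar(a), nchar(b))
}

# A pair counts as a match when the fraction does not exceed the threshold.
is_first_name_match <- function(a, b, fraction_allowed = 0.7) {
  first_name_mismatch_fraction(a, b) <= fraction_allowed
}

first_name_mismatch_fraction("Will", "Billy")   # 0.4 (2 edits / 5 letters)
is_first_name_match("Will", "Billy")            # TRUE at the 0.7 default
is_first_name_match("Will", "Billy", 0)         # FALSE: 0 demands identical names
```

Raising fraction_allowed toward 1 accepts ever more distant pairs, which is the trade-off described above: fewer false negatives at the cost of more false positives.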

If the pool of names comes from, say, natural language processing of a website, it may include strings that are not actually names. Setting remove_odd_pool = TRUE removes these from the pool ("people" with no first name, no last name, or a period in their last name). If you know this is not an issue, set it to FALSE.
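A minimal sketch of that kind of filter follows, assuming each pool entry is a whitespace-separated "First [Middle ...] Last" string. The function remove_odd_names() is hypothetical and only mirrors the three rules stated above; the package's own cleaning may differ in detail.

```r
# Hypothetical filter, NOT the package's implementation.
# Assumption: names look like "First [Middle ...] Last".
remove_odd_names <- function(pool) {
  pool <- pool[!is.na(pool)]                  # drop NA entries
  parts <- strsplit(trimws(pool), "\\s+")     # split each name into words
  keep <- vapply(parts, function(p) {
    if (length(p) < 2) return(FALSE)          # missing a first or a last name
    last <- p[length(p)]
    !grepl(".", last, fixed = TRUE)           # a period suggests an initial, not a surname
  }, logical(1))
  pool[keep]
}

remove_odd_names(c("Jane Doe", NA, "A.", "John Q.", "Mary Ann Smith"))
# returns c("Jane Doe", "Mary Ann Smith")
```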

Value

A tibble containing the rows of misconduct_db that may match people in the pool. Its first two columns are Pool, the name from the pool that might match, and FirstNameMismatchFraction, the fraction of letters in the two first names that do not match.
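To make the shape of that result concrete, here is a rough sketch of a matching loop that produces it. Several assumptions are made that the documentation does not confirm: names are "First ... Last" strings, last names must match exactly (ignoring case), first names are compared by normalized edit distance, and the database holds names in a Person column (as in the example below). sketch_match() is hypothetical and uses a base R data.frame rather than a tibble to stay dependency-free.

```r
# Hypothetical sketch, NOT the package's matching code.
sketch_match <- function(pool, db, name_col = "Person", fraction_allowed = 0.7) {
  first <- function(x) tolower(sub("\\s.*$", "", trimws(x)))  # first word
  last  <- function(x) tolower(sub("^.*\\s", "", trimws(x)))  # last word
  rows <- list()
  for (p in pool) {
    for (i in seq_len(nrow(db))) {
      person <- db[[name_col]][i]
      if (last(p) != last(person)) next       # assumed: surnames must agree
      a <- first(p)
      b <- first(person)
      frac <- utils::adist(a, b)[1, 1] / max(nchar(a), nchar(b))
      if (frac <= fraction_allowed) {
        # Prepend Pool and FirstNameMismatchFraction to the database row.
        rows[[length(rows) + 1]] <- data.frame(
          Pool = p, FirstNameMismatchFraction = frac,
          db[i, , drop = FALSE], check.names = FALSE)
      }
    }
  }
  if (length(rows) == 0) return(NULL)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

db <- data.frame(Person = c("Billy Smith", "Jane Roe"), Outcome = c("A", "B"))
sketch_match(c("Will Smith", "John Doe"), db)
# one row: Pool "Will Smith", FirstNameMismatchFraction 0.4, Person "Billy Smith"
```

The nested loop is quadratic in the sizes of the pool and the database; it is written for readability, not speed.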

Examples

library(misconduct)
# We are using an archived version of the page for reproducibility;
# in most uses, you will want to use the current version of the page.
# Both extract_people() and get_misconduct() need network access.
url <- paste0("https://web.archive.org/web/20200819142546/",
  "http://www.nasonline.org/member-directory/living-member-list.html")
nasem <- extract_people(con = url)
asmd <- get_misconduct(agree = TRUE)
apparent_matches <- match_misconduct(nasem, asmd)
print(apparent_matches[, c("Pool", "Person", "FirstNameMismatchFraction", "Specific Outcome")])

bomeara/misconduct documentation built on Nov. 1, 2021, 7:49 a.m.