extract_people: Extract people's names from a file or website

View source: R/misconduct.R

Description

This function uses natural language processing to find all potential human names in a web page or file (ideally plain text; CSV also works). The code was developed with the help of Lincoln Mullen's natural-language-processing guide (https://rpubs.com/lmullen/nlp-chapter). If you already have people's names (say, from a meeting registration list), use that vector directly instead: this function might miss or split names it does not recognize.

Usage

Arguments

con

A connection object or a character string

text

Raw text

...

Other arguments to pass to readLines

Details

The input can be either a URL or file (given by the con argument) or a character vector of text (the text argument). If text is not NULL, text is used; otherwise con is used.
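
The precedence rule above can be sketched in base R. This is only an illustration under stated assumptions: resolve_input is a hypothetical helper name, the NULL defaults are assumed, and the package's actual source may differ. It shows only how text would take priority over con, and how extra arguments would reach readLines.

```r
# Hypothetical helper illustrating the input-selection rule described above.
# It is NOT taken from the package source; the name and defaults are assumptions.
resolve_input <- function(con = NULL, text = NULL, ...) {
  if (is.null(text)) {
    # No text supplied: read lines from the URL, file path, or connection.
    text <- readLines(con, ...)
  }
  text
}

# text takes precedence when supplied, so con is never read here.
resolve_input(con = "ignored.txt", text = "Ada Lovelace met Charles Babbage.")

# Otherwise con is read; extra arguments (here n) are passed on to readLines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Grace Hopper", "Alan Turing"), tmp)
resolve_input(con = tmp, n = 1)
```

Under this sketch, supplying both arguments silently ignores con, so a caller who wants a page to be fetched should leave text as NULL.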

Value

A vector of people's names

Examples

# We are using an archived version of the page for reproducibility;
# in most uses, you will want to use the current version of the page
url <- paste0("https://web.archive.org/web/20200819142546/",
              "http://www.nasonline.org/member-directory/living-member-list.html")
nasem <- extract_people(con = url)

bomeara/misconduct documentation built on Nov. 1, 2021, 7:49 a.m.