txtDisambig: Disambiguation Assistant

Description

Disambiguation Assistant. This function:

a) pulls a .txt file
b) locates the body of the text (and stores the metadata)
c) does some very light (reversible) text prep, including storing paragraph breaks
d) pulls the *Char.csv name-alternates file
e) creates a name.alt.df data frame, with columns of:
   i) name alternates
   ii) regular expressions to locate those name alternates
   iii) uniqnames for each name alternate (column 1)
   iv) sticky names for ngrams (i.e., replaces " " with "_")
f) sorts the data frame by name alternate, from longest (most spaces) to shortest
g) uses gsub() inside a for() loop to replace all the name alternates with their sticky alternates
h) glues the text back together and saves it as a .txt file
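Steps e) through g) can be sketched in base R as follows. This is a minimal illustration, not the package's own code: the two-column stand-in for *Char.csv, the sample sentence, and the square brackets around each sticky name (suggested by the removeBrackets note under Details) are all assumptions.

```r
# Hypothetical stand-in for the *Char.csv contents: column 1 holds the
# uniqname, column 2 a name alternate (assumed layout).
char.df <- data.frame(
  uniqname  = c("ElizabethBennet", "ElizabethBennet", "JaneBennet", "JaneBennet"),
  alternate = c("Elizabeth Bennet", "Elizabeth", "Jane Bennet", "Jane"),
  stringsAsFactors = FALSE
)

# e) name alternates, regexes, uniqnames, and sticky names.
#    \Q...\E (perl = TRUE) matches the alternate literally; \b anchors it
#    to word edges. Wrapping sticky names in [ ] is an assumption.
name.alt.df <- data.frame(
  name.alt = char.df$alternate,
  regex    = paste0("\\b\\Q", char.df$alternate, "\\E\\b"),
  uniqname = char.df$uniqname,
  sticky   = paste0("[", gsub(" ", "_", char.df$alternate, fixed = TRUE), "]"),
  stringsAsFactors = FALSE
)

# f) sort longest first: most spaces, then most characters.
n.spaces <- nchar(name.alt.df$name.alt) -
  nchar(gsub(" ", "", name.alt.df$name.alt, fixed = TRUE))
name.alt.df <- name.alt.df[order(n.spaces, nchar(name.alt.df$name.alt),
                                 decreasing = TRUE), ]

# g) replace each alternate with its sticky form. Because "_" is a word
#    character, \bElizabeth\b no longer matches inside [Elizabeth_Bennet].
text.v <- "Elizabeth Bennet smiled. Jane Bennet laughed, and Elizabeth and Jane left."
for (i in seq_len(nrow(name.alt.df))) {
  text.v <- gsub(name.alt.df$regex[i], name.alt.df$sticky[i], text.v, perl = TRUE)
}
print(text.v)
```

With the alternates sorted, the sentence becomes "[Elizabeth_Bennet] smiled. [Jane_Bennet] laughed, and [Elizabeth] and [Jane] left." Processing "Jane" first would instead produce "[Jane] Bennet" and leave the two-word alternate unmatched, which is presumably why step f) sorts by length.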

Usage

txtDisambig(filename = filename.v, local = FALSE,
  write.report = TRUE, return.results = FALSE)

Arguments

filename

Character string giving the base filename of the associated .txt and *Char.csv files.

local

Logical. If FALSE (default), looks for the files in Google Drive. If TRUE, looks for them in a local folder with path data/filename/.

write.report

Logical. If TRUE (default), writes a -txtDisambig report file with the results.

return.results

Logical. Default is FALSE. If TRUE, returns a list of length 2: [[1]] a vector of names associated with more than one uniqname, and [[2]] a vector of names not found in the associated text file.
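The two elements of the return.results list could be computed roughly as below. This is a hypothetical sketch: the name.alt.df rows and the sample text are invented, and the package may detect ambiguous or missing names differently.

```r
# Hypothetical name-alternate table; "Miss Bennet" maps to two uniqnames.
name.alt.df <- data.frame(
  name.alt = c("Elizabeth", "Lizzy", "Miss Bennet", "Miss Bennet", "Kitty"),
  uniqname = c("ElizabethBennet", "ElizabethBennet", "ElizabethBennet",
               "JaneBennet", "KittyBennet"),
  stringsAsFactors = FALSE
)
text.v <- "Elizabeth and Miss Bennet walked; Lizzy laughed."

# [[1]] alternates associated with more than one distinct uniqname.
uniq.pairs <- unique(name.alt.df[, c("name.alt", "uniqname")])
ambiguous.v <- unique(uniq.pairs$name.alt[duplicated(uniq.pairs$name.alt)])

# [[2]] alternates that never occur as whole words in the text.
found <- vapply(name.alt.df$name.alt, function(alt) {
  grepl(paste0("\\b\\Q", alt, "\\E\\b"), text.v, perl = TRUE)
}, logical(1))
not.found.v <- unique(name.alt.df$name.alt[!found])

results <- list(ambiguous.v, not.found.v)
str(results)
```

Here results[[1]] is "Miss Bennet" and results[[2]] is "Kitty".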

Details

i) finds all duplicate name alternates (those associated with more than one uniqname) and returns a matrix collating them with their uniqnames

j) NOTE: a companion function removeBrackets removes all brackets and sticky spaces.
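To illustrate what reversing the disambiguation involves, here is a hypothetical stand-in for removeBrackets (the name remove_brackets_sketch is invented, and the real function's behavior may differ). Note that it cannot distinguish brackets or underscores added by txtDisambig from ones already present in the original text.

```r
# Hypothetical sketch, not the package's removeBrackets(): strip square
# brackets, then turn sticky underscores back into spaces.
remove_brackets_sketch <- function(text.v) {
  text.v <- gsub("[", "", text.v, fixed = TRUE)
  text.v <- gsub("]", "", text.v, fixed = TRUE)
  gsub("_", " ", text.v, fixed = TRUE)
}

remove_brackets_sketch("[Elizabeth_Bennet] smiled at [Jane].")
```

This returns "Elizabeth Bennet smiled at Jane.".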


seanrsilver/novnet documentation built on June 19, 2019, 12:44 a.m.