Description
Disambiguation Assistant. This function:
a) pulls a .txt file;
b) locates the body of the text (and stores the metadata);
c) does some very light (reversible) text prep, including storing paragraph breaks;
d) pulls the *Char.csv file of name alternates;
e) creates a name.alt.df data frame with columns of:
   i) name alternates,
   ii) regular expressions to locate those name alternates,
   iii) the uniqname for each name alternate in column 1,
   iv) sticky names for ngrams (i.e., " " replaced with "_");
f) sorts the data frame by name alternate, from longest (most spaces) to shortest;
g) uses gsub() inside a for() loop to replace every name alternate with its sticky alternate;
h) glues the text back together and saves it as a .txt file.
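Steps c and e through h can be sketched in base R on toy data. This is a minimal sketch, not txtDisambig's actual code: the sample text, the column names alt, uniqname, regex and sticky, and the PARABREAK placeholder used to store paragraph breaks are all assumptions made for illustration, and the names are assumed to contain no regex metacharacters.

```r
# Hypothetical toy inputs standing in for the .txt file and the *Char.csv file
text.v <- "Elizabeth Bennet smiled.\n\nMiss Elizabeth Bennet laughed. Lizzy left."

name.alt.df <- data.frame(
  alt      = c("Elizabeth Bennet", "Miss Elizabeth Bennet", "Lizzy"),
  uniqname = c("ElizabethBennet", "ElizabethBennet", "ElizabethBennet"),
  stringsAsFactors = FALSE
)

# Step c (assumed form): swap paragraph breaks for a placeholder token
para.token <- " PARABREAK "
prepped.v <- gsub("\n\n", para.token, text.v, fixed = TRUE)

# Step e: a word-bounded regex and a sticky (underscore) form per alternate
name.alt.df$regex  <- paste0("\\b", name.alt.df$alt, "\\b")
name.alt.df$sticky <- gsub(" ", "_", name.alt.df$alt, fixed = TRUE)

# Step f: most spaces first, so longer alternates are replaced before
# their shorter substrings can match inside them
n.spaces <- nchar(gsub("[^ ]", "", name.alt.df$alt))
name.alt.df <- name.alt.df[order(n.spaces, decreasing = TRUE), ]

# Step g: replace each alternate with its sticky form
for (i in seq_len(nrow(name.alt.df))) {
  prepped.v <- gsub(name.alt.df$regex[i], name.alt.df$sticky[i], prepped.v, perl = TRUE)
}

# Step h: restore the paragraph breaks and save the result as a .txt file
result.v <- gsub(para.token, "\n\n", prepped.v, fixed = TRUE)
out.file <- tempfile(fileext = ".txt")
writeLines(result.v, out.file)
cat(result.v)
```

The sort in step f is what makes the loop safe: if "Elizabeth Bennet" were replaced first, "Miss Elizabeth Bennet" would become "Miss Elizabeth_Bennet" and the longer alternate would never match.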
Usage
txtDisambig(filename = filename.v, local = FALSE,
            write.report = TRUE, return.results = FALSE)
Arguments
filename: Character string giving the filename; the function expects an associated .txt file and Char.csv file.
local: Logical. If FALSE (default), looks for the files in Google Drive. If TRUE, looks for them in a folder with path data/filename/.
write.report: Logical. If TRUE (default), writes a -txtDisambig file with the results.
return.results: Logical. Default is FALSE. If TRUE, returns a list of length 2: [[1]] a vector of names associated with more than one uniqname, and [[2]] the names not found in the associated text file.
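The two elements returned when return.results = TRUE, together with a matrix of duplicate name alternates collated with their uniqnames, can be approximated in base R. This is a hypothetical sketch on toy data: the table, its column names alt and uniqname, and the use of a literal substring match (fixed = TRUE) to decide whether a name is "found" are assumptions, so a short alternate such as "Jane" would also count as found inside "Janet".

```r
# Hypothetical name-alternate table: "Jane" is claimed by two uniqnames,
# and "Mr. Collins" never occurs in the toy text.
name.alt.df <- data.frame(
  alt      = c("Jane", "Jane", "Jane Bennet", "Mr. Collins"),
  uniqname = c("JaneBennet", "JaneFairfax", "JaneBennet", "WilliamCollins"),
  stringsAsFactors = FALSE
)
text.v <- "Jane Bennet wrote to Jane."

# [[1]]: alternates associated with more than one distinct uniqname
uniq.per.alt <- tapply(name.alt.df$uniqname, name.alt.df$alt,
                       function(u) length(unique(u)))
ambiguous.v <- names(uniq.per.alt)[uniq.per.alt > 1]

# Matrix collating every row of an ambiguous alternate with its uniqname
dup.m <- as.matrix(name.alt.df[name.alt.df$alt %in% ambiguous.v,
                               c("alt", "uniqname")])

# [[2]]: alternates that never occur in the text (literal match)
found.l <- vapply(name.alt.df$alt, grepl, logical(1), x = text.v, fixed = TRUE)
not.found.v <- unique(name.alt.df$alt[!found.l])

results <- list(ambiguous.v, not.found.v)
print(dup.m)
```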
Details
i) finds all duplicate name alternates and returns a matrix collating them with their uniqnames;
j) NOTE: the companion function removeBrackets removes all brackets and sticky spaces.
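A function with the behavior described for removeBrackets might look like the following. It is a hypothetical stand-in (remove_brackets_sketch is an invented name), not the companion function itself: it assumes "brackets" means square brackets and that every underscore in the text is a sticky space, so any underscores present in the original text would also become spaces.

```r
# Hypothetical stand-in for removeBrackets(): strips every square bracket
# and turns every sticky underscore back into a space.
remove_brackets_sketch <- function(text.v) {
  no.brackets <- gsub("[", "", text.v, fixed = TRUE)
  no.brackets <- gsub("]", "", no.brackets, fixed = TRUE)
  gsub("_", " ", no.brackets, fixed = TRUE)
}

remove_brackets_sketch("[Elizabeth_Bennet] met [Mr_Darcy].")
```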