omit_duplicates: omit_duplicates
In mariolaespinosa/historicalnetworks: Mapping Historical Citation Networks


View source: R/omit_duplicates.R

omit_duplicates

omit_duplicates(df, exact = FALSE)

`df`	A dataframe representing a corpus of downloaded texts, as generated by `build_corpus`
`exact`	Should works be considered duplicates only if they share both the same author's last name and the same city of publication (along with matching title, publication date, and volume number)? Defaults to `FALSE`.

Because the Internet Archive's collection of texts includes many works more than once, the output of `build_corpus` will likely contain duplicates. `omit_duplicates` takes a fairly conservative approach to filtering them out. By default, two works are considered duplicates if the first ten words of their titles are identical, they have the same publication date and volume number, and they share either the same author's last name or the same city of publication (formatting issues are particularly common for these two pieces of metadata, so a match on one of them is enough). Setting `exact = TRUE` tightens the rule: works are considered duplicates only if they share both the same author's last name and the same city of publication, in addition to the matching title, date, and volume number.
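The matching rule described above can be illustrated with a small base R sketch. This is not the package's implementation: the function name `sketch_omit_duplicates`, the column names (`title`, `date`, `volume`, `author_last`, `city`), and the toy data are all assumptions made for illustration, and the real output of `build_corpus` may be structured differently.

```r
# Toy corpus with hypothetical column names (assumed, not taken from the package).
corpus <- data.frame(
  title       = c("A Treatise on Yellow Fever", "A Treatise on Yellow Fever",
                  "A Treatise on Yellow Fever", "Notes on Cholera",
                  "A Treatise on Yellow Fever"),
  date        = c("1855", "1855", "1855", "1860", "1855"),
  volume      = c("1", "1", "1", "1", "1"),
  author_last = c("Smith", "Smith", "Smyth", "Jones", "Smith"),
  city        = c("London", "Londini", "London", "London", "London"),
  stringsAsFactors = FALSE
)

# Keep only the first ten whitespace-separated words of each title.
first_ten_words <- function(x) {
  vapply(strsplit(trimws(x), "\\s+"),
         function(w) paste(head(w, 10), collapse = " "),
         character(1))
}

# Simplified sketch of the rule: later rows that match an earlier row are dropped.
sketch_omit_duplicates <- function(df, exact = FALSE) {
  # Title prefix, publication date, and volume must always match.
  base_key   <- paste(first_ten_words(df$title), df$date, df$volume, sep = "|")
  author_key <- paste(base_key, df$author_last, sep = "|")
  city_key   <- paste(base_key, df$city, sep = "|")

  if (exact) {
    # Both author's last name and city must match.
    dup <- duplicated(paste(author_key, df$city, sep = "|"))
  } else {
    # A match on either author's last name or city is enough.
    dup <- duplicated(author_key) | duplicated(city_key)
  }
  df[!dup, , drop = FALSE]
}

default_result <- sketch_omit_duplicates(corpus)
exact_result   <- sketch_omit_duplicates(corpus, exact = TRUE)
```

In this toy data, the default rule drops the rows that differ from the first row only in the spelling of the city or of the author's name, while `exact = TRUE` drops only the row that repeats both. The sketch treats the first occurrence as the one to keep, which is a choice made here for simplicity rather than something the documentation states.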

A dataframe containing the works in `df` that remain after duplicates are omitted

mariolaespinosa/historicalnetworks documentation built on Feb. 9, 2022, 12:31 p.m.