View source: R/deduplication_functions.R
Removes documents from a data frame that are highly similar to other documents in the same data frame.
remove_similar(data, distance_data, id_column, distance_column, cutoff)
data: the data frame containing all documents
distance_data: a data frame with document identification and distance information
id_column: the name or index of the column in the distance dataset that contains document IDs
distance_column: the name or index of the column in the distance dataset that contains distance scores
cutoff: the maximum distance at which documents should be considered duplicates
the documents data frame with duplicate documents removed
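The following is a minimal sketch of how the arguments might fit together. It is not the package's implementation. The function remove_similar_sketch, the sample data, and the column names id and distance are all hypothetical. The sketch assumes that each row of distance_data identifies one document by its row number in data, that the distance score measures how far that document is from an earlier, similar one, and that a distance at or below cutoff marks a duplicate.

```r
# Hypothetical stand-in for illustration only; the real remove_similar()
# may match IDs and apply the cutoff differently.
remove_similar_sketch <- function(data, distance_data, id_column,
                                  distance_column, cutoff) {
  # `[[` accepts either a column name or a column index.
  ids <- distance_data[[id_column]]
  distances <- distance_data[[distance_column]]

  # Assumption: IDs are row numbers in `data`, and a distance at or
  # below the cutoff marks the document as a duplicate.
  duplicate_ids <- unique(ids[!is.na(distances) & distances <= cutoff])

  # Negative indexing with an empty vector would drop every row,
  # so return the data unchanged when nothing is flagged.
  if (length(duplicate_ids) == 0) {
    return(data)
  }
  data[-duplicate_ids, , drop = FALSE]
}

# Made-up example data: document 2 is a near copy of document 1.
docs <- data.frame(
  title = c("Bird nesting in forests",
            "Bird nesting in forest",
            "Fish migration"),
  stringsAsFactors = FALSE
)
distances <- data.frame(id = c(2, 3), distance = c(0.02, 0.90))

deduplicated <- remove_similar_sketch(docs, distances, "id", "distance",
                                      cutoff = 0.05)
print(deduplicated$title)
```

Under these assumptions, raising cutoff removes more documents and lowering it removes fewer, so it is worth inspecting the distance scores before choosing a value.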