Creates a list of the different types of duplicates in a textmeta-object.
object
  A textmeta-object.
paragraph
  Logical: Should be set to
x
  An R object.
...
  Further arguments for print and summary. Not implemented.
This function helps to identify different types of duplicates and makes it possible to exclude them from further analysis (e.g. LDA).
Named list:
uniqueTexts
  Character vector of IDs such that each text occurs exactly once. If a text occurs two or more times in the corpus, the ID of its first occurrence in list order is returned.
notDuplicatedTexts
  Character vector of IDs of texts that occur only once in the whole corpus.
idFakeDups
  List of character vectors: IDs of texts that originally had the same ID but belong to different texts, grouped by their original ID.
idRealDups
  List of character vectors: IDs of texts that originally had the same ID and the same text but different meta information, grouped by their original ID.
allTextDups
  List of character vectors: IDs of texts that occur two or more times, grouped by text equality.
textMetaDups
  List of character vectors: IDs of texts that occur two or more times and have the same meta information, grouped by text and meta equality.
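The text-based entries above can be illustrated with a minimal base R sketch. It is not the package's implementation; the toy vector `texts` and all variable names are hypothetical, and the sketch only mirrors the descriptions of uniqueTexts, notDuplicatedTexts and allTextDups for a named character vector.

```r
# Hypothetical toy corpus: names are document IDs, values are the texts.
texts <- c(a = "fish", b = "towel", c = "fish",
           d = "panic", e = "towel", f = "fish")

# Analogue of uniqueTexts: keep the ID of the first occurrence of each text.
unique_ids <- names(texts)[!duplicated(texts)]

# Analogue of notDuplicatedTexts: IDs whose text occurs exactly once.
occurs_again <- duplicated(texts) | duplicated(texts, fromLast = TRUE)
not_duplicated_ids <- names(texts)[!occurs_again]

# Analogue of allTextDups: group IDs by identical text and
# keep only the groups with two or more members.
groups <- split(names(texts), texts)
all_text_dups <- groups[lengths(groups) > 1]

print(unique_ids)
print(not_duplicated_ids)
print(all_text_dups)
```

The ID-based entries (idFakeDups, idRealDups) and textMetaDups additionally depend on the original IDs and the meta information, which this sketch leaves out.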
library(tosca)  # assumed to provide textmeta(), deleteAndRenameDuplicates() and duplist()

# Six texts under three IDs: A holds two different texts (a fake duplicate),
# B and C each hold the same text twice
texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
  A="A fake duplicate",
  B="So Long, and Thanks for All the Fish",
  B="So Long, and Thanks for All the Fish",
  C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.",
  C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

# Meta data: the two B entries differ in title and date, the two C entries are identical
corpus <- textmeta(meta=data.frame(id=c("A", "A", "B", "B", "C", "C"),
  title=c("Fishing", "Fake duplicate", "Don't panic!", "towel day", "Sir Ronald", "Sir Ronald"),
  date=c("1885-01-02", "1885-01-03", "1979-03-04", "1979-03-05", "1951-05-06", "1951-05-06"),
  stringsAsFactors=FALSE), text=texts)

# Make the IDs unique first, then list the duplicates
duplicates <- deleteAndRenameDuplicates(object=corpus)
duplist(object=duplicates, paragraph = FALSE)
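One way to exclude duplicates before further analysis is to subset the texts by the IDs in uniqueTexts. The following is only a sketch under stated assumptions: `text` and `dups` are hypothetical stand-ins, assuming the texts are held in a named list keyed by ID and that the result is a named list with a uniqueTexts entry, as described above.

```r
# Hypothetical stand-ins, not real package output:
# 'text' mimics a named list of texts keyed by ID,
# 'dups' mimics the returned named list, reduced to uniqueTexts.
text <- list(a = "fish", b = "towel", c = "fish", d = "panic")
dups <- list(uniqueTexts = c("a", "b", "d"))

# Keep one representative per distinct text by indexing with the IDs.
text_unique <- text[dups$uniqueTexts]

print(names(text_unique))
```

With a real corpus, the matching rows of the meta data would need to be subset by the same IDs so that texts and meta information stay aligned.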