Compare the documents in corpus dtm.x with reference corpus dtm.y.
documents.compare(dtm.x, dtm.y = NULL, measure = "cosine",
  min.similarity = 0.1, n.topsim = NULL, only.from = NULL,
  return.zeros = F)
dtm.x
The main document-term matrix.
dtm.y
The 'reference' document-term matrix. If NULL, the documents of dtm.x are compared to each other.
measure
The measure used to calculate similarity/distance/adjacency. Currently only cosine is supported.
min.similarity
A threshold for similarity. Pairs with a lower similarity are dropped. Set to 0.1 by default.
n.topsim
An alternative or additional threshold for similarity. Only the [n.topsim] highest similarity scores for x are kept. More than [n.topsim] similarity scores can be returned when similarities are tied.
only.from
A vector of ids that match the documents (rownames) in dtm. Use it to compare only these documents to other documents.
return.zeros
If TRUE, all comparison results are returned, including those with zero similarity (quite possibly the worst thing to do with large data).
A data frame with sets of documents and their similarities.
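To make the roles of min.similarity and n.topsim concrete, here is a minimal base R sketch of cosine comparison between two small dense matrices. It is not the package's implementation: cosine_pairs is a hypothetical helper, the toy matrices and the output column names (x, y, similarity) are assumptions chosen for illustration, and it assumes no document is empty (an all-zero row would produce NaN).

```r
# Toy document-term matrices: rows are documents, columns are terms.
dtm.x <- matrix(c(1, 1, 0, 0,
                  0, 1, 1, 0,
                  0, 0, 0, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("x1", "x2", "x3"), c("a", "b", "c", "d")))
dtm.y <- matrix(c(1, 1, 0, 0,
                  0, 1, 1, 0,
                  0, 0, 1, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("y1", "y2", "y3"), c("a", "b", "c", "d")))

# Hypothetical helper (not part of the package): cosine similarity between
# every row of x and every row of y, returned as a long data frame.
cosine_pairs <- function(x, y, min.similarity = 0.1, n.topsim = NULL) {
  # Scale each row to unit length so the cross product yields cosines.
  unit <- function(m) m / sqrt(rowSums(m^2))
  sim <- unit(x) %*% t(unit(y))

  # One row per (x, y) pair, in column-major order of the similarity matrix.
  pairs <- data.frame(x = rownames(x)[as.vector(row(sim))],
                      y = rownames(y)[as.vector(col(sim))],
                      similarity = as.vector(sim),
                      stringsAsFactors = FALSE)

  # Drop pairs below the similarity threshold.
  pairs <- pairs[pairs$similarity >= min.similarity, ]

  if (!is.null(n.topsim)) {
    # Rank similarities within each x document, highest first. Tied scores
    # share the lowest rank, so more than n.topsim pairs can survive.
    rnk <- ave(-pairs$similarity, pairs$x,
               FUN = function(s) rank(s, ties.method = "min"))
    pairs <- pairs[rnk <= n.topsim, ]
  }

  pairs[order(pairs$x, -pairs$similarity), ]
}

print(cosine_pairs(dtm.x, dtm.y, min.similarity = 0.1, n.topsim = 2))
```

In this toy data, x2 is equally similar to y1 and y3, so with n.topsim = 2 it keeps three pairs instead of two, which is the tie behaviour described for n.topsim. Pairs with zero similarity (for example x3 with y1) never appear, because they fall below min.similarity.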