Compare all documents within a document-term matrix that are dated (e.g., published) within a given number of days (window.size) of each other.
documents.window.compare(dtm, document.date, window.size = 3,
  time.unit = "days", window.direction = "<=>", measure = "cosine",
  min.similarity = NULL, n.topsim = NULL, only.from = NULL,
  return.date = F, return.datedif = T, return.zeros = F,
  only.complete.window = F)
dtm | A document-term matrix in the tm format.
document.date | A vector of date class, with the same length and order as the documents (rows) of the dtm.
window.size | The timeframe (in days, under the default time.unit) within which articles must occur in order to be compared. For example, if 0, articles are only compared to articles of the same day. If 1, articles are compared to all articles of the previous, same, or next day.
time.unit | A string indicating what time unit to use. Can be 'mins', 'hours', 'days', 'months' or 'years'.
window.direction | Allows a more specific selection of which articles in the window to compare to. This is given as a combination of the symbols '<' (before x), '=' (simultaneous with x) and '>' (after x). The default is '<=>', which means all articles. '<>' means all articles before or after the [time.unit] of an article itself, '<' means all previous articles, and '<=' means all previous and simultaneous articles.
measure | The measure used to calculate similarity/distance/adjacency. Currently only cosine is supported.
min.similarity | A threshold for similarity. Pairs with lower values are dropped from the output.
n.topsim | An alternative or additional threshold for similarity. Only the [n.topsim] highest similarities are kept for each x.
only.from | A vector of ids that match the documents (rownames) in dtm. Use this to compare only these documents to other documents.
return.date | If TRUE, the dates for x and y are given in the output.
return.zeros | If TRUE, all comparison results are returned, including those with zero similarity (rarely advisable with large data, since every pair of documents inside a window is then returned).
only.complete.window | If TRUE, only compare articles (x) for which a full window of reference articles (y) is available. Thus, for the first and last [window.size] days, there will be no results for x.
get.overlap.terms | If TRUE, add the overlapping terms of documents to the output.
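To make the interplay of window.size and window.direction concrete, the following is a minimal base R sketch of the idea, not the package's implementation. The function names in_window and window_compare_sketch are hypothetical, the input is assumed to be a plain numeric matrix rather than a tm document-term matrix, and time differences are assumed to be whole days.

```r
# Illustrative sketch only; in_window and window_compare_sketch are
# hypothetical names and do not exist in the package.

# d is the day difference date(y) - date(x). The symbols follow the
# description of window.direction: '<' = y before x, '=' = same day,
# '>' = y after x.
in_window <- function(d, window.size = 3, window.direction = "<=>") {
  before <- grepl("<", window.direction, fixed = TRUE) & d < 0
  same   <- grepl("=", window.direction, fixed = TRUE) & d == 0
  after  <- grepl(">", window.direction, fixed = TRUE) & d > 0
  abs(d) <= window.size & (before | same | after)
}

# m: numeric matrix with documents in rows and terms in columns
# dates: Date vector in the same order as the rows of m
window_compare_sketch <- function(m, dates, window.size = 3,
                                  window.direction = "<=>") {
  norms <- sqrt(rowSums(m^2))
  sim <- (m %*% t(m)) / outer(norms, norms)          # cosine similarity
  # d[i, j] = date of document j (y) minus date of document i (x)
  d <- outer(as.numeric(dates), as.numeric(dates), function(x, y) y - x)
  keep <- in_window(d, window.size, window.direction) & sim > 0
  diag(keep) <- FALSE                                 # skip self-comparisons
  idx <- which(keep, arr.ind = TRUE)
  data.frame(x = rownames(m)[idx[, 1]],
             y = rownames(m)[idx[, 2]],
             similarity = sim[idx],
             stringsAsFactors = FALSE)
}

# Toy data: documents a, b and d fall within one day of each other,
# document c is dated more than a week later.
m <- rbind(a = c(1, 1, 0), b = c(1, 0, 1), c = c(0, 1, 1), d = c(1, 1, 0))
dates <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-10", "2020-01-01"))

res_all  <- window_compare_sketch(m, dates, window.size = 1, window.direction = "<=>")
res_prev <- window_compare_sketch(m, dates, window.size = 1, window.direction = "<")
```

In this toy example document c shares terms with the others but never appears in the output, because it lies outside every window. With window.direction = "<", only document b is kept as x, since it is the only one with earlier documents inside its window.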
A data frame with columns x, y and similarity. If return.date == T, date.x and date.y are returned as well.
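The two thresholds, min.similarity and n.topsim, can be pictured as filters on a data frame of this shape. The sketch below is a hedged illustration in base R: filter_similarities is a hypothetical helper, the input data frame is made up, and treating the min.similarity threshold as inclusive (>=) is an assumption, since the documentation only says that lower values are removed.

```r
# Illustrative sketch only; filter_similarities is a hypothetical helper,
# not a function of the package.
filter_similarities <- function(res, min.similarity = NULL, n.topsim = NULL) {
  if (!is.null(min.similarity)) {
    # Assumption: values equal to the threshold are kept
    res <- res[res$similarity >= min.similarity, , drop = FALSE]
  }
  if (!is.null(n.topsim) && nrow(res) > 0) {
    # Rank similarities within each x, highest first; ties broken by order
    rnk <- ave(-res$similarity, res$x,
               FUN = function(s) rank(s, ties.method = "first"))
    res <- res[rnk <= n.topsim, , drop = FALSE]
  }
  res
}

# Made-up result with the columns x, y and similarity
res <- data.frame(x = c("a", "a", "a", "b", "b"),
                  y = c("b", "c", "d", "a", "c"),
                  similarity = c(0.9, 0.2, 0.6, 0.9, 0.4),
                  stringsAsFactors = FALSE)

top1 <- filter_similarities(res, n.topsim = 1)
both <- filter_similarities(res, min.similarity = 0.5, n.topsim = 2)
```

Applying min.similarity first and n.topsim second means the top-n selection only considers pairs that already passed the threshold; whether the package applies them in this order is not stated in the documentation.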