View source: R/text_processing.R
Calculate the Jaccard (similarity) coefficient between words.
jaccard_coef(x, ...)

## S3 method for class 'list'
jaccard_coef(x, max.size = 1000, dist = FALSE, ...)

## S3 method for class 'character'
jaccard_coef(x, max.size = 1000,
  stopwds = unique(c(tm::stopwords(), letters)), ignore.case = TRUE,
  dist = FALSE, ...)
x | Character vector with the phrases (tweets) to be analyzed.
... | Further arguments to be passed to the method.
max.size | Maximum number of words to analyze.
dist | Logical. When TRUE, returns one minus the Jaccard coefficient (a distance instead of a similarity).
stopwds | Character vector of stopwords.
ignore.case | Logical. When TRUE, converts all text to lowercase.
The Jaccard index measures the similarity between two elements. For a given pair of elements x and y, it is calculated as
J(S, T) = |S ∩ T| / |S ∪ T|
where S is the set of groups in which x is present and T is the set of groups in which y is present. The resulting value lies between 0 and 1, where 0 means no similarity at all (the elements have no group in common) and 1 means perfect similarity (both elements are present in exactly the same groups).
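As an illustration of the formula, here is a minimal base R sketch that treats each tokenized phrase as a group and computes the index for one pair of words. The helper `jaccard_pair()` and the sample data are hypothetical and are not part of the package; `jaccard_coef()` itself may be implemented differently.

```r
# Hypothetical helper (not the package's implementation): Jaccard index
# between words x and y, where each element of `groups` is one tokenized phrase.
jaccard_pair <- function(groups, x, y) {
  # Indices of the groups that contain each word
  S    <- which(vapply(groups, function(g) x %in% g, logical(1)))
  Tset <- which(vapply(groups, function(g) y %in% g, logical(1)))

  union_size <- length(union(S, Tset))
  if (union_size == 0) return(NA_real_)  # neither word appears anywhere

  length(intersect(S, Tset)) / union_size
}

# Made-up example data: four phrases, already split into words
groups <- list(
  c("data", "science", "rocks"),
  c("data", "science"),
  c("data", "mining"),
  c("open", "science")
)

jaccard_pair(groups, "data", "science")  # 2 shared groups out of 4 -> 0.5
jaccard_pair(groups, "mining", "open")   # no group in common -> 0
```

Subtracting the result from 1 gives the distance form described for the `dist` argument.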
A list including a lower-triangular matrix of class dgCMatrix.
list: Processes a list of character vectors, such as the one obtained from tw_extract().
character: Computes the coefficient from a character vector (splits the text into words first).
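The difference between the two methods can be pictured as a preprocessing step: the character method first has to turn raw text into word vectors, which is the shape the list method already expects. The sketch below shows one way this could look in base R. The helper `tokenize_phrases()`, its splitting rule, and the short stop-word vector are assumptions made for illustration; the package's own tokenization may differ, and its default stop words come from `tm::stopwords()` plus single letters.

```r
# Hypothetical helper (not the package's implementation): split phrases into
# unique words, optionally lower-casing and dropping stop words.
tokenize_phrases <- function(x, stopwds = character(0), ignore.case = TRUE) {
  if (ignore.case) {
    x <- tolower(x)
    stopwds <- tolower(stopwds)
  }
  # Split on anything that is not alphanumeric, '#', '@' or '_'
  # (an assumed rule that keeps hashtags and mentions intact)
  lapply(strsplit(x, "[^[:alnum:]#@_]+"), function(words) {
    words <- unique(words[nzchar(words)])  # drop empty strings and repeats
    words[!(words %in% stopwds)]           # drop stop words
  })
}

phrases <- c("The data is open", "Open DATA for the people")
tokens <- tokenize_phrases(phrases, stopwds = c("the", "is", "for"))
tokens[[1]]  # "data" "open"
tokens[[2]]  # "open" "data" "people"
```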
Conover, M., Ratkiewicz, J., & Francisco, M. (2011). "Political polarization on Twitter". ICWSM, 133(26), 89–96. http://doi.org/10.1021/ja202932e