jaccard_coef: Jaccard coefficient

View source: R/text_processing.R

Description

Calculate the Jaccard (similarity) coefficient between words.

Usage

jaccard_coef(x, ...)

## S3 method for class 'list'
jaccard_coef(x, max.size = 1000, dist = FALSE, ...)

## S3 method for class 'character'
jaccard_coef(x, max.size = 1000,
  stopwds = unique(c(tm::stopwords(), letters)), ignore.case = TRUE,
  dist = FALSE, ...)

Arguments

x

Character vector with the phrases (tweets) to be analyzed.

...

Further arguments to be passed to the method.

max.size

Maximum number of words to analyze.

dist

When TRUE, computes the Jaccard distance (one minus the Jaccard coefficient) instead of the similarity.

stopwds

Character vector of stopwords.

ignore.case

When TRUE, converts all text to lowercase before processing.
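To make the roles of stopwds and ignore.case concrete, here is a minimal base R sketch of the kind of preprocessing they describe. The helper tokenize_phrases(), its splitting rule, and its short stopword list are assumptions made for illustration; this is not the package's own code, and the real default uses tm::stopwords().

```r
# Hypothetical helper (not part of twitterreport): split phrases into
# cleaned sets of words.
tokenize_phrases <- function(x, stopwds = c("the", "a", "is", letters),
                             ignore.case = TRUE) {
  # Lowercase first so that "Cat" and "cat" count as the same word
  if (ignore.case) x <- tolower(x)
  # Assumed rule: split on anything that is not a letter, digit, '#' or '@'
  words <- strsplit(x, "[^[:alnum:]#@]+")
  lapply(words, function(w) {
    w <- unique(w[nzchar(w)])   # drop empty strings and repeated words
    w[!(w %in% stopwds)]        # drop stopwords
  })
}

tokenize_phrases(c("The cat is here", "A Cat and a dog"))
```

With ignore.case = FALSE the capitalized "The" and "A" would no longer match the lowercase stopword list, which is one reason to normalize case before filtering.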

Details

The Jaccard index is used as a measure of similarity between two elements. In particular, for a given pair of elements x and y, it is calculated as

$$J(S,T) = \frac{|S \cap T|}{|S \cup T|}$$

where S is the set of groups in which x is present and T is the set of groups in which y is present. The resulting value lies between 0 and 1, where 0 corresponds to no similarity at all (the elements have no group in common) and 1 represents perfect similarity (both elements are present in exactly the same groups).
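The definition above can be worked through on a toy example. The sketch below uses base R only and treats each phrase as a group. The helpers groups_with() and jaccard_pair() and the toy data are hypothetical names introduced here; they illustrate the formula, not how jaccard_coef() is implemented internally.

```r
# Each phrase is a "group"; these toy groups are already tokenized.
groups <- list(
  c("cat", "dog"),
  c("cat", "dog", "bird"),
  c("cat", "fish"),
  c("bird")
)

# Hypothetical helper: indices of the groups that contain a given word.
groups_with <- function(word, groups) {
  which(vapply(groups, function(g) word %in% g, logical(1)))
}

# Hypothetical helper: |S intersection T| / |S union T| for words x and y.
jaccard_pair <- function(x, y, groups) {
  set_x <- groups_with(x, groups)   # S in the formula
  set_y <- groups_with(y, groups)   # T in the formula
  n_union <- length(union(set_x, set_y))
  if (n_union == 0) return(NA_real_)  # neither word appears in any group
  length(intersect(set_x, set_y)) / n_union
}

jaccard_pair("cat", "dog", groups)      # S = {1,2,3}, T = {1,2}: 2/3
jaccard_pair("dog", "fish", groups)     # no group in common: 0
1 - jaccard_pair("cat", "dog", groups)  # distance form (cf. dist = TRUE): 1/3
```

Returning NA when both sets are empty is a choice made for this sketch; the formula itself is undefined in that case.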

Value

A list that includes a lower triangular sparse matrix of class dgCMatrix.
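Because J(S,T) = J(T,S), storing only the entries below the diagonal is enough to hold every pairwise coefficient. As a rough illustration of what such an object looks like, the sketch below builds a small lower triangular dgCMatrix with the Matrix package. The words and coefficient values are made up for the example, and the structure of the list actually returned by jaccard_coef() is not shown here.

```r
library(Matrix)

words <- c("cat", "dog", "bird")

# Toy coefficients for the pairs (dog, cat), (bird, cat), (bird, dog);
# the values are illustrative, not output from jaccard_coef().
jac <- sparseMatrix(
  i = c(2, 3, 3),        # row index (always greater than the column index)
  j = c(1, 1, 2),        # column index
  x = c(2/3, 1/4, 1/3),  # coefficient stored for each (row, column) pair
  dims = c(3, 3),
  dimnames = list(words, words)
)

class(jac)          # "dgCMatrix"
jac["dog", "cat"]   # 2/3, the stored coefficient for this pair
```

Entries above the diagonal are simply not stored, so looking up a pair in the opposite order, such as jac["cat", "dog"], returns 0 rather than the coefficient.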

Methods (by class)

References

Conover, M., Ratkiewicz, J., & Francisco, M. (2011). "Political polarization on twitter". ICWSM, 133(26), 89–96. http://doi.org/10.1021/ja202932e


gvegayon/twitterreport documentation built on May 17, 2019, 9:30 a.m.