specificTerms: List terms specific to a document or level
In RcmdrPlugin.temis: Graphical Integrated Text Mining Solution

specificTerms

R Documentation

List terms specific to a document or level

Description

List terms most associated (positively or negatively) with each document or each of a variable's levels.

Usage

specificTerms(dtm, variable, p = 0.1, n.max = 25, sparsity = 0.95, min.occ = 2)

Arguments

`dtm`	a document-term matrix.
`variable`	a vector whose length is the number of rows of `dtm`, or `NULL` to report specific terms by document.
`p`	the maximum probability up to which terms should be reported.
`n.max`	the maximum number of terms to report for each level.
`sparsity`	optional sparsity threshold (between 0 and 1). Terms absent from at least this proportion of documents are skipped. See `removeSparseTerms` from tm.
`min.occ`	the minimum number of occurrences in the whole `dtm` below which terms should be skipped.
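The sketch below illustrates how `sparsity`, `min.occ` and `variable` shape the counts before any test is run. It is written in Python rather than R so that it runs without the package, and it is not the RcmdrPlugin.temis implementation. Several points are assumptions: the sparsity rule follows our reading of tm's `removeSparseTerms` (keep a term only if it appears in more than `n_docs * (1 - sparsity)` documents), both filters are applied to the document-level matrix before aggregation, and documents sharing a level are combined by summing their rows. The helper names `filter_terms` and `aggregate_by_level` are hypothetical.

```python
# Hypothetical helpers illustrating the filtering and aggregation steps;
# this is not RcmdrPlugin.temis code.

def filter_terms(dtm, terms, sparsity=0.95, min_occ=2):
    """Return (filtered_dtm, kept_terms) for a dense list-of-rows matrix."""
    n_docs = len(dtm)
    keep = []
    for j in range(len(terms)):
        column = [row[j] for row in dtm]
        doc_freq = sum(1 for count in column if count > 0)
        # Assumed sparsity rule (our reading of tm's removeSparseTerms):
        # keep terms present in more than n_docs * (1 - sparsity) documents.
        dense_enough = doc_freq > n_docs * (1 - sparsity)
        # min.occ rule: total occurrences in the whole matrix.
        frequent_enough = sum(column) >= min_occ
        if dense_enough and frequent_enough:
            keep.append(j)
    filtered = [[row[j] for j in keep] for row in dtm]
    return filtered, [terms[j] for j in keep]


def aggregate_by_level(dtm, variable):
    """Sum the rows of documents sharing the same level of `variable`."""
    totals = {}
    for row, level in zip(dtm, variable):
        previous = totals.get(level, [0] * len(row))
        totals[level] = [a + b for a, b in zip(previous, row)]
    return totals


# Toy example: 4 documents, 3 terms, 2 levels (made-up counts).
dtm = [
    [3, 0, 1],
    [2, 1, 0],
    [0, 4, 0],
    [1, 3, 0],
]
terms = ["tax", "school", "rare"]
variable = ["left", "left", "right", "right"]

filtered, kept = filter_terms(dtm, terms, sparsity=0.5, min_occ=2)
by_level = aggregate_by_level(filtered, variable)
print(kept)      # ['tax', 'school']
print(by_level)  # {'left': [5, 1], 'right': [1, 7]}
```

With `variable` set to `NULL`, no aggregation would take place, and each document row would be analysed on its own.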

Details

Specific terms reported here are those whose observed frequency in the document or level has the lowest probability under a hypergeometric distribution, given their global frequencies in the corpus and the total number of occurrences of all terms in the document or variable level considered. Whether an association is positive or negative can be seen from the sign of the t value, or by comparing the “% Term/Level” column with the “Global %” column.
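The following Python sketch shows one way to compute this test for a single term in a single level. It is not the package's R code. It assumes that “Prob.” is the smaller of the two hypergeometric tails P(X ≥ observed) and P(X ≤ observed), both of which include the observed count, and that the t value is the normal quantile of that probability, signed positive for over-represented terms. The names `hypergeom_pmf` and `specificity` and the clamping constants are illustrative choices.

```python
from math import comb
from statistics import NormalDist


def hypergeom_pmf(k, total, term_total, level_total):
    """P(X = k) when drawing level_total occurrences out of `total`,
    of which `term_total` are occurrences of the term."""
    return (comb(term_total, k) * comb(total - term_total, level_total - k)
            / comb(total, level_total))


def specificity(observed, level_total, term_total, total):
    """Return (prob, t) for one term in one level (sketch, see lead-in)."""
    support = range(0, min(term_total, level_total) + 1)
    pmf = {k: hypergeom_pmf(k, total, term_total, level_total) for k in support}
    # Both tails include the observed count itself.
    upper = sum(p for k, p in pmf.items() if k >= observed)  # P(X >= observed)
    lower = sum(p for k, p in pmf.items() if k <= observed)  # P(X <= observed)
    positive = upper <= lower
    prob = min(upper, lower)
    # Clamp so inv_cdf receives a value strictly inside (0, 1).
    q = min(max(prob, 1e-300), 1 - 1e-12)
    z = NormalDist().inv_cdf(q)  # negative when q < 0.5
    t = -z if positive else z
    return prob, t


# A term occurring 6 times in a 14-occurrence corpus, 5 of them
# in a level that has 6 occurrences in total (made-up counts).
prob, t = specificity(observed=5, level_total=6, term_total=6, total=14)
print(round(prob, 4), t > 0)  # 0.0163 True
```

The same term, seen once in a level of 8 occurrences, has the same probability (49/3003) but a negative t value, which marks it as under-represented there.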

All terms with a probability below `p` are reported, up to `n.max` terms for each level (or document).
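As a rough Python illustration of this selection rule, the hypothetical `select_terms` helper below keeps terms whose probability is strictly below `p` and truncates the list at `n.max` entries. Which terms survive the truncation, and in what order they are listed, is an assumption here (smallest probabilities first). The package may order its output differently, for example by t value.

```python
def select_terms(rows, p=0.1, n_max=25):
    """Keep rows (term, prob, t) with prob < p, at most n_max of them.

    Assumption: the most significant terms (smallest prob) are kept first.
    """
    kept = [row for row in rows if row[1] < p]
    kept.sort(key=lambda row: row[1])
    return kept[:n_max]


# Illustrative (made-up) values for one level.
rows = [("tax", 0.05, 1.64), ("school", 0.20, 0.84), ("rare", 0.01, -2.33)]
print(select_terms(rows, p=0.1, n_max=1))  # [('rare', 0.01, -2.33)]
```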

Value

A list of matrices, one for each level of the variable (or for each document if `variable` is `NULL`), with seven columns:

“% Term/Level”	the term's occurrences as a percentage of all term occurrences in the level.
“% Level/Term”	the percentage of the term's occurrences that appear in the level (rather than in other levels).
“Global %”	the term's occurrences as a percentage of all term occurrences in the corpus.
“Level”	the number of occurrences of the term in the level (“internal”).
“Global”	the number of occurrences of the term in the corpus.
“t value”	the quantile of a normal distribution corresponding to the probability “Prob.”.
“Prob.”	the probability of observing such an extreme (high or low) number of occurrences of the term in the level, under a hypergeometric distribution.
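The five descriptive columns follow directly from four counts. The hypothetical Python helper below (not part of the package) computes them for one term in one level. The “t value” and “Prob.” columns come from the hypergeometric test described under Details and are left out here.

```python
def describe_term(level_count, level_total, term_total, total):
    """Descriptive columns for one term in one level (sketch)."""
    return {
        # Term occurrences as a share of all occurrences in the level.
        "% Term/Level": 100 * level_count / level_total,
        # Share of the term's occurrences that fall in this level.
        "% Level/Term": 100 * level_count / term_total,
        # Term occurrences as a share of all occurrences in the corpus.
        "Global %": 100 * term_total / total,
        "Level": level_count,
        "Global": term_total,
    }


# Hypothetical counts: 5 of the level's 50 occurrences, 20 in a 1000-occurrence corpus.
row = describe_term(level_count=5, level_total=50, term_total=20, total=1000)
print(row)
# {'% Term/Level': 10.0, '% Level/Term': 25.0, 'Global %': 2.0, 'Level': 5, 'Global': 20}
```

Here “% Term/Level” (10%) is higher than “Global %” (2%), which would indicate a positive association between the term and the level.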

Author(s)

Milan Bouchet-Valat
