textstat_keyness: Calculate keyness statistics
In quanteda.textstats: Textual Statistics for the Quantitative Analysis of Textual Data

textstat_keyness

R Documentation

Calculate keyness statistics

Description

Calculate "keyness", a score for features that occur differentially across different categories. Here, the categories are defined by reference to a "target" document index in the dfm, with the reference group consisting of all other documents.
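To make the target/reference split concrete, here is a minimal base-R sketch (not the package's internal code) that pools all non-target documents into a reference and builds the 2x2 contingency table for a single feature. The toy matrix `counts` and the helper `feature_table()` are hypothetical and exist only for illustration.

```r
# Hypothetical document-feature counts: rows are documents, columns are features.
counts <- matrix(c(10, 2, 5,
                    3, 8, 4,
                    1, 6, 7),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("war", "peace", "nation")))

target <- 1L  # index of the target document

# Target counts vs. the summed counts of all other documents (the reference).
n_target    <- counts[target, ]
n_reference <- colSums(counts[-target, , drop = FALSE])

# 2x2 table for one feature:
#   rows = target / reference; columns = this feature / all other features
feature_table <- function(feature) {
  n11 <- n_target[[feature]]
  n12 <- sum(n_target) - n11
  n21 <- n_reference[[feature]]
  n22 <- sum(n_reference) - n21
  matrix(c(n11, n12, n21, n22), nrow = 2, byrow = TRUE,
         dimnames = list(c("target", "reference"), c(feature, "other")))
}

feature_table("war")
```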

Usage

textstat_keyness(
  x,
  target = 1L,
  measure = c("chi2", "exact", "lr", "pmi"),
  sort = TRUE,
  correction = c("default", "yates", "williams", "none"),
  ...
)

Arguments

`x`	a dfm containing the features to be examined for keyness
`target`	the document index (numeric, character or logical) identifying the document forming the "target" for computing keyness; all other documents' feature frequencies will be combined for use as a reference
`measure`	(signed) association measure to be used for computing keyness. Currently available: `"chi2"`; `"exact"` (Fisher's exact test); `"lr"` for the likelihood ratio; `"pmi"` for pointwise mutual information. Note that the "exact" test is very computationally intensive and therefore much slower than the other methods.
`sort`	logical; if `TRUE` sort features scored in descending order of the measure, otherwise leave in original feature order
`correction`	if `"default"`, Yates' correction is applied to `"chi2"`, Williams' correction is applied to `"lr"`, and no correction is applied to the `"exact"` and `"pmi"` measures. Specifying a value other than `"default"` overrides these defaults, for instance to apply the Williams correction to the `"chi2"` measure. Specifying a correction for the `"exact"` or `"pmi"` measures has no effect and produces a warning.
`...`	not used
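Each measure and correction above operates on a 2x2 table of counts per feature. The base-R sketch below computes them for one hypothetical table. It assumes the textbook formulas (Pearson chi-squared with Yates' continuity correction, G2 divided by the usual Williams factor for a 2x2 table, the conditional maximum-likelihood odds ratio reported by `fisher.test()`, and PMI as the natural log of observed over expected count), so it may differ in detail from what `textstat_keyness` computes internally.

```r
# Hypothetical 2x2 table for a single feature (target row first):
#               feature  other
#   target          n11    n12
#   reference       n21    n22
tab <- matrix(c(10,  7,
                 4, 25), nrow = 2, byrow = TRUE)

N        <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / N

# "chi2": Pearson chi-squared; correct = TRUE applies Yates' correction to 2x2 tables.
chi2_yates <- unname(chisq.test(tab, correct = TRUE)$statistic)
chi2_none  <- unname(chisq.test(tab, correct = FALSE)$statistic)

# "lr": G2 = 2 * sum(O * log(O / E)), treating 0 * log(0) as 0,
# then divided by the Williams correction factor q for a 2x2 table (assumed formula).
g2 <- 2 * sum(ifelse(tab > 0, tab * log(tab / expected), 0))
q  <- 1 + (N * sum(1 / rowSums(tab)) - 1) * (N * sum(1 / colSums(tab)) - 1) / (6 * N)
g2_williams <- g2 / q

# "exact": Fisher's exact test; its estimate is the conditional MLE of the odds ratio.
odds_ratio <- unname(fisher.test(tab)$estimate)

# "pmi": log of observed over expected count in the target/feature cell (natural log assumed).
pmi <- log(tab[1, 1] / expected[1, 1])

c(chi2_yates = chi2_yates, g2_williams = g2_williams, odds_ratio = odds_ratio, pmi = pmi)
```

Both corrections shrink their statistic toward zero: Yates subtracts 0.5 from each absolute deviation, and the Williams factor `q` is always greater than 1.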

Value

a data.frame with one row per scored feature, giving the keyness statistic, its associated p-value, and the number of occurrences of the feature in the target and in the reference group. For measure = "chi2" the statistic is the chi-squared value, signed positively if the observed frequency in the target exceeds its expected value and negatively otherwise; for measure = "exact" it is the estimate of the odds ratio; for measure = "lr" it is the likelihood ratio G2 statistic; and for measure = "pmi" it is the pointwise mutual information statistic.
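The sign convention and the effect of `sort = TRUE` can be sketched in base R: each feature's Yates-corrected chi-squared value is negated when the target's observed count falls below its expected count, and rows are then ordered by decreasing score. The counts, the helper `signed_chi2()`, and the column names are hypothetical; this illustrates the convention and is not the package's implementation.

```r
# Hypothetical per-feature counts for the target and the pooled reference group.
n_target    <- c(war = 10, peace = 2, nation = 5)
n_reference <- c(war = 4, peace = 14, nation = 11)

signed_chi2 <- function(n11, target_total, n21, reference_total) {
  tab <- matrix(c(n11, target_total - n11,
                  n21, reference_total - n21), nrow = 2, byrow = TRUE)
  expected11 <- sum(tab[1, ]) * sum(tab[, 1]) / sum(tab)
  stat <- unname(chisq.test(tab, correct = TRUE)$statistic)
  # Positive when the target uses the feature more often than expected.
  if (n11 > expected11) stat else -stat
}

scores <- mapply(signed_chi2, n_target, sum(n_target), n_reference, sum(n_reference))

keyness <- data.frame(feature = names(n_target), chi2 = scores,
                      n_target = n_target, n_reference = n_reference,
                      row.names = NULL)
keyness <- keyness[order(keyness$chi2, decreasing = TRUE), ]  # analogous to sort = TRUE
keyness
```

Features overrepresented in the target appear at the top and underrepresented ones at the bottom, which is why the package examples inspect both `head()` and `tail()` of the result.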

References

Bondi, M. & Scott, M. (eds) (2010). Keyness in Texts. Amsterdam, Philadelphia: John Benjamins.

Stubbs, M. (2010). Three Concepts of Keywords. In M. Bondi & M. Scott (eds), Keyness in Texts, 1–42. Amsterdam, Philadelphia: John Benjamins.

Scott, M. & Tribble, C. (2006). Textual Patterns: Keyword and Corpus Analysis in Language Education. Amsterdam: John Benjamins, p. 55.

Dunning, T. (1993). Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, 19(1): 61–74.

Examples

library("quanteda")
library("quanteda.textstats")

# compare pre- v. post-war terms using grouping
period <- ifelse(docvars(data_corpus_inaugural, "Year") < 1945, "pre-war", "post-war")
dfmat1 <- tokens(data_corpus_inaugural) %>%
    dfm() %>%
    dfm_group(groups = period)
head(dfmat1) # check that 'post-war' is the first document, since target defaults to 1L
head(tstat1 <- textstat_keyness(dfmat1), 10)
tail(tstat1, 10)

# compare pre- v. post-war terms using logical vector
dfmat2 <- dfm(tokens(data_corpus_inaugural))
head(textstat_keyness(dfmat2, docvars(data_corpus_inaugural, "Year") >= 1945), 10)

# compare Trump 2017 to other post-war presidents
dfmat3 <- dfm(tokens(corpus_subset(data_corpus_inaugural, period == "post-war")))
head(textstat_keyness(dfmat3, target = "2017-Trump"), 10)

# using the likelihood ratio method
head(textstat_keyness(dfm_smooth(dfmat3), measure = "lr", target = "2017-Trump"), 10)

quanteda.textstats documentation built on Sept. 11, 2024, 6:39 p.m.
