corp_coco: Co-occurrence comparison
In CorporaCoCo: Corpora Co-Occurrence Comparison

Description

Tests for statistically significant differences in co-occurrence counts between two corpora.

Usage

  corp_coco(A, B, nodes, collocates = NULL, fdr = 0.01)

  # Deprecated
  coco(A, B, nodes, fdr = 0.01, collocates = NULL)

Arguments

`A`	A `corp_cooccurrence` object. For the deprecated `coco` function this is a `data.frame` of co-occurrence counts as returned by `corp_get_counts`.
`B`	A `corp_cooccurrence` object. For the deprecated `coco` function this is a `data.frame` of co-occurrence counts as returned by `corp_get_counts`.
`nodes`	A `character vector` of node types or a `character string` representing a single node type.
`collocates`	A `character vector` of collocate types or a `character string` representing a single collocate type. `collocates` acts as a filter on the `y` column of the returned data structure. Use it to target the testing: carrying out fewer tests reduces the loss of power caused by the multiple-testing correction.
`fdr`	The desired level at which to control the False Discovery Rate. Default value is `0.01`.

Details

The corp_coco function implements the method introduced in Wiegand, Hennessey, et al. (2017a), which is described in more detail, from a linguistic perspective, in Wiegand (2019).

Because the method carries out a large number of tests, the results are corrected for multiple testing. fdr sets the level at which the False Discovery Rate is controlled. For a description of the form of FDR used, see Benjamini and Hochberg (1995). For a description of the p_adjusted column in the returned structure, see p.adjust.
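
The interaction between the number of tests and the FDR correction can be sketched with base R alone. The p-values below are invented for illustration, and the call to p.adjust with method = "BH" is an assumption about how a Benjamini-Hochberg adjustment could be reproduced by hand, not a statement of the package's internal code. The sketch shows the same raw p-value surviving the correction when it is one of 10 tests but not when it is one of 1000, which is the motivation for narrowing the tests with collocates.

```r
# Hypothetical raw p-values: one interesting test among many uninteresting ones.
p_focus <- 0.0004
p_many  <- c(p_focus, rep(0.5, 999))  # 1000 tests in total
p_few   <- c(p_focus, rep(0.5, 9))    # 10 tests in total

# Benjamini-Hochberg adjustment from the stats package.
adj_many <- p.adjust(p_many, method = "BH")
adj_few  <- p.adjust(p_few,  method = "BH")

fdr <- 0.01  # same value as the default of the fdr argument

# Does the test of interest pass at the chosen FDR level?
passes_many <- adj_many[1] <= fdr
passes_few  <- adj_few[1]  <= fdr

print(c(many = adj_many[1], few = adj_few[1]))
print(c(many = passes_many, few = passes_few))
```

With fewer tests the adjusted p-value is smaller, so a difference of the same strength is more likely to be reported.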

The returned data structure is a data.table. A data.table is also a data.frame and will behave exactly as one if the data.table package is not loaded.

The returned data.table contains details of all the co-occurrences for which there is evidence of a difference in co-occurrence between the two supplied data sets. The effect size is calculated as the log base 2 of the odds ratio. The effect size and its confidence interval are captured in the effect_size, CI_lower and CI_upper columns. The p_value column contains the non-adjusted p-value from Fisher's exact test.
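
As a rough illustration of the quantities described above, the following base R sketch runs Fisher's exact test on a single 2x2 table and converts the odds ratio and its confidence interval to log base 2. The counts are invented, the reading of H and M as counts of co-occurrences and non-co-occurrences is an assumption, and the default 95% confidence level is used only because the documentation does not state which level the package applies. It is a sketch of the idea, not the package's implementation.

```r
# Hypothetical counts for one node/collocate pair (invented for illustration).
# Assumption: H_* = times the collocate occurred near the node,
#             M_* = times it did not.
H_A <- 30L; M_A <- 70L
H_B <- 10L; M_B <- 90L

# 2x2 contingency table: rows are hit/miss, columns are the two corpora.
tab <- matrix(c(H_A, M_A, H_B, M_B), nrow = 2,
              dimnames = list(c("hit", "miss"), c("A", "B")))

# Fisher's exact test from the stats package; it reports a conditional
# maximum likelihood estimate of the odds ratio and a confidence interval.
ft <- fisher.test(tab)

p_value     <- ft$p.value
effect_size <- log2(unname(ft$estimate))  # log2 odds ratio
ci          <- log2(ft$conf.int)          # lower and upper bounds on the log2 scale

print(c(effect_size = effect_size, CI_lower = ci[1], CI_upper = ci[2],
        p_value = p_value))
```

Working on the log2 scale makes the effect size symmetric around zero: an odds ratio of 2 becomes 1 and an odds ratio of 1/2 becomes -1.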

Value

A data.table of the form

    Classes ‘data.table’ and 'data.frame': 11 variables:
     $ x           : chr
     $ y           : chr
     $ H_A         : int
     $ M_A         : int
     $ H_B         : int
     $ M_B         : int
     $ effect_size : num
     $ CI_lower    : num
     $ CI_upper    : num
     $ p_value     : num
     $ p_adjusted  : num
     - attr(*, "sorted")= chr  "x" "y"
     - attr(*, ".internal.selfref")=<externalptr> 
     - attr(*, "coco_metadata")=List of 5
      ..$ nodes      : chr
      ..$ collocates : chr
      ..$ fdr        : num
      ..$ PACKAGE_VERSION:Classes 'package_version', 'numeric_version'
      .. ..$ : int
      ..$ date  : Date, format: "2016-11-01"
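
A result of this shape can be explored with ordinary data.frame operations. The sketch below builds a small stand-in object by hand, carrying only a subset of the documented columns and two of the documented metadata entries; all values are invented for illustration and nothing here calls the package itself.

```r
# Stand-in for a returned result (hypothetical values, subset of columns).
res <- data.frame(
  x           = c("back", "back", "eyes"),
  y           = c("her", "his", "his"),
  effect_size = c(1.2, -2.5, 0.8),
  p_adjusted  = c(0.004, 0.0001, 0.009),
  stringsAsFactors = FALSE
)
attr(res, "coco_metadata") <- list(nodes = c("back", "eyes"), fdr = 0.01)

# Read the stored settings back before subsetting, since subsetting a
# data.frame is not guaranteed to keep custom attributes.
meta <- attr(res, "coco_metadata")

# Order rows by the magnitude of the effect, largest first.
ordered <- res[order(-abs(res$effect_size)), ]

print(ordered)
print(meta$fdr)
```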

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.

Wiegand, V., Hennessey, A., Tench, C. R., & Mahlberg, M. (2017a, May 24). Comparing co-occurrences between corpora. 38th ICAME conference, Charles University, Prague.

Wiegand, V. (2019). A Corpus Linguistic Approach to Meaning-Making Patterns in Surveillance Discourse [PhD, University of Birmingham]. https://etheses.bham.ac.uk/id/eprint/9778

CorporaCoCo documentation built on Aug. 8, 2022, 5:09 p.m.