corp_coco: Co-occurrence comparison

Description

Calculates statistically significant differences in co-occurrence counts between two data sets.

Usage

  corp_coco(A, B, nodes, collocates = NULL, fdr = 0.01)

  # Deprecated
  coco(A, B, nodes, fdr = 0.01, collocates = NULL)

Arguments

A

A corp_cooccurrence object for the first of the two data sets to be compared. For the deprecated coco function, this is a data.frame of co-occurrence counts as returned by corp_get_counts.

B

A corp_cooccurrence object for the second of the two data sets to be compared. For the deprecated coco function, this is a data.frame of co-occurrence counts as returned by corp_get_counts.

nodes

A character vector of node types, or a character string representing a single node type.

collocates

A character vector of collocate types, or a character string representing a single collocate type. The collocates act as a filter on the y column of the returned data structure. Use collocates to target the testing: reducing the number of tests reduces the loss of statistical power caused by the multiple-testing correction.

fdr

The desired level at which to control the False Discovery Rate. Default value is 0.01.

Details

The corp_coco function implements the method introduced in Wiegand et al. (2017a), which is described in more detail from a linguistic perspective in Wiegand (2019).

Because the method carries out a large number of tests, the False Discovery Rate (FDR) is controlled at the level given by fdr. For a description of the form of FDR used, see Benjamini and Hochberg (1995). For a description of the p_adjusted column in the returned structure, see p.adjust.
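The benefit of targeting the tests with collocates can be seen directly with the Benjamini-Hochberg adjustment in base R's p.adjust. The following sketch uses invented p-values for two node-collocate pairs of interest, plus three further pairs that an unfiltered run would also test. It assumes that a pair counts as significant when its adjusted p-value is below fdr. The variable names are illustrative only.

```r
# Invented p-values: two targeted node-collocate pairs, plus three further
# pairs that would also be tested if no collocates filter were given.
p_targeted <- c(0.001, 0.006)
p_all      <- c(p_targeted, 0.03, 0.2, 0.5)

fdr <- 0.01

# Benjamini-Hochberg adjustment over 5 tests vs. over 2 tests
adj_all      <- p.adjust(p_all, method = "BH")
adj_targeted <- p.adjust(p_targeted, method = "BH")

adj_all[1:2]            # 0.005 0.015: only the first pair is below fdr
adj_targeted            # 0.002 0.006: both pairs are below fdr

sum(adj_all[1:2] < fdr)   # pairs of interest retained without the filter
sum(adj_targeted < fdr)   # pairs of interest retained with the filter
```

Each Fisher test uses only its own pair's counts, so dropping untargeted pairs leaves the raw p-values unchanged. Only the multiple-testing adjustment becomes less severe.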

The returned data structure is a data.table. A data.table is also a data.frame and behaves exactly as one if the data.table package is not loaded.

The returned data.table contains details of all the co-occurrences for which there is evidence of a difference in co-occurrence between the two supplied data sets. The effect size is calculated as the base-2 logarithm of the odds ratio. The effect size and its confidence interval are given in the effect_size, CI_lower and CI_upper columns. The p_value column contains the unadjusted p-value from Fisher's exact test.
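The per-pair computation described above can be sketched in base R with fisher.test and p.adjust. This sketch uses a hypothetical helper, coco_sketch, and is not the CorporaCoCo implementation. It rests on several assumptions. First, H_A and H_B are assumed to count the co-occurrences of node x with collocate y ("hits"). Second, M_A and M_B are assumed to count the co-occurrences of x with all other types ("misses"). Third, the effect size is taken as log2 of the conditional maximum-likelihood odds ratio and confidence interval that fisher.test reports. Finally, a row is assumed to be kept when p_adjusted < fdr. The input counts are invented for illustration.

```r
# Hypothetical helper: a base-R sketch of the comparison, NOT CorporaCoCo's code.
# Assumption: H_* = co-occurrences of node x with collocate y ("hits"),
#             M_* = co-occurrences of x with all other types ("misses").
coco_sketch <- function(counts, fdr = 0.01) {
  per_pair <- lapply(seq_len(nrow(counts)), function(i) {
    r <- counts[i, ]
    # 2x2 table: rows = data set A / B, columns = hits / misses
    tab <- matrix(c(r$H_A, r$M_A, r$H_B, r$M_B), nrow = 2, byrow = TRUE)
    ft <- fisher.test(tab)
    data.frame(
      # log2 odds ratio; positive means relatively more co-occurrence in A
      effect_size = log2(unname(ft$estimate)),
      CI_lower    = log2(ft$conf.int[1]),
      CI_upper    = log2(ft$conf.int[2]),
      p_value     = ft$p.value            # unadjusted Fisher p-value
    )
  })
  out <- cbind(counts, do.call(rbind, per_pair))
  # Benjamini-Hochberg adjustment across all tests carried out
  out$p_adjusted <- p.adjust(out$p_value, method = "BH")
  # Assumption: keep pairs whose adjusted p-value is below fdr
  out[out$p_adjusted < fdr, , drop = FALSE]
}

# Invented counts for one node ("time") and two collocates
counts <- data.frame(
  x   = c("time", "time"),
  y   = c("long", "the"),
  H_A = c(40L, 10L), M_A = c(960L, 990L),
  H_B = c(5L, 11L),  M_B = c(995L, 989L),
  stringsAsFactors = FALSE
)
res <- coco_sketch(counts, fdr = 0.01)
res
```

The sketch tests one pair at a time for readability. It returns the same columns, in the same order, as the Value listing, but as a plain data.frame rather than a data.table.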

Value

A data.table of the form

    Classes ‘data.table’ and 'data.frame': 11 variables:
     $ x           : chr
     $ y           : chr
     $ H_A         : int
     $ M_A         : int
     $ H_B         : int
     $ M_B         : int
     $ effect_size : num
     $ CI_lower    : num
     $ CI_upper    : num
     $ p_value     : num
     $ p_adjusted  : num
     - attr(*, "sorted")= chr  "x" "y"
     - attr(*, ".internal.selfref")=<externalptr> 
     - attr(*, "coco_metadata")=List of 5
      ..$ nodes      : chr
      ..$ collocates : chr
      ..$ fdr        : num
      ..$ PACKAGE_VERSION:Classes 'package_version', 'numeric_version'
      .. ..$ : int
      ..$ date  : Date, format: "2016-11-01"

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.

Wiegand, V., Hennessey, A., Tench, C. R., & Mahlberg, M. (2017a, May 24). Comparing co-occurrences between corpora. 38th ICAME conference, Charles University, Prague.

Wiegand, V. (2019). A Corpus Linguistic Approach to Meaning-Making Patterns in Surveillance Discourse [PhD, University of Birmingham]. https://etheses.bham.ac.uk/id/eprint/9778


CorporaCoCo documentation built on Aug. 8, 2022, 5:09 p.m.