Calculates statistically significant differences in co-occurrence counts.
The desired level at which to control the False Discovery Rate.
Default value is
This function implements the method described in Hennessey and Wiegand (2017).
A and B are data.frames that encapsulate the co-occurrence counts for the
(x, y) term pairs within a corpus. For a description of the
columns see the details section of the
The nodes essentially act as a filter on the A$x and B$x columns. For a description of the use of nodes see Hennessey and Wiegand (2017).
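The filtering role of nodes can be pictured with a small base R sketch. The data below are invented, and the H and M column names are an assumption made by analogy with the H_A and M_A columns of the returned structure; only x and y are named in the text above. This illustrates the idea and is not the package's internal code.

```r
# Hypothetical co-occurrence counts for one corpus.
# Column names H and M are assumed for illustration only.
A <- data.frame(
  x = c("back", "back", "down", "eye"),
  y = c("her", "his", "sat", "his"),
  H = c(12L, 7L, 5L, 3L),
  M = c(88L, 93L, 45L, 27L),
  stringsAsFactors = FALSE
)

# Hypothetical set of node terms
nodes <- c("back", "eye")

# Keep only the rows whose x term is one of the nodes
A_filtered <- A[A$x %in% nodes, ]
print(A_filtered)
```

The same subsetting would apply to B$x, so that both data sets are compared on the same node terms.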
fdr indicates the level at which the False Discovery Rate will be
controlled. For a description of the form of FDR used see
Benjamini and Hochberg (1995). For a description of the use of FDR in
this context see Hennessey and Wiegand (2017). For a description of the
p_adjusted column in the returned structure see
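As a rough illustration of Benjamini and Hochberg (1995) style FDR control, the sketch below adjusts a vector of made-up p-values with base R's p.adjust and checks the result against a manual calculation. The p-values, the fdr value and the "less than or equal" comparison are assumptions for the example; whether the package applies the adjustment in exactly this way is not confirmed here.

```r
# Hypothetical unadjusted p-values from five tests
p_value <- c(0.001, 0.008, 0.039, 0.041, 0.60)
fdr <- 0.05  # illustrative level, not necessarily the package default

# Benjamini-Hochberg adjustment using base R
p_adjusted <- p.adjust(p_value, method = "BH")

# The same adjustment by hand: take the p-values from largest to smallest,
# scale each by n / rank, then take a running minimum so that the adjusted
# values never increase as the raw p-values decrease
n <- length(p_value)
o <- order(p_value, decreasing = TRUE)
manual <- numeric(n)
manual[o] <- pmin(1, cummin(p_value[o] * n / (n:1)))

# Tests declared significant at the chosen FDR level
significant <- p_adjusted <= fdr
print(data.frame(p_value, p_adjusted, significant))
```

With this approach the expected proportion of false discoveries among the rows flagged as significant is held at or below the chosen level.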
The returned data structure is a data.table. A data.table is also a
data.frame and will behave exactly as such if the
data.table library is not loaded.
The data.table contains details of all the
co-occurrences for which there is evidence of a difference in
co-occurrence between the two supplied data sets.
The effect size is calculated as the log base 2 of the odds ratio.
The effect size and its confidence interval are captured in the
effect_size, CI_lower and CI_upper columns.
The p_value column contains the non-adjusted p-value from the
Fisher's Exact Test.
For more details see Hennessey and Wiegand (2017).
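To make the effect size and p-value columns more concrete, here is a minimal base R sketch for a single term pair. The 2x2 table of counts and its row and column labels are invented for illustration, and the orientation of the odds ratio is an assumption; the package may arrange its table differently. Note that fisher.test reports the conditional maximum likelihood estimate of the odds ratio, which is close to but not identical to the simple cross-product ratio.

```r
# Hypothetical 2x2 table for one (x, y) pair: rows are the two corpora,
# columns are "co-occurrence present" and "co-occurrence absent"
counts <- matrix(
  c(12, 88,
     3, 97),
  nrow = 2, byrow = TRUE,
  dimnames = list(corpus = c("A", "B"), outcome = c("present", "absent"))
)

# Fisher's Exact Test from base R (stats package)
ft <- fisher.test(counts)

# Effect size as log base 2 of the odds ratio, with its confidence interval
effect_size <- log2(unname(ft$estimate))
CI_lower <- log2(ft$conf.int[1])
CI_upper <- log2(ft$conf.int[2])

# Non-adjusted p-value
p_value <- ft$p.value

print(c(effect_size = effect_size, CI_lower = CI_lower,
        CI_upper = CI_upper, p_value = p_value))
```

On the log base 2 scale an effect size of 1 corresponds to a doubling of the odds and -1 to a halving, so values are symmetric around zero.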
For an example of usage see the ‘Proof of Concept’ vignette.
data.table of the form
Classes ‘data.table’ and 'data.frame': 11 variables:
 $ x          : chr
 $ y          : chr
 $ H_A        : int
 $ M_A        : int
 $ H_B        : int
 $ M_B        : int
 $ effect_size: num
 $ CI_lower   : num
 $ CI_upper   : num
 $ p_value    : num
 $ p_adjusted : num
 - attr(*, "sorted")= chr "x" "y"
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "coco_metadata")=List of 4
  ..$ nodes          : chr
  ..$ fdr            : num
  ..$ PACKAGE_VERSION:Classes 'package_version', 'numeric_version'
  .. ..$ : int
  ..$ date           : Date, format: "2016-11-01"
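The coco_metadata attribute shown in the listing can be read with base R's attr. The sketch below uses a hand-built stand-in object rather than a real result; the values stored in it are invented, and the PACKAGE_VERSION element is left out for brevity.

```r
# Stand-in for a returned result; only the attribute layout follows the listing
result <- data.frame(x = "back", y = "his", stringsAsFactors = FALSE)
attr(result, "coco_metadata") <- list(
  nodes = c("back"),             # hypothetical node terms
  fdr   = 0.01,                  # hypothetical FDR level
  date  = as.Date("2016-11-01")
)

# Retrieve the metadata list and inspect individual elements
meta <- attr(result, "coco_metadata")
print(meta$nodes)
print(meta$fdr)
print(format(meta$date))
```

Reading the metadata this way records which nodes and FDR level produced a given result without re-running the analysis.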
Y. Benjamini and Y. Hochberg (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 289–300.
A. Hennessey, V. Wiegand, C. R. Tench and M. Mahlberg (2017) Comparing co-occurrences between corpora. In preparation.