collex_fye: Iterative _Fisher-Yates Exact_ test for...


View source: R/corplingr_collex_fye.R

Description

This is a vectorised wrapper for the dhyper function in the stats package. The implementation is adapted from Gries (2012). collex_fye also provides a logical argument, two_sided; when two_sided is TRUE, its value is passed to the alternative argument of the embedded fisher.test.
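The relation between dhyper and fisher.test mentioned above can be illustrated with a minimal sketch. It is not the package's own code: the counts are invented, and the layout of the 2-by-2 table (collocate vs. other words in the rows, node vs. other contexts in the columns) is an assumption made for illustration.

```r
# Illustrative counts only (not taken from any corpus)
a           <- 12      # co-occurrences of the collocate and the node
n_w_in_corp <- 150     # total frequency of the collocate
n_pattern   <- 400     # total frequency of the node/construction
corpus_size <- 100000  # corpus size in word tokens

# One-sided (upper-tail) hypergeometric probability: P(X >= a)
p_dhyper <- sum(stats::dhyper(a:min(n_w_in_corp, n_pattern),
                              m = n_w_in_corp,
                              n = corpus_size - n_w_in_corp,
                              k = n_pattern))

# The same test through fisher.test() on the full 2-by-2 table
tab <- matrix(c(a,             n_w_in_corp - a,
                n_pattern - a, corpus_size - n_w_in_corp - n_pattern + a),
              nrow = 2, byrow = TRUE)
p_fisher <- stats::fisher.test(tab, alternative = "greater")$p.value

# The two p-values should agree up to floating-point error
all.equal(p_dhyper, p_fisher)
```

Summing dhyper over the tail avoids building a matrix for every collocate, which is presumably why a dhyper-based approach suits row-by-row use over a large table.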

Usage

collex_fye(
  a = "frequency of co-occurrence of the collocate and the node",
  a_exp = "expected frequency",
  n_w_in_corp = "total frequency of collexemes/collocates in the whole corpus",
  corpus_size = "total size of the corpus",
  n_pattern = "total frequency of the construction/node word in the whole corpus",
  two_sided = FALSE,
  collstr_res = TRUE,
  float = 3
)

Arguments

a

cell a in a 2-by-2 crosstabulation matrix, i.e. the number of co-occurrence tokens of the levels of the two variables (for instance, word-word co-occurrences or word-construction co-occurrences).

a_exp

expected frequency for cell a in the 2-by-2 crosstabulation matrix

n_w_in_corp

the total frequency of the collexemes/collocates of the target construction/node word in the corpus.

corpus_size

the total size (in word tokens) of the corpus.

n_pattern

the total frequency of occurrence of the target construction/node word in the corpus.

two_sided

logical; whether to perform a one-sided test (FALSE, the default) or a two-sided test (TRUE).

collstr_res

logical; whether to output the FYE p-value as the Collostruction Strength value (TRUE, the default) or to report the plain p-value (FALSE).

float

the number of floating (decimal) digits of the Collostruction/Collocation Strength. The default value is 3.
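How the frequency arguments relate to the 2-by-2 crosstabulation can be sketched as follows. The counts are invented, the table orientation is an assumption, and the expected frequency is computed with the usual independence formula (row total times column total divided by the grand total); whether the package derives a_exp in exactly this way is not confirmed by this page.

```r
# Illustrative counts only (not taken from any corpus)
a           <- 12
n_w_in_corp <- 150
n_pattern   <- 400
corpus_size <- 100000

# Rows: the collocate vs. all other words; columns: the node vs. other contexts
tab <- matrix(c(a,             n_w_in_corp - a,
                n_pattern - a, corpus_size - n_w_in_corp - n_pattern + a),
              nrow = 2, byrow = TRUE,
              dimnames = list(word    = c("collocate", "other_words"),
                              context = c("node", "other_contexts")))

# Expected frequencies under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Expected frequency for cell a, i.e. n_w_in_corp * n_pattern / corpus_size
a_exp <- unname(expected[1, 1])
a_exp
```

Only cell a and the three totals are needed as inputs; the remaining cells follow by subtraction.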

Value

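Numeric vector of the same length as a, interpreted as the Collostruction Strength of the construction/node word with the collexemes/collocates. Collostruction Strength is (i) the negative base-10 logarithm of the Fisher-Yates Exact test p-value when a > a_exp, and (ii) the positive base-10 logarithm of that p-value when a <= a_exp. Because a p-value never exceeds 1, case (i) yields values of zero or above (attraction) and case (ii) yields values of zero or below (repulsion).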
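The sign convention described above can be written as a short sketch. The helper collstr_from_p is hypothetical and not part of corplingr, and treating float as the number of digits passed to round() is an assumption.

```r
# Hypothetical helper (not part of corplingr): turn p-values into
# signed Collostruction Strength values
collstr_from_p <- function(p, a, a_exp, float = 3) {
  # -log10(p) when observed exceeds expected, log10(p) otherwise
  strength <- ifelse(a > a_exp, -log10(p), log10(p))
  # assumption: 'float' is the number of decimal places to keep
  round(strength, float)
}

# Same p-value, opposite directions of association
res <- collstr_from_p(p = c(0.001, 0.001), a = c(12, 1), a_exp = c(0.6, 5))
res
# 3 -3
```

With this convention the sign encodes the direction of the association and the magnitude encodes its strength, so results for attracted and repelled items can be ranked on one scale.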

Examples

## Not run: 
library(corplingr)
library(tidyverse)

# search for collocates using the "corpus_path" input option
df <- colloc_default(corpus_path = orti_bali_path,
                     pattern = "^nuju$",
                     window = "b", # both left and right context windows
                     span = 3)     # retrieve 3 collocates to the left and right of the node

# prepare the collexeme analysis input tibble,
# focusing on the R1 and R2 collocates
collex_tb <- collex_prepare(df, span = c("r1", "r2"))

# run the Fisher-Yates Exact (FYE) test row by row with purrr's pmap_dbl();
# this runs the one-tailed test and returns the log10-transformed p-value
# as the Collostruction Strength
collex_tb <- mutate(collex_tb,
                    collstr = purrr::pmap_dbl(list(a, a_exp, n_w_in_corp, corpus_size, n_pattern),
                                              collex_fye, two_sided = FALSE, collstr_res = TRUE))

# preview the results
collex_tb

## End(Not run)
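Because the example above needs a corpus and is not run, the row-by-row pattern can also be tried on a toy table in base R. This is only a sketch: fye_one_sided is a hypothetical stand-in for collex_fye, the counts are invented, and the choice of tail (upper when a > a_exp, lower otherwise) is an assumption about the one-sided test.

```r
# Hypothetical stand-in for collex_fye(): one-sided p-value only
fye_one_sided <- function(a, a_exp, n_w_in_corp, corpus_size, n_pattern) {
  if (a > a_exp) {
    # attraction: P(X >= a)
    stats::phyper(a - 1, n_w_in_corp, corpus_size - n_w_in_corp, n_pattern,
                  lower.tail = FALSE)
  } else {
    # repulsion: P(X <= a)
    stats::phyper(a, n_w_in_corp, corpus_size - n_w_in_corp, n_pattern)
  }
}

# Toy input table with invented counts, one row per collocate
toy <- data.frame(
  a           = c(12, 1),
  n_w_in_corp = c(150, 2000),
  corpus_size = 100000,
  n_pattern   = 400
)
toy$a_exp <- toy$n_w_in_corp * toy$n_pattern / toy$corpus_size

# Apply the scalar function to each row, as pmap_dbl() does in the example above
toy$p_fye <- mapply(fye_one_sided,
                    toy$a, toy$a_exp, toy$n_w_in_corp,
                    toy$corpus_size, toy$n_pattern)
toy
```

mapply plays the role of purrr::pmap_dbl here: it walks the columns in parallel and calls the function once per row.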

gederajeg/corplingr documentation built on Dec. 20, 2021, 9:50 a.m.