corplingr: Tidy Concordances, Collocates, and Wordlist


View source: R/corplingr_collex_prepare_leipzig.R

This function processes the output of `colloc_leipzig()` into the tidy data frame that `collex_fye()` requires as input. `collex_fye()` computes collexeme/collocate strength using the one-tailed Fisher-Yates Exact test.
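The one-tailed Fisher-Yates Exact test can be sketched in base R as below. This is a minimal sketch, assuming that `collex_fye()` follows the usual collostructional convention: the hypergeometric p-value is taken in the observed direction, attraction when `a` exceeds its expected value and repulsion otherwise, and is reported as a signed log10 value. The helper `fye_collstr_sketch()` is hypothetical and is not the package's own function; its argument names only mirror those passed to `collex_fye()` in the example.

```r
# Hypothetical sketch of a one-tailed Fisher-Yates Exact (FYE) collostruction
# strength; NOT the corplingr implementation of collex_fye().
#   a        : co-occurrence frequency of collocate and node
#   n_corpus : corpus size
#   n_coll   : corpus frequency of the collocate
#   n_cxn    : frequency of the node pattern
fye_collstr_sketch <- function(a, n_corpus, n_coll, n_cxn) {
  # expected co-occurrence frequency if collocate and node were independent
  a_exp <- n_coll * n_cxn / n_corpus
  # hypergeometric tail probabilities (vectorised over all arguments)
  p_attraction <- stats::phyper(a - 1, n_coll, n_corpus - n_coll, n_cxn,
                                lower.tail = FALSE)                   # P(X >= a)
  p_repulsion  <- stats::phyper(a, n_coll, n_corpus - n_coll, n_cxn)  # P(X <= a)
  # one-tailed p-value in the observed direction, as a signed log10 value
  # (positive = attraction, negative = repulsion)
  ifelse(a > a_exp, -log10(p_attraction), log10(p_repulsion))
}

# toy figures (hypothetical, for illustration only)
fye_collstr_sketch(a = 10, n_corpus = 1000, n_coll = 50, n_cxn = 40)
```

For the attraction case, `phyper()` yields the same p-value as `fisher.test(..., alternative = "greater")` on the corresponding 2x2 table, but it is vectorised, so a function like this can be called inside `dplyr::mutate()` on a whole column at once.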

collex_prepare_leipzig(
  list_output = NULL,
  leipzig_wordlist_path = leipzig_mywordlist_path,
  node_pattern = "regex for the node word",
  span = NULL,
  stopwords_list = NULL
)

`list_output`	The list output of `colloc_leipzig`.
`leipzig_wordlist_path`	Full path to the wordlist table of each Leipzig corpus file. The table can be a plain-text file or an .RData file.
`node_pattern`	Regex pattern for the node word, as specified in `colloc_leipzig`.
`span`	Character vector of the context-window span(s) to focus on in the collexeme/collocate analysis, e.g., a single span (`"l1"`, `"r1"`) or multiple spans (`c("r1", "r2")`).
`stopwords_list`	A character vector of stopwords.

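A tibble (data frame). In the example below, its columns `a`, `corpus_size`, `n_w_in_corp`, and `n_pattern` are passed to `collex_fye()`.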
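To make the expected input of `collex_fye()` concrete, the toy sketch below builds a table with the four columns the example passes to it. It rests on assumptions not stated in this documentation: that `a` counts co-occurrences of each collocate with the node in the selected span(s), that `n_w_in_corp` is the collocate's frequency in the wordlist, that `n_pattern` is the node frequency (supplied here through the hypothetical argument `n_node`), and that `corpus_size` is the total token count of the wordlist. The function `prepare_collex_sketch()`, its input columns `w`, `span`, `word`, and `n`, and the data are all hypothetical; this is not how `collex_prepare_leipzig()` reads Leipzig files.

```r
# Hypothetical sketch; NOT the implementation of collex_prepare_leipzig().
prepare_collex_sketch <- function(colloc, wordlist, n_node, span,
                                  stopwords = NULL) {
  # keep only the requested context-window span(s), e.g. "r1" or c("r1", "r2")
  colloc <- colloc[colloc$span %in% span, , drop = FALSE]
  # optionally drop stopwords from the collocates
  if (!is.null(stopwords)) {
    colloc <- colloc[!colloc$w %in% stopwords, , drop = FALSE]
  }
  # a: how often each collocate co-occurs with the node
  counts <- table(colloc$w)
  out <- data.frame(w = names(counts), a = as.integer(counts),
                    stringsAsFactors = FALSE)
  # n_w_in_corp: overall frequency of each collocate in the wordlist
  # (NA if the collocate is missing from the wordlist)
  out$n_w_in_corp <- wordlist$n[match(out$w, wordlist$word)]
  # n_pattern: node frequency; corpus_size: total tokens in the wordlist
  out$n_pattern <- n_node
  out$corpus_size <- sum(wordlist$n)
  out
}

# toy inputs (hypothetical, for illustration only)
colloc <- data.frame(w = c("hidup", "tahun", "hidup", "masa", "yang"),
                     span = c("r1", "r1", "r1", "r2", "r1"),
                     stringsAsFactors = FALSE)
wordlist <- data.frame(word = c("hidup", "tahun", "masa", "yang"),
                       n = c(120, 300, 80, 500),
                       stringsAsFactors = FALSE)
tb <- prepare_collex_sketch(colloc, wordlist, n_node = 4,
                            span = "r1", stopwords = "yang")
tb
```

Under this sketch, a collocate absent from the wordlist gets `NA` in `n_w_in_corp`, which illustrates why dropping rows with `NA` before computing the association measure can be necessary.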

## Not run: 
# retrieve collocates for a given word
rgx <- "\\bmengakhir\\b"
coll_df <- colloc_leipzig(leipzig_path = leipzig_corpus_path,
                          pattern = rgx,
                          window = "r",
                          span = 4,
                          save_results = FALSE,
                          to_lower_colloc = TRUE)

# get only the collocates output
list_output <- coll_df$collocates

# collstr analysis for collocates from Leipzig Corpora
# prepare the input table for the collexeme analysis with collex_prepare_leipzig()
collex_tb <- collex_prepare_leipzig(list_output = coll_df,
                                   leipzig_wordlist_path = leipzig_mywordlist_path,
                                   node_pattern = rgx,
                                   span = c("r1"),
                                   stopwords_list = NULL)
# remove rows that contain any NA value
collex_tb <- dplyr::filter(collex_tb,
                           dplyr::if_all(dplyr::everything(), ~ !is.na(.x)))

# compute one-tailed FYE for collexeme analysis
collex_tb <- dplyr::mutate(collex_tb,
                          collstr = collex_fye(a = .data$a,                  # co-occurrence frequency
                                               n_corpus = .data$corpus_size, # corpus size
                                               n_coll = .data$n_w_in_corp,   # collocate frequency in the corpus
                                               n_cxn = .data$n_pattern))     # node pattern frequency

# sort in decreasing order by collostruction strength
dplyr::arrange(collex_tb, dplyr::desc(collstr))

## End(Not run)