colloc_leipzig: Generate tidyverse-style window-span collocates for the...
In gederajeg/corplingr: Tidy Concordances, Collocates, and Wordlist


View source: R/corplingr_colloc_leipzig.R

The function retrieves window-span collocates of a search pattern from Leipzig Corpora files and returns them as tibbles.

colloc_leipzig(
  leipzig_path = NULL,
  leipzig_corpus_list = NULL,
  pattern = NULL,
  window = "b",
  span = 2,
  case_insensitive = TRUE,
  to_lower_colloc = TRUE,
  save_results = FALSE,
  coll_output_name = "colloc_tidy_colloc_out.txt",
  sent_output_name = "colloc_tidy_sent_out.txt"
)

`leipzig_path`	character vector of either (i) the file names of the Leipzig corpus files, if they are in the working directory, or (ii) the complete file path to each Leipzig corpus file.
`leipzig_corpus_list`	use this argument if each Leipzig corpus file has already been loaded as an R object and stored as an element of a list. An example of this type of input is available via `data("demo_corpus_leipzig")`. Specify either `leipzig_path` OR `leipzig_corpus_list`, and leave the other as `NULL`.
`pattern`	regular expression or exact string for the target (node) pattern.
`window`	direction of the collocate window: `"r"` (right of the node), `"l"` (left of the node), or `"b"` (both left and right of the node; the default).
`span`	integer indicating the span of the collocate window.
`case_insensitive`	whether the search pattern ignores case (TRUE – the default) or not (FALSE).
`to_lower_colloc`	whether to lowercase the retrieved collocates and the nodes (TRUE – the default) or not (FALSE).
`save_results`	whether to write the results to tab-separated plain-text files (TRUE) or not (FALSE – the default).
`coll_output_name`	name of the output file for the collocates table.
`sent_output_name`	name of the output file for the full sentences that contain the matches.
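
To make the interaction of `window` and `span` concrete, here is a minimal base-R sketch of window-span extraction on a single sentence. The helper `window_collocates()` is hypothetical and is not part of corplingr; its whitespace tokenisation and its output columns are assumptions made for illustration, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of corplingr): collect the words that fall
# within `span` tokens of each match of `pattern` in one sentence.
window_collocates <- function(sentence, pattern, window = "b", span = 2) {
  # Naive tokenisation: strip punctuation, then split on whitespace.
  # This is an assumption; the package may tokenise differently.
  tokens <- strsplit(gsub("[[:punct:]]", "", sentence), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  node_pos <- grep(pattern, tokens, ignore.case = TRUE, perl = TRUE)

  # Offsets relative to the node: negative = left, positive = right.
  offsets <- switch(window,
                    l = -rev(seq_len(span)),
                    r = seq_len(span),
                    b = c(-rev(seq_len(span)), seq_len(span)),
                    stop("window must be 'l', 'r', or 'b'"))

  out <- lapply(node_pos, function(i) {
    pos <- i + offsets
    keep <- pos >= 1 & pos <= length(tokens)  # drop positions outside the sentence
    data.frame(node = rep(tolower(tokens[i]), sum(keep)),
               collocate = tolower(tokens[pos[keep]]),
               position = offsets[keep],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)  # NULL when the pattern does not match
}

# Invented example sentence, used only for illustration
sentence <- "Kemacetan lalu lintas tidak dapat terelakkan lagi di Jakarta."
window_collocates(sentence, "\\bterelakkan\\b", window = "b", span = 2)
```

With `window = "b"` and `span = 2`, the sketch returns the two words to the left of the node ("tidak", "dapat") and the two to its right ("lagi", "di"); `window = "l"` or `window = "r"` keeps only one side.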

A list of two tibbles: (i) `collocates`, containing the collocates together with the sentence number of each match, the window-span information, and the corpus file; and (ii) `sentence_matches`, containing the full sentence for each match together with its sentence number and corpus file.

## Not run: 
# get the corpus filepaths
# (this example uses the file-path input rather than a list of corpus objects)
leipzig_corpus_path <- c("my/path/to/leipzig_corpus_file_1M-sent_1.txt",
                         "my/path/to/leipzig_corpus_file_300K-sent_2.txt",
                         "my/path/to/leipzig_corpus_file_300K-sent_3.txt")

# run the function
colloc <- colloc_leipzig(leipzig_path = leipzig_corpus_path[2:3],
                         pattern = "\\bterelakkan\\b",
                         window = "b",
                         span = 3,
                         save_results = FALSE,
                         to_lower_colloc = TRUE)
# Inspect outputs
## This one outputs the collocates tibble
colloc$collocates

## This one outputs the sentence matches tibble
colloc$sentence_matches

## End(Not run)
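
The list-input route described under `leipzig_corpus_list` can be sketched in the same way. This assumes that corplingr is installed and that `data("demo_corpus_leipzig")` loads an object of that name; the search pattern is only a placeholder and may or may not occur in the demo data.

```r
# Run the example only when corplingr is available (an assumption about the
# reader's setup); otherwise the block is skipped.
has_corplingr <- requireNamespace("corplingr", quietly = TRUE)

if (has_corplingr) {
  # Load the demo corpus list that ships with the package
  data("demo_corpus_leipzig", package = "corplingr")

  colloc_demo <- corplingr::colloc_leipzig(
    leipzig_path = NULL,                        # not used with list input
    leipzig_corpus_list = demo_corpus_leipzig,  # corpus files as list elements
    pattern = "\\bmemberi\\b",                  # placeholder pattern (assumption)
    window = "r",
    span = 2,
    save_results = FALSE
  )

  # Inspect the collocates tibble
  colloc_demo$collocates
}
```

Leaving `leipzig_path` as `NULL` follows the rule in the Arguments section that only one of the two input arguments should be supplied.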