colloc_leipzig: Generate window-span collocates for the Leipzig Corpora
In gederajeg/collogetr: Collocates Retriever and Collocation Association Measure for the Indonesian Leipzig Corpora


The function retrieves window-span collocates from the Leipzig corpus files and returns them in tibble format.

colloc_leipzig(
  leipzig_path = NULL,
  leipzig_corpus_list = NULL,
  pattern = NULL,
  case_insensitive = TRUE,
  window = "b",
  span = 2,
  split_corpus_pattern = "([^a-zA-Z-¬]+|--)",
  to_lower_colloc = TRUE,
  save_interim = FALSE,
  freqlist_output_file = "collogetr_out_1_freqlist.txt",
  colloc_output_file = "collogetr_out_2_collocates.txt",
  corpussize_output_file = "collogetr_out_3_corpus_size.txt",
  search_pattern_output_file = "collogetr_out_4_search_pattern.txt"
)

`leipzig_path`	Character vector of either (i) the file names of the Leipzig corpus files, if they are in the working directory, or (ii) the complete file path to each of the Leipzig corpus files.
`leipzig_corpus_list`	Specify this argument if each Leipzig corpus file has already been loaded as an R object and is an element of a named list. An example of this type of data input can be seen in `data("demo_corpus_leipzig")`. Specify either `leipzig_path` or `leipzig_corpus_list`, and leave the other as `NULL`.
`pattern`	Character vector input containing a set of exact word forms.
`case_insensitive`	Logical; whether the search for the `pattern` ignores case (`TRUE` – default) or not (`FALSE`).
`window`	Character; window-span direction of the collocates: `"r"` ('right of the node'), `"l"` ('left of the node'), or the default is `"b"` ('both left and right context-window').
`span`	A numeric vector indicating the span of the collocate scope. The default is `2` words around the node word.
`split_corpus_pattern`	Regular expression used to tokenise the corpus into a word vector. The default regex is `"([^a-zA-Z-\u00AC]+|--)"`. The character `"\u00AC"` is the Unicode escape for `"¬"`, which may occur in the Leipzig Corpora, in addition to the hyphen, as a separator between the root and suffixes of a word. Tokenising the corpus this way supports the vectorised method the function uses to generate the collocates of the search pattern.
`to_lower_colloc`	Logical; whether to lowercase the retrieved collocates and the nodes (`TRUE` – default) or not (`FALSE`).
`save_interim`	Logical; whether to save interim results into plain text files or not (`FALSE` – default).
`freqlist_output_file`	Character string giving the name of the output file for the word frequency list of a corpus.
`colloc_output_file`	Character string giving the name of the output file for the raw collocate table.
`corpussize_output_file`	Character string giving the name of the output file for the total word-token size of a corpus.
`search_pattern_output_file`	Character string giving the name of the output file for the search pattern.
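
To make the interplay of `split_corpus_pattern`, `window`, and `span` concrete, here is a minimal base-R sketch, not the package's own implementation. It tokenises one sentence with the default regex and then collects the words inside the chosen window around a node word. The helper `window_collocates()` and the sample sentence are hypothetical, and using `strsplit()` with `perl = TRUE` is an assumption made for this illustration; the package's internal tokeniser may differ.

```r
# Default tokenising regex from the Usage section. Sequences of characters that
# are not letters, hyphens, or "¬" act as separators, and so does a double hyphen.
split_corpus_pattern <- "([^a-zA-Z-\u00AC]+|--)"

# Hypothetical sample sentence (not taken from the Leipzig Corpora).
sentence <- "Dia mengatakan bahwa harga-harga naik 10%."

# Assumption: base R strsplit() with the PCRE engine stands in for the tokeniser.
tokens <- strsplit(sentence, split_corpus_pattern, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]  # drop empty strings left by leading separators

# Hypothetical helper: return the tokens found within `span` words to the
# left ("l"), right ("r"), or both sides ("b") of every occurrence of `node`.
window_collocates <- function(tokens, node, window = "b", span = 2) {
  node_pos <- which(tolower(tokens) == tolower(node))
  offsets <- switch(window,
    l = -span:-1,
    r = 1:span,
    b = c(-span:-1, 1:span),
    stop("`window` must be one of 'l', 'r', or 'b'")
  )
  hits <- lapply(node_pos, function(i) {
    pos <- i + offsets
    pos <- pos[pos >= 1 & pos <= length(tokens)]  # stay inside the sentence
    tokens[pos]
  })
  as.character(unlist(hits))
}

right_3 <- window_collocates(tokens, "mengatakan", window = "r", span = 3)
both_2  <- window_collocates(tokens, "mengatakan", window = "b", span = 2)
print(right_3)
print(both_2)
```

Note that the single hyphen in a reduplicated form such as `harga-harga` is kept inside the token, because the hyphen is excluded from the separator class and only a double hyphen splits.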

A list containing the raw collocate items, the frequency list of all words in the loaded corpus files, the total number of word tokens in each loaded corpus, and the search pattern.
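
Because the element names of the returned list are not spelled out here, one cautious way to explore the result is to print its names and top-level structure instead of assuming them. The sketch below guards the call with `requireNamespace()`, since it assumes that the collogetr package and its `demo_corpus_leipzig` data may not be installed on every machine.

```r
# Assumption: collogetr may or may not be installed, so the call is guarded.
collout <- NULL
if (requireNamespace("collogetr", quietly = TRUE)) {
  # Load the demo corpus list shipped with the package.
  data("demo_corpus_leipzig", package = "collogetr")
  collout <- collogetr::colloc_leipzig(
    leipzig_corpus_list = demo_corpus_leipzig,
    pattern = "mengatakan",
    window = "r",
    span = 3,
    save_interim = FALSE
  )
  # Inspect the components without assuming their names.
  print(names(collout))
  str(collout, max.level = 1)
}
```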

collout <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig,
                     pattern = "mengatakan",
                     window = "r",
                     span = 3,
                     save_interim = FALSE)
# collout <- colloc_leipzig(leipzig_path = c('path_to_corpus1.txt',
#                                            'path_to_corpus2.txt'),
#                           pattern = "mengatakan",
#                           window = "r",
#                           span = 3,
#                           save_interim = TRUE # save interim output files;
#                           # set the output paths via the arguments
#                           # whose names end in output_file
#                           )

gederajeg/collogetr documentation built on April 16, 2020, 11:58 a.m.
