View source: R/colloc_leipzig.R
The function retrieves collocates from Leipzig corpus files and returns them in tibble format.
colloc_leipzig(
leipzig_path = NULL,
leipzig_corpus_list = NULL,
pattern = NULL,
case_insensitive = TRUE,
window = "b",
span = 2,
split_corpus_pattern = "([^a-zA-Z-¬]+|--)",
to_lower_colloc = TRUE,
save_interim = FALSE,
freqlist_output_file = "collogetr_out_1_freqlist.txt",
colloc_output_file = "collogetr_out_2_collocates.txt",
corpussize_output_file = "collogetr_out_3_corpus_size.txt",
search_pattern_output_file = "collogetr_out_4_search_pattern.txt"
)
leipzig_path |
Character strings of (i) file names of the Leipzig corpus if they are in the working directory, or (ii) the complete file path to each of the Leipzig corpus files. |
leipzig_corpus_list |
Specify this argument if each Leipzig corpus file has already been loaded as an R object and stored as an element of a named list.
Example of this type of data-input can be seen in |
pattern |
Character vector input containing a set of exact word forms. |
case_insensitive |
Logical; whether the search for the |
window |
Character; window-span direction of the collocates: |
span |
A numeric vector indicating the span of the collocate scope. The default is 2. |
split_corpus_pattern |
Regular expression used to tokenise the corpus into a word vector.
The default regex is "([^a-zA-Z-¬]+|--)". |
to_lower_colloc |
Logical; whether to lowercase the retrieved collocates and the nodes ( |
save_interim |
Logical; whether to save interim results into plain text files or not ( |
freqlist_output_file |
Character string naming the file for the word frequency list of a corpus. |
colloc_output_file |
Character string naming the file for the raw collocate table. |
corpussize_output_file |
Character string naming the file for the total size (in word tokens) of a corpus. |
search_pattern_output_file |
Character string naming the file for the search pattern. |
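To make the interplay of split_corpus_pattern, window, and span concrete, the base R sketch below tokenises one made-up sentence with the default regex and collects the words around a node. It is only an illustration and does not reproduce the package's internals: the helper window_collocates() is hypothetical, perl = TRUE is an assumption about how the regex is interpreted, and the value "l" for a left-only window is assumed by analogy with "r" and "b".

```r
# Default tokenising regex from the Usage section ("\u00ac" is the ¬ character).
split_corpus_pattern <- "([^a-zA-Z-\u00ac]+|--)"

# Made-up sentence used only for illustration.
sentence <- "Dia mengatakan bahwa harga-harga naik -- katanya."

# Split on runs of anything that is not a letter, hyphen, or ¬,
# and on double hyphens.
tokens <- strsplit(sentence, split_corpus_pattern, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]  # drop empty strings left by adjacent splits

# Hypothetical helper (not part of collogetr): return the words found
# within `span` positions of each occurrence of `node`.
window_collocates <- function(tokens, node, window = "b", span = 2) {
  hits <- which(tolower(tokens) == tolower(node))
  offsets <- switch(window,
    l = -span:-1,             # assumed value: left context only
    r = 1:span,               # right context only
    b = c(-span:-1, 1:span),  # both sides of the node
    stop("window must be 'l', 'r', or 'b'")
  )
  positions <- unlist(lapply(hits, function(i) i + offsets))
  # Discard positions that fall outside the token vector.
  positions <- positions[positions >= 1 & positions <= length(tokens)]
  tokens[positions]
}

right_collocates <- window_collocates(tokens, "mengatakan", window = "r", span = 3)
both_collocates  <- window_collocates(tokens, "mengatakan", window = "b", span = 2)
```

In this sketch, a right-hand window with span 3 picks up the three tokens after the node, while the two-sided window with span 2 is cut short on the left because the node is the second word of the sentence.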
A list containing the raw collocate items, the frequency list of all words in the loaded corpus files, the total number of word tokens in each loaded corpus, and the search pattern.
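As a rough illustration of what a frequency list and a per-corpus token count are, the base R sketch below derives both from toy token vectors. The data, the object names, and the column names are all made up for this example; the element names and layout of the list that colloc_leipzig() actually returns are not shown here and may differ.

```r
# Toy token vectors standing in for two tokenised corpus files (made-up data).
corpus_tokens <- list(
  corpus_a = c("dia", "mengatakan", "bahwa", "dia", "pergi"),
  corpus_b = c("mereka", "mengatakan", "hal", "itu")
)

# Total number of word tokens in each corpus.
corpus_size <- lengths(corpus_tokens)

# Frequency list of all words across the loaded corpora, most frequent first.
all_tokens <- unlist(corpus_tokens, use.names = FALSE)
freq_table <- sort(table(all_tokens), decreasing = TRUE)
freqlist <- data.frame(
  word = names(freq_table),
  n = as.integer(freq_table),
  stringsAsFactors = FALSE
)
```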
collout <- colloc_leipzig(leipzig_corpus_list = demo_corpus_leipzig,
                          pattern = "mengatakan",
                          window = "r",
                          span = 3,
                          save_interim = FALSE)
# collout <- colloc_leipzig(leipzig_path = c('path_to_corpus1.txt',
#                                            'path_to_corpus2.txt'),
#                           pattern = "mengatakan",
#                           window = "r",
#                           span = 3,
#                           save_interim = TRUE # save interim output files;
#                           # specify their paths in the
#                           # ...output_file arguments
#                           )
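The leipzig_corpus_list argument expects corpus files that are already loaded into R as elements of a named list. One possible way to build such a list with base R is sketched below. The file names and sentences are invented stand-ins written to a temporary directory, and the assumption that each list element is a character vector of lines should be checked against demo_corpus_leipzig before relying on it.

```r
# Two small stand-in files written to a temporary directory (made-up sentences).
corpus_dir <- tempfile("leipzig_demo_")
dir.create(corpus_dir)
writeLines(c("1\tDia mengatakan hal itu.", "2\tMereka pergi ke pasar."),
           file.path(corpus_dir, "corpus1.txt"))
writeLines("1\tIa mengatakan bahwa harga naik.",
           file.path(corpus_dir, "corpus2.txt"))

# Full paths to the corpus files, in alphabetical order.
corpus_files <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file as a character vector with one element per line.
my_corpus_list <- lapply(corpus_files, readLines, encoding = "UTF-8")

# Name each element after its file, without the extension.
names(my_corpus_list) <- tools::file_path_sans_ext(basename(corpus_files))
```

A list built this way could then be tried as the leipzig_corpus_list input, while a vector of full paths such as corpus_files corresponds to the second form described for leipzig_path.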