View source: R/corplingr_colloc_default.R
This function retrieves collocates for a word within a user-defined context window, based on raw (unannotated) corpus texts.
The function uses a vectorised approach: it determines the vector positions of the collocates relative to the vector position of the node word in the corpus word-vector.
The tokenise_corpus_to_sentence argument (cf. below) allows the user to first split the raw input corpus into a character vector whose elements each correspond to one sentence.
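The vectorised position lookup described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `find_collocates()`, its output column names, and the toy word vector are all assumptions made for illustration.

```r
# Toy corpus word-vector (hypothetical data, not from the package)
words <- c("the", "cat", "sat", "on", "the", "mat", "near", "the", "door")

# Hypothetical helper: collocates of `pattern` within `span` words on each side
find_collocates <- function(words, pattern, span = 2) {
  node_pos <- grep(pattern, words)   # vector positions of the node word
  offsets <- c(-span:-1, 1:span)     # relative positions, node itself excluded
  # One row per node/offset pair, built without an explicit loop
  grid <- expand.grid(offset = offsets, node = node_pos)
  grid$pos <- grid$node + grid$offset
  # Drop positions that fall outside the corpus vector
  grid <- grid[grid$pos >= 1 & grid$pos <= length(words), ]
  data.frame(node_pos = grid$node,
             span = grid$offset,
             collocate = words[grid$pos],
             stringsAsFactors = FALSE)
}

colloc <- find_collocates(words, "^cat$", span = 2)
print(colloc)
```

Here the node position is found once with `grep()`, and all collocate positions are then obtained by adding the offset vector to it, which is the general idea behind working with vector positions rather than scanning the text token by token.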
corpus_path |
character strings of (full) filepath for the corpus text files in |
corpus_list |
a named list object containing elements constituting a corpus text. The name of each element should correspond to the corpus file. There can be more than one element (hence more than one corpus text) within this list object. |
pattern |
a regular expression or exact string for the target (node) pattern. |
window |
window-span direction of the collocates: |
span |
integer vector indicating the span of the collocate scope. |
word_split_regex |
user-defined regular expression used to tokenise the corpus.
The default is to split at non-alphabetic characters but retain the hyphen "-", so that reduplicated forms, for instance, stay intact.
The regex for this default setting is |
case_insensitive |
whether the search pattern ignores case (TRUE – the default) or not (FALSE). |
to_lower_colloc |
whether to lowercase the retrieved collocates (TRUE – default) or not (FALSE). |
tokenise_corpus_to_sentence |
whether to tokenise the input corpus into sentences first, so that the retrieved collocates do not cross sentence boundaries.
The default is |
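To make the tokenisation arguments more concrete, here is a small base R sketch of splitting a text into sentences and then into words while keeping hyphenated reduplications intact. The regular expressions, the sample text, and the column names `sent_id` and `w` are assumptions for illustration only; they are not the package's default settings, which are not reproduced here.

```r
# Hypothetical raw text containing hyphenated reduplication
raw_text <- "Anak-anak bermain di taman. Mereka pulang sore-sore!"

# Assumed sentence splitter: break at whitespace that follows ., ! or ?
sentences <- unlist(strsplit(raw_text, "(?<=[.!?])\\s+", perl = TRUE))

# Assumed word splitter: split at anything that is not a letter or a hyphen
assumed_split_regex <- "[^a-zA-Z-]+"
tokens_by_sentence <- lapply(sentences, function(s) {
  tok <- unlist(strsplit(s, assumed_split_regex))
  tok[nzchar(tok)]  # drop empty strings left over from splitting
})

# One row per word, tagged with the sentence it came from
word_df <- data.frame(
  sent_id = rep(seq_along(tokens_by_sentence), lengths(tokens_by_sentence)),
  w = unlist(tokens_by_sentence),
  stringsAsFactors = FALSE
)
print(word_df)
```

With a sentence identifier attached to every word, a collocate whose `sent_id` differs from that of its node can be filtered out, which is one way to keep collocates from crossing sentence boundaries.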
A list of three elements:
A tibble of all words in the corpus including the sentence number;
A tibble of all retrieved collocates, including their span position and sentence number;
A regular expression object of the search pattern.
## Not run:
# do the collocate search using "corpus_path" input-option
df <- colloc_default(corpus_path = orti_bali_path,
                     pattern = "^nuju$",
                     window = "b", # focusing on both left and right context window
                     span = 3) # retrieve 3 collocates to the left and right of the node
## End(Not run)