View source: R/corplingr_concord_others.R
The function generates a tidy concordance for a search pattern in a corpus file or a set of corpus files. It requires the corpus to be loaded into the R session as a character vector with more than one element (line of text). Each line need not correspond to exactly one sentence. See the examples below for details.
concord_others(
  corpus_vector = "character vector of text loaded/read into console",
  pattern = "regular expressions",
  to_lower_corpus = TRUE,
  case_insensitive = TRUE,
  context_char = 50
)
corpus_vector | the character vector of corpus texts.
pattern | regular expression for the search pattern.
to_lower_corpus | whether to lowercase the corpus (TRUE in the usage signature).
case_insensitive | whether to ignore case when searching (TRUE in the usage signature).
context_char | integer giving the number of characters of context to show to the left and right of the node pattern (50 in the usage signature).
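The two case-related arguments are easy to confuse, so here is a rough base-R sketch of the distinction. It does not call concord_others and is not the package's implementation. The `toy_corp` vector is made up for illustration, and `tolower()` and `gregexpr(ignore.case = TRUE)` are assumed stand-ins for what the two arguments do.

```r
# Hypothetical toy corpus; not part of corplingr.
toy_corp <- c("Dia Memandang langit.", "Mereka memandang laut.")

# Lowercasing changes the text itself, so the casing is lost in any output.
lowered <- tolower(toy_corp)

# Ignoring case only affects matching; the original casing is preserved.
m <- gregexpr("\\bmemandang\\b", toy_corp, ignore.case = TRUE, perl = TRUE)
hits_ignore_case <- unlist(regmatches(toy_corp, m))

# A case-sensitive search on the untouched corpus misses the capitalised form.
m_strict <- gregexpr("\\bmemandang\\b", toy_corp, perl = TRUE)
hits_strict <- unlist(regmatches(toy_corp, m_strict))

print(lowered)
print(hits_ignore_case)
print(hits_strict)
```

In this sketch, lowercasing makes case-insensitive matching redundant for a lowercase pattern. Case-insensitive matching alone keeps the corpus casing intact in the matches.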
A tibble/data frame for the concordance match with LEFT and RIGHT contexts.
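To make the shape of such a result concrete, the following base-R sketch builds a small concordance table with a fixed character window on each side of the match. It is only an illustration under stated assumptions: `toy_concord` is a hypothetical helper, not the corplingr implementation, and the `NODE` column name is assumed (the documentation above names only LEFT and RIGHT).

```r
# Hypothetical helper for illustration; NOT the corplingr implementation.
toy_concord <- function(corpus_vector, pattern, context_char = 50) {
  rows <- lapply(corpus_vector, function(txt) {
    m <- gregexpr(pattern, txt, perl = TRUE)[[1]]
    if (m[1] == -1) return(NULL)                 # no match in this element
    start <- as.integer(m)
    end <- start + attr(m, "match.length") - 1L
    data.frame(
      # up to context_char characters before the match
      LEFT  = substring(txt, pmax(start - context_char, 1L), start - 1L),
      # the matched text itself (column name is an assumption)
      NODE  = substring(txt, start, end),
      # up to context_char characters after the match
      RIGHT = substring(txt, end + 1L, end + context_char),
      stringsAsFactors = FALSE
    )
  })
  rows <- Filter(Negate(is.null), rows)          # drop elements without matches
  do.call(rbind, rows)
}

toy_corp <- c("dia memandang langit yang merah", "mereka pulang")
out <- toy_concord(toy_corp, "\\bmemandang\\b", context_char = 10)
print(out)
```

Each match becomes one row, and the context is cut by character count rather than at word boundaries, which is why the right context can end mid-word.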
## Not run:
# Load or read in the corpus data

# "load" approach
my_corpus_data <- "/Your/Path/To/Corpus.RData"
load(my_corpus_data)

# "read" approach
my_corpus_path <- "/Your/Path/To/Corpus.txt"
corp <- readr::read_lines(my_corpus_path)

# Inspect the first two elements (console output shown as comments).
head(corp, 2)
# [1] "Hari yang panas itu berangsur-angsur menjadi dingin, karena matahari,
# raja siang itu, akan masuk ke dalam peraduannya, ke balik Gunung Sibualbuali,
# yang menjadi watas dataran tinggi Sipirok yang bagus itu."
# [2] "Langit di sebelah barat pun merah kuning rupanya, dan sinar matahari
# yang turun itu nampaklah di atas puncak kayu yang tinggi-tinggi, indah
# rupanya, sebagai disepuh dengan emas juwita."

# OPTIONAL
# Trim leading and trailing white space
# with str_trim from the stringr package
corp <- stringr::str_trim(corp)

# Collapse runs of two or more white-space characters into a single space
corp <- stringr::str_replace_all(corp, "\\s{2,}", " ")

# Get the concordance for a pattern
concordance <- concord_others(corpus_vector = corp,
                              pattern = "\\bmemandang\\b",
                              to_lower_corpus = TRUE,
                              case_insensitive = TRUE,
                              context_char = 100)

# Check the output
str(concordance)
head(concordance)

# Save the output as a tab-separated text file;
# it can be opened in spreadsheet software for further annotation.
# The output file is passed positionally (the argument is named `file`
# in recent readr releases and `path` in older ones).
readr::write_delim(concordance,
                   "/Users/Primahadi/Desktop/my_concordance.txt",
                   delim = "\t")
## End(Not run)