concord_others: Simple concordance function


View source: R/corplingr_concord_others.R

Description

The function generates a tidy concordance for a search pattern in one or more corpus files. The corpus must already be loaded in the R session as a character vector with more than one element (lines of text or sentences). Each line need not correspond to exactly one sentence. See Examples below for details.

Usage

concord_others(
  corpus_vector = "character vector of text loaded/read into console",
  pattern = "regular expressions",
  to_lower_corpus = TRUE,
  case_insensitive = TRUE,
  context_char = 50
)

Arguments

corpus_vector

the vector of corpus texts.

pattern

regular expressions for the search pattern.

to_lower_corpus

whether to lowercase the corpus (TRUE – the default) first or leave it as is (FALSE).

case_insensitive

whether to ignore the case for the search pattern argument (TRUE – the default) or not (FALSE).

context_char

integer giving the number of characters to keep as context to the left and right of the node pattern.
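As a rough illustration of how the two case-related arguments differ, the base R sketch below does not call concord_others(). It only assumes that to_lower_corpus = TRUE resembles applying tolower() to the text before searching, and that case_insensitive = TRUE resembles a case-insensitive regex match. The toy vector and the variable names are hypothetical.

```r
# Toy corpus (hypothetical data, not taken from the package)
toy <- c("Memandang laut", "dia memandang langit")

# Assumption: to_lower_corpus = TRUE resembles lowercasing the text first
lowered <- tolower(toy)

# A case-sensitive search on the original text misses the capitalised form
hits_sensitive <- grepl("\\bmemandang\\b", toy, perl = TRUE)

# Assumption: case_insensitive = TRUE resembles ignore.case = TRUE
hits_insensitive <- grepl("\\bmemandang\\b", toy,
                          ignore.case = TRUE, perl = TRUE)

# Searching the lowercased text also finds both lines
hits_lowered <- grepl("\\bmemandang\\b", lowered, perl = TRUE)
```

Under these assumptions, lowercasing changes the text that would end up in the output, whereas case-insensitive matching leaves the original capitalisation intact.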

Value

A tibble (data frame) of concordance matches, with the LEFT and RIGHT contexts of each match.
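To make the idea of a character-based context window concrete, here is a minimal base R sketch. It is not the corplingr implementation: kwic_sketch() is a hypothetical helper, the NODE column name is an assumption (the page only mentions LEFT and RIGHT), and it keeps only the first match per element for brevity.

```r
# Hypothetical helper: NOT the corplingr implementation
kwic_sketch <- function(x, pattern, context_char = 50) {
  m <- regexpr(pattern, x, perl = TRUE)   # first match per element only
  keep <- m > 0                           # elements with no match are dropped
  start <- m[keep]
  end <- start + attr(m, "match.length")[keep] - 1
  txt <- x[keep]
  data.frame(
    # up to context_char characters before the match
    LEFT  = substr(txt, pmax(1, start - context_char), start - 1),
    # the matched text itself
    NODE  = substr(txt, start, end),
    # up to context_char characters after the match
    RIGHT = substr(txt, end + 1, end + context_char),
    stringsAsFactors = FALSE
  )
}

# Toy data (hypothetical)
toy <- c("dia memandang langit", "tidak ada")
res <- kwic_sketch(toy, "\\bmemandang\\b", context_char = 4)
```

With context_char = 4, the single matching line yields at most four characters on each side of the node, so a larger value simply widens both windows.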

Examples

## Not run: 
# Load or read in the corpus data
# "load" approach
my_corpus_data <- "/Your/Path/To/Corpus.RData"
load(my_corpus_data)

# "read" approach
my_corpus_path <- "/Your/Path/To/Corpus.txt"
corp <- readr::read_lines(my_corpus_path)

# Inspect the first two elements.
head(corp, 2)
# [1] "Hari yang panas itu berangsur-angsur menjadi dingin, karena matahari,
#      raja siang itu, akan masuk ke dalam peraduannya, ke balik Gunung Sibualbuali,
#      yang menjadi watas dataran tinggi Sipirok yang bagus itu."
# [2] "Langit di sebelah barat pun merah kuning rupanya, dan sinar matahari
#      yang turun itu nampaklah di atas puncak kayu yang tinggi-tinggi, indah
#      rupanya, sebagai disepuh dengan emas juwita."

# OPTIONAL
# Trim down leading and trailing white space
# with str_trim from the stringr package
corp <- stringr::str_trim(corp)
# collapse runs of two or more white-space characters into a single space
corp <- stringr::str_replace_all(corp, "\\s{2,}", " ")


# get concordance for a pattern
concordance <- concord_others(corpus_vector = corp,
                              pattern = "\\bmemandang\\b",
                              to_lower_corpus = TRUE,
                              case_insensitive = TRUE,
                              context_char = 100)

# check the output
str(concordance)
head(concordance)

# save the output as a tab-separated text file
# it can be opened in a spreadsheet program for further annotation
# (recent readr versions name this argument `file`; `path` is deprecated)
readr::write_delim(concordance,
                   file = "/Users/Primahadi/Desktop/my_concordance.txt",
                   delim = "\t")

## End(Not run)
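The optional cleanup step in the example can also be tried on a toy vector without any add-on packages. The sketch below uses base R's trimws() and gsub() as stand-ins for stringr::str_trim() and stringr::str_replace_all(); the input strings are made up for illustration.

```r
# Toy input with leading, trailing and repeated white space (hypothetical data)
raw <- c("  Hari yang   panas itu  ", "Langit di  sebelah barat")

# Strip leading and trailing white space
trimmed <- trimws(raw)

# Collapse runs of two or more white-space characters into one space
clean <- gsub("\\s{2,}", " ", trimmed, perl = TRUE)
```

Normalising white space before searching keeps the character-based context window from being partly used up by padding.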

gederajeg/corplingr documentation built on Dec. 20, 2021, 9:50 a.m.