para_conc: Parallel concordance


View source: R/para_conc.R

Description

A function to generate a concordance from parallel, bilingual corpora.

Usage

para_conc(
  source_text = "The source text corpus",
  target_text = "The target text corpus",
  pattern = "Search pattern for words in the source text",
  case_insensitive = FALSE,
  conc_sample = 25,
  filename = "parallel_conc.txt"
)

Arguments

source_text

character vector of the source-text corpora

target_text

character vector of the target-text corpora

pattern

regular expression search pattern for the source-text node word

case_insensitive

logical; whether the search pattern is matched case-insensitively. Defaults to FALSE

conc_sample

number of concordance lines to sample at random. Defaults to 25

filename

file name of the parallel concordance output. Defaults to "parallel_conc.txt"; set to FALSE to suppress the output file
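As a rough illustration of how pattern, case_insensitive, and conc_sample relate to one another, the following base-R sketch runs a word-boundary regular expression over a toy vector and samples from the matches. It does not reproduce the internals of para_conc(): the toy sentences, the object names, and the use of grepl() and sample.int() are assumptions made only for this example.

```r
# Toy stand-in for a source-text corpus (hypothetical data, not from paracorp)
toy_source <- c(
  "You should read the manual.",
  "Should we start now?",
  "Nothing to see here.",
  "They SHOULD have known."
)

# Word-boundary regular expression for the node word
pattern <- "\\bshould\\b"

# Case-sensitive search (analogous to case_insensitive = FALSE)
hits_sensitive <- grepl(pattern, toy_source, perl = TRUE)

# Case-insensitive search (analogous to case_insensitive = TRUE)
hits_insensitive <- grepl(pattern, toy_source, ignore.case = TRUE, perl = TRUE)

# Draw a random sample of the matching lines, capped at the number of matches
# (analogous to conc_sample); sample.int() avoids surprises with short vectors
matches <- toy_source[hits_insensitive]
n_sample <- min(2, length(matches))
set.seed(1)
sampled <- matches[sample.int(length(matches), n_sample)]
```

In this toy setting the case-sensitive search finds only the lowercase "should", whereas the case-insensitive search also picks up "Should" and "SHOULD".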

Value

A tibble of parallel concordance lines containing the source-text node word, its left and right context, and the corresponding target-text translation. By default, para_conc() also saves the concordance to a tab-separated plain-text file named "parallel_conc.txt". Users can specify a different output file name via filename.
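Because the saved output is tab-separated plain text, it can presumably be read back with base R's read.delim(). The sketch below demonstrates the round trip on a hypothetical one-row table written to a temporary file. The column names (left, node, right, target) and the sample row are invented for illustration and may not match the columns that para_conc() actually writes.

```r
# Hypothetical concordance-like table; column names are illustrative only
toy_conc <- data.frame(
  left = "You",
  node = "should",
  right = "read the manual.",
  target = "Anda harus membaca manual.",
  stringsAsFactors = FALSE
)

# Write the table as tab-separated plain text to a temporary file
out_file <- tempfile(fileext = ".txt")
write.table(toy_conc, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back; quote = "" keeps apostrophes and quotation marks literal
conc_back <- read.delim(out_file, stringsAsFactors = FALSE, quote = "")

# Remove the temporary file
unlink(out_file)
```

Setting quote = "" is a cautious choice for corpus text, where apostrophes and quotation marks are common and would otherwise be interpreted as field delimiters.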

Examples

library(paracorp)

para_conc(sci_en, sci_id, pattern = "should", conc_sample = 20)
# delete the automatically created output file to avoid a warning in R CMD check
unlink("parallel_conc.txt")

# example in which the automatic output file is suppressed with filename = FALSE,
# so that only a tibble/data frame is returned
para_conc(sci_en, sci_id,
          pattern = "should",
          conc_sample = 20,
          filename = FALSE) # suppress automatic output

gederajeg/paracorp documentation built on Jan. 2, 2022, 4:33 a.m.