freqlist_leipzig_all: Frequency list of all words in a Leipzig Corpus file


View source: R/corplingr_freqlist_leipzig_all.R

Description

The function generates a frequency list of all word tokens in a single Leipzig Corpus file. Although users can supply the file paths to all corpus files at once, for memory efficiency it is recommended to process each file in a separate function call. If all corpus files are processed in one call, the function outputs a list with as many elements as there are input file paths.
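The one-file-per-call pattern recommended above might look roughly like the following base R sketch. It is only an illustration: `count_one_file()` is a hypothetical stand-in written for this example, not the package's implementation, and the two temporary files with made-up content stand in for real Leipzig Corpus files.

```r
# Two small temporary files stand in for corpus files (made-up content).
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("a cat sat", "a dog sat"), paths[1])
writeLines("the cat ran", paths[2])

# Hypothetical stand-in for freqlist_leipzig_all(); NOT the package's code.
count_one_file <- function(path, split_regex = "([^a-zA-Z0-9-]+|--)") {
  tokens <- unlist(strsplit(tolower(readLines(path)), split_regex, perl = TRUE))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by the split
  counts <- sort(table(tokens), decreasing = TRUE)
  data.frame(word = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# One call per file, so only one file's text is read and tokenised at a time.
results <- vector("list", length(paths))
for (i in seq_along(paths)) {
  results[[i]] <- count_one_file(paths[i])
}
```

With the real function, the body of the loop would presumably call `freqlist_leipzig_all(leipzig_path = paths[i])` instead of the stand-in.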

Usage

freqlist_leipzig_all(
  split_regex = "([^a-zA-Z0-9-]+|--)",
  leipzig_path = NULL,
  case_insensitive = TRUE
)

Arguments

split_regex

user-defined regular expression used to tokenise the corpus.

leipzig_path

full file path to one or more Leipzig Corpus files.

case_insensitive

logical; whether to ignore (TRUE) or maintain (FALSE) case when splitting the corpus into word tokens.
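To get a feel for how `split_regex` and `case_insensitive` interact, the following base R sketch applies the default regular expression to a single made-up sentence. The helper `tokenise()` is hypothetical and assumes that case is ignored by lowercasing the text before splitting; the package may do this differently.

```r
# Made-up sample sentence; not taken from any Leipzig Corpus file.
sentence <- "The co-operative opened -- the Co-operative closed in 2021."

# Default value of the split_regex argument.
split_regex <- "([^a-zA-Z0-9-]+|--)"

# Hypothetical helper: an assumption about the behaviour, not the package's code.
tokenise <- function(x, regex, case_insensitive = TRUE) {
  if (case_insensitive) x <- tolower(x)
  tokens <- unlist(strsplit(x, regex, perl = TRUE))
  tokens[nzchar(tokens)]  # drop empty strings produced by adjacent matches
}

tokens_lower <- tokenise(sentence, split_regex, case_insensitive = TRUE)
tokens_cased <- tokenise(sentence, split_regex, case_insensitive = FALSE)

# Frequency table in descending order of frequency.
freq <- sort(table(tokens_lower), decreasing = TRUE)
```

Because the hyphen is excluded from the split characters, "co-operative" survives as one token, while the double hyphen "--" acts as a separator. With `case_insensitive = TRUE`, "The" and "the" are counted together; with `FALSE`, they remain distinct types.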

Value

A tibble containing the frequency list, sorted in descending order of frequency.

Examples

## Not run: 
wlist_all <- freqlist_leipzig_all(split_regex = "([^a-zA-Z0-9-]+|--)",
                                  leipzig_path = leipzig_corpus_path[1])

## End(Not run)

gederajeg/corplingr documentation built on Dec. 20, 2021, 9:50 a.m.