corpus_size_leipzig: Generate Leipzig corpus-size


View source: R/corplingr_corpus_size_leipzig.R

Description

Returns the total word-token count of a given Leipzig corpus file. It is built on top of str_count.

Usage

corpus_size_leipzig(
  leipzig_path = "(full) filepath to Leipzig corpus files",
  word_regex = "\\b(?i)([-a-zA-Z0-9]+)\\b"
)

Arguments

leipzig_path

File path to the directory in which the Leipzig corpus files are stored.

word_regex

Regular expression defining what counts as "a word".
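
To illustrate what the default word_regex treats as a word, the sketch below applies it to two made-up sentences with stringr::str_count. It assumes that the str_count mentioned in the Description is the stringr function, and the sample sentences are hypothetical rather than taken from a Leipzig file. It shows the counting idea only, not the internals of corpus_size_leipzig.

```r
library(stringr)

# Default pattern copied from the Usage section
word_regex <- "\\b(?i)([-a-zA-Z0-9]+)\\b"

# Hypothetical sample lines (not from a real Leipzig corpus file)
sentences <- c(
  "Saya membaca buku-buku itu.",
  "Dia pergi ke pasar pada tahun 2020."
)

# Count pattern matches per line, then sum them for a total token count
tokens_per_line <- str_count(sentences, word_regex)
total_tokens <- sum(tokens_per_line)

print(tokens_per_line)  # 4 7
print(total_tokens)     # 11
```

Because the character class includes the hyphen, a reduplicated form such as "buku-buku" is counted as one token, and digit strings such as "2020" count as words as well. Supplying a different word_regex changes both behaviours.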

Value

A tibble containing corpus_id, size, and size_print (for text-printing).


gederajeg/corplingr documentation built on Dec. 20, 2021, 9:50 a.m.