corpus_size_leipzig: Generate Leipzig corpus-size
In gederajeg/corplingr: Tidy Concordances, Collocates, and Wordlist

Returns the total word-token count of a given Leipzig corpus file. It is built on top of str_count.

corpus_size_leipzig(
  leipzig_path = "(full) filepath to Leipzig corpus files",
  word_regex = "\\b(?i)([-a-zA-Z0-9]+)\\b"
)

`leipzig_path`	file path to the directory in which the Leipzig corpus files are stored
`word_regex`	regular expression defining what counts as "a word"

A tibble containing corpus_id, size, and size_print (for text-printing).
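
To make the effect of `word_regex` concrete, here is a minimal base-R sketch of counting regex matches per line of text. It is not the package's implementation: `count_tokens` is a hypothetical helper, the sentences are made-up sample data, and base R's `gregexpr()`/`regmatches()` stand in for str_count so that the snippet runs without extra packages.

```r
# Hypothetical helper (not part of corplingr): counts how many times
# `word_regex` matches in each element of `text`.
count_tokens <- function(text, word_regex = "\\b(?i)([-a-zA-Z0-9]+)\\b") {
  matches <- gregexpr(word_regex, text, perl = TRUE)
  # regmatches() yields character(0) for strings with no match,
  # so lengths() gives 0 for them.
  lengths(regmatches(text, matches))
}

# Made-up sample lines standing in for the contents of a corpus file.
sentences <- c("Saya membaca buku-buku itu.", "Ada 3 kucing.")

# Default pattern: hyphenated forms and digits each count as one word.
per_line <- count_tokens(sentences)
total <- sum(per_line)
print(per_line)  # 4 3
print(total)     # 7

# A stricter, letters-only pattern splits "buku-buku" and skips "3".
letters_only <- count_tokens(sentences, "\\b[a-zA-Z]+\\b")
print(letters_only)  # 5 2
```

The two patterns give different totals for the same text, which is why the definition of "a word" is exposed as an argument rather than fixed.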

gederajeg/corplingr documentation built on Dec. 20, 2021, 9:50 a.m.
