inst/breakrules/update_breakrules.R

# read RBBI rules from ICU sources and update static files

# rules for words
word <- readLines("https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/brkitr/rules/word.txt")
writeLines(word, "../inst/breakrules/breakrules_word.txt")

# rules for sentences
sent <- readLines("https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/brkitr/rules/sent.txt")
# write the sentence rules (sent), not the word rules, to the sentence file
writeLines(sent, "../inst/breakrules/breakrules_sentence.txt")
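
# A minimal sketch, not part of the original script: the same read-and-write
# step wrapped in a helper, so that each source is paired with its destination
# in a single call and the wrong object cannot be written by accident.
# The name update_rules() and the empty-download check are assumptions added
# for illustration only.
update_rules <- function(src, dest) {
    rules <- readLines(src, warn = FALSE)
    # refuse to overwrite the static file if nothing was read
    if (length(rules) == 0L) stop("no rules read from ", src)
    writeLines(rules, dest)
    invisible(rules)
}

# hypothetical usage (icu_rules is an illustrative variable name):
# icu_rules <- "https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/brkitr/rules/"
# update_rules(paste0(icu_rules, "word.txt"), "../inst/breakrules/breakrules_word.txt")
# update_rules(paste0(icu_rules, "sent.txt"), "../inst/breakrules/breakrules_sentence.txt")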

quanteda documentation built on May 31, 2023, 8:28 p.m.