tests/benchmarks/benchmark_tokens/tokenize.R

library(profvis)    # for profiling
library(tokenizers) # reference tokenizer for the comparison below
library(quanteda)   # provides tokens()

# Large news corpus saved locally (not shipped with the package)
corp <- readRDS("/home/kohei/Documents/Brexit/Data/data_corpus_guardian.RDS")
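# If the local Guardian corpus is unavailable, a built-in corpus such as
# quanteda's data_corpus_inaugural could stand in for rough timings
# (assumption: any corpus object works here; it is much smaller, so the
# absolute times will not be comparable).
# corp <- data_corpus_inaugural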

# Time the default word tokenizer on the full corpus
system.time(
    tokens(corp, what = 'word', verbose = TRUE)
)

# Time the minimal 'fastestword' tokenizer (splits on spaces only)
system.time(
    tokens(corp, what = 'fastestword', verbose = TRUE)
)

# Synthetic text: the alphabet as space-separated "words", repeated 10,000 times
txt <- rep(paste0(letters, collapse = ' '), 10000)

# Compare quanteda's tokenizer against tokenizers::tokenize_words on the synthetic text
microbenchmark::microbenchmark(
    tokenizers::tokenize_words(txt),
    tokens(txt, what = 'word'),
    unit = 'relative'
)
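# Quick sanity check (a sketch, assuming the two tokenizers' default options
# agree for this simple all-lower-case text): both should return the same
# word tokens, apart from list names.
stopifnot(identical(
    unname(as.list(tokens(txt[1], what = 'word'))),
    unname(tokenizers::tokenize_words(txt[1]))
))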

# Profile the 'fastestword' path to see where the time is spent
profvis(
    tokens(txt, what = 'fastestword')
)