quanteda: Quantitative Analysis of Textual Data

library(quanteda)
# Run quanteda's parallelized functions on 7 threads
quanteda_options(threads = 7)

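# Load the Guardian corpus (absolute path on the author's machine)
load("/home/kohei/Documents/Brexit/Analysis/data_corpus_guardian.RData")
corp <- data_corpus_guardian

# Load the pre-tokenized version of the same corpus
load("/home/kohei/Documents/Brexit/Analysis/data_tokens_guardian.RData")
toks <- data_tokens_guardian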

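# Segment at closing punctuation tokens (\p{Pe}): compare extracting the
# matched pattern from the segments with leaving it in place
microbenchmark::microbenchmark(
    tokens_segment(toks, '^\\p{Pe}$', valuetype = 'regex', extract_pattern = TRUE),
    tokens_segment(toks, '^\\p{Pe}$', valuetype = 'regex', extract_pattern = FALSE),
    times = 5
)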

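# Same segmentation with the pattern at the end of each segment: compare
# dropping document variables with copying them to the segments
microbenchmark::microbenchmark(
    tokens_segment(toks, '^\\p{Pe}$', valuetype = 'regex', pattern_position = 'after',
                   use_docvars = FALSE),
    tokens_segment(toks, '^\\p{Pe}$', valuetype = 'regex', pattern_position = 'after',
                   use_docvars = TRUE),
    times = 5
)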

# Compare corpus-level and tokens-level segmentation on any punctuation (\p{P})
microbenchmark::microbenchmark(
    corpus = corpus_segment(corp, '\\p{P}', valuetype = 'regex', extract_pattern = TRUE),
    token = tokens_segment(toks, '^\\p{P}$', valuetype = 'regex', extract_pattern = TRUE),
    times = 5
)

# Profile tokens_segment() to see where the time is spent
profvis::profvis(tokens_segment(toks, '^\\p{P}$', valuetype = 'regex', pattern_position = 'after'))

quanteda/quanteda documentation built on Jan. 9, 2025, 8:40 p.m.


tests/benchmarks/benchmark_tokens/segment.R
In quanteda/quanteda: Quantitative Analysis of Textual Data
