tokenizers: Fast, Consistent Tokenization of Natural Language Text

context("Stem tokenizers")

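test_that("Word stem tokenizer works as expected", {
  # docs_l, docs_c, and bad_list are shared fixtures, presumably defined in a
  # testthat helper file: the same documents as a list and as a character
  # vector, plus a malformed list.
  out_l <- tokenize_word_stems(docs_l)
  out_c <- tokenize_word_stems(docs_c)
  out_1 <- tokenize_word_stems(docs_c[1], simplify = TRUE)

  # Output is a list of character vectors; simplify = TRUE on a single
  # document returns a bare character vector.
  expect_is(out_l, "list")
  expect_is(out_l[[1]], "character")
  expect_is(out_c, "list")
  expect_is(out_c[[1]], "character")
  expect_is(out_1, "character")

  # List and character inputs must yield identical tokens.
  expect_identical(out_l, out_c)
  expect_identical(out_l[[1]], out_1)
  expect_identical(out_c[[1]], out_1)

  # Document names carry over to the output.
  expect_named(out_l, names(docs_l))
  expect_named(out_c, names(docs_c))

  # Malformed input is rejected.
  expect_error(tokenize_word_stems(bad_list))
})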

test_that("Stem tokenizer produces correct output", {
  # skip_on_os("windows")
  out_1 <- tokenize_word_stems(docs_c[1], simplify = TRUE)
  # Tokens 20 to 24 of the first document, as stems rather than full words
  # (e.g. "purs", "noth").
  expected <- c("in", "my", "purs", "and", "noth")
  expect_identical(out_1[20:24], expected)
})

lmullen/tokenizers documentation built on March 28, 2024, 11:12 a.m.

tests/testthat/test-stem.R