sparklyr: R Interface to Apache Spark

skip_connection("ml-feature-tokenizer")
skip_on_livy()
skip_on_arrow_devel()

skip_databricks_connect()
test_that("ft_tokenizer() param setting", {
  test_requires_version("3.0.0")
  sc <- testthat_spark_connection()
  test_args <- list(
    input_col = "foo",
    output_col = "bar"
  )
  test_param_setting(sc, ft_tokenizer, test_args)
})

test_that("ft_tokenizer.tbl_spark() works as expected", {
  sc <- testthat_spark_connection()
  test_requires("janeaustenr")
  austen <- austen_books()
  austen_tbl <- testthat_tbl("austen")

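  # Spark side: drop missing and empty lines, then tokenize the first 10 rows.
  # Inside filter(), length() is evaluated by Spark SQL (string length),
  # not by R's length().
  spark_tokens <- austen_tbl %>%
    na.omit() %>%
    dplyr::filter(length(text) > 0) %>%
    head(10) %>%
    ft_tokenizer("text", "tokens") %>%
    sdf_read_column("tokens") %>%
    lapply(unlist)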

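  # R side: reproduce the tokenizer locally by lowercasing the same rows
  # and splitting them on whitespace.
  r_tokens <- austen %>%
    dplyr::filter(nzchar(text)) %>%
    head(10) %>%
    `$`("text") %>%
    tolower() %>%
    strsplit("\\s")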

  expect_identical(spark_tokens, r_tokens)
})

test_clear_cache()

rstudio/sparklyr documentation built on April 30, 2024, 4:01 p.m.

tests/testthat/test-ml-feature-tokenizer.R