h2o.tokenize: Tokenize String
In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

h2o.tokenize

R Documentation

Tokenize String

Description

h2o.tokenize is similar to h2o.strsplit. The difference is that h2o.tokenize stores the tokenized text in a single column, which makes further processing easier (for example, filtering stop words or training a word2vec model).

Usage

h2o.tokenize(x, split)

Arguments

`x`	The column or columns containing the strings to tokenize.
`split`	The regular expression to split on.

Value

An H2OFrame with a single column containing the tokens. Tokens that come from different rows of the input frame are separated by NA.
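
The single-column layout can be pictured with a small base R sketch. It does not call H2O at all: tokenize_sketch is a hypothetical helper written only for illustration, and it assumes that an NA follows the tokens of every row, including the last one. The sketch also shows one way the NA separators could be used to regroup tokens by their original row.

```r
# Hypothetical helper, NOT part of the h2o package: mimics the
# single-column layout of tokenized text in plain R.
tokenize_sketch <- function(x, split) {
  pieces <- strsplit(x, split)
  # Assumption: each row's tokens are followed by one NA separator.
  unlist(lapply(pieces, function(tokens) c(tokens, NA_character_)))
}

rows <- c("the quick fox", "jumps over")
tokens <- tokenize_sketch(rows, " ")
print(tokens)

# Recover the original row of each token by counting the separators
# that appear before it.
is_sep <- is.na(tokens)
row_id <- cumsum(c(FALSE, head(is_sep, -1))) + 1
by_row <- split(tokens[!is_sep], row_id[!is_sep])
print(by_row)
```

Because every token sits in the same column, steps such as lower-casing or stop-word filtering can be applied to the whole column at once, as long as the NA separators are kept in place.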

Examples

## Not run: 
library(h2o)
h2o.init()
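# Upload the string to the H2O cluster as an H2OFrame.
string_to_tokenize <- as.h2o("Split at every character and tokenize.")
# Convert the column to string type, then split on the empty pattern.
tokenize_string <- h2o.tokenize(as.character(string_to_tokenize), "")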

## End(Not run)

h2o documentation built on May 29, 2024, 4:26 a.m.