In h2o: R Interface for the 'H2O' Scalable Machine Learning Platform

h2o.word2vec

R Documentation

Trains a word2vec model on a String column of an H2O data frame

Description

Trains a word2vec model on a String column of an H2O data frame.

Usage

h2o.word2vec(
  training_frame = NULL,
  model_id = NULL,
  min_word_freq = 5,
  word_model = c("SkipGram", "CBOW"),
  norm_model = c("HSM"),
  vec_size = 100,
  window_size = 5,
  sent_sample_rate = 0.001,
  init_learning_rate = 0.025,
  epochs = 5,
  pre_trained = NULL,
  max_runtime_secs = 0,
  export_checkpoints_dir = NULL
)

Arguments

`training_frame`	Id of the training data frame.
`model_id`	Destination id for this model; auto-generated if not specified.
`min_word_freq`	Discards words that appear fewer than this many times. Defaults to 5.
`word_model`	The word model to use. Must be one of: "SkipGram", "CBOW". Defaults to SkipGram.
`norm_model`	The normalization model to use; hierarchical softmax is the only option. Must be one of: "HSM". Defaults to HSM.
`vec_size`	Size of the word vectors. Defaults to 100.
`window_size`	Maximum skip length between words. Defaults to 5.
`sent_sample_rate`	Threshold for the occurrence of words. Words that appear with higher frequency in the training data are randomly down-sampled; useful range is (0, 1e-5). Defaults to 0.001.
`init_learning_rate`	The starting learning rate. Defaults to 0.025.
`epochs`	Number of training iterations to run. Defaults to 5.
`pre_trained`	Id of a data frame that contains a pre-trained (external) word2vec model.
`max_runtime_secs`	Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.
`export_checkpoints_dir`	Automatically export generated models to this directory.
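
To make `min_word_freq` and `sent_sample_rate` more concrete, here is a minimal base R sketch on a toy token vector. It does not call H2O. The toy data and the helper `keep_prob` are hypothetical, and the down-sampling rule is the one used in the original word2vec C code; whether H2O applies exactly the same formula is an assumption.

```r
# Toy corpus, one token per element (illustrative data, not from H2O).
tokens <- c(rep("the", 50), rep("teacher", 6), rep("nurse", 5), rep("zookeeper", 2))

# min_word_freq: words seen fewer than this many times are discarded.
min_word_freq <- 5
counts <- table(tokens)
vocab <- counts[counts >= min_word_freq]

# sent_sample_rate: probability of keeping one occurrence of a word.
# ASSUMPTION: rule taken from the original word2vec C implementation;
# H2O's internal formula may differ.
keep_prob <- function(count, total, rate) {
  f <- count / total                      # relative frequency of the word
  pmin(1, (sqrt(f / rate) + 1) * rate / f)
}

p_keep <- keep_prob(as.numeric(vocab), sum(vocab), rate = 0.001)
names(p_keep) <- names(vocab)

print(vocab)   # "zookeeper" is gone because it appears only twice
print(p_keep)  # the most frequent word has the lowest keep probability
```

Under this rule, lowering the rate down-samples frequent words more aggressively, while rare words are left mostly untouched.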

Examples

## Not run: 
library(h2o)
h2o.init()

# Import the CraigslistJobTitles dataset
job_titles <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv",
    col.names = c("category", "jobtitle"), col.types = c("String", "String"), header = TRUE
)

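# Tokenize the text on single spaces, then build and train the Word2Vec model
words <- h2o.tokenize(job_titles, " ")
vec <- h2o.word2vec(training_frame = words)

# Look up the 20 words closest to "teacher" in the embedding space
h2o.findSynonyms(vec, "teacher", count = 20)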

## End(Not run)
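
As a further sketch, the call below overrides several defaults and then uses the trained model to embed whole job titles. It assumes a running H2O cluster and network access to the same public dataset. The parameter values are arbitrary illustrations rather than recommendations, and `h2o.toFrame` and `h2o.transform_word2vec` are assumed to be available in the installed h2o version.

```r
library(h2o)
h2o.init()

job_titles <- h2o.importFile(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/craigslistJobTitles.csv",
    col.names = c("category", "jobtitle"), col.types = c("String", "String"), header = TRUE
)

# Tokenize only the job title column; rows are separated by NA in the result.
words <- h2o.tokenize(job_titles$jobtitle, " ")

# Train with non-default settings (values chosen only for illustration).
vec <- h2o.word2vec(
    training_frame = words,
    word_model = "CBOW",
    vec_size = 50,
    window_size = 3,
    min_word_freq = 5,
    epochs = 10
)

# One row per vocabulary word: the word followed by its vector components.
embeddings <- h2o.toFrame(vec)

# Average the word vectors of each title to get one vector per job title.
title_vecs <- h2o.transform_word2vec(vec, words, aggregate_method = "AVERAGE")
dim(title_vecs)
```

Averaged title vectors of this kind can then be bound to the `category` column and used as features for a supervised model.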

h2o documentation built on May 29, 2024, 4:26 a.m.