ModelGenerator | R Documentation |
It provides a method for generating n-gram models. The n-gram models may be customized by specifying data cleaning and tokenization options.
It provides a method that generates an n-gram model. The model may be customized by specifying the data cleaning and tokenization options.
The data cleaning options include removal of punctuation, stop words, extra space, non-dictionary words and bad words. The tokenization options include n-gram number and word stemming.
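To make the cleaning and tokenization steps concrete, the following base R sketch lower-cases text, strips punctuation and extra space, and counts n-grams. It is only an illustration of the general technique, not the package's implementation. The helpers `clean_text`, `line_ngrams` and `count_ngrams` are hypothetical names introduced here, and stop word, dictionary, bad word and stemming handling are left out.

```r
# Hypothetical helper: lower-case, strip punctuation, squeeze extra space.
clean_text <- function(lines) {
  lines <- tolower(lines)
  lines <- gsub("[[:punct:]]+", "", lines)
  lines <- gsub("\\s+", " ", lines)
  trimws(lines)
}

# Hypothetical helper: return every n-gram found in one cleaned line.
line_ngrams <- function(line, n) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) {
    return(character(0))
  }
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

# Hypothetical helper: n-gram frequencies, most frequent first.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(clean_text(lines), line_ngrams, n = n))
  sort(table(grams), decreasing = TRUE)
}

txt <- c("The cat sat on the mat.", "The cat   sat on the sofa!")
bigrams <- count_ngrams(txt, 2)
print(bigrams)
```

A larger n captures more context per entry but yields many more distinct n-grams, which is why the n-gram number is exposed as an option.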
wordpredictor::Base
-> ModelGenerator
new()
It initializes the current object. It is used to set the maximum n-gram number, sample size, input file name, data cleaner options, tokenization options and verbose option.
ModelGenerator$new(
  name = NULL,
  desc = NULL,
  fn = NULL,
  df = NULL,
  n = 4,
  ssize = 0.3,
  dir = ".",
  dc_opts = list(),
  tg_opts = list(),
  ve = 0
)
name
The model name.
desc
The model description.
fn
The model file name.
df
The name of the input text file. It should be the short file name, and the file should be present in the data directory.
n
The n-gram size of the model.
ssize
The sample size as a proportion of the input file.
dir
The directory containing the input and output files.
dc_opts
The data cleaner options.
tg_opts
The token generator options.
ve
The level of detail in the information messages.
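The `ssize` argument is described as a proportion of the input file. A minimal base R sketch of that idea follows, assuming the proportion applies to lines and that sampling is random without replacement. The source confirms neither detail. `sample_lines` and its `seed` argument are hypothetical and not part of the package.

```r
# Hypothetical helper: keep a random `ssize` fraction of the input lines.
sample_lines <- function(lines, ssize, seed = NULL) {
  stopifnot(ssize > 0, ssize <= 1)
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # Number of lines to keep, with at least one line retained.
  k <- max(1, round(length(lines) * ssize))
  # Sorting the sampled indices preserves the original line order.
  lines[sort(sample.int(length(lines), k))]
}

lines <- sprintf("line %d", 1:100)
sampled <- sample_lines(lines, ssize = 0.3, seed = 1)
length(sampled)
```

With the default `ssize = 0.3`, roughly 30% of the input would be used under these assumptions, which trades model coverage for faster generation.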
generate_model()
It generates the model using the parameters passed to the object's constructor. It generates an n-gram model file and saves it to the model directory.
ModelGenerator$generate_model()
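The example model file name ends in `.RDS`, which suggests the model is stored with R's standard serialization, although that is an assumption rather than something stated here. Under that assumption, a saved file could be inspected with `readRDS()`. The sketch below uses a stand-in list instead of a real model, because the internal structure of the model object is not described in this page.

```r
# Stand-in for the model directory and file name used in the example.
model_dir <- tempdir()
model_path <- file.path(model_dir, "def-model.RDS")

# Stand-in object; the real model's fields are not documented here.
stand_in <- list(name = "default-model", n = 4)
saveRDS(stand_in, model_path)

# Read the file back and look at its top-level structure.
restored <- readRDS(model_path)
str(restored)

# Remove the temporary file.
unlink(model_path)
```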
clone()
The objects of this class are cloneable with this method.
ModelGenerator$clone(deep = FALSE)
deep
Whether to make a deep clone.
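The `clone(deep = FALSE)` signature follows the R6 convention, so the sketch below assumes R6 semantics. The classes `Inner` and `Outer` are hypothetical and exist only to show the difference: a shallow clone shares any nested R6 objects with the original, while a deep clone copies them.

```r
library(R6)

# Hypothetical nested class holding a single value.
Inner <- R6Class("Inner", public = list(value = 0))

# Hypothetical outer class with an R6 object as a field.
Outer <- R6Class("Outer", public = list(
  inner = NULL,
  initialize = function() {
    self$inner <- Inner$new()
  }
))

a <- Outer$new()
shallow <- a$clone()          # nested object is shared with `a`
deep <- a$clone(deep = TRUE)  # nested object is copied

# Changing the original affects the shallow clone only.
a$inner$value <- 10
shallow$inner$value  # 10
deep$inner$value     # 0
```

A deep clone is the safer choice when the copy will be modified independently of the original object.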
## ------------------------------------------------
## Method `ModelGenerator$generate_model`
## ------------------------------------------------
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("input.txt")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# ModelGenerator class object is created
mg <- ModelGenerator$new(
name = "default-model",
desc = "1 MB size and default options",
fn = "def-model.RDS",
df = "input.txt",
n = 4,
ssize = 0.99,
dir = ed,
dc_opts = list(),
tg_opts = list(),
ve = ve
)
# The n-gram model is generated
mg$generate_model()
# The test environment is removed. Comment out the line below to keep
# the files generated by the function so that they can be viewed
em$td_env()