TokenGenerator | R Documentation |
It generates n-gram tokens along with their frequencies. The data may be saved to a file in plain text format or as an R object.
wordpredictor::Base -> TokenGenerator
new()
It initializes the current object. It sets the input file name, the tokenization options and the verbosity level.
TokenGenerator$new(fn = NULL, opts = list(), ve = 0)
fn
The path to the input file.
opts
The options for generating the n-gram tokens.
n. The n-gram size.
save_ngrams. Whether the n-gram data should be saved.
min_freq. All n-grams with a frequency less than min_freq are ignored.
line_count. The number of lines to process at a time.
stem_words. Whether words should be reduced to their stems.
dir. The directory where the output file should be saved.
format. The format of the output. There are two options:
plain. The data is stored in plain text.
obj. The data is stored as an R object.
ve
The level of detail in the information messages.
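The following base R sketch illustrates how the min_freq, save_ngrams, dir and format options could act on a table of n-gram frequencies. It is not the package's implementation. The column names (ngram, freq), the sample data, the output file names and the use of saveRDS for the obj format are all assumptions made for illustration.

```r
# Hypothetical n-gram frequency table, standing in for real tokenizer output
ngrams <- data.frame(
    ngram = c("the_cat_sat", "cat_sat_on", "sat_on_the", "on_the_mat"),
    freq = c(3L, 1L, 2L, 1L),
    stringsAsFactors = FALSE
)

# An options list that uses the documented option names
opts <- list(
    "n" = 3,
    "save_ngrams" = TRUE,
    "min_freq" = 2,
    "dir" = tempdir(),
    "format" = "obj"
)

# min_freq: n-grams with a frequency below the threshold are dropped
kept <- ngrams[ngrams$freq >= opts[["min_freq"]], ]

# save_ngrams and format: write plain text or a serialized R object.
# The file name pattern "n<size>" is an assumption made for this sketch.
out <- NULL
if (opts[["save_ngrams"]]) {
    base <- file.path(opts[["dir"]], paste0("n", opts[["n"]]))
    if (opts[["format"]] == "plain") {
        out <- paste0(base, ".txt")
        write.table(kept, out, row.names = FALSE, quote = FALSE, sep = "\t")
    } else {
        out <- paste0(base, ".RDS")
        saveRDS(kept, out)
    }
}
```

With format set to "plain" the same table would be written as tab-separated text instead, which is easier to inspect by hand but loses R's column types.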
generate_tokens()
It generates n-gram tokens and their frequencies from the input file. The tokens may be saved to a file as plain text or as an R object.
TokenGenerator$generate_tokens()
The data frame containing n-gram tokens along with their frequencies.
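To make the shape of the returned data concrete, here is a minimal base R sketch of n-gram counting. It assumes whitespace tokenization, lower-casing and "_" as the word separator, none of which is confirmed by this page. The function count_ngrams and the column names ngram and freq are hypothetical and are not part of the package.

```r
# Sketch: count n-grams of size n in a character vector of lines
count_ngrams <- function(lines, n, min_freq = 1) {
    all_ngrams <- character(0)
    for (line in lines) {
        # Lower-case the line and split it on runs of whitespace
        words <- strsplit(tolower(trimws(line)), "\\s+")[[1]]
        words <- words[nzchar(words)]
        # Lines with fewer than n words contain no n-grams
        if (length(words) < n) next
        # Join each window of n consecutive words with "_"
        starts <- seq_len(length(words) - n + 1)
        grams <- vapply(
            starts,
            function(i) paste(words[i:(i + n - 1)], collapse = "_"),
            character(1)
        )
        all_ngrams <- c(all_ngrams, grams)
    }
    # Return an empty data frame when no n-grams were found
    if (length(all_ngrams) == 0) {
        return(data.frame(
            ngram = character(0), freq = integer(0),
            stringsAsFactors = FALSE
        ))
    }
    # Tabulate the frequencies and return them as a data frame
    counts <- table(all_ngrams)
    df <- data.frame(
        ngram = names(counts),
        freq = as.integer(counts),
        stringsAsFactors = FALSE
    )
    # Drop rare n-grams, then sort by descending frequency
    df <- df[df$freq >= min_freq, ]
    df <- df[order(-df$freq, df$ngram), ]
    rownames(df) <- NULL
    df
}

lines <- c("the cat sat on the mat", "the cat sat down")
result <- count_ngrams(lines, n = 2)
print(result)
```

Windows are built per line, so no n-gram spans a line break in this sketch. Whether the package treats line boundaries the same way is not stated here.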
clone()
The objects of this class are cloneable with this method.
TokenGenerator$clone(deep = FALSE)
deep
Whether to make a deep clone.
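The difference between a shallow and a deep clone matters only when a field holds another reference object. The sketch below assumes the class is built on the R6 package, which the new() and clone(deep = FALSE) interface suggests but this page does not state. Settings and Holder are hypothetical classes used only for illustration.

```r
library(R6)

# Hypothetical classes used only to illustrate clone semantics
Settings <- R6Class("Settings", public = list(n = 1))
Holder <- R6Class("Holder",
    public = list(
        settings = NULL,
        initialize = function() {
            # The field holds another R6 object, which has reference semantics
            self$settings <- Settings$new()
        }
    )
)

original <- Holder$new()
shallow <- original$clone()
deep <- original$clone(deep = TRUE)

# The nested object is modified through the original
original$settings$n <- 4

# The shallow clone shares the nested object, so it sees the change
print(shallow$settings$n)
# The deep clone received its own copy of the nested object
print(deep$settings$n)
```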
## ------------------------------------------------
## Method `TokenGenerator$generate_tokens`
## ------------------------------------------------
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("test-clean.txt")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# The n-gram size
n <- 4
# The test file name
tfn <- paste0(ed, "/test-clean.txt")
# The tokenization options are set
tg_opts <- list("n" = n, "save_ngrams" = TRUE, "dir" = ed)
# The TokenGenerator object is created
tg <- TokenGenerator$new(tfn, tg_opts, ve = ve)
# The n-gram tokens are generated
tg$generate_tokens()
# The test environment is removed. Comment out the line below to keep
# the files generated by the function so that they can be viewed
em$td_env()