ModelPredictor | R Documentation
The ModelPredictor class provides methods for predicting the next word given a set of previous words, for calculating the perplexity score of a set of words, and for calculating the probability of a word given a set of previous words.
wordpredictor::Base
-> ModelPredictor
new()
Initializes the current object. It sets the model file name and the verbosity level.
ModelPredictor$new(mf, ve = 0)
mf
The model file name.
ve
The level of detail in the information messages.
get_model()
Returns the Model class object.
ModelPredictor$get_model()
The Model class object is returned.
calc_perplexity()
Calculates the perplexity of the given sentence. For each word, the probability of the word given the previous words is calculated. These probabilities are multiplied together and the product is inverted; the nth root of the result is the perplexity, where n is the number of words in the sentence. If the stem_words tokenization option was specified when the model file was created, the previous words are converted to their stems.
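The calculation described above amounts to PP = (p_1 * p_2 * ... * p_n)^(-1/n), where p_i is the probability of word i given its previous words. The sketch below computes this from a vector of per-word probabilities that are assumed to be already known; `perplexity_from_probs()` is a hypothetical helper for illustration, not part of wordpredictor. It works in log space, which gives the same value as the direct formula but avoids numeric underflow when many small probabilities are multiplied.

```r
# Hypothetical helper (not part of wordpredictor): perplexity of a
# sentence from the conditional probabilities of its words.
perplexity_from_probs <- function(probs) {
  stopifnot(is.numeric(probs), length(probs) > 0, all(probs > 0))
  n <- length(probs)
  # exp(-sum(log(p)) / n) equals prod(p)^(-1/n): the inverted product
  # of the probabilities, raised to the power 1/n (the nth root)
  exp(-sum(log(probs)) / n)
}

# Four words, each with probability 0.1: perplexity is 10
# (up to floating-point rounding)
perplexity_from_probs(rep(0.1, 4))
```

A lower perplexity means the model found the sentence less surprising.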
ModelPredictor$calc_perplexity(words)
words
The list of words.
The perplexity of the given list of words.
predict_word()
Predicts the next word given a list of previous words. It looks up the last n previous words in the transition probabilities data, where n is equal to the n-gram size of the model minus 1. If there is a match, the `count` next words (3 by default) with the highest probabilities are returned. If there is no match, the last n-1 previous words are checked, and so on until only the last word is checked. If there is still no match, an empty result is returned. The given words may optionally be stemmed.
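The back-off search described above can be sketched in base R as follows. It assumes a toy in-memory table of transition probabilities keyed by a space-joined context string; the real model stores hashed contexts (see get_word_prob()), so the table layout and the `predict_next()` helper are hypothetical illustrations, not the package's implementation.

```r
# Hypothetical toy transition table (not the wordpredictor model format):
# each row gives P(next_word | context), where context is the previous
# words joined by a single space.
tp <- data.frame(
  context   = c("today is", "today is", "today is", "is", "is"),
  next_word = c("a", "the", "monday", "not", "a"),
  prob      = c(0.40, 0.35, 0.25, 0.6, 0.4),
  stringsAsFactors = FALSE
)

# Back-off prediction: try the last (ngram_size - 1) words first, then
# progressively shorter contexts, and stop at the first context found.
predict_next <- function(words, tp, ngram_size = 3, count = 3) {
  # Accept either a character vector or a single space-separated string
  if (length(words) == 1) words <- strsplit(words, " ")[[1]]
  n <- min(ngram_size - 1, length(words))
  while (n >= 1) {
    ctx <- paste(tail(words, n), collapse = " ")
    hits <- tp[tp$context == ctx, ]
    if (nrow(hits) > 0) {
      # Highest probabilities first, keep at most `count` rows
      hits <- hits[order(hits$prob, decreasing = TRUE), ]
      return(head(hits[, c("next_word", "prob")], count))
    }
    n <- n - 1
  }
  # No context matched: return an empty result
  data.frame(next_word = character(0), prob = numeric(0))
}

predict_next("what day today is", tp, count = 2)  # uses "today is"
predict_next("it is", tp)                         # backs off to "is"
predict_next("hello there", tp)                   # empty result
```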
ModelPredictor$predict_word(words, count = 3, dc = NULL)
words
A character vector of previous words, or a single string containing the previous words separated by spaces.
count
The number of results to return.
dc
A DataCleaner object. If it is given, then the given words are cleaned.
The top `count` predicted words (3 by default) along with their probabilities.
get_word_prob()
Calculates the probability of the given word given the previous words. The last n previous words are converted to a numeric hash using the digest2int function, where n is equal to the n-gram size of the model minus 1; all other words are ignored. The hash is looked up in a data frame of transition probabilities. The given word is converted to a number by finding its position in a list of unique words. If both the hash and the word position are found, the corresponding transition probability is returned. If they are not found, the hash of the last n-1 previous words is taken and the process is repeated. If no match is found in the data frame, the word probability (the probability of the word on its own) is returned. This is known as back-off. If the word probability cannot be found either, the default probability is returned. The default probability is calculated as 1/(N+V), where N is the number of words in the corpus and V is the number of dictionary words.
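The back-off chain above (transition probability, then word probability, then the default 1/(N+V)) can be sketched in base R. This is a simplified illustration under stated assumptions: contexts and words are kept as plain strings instead of digest2int hashes and word positions, and the toy tables, the values of N and V, and the `word_prob()` helper are all hypothetical, not the package's internals.

```r
# Hypothetical toy transition probabilities: P(word | context)
tp <- data.frame(
  context = c("how are", "are"),
  word    = c("you", "you"),
  prob    = c(0.7, 0.3),
  stringsAsFactors = FALSE
)
# Hypothetical word (unigram) probabilities
wp <- c(you = 0.02, how = 0.01, are = 0.015)
N <- 1000  # assumed number of words in the corpus
V <- 50    # assumed number of dictionary words

word_prob <- function(word, pw, tp, wp, N, V, ngram_size = 3) {
  # Only the last (ngram_size - 1) previous words are used
  n <- min(ngram_size - 1, length(pw))
  # Back off from the longest context to shorter ones
  while (n >= 1) {
    ctx <- paste(tail(pw, n), collapse = " ")
    hit <- tp$prob[tp$context == ctx & tp$word == word]
    if (length(hit) > 0) return(hit[1])
    n <- n - 1
  }
  # No context matched: fall back to the word probability
  if (word %in% names(wp)) return(unname(wp[word]))
  # Word not seen at all: default probability 1 / (N + V)
  1 / (N + V)
}

word_prob("you", c("how", "are"), tp, wp, N, V)  # full context: 0.7
word_prob("you", c("who", "are"), tp, wp, N, V)  # backs off to "are": 0.3
word_prob("zebra", "a", tp, wp, N, V)            # default: 1 / 1050
```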
ModelPredictor$get_word_prob(word, pw)
word
The word whose probability is to be calculated.
pw
The previous words.
The probability of the word given the previous words.
clone()
The objects of this class are cloneable with this method.
ModelPredictor$clone(deep = FALSE)
deep
Whether to make a deep clone.
## ------------------------------------------------
## Method `ModelPredictor$calc_perplexity`
## ------------------------------------------------
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("def-model.RDS")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# The model file name
mfn <- paste0(ed, "/def-model.RDS")
# ModelPredictor class object is created
mp <- ModelPredictor$new(mf = mfn, ve = ve)
# The sentence whose perplexity is to be calculated
l <- "last year at this time i was preparing for a trip to rome"
# The line is split into words
w <- strsplit(l, " ")[[1]]
# The Perplexity of the sentence is calculated
p <- mp$calc_perplexity(w)
# The sentence Perplexity is printed
print(p)
# The test environment is removed. Comment out the line below so that
# the files generated by the function can be viewed
em$td_env()
## ------------------------------------------------
## Method `ModelPredictor$predict_word`
## ------------------------------------------------
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("def-model.RDS")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# The model file name
mfn <- paste0(ed, "/def-model.RDS")
# ModelPredictor class object is created
mp <- ModelPredictor$new(mf = mfn, ve = ve)
# The next word is predicted
nws <- mp$predict_word("today is", count = 10)
# The predicted next words are printed
print(nws)
# The test environment is removed. Comment out the line below so that
# the files generated by the function can be viewed
em$td_env()
## ------------------------------------------------
## Method `ModelPredictor$get_word_prob`
## ------------------------------------------------
# Start of environment setup code
# The level of detail in the information messages
ve <- 0
# The name of the folder that will contain all the files. It will be
# created in the current directory. NULL implies tempdir will be used
fn <- NULL
# The required files. They are default files that are part of the
# package
rf <- c("def-model.RDS")
# An object of class EnvManager is created
em <- EnvManager$new(ve = ve, rp = "./")
# The required files are downloaded
ed <- em$setup_env(rf, fn)
# End of environment setup code
# The model file name
mfn <- paste0(ed, "/def-model.RDS")
# ModelPredictor class object is created
mp <- ModelPredictor$new(mf = mfn, ve = ve)
# The probability that the next word is "you" given the previous words
# "how" and "are"
prob <- mp$get_word_prob(word = "you", pw = c("how", "are"))
# The probability is printed
print(prob)
# The test environment is removed. Comment out the line below so that
# the files generated by the function can be viewed
em$td_env()