sbo provides utilities for building and evaluating text predictors based on Stupid Back-off N-gram models in R. It includes functions such as:
- kgram_freqs(): Extract k-gram frequency tables from a text corpus.
- sbo_predictor(): Train a next-word predictor via Stupid Back-off.
- eval_sbo_predictor(): Test text predictions against an independent corpus.

You can install the latest release of sbo from CRAN:
install.packages("sbo")
You can install the development version of sbo from GitHub:
# install.packages("devtools")
devtools::install_github("vgherard/sbo")
This example shows how to build a text predictor with sbo:
library(sbo)
p <- sbo_predictor(sbo::twitter_train, # 50k tweets, example dataset
N = 3, # Train a 3-gram model
dict = sbo::twitter_dict, # Top 1k words appearing in corpus
.preprocess = sbo::preprocess, # Preprocessing transformation
EOS = ".?!:;" # End-Of-Sentence characters
)
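Roughly speaking, a Stupid Back-off model scores a candidate next word by the relative frequency of the full k-gram (context plus word) when that k-gram has been observed; otherwise it drops the oldest context word and multiplies the score by a fixed penalty (0.4 here, a common choice assumed for this sketch). The base-R code below illustrates that idea on a hypothetical toy corpus. It is not sbo's implementation: it ignores sentence boundaries, the dictionary and the preprocessing step used above, and the helper names (kgram_counts(), count_of(), sbo_score(), sbo_top()) are made up for illustration.

```r
# Minimal Stupid Back-off sketch in base R (illustrative, not sbo's code).
# Hypothetical toy corpus, already tokenized; sentence boundaries are ignored.
tokens <- c("i", "love", "you", "i", "love", "it",
            "you", "love", "me", "i", "love", "you")
N <- 3  # highest k-gram order, as in the N = 3 model above

# Count all k-grams of order k in a token vector (named integer vector)
kgram_counts <- function(tokens, k) {
  n <- length(tokens) - k + 1
  if (n < 1) return(integer(0))
  grams <- vapply(seq_len(n),
                  function(i) paste(tokens[i:(i + k - 1)], collapse = " "),
                  character(1))
  tab <- table(grams)
  setNames(as.integer(tab), names(tab))
}
counts <- lapply(seq_len(N), function(k) kgram_counts(tokens, k))

# Frequency of a word sequence in the toy corpus (0 if never observed)
count_of <- function(words) {
  tab <- counts[[length(words)]]
  key <- paste(words, collapse = " ")
  if (key %in% names(tab)) as.numeric(tab[[key]]) else 0
}

# Stupid Back-off score of `word` following `context`
sbo_score <- function(word, context, lambda = 0.4) {
  context <- tail(context, N - 1)  # only the last N - 1 words are used
  penalty <- 1
  while (length(context) > 0) {
    num <- count_of(c(context, word))
    if (num > 0) return(penalty * num / count_of(context))
    context <- context[-1]         # back off: drop the oldest context word
    penalty <- penalty * lambda    # and penalize the shorter context
  }
  penalty * count_of(word) / length(tokens)  # unigram relative frequency
}

# Top-L next-word predictions for a context, ranked by score
sbo_top <- function(context, L = 3, lambda = 0.4) {
  vocab <- unique(tokens)
  scores <- vapply(vocab, sbo_score, numeric(1),
                   context = context, lambda = lambda)
  names(sort(scores, decreasing = TRUE))[seq_len(min(L, length(vocab)))]
}

sbo_top(c("i", "love"))    # "you" "it" "me"
sbo_top(c("you", "love"))  # "me" "you" "it"
```

For "you love", the trigram "you love me" was seen once, so "me" scores 1; "you" and "it" are only reachable through the bigram level, so their scores carry one factor of 0.4.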
The object p can now be used to generate predictive text as follows:
predict(p, "i love") # a character vector
#> [1] "you" "it" "my"
predict(p, "you love") # another character vector
#> [1] "<EOS>" "me" "the"
predict(p,
c("i love", "you love", "she loves", "we love", "you love", "they love")
) # a character matrix
#> [,1] [,2] [,3]
#> [1,] "you" "it" "my"
#> [2,] "<EOS>" "me" "the"
#> [3,] "you" "my" "me"
#> [4,] "you" "our" "it"
#> [5,] "<EOS>" "me" "the"
#> [6,] "to" "you" "and"
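Predictions like these can be checked against text the model has not seen, which is the purpose of eval_sbo_predictor(). One natural score is top-L accuracy: a prediction counts as correct when the true next word appears among the L suggestions. The base-R sketch below computes it for the prediction matrix above. The "true" next words are hypothetical, and the metric is an assumption for illustration, not a description of what eval_sbo_predictor() returns.

```r
# Top-3 predictions, copied from the predict(p, ...) output above
preds <- matrix(c("you",   "it",  "my",
                  "<EOS>", "me",  "the",
                  "you",   "my",  "me",
                  "you",   "our", "it",
                  "<EOS>", "me",  "the",
                  "to",    "you", "and"),
                ncol = 3, byrow = TRUE)

# Hypothetical true next words for the six inputs (made up for illustration)
truth <- c("it", "me", "him", "our", "<EOS>", "too")

# Position of the true word among each row's predictions (NA if absent)
hit_rank <- vapply(seq_len(nrow(preds)),
                   function(i) match(truth[i], preds[i, ]), integer(1))

# Share of inputs whose true word is the first prediction / in the top 3
mean(!is.na(hit_rank) & hit_rank == 1)  # 1/6, about 0.167
mean(!is.na(hit_rank))                  # 4/6, about 0.667
```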
For help, see the sbo website.