`sbo` provides utilities for building and evaluating text predictors based on Stupid Back-off N-gram models in R. It includes functions such as:

- `kgram_freqs()`: extract $k$-gram frequency tables from a text corpus.
- `sbo_predictor()`: train a next-word predictor via Stupid Back-off.
- `eval_sbo_predictor()`: test text predictions against an independent corpus.

You can install the latest release of `sbo` from CRAN:
```r
install.packages("sbo")
```
You can install the development version of `sbo` from GitHub:

```r
# install.packages("devtools")
devtools::install_github("vgherard/sbo")
```
This example shows how to build a text predictor with `sbo`:

```r
library(sbo)
p <- sbo_predictor(sbo::twitter_train,           # 50k tweets, example dataset
                   N = 3,                        # Train a 3-gram model
                   dict = sbo::twitter_dict,     # Top 1k words appearing in corpus
                   .preprocess = sbo::preprocess, # Preprocessing transformation
                   EOS = ".?!:;"                 # End-Of-Sentence characters
)
```
The object `p` can now be used to generate predictive text as follows:

```r
predict(p, "i love")   # a character vector
predict(p, "you love") # another character vector
predict(p, c("i love", "you love", "she loves",
             "we love", "you love", "they love")
) # a character matrix
```
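The function list above also mentions `kgram_freqs()` and `eval_sbo_predictor()`, which the example does not demonstrate. A minimal sketch of how they might fit together, assuming the package's `sbo::twitter_test` dataset is available as a held-out corpus (see the package reference for the exact evaluation output):

```r
library(sbo)

# Extract k-gram frequency tables first; a predictor can then be
# trained from the frequencies without a second pass over the corpus.
freqs <- kgram_freqs(sbo::twitter_train,
                     N = 3,
                     dict = sbo::twitter_dict,
                     .preprocess = sbo::preprocess,
                     EOS = ".?!:;")
p <- sbo_predictor(freqs)

# Test next-word predictions against an independent corpus.
# (sbo::twitter_test is assumed here; see ?eval_sbo_predictor.)
res <- eval_sbo_predictor(p, test = sbo::twitter_test)
mean(res$correct)  # fraction of correctly predicted next words
```

Splitting the workflow this way is useful when you want to tune predictor settings repeatedly: the expensive $k$-gram counting happens once, and predictors are rebuilt cheaply from the stored frequencies.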
For more general-purpose utilities for working with $n$-gram models, you can also check out my package `{kgrams}`.
For help, see the `sbo` website.