DataScienceSalon/predictifyR.3.0: Word Prediction Language Model Evaluation

An experimental study of language models, corpus design, and word prediction efficacy, conducted in four phases. The initial phase was an exploratory data analysis of the English-language HC Corpus, a collection of freely available texts comprising over 2.5 billion words from 67 languages. Next, linguistically representative corpora of various sizes were built, and training, validation, and test sets were preprocessed for modeling. The subsequent language modeling phase covered the implementation of Good-Turing / Katz, Kneser-Ney, Modified Kneser-Ney, and Topic Model language models. Finally, the language models were run on corpora of various sizes, and perplexity was measured to assess word prediction accuracy.
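Perplexity, the evaluation measure named above, is the exponentiated average negative log-probability a model assigns to the words of a test set, so lower values indicate better prediction. The following is a minimal sketch in base R, not this package's API: the function name `perplexity` and the example probabilities are hypothetical and chosen only for illustration.

```r
# Illustrative sketch only; `perplexity` is a hypothetical helper,
# not a function exported by predictifyR.3.0.
# `probs` holds the probability a language model assigned to each
# word of a test sequence, given its preceding context.
perplexity <- function(probs) {
  stopifnot(is.numeric(probs), length(probs) > 0,
            all(probs > 0), all(probs <= 1))
  # exp of the mean negative log-probability, i.e. the inverse
  # geometric mean of the per-word probabilities
  exp(-mean(log(probs)))
}

# Made-up example: a model that assigns probability 1/4 to every word
# behaves like a uniform choice among 4 words, so its perplexity is
# approximately 4.
uniform_probs <- rep(1 / 4, 10)
print(perplexity(uniform_probs))
```

Because the measure is an average over words, it can be used to compare models trained on corpora of different sizes, provided they are all scored on the same test set.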

Package details
Maintainer	John James <j2sdatalab@gmail.com>
License	MIT
Version	0.1.0
Package repository	GitHub: DataScienceSalon/predictifyR.3.0
Installation	Install the latest version of this package by entering the following in R: `install.packages("remotes")`, then `remotes::install_github("DataScienceSalon/predictifyR.3.0")`