ngram: Fast n-Gram 'Tokenization'
Version 3.0.3

An n-gram is a sequence of n "words" taken, in order, from a body of text. This package is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The tokenization and babbling are handled by efficient C code, which can also be built as a standalone library. The babbler is a simple Markov chain. The package also includes a vignette with complete example workflows and information about the utilities it provides.
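To make the two ideas above concrete (n-gram tokenization and a Markov-chain babbler), here is a minimal base-R sketch. It assumes words are separated by whitespace. The helpers split_words, make_ngrams, ngram_counts, build_chain, and babble_words are hypothetical illustrations, not the package's API; the package's own ngram(), get.phrasetable(), and babble() do this work in C and may behave differently (for example, at dead ends in the chain).

```r
# Minimal base-R sketch of n-gram tokenization and a Markov-chain babbler.
# Assumption: words are whitespace-separated; this is NOT the package's
# C implementation, and all function names here are hypothetical.

# Split text into "words" on runs of whitespace.
split_words <- function(text) {
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  words[nzchar(words)]
}

# Return every run of n consecutive words, in order, joined by spaces.
make_ngrams <- function(text, n = 2) {
  stopifnot(n >= 1)
  words <- split_words(text)
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count how often each distinct n-gram occurs, most frequent first.
ngram_counts <- function(text, n = 2) {
  sort(table(make_ngrams(text, n)), decreasing = TRUE)
}

# Map each (n-1)-word prefix to every word that followed it. Duplicates are
# kept, so sampling from a prefix's followers is frequency-weighted.
build_chain <- function(text, n = 2) {
  stopifnot(n >= 2)
  words <- split_words(text)
  chain <- list()
  if (length(words) < n) return(chain)
  for (i in seq_len(length(words) - n + 1)) {
    key <- paste(words[i:(i + n - 2)], collapse = " ")
    chain[[key]] <- c(chain[[key]], words[i + n - 1])
  }
  chain
}

# Generate genlen words by walking the chain from a random prefix.
babble_words <- function(chain, genlen = 10) {
  stopifnot(length(chain) > 0, genlen >= 1)
  keys <- names(chain)
  random_prefix <- function() strsplit(sample(keys, 1), " ", fixed = TRUE)[[1]]
  state <- random_prefix()
  out <- state
  while (length(out) < genlen) {
    followers <- chain[[paste(state, collapse = " ")]]
    if (is.null(followers)) {
      # Dead end (prefix seen only at the end of the text): restart at a
      # random prefix. This restart rule is an assumption of the sketch.
      state <- random_prefix()
      out <- c(out, state)
      next
    }
    nxt <- followers[sample.int(length(followers), 1)]
    out <- c(out, nxt)
    state <- c(state[-1], nxt)
  }
  paste(out[seq_len(genlen)], collapse = " ")
}

text <- "a b a c a b b"
make_ngrams(text, 2)  # "a b" "b a" "a c" "c a" "a b" "b b"
ngram_counts(text, 2)
set.seed(1)
babble_words(build_chain(text, 3), genlen = 8)
```

Keeping duplicate followers in the chain, rather than storing counts, keeps the sampling step to a single uniform draw while still weighting each next word by how often it was observed.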

Author: Drew Schmidt [aut, cre], Christian Heckendorf [aut]
Date of publication: 2017-03-24 05:37:04 UTC
Maintainer: Drew Schmidt <wrathematics@gmail.com>
License: BSD 2-clause License + file LICENSE
URL: https://github.com/wrathematics/ngram
Package repository: CRAN
Installation: install the latest version of this package by entering the following in R:
install.packages("ngram")



Man pages

babble: ngram Babbler
concatenate: Concatenate
getseed: getseed
getters: ngram Getters
multiread: Multiread
ngram-class: Class ngram
ngram-package: ngram: An n-gram Babbler
ngram-print: ngram printing
phrasetable: Get Phrasetable
preprocess: Basic Text Preprocessor
rcorpus: Random Corpus
splitter: Character Splitter
string.summary: Text Summary
tokenize: n-gram Tokenization
tokenize-asweka: Weka-like n-gram Tokenization
wordcount: wordcount
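The tokenize-asweka man page refers to a Weka-like tokenizer. Assuming it follows the convention of Weka's NGramTokenizer, taking a minimum and a maximum n and emitting n-grams of every size in that range, the idea can be sketched in base R as follows. The helper weka_like_ngrams is hypothetical; the package's own function is ngram_asweka(), and its exact output order may differ from the grouping by increasing n used here.

```r
# Hypothetical helper, not the package's ngram_asweka(): return all n-grams
# for every n from min_n to max_n, grouped by n in increasing order.
# Assumption: words are whitespace-separated.
weka_like_ngrams <- function(text, min_n = 1, max_n = 2) {
  stopifnot(min_n >= 1, max_n >= min_n)
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  out <- character(0)
  for (n in seq(min_n, max_n)) {
    # Larger n cannot fit either once the text is shorter than n words.
    if (length(words) < n) break
    starts <- seq_len(length(words) - n + 1)
    out <- c(out, vapply(starts, function(i)
      paste(words[i:(i + n - 1)], collapse = " "), character(1)))
  }
  out
}

weka_like_ngrams("a b c", min_n = 1, max_n = 2)  # "a" "b" "c" "a b" "b c"
```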

Functions

Tokenize Man page
Tokenize-AsWeka Man page
babble Man page
babble,ngram-method Man page
check.is.char Source code
check.is.flag Source code
check.is.int Source code
check.is.natnum Source code
check.is.number Source code
check.is.posint Source code
check.is.string Source code
check.is.string.or.null Source code
check.is.strings Source code
concatenate Man page Source code
get.nextwords Man page
get.nextwords,ngram-method Man page
get.ngrams Man page
get.ngrams,ngram-method Man page
get.phrasetable Man page Source code
get.string Man page
get.string,ngram-method Man page
getseed Man page Source code
getters Man page
is.annoying Source code
is.badval Source code
is.inty Source code
is.negative Source code
is.string Source code
is.zero Source code
multiread Man page Source code
ng.print Source code
ngram Man page
ngram,character-method Man page
ngram-class Man page
ngram-package Man page
ngram-print Man page
ngram_asweka Man page Source code
phrasetable Man page
preprocess Man page Source code
print,ngram-method Man page
print.string_summary Source code
rcorpus Man page Source code
show,ngram-method Man page
spacenames Source code
splitter Man page Source code
string.summary Man page Source code
tablesort Source code
title_case Source code
tokenize Man page
wordcount Man page
wordcount,character-method Man page
wordcount,ngram-method Man page

Files

inst
inst/CITATION
inst/shiny
inst/shiny/shiny.r
inst/shiny/babbler
inst/shiny/babbler/server.r
inst/shiny/babbler/ui.r
inst/benchmarks
inst/benchmarks/ngonly.r
inst/benchmarks/rbenchmark.r
inst/benchmarks/bench.r
inst/doc
inst/doc/ngram-guide.pdf
inst/doc/ngram-guide.Rnw
tests
tests/splitter.R
tests/ngram_asweka.R
src
src/Makevars
src/ngram
src/ngram/Makefile
src/ngram/examples
src/ngram/examples/main.c
src/ngram/examples/test.c
src/ngram/src
src/ngram/src/io.c
src/ngram/src/summary.h
src/ngram/src/print.h
src/ngram/src/gen.c
src/ngram/src/summary.c
src/ngram/src/hash.h
src/ngram/src/gen.h
src/ngram/src/lex.h
src/ngram/src/common_defs.h
src/ngram/src/hash.c
src/ngram/src/counts.c
src/ngram/src/process.c
src/ngram/src/ngram.h
src/ngram/src/lex.c
src/ngram/src/counts.h
src/ngram/src/wordcmp.h
src/ngram/src/print.c
src/ngram/src/sorts.c
src/ngram/src/sorts.h
src/ngram/src/wordcmp.c
src/ngram/src/process.h
src/ngram/src/rand
src/ngram/src/rand/rand.h
src/ngram/src/rand/utils.c
src/ngram/src/rand/mrg
src/ngram/src/rand/mrg/rand_mrg.h
src/ngram/src/rand/mrg/rand_mrg.c
src/ngram/src/rand/mt
src/ngram/src/rand/mt/rand_mt.h
src/ngram/src/rand/mt/rand_mt.c
src/ngram/src/rand/samplers.c
src/ngram/src/rand/platform.h
src/ngram/src/rand/rng_interface.c
src/ngram/src/rand/rng_interface.h
src/ngram/CMakeLists.txt
src/ngram/README
src/ngram/mk
src/ngram/LICENSE
src/gen.c
src/getseed.c
src/ngram_native.c
src/count.c
src/phrase_table.c
src/ngram.h
src/converters.c
src/asweka.c
src/print.c
src/constructor.c
NAMESPACE
demo
demo/demo.r
demo/00Index
R
R/string.summary.r
R/multiread.r
R/getseed.r
R/print.r
R/preprocess.r
R/wordcount.r
R/splitter.r
R/babble.r
R/ngram.r
R/ngram-package.R
R/getters.r
R/concatenate.r
R/checks.r
R/rcorpus.r
R/ngram_asweka.r
R/phrasetable.r
vignettes
vignettes/cover
vignettes/cover/cover.pdf
vignettes/ngram-guide.Rnw
vignettes/build_pdf.sh
vignettes/include
vignettes/include/preamble.tex
vignettes/include/05-benchmarks.tex
vignettes/include/ngram.bib
vignettes/include/01-introduction.tex
vignettes/include/00-copyright.tex
vignettes/include/titlepage.tex
vignettes/include/03-utilities.tex
vignettes/include/pics
vignettes/include/pics/uch_small.png
vignettes/include/04-use.tex
vignettes/include/02-installation.tex
README.md
MD5
DESCRIPTION
ChangeLog
man
man/preprocess.Rd
man/tokenize-asweka.Rd
man/wordcount.Rd
man/ngram-package.Rd
man/getters.Rd
man/rcorpus.Rd
man/string.summary.Rd
man/ngram-class.Rd
man/splitter.Rd
man/phrasetable.Rd
man/multiread.Rd
man/babble.Rd
man/getseed.Rd
man/tokenize.Rd
man/ngram-print.Rd
man/concatenate.Rd
cleanup
LICENSE
ngram documentation built on May 20, 2017, 2:35 a.m.
