ngram: Fast n-Gram 'Tokenization'

An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example 'workflows' and information about the utilities offered in the package.
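To make the description above concrete, here is a minimal Python sketch of the two ideas it names: extracting n-grams in order, and "babbling" with a simple Markov chain. It is an illustration only, not the package's C implementation or its R interface. The function names (`ngrams`, `build_chain`, `babble`) and the whitespace-based word splitting are assumptions made for this example.

```python
import random
from collections import defaultdict


def ngrams(text, n):
    """Return the n-grams of `text`, in order, as tuples of words.

    Assumption: "words" are whitespace-separated tokens.
    """
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_chain(text, n):
    """Map each n-gram to the list of words seen immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain


def babble(text, n, length, seed=None):
    """Generate `length` words by walking the n-gram Markov chain (n >= 1).

    The next word is drawn at random from the words that followed the
    current n-gram in the input, so frequent continuations are more likely.
    """
    rng = random.Random(seed)
    chain = build_chain(text, n)
    if not chain:
        return ""
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (n-gram only seen at the end of the text): restart.
            state = rng.choice(sorted(chain))
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])
```

For example, `ngrams("a b a b c", 2)` yields the four 2-grams `("a", "b")`, `("b", "a")`, `("a", "b")`, `("b", "c")`, and `babble("a b a b c", 1, 10, seed=1)` returns a 10-word string drawn from those transitions. Passing the same seed reproduces the same output.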

Author: Drew Schmidt [aut, cre], Christian Heckendorf [aut]
Date of publication: 2017-01-17 19:41:25
Maintainer: Drew Schmidt <wrathematics@gmail.com>
License: BSD 2-clause License + file LICENSE
Version: 3.0.2
https://github.com/wrathematics/ngram

Files in this package

ngram
ngram/inst
ngram/inst/CITATION
ngram/inst/shiny
ngram/inst/shiny/shiny.r
ngram/inst/shiny/babbler
ngram/inst/shiny/babbler/server.r
ngram/inst/shiny/babbler/ui.r
ngram/inst/benchmarks
ngram/inst/benchmarks/ngonly.r
ngram/inst/benchmarks/rbenchmark.r
ngram/inst/benchmarks/bench.r
ngram/inst/doc
ngram/inst/doc/ngram-guide.pdf
ngram/inst/doc/ngram-guide.Rnw
ngram/tests
ngram/tests/splitter.R
ngram/tests/ngram_asweka.R
ngram/src
ngram/src/Makevars
ngram/src/ngram
ngram/src/ngram/Makefile
ngram/src/ngram/examples
ngram/src/ngram/examples/main.c
ngram/src/ngram/examples/test.c
ngram/src/ngram/src
ngram/src/ngram/src/io.c
ngram/src/ngram/src/summary.h
ngram/src/ngram/src/print.h
ngram/src/ngram/src/gen.c
ngram/src/ngram/src/summary.c
ngram/src/ngram/src/hash.h
ngram/src/ngram/src/gen.h
ngram/src/ngram/src/lex.h
ngram/src/ngram/src/common_defs.h
ngram/src/ngram/src/hash.c
ngram/src/ngram/src/counts.c
ngram/src/ngram/src/process.c
ngram/src/ngram/src/ngram.h
ngram/src/ngram/src/lex.c
ngram/src/ngram/src/counts.h
ngram/src/ngram/src/wordcmp.h
ngram/src/ngram/src/print.c
ngram/src/ngram/src/sorts.c
ngram/src/ngram/src/sorts.h
ngram/src/ngram/src/wordcmp.c
ngram/src/ngram/src/process.h
ngram/src/ngram/src/rand
ngram/src/ngram/src/rand/rand.h
ngram/src/ngram/src/rand/utils.c
ngram/src/ngram/src/rand/mrg
ngram/src/ngram/src/rand/mrg/rand_mrg.h
ngram/src/ngram/src/rand/mrg/rand_mrg.c
ngram/src/ngram/src/rand/mt
ngram/src/ngram/src/rand/mt/rand_mt.h
ngram/src/ngram/src/rand/mt/rand_mt.c
ngram/src/ngram/src/rand/samplers.c
ngram/src/ngram/src/rand/platform.h
ngram/src/ngram/src/rand/rng_interface.c
ngram/src/ngram/src/rand/rng_interface.h
ngram/src/ngram/CMakeLists.txt
ngram/src/ngram/README
ngram/src/ngram/mk
ngram/src/ngram/LICENSE
ngram/src/gen.c
ngram/src/getseed.c
ngram/src/count.c
ngram/src/phrase_table.c
ngram/src/ngram.h
ngram/src/converters.c
ngram/src/asweka.c
ngram/src/print.c
ngram/src/constructor.c
ngram/NAMESPACE
ngram/demo
ngram/demo/demo.r
ngram/demo/00Index
ngram/R
ngram/R/string.summary.r
ngram/R/multiread.r
ngram/R/getseed.r
ngram/R/print.r
ngram/R/preprocess.r
ngram/R/wordcount.r
ngram/R/splitter.r
ngram/R/babble.r
ngram/R/ngram.r
ngram/R/ngram-package.R
ngram/R/getters.r
ngram/R/concatenate.r
ngram/R/checks.r
ngram/R/rcorpus.r
ngram/R/ngram_asweka.r
ngram/R/phrasetable.r
ngram/vignettes
ngram/vignettes/cover
ngram/vignettes/cover/cover.pdf
ngram/vignettes/ngram-guide.Rnw
ngram/vignettes/build_pdf.sh
ngram/vignettes/include
ngram/vignettes/include/preamble.tex
ngram/vignettes/include/05-benchmarks.tex
ngram/vignettes/include/ngram.bib
ngram/vignettes/include/01-introduction.tex
ngram/vignettes/include/00-copyright.tex
ngram/vignettes/include/titlepage.tex
ngram/vignettes/include/03-utilities.tex
ngram/vignettes/include/pics
ngram/vignettes/include/pics/uch_small.png
ngram/vignettes/include/04-use.tex
ngram/vignettes/include/02-installation.tex
ngram/README.md
ngram/MD5
ngram/DESCRIPTION
ngram/ChangeLog
ngram/man
ngram/man/preprocess.Rd
ngram/man/tokenize-asweka.Rd
ngram/man/wordcount.Rd
ngram/man/ngram-package.Rd
ngram/man/getters.Rd
ngram/man/rcorpus.Rd
ngram/man/string.summary.Rd
ngram/man/ngram-class.Rd
ngram/man/splitter.Rd
ngram/man/phrasetable.Rd
ngram/man/multiread.Rd
ngram/man/babble.Rd
ngram/man/getseed.Rd
ngram/man/tokenize.Rd
ngram/man/ngram-print.Rd
ngram/man/concatenate.Rd
ngram/cleanup
ngram/LICENSE
