ngram: Fast n-Gram 'Tokenization'

An n-gram is a sequence of n "words" taken, in order, from a body of text. This package offers utilities for creating, displaying, summarizing, and "babbling" n-grams. The tokenization and babbling are handled by efficient C code, which can also be built as its own standalone library. The babbler is a simple Markov chain. The package also includes a vignette with complete example workflows and information about the utilities it offers.
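
To make the description concrete, here is a minimal sketch of the two ideas it names: splitting text into n-grams and "babbling" new text with a Markov chain over those n-grams. It is written in Python for illustration only and is not the package's C implementation or its R interface. The function names (`ngrams`, `phrase_table`, `build_chain`, `babble`), the whitespace-based word splitting, and the restart-on-dead-end rule are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return the n-grams of `text` as tuples of whitespace-separated words."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_table(text, n):
    """Count how often each n-gram occurs (hypothetical helper)."""
    return Counter(ngrams(text, n))


def build_chain(text, n):
    """Map each n-gram to the list of words observed immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain


def babble(text, n, length, seed=None):
    """Generate `length` words by walking the n-gram Markov chain."""
    rng = random.Random(seed)
    grams = ngrams(text, n)
    if not grams:
        return ""
    chain = build_chain(text, n)
    state = rng.choice(grams)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (an n-gram with no observed successor):
            # restart from a randomly chosen n-gram. This rule is an assumption.
            state = rng.choice(grams)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (nxt,)
    return " ".join(out[:length])


if __name__ == "__main__":
    text = "A B A C A B B A"
    print(phrase_table(text, 2).most_common())
    print(babble(text, n=2, length=20, seed=1))
```

Because the next word depends only on the current n-gram, a larger n produces output that follows the source text more closely, while a smaller n produces looser, more random output.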

Author: Drew Schmidt [aut, cre], Christian Heckendorf [aut]
Date of publication: 2016-07-13 10:19:37
Maintainer: Drew Schmidt <wrathematics@gmail.com>
License: BSD 2-clause License + file LICENSE
Version: 3.0.1

Man pages

babble: ngram Babbler
concatenate: Concatenate
getseed: getseed
getters: ngram Getters
multiread: Multiread
ngram-class: Class ngram
ngram-package: ngram: An n-gram Babbler
ngram-print: ngram printing
phrasetable: Get Phrasetable
preprocess: Basic Text Preprocessor
rcorpus: Random Corpus
splitter: Character Splitter
string.summary: Text Summary
tokenize: n-gram Tokenization
tokenize-asweka: Weka-like n-gram Tokenization
wordcount: wordcount

Files in this package

ngram
ngram/inst
ngram/inst/CITATION
ngram/inst/shiny
ngram/inst/shiny/shiny.r
ngram/inst/shiny/babbler
ngram/inst/shiny/babbler/server.r
ngram/inst/shiny/babbler/ui.r
ngram/inst/benchmarks
ngram/inst/benchmarks/ngonly.r
ngram/inst/benchmarks/rbenchmark.r
ngram/inst/benchmarks/bench.r
ngram/tests
ngram/tests/splitter.R
ngram/tests/ngram_asweka.R
ngram/src
ngram/src/Makevars
ngram/src/ngram
ngram/src/ngram/Makefile
ngram/src/ngram/examples
ngram/src/ngram/examples/main.c
ngram/src/ngram/examples/test.c
ngram/src/ngram/src
ngram/src/ngram/src/io.c
ngram/src/ngram/src/summary.h
ngram/src/ngram/src/print.h
ngram/src/ngram/src/gen.c
ngram/src/ngram/src/summary.c
ngram/src/ngram/src/hash.h
ngram/src/ngram/src/gen.h
ngram/src/ngram/src/lex.h
ngram/src/ngram/src/common_defs.h
ngram/src/ngram/src/hash.c
ngram/src/ngram/src/counts.c
ngram/src/ngram/src/process.c
ngram/src/ngram/src/ngram.h
ngram/src/ngram/src/lex.c
ngram/src/ngram/src/counts.h
ngram/src/ngram/src/wordcmp.h
ngram/src/ngram/src/print.c
ngram/src/ngram/src/sorts.c
ngram/src/ngram/src/sorts.h
ngram/src/ngram/src/wordcmp.c
ngram/src/ngram/src/process.h
ngram/src/ngram/src/rand
ngram/src/ngram/src/rand/rand.h
ngram/src/ngram/src/rand/utils.c
ngram/src/ngram/src/rand/mrg
ngram/src/ngram/src/rand/mrg/rand_mrg.h
ngram/src/ngram/src/rand/mrg/rand_mrg.c
ngram/src/ngram/src/rand/mt
ngram/src/ngram/src/rand/mt/rand_mt.h
ngram/src/ngram/src/rand/mt/rand_mt.c
ngram/src/ngram/src/rand/samplers.c
ngram/src/ngram/src/rand/platform.h
ngram/src/ngram/src/rand/rng_interface.c
ngram/src/ngram/src/rand/rng_interface.h
ngram/src/ngram/CMakeLists.txt
ngram/src/ngram/README
ngram/src/ngram/mk
ngram/src/ngram/LICENSE
ngram/src/gen.c
ngram/src/getseed.c
ngram/src/count.c
ngram/src/phrase_table.c
ngram/src/ngram.h
ngram/src/converters.c
ngram/src/asweka.c
ngram/src/print.c
ngram/src/constructor.c
ngram/NAMESPACE
ngram/demo
ngram/demo/demo.r
ngram/demo/00Index
ngram/R
ngram/R/string.summary.r
ngram/R/multiread.r
ngram/R/getseed.r
ngram/R/print.r
ngram/R/preprocess.r
ngram/R/wordcount.r
ngram/R/splitter.r
ngram/R/babble.r
ngram/R/ngram.r
ngram/R/ngram-package.R
ngram/R/getters.r
ngram/R/concatenate.r
ngram/R/rcorpus.r
ngram/R/ngram_asweka.r
ngram/R/phrasetable.r
ngram/vignettes
ngram/vignettes/cover
ngram/vignettes/cover/cover.pdf
ngram/vignettes/ngram-guide.Rnw
ngram/vignettes/build_pdf.sh
ngram/vignettes/include
ngram/vignettes/include/preamble.tex
ngram/vignettes/include/05-benchmarks.tex
ngram/vignettes/include/ngram.bib
ngram/vignettes/include/01-introduction.tex
ngram/vignettes/include/00-copyright.tex
ngram/vignettes/include/titlepage.tex
ngram/vignettes/include/03-utilities.tex
ngram/vignettes/include/pics
ngram/vignettes/include/pics/uch_small.png
ngram/vignettes/include/04-use.tex
ngram/vignettes/include/02-installation.tex
ngram/README.md
ngram/MD5
ngram/DESCRIPTION
ngram/ChangeLog
ngram/man
ngram/man/preprocess.Rd
ngram/man/tokenize-asweka.Rd
ngram/man/wordcount.Rd
ngram/man/ngram-package.Rd
ngram/man/getters.Rd
ngram/man/rcorpus.Rd
ngram/man/string.summary.Rd
ngram/man/ngram-class.Rd
ngram/man/splitter.Rd
ngram/man/phrasetable.Rd
ngram/man/multiread.Rd
ngram/man/babble.Rd
ngram/man/getseed.Rd
ngram/man/tokenize.Rd
ngram/man/ngram-print.Rd
ngram/man/concatenate.Rd
ngram/cleanup
ngram/LICENSE