textreg: n-Gram Text Regression, aka Concise Comparative Summarization

Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.

Author: Luke Miratrix
Date of publication: 2015-11-11 16:46:48
Maintainer: Luke Miratrix <lmiratrix@stat.harvard.edu>
License: GPL (>= 2)
Version: 0.1.3

Man pages

bathtub: Sample of cleaned OSHA accident summaries.

build.corpus: Build a corpus that can be used in the textreg call.

calc.loss: Calculate total loss of model (Squared hinge loss).

clean.text: Clean text and get it ready for textreg.

cluster.phrases: Cluster phrases based on similarity of appearance.

convert.tm.to.character: Convert tm corpus to vector of strings.

cpp_build.corpus: Driver function for the C++ function.

cpp_textreg: Driver function for the C++ function.

dirtyBathtub: Sample of raw-text OSHA accident summaries.

find.CV.C: K-fold cross-validation to determine optimal tuning parameter.

find.threshold.C: Conduct permutation test on labeling to get null distribution...

grab.fragments: Grab all fragments in a corpus with given phrase.

is.fragment.sample: Is object a fragment.sample object?

is.textreg.corpus: Is object a textreg.corpus object?

is.textreg.result: Is object a textreg.result object?

list.table.chart: Graphic showing multiple word lists side-by-side.

make.appearance.matrix: Make phrase appearance matrix from textreg result.

make.count.table: Count number of times documents have a given phrase.

make.CV.chart: Plot K-fold cross-validation curves.

make.list.table: Collate multiple regression runs.

make.path.matrix: Generate matrix describing gradient descent path of textreg.

make.phrase.correlation.chart: Generate visualization of phrase overlap.

make.phrase.matrix: Make a table of where phrases appear in a corpus.

make_search_phrases: Convert phrases to appropriate search string.

make.similarity.matrix: Calculate similarity matrix for set of phrases.

path.matrix.chart: Plot optimization path of textreg.

phrase.count: Count phrase appearance.

phrase.matrix: Make matrix of where phrases appear in corpus.

plot.textreg.result: Plot the sequence of features as they are introduced with the...

predict.textreg.result: Predict labeling with the selected phrases.

print.fragment.sample: Pretty print results of phrase sampling object.

print.textreg.corpus: Pretty print textreg corpus object.

print.textreg.result: Pretty print results of textreg regression.

reformat.textreg.model: Clean up output from textreg.

sample.fragments: Sample fragments of text to contextualize a phrase.

save.corpus.to.files: Save corpus to text (and RData) file.

stem.corpus: Stem corpus with annotation.

testCorpora: Some small, fake test corpora.

textreg: Sparse regression of labeling vector onto all phrases in a...

textreg-package: Sparse regression package for text that allows for multiple...

tm_gregexpr: Call gregexpr on the content of a tm Corpus.

Files in this package

textreg
textreg/inst
textreg/inst/doc
textreg/inst/doc/bathtub_vignette.R
textreg/inst/doc/bathtub_vignette.pdf
textreg/inst/doc/bathtub_vignette.Rnw
textreg/inst/test-all.R
textreg/src
textreg/src/Makevars
textreg/src/textreg.cpp
textreg/src/Makevars.win
textreg/NAMESPACE
textreg/data
textreg/data/testCorpora.RData
textreg/data/bathtub.RData
textreg/data/dirtyBathtub.RData
textreg/R
textreg/R/prediction_code.R
textreg/R/cross_validation_code.R
textreg/R/vizualize_phrases.R
textreg/R/sequenceplotter.R
textreg/R/package_and_data_documentation.R
textreg/R/textreg.R
textreg/R/clean_text.R
textreg/R/stempp.R
textreg/R/text_searching.R
textreg/R/make_word_lists.R
textreg/vignettes
textreg/vignettes/bathtub_vignette.Rnw
textreg/MD5
textreg/build
textreg/build/vignette.rds
textreg/DESCRIPTION
textreg/man
textreg/man/convert.tm.to.character.Rd
textreg/man/save.corpus.to.files.Rd
textreg/man/build.corpus.Rd
textreg/man/textreg.Rd
textreg/man/cluster.phrases.Rd
textreg/man/make.phrase.matrix.Rd
textreg/man/phrase.matrix.Rd
textreg/man/textreg-package.Rd
textreg/man/testCorpora.Rd
textreg/man/print.textreg.corpus.Rd
textreg/man/find.threshold.C.Rd
textreg/man/make.similarity.matrix.Rd
textreg/man/calc.loss.Rd
textreg/man/bathtub.Rd
textreg/man/make.list.table.Rd
textreg/man/find.CV.C.Rd
textreg/man/stem.corpus.Rd
textreg/man/make.count.table.Rd
textreg/man/cpp_textreg.Rd
textreg/man/print.textreg.result.Rd
textreg/man/dirtyBathtub.Rd
textreg/man/make.CV.chart.Rd
textreg/man/print.fragment.sample.Rd
textreg/man/make.path.matrix.Rd
textreg/man/grab.fragments.Rd
textreg/man/is.textreg.corpus.Rd
textreg/man/make.phrase.correlation.chart.Rd
textreg/man/sample.fragments.Rd
textreg/man/path.matrix.chart.Rd
textreg/man/list.table.chart.Rd
textreg/man/is.fragment.sample.Rd
textreg/man/make_search_phrases.Rd
textreg/man/make.appearance.matrix.Rd
textreg/man/reformat.textreg.model.Rd
textreg/man/predict.textreg.result.Rd
textreg/man/tm_gregexpr.Rd
textreg/man/clean.text.Rd
textreg/man/phrase.count.Rd
textreg/man/plot.textreg.result.Rd
textreg/man/is.textreg.result.Rd
textreg/man/cpp_build.corpus.Rd
