textreg: n-Gram Text Regression, aka Concise Comparative Summarization

Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.

Install the latest version of this package by entering the following in R:

    install.packages("textreg")
Author: Luke Miratrix
Date of publication: 2017-03-17 07:24:28 UTC
Maintainer: Luke Miratrix <lmiratrix@stat.harvard.edu>
License: GPL (>= 2)

Man pages

bathtub: Sample of cleaned OSHA accident summaries.

build.corpus: Build a corpus that can be used in the textreg call.

calc.loss: Calculate total loss of model (Squared hinge loss).

clean.text: Clean text and get it ready for textreg.

cluster.phrases: Cluster phrases based on similarity of appearance.

convert.tm.to.character: Convert tm corpus to vector of strings.

cpp_build.corpus: Driver function for the C++ function.

cpp_textreg: Driver function for the C++ function.

dirtyBathtub: Sample of raw-text OSHA accident summaries.

find.CV.C: K-fold cross-validation to determine optimal tuning parameter.

find.threshold.C: Conduct permutation test on labeling to get null distribution...

grab.fragments: Grab all fragments in a corpus with given phrase.

is.fragment.sample: Is object a fragment.sample object?

is.textreg.corpus: Is object a textreg.corpus object?

is.textreg.result: Is object a textreg.result object?

list.table.chart: Graphic showing multiple word lists side-by-side.

make.appearance.matrix: Make phrase appearance matrix from textreg result.

make.count.table: Count number of times documents have a given phrase.

make.CV.chart: Plot K-fold cross-validation curves.

make.list.table: Collate multiple regression runs.

make.path.matrix: Generate matrix describing gradient descent path of textreg.

make.phrase.correlation.chart: Generate visualization of phrase overlap.

make.phrase.matrix: Make a table of where phrases appear in a corpus.

make_search_phrases: Convert phrases to appropriate search string.

make.similarity.matrix: Calculate similarity matrix for set of phrases.

path.matrix.chart: Plot optimization path of textreg.

phrase.count: Count phrase appearance.

phrase.matrix: Make matrix of where phrases appear in corpus.

phrases: Get the phrases from the textreg.result object.

plot.textreg.result: Plot the sequence of features as they are introduced with the...

predict.textreg.result: Predict labeling with the selected phrases.

print.fragment.sample: Pretty print results of phrase sampling object.

print.textreg.corpus: Pretty print textreg corpus object.

print.textreg.result: Pretty print results of textreg regression.

reformat.textreg.model: Clean up output from textreg.

sample.fragments: Sample fragments of text to contextualize a phrase.

save.corpus.to.files: Save corpus to text (and RData) file.

stem.corpus: Stem corpus with annotation.

testCorpora: Some small, fake test corpora.

textreg: Sparse regression of labeling vector onto all phrases in a...

textreg-package: Sparse regression package for text that allows for multiple...

tm_gregexpr: Call gregexpr on the content of a tm Corpus.



tests/testthat/test-various-regressions.R
tests/testthat/test-cross-validation.R
tests/testthat/test-ngram-call-basics.R
tests/testthat/test-build.corpus.R
tests/testthat/test-make-word-lists.R
tests/testthat/test-perfect-predictors.R
tests/testthat/test-zero-labeling-and-text-files.R
tests/testthat/test-prediction.R
tests/testthat/test-vizualization.R
tests/testthat/test-tm-compatability.R
tests/testthat/test-positive_feature_rescaling.R
tests/testthat/test-find-threshold-C.R
tests/testthat/test-Lq.R
tests/testthat/test-clean-text.R
tests/testthat/test-path-matrix.R
tests/testthat/test-text-searching.R

R/prediction_code.R
R/cross_validation_code.R
R/vizualize_phrases.R
R/sequenceplotter.R
R/package_and_data_documentation.R
R/textreg.R
R/clean_text.R
R/stempp.R
R/text_searching.R
R/make_word_lists.R

man/convert.tm.to.character.Rd
man/phrases.Rd
man/save.corpus.to.files.Rd
man/build.corpus.Rd
man/textreg.Rd
man/cluster.phrases.Rd
man/make.phrase.matrix.Rd
man/phrase.matrix.Rd
man/textreg-package.Rd
man/testCorpora.Rd
man/print.textreg.corpus.Rd
man/find.threshold.C.Rd
man/make.similarity.matrix.Rd
man/calc.loss.Rd
man/bathtub.Rd
man/make.list.table.Rd
man/find.CV.C.Rd
man/stem.corpus.Rd
man/make.count.table.Rd
man/cpp_textreg.Rd
man/print.textreg.result.Rd
man/dirtyBathtub.Rd
man/make.CV.chart.Rd
man/print.fragment.sample.Rd
man/make.path.matrix.Rd
man/grab.fragments.Rd
man/is.textreg.corpus.Rd
man/make.phrase.correlation.chart.Rd
man/sample.fragments.Rd
man/path.matrix.chart.Rd
man/list.table.chart.Rd
man/is.fragment.sample.Rd
man/make_search_phrases.Rd
man/make.appearance.matrix.Rd
man/reformat.textreg.model.Rd
man/predict.textreg.result.Rd
man/tm_gregexpr.Rd
man/clean.text.Rd
man/phrase.count.Rd
man/plot.textreg.result.Rd
man/is.textreg.result.Rd
man/cpp_build.corpus.Rd