textreg: Sparse regression of labeling vector onto all phrases in a...
In textreg: n-Gram Text Regression, aka Concise Comparative Summarization


Given a labeling and a corpus, find phrases that predict this labeling. This function calls a C++ function that builds a tree of phrases and searches it using greedy coordinate descent to solve the optimization problem of the corresponding sparse regression.
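The greedy coordinate descent idea can be sketched in base R. This is a minimal illustration under stated assumptions, not textreg's C++ implementation: it swaps in squared hinge loss (which is differentiable) for the default hinge loss, works on a fixed feature matrix `X` instead of a phrase tree, and `greedy_cd` is a hypothetical name. Each step picks the coefficient that most violates the L1 optimality conditions and line-searches along that coordinate alone. The unpenalized intercept is re-fit before every step without counting toward `maxIter`, and the loop stops after three consecutive near-flat steps, loosely mirroring the `maxIter` and `convergence.threshold` arguments.

```r
# Hypothetical sketch of greedy coordinate descent for an L1-penalized
# linear classifier. Assumptions: squared hinge loss, a +/- 10 search
# window for every line search, and a plain numeric feature matrix X.
greedy_cd <- function(X, y, C = 1, maxIter = 40, tol = 1e-4) {
  beta <- numeric(ncol(X))
  b0 <- 0
  objective <- function(beta, b0) {
    m <- pmax(0, 1 - y * (b0 + drop(X %*% beta)))
    sum(m^2) + C * sum(abs(beta))
  }
  last <- objective(beta, b0)
  flat <- 0
  for (iter in seq_len(maxIter)) {
    # Intercept adjustment: unpenalized and not counted as a descent step.
    b0 <- optimize(function(b) objective(beta, b), c(-10, 10))$minimum
    # Gradient of the smooth loss with respect to every coefficient.
    m <- pmax(0, 1 - y * (b0 + drop(X %*% beta)))
    g <- -2 * drop(crossprod(X, y * m))
    # How strongly each coordinate violates the L1 optimality conditions.
    viol <- ifelse(beta == 0, pmax(abs(g) - C, 0), abs(g + C * sign(beta)))
    j <- which.max(viol)
    if (viol[j] <= 0) break  # no single coordinate can improve the objective
    # Line search along the chosen coordinate only.
    f_j <- function(v) { b <- beta; b[j] <- v; objective(b, b0) }
    beta[j] <- optimize(f_j, beta[j] + c(-10, 10))$minimum
    # Stop after three consecutive steps whose improvement is below tol.
    current <- objective(beta, b0)
    flat <- if (last - current < tol) flat + 1 else 0
    last <- current
    if (flat >= 3) break
  }
  list(intercept = b0, beta = beta, objective = last)
}

# Toy data: column 1 separates the classes, column 2 is noise.
X <- cbind(c(1, 1, 1, 0, 0, 0), c(1, 0, 1, 0, 1, 0))
y <- c(1, 1, 1, -1, -1, -1)
fit <- greedy_cd(X, y, C = 0.1)
```

On this toy problem the separating column receives a positive weight while the noise column is never selected, so its coefficient stays exactly zero; that sparsity is the point of the L1 penalty.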

textreg(corpus, labeling, banned = NULL, objective.function = 2,
  C = 1, a = 1, maxIter = 40, verbosity = 1,
  step.verbosity = verbosity, positive.only = FALSE,
  binary.features = FALSE, no.regularization = FALSE,
  positive.weight = 1, Lq = 2, min.support = 1, min.pattern = 1,
  max.pattern = 100, gap = 0, token.type = "word",
  convergence.threshold = 1e-04)

`corpus`	A list of strings or a corpus from the `tm` package.
`labeling`	A vector of +1/-1 or TRUE/FALSE indicating which documents are considered relevant and which are baseline. A +1/-1 vector may also contain 0, which means the document is dropped.
`banned`	List of words that should be dropped from consideration.
`objective.function`	Loss function to use. 2 is hinge loss; 0 and 1 select other loss functions, which are not documented here.
`C`	Regularization strength, i.e., the weight on the penalty term. 0 means no regularization.
`a`	Fraction of the penalty that is L1 versus L2: a = 1 gives a pure L1 penalty, a = 0 a pure L2 penalty.
`maxIter`	Maximum number of descent steps to take (intercept adjustments are not counted).
`verbosity`	Level of output. 0 is no printed output.
`step.verbosity`	Level of output for line searches. 0 is no printed output.
`positive.only`	If TRUE, disallow features with negative coefficients.
`binary.features`	If TRUE, code only the presence or absence of a feature in a document rather than its count.
`no.regularization`	If TRUE, do not rescale the features at all (Lq is then ignored).
`positive.weight`	Scale the weight of all positively labeled documents by this value. The default, 1, means no scaling. NOT FULLY IMPLEMENTED.
`Lq`	Norm used to rescale the features (2 is standard). Can be any value from 1 up; values above 10 use the infinity norm.
`min.support`	Only consider phrases that appear at least this many times.
`min.pattern`	Only consider phrases at least this many tokens long.
`max.pattern`	Only consider phrases at most this many tokens long.
`gap`	Allow phrases that contain wildcard words. The value is the number of wildcards allowed in a row.
`token.type`	Whether tokens are words ("word") or single characters ("character").
`convergence.threshold`	Tolerance for deciding whether the descent has converged. The search runs for three steps at this threshold to confirm that the objective is flat.
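The `labeling` rules can be illustrated with a small helper. `normalize_labeling` is a hypothetical name and this is only a sketch of the described behavior: TRUE/FALSE is mapped to +1/-1, and documents labeled 0 are dropped together with their labels.

```r
# Hypothetical helper mirroring the documented labeling conventions.
normalize_labeling <- function(corpus, labeling) {
  # TRUE/FALSE becomes +1/-1.
  if (is.logical(labeling)) labeling <- ifelse(labeling, 1, -1)
  # A 0 label means "drop this document".
  keep <- labeling != 0
  list(corpus = corpus[keep], labeling = labeling[keep])
}

res <- normalize_labeling(c("doc a", "doc b", "doc c"), c(1, 0, -1))
```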
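A rough sketch of how `min.pattern`, `max.pattern`, `min.support`, and `binary.features` shape the document-by-phrase feature matrix. Assumptions: tokens are whitespace-separated words, support is the total number of occurrences across the corpus, and wildcard phrases (`gap`) are ignored. `phrase_matrix` is a hypothetical helper; textreg itself builds a phrase tree in C++ rather than enumerating every n-gram up front.

```r
# Hypothetical sketch: enumerate word n-grams and count them per document.
phrase_matrix <- function(docs, min.pattern = 1, max.pattern = 2,
                          min.support = 1, binary.features = FALSE) {
  tokens <- lapply(strsplit(docs, "\\s+"), function(t) t[t != ""])
  grams <- lapply(tokens, function(t) {
    out <- character(0)
    for (n in seq(min.pattern, max.pattern)) {
      if (length(t) >= n) {
        starts <- seq_len(length(t) - n + 1)
        out <- c(out, vapply(starts,
                             function(i) paste(t[i:(i + n - 1)], collapse = " "),
                             character(1)))
      }
    }
    out
  })
  vocab <- sort(unique(unlist(grams)))
  # One row per document, one column per phrase; entries are counts.
  X <- do.call(rbind, lapply(grams, function(g)
    as.numeric(table(factor(g, levels = vocab)))))
  colnames(X) <- vocab
  # Drop rare phrases (support counted before any binarization).
  X <- X[, colSums(X) >= min.support, drop = FALSE]
  # Presence/absence coding instead of counts.
  if (binary.features) X <- (X > 0) * 1
  X
}

docs <- c("the cat sat", "the cat ran", "a dog ran")
X <- phrase_matrix(docs, min.pattern = 1, max.pattern = 2, min.support = 2)
```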
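The `Lq` rescaling can be sketched as dividing each feature column by its Lq norm, switching to the infinity norm (largest absolute entry) for values above 10. This is an assumption about what the rescaling does, and `rescale_features` is a hypothetical name.

```r
# Hypothetical sketch: normalize every feature column by its Lq norm.
rescale_features <- function(X, Lq = 2) {
  norms <- if (Lq > 10) {
    apply(abs(X), 2, max)               # infinity norm
  } else {
    colSums(abs(X)^Lq)^(1 / Lq)         # ordinary Lq norm
  }
  norms[norms == 0] <- 1                # leave all-zero columns unchanged
  sweep(X, 2, norms, "/")
}

X <- matrix(c(3, 4, 1, 1), nrow = 2)    # columns (3, 4) and (1, 1)
X2 <- rescale_features(X, Lq = 2)
Xinf <- rescale_features(X, Lq = 11)
```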
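The roles of `C` and `a` can be made concrete with an elastic-net-style penalized hinge objective. The exact form is an assumption borrowed from common elastic-net conventions (in particular the 1/2 on the L2 term and the unpenalized intercept), and `penalized_hinge` is a hypothetical name.

```r
# Hypothetical sketch of a hinge loss plus an elastic-net penalty.
penalized_hinge <- function(beta, b0, X, y, C = 1, a = 1) {
  # Hinge loss: zero once a document is on the correct side with margin 1.
  loss <- sum(pmax(0, 1 - y * (b0 + drop(X %*% beta))))
  # a = 1 gives a pure L1 penalty, a = 0 a pure (halved) squared L2 penalty.
  penalty <- a * sum(abs(beta)) + (1 - a) / 2 * sum(beta^2)
  loss + C * penalty
}

X <- diag(2)
y <- c(1, -1)
penalized_hinge(c(1, -1), 0, X, y, C = 1, a = 1)  # margins met: only the L1 term
```

With both documents exactly at margin 1 the loss vanishes, so the value is just `C` times the penalty; setting `C = 0` removes regularization entirely, matching the argument description.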