find.threshold.C: Conduct permutation test on labeling to get null distribution...
In textreg: n-Gram Text Regression, aka Concise Comparative Summarization

Description Usage Arguments Details Value Examples

First determines the regularization level that gives the null model for the given labeling. Then permutes the labeling repeatedly, recording the regularization level that gives the null model for each permuted labeling. This allows for permutation-style inference on the relationship of the labeling to the text, and for appropriate selection of the tuning parameter.
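The permutation scheme described above can be sketched on toy data. This is an illustration only, not the package's implementation: the stand-in statistic `null.penalty` is hypothetical and uses the well-known lasso result that, for a linear model on standardized features, the smallest L1 penalty giving an all-zero coefficient vector is the largest absolute feature-response cross-product divided by the number of observations. The matrix `X` and the simulated labeling are likewise assumptions made for the example.

```r
set.seed(1)

# Toy document-by-feature count matrix (hypothetical data, not a real corpus)
n <- 40
p <- 6
X <- matrix(rpois(n * p, lambda = 2), nrow = n)
labeling <- rep(c(1, -1), each = n / 2)

# Tie feature 1 to the labeling so the real labeling carries signal
X[labeling == 1, 1] <- X[labeling == 1, 1] + 3

# Stand-in statistic (an assumption, not textreg's objective):
# smallest lasso penalty for which every coefficient is zero
null.penalty <- function(X, y) {
  Xs <- scale(X)                                  # center and scale columns
  max(abs(crossprod(Xs, y - mean(y)))) / length(y)
}

# First entry: the passed labeling. Remaining R entries: shuffled labelings.
R <- 20
Cs <- c(null.penalty(X, labeling),
        replicate(R, null.penalty(X, sample(labeling))))

length(Cs)   # R + 1 values
```

If the first value sits well above the shuffled values, the labeling is more strongly related to the features than chance relabelings are.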

find.threshold.C(corpus, labeling, banned = NULL, R = 0,
  objective.function = 2, a = 1, verbosity = 0,
  step.verbosity = verbosity, positive.only = FALSE,
  binary.features = FALSE, no.regularization = FALSE,
  positive.weight = 1, Lq = 2, min.support = 1, min.pattern = 1,
  max.pattern = 100, gap = 0, token.type = "word",
  convergence.threshold = 1e-04)

`corpus`	A list of strings or a corpus from the `tm` package.
`labeling`	A vector of +1/-1 or TRUE/FALSE indicating which documents are considered relevant and which are baseline. A +1/-1 vector can also contain 0, which means drop the document.
`banned`	List of words that should be dropped from consideration.
`R`	Number of times to scramble the labeling. 0 means use the given labeling and find a single C value.
`objective.function`	2 is hinge loss. 0 and 1 select other objective functions, which are not described here.
`a`	What fraction of the regularization should be L1 (a=1) versus L2 (a=0).
`verbosity`	Level of output. 0 is no printed output.
`step.verbosity`	Level of output for line searches. 0 is no printed output.
`positive.only`	Disallow negative features if true
`binary.features`	Just code presence/absence of a feature in a document rather than count of feature in document.
`no.regularization`	Do not renormalize the features at all. (Lq will be ignored.)
`positive.weight`	Scale the weight of all positively marked documents by this value. (Default is 1, i.e., no scaling.) NOT FULLY IMPLEMENTED.
`Lq`	Rescaling to put on the features (2 is standard). Can be from 1 up. Values above 10 invoke an infinity-norm.
`min.support`	Only consider phrases that appear this many times or more.
`min.pattern`	Only consider phrases this long or longer
`max.pattern`	Only consider phrases this short or shorter
`gap`	Allow phrases that have wildcard words in them. Number is how many wildcards in a row.
`token.type`	"word" or "character" as tokens.
`convergence.threshold`	How to decide if descent has converged. (Will go for three steps at this threshold to check for flatness.)

Important: use the same parameter values as used with the original textreg call!
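One way to keep the two calls in step is to store the shared settings in a single list and pass it to both functions. The sketch below assumes that `textreg()` accepts the same argument names as `find.threshold.C()` plus a `C` argument for the regularization level; check the `textreg()` help page before relying on this. The name `shared.args` and the choice of the 0.95 quantile are illustrative, and the package calls run only if textreg is installed.

```r
# Settings that should match between the two calls (values here are the defaults)
shared.args <- list(objective.function = 2, a = 1,
                    binary.features = FALSE, Lq = 2,
                    min.support = 1, min.pattern = 1, max.pattern = 100,
                    gap = 0, token.type = "word")

if (requireNamespace("textreg", quietly = TRUE)) {
  data("testCorpora", package = "textreg")
  corpus   <- testCorpora$testI$corpus
  labeling <- testCorpora$testI$labelI

  # Null distribution of C under the shared settings
  Cs <- do.call(textreg::find.threshold.C,
                c(list(corpus, labeling, R = 5), shared.args))

  # Assumption: textreg() takes the regularization level as `C`
  C <- quantile(unlist(Cs), 0.95, names = FALSE)
  fit <- do.call(textreg::textreg,
                 c(list(corpus, labeling, C = C), shared.args))
}
```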

A list of R+1 numbers (the Cs). The first number is always the C for the _passed_ labeling. The remaining R numbers are the Cs for the shuffled labelings.
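A minimal sketch of how the returned values might be used, assuming they can be flattened to a numeric vector with `unlist()`. The numbers below are made up for illustration and are not output from the package. The p-value uses the standard permutation formula with a +1 correction, and taking a high quantile of the shuffled values as the tuning parameter is one plausible choice, not a rule stated by this page.

```r
# Hypothetical return value for R = 5 (made-up numbers)
Cs <- unlist(list(8.4, 3.1, 2.7, 3.6, 2.9, 3.3))

observed <- Cs[1]     # C for the passed labeling
null.Cs  <- Cs[-1]    # Cs for the shuffled labelings

# Permutation p-value: how often a shuffled labeling needs at least
# as much regularization as the real one to reach the null model
p.value <- (1 + sum(null.Cs >= observed)) / (1 + length(null.Cs))

# Candidate tuning parameter: a high quantile of the null distribution
C.threshold <- quantile(null.Cs, 0.95, names = FALSE)
```

With only a handful of shuffles the p-value is coarse (its smallest possible value is 1/(R+1)), so a larger `R` is needed for finer resolution.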

data( testCorpora )
find.threshold.C( testCorpora$testI$corpus, testCorpora$testI$labelI, c(), R=5, verbosity=1 )

textreg documentation built on May 2, 2019, 8:34 a.m.
