findReasonableLambdaHelper: Function to run on a dataset with not too much missing data...
In EMLasso: Fit (logistic) LASSO when there is missing data in the predictors


View source: R/findReasonableLambdaHelper.R

Expects a singly imputed dataset and fits a logistic LASSO to it, so the user can pick a set of lambda values that will probably be interesting.

  findReasonableLambdaHelper(ds, out, family = "binomial",
    showFirst = 20, showPlot = TRUE, type.measure = "auc",
    repsNeededForFirstOccurrence = 3,
    weights = rep(1, nrow(ds)), ..., verbosity = 0,
    minNumHigher = 20, minNumLower = 20, maxNumLower = 30,
    imputeDs2FitDsProperties = normalImputationConversion(),
    standardize = FALSE, nfolds = 10)

  ## S3 method for class 'LambdaHelper'
  object[i, j, drop = TRUE]

  getLambdas(x, ...)

  ## S3 method for class 'lambdaregion'
  getLambdas(x, ...)

  ## S3 method for class 'LambdaHelper'
  getLambdas(x, ...)

`ds`	dataset to investigate
`out`	outcome vector
`family`	see `glmnet`. Defaults to "binomial" (i.e. lasso penalized logistic regression).
`showFirst`	show the top coefficients (first `showFirst` occurring)
`showPlot`	if `TRUE` (the default), a plot is shown to support the choice of lambda values
`type.measure`	see `cv.glmnet`
`repsNeededForFirstOccurrence`	the number of consecutive lambda values for which a coefficient must be nonzero before it counts as "occurring"
`weights`	vector with weight to be assigned to each row of `ds`
`...`	passed on to `plotex` (if relevant)
`verbosity`	The higher this value, the more levels of progress and debug information are displayed (note: in R for Windows, turn off buffered output)
`minNumHigher`	How many lambdas higher than the optimum do you minimally want (if available)
`minNumLower`	How many lambdas lower than the optimum do you minimally want (if available)
`maxNumLower`	How many lambdas lower than the optimum do you maximally want
`imputeDs2FitDsProperties`	see `imputeDs2FitDs` and `EMLasso`
`standardize`	see `glmnet`. Defaults to FALSE.
`nfolds`	see `cv.glmnet`. Defaults to 10.
`object`	`LambdaHelper`
`i`	row index
`j`	column index. If this is missing, the `i`th lambda is returned
`drop`	if `TRUE` the result is coerced to the simplest structure possible
`x`	object to find 'interesting' set of lambdas for
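
The rule behind `repsNeededForFirstOccurrence` can be illustrated with a minimal base R sketch. The helper `firstOccurrence` and the toy coefficient matrix are hypothetical and are not taken from the EMLasso source; the sketch only assumes a coefficient path with one row per variable and one column per lambda, ordered from high to low lambda.

```r
# Hypothetical helper (not the EMLasso implementation): for each variable,
# find the first lambda index that starts a run of at least `repsNeeded`
# consecutive nonzero coefficients.
firstOccurrence <- function(beta, repsNeeded = 3) {
  apply(beta != 0, 1, function(nz) {
    r <- rle(nz)                          # runs of zero / nonzero
    ends <- cumsum(r$lengths)             # last index of each run
    starts <- ends - r$lengths + 1L       # first index of each run
    ok <- which(r$values & r$lengths >= repsNeeded)
    if (length(ok) == 0) NA_integer_ else starts[ok[1]]
  })
}

# Toy coefficient path: rows are variables, columns are lambda indices
beta <- rbind(
  a = c(0, 1, 0, 1, 1, 1),   # isolated nonzero at index 2 is ignored
  b = c(0, 0, 2, 2, 2, 2),
  c = c(0, 0, 0, 0, 1, 1)    # run of 2 is too short for repsNeeded = 3
)
occ <- firstOccurrence(beta, repsNeeded = 3)
print(occ)
```

With `repsNeeded = 3`, a variable that flickers in and out of the model is not counted until it stays in for three lambda values in a row, and a variable that never does so gets `NA`.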

list of class "LambdaHelper":

`topres`	`data.frame` with `showFirst` rows, and columns: `variable` (name), `lambda`, `critl` (lower bound of criterion), `crit` (estimate of criterion), `critu` (upper bound of criterion), `critsd` (sd of criterion), `index` (at which lambda index does this variable first occur)
`allLambda`	vector of lambda values
`regionDfr`	`data.frame` with 3 rows and 3 columns: `name` (values: "lower lambda", "optimum", and "higher lambda"), `idx` and `lambda`
`regionOfInterestData`	see `getMinMaxPosLikeGlmnet`
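
As a rough illustration of what a three-row region table like `regionDfr` can look like, the base R sketch below picks a window of lambda indices around an optimum. The helper `lambdaWindow`, its arguments and the toy lambda sequence are assumptions made for this example; the sketch does not reproduce how EMLasso combines `minNumHigher`, `minNumLower` and `maxNumLower`.

```r
# Hypothetical helper (not the EMLasso implementation).
# `lambdas` is assumed to be sorted from high to low, so higher lambdas have
# smaller indices than the optimum and lower lambdas have larger indices.
lambdaWindow <- function(lambdas, optIdx, numHigher = 20, numLower = 20) {
  from <- max(1, optIdx - numHigher)               # clipped "if available"
  to <- min(length(lambdas), optIdx + numLower)
  data.frame(
    name = c("higher lambda", "optimum", "lower lambda"),
    idx = c(from, optIdx, to),
    lambda = lambdas[c(from, optIdx, to)]
  )
}

# Toy decreasing lambda sequence on a log scale
lambdas <- exp(seq(log(1), log(0.001), length.out = 100))
region <- lambdaWindow(lambdas, optIdx = 15)
print(region)

# The lambdas inside the window, high to low
windowLambdas <- lambdas[region$idx[1]:region$idx[3]]
```

Here only 14 higher lambdas exist above the optimum, so the window is clipped at index 1 rather than padded.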

For the `[` method: depends on the parameters

For `getLambdas`: vector of lambda values, normally high to low

EMLasso is computationally heavy and has to be run per lambda. This function helps preselect some lambda values, and can typically avoid useless calculations for uninteresting lambda values.
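
The general idea of preselecting lambda values with one cheap cross-validated fit can be sketched with `glmnet` directly. This is not the EMLasso code path: the simulated matrix `x` merely stands in for a singly imputed, fully numeric dataset, and the column names of `critDfr` only loosely mirror those of `topres`. The block does nothing when `glmnet` is not installed.

```r
# Sketch only: plain cv.glmnet on complete data, assuming glmnet is available.
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  n <- 200
  p <- 10
  x <- matrix(rnorm(n * p), n, p)              # stand-in for imputed predictors
  y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))   # binary outcome

  # One cross-validated logistic LASSO path, using AUC as the criterion
  cvfit <- glmnet::cv.glmnet(x, y, family = "binomial", type.measure = "auc",
                             nfolds = 10, standardize = FALSE)

  # Criterion estimate with its bounds and sd, one row per lambda (high to low)
  critDfr <- data.frame(lambda = cvfit$lambda, critl = cvfit$cvlo,
                        crit = cvfit$cvm, critu = cvfit$cvup,
                        critsd = cvfit$cvsd)

  # Index of the lambda with the best cross-validated criterion
  optIdx <- which(cvfit$lambda == cvfit$lambda.min)
  print(critDfr[optIdx, ])
}
```

Only lambdas in a neighbourhood of `optIdx` would then need to be passed on to the expensive per-lambda fit.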

Nick Sabbe nick.sabbe@ugent.be

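# Simulate a dataset with 10 categorical and 10 continuous predictors (100 observations)
aDfr<-generateTypicalIndependentDfr(numCat=10, numCnt=10, numObs=100, catProbs=rep(1/3,3),
rcnt=typicalRandomNorm, doShuffle=TRUE, verbosity=1)

# Linear predictor depends on cnt1 and on level "b" of cat1; convert it to probabilities
outlins<- -mean(aDfr$cnt1)+aDfr$cnt1+2*(aDfr$cat1=="b")
outprobs<-expit(outlins)
# Draw a binary outcome ("no"/"yes") for each observation
y<-factor(sapply(outprobs, function(prob){sample(c("no", "yes"), 1, prob=c(1-prob,prob))}))

rlh<-findReasonableLambdaHelper(aDfr, y, verbosity=10)
data(emlcvfit, package="EMLasso")
rlh<-findReasonableLambdaHelper(aDfr, y, verbosity=10)
# Index the result: with only i given, the i-th lambda is returned
rlh[1]
rlh[1:5, NULL]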
getLambdas(rlh$regionOfInterestData)
getLambdas(rlh)