# grpPUlasso: Solve PU problem with lasso or group lasso penalty

In PUlasso: High-Dimensional Variable Selection with Presence-Only Data

## Description

Fit a model using the PUlasso algorithm over a regularization path. The regularization path is computed at a grid of values for the regularization parameter lambda.
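To make the idea of a lambda grid concrete, the following base-R sketch builds a decreasing sequence from a largest value down to a fixed fraction of it. The helper name `make_lambda_grid`, its argument names, and the log-scale spacing are assumptions for illustration (log spacing is a common convention in lasso software); this page does not state how grpPUlasso spaces its grid.

```r
# Hypothetical helper, not part of PUlasso: build a decreasing grid of
# nlambda values from lambda_max down to lambda_max * lambda_min_ratio.
# Log-scale spacing is an assumption, not confirmed by this documentation.
make_lambda_grid <- function(lambda_max, nlambda = 100, lambda_min_ratio = 0.005) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

# Small illustrative grid: 5 values between 1 and 0.01.
grid <- make_lambda_grid(lambda_max = 1, nlambda = 5, lambda_min_ratio = 0.01)
print(grid)
```

A sequence built this way could be passed through the `lambda` argument instead of relying on `nlambda` and `lambdaMinRatio`.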

## Usage

```
grpPUlasso(X, z, py1, initial_coef = NULL, group = 1:ncol(X),
  penalty = NULL, lambda = NULL, nlambda = 100,
  lambdaMinRatio = ifelse(N < p, 0.05, 0.005),
  maxit = ifelse(method == "CD", 1000, N * 10), maxit_inner = 1e+05,
  weights = NULL, eps = 1e-04, inner_eps = 0.01, verbose = FALSE,
  stepSize = NULL, stepSizeAdjustment = NULL, batchSize = 1,
  updateFrequency = N, samplingProbabilities = NULL,
  method = c("CD", "GD", "SGD", "SVRG", "SAG"),
  trace = c("none", "param", "fVal", "all"))
```

## Arguments

`X` Input matrix; each row is an observation. Can be a matrix or a sparse matrix.

`z` Response vector indicating whether each observation is labeled or unlabeled.

`py1` True prevalence Pr(Y=1).

`initial_coef` A vector giving the initial point from which the PUlasso algorithm starts.

`group` A vector representing the grouping of the coefficients. To avoid ambiguity, it is recommended that group be provided as a vector of consecutive ascending integers.

`penalty` Penalty to be applied to the model. Default is sqrt(group size) for each group.

`lambda` A user-supplied sequence of lambda values. If unspecified, the function automatically generates its own lambda sequence based on nlambda and lambdaMinRatio.

`nlambda` The number of lambda values.

`lambdaMinRatio` Smallest value for lambda, as a fraction of lambda.max, the value that leads to the intercept-only model.

`maxit` Maximum number of iterations.

`maxit_inner` Maximum number of iterations for a quadratic sub-problem for CD.

`weights` Observation weights. Default is 1 for each observation.

`eps` Convergence threshold for the outer loop. The algorithm iterates until the maximum change in coefficients is less than eps in the outer loop.

`inner_eps` Convergence threshold for the inner loop. The algorithm iterates until the maximum change in coefficients is less than inner_eps in the inner loop.

`verbose` A logical value. If TRUE, the function prints out the fitting process.

`stepSize` A step size for gradient-based optimization. If NULL, the step size is taken to be stepSizeAdjustment/mean(Li), where Li is a Lipschitz constant for the ith sample.

`stepSizeAdjustment` A step size adjustment. By default, the adjustment is 1 for GD and SGD, 1/8 for SVRG, and 1/16 for SAG.

`batchSize` A batch size. Default is 1.

`updateFrequency` An update frequency of the full gradient for method == "SVRG".

`samplingProbabilities` Sampling probabilities for each sample for stochastic gradient-based optimization. If NULL, each sample is chosen proportionally to Li.

`method` Optimization method. Default is Coordinate Descent. CD for Coordinate Descent, GD for Gradient Descent, SGD for Stochastic Gradient Descent, SVRG for Stochastic Variance Reduction Gradient, SAG for Stochastic Averaging Gradient.

`trace` An option for saving intermediate quantities. All intermediate standardized-scale parameter estimates (trace == "param"), objective function values at each iteration (trace == "fVal"), or both (trace == "all") are saved in optResult. Since this is computationally very heavy, it should be used only for reasonably small datasets and a small maxit. Default is "none".
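The `group` and `penalty` defaults described above can be illustrated in base R. This is a sketch of the stated rule (a penalty weight of sqrt(group size) for each group), not the package's internal code; the grouping vector is made up for illustration.

```r
# Illustrative grouping of six coefficients into three groups, written as
# consecutive ascending integers as the documentation recommends.
group <- c(1, 1, 2, 2, 2, 3)

# Number of coefficients in each group, in group order.
group_size <- as.vector(table(group))

# Default penalty weights as described: sqrt(group size) for each group.
penalty <- sqrt(group_size)
print(penalty)
```

With the default `group = 1:ncol(X)` every group has size one, so each weight under this rule is 1 and the group lasso penalty reduces to an ordinary lasso penalty.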

## Value

coef A p by length(lambda) matrix of coefficients

std_coef A p by length(lambda) matrix of coefficients in a standardized scale

lambda The actual sequence of lambda values used.

nullDev Null deviance, defined to be 2*(logLik_sat - logLik_null)

deviance Deviance, defined to be 2*(logLik_sat - logLik(model))

optResult A list containing the result of the optimization. fValues and subGradients contain objective function values and subgradient vectors at each lambda value. If trace is set to anything other than "none", the corresponding intermediate quantities are saved as well.

iters Number of iterations (EM updates) if method = "CD"; number of steps taken otherwise.
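A common summary built from the `nullDev` and `deviance` components is the fraction of null deviance explained, 1 - deviance/nullDev. The helper below is a hypothetical illustration (the name `dev_explained` is not part of PUlasso), and the input numbers are made up rather than taken from a real fit.

```r
# Hypothetical helper: fraction of the null deviance explained by the model.
# Since nullDev = 2*(logLik_sat - logLik_null) and
# deviance = 2*(logLik_sat - logLik(model)), this ratio equals
# (logLik(model) - logLik_null) / (logLik_sat - logLik_null).
dev_explained <- function(deviance, nullDev) {
  1 - deviance / nullDev
}

# Made-up deviance values for three lambda values, for illustration only.
ratio <- dev_explained(deviance = c(100, 80, 50), nullDev = 100)
print(ratio)
```

For a fitted object `fit`, the same call would be `dev_explained(fit$deviance, fit$nullDev)`, assuming the components are stored under the names listed in this section.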

## Examples

```
library(PUlasso)
data("simulPU")
fit <- grpPUlasso(X = simulPU$X, z = simulPU$z, py1 = simulPU$truePY1)
```
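As a follow-up, the sketch below fits the same data with an explicit grouping and inspects a few of the returned components. It assumes the PUlasso package is installed and that `simulPU` has the components `X`, `z`, and `truePY1` used in the example above. The helper `pair_groups` is hypothetical, and pairing adjacent columns is an arbitrary choice made only for illustration.

```r
# Hypothetical helper: group p coefficients into adjacent pairs, labelled with
# consecutive ascending integers (1, 1, 2, 2, ...), as the docs recommend.
pair_groups <- function(p) {
  rep(seq_len(ceiling(p / 2)), each = 2)[seq_len(p)]
}

# Only run the fit when PUlasso is available.
if (requireNamespace("PUlasso", quietly = TRUE)) {
  data("simulPU", package = "PUlasso")
  grp <- pair_groups(ncol(simulPU$X))

  fit_grp <- PUlasso::grpPUlasso(X = simulPU$X, z = simulPU$z,
                                 py1 = simulPU$truePY1,
                                 group = grp, nlambda = 20)

  print(length(fit_grp$lambda))  # number of lambda values actually used
  print(dim(fit_grp$coef))       # coefficient matrix, one column per lambda
  print(fit_grp$iters)           # iteration counts
}
```

Leaving `group` at its default of `1:ncol(X)` treats every coefficient as its own group, which is what the original example does.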

PUlasso documentation built on May 2, 2019, 11:40 a.m.