grpPUlasso: Solve PU problem with lasso or group lasso penalty.


View source: R/grpPUlasso.R

Description

Fit a model using the PUlasso algorithm over a regularization path. The path is computed at a grid of values for the regularization parameter lambda.

Usage

grpPUlasso(
  X,
  z,
  py1,
  initial_coef = NULL,
  group = 1:ncol(X),
  penalty = NULL,
  lambda = NULL,
  nlambda = 100,
  lambdaMinRatio = ifelse(N < p, 0.05, 0.005),
  maxit = ifelse(method == "CD", 1000, N * 10),
  maxit_inner = 1e+05,
  weights = NULL,
  eps = 1e-04,
  inner_eps = 0.01,
  verbose = FALSE,
  stepSize = NULL,
  stepSizeAdjustment = NULL,
  batchSize = 1,
  updateFrequency = N,
  samplingProbabilities = NULL,
  method = c("CD", "GD", "SGD", "SVRG", "SAG"),
  trace = c("none", "param", "fVal", "all")
)
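
Two of the defaults in the signature depend on other quantities. The sketch below shows how they resolve for a made-up design matrix. It assumes that N and p denote the number of rows and columns of X, which the signature itself does not state.

```r
# Assumption: N = nrow(X) and p = ncol(X). X is a placeholder matrix.
X <- matrix(0, nrow = 100, ncol = 200)
N <- nrow(X)
p <- ncol(X)

# With no user choice, the first element of the method vector ("CD") is used.
method <- match.arg("CD", c("CD", "GD", "SGD", "SVRG", "SAG"))

# Default expressions copied from the Usage signature.
lambdaMinRatio <- ifelse(N < p, 0.05, 0.005)   # N < p here, so 0.05
maxit <- ifelse(method == "CD", 1000, N * 10)  # method is "CD", so 1000
```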

Arguments

X

Input matrix; each row is an observation. Can be a matrix or a sparse matrix.

z

Response vector representing whether an observation is labeled or unlabeled.

py1

True prevalence Pr(Y=1)

initial_coef

A vector giving the initial point from which the PUlasso algorithm starts.

group

A vector representing the grouping of the coefficients. To avoid ambiguity, it is recommended to provide group as a vector of consecutive ascending integers.

penalty

Penalty to be applied to the model. Default is sqrt(group size) for each group.
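
As an illustration of the default penalty, the sketch below computes sqrt(group size) for a hypothetical grouping of six coefficients. The group vector is made up for the example, and whether the package derives the group sizes in exactly this way is an assumption.

```r
# Hypothetical grouping: coefficients 1-2 form group 1, 3-5 group 2, 6 group 3.
group <- c(1, 1, 2, 2, 2, 3)

# Number of coefficients in each group, ordered by group label.
group_size <- as.vector(table(group))

# Default-style penalty weights: one value per group.
penalty <- sqrt(group_size)
```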

lambda

A user supplied sequence of lambda values. If unspecified, the function automatically generates its own lambda sequence based on nlambda and lambdaMinRatio.

nlambda

The number of lambda values.

lambdaMinRatio

Smallest value for lambda, as a fraction of lambda.max, the value of lambda that leads to the intercept-only model.
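
To make the roles of nlambda and lambdaMinRatio concrete, here is a sketch of one common way to build such a grid: values equally spaced on the log scale from lambda.max down to lambdaMinRatio * lambda.max. The log spacing and the value of lambda_max are assumptions for illustration. The documentation only states that the sequence is based on nlambda and lambdaMinRatio.

```r
# Placeholder for lambda.max; in practice it depends on the data.
lambda_max <- 0.5
nlambda <- 100
lambdaMinRatio <- 0.005

# Assumed construction: decreasing grid, equally spaced in log(lambda).
lambda <- exp(seq(log(lambda_max),
                  log(lambda_max * lambdaMinRatio),
                  length.out = nlambda))
```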

maxit

Maximum number of iterations.

maxit_inner

Maximum number of iterations for a quadratic sub-problem for CD.

weights

Observation weights. Default is 1 for each observation.

eps

Convergence threshold for the outer loop. The algorithm iterates until the maximum change in coefficients is less than eps in the outer loop.

inner_eps

Convergence threshold for the inner loop. The algorithm iterates until the maximum change in coefficients is less than inner_eps in the inner loop.

verbose

A logical value. If TRUE, the function prints the progress of the fitting process.

stepSize

A step size for gradient-based optimization. If NULL, the step size is taken to be stepSizeAdjustment / mean(Li), where Li is a Lipschitz constant for the ith sample.

stepSizeAdjustment

A step size adjustment. By default, adjustment is 1 for GD and SGD, 1/8 for SVRG and 1/16 for SAG.

batchSize

A batch size. Default is 1.

updateFrequency

The update frequency of the full gradient when method = "SVRG".

samplingProbabilities

Sampling probabilities for each sample in stochastic gradient-based optimization. If NULL, each sample is chosen with probability proportional to Li.
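
The defaults for stepSize and samplingProbabilities can be sketched in a few lines. The per-sample Lipschitz constant used here, the squared row norm divided by 4 (a usual bound for a logistic-type loss), is an assumption, since the documentation does not say how Li is computed. The data are simulated purely for illustration.

```r
set.seed(1)
X <- matrix(rnorm(50 * 4), nrow = 50)  # simulated design matrix, 50 samples

# Assumed per-sample Lipschitz constants: ||x_i||^2 / 4.
Li <- rowSums(X^2) / 4

# Default step size rule: stepSizeAdjustment / mean(Li).
stepSizeAdjustment <- 1 / 8            # documented default for SVRG
stepSize <- stepSizeAdjustment / mean(Li)

# Default sampling rule: probabilities proportional to Li.
samplingProbabilities <- Li / sum(Li)
```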

method

Optimization method. Default is "CD". The options are "CD" for Coordinate Descent, "GD" for Gradient Descent, "SGD" for Stochastic Gradient Descent, "SVRG" for Stochastic Variance Reduced Gradient, and "SAG" for Stochastic Average Gradient.

trace

An option for saving intermediate quantities. All intermediate standardized-scale parameter estimates (trace = "param"), objective function values at each iteration (trace = "fVal"), or both (trace = "all") are saved in optResult. Because this is computationally expensive, it should be used only with small datasets and a small maxit. The default is "none".

Value

coef A p by length(lambda) matrix of coefficients

std_coef A p by length(lambda) matrix of coefficients in a standardized scale

lambda The actual sequence of lambda values used.

nullDev Null deviance, defined as 2 * (logLik_sat - logLik_null)

deviance Deviance, defined as 2 * (logLik_sat - logLik(model))

optResult A list containing the result of the optimization. fValues and subGradients contain the objective function values and subgradient vectors at each lambda value. If trace is not "none", the corresponding intermediate quantities are saved as well.

iters Number of iterations (EM updates) if method = "CD"; number of steps taken otherwise.
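
One common use of nullDev and deviance is the fraction of null deviance explained at each lambda, 1 - deviance / nullDev. This summary is not part of the returned object as documented here. The sketch assumes deviance holds one value per lambda, and the numbers are placeholders chosen for illustration, not output from grpPUlasso.

```r
# Placeholder values standing in for fit$nullDev and fit$deviance.
nullDev <- 120
model_dev <- c(120, 95, 80, 72)  # assumed: one deviance per lambda

# Fraction of null deviance explained along the path.
dev_ratio <- 1 - model_dev / nullDev
```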

Examples

data("simulPU")
fit <- grpPUlasso(X = simulPU$X, z = simulPU$z, py1 = simulPU$truePY1)
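
As a follow-up, the sketch below refits the same model and inspects a few of the components listed under Value. It is guarded so that it only runs when the PUlasso package is installed. The component names come from the Value section, and nothing is assumed here about their exact dimensions.

```r
if (requireNamespace("PUlasso", quietly = TRUE)) {
  library(PUlasso)
  data("simulPU")

  # Same call as in the example above.
  fit <- grpPUlasso(X = simulPU$X, z = simulPU$z, py1 = simulPU$truePY1)

  print(dim(fit$coef))       # coefficient matrix, one column per lambda
  print(length(fit$lambda))  # number of lambda values actually used
  print(head(fit$deviance))  # deviance values along the path
}
```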

PUlasso documentation built on Jan. 17, 2021, 9:05 a.m.