Description
Fit a pliable lasso model over a path of regularization values
Usage

pliable(x, z, y, lambda = NULL, nlambda = 50, alpha = 0.5,
  family = c("gaussian", "binomial"), lambda.min.ratio = NULL,
  w = NULL, thr = 1e-05, tt = NULL, penalty.factor = rep(1, ncol(x)),
  maxit = 10000, mxthit = 100, mxkbt = 100, mxth = 1000,
  kpmax = 1e+05, kthmax = 1e+05, verbose = FALSE, maxinter = 500,
  zlinear = TRUE, screen = NULL)

Arguments
x: n by p matrix of predictors.

z: n by nz matrix of modifying variables. The elements of z may represent quantitative or categorical variables, or a mixture of the two. Categorical variables should be coded by 0-1 dummy variables: for a k-level variable, one can use either k or k-1 dummy variables.

y: n-vector of responses. The x and z variables are centered in the function. We recommend that x and z also be standardized before the call via x <- scale(x,TRUE,TRUE)/sqrt(nrow(x)-1); z <- scale(z,TRUE,TRUE)/sqrt(nrow(z)-1).

lambda: decreasing sequence of lambda values desired. Default NULL; if NULL, the sequence is computed in the function.

nlambda: number of lambda values desired. Default 50.

alpha: mixing parameter. Default 0.5.

family: response type, either "gaussian" or "binomial". In the binomial case, y should consist of 0s and 1s. Family "cox" (proportional hazards model for right-censored data) will be implemented if there is popular demand.

lambda.min.ratio: the smallest value of lambda, as a fraction of lambda.max, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). Default is 0.001 if n > p, and 0.01 if n < p. Along with screen (see below), this is a main parameter controlling the runtime of the algorithm: with a smaller lambda.min.ratio, more interactions will typically be entered. If the algorithm stalls or stops near the end of the path, try increasing lambda.min.ratio. On the other hand, if a more complex fit is desired, or the cross-validation curve from cv.pliable is still going down at the end of the path, consider decreasing lambda.min.ratio.

w: n-vector of observation weights. Default: all ones. Not currently used with the Cox family.

thr: convergence threshold for estimation of beta and joint estimation of (beta, theta). Default 1e-5.

tt: initial factor for the generalized gradient algorithm for joint estimation of main effects and interaction parameters. Default 0.1/mean(x^2).

penalty.factor: vector of penalty factors, one for each x predictor. Default: a vector of ones.

maxit: maximum number of iterations in the loop for one lambda. Default 10000.

mxthit: maximum number of theta solution iterations. Default 100.

mxkbt: maximum number of backtracking iterations. Default 100.

mxth: maximum internal theta storage. Default 1000.

kpmax: maximum dimension of main effect storage. Default 100000.

kthmax: maximum dimension of interaction storage. Default 100000.

verbose: should information be printed along the way? Default FALSE.

maxinter: maximum number of interactions allowed in the model. Default 500.

zlinear: should an (unregularized) linear term for z be included in the model? Default TRUE.

screen: screening quantile for features: for example, screen=0.7 means that we keep only those features with univariate t-statistics above the 0.7 quantile of all features. Default NULL, meaning that all features are kept.
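The quantile-thresholding logic behind screen can be sketched in base R. The exact statistic used internally is an assumption here (we take the absolute t-statistic from regressing y on each column of x separately), but the keep/drop rule is as described:

```r
# Sketch of quantile-based feature screening (base R only; the
# univariate t-statistic used here is an assumption, not the
# package's exact internal computation).
set.seed(3)
n <- 50; p <- 10
x <- scale(matrix(rnorm(n * p), n, p))
y <- x[, 1] + rnorm(n)

# absolute t-statistic of each single-predictor regression
tstat <- abs(sapply(1:p, function(j) summary(lm(y ~ x[, j]))$coefficients[2, 3]))

# screen = 0.7: keep only features above the 0.7 quantile of all t-stats
keep <- tstat >= quantile(tstat, 0.7)
which(keep)  # indices of retained features
```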
Details

This function fits a pliable lasso model over a sequence of lambda (regularization) values. The inputs are a feature matrix x, a response vector y and a matrix of modifying variables z. The pliable lasso is a generalization of the lasso in which the weights multiplying the features x are allowed to vary as a function of a set of modifying variables z. The model has the form:
yhat = beta0 + z %*% betaz + sum_j x[,j] * ( beta[j] + z %*% theta[j,] )
The parameters are beta0 (scalar), betaz (nz-vector), beta (p-vector) and theta (p by nz matrix). The (convex) optimization problem is

minimize_{beta0, betaz, beta, theta} (1/(2n)) * sum_i (y_i - yhat_i)^2
  + (1-alpha)*lambda * sum_j ( ||(beta_j, theta_j)||_2 + ||theta_j||_2 )
  + alpha*lambda * sum_j ||theta_j||_1
A blockwise coordinate descent procedure is employed for the optimization.
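As a concrete illustration of the model form above, the fit can be evaluated by hand in base R. The parameter values below are made up for the sketch; nothing here calls the package itself:

```r
# Evaluating the pliable lasso model form by hand, for illustration.
set.seed(1)
n <- 5; p <- 2; nz <- 3
x <- matrix(rnorm(n * p), n, p)
z <- matrix(rnorm(n * nz), n, nz)

beta0 <- 0.5                           # scalar intercept
betaz <- rnorm(nz)                     # main effects for z
beta  <- c(1, -2)                      # main effects for x
theta <- matrix(rnorm(p * nz), p, nz)  # row j holds theta_j

# yhat = beta0 + z %*% betaz + sum_j x[,j] * (beta[j] + z %*% theta[j,])
yhat <- beta0 + drop(z %*% betaz)
for (j in 1:p) yhat <- yhat + x[, j] * (beta[j] + drop(z %*% theta[j, ]))
```

Note that each feature's effective coefficient, beta[j] + z %*% theta[j,], varies across observations through z; with theta = 0 the model reduces to an ordinary lasso fit plus a linear term in z.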
Value

A trained pliable object with the following components:
param: values of lambda used
beta: matrix of estimated beta coefficients ncol(x) by length(lambda).
theta: array of estimated theta coefficients ncol(x) by ncol(z) by length(lambda).
betaz: estimated (main effect) coefficients for z ncol(z) by length(lambda).
xbar: column means of x (used for centering)
zbar: column means of z (used for centering)
w: observation weights used
args: list of arguments in the original call
nulldev: null deviance for model
dev.ratio: deviance/null.dev for each model in path
nbeta: number of nonzero betas for each model in path
nbeta.with.int: number of betas with some nonzero theta values, for each model in path
ntheta: number of nonzero theta for each model in path
df: rough degrees of freedom for each model in path (=nbeta)
istor: for internal use
fstor: for internal use
thstor: for internal use
jerr: for internal use
call: the call made to pliable
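To make the count components concrete, here is a minimal sketch of how nbeta, ntheta and nbeta.with.int relate to the returned beta matrix and theta array. Mock arrays stand in for an actual fit (fit$beta, fit$theta); this mirrors, rather than calls, the package's bookkeeping:

```r
# Mock coefficient paths: p predictors, nz modifiers, nlam lambda values.
p <- 4; nz <- 2; nlam <- 3
beta  <- matrix(c(0, 1, 0, 2,          # p x nlam, like fit$beta
                  0, 1, 3, 2,
                  1, 1, 3, 2), p, nlam)
theta <- array(0, c(p, nz, nlam))      # p x nz x nlam, like fit$theta
theta[2, 1, 2:3] <- 0.7                # one interaction enters at the 2nd lambda

nbeta  <- colSums(beta != 0)                              # nonzero main effects per lambda
ntheta <- apply(theta, 3, function(th) sum(th != 0))      # nonzero interactions per lambda
nbeta.with.int <- apply(theta, 3,
                        function(th) sum(rowSums(th != 0) > 0))  # predictors with any interaction
```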
See Also

cv.pliable, predict.pliable, plot.pliable, plot.cv.pliable
Examples

# Train a pliable lasso model: Gaussian case
# Generate some data
set.seed(9334)
n = 20; p = 3 ;nz=3
x = matrix(rnorm(n*p), n, p)
mx=colMeans(x)
sx=sqrt(apply(x,2,var))
x=scale(x,mx,sx)
z =matrix(rnorm(n*nz),n,nz)
mz=colMeans(z)
sz=sqrt(apply(z,2,var))
z=scale(z,mz,sz)
y =4*x[,1] +5*x[,1]*z[,3]+ 3*rnorm(n)
# Fit model
fit = pliable(x,z,y)
fit #examine results
plot(fit) #plot path
#Estimate main tuning parameter lambda by cross-validation
#NOT RUN
# cvfit=cv.pliable(fit,x,z,y,nfolds=5) # returns lambda.min and lambda.1se,
# the minimizing and 1se rules
# plot(cvfit)
# Predict using the fitted model
#ntest=500
#xtest = matrix(rnorm(ntest*p),ntest,p)
#xtest=scale(xtest,mx,sx)
#ztest =matrix(rnorm(ntest*nz),ntest,nz)
#ztest=scale(ztest,mz,sz)
#ytest = 4*xtest[,1] +5*xtest[,1]*ztest[,3]+ 3*rnorm(ntest)
#pred= predict(fit,xtest,ztest,lambda=cvfit$lambda.min)
#plot(ytest,pred)
#Binary outcome example
n = 20 ; p = 3 ;nz=3
x = matrix(rnorm(n*p), n, p)
mx=colMeans(x)
sx=sqrt(apply(x,2,var))
x=scale(x,mx,sx)
z =matrix(rnorm(n*nz),n,nz)
mz=colMeans(z)
sz=sqrt(apply(z,2,var))
z=scale(z,mz,sz)
y =4*x[,1] +5*x[,1]*z[,3]+ 3*rnorm(n)
y=1*(y>0)
fit = pliable(x,z,y,family="binomial")
# Example where z is not observed in the test set, but predicted
# from a supervised learning algorithm
n = 20; p = 3 ;nz=3
x = matrix(rnorm(n*p), n, p)
mx=colMeans(x)
sx=sqrt(apply(x,2,var))
x=scale(x,mx,sx)
z =matrix(rnorm(n*nz),n,nz)
mz=colMeans(z)
sz=sqrt(apply(z,2,var))
z=scale(z,mz,sz)
y =4*x[,1] +5*x[,1]*z[,3]+ 3*rnorm(n)
fit = pliable(x,z,y)
# predict z from x; here we use glmnet, but any other supervised method can be used
library(glmnet)  # needed for cv.glmnet below
zfit = cv.glmnet(x, z, family = "mgaussian")
# Predict using the fitted model
# ntest=100
#xtest = matrix(rnorm(ntest*p), ntest, p)
#xtest=scale(xtest,mx,sx)
#ztest = predict(zfit, xtest, s = "lambda.min")[,,1]
#ytest = 4*xtest[,1] +5*xtest[,1]*ztest[,3]+ 3*rnorm(ntest)
#pred= predict(fit,xtest,ztest)
#plot(ytest,pred[,27]) # Just used 27th path member, for illustration; see cv.pliable for
# details of how to cross-validate in this setting