Description
Fit a pliable lasso model over a path of regularization values
Usage

pliable(x, z, y, lambda = NULL, nlambda = 50, alpha = 0.5,
  family = c("gaussian", "binomial"), lambda.min.ratio = NULL,
  w = NULL, thr = 1e-05, tt = NULL, penalty.factor = rep(1, ncol(x)),
  maxit = 10000, mxthit = 100, mxkbt = 100, mxth = 1000,
  kpmax = 1e+05, kthmax = 1e+05, verbose = FALSE, maxinter = 500,
  zlinear = TRUE, screen = NULL)

Arguments
x: n by p matrix of predictors.

z: n by nz matrix of modifying variables. The elements of z may represent quantitative or categorical variables, or a mixture of the two. Categorical variables should be coded by 0-1 dummy variables: for a k-level variable, one can use either k or k-1 dummy variables.

y: n-vector of responses. The x and z variables are centered in the function. We recommend that x and z also be standardized before the call via x <- scale(x,TRUE,TRUE)/sqrt(nrow(x)-1); z <- scale(z,TRUE,TRUE)/sqrt(nrow(z)-1).

lambda: decreasing sequence of lambda values desired. Default NULL; if NULL, the sequence is computed in the function.

nlambda: number of lambda values desired. Default 50.

alpha: mixing parameter. Default 0.5.

family: response type, either "gaussian" or "binomial". In the binomial case, y should consist of 0s and 1s. Family "cox" (proportional hazards model for right-censored data) will be implemented if there is popular demand.

lambda.min.ratio: the smallest value of lambda, as a fraction of lambda.max, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). Default is 0.001 if n > p, and 0.01 if n < p. Along with screen (see below), this is a main parameter controlling the runtime of the algorithm: with a smaller lambda.min.ratio, more interactions will typically be entered. If the algorithm stalls or stops near the end of the path, try increasing lambda.min.ratio. On the other hand, if a more complex fit is desired, or the cross-validation curve from cv.pliable is still going down at the end of the path, consider decreasing lambda.min.ratio.

w: n-vector of observation weights. Default: all ones. Not currently used with the Cox family.

thr: convergence threshold for estimation of beta and joint estimation of (beta, theta). Default 1e-5.

tt: initial factor for the generalized gradient algorithm for joint estimation of main effects and interaction parameters. Default 0.1/mean(x^2).

penalty.factor: vector of penalty factors, one for each x predictor. Default: a vector of ones.

maxit: maximum number of iterations in the loop for one lambda. Default 10000.

mxthit: maximum number of theta solution iterations. Default 100.

mxkbt: maximum number of backtracking iterations. Default 100.

mxth: maximum internal theta storage. Default 1000.

kpmax: maximum dimension of main effect storage. Default 100000.

kthmax: maximum dimension of interaction storage. Default 100000.

verbose: should information be printed along the way? Default FALSE.

maxinter: maximum number of interactions allowed in the model. Default 500.

zlinear: should an (unregularized) linear term for z be included in the model? Default TRUE.

screen: screening quantile for features: for example, screen=0.7 means that we keep only those features with univariate t-statistics above the 0.7 quantile of all features. Default NULL, meaning that all features are kept.
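The quantile-thresholding logic behind screen can be sketched in base R. The exact statistic used internally is an assumption here (we take the absolute t-statistic from regressing y on each column of x separately), but the keep/drop rule is as described:

```r
# Sketch of quantile-based feature screening (base R only; the
# univariate t-statistic used here is an assumption, not the
# package's exact internal computation).
set.seed(3)
n <- 50; p <- 10
x <- scale(matrix(rnorm(n * p), n, p))
y <- x[, 1] + rnorm(n)

# absolute t-statistic of each single-predictor regression
tstat <- abs(sapply(1:p, function(j) summary(lm(y ~ x[, j]))$coefficients[2, 3]))

# screen = 0.7: keep only features above the 0.7 quantile of all t-stats
keep <- tstat >= quantile(tstat, 0.7)
which(keep)  # indices of retained features
```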
Details

This function fits a pliable lasso model over a sequence of lambda (regularization) values. The inputs are a feature matrix x, a response vector y and a matrix of modifying variables z. The pliable lasso is a generalization of the lasso in which the weights multiplying the features x are allowed to vary as a function of a set of modifying variables z. The model has the form:
yhat = beta0 + z %*% betaz + sum_j x[,j] * ( beta[j] + z %*% theta[j,] )
The parameters are beta0 (scalar), betaz (nz-vector), beta (p-vector) and theta (p by nz matrix). The (convex) optimization problem is

minimize_{beta0, betaz, beta, theta} (1/(2n)) * sum_i (y_i - yhat_i)^2
  + (1-alpha)*lambda * sum_j ( ||(beta_j, theta_j)||_2 + ||theta_j||_2 )
  + alpha*lambda * sum_j ||theta_j||_1
A blockwise coordinate descent procedure is employed for the optimization.
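As a concrete illustration of the model form above, the fit can be evaluated by hand in base R. The parameter values below are made up for the sketch; nothing here calls the package itself:

```r
# Evaluating the pliable lasso model form by hand, for illustration.
set.seed(1)
n <- 5; p <- 2; nz <- 3
x <- matrix(rnorm(n * p), n, p)
z <- matrix(rnorm(n * nz), n, nz)

beta0 <- 0.5                           # scalar intercept
betaz <- rnorm(nz)                     # main effects for z
beta  <- c(1, -2)                      # main effects for x
theta <- matrix(rnorm(p * nz), p, nz)  # row j holds theta_j

# yhat = beta0 + z %*% betaz + sum_j x[,j] * (beta[j] + z %*% theta[j,])
yhat <- beta0 + drop(z %*% betaz)
for (j in 1:p) yhat <- yhat + x[, j] * (beta[j] + drop(z %*% theta[j, ]))
```

Note that each feature's effective coefficient, beta[j] + z %*% theta[j,], varies across observations through z; with theta = 0 the model reduces to an ordinary lasso fit plus a linear term in z.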
Value

A trained pliable object with the following components:
param: values of lambda used
beta: matrix of estimated beta coefficients ncol(x) by length(lambda).
theta: array of estimated theta coefficients ncol(x) by ncol(z) by length(lambda).
betaz: estimated (main effect) coefficients for z ncol(z) by length(lambda).
xbar: column means of x (used for centering)
zbar: column means of z (used for centering)
w: observation weights used
args: list of arguments in the original call
nulldev: null deviance for model
dev.ratio: deviance/null.dev for each model in path
nbeta: number of nonzero betas for each model in path
nbeta.with.int: number of betas with some nonzero theta values, for each model in path
ntheta: number of nonzero theta for each model in path
df: rough degrees of freedom for each model in path (=nbeta)
istor: for internal use
fstor: for internal use
thstor: for internal use
jerr: for internal use
call: the call made to pliable
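To make the count components concrete, here is a minimal sketch of how nbeta, ntheta and nbeta.with.int relate to the returned beta matrix and theta array. Mock arrays stand in for an actual fit (fit$beta, fit$theta); this mirrors, rather than calls, the package's bookkeeping:

```r
# Mock coefficient paths: p predictors, nz modifiers, nlam lambda values.
p <- 4; nz <- 2; nlam <- 3
beta  <- matrix(c(0, 1, 0, 2,          # p x nlam, like fit$beta
                  0, 1, 3, 2,
                  1, 1, 3, 2), p, nlam)
theta <- array(0, c(p, nz, nlam))      # p x nz x nlam, like fit$theta
theta[2, 1, 2:3] <- 0.7                # one interaction enters at the 2nd lambda

nbeta  <- colSums(beta != 0)                              # nonzero main effects per lambda
ntheta <- apply(theta, 3, function(th) sum(th != 0))      # nonzero interactions per lambda
nbeta.with.int <- apply(theta, 3,
                        function(th) sum(rowSums(th != 0) > 0))  # predictors with any interaction
```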
See Also

cv.pliable, predict.pliable, plot.pliable, plot.cv.pliable
Examples

# Train a pliable lasso model: Gaussian case
# Generate some data
set.seed(9334)
n = 20; p = 3 ;nz=3
x = matrix(rnorm(n*p), n, p)
mx=colMeans(x)
sx=sqrt(apply(x,2,var))
x=scale(x,mx,sx)
z =matrix(rnorm(n*nz),n,nz)
mz=colMeans(z)
sz=sqrt(apply(z,2,var))
z=scale(z,mz,sz)
y =4*x[,1] +5*x[,1]*z[,3]+ 3*rnorm(n)
# Fit model
fit = pliable(x,z,y)
fit #examine results
plot(fit) #plot path
#Estimate main tuning parameter lambda by cross-validation
#NOT RUN
# cvfit=cv.pliable(fit,x,z,y,nfolds=5) # returns lambda.min and lambda.1se,
# the minimizing and 1se rules
# plot(cvfit)
# Predict using the fitted model
#ntest=500
#xtest = matrix(rnorm(ntest*p),ntest,p)
#xtest=scale(xtest,mx,sx)
#ztest =matrix(rnorm(ntest*nz),ntest,nz)
#ztest=scale(ztest,mz,sz)
#ytest = 4*xtest[,1] +5*xtest[,1]*ztest[,3]+ 3*rnorm(ntest)
#pred= predict(fit,xtest,ztest,lambda=cvfit$lambda.min)
#plot(ytest,pred)
#Binary outcome example
n = 20 ; p = 3 ;nz=3
x = matrix(rnorm(n*p), n, p)
mx=colMeans(x)
sx=sqrt(apply(x,2,var))
x=scale(x,mx,sx)
z =matrix(rnorm(n*nz),n,nz)
mz=colMeans(z)
sz=sqrt(apply(z,2,var))
z=scale(z,mz,sz)
y =4*x[,1] +5*x[,1]*z[,3]+ 3*rnorm(n)
y=1*(y>0)
fit = pliable(x,z,y,family="binomial")
# Example where z is not observed in the test set, but predicted
# from a supervised learning algorithm
n = 20; p = 3 ;nz=3
x = matrix(rnorm(n*p), n, p)
mx=colMeans(x)
sx=sqrt(apply(x,2,var))
x=scale(x,mx,sx)
z =matrix(rnorm(n*nz),n,nz)
mz=colMeans(z)
sz=sqrt(apply(z,2,var))
z=scale(z,mz,sz)
y =4*x[,1] +5*x[,1]*z[,3]+ 3*rnorm(n)
fit = pliable(x,z,y)
# predict z from x; here we use glmnet, but any other supervised method can be used
library(glmnet)  # needed for cv.glmnet below
zfit = cv.glmnet(x, z, family = "mgaussian")
# Predict using the fitted model
# ntest=100
#xtest = matrix(rnorm(ntest*p), ntest, p)
#xtest=scale(xtest,mx,sx)
#ztest = predict(zfit, xtest, s = "lambda.min")[,,1]
#ytest = 4*xtest[,1] +5*xtest[,1]*ztest[,3]+ 3*rnorm(ntest)
#pred= predict(fit,xtest,ztest)
#plot(ytest,pred[,27]) # Just used 27th path member, for illustration; see cv.pliable for
# details of how to cross-validate in this setting