smog.default: Generalized linear model constraint on hierarchical structure...


View source: R/smog.R

Description

smog fits a model with non-penalized phenotype (demographic) variables such as age, gender, and treatment, together with penalized groups of prognostic effects (main effects) and predictive effects (interaction effects), while enforcing the hierarchy structure: if a predictive effect is in the model, its prognostic effect must be in the model as well. It handles continuous, binomial or multinomial, and survival response variables under the Gaussian, binomial (multinomial), and Cox proportional hazards models, respectively. It accepts a formula and returns a coefficients table, fitted values, and convergence information from the algorithm iterations.
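The hierarchy rule can be checked on any coefficient vector with a few lines of base R. The sketch below is only an illustration: check_hierarchy is a hypothetical helper, not a function exported by smog, and it assumes the same g and label conventions described under Arguments.

```r
# Hypothetical helper (not part of smog): check the hierarchy rule per group.
# beta  : numeric coefficient vector
# g     : group label for each coefficient
# label : "t", "prog", or "pred" for each coefficient
check_hierarchy <- function(beta, g, label, tol = 0) {
  groups <- unique(g[label == "pred"])
  ok <- vapply(groups, function(k) {
    pred <- beta[g == k & label == "pred"]
    prog <- beta[g == k & label == "prog"]
    # A group passes if it has no active predictive effect,
    # or if its prognostic effect is active as well.
    !any(abs(pred) > tol) || any(abs(prog) > tol)
  }, logical(1))
  names(ok) <- groups
  ok
}

g     <- c(3, 1, 1, 2, 2)
label <- c("t", "prog", "pred", "prog", "pred")
beta_ok  <- c(0.5, 1.2, -0.4, 0.8, 0)   # every predictive effect has its prognostic effect
beta_bad <- c(0.5, 0.0, -0.4, 0.8, 0)   # group 1: predictive effect without prognostic effect
hier_ok  <- check_hierarchy(beta_ok, g, label)
hier_bad <- check_hierarchy(beta_bad, g, label)
```

A fitted model that respects the hierarchy should give TRUE for every group when its full coefficient vector is passed to such a check.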

Usage

## Default S3 method:
smog(x, y, g, v, label, lambda1, lambda2, lambda3,
  family = "gaussian", subset = NULL, rho = 10, scale = TRUE,
  eabs = 0.001, erel = 0.001, LL = 1, eta = 1.25, maxitr = 1000,
  ...)

## S3 method for class 'formula'
smog(formula, data = list(), g, v, label, lambda1,
  lambda2, lambda3, ...)

Arguments

x

a model matrix or a data frame of dimensions n by p, in which the columns represent the predictor variables.

y

response variable, corresponding to the family description. When family is "gaussian" or "binomial", y should be a numeric vector of observations of length n; when family is "coxph", y represents the survival object, containing the survival time and the censoring status. See Surv.

g

a vector of group labels for the predictor variables.

v

a vector of binary values indicating whether each predictor variable is penalized: 1 means penalized and 0 means not penalized.

label

a character vector giving the type of each predictor: "t" for treatment, "prog" for prognostic effects, and "pred" for predictive effects.

lambda1

penalty parameter for the L2 norm of each group of prognostic and predictive effects.

lambda2

ridge penalty parameter for the squared L2 norm of each group of prognostic and predictive effects.

lambda3

penalty parameter for the L1 norm of predictive effects.

family

a description of the distribution family for the response variable. For a continuous response, family is "gaussian"; for a multinomial or binary response, family is "binomial"; for a survival response, family is "coxph".

subset

an optional vector specifying a subset of observations to be used in the model fitting. Default is NULL.

rho

the penalty parameter used in the alternating direction method of multipliers (ADMM) algorithm. Default is 10.

scale

whether or not to scale the design matrix. Default is TRUE.

eabs

the absolute tolerance used in the ADMM algorithm. Default is 1e-3.

erel

the relative tolerance used in the ADMM algorithm. Default is 1e-3.

LL

initial value of the Lipschitz constant used to approximate the objective function in the Majorization-Minimization (MM) algorithm (or iterative shrinkage-thresholding algorithm, ISTA). Default is 1.

eta

gradient step size for the backtracking line search for the Lipschitz constant. Default is 1.25.

maxitr

the maximum iterations for convergence in the ADMM algorithm. Default is 1000.

...

other relevant arguments that can be supplied to smog.

formula

an object of class "formula": a symbolic description of the model to be fitted. Should not include the intercept.

data

an optional data frame, containing the variables in the model.
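To make the roles of LL and eta more concrete, the following is a generic backtracking sketch for a smooth least-squares loss. It is not the package's implementation: backtrack_step, f, and grad are illustrative names, and treating eta as a multiplicative growth factor for the Lipschitz constant is an assumption based on the usual ISTA backtracking scheme.

```r
# Generic backtracking sketch; backtrack_step, f, and grad are illustrative
# names, not functions from the smog package.
backtrack_step <- function(b, f, grad, LL = 1, eta = 1.25, max_bt = 100) {
  L  <- LL
  gb <- grad(b)
  fb <- f(b)
  for (i in seq_len(max_bt)) {
    z <- b - gb / L                       # gradient step with step size 1/L
    d <- z - b
    # Majorization condition: the quadratic upper bound must dominate f at z.
    if (f(z) <= fb + sum(gb * d) + 0.5 * L * sum(d^2)) break
    L <- eta * L                          # assumed rule: enlarge L and try again
  }
  list(z = z, L = L)
}

set.seed(1)
X <- matrix(rnorm(50 * 3), 50, 3)
y <- X %*% c(1, -2, 0.5) + rnorm(50)
f    <- function(b) 0.5 * sum((y - X %*% b)^2)
grad <- function(b) as.vector(-t(X) %*% (y - X %*% b))

b0   <- rep(0, 3)
step <- backtrack_step(b0, f, grad, LL = 1, eta = 1.25)
```

Under this scheme, a larger eta reaches an acceptable constant in fewer trials but may overshoot it, which shortens the resulting gradient step.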

Details

The formula has the form response ~ 0 + terms, where terms is a series of predictor variables to be fitted for response. For the gaussian family, the response is a continuous vector. For the binomial family, the response is a factor vector in which the last level denotes the "pivot". For the coxph family, the response is a Surv object, containing the survival time and censoring status.
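The effect of the "0 +" term can be inspected with base R before calling smog. The sketch below uses made-up column names (trt, x1, x2) and only stats::model.matrix; how smog's formula method orders its columns internally is not shown here and should be checked against the package source.

```r
# Toy data; the column names trt, x1, and x2 are made up for illustration.
set.seed(1)
dat <- data.frame(
  y   = rnorm(6),
  trt = rep(c(0, 1), 3),
  x1  = rnorm(6),
  x2  = rnorm(6)
)

# "0 +" drops the intercept; each biomarker's main effect is written
# immediately before its interaction with treatment.
fml <- y ~ 0 + trt + x1 + trt:x1 + x2 + trt:x2

# By default, terms() moves all interactions after the main effects.
mm_default <- model.matrix(fml, data = dat)

# keep.order = TRUE preserves the order in which the terms were written.
mm_ordered <- model.matrix(terms(fml, keep.order = TRUE), data = dat)

colnames(mm_default)
colnames(mm_ordered)
```

Because g, v, and label are matched to columns by position, it is worth confirming the column order of the design matrix before building those vectors.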

Value

smog returns an object of a class inheriting from "smog". The generic accessor functions coef, coefficients, fitted.values, and predict can be used to extract various useful features of the value returned by smog. An object of class "smog" is a list containing at least the following components:

coefficients

Data frame containing the indexes, names, and estimates of the nonzero predictor variables. When family is "binomial", the estimates have K-1 columns, each column representing the weights for the corresponding group. The last group serves as the "pivot".

fitted.values

The fitted mean values for the response variable when family is "gaussian". When family is "binomial", the fitted.values are the probabilities for each class; when family is "coxph", the fitted.values are risk scores.

residuals

For family = "gaussian", the ordinary residuals are returned. For family = "binomial", Pearson residuals are returned; for family = "coxph", deviance residuals, i.e., standardized martingale residuals, are returned.

model

A list of estimates for the intercept, treatment effect, and prognostic and predictive effects for the selected biomarkers.

weight

The weights of the predictors resulting from the penalty function, used to calculate the degrees of freedom.

DF

the degrees of freedom. When family = "gaussian", DF = tr(x_{λ}(x_{λ}'x_{λ}+W)^{-1}x_{λ}'). For other families, DF is approximated by diag(1/(1+W)).

criteria

model selection criteria, including the corrected Akaike's Information Criterion (AICc), AIC, the Bayesian Information Criterion (BIC), and the generalized cross-validation score (GCV), respectively. See also cv.smog.

llikelihood

the log-likelihood value for the converged model.

loglike

the penalized log-likelihood values for each iteration in the algorithm.

PrimalError

the averaged norms ||β-Z||/√{p} for each iteration, in the ADMM algorithm.

DualError

the averaged norms ||Z^{t+1}-Z^{t}||/√{p} for each iteration, in the ADMM algorithm.

converge

the number of iterations processed in the ADMM algorithm.

call

the matched call.

formula

the formula supplied.
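As a small worked illustration of the PrimalError and DualError components, the norms can be computed directly from a pair of ADMM iterates. The vectors below are made-up values, and the variable names are illustrative only; how the package combines these norms with eabs and erel to declare convergence is not shown.

```r
# Made-up ADMM iterates for p = 4 coefficients (illustrative values only).
p         <- 4
beta_next <- c(1.0, 0.0, -0.5, 2.0)   # beta at iteration t+1
Z_next    <- c(1.0, 0.0, -0.5, 1.8)   # Z at iteration t+1
Z_prev    <- c(0.6, 0.0, -0.5, 1.8)   # Z at iteration t

l2 <- function(u) sqrt(sum(u^2))      # Euclidean norm

primal_error <- l2(beta_next - Z_next) / sqrt(p)   # ||beta - Z|| / sqrt(p)
dual_error   <- l2(Z_next - Z_prev) / sqrt(p)      # ||Z^{t+1} - Z^{t}|| / sqrt(p)
```

Plotting the two returned vectors against the iteration index is a quick way to see whether the algorithm stopped because both errors became small or because maxitr was reached.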

Penalized regression model

The regression function contains the non-penalized predictor variables, and many groups of prognostic and predictive terms, where in each group the prognostic term comes first, followed by the predictive term.
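Reading the descriptions of lambda1, lambda2, and lambda3 together, the penalty on one coefficient vector can be sketched as below. This is an assumption-laden illustration: smog_penalty_sketch is a hypothetical helper, and any group weights, scaling constants, or factors of 1/2 used by the package are omitted.

```r
# Hypothetical helper (not part of smog): evaluate the three penalty terms
# as described under lambda1, lambda2, and lambda3, without any group
# weights or scaling constants the package may apply.
smog_penalty_sketch <- function(beta, g, v, label, lambda1, lambda2, lambda3) {
  pen <- v == 1
  # Sum of squares within each penalized group.
  ss <- tapply(beta[pen]^2, g[pen], sum)
  lambda1 * sum(sqrt(ss)) +                          # group L2 norms
    lambda2 * sum(ss) +                              # squared group L2 norms (ridge)
    lambda3 * sum(abs(beta[pen & label == "pred"]))  # L1 norm of predictive effects
}

g     <- c(3, 1, 1, 2, 2)
v     <- c(0, 1, 1, 1, 1)
label <- c("t", "prog", "pred", "prog", "pred")
beta  <- c(0.5, 3, 4, 0, 0)   # treatment is unpenalized; group 2 is inactive

pen_value <- smog_penalty_sketch(beta, g, v, label,
                                 lambda1 = 1, lambda2 = 0.5, lambda3 = 2)
```

In this toy case group 1 has norm 5, so the three terms contribute 5, 12.5, and 8. The extra L1 term falls only on the predictive effect, so it can be shrunk to zero while its prognostic partner stays in the model.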

Author(s)

Chong Ma, chongma8903@gmail.com.

References

\insertRef

ma2019structuralsmog

See Also

cv.smog, predict.smog, plot.smog.

Examples


library(smog)

n=100;p=20
set.seed(2018)
# generate design matrix x: column 1 is treatment, followed by p pairs of
# (prognostic, predictive) columns
s=10  # target signal-to-noise ratio for the gaussian response
x=matrix(0,n,1+2*p)
x[,1]=sample(c(0,1),n,replace = TRUE)          # treatment indicator
x[,seq(2,1+2*p,2)]=matrix(rnorm(n*p),n,p)      # prognostic (main) effects
x[,seq(3,1+2*p,2)]=x[,seq(2,1+2*p,2)]*x[,1]    # predictive (interaction) effects

g=c(p+1,rep(1:p,rep(2,p)))  # groups
v=c(0,rep(1,2*p))           # penalization status
label=c("t",rep(c("prog","pred"),p))  # type of predictor variables

# generate beta: only the first 13 coefficients may be nonzero
beta=c(rnorm(13,0,2),rep(0,ncol(x)-13))
beta[c(2,4,7,9)]=0

# generate y
data1=x%*%beta
noise1=rnorm(n)
snr1=as.numeric(sqrt(var(data1)/(s*var(noise1))))  # noise scale giving ratio s
y1=data1+snr1*noise1
lfit1=smog(x,y1,g,v,label,lambda1=8,lambda2=0,lambda3=8,family = "gaussian")

## generate binomial data
prob=exp(as.matrix(x)%*%as.matrix(beta))/(1+exp(as.matrix(x)%*%as.matrix(beta)))
y2=ifelse(prob<0.5,0,1)
lfit2=smog(x,y2,g,v,label,lambda1=0.03,lambda2=0,lambda3=0.03,family = "binomial")

## generate survival data
# Weibull latent event times
lambda = 0.01; rho = 1
V = runif(n)
Tlat = (- log(V) / (lambda*exp(x %*% beta)) )^(1/rho)
C = rexp(n, 0.001)  ## censoring time
time = as.vector(pmin(Tlat, C))   # observed time
status = as.numeric(Tlat <= C)    # 1 = event observed, 0 = censored
y3 = as.matrix(cbind(time = time, status = status))

lfit3=smog(x,y3,g,v,label,lambda1=0.2,lambda2=0,lambda3=0.2,family = "coxph")

smog documentation built on Aug. 10, 2020, 5:07 p.m.