simml: Single-index models with multiple-links (main function)

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/simml.main.R

Description

simml is the wrapper function for Single-index models with multiple-links (SIMML). The function estimates a linear combination (a single-index) of covariates X, and models the treatment-specific outcome y, via treatment-specific nonparametrically-defined link functions.

Usage

1
2
3
4
5
6
7
8
9
simml(y, A, X, Xm = NULL, aug = NULL, family = "gaussian",
  R = NULL, bs = "cr", k = 8, sp = NULL, linear.link = FALSE,
  method = "GCV.Cp", gamma = 1, rho = 0, beta.ini = NULL,
  ind.to.be.positive = NULL, scale.si.01 = FALSE, max.iter = 20,
  eps.iter = 0.01, trace.iter = TRUE, lambda = 0, pen.order = 0,
  scale.X = TRUE, center.X = TRUE, ortho.constr = TRUE,
  si.main.effect = FALSE, random.effect = FALSE, z = NULL,
  plots = FALSE, bootstrap = FALSE, nboot = 200, boot.conf = 0.95,
  seed = 1357)

Arguments

y

a n-by-1 vector of treatment outcomes; y is a member of the exponential family; any distribution supported by mgcv::gam; y can also be an ordinal categorial response with R categories taking a value from 1 to R.

A

a n-by-1 vector of treatment variable; each element is assumed to take a value in a finite discrete space.

X

a n-by-p matrix of baseline covarates.

Xm

a n-by-q design matrix associated with an X main effect model; the defult is NULL and it is taken as a vector of zeros

aug

a n-by-1 additional augmentation vector associated with the X main effect; the default is NULL and it is taken as a vector of zeros

family

specifies the distribution of y; e.g., "gaussian", "binomial", "poisson"; can be any family supported by mgcv::gam; can also be "ordinal", for an ordinal categorical response y.

R

the number of response categories for the case of family = "ordinal".

bs

basis type for the treatment (A) and single-index joint effect; the defult is "ps" (p-splines); any basis supported by mgcv::gam can be used, e.g., "cr" (cubic regression splines); see mgcv::s for detail.

k

basis dimension for the spline-type-represented treatment-specific link functions.

sp

smoothing paramter for the treatment-specific link functions; if NULL, then estimated from the data.

linear.link

if TRUE, the link function is restricted to be linear.

method

the smoothing parameter estimation method; "GCV.Cp" to use GCV for unknown scale parameter and Mallows' Cp/UBRE/AIC for known scale; any method supported by mgcv::gam can be used.

gamma

increase this beyond 1 to produce smoother models. gamma multiplies the effective degrees of freedom in the GCV or UBRE/AIC (see mgcv::gam for detail); the default is 1.

rho

a tuning parameter associated with the additional augmentation vector aug; the default is 0.

beta.ini

an initial value for beta.coef; a p-by-1 vector; the defult is NULL, in which case a linear model estimate is used.

ind.to.be.positive

for identifiability of the solution beta.coef, the user can restrict the jth (e.g., j=1) component of beta.coef to be positive; by default, we match the "overall" sign of beta.coef with that of the linear estimate (i.e., the initial estimate), by restricting the inner product between the two to be positive.

scale.si.01

if TRUE, re-scale the index coefficients to restrict the index to the interval [0,1]; in such a case, an intercept term is induced.

max.iter

an integer specifying the maximum number of iterations for beta.coef update.

eps.iter

a value specifying the convergence criterion of algorithm.

trace.iter

if TRUE, trace the estimation process and print the differences in beta.coef.

lambda

a regularization parameter associated with the penalized LS for beta.coef update; the default is 0, and the index coefficients are not penalized.

pen.order

0 indicates the ridge penalty; 1 indicates the 1st difference penalty; 2 indicates the 2nd difference penalty, used in a penalized least squares (LS) estimation of beta.coef.

scale.X

if TRUE, scale X to have unit variance.

center.X

if TRUE, center X to have zero mean.

ortho.constr

separates the interaction effects from the main effect (without this, the interaction effect can be confounded by the main effect; the default is TRUE.

si.main.effect

if TRUE, once the convergence in the estimates of beta.coef is reached, include the main effect associated with the fitted single-index (beta.coef'X) to the final fit; the default is FALSE.

random.effect

if TRUE, as part of the main effects, the user can incorporate z-specific random intercepts.

z

a factor that specifies the random intercepts when random.effect = TRUE.

plots

if TRUE, produce a plot for the estimated effect contrast (for binary treatment cases) (on a linear predictor scale).

bootstrap

if TRUE, compute bootstrap confidence intervals for the single-index coefficients, beta.coef; the default is FALSE.

nboot

when bootstrap=TRUE, a value specifying the number of bootstrap replications.

boot.conf

a value specifying the confidence level of the bootstrap confidence intervals; the defult is boot.conf = 0.95.

seed

when bootstrap=TRUE, randomization seed used in bootstrap resampling.

Details

SIMML captures the effect of covariates via a single-index and their interaction with the treatment via nonparametric link functions. Interaction effects are determined by distinct shapes of the link functions. The estimated single-index is useful for comparing differential treatment efficacy. The resulting simml object can be used to estimate an optimal treatment decision rule for a new patient with pretreatment clinical information.

Value

a list of information of the fitted SIMML including

beta.coef

the estimated single-index coefficients.

g.fit

a mgcv:gam object containing information about the estimated treatment-specific link functions.

beta.ini

the initial value used in the estimation of beta.coef

beta.path

solution path of beta.coef over the iterations

d.beta

records the change in beta.coef over the solution path, beta.path

scale.X

sd of pretreatment covariates X

center.X

mean of pretreatment covariates X

L

number of different treatment options

p

number of pretreatment covariates X

n

number of subjects

boot.ci

(1-boot.alpha/2) percentile bootstrap CIs (LB, UB) associated with beta.coef

Author(s)

Park, Petkova, Tarpey, Ogden

See Also

pred.simml, fit.simml

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
family <- "gaussian"   #"poisson"
delta = 1              # moderate main effect
s=2                    # if s=2 (s=1), a nonlinear (linear) contrast function
n=500                  # number of subjects
p=10                   # number of pretreatment covariates

# generate training data
data <- generate.data(n= n, p=p, delta = delta, s= s, family = family)
data$SNR  # the ratio of interactions("signal") vs. main effects("noise")
A <- data$A
y <- data$y
X <- data$X

# generate testing data
data.test <- generate.data(n=10^5, p=p, delta = delta,  s= s, family = family)
A.test <- data.test$A
y.test <- data.test$y
X.test <- data.test$X
data.test$value.opt     # the optimal "value"


# fit SIMML
#1) SIMML without X main effect
simml.obj1 <- simml(y, A, X, family = family)

#2) SIMML with X main effect (estimation efficiency for the g term of SIMML can be improved)
simml.obj2 <- simml(y, A, X, Xm = X, family = family)


# apply the estimated SIMML to the testing set and obtain treatment assignment rules.
simml.trt.rule1 <- pred.simml(simml.obj1, newX= X.test)$trt.rule
# "value" estimation (estimated by IPWE)
simml.value1 <-  mean(y.test[simml.trt.rule1 == A.test])
simml.value1

simml.trt.rule2 <- pred.simml(simml.obj2, newX= X.test)$trt.rule
simml.value2 <-  mean(y.test[simml.trt.rule2 == A.test])
simml.value2

# compare these to the optimal "value"
data.test$value.opt



# fit MC (modified covariates) model of Tien et al 2014
n.A <- summary(as.factor(A)); pi.A <- n.A/sum(n.A)
mc  <- (as.numeric(A) + pi.A[1] -2) *cbind(1, X)  # 0.5*(-1)^as.numeric(A) *cbind(1, X)
mc.coef  <-  coef(glm(y ~ mc, family =  family))
mc.trt.rule <- (cbind(1, X.test) %*% mc.coef[-1] > 0) +1
# "value" estimation (estimated by IPWE)
mc.value  <-  mean(y.test[mc.trt.rule == A.test])
mc.value


# visualization of the estimated link functions of SIMML
simml.obj1$beta.coef        # estimated single-index coefficients
g.fit <- simml.obj1$g.fit   # estimated trt-specific link functions; "g.fit" is a mgcv::gam object.
#plot(g.fit)


# can improve visualization by using the package "mgcViz"
#install.packages("mgcViz")
# mgcViz depends on "rgl". "rgl" depends on XQuartz, which you can download from xquartz.org
#library(mgcViz)
# transform the "mgcv::gam" object to a "mgcViz" object (to improve visualization)
g.fit <- getViz(g.fit)

plot1  <- plot( sm(g.fit,1) )  # for treatment group 1
plot1 + l_fitLine(colour = "red") + l_rug(mapping = aes(x=x, y=y), alpha = 0.8) +
  l_ciLine(mul = 5, colour = "blue", linetype = 2) +
  l_points(shape = 19, size = 1, alpha = 0.1) +
  xlab(expression(paste("z = ", alpha*minute, "x")))  +  ylab("y") +
  ggtitle("Treatment group 1 (Trt =1)") +  theme_classic()

plot2 <- plot( sm(g.fit,2) )   # for treatment group 2
plot2 + l_fitLine(colour = "red") + l_rug(mapping = aes(x=x, y=y), alpha = 0.8) +
  l_ciLine(mul = 5, colour = "blue", linetype = 2) +
  l_points(shape = 19, size = 1, alpha = 0.1) +
  xlab(expression(paste("z = ", alpha*minute, "x"))) +ylab("y") +
  ggtitle("Treatment group 2 (Trt =2)") + theme_classic()


trans = function(x) x + g.fit$coefficients[2]
plotDiff(s1 = sm(g.fit, 2), s2 = sm(g.fit, 1), trans=trans) +  l_ciPoly() +
  l_fitLine() + geom_hline(yintercept = 0, linetype = 2) +
  xlab(expression(paste("z = ", alpha*minute, "x")) ) +
  ylab("(Treatment 2 effect) - (Treatment 1 effect)") +
  ggtitle("Contrast between two treatment effects") +
  theme_classic()


# yet another way of visualization, using ggplot2
#library(ggplot2)
dat  <- data.frame(y= simml.obj1$g.fit$model$y,
                   x= simml.obj1$g.fit$model$single.index,
                   Treatment= simml.obj1$g.fit$model$A)
g.plot<- ggplot(dat, aes(x=x,y=y,color=Treatment,shape=Treatment,linetype=Treatment))+
   geom_point(aes(color=Treatment, shape=Treatment), size=1, fill="white") +
   scale_colour_brewer(palette="Set1", direction=-1) +
   xlab(expression(paste(beta*minute,"x"))) + ylab("y")
g.plot + geom_smooth(method=gam, formula= y~ s(x, bs=simml.obj1$bs, k=simml.obj1$k),
                     se=TRUE, fullrange=TRUE, alpha = 0.35)



# can obtain bootstrap CIs for beta.coef.
simml.obj <- simml(y,A,X,Xm=X, family=family,bootstrap=TRUE,nboot=15)  #nboot=500.
simml.obj$beta.coef
round(simml.obj$boot.ci,3)

# compare the estimates to the true beta.coef.
data$true.beta



# an application to data with ordinal categorical response
dat <- ordinal.data(n=500, p=5, R = 11,  # 11 response levels
                   s = "nonlinear",     # nonlinear interactions
                   delta = 1)
dat$SNR
y <- dat$y  # ordinal response
X <- dat$X  # X matrix
A <- dat$A  # treatment
dat$true.beta  # the "true" single-index coefficient


# 1) fit a cumulative logit simml, with a flexible link function
res <-  simml(y,A,X, family="ordinal", R=11)
res$beta.coef  # single-index coefficients.
res$g.fit$family$getTheta(TRUE)  # the estimated R-1 threshold values.

# 2) fit a cumulative logit simml, with a linear link function
res2 <-  simml(y,A,X, family="ordinal", R=11, linear.link = TRUE)
res2$beta.coef  # single-index coefficients.


family = mgcv::ocat(R=11)  # ocat: ordered categorical response family, with R categories.
# the treatment A's effect.
tmp <- mgcv::gam(y ~ A, family =family)
exp(coef(tmp)[2])  #odds ratio (OR) comparing treatment A=2 vs. A=1.

ind2 <- pred.simml(res)$trt.rule ==2  # subgroup recommended with A=2 under SIMML ITR
tmp2 <- mgcv::gam(y[ind2] ~ A[ind2], family = family)
exp(coef(tmp2)[2]) #OR comparing treatment A=2 vs. A=1, for subgroup recommended with A=2

ind1 <- pred.simml(res)$trt.rule ==1  # subgroup recommended with A=1 under SIMML ITR
tmp1 <- mgcv::gam(y[ind1] ~ A[ind1], family = family)
exp(coef(tmp1)[2]) #OR comparing treatment A=2 vs. A=1, for subgroup recommended with A=2

simml documentation built on May 25, 2021, 9:08 a.m.

Related to simml in simml...