cv.famTEMsel: Functional Additive Models for Treatment Effect-Modifier...

Description Usage Arguments Value Author(s) See Also Examples

View source: R/famTEMsel.main.R

Description

Does k-fold cross-validation for famTEMsel, selects an optimal regularization parameter index, lambda.opt.index, and returns the estimated constrained functional additive model given the optimal regularization parameter index lambda.opt.index.

Usage

1
2
3
4
5
6
7
cv.famTEMsel(y, A, X, Z = NULL, mu.hat = NULL, n.folds = 5, d = 3,
  k = 6, bs = "ps", sp = NULL, lambda.opt.index = NULL,
  lambda = NULL, nlambda = 30, lambda.min.ratio = 0.01,
  lambda.index.grid = 1:floor(nlambda/3), cv1sd = FALSE,
  thol = 1e-05, max.ite = 1e+05, regfunc = "L1", max.iter = 10,
  eps.iter = 0.01, eps.num.deriv = 1e-04, trace.iter = TRUE,
  plots = TRUE)

Arguments

y

a n-by-1 vector of responses

A

a n-by-1 vector of treatment variable; each element represents one of the L(>1) treatment conditions; e.g., c(1,2,1,1,3...); can be a factor-valued

X

a length-p list of functional-valued covariates, with its jth element corresponding to a n-by-n.eval[j] matrix of the observed jth functional covariates; n.eval[j] represents the number of evaluation points of the jth functional covariates

Z

a n-by-q matrix of scalar-valued covaraites

mu.hat

a n-by-1 vector of the fitted (X,Z)-main effect term of the model provided by the user; defult is NULL, in which case mu.hat is taken to be a vector of zeros; the optimal choice for this vector is E(y|X,Z)

n.folds

number of folds for cross-validation; the default is 5.

d

number of basis spline functions to be used for each component function; the default value is 3; d=1 corresponds to the linear model

k

dimension of the basis for representing each single-index coefficient function; see mgcv::gam for detail; the default value is 6.

bs

type of basis for representing the single-index coefficient functions; the defult is "ps" (p-splines); any basis supported by mgcv::gam can be used, e.g., "cr" (cubic regression splines).

sp

smoothing parameter associated with the single-index coefficient function; the default is NULL, in which case the smoothing parameter is estimated based on generalized cross-validation.

lambda.opt.index

a user-supplied optimal regularization parameter index to be used; the default is NULL, in which case n.folds cross-validation is performed to select an optimal index.

lambda

a user-supplied regularization parameter sequence; typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio.

nlambda

total number of lambda values; the default value is 30.

lambda.min.ratio

the smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero); the default is 1e-2.

lambda.index.grid

a set of indices of lambda, in which the search for an optimal regularization parameter is to be conducted.

cv1sd

if TRUE, an optimal regularization parameter is chosen based on: the mean cross-validated error + 1 SD of the mean cross-validated error, which typically results in an increase in regularization; the defualt is FALSE.

thol

stopping precision for the coordinate-descent algorithm; the default value is 1e-5.

max.ite

number of maximum iterations for the coordinate-descent procedure used in estimating the component functions; the default value is 1e+5.

regfunc

type of the regularizer for variable selection; the default is "L1"; can also be "MCP" or "SCAD".

max.iter

number of maximum iterations for the iterative procedure used in estimating the single-index coefficient functions; the default value is 1e+1.

eps.iter

a value specifying the convergence criterion for the iterative procedure used in estimating the single-index coefficient functions; the defult is 1e-2.

eps.num.deriv

a small value that defines a finite difference used in computing the numerical (1st) derivative of the estimated component function; the default is 1e-4.

trace.iter

if TRUE, trace the estimation process by printing the difference in the estimated single-index basis coefficients (as compared to the previous iteration), and the functional norms of the estimated component functions, for each iteration.

plots

if TRUE, produce a cross-validation plot of the estimated mean squared error versus the regulariation parameter index.

Value

a list of information of the fitted constrained functional additive model including

famTEMsel.obj

an object of class famTEMsel, which contains the sequence of the set of fitted component functions samTEMsel.obj implied by the sequence of the regularization parameters lambda and the corresponding set of fitted single-index coefficient functions si.fit; see famTEMsel for detail.

lambda.opt.index

an index number, indicating the index of the estimated optimal regularization parameter in lambda.

nonzero.index

a set of numbers, indicating the indices of estimated nonzero component functions, evalated at the regularization parameter index lambda.opt.index.

nonzero.X.index

a set of numbers, indicating the indices of estimated nonzero component functions associated with the p functional covariates, evalated at the regularization parameter index lambda.opt.index.

func_norm.opt

a p-by-1 vector, indicating the norms of the estimated component functions evaluatd at the regularization parameter index lambda.opt.index, with each element corresponding to the norm of each estimated component function.

cv.storage

a n.folds-by-nlambda matrix of the estimated mean squared errors, with each column corresponding to each of the regularization parameters in lambda and each row corresponding to each of the n.folds folds.

mean.cv

a nlambda-by-1 vector of the estimated mean squared errors, with each element corresponding to each of the regularization parameters in lambda.

sd.cv

a nlambda-by-1 vector of the standard deviation of the estimated mean squared errors, with each element corresponding to each of the regularization parameters in lambda.

Author(s)

Park, Petkova, Tarpey, Ogden

See Also

famTEMsel, predict_famTEMsel, plot_famTEMsel

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
p = q = 10 # p and q are the numbers of functional and scalar pretreatment covariates, respectively.
n.train = 300  # training set sample size
n.test = 1000  # testing set sample size
n = n.train + n.test

# generate p pretreatment functional covariates X by first seting up functional basis:
n.eval = 50; s = seq(0, 1, length.out = n.eval)  # a grid of support points
b1 = sqrt(2)*sin(2*pi*s)
b2 = sqrt(2)*cos(2*pi*s)
b3 = sqrt(2)*sin(4*pi*s)
b4 = sqrt(2)*cos(4*pi*s)
B  = cbind(b1, b2, b3, b4)   # a (n.eval-by-4) basis matrix
# randomly generate basis coefficients, and then add measurement noise
X = vector("list", length= p);  for(j in 1:p){
  X[[j]] = matrix(rnorm(n*4, 0, 1), n, 4) %*% t(B) +
    matrix(rnorm(n*n.eval, 0, 0.25), n, n.eval)    # measurement noise
}
Z  = matrix(rnorm(n*q, 0, 1), n, q)   # q scalar covariates
A  = rbinom(n, 1, 0.5) + 1  # treatment variable taking a value in {1,2} with equal prob.

# X main effect on y; depends on the first 5 covariates
# the effect is generated randomly; randomly generated basis coefficients, scaled to unit L2 norm.
tmp = apply(matrix(rnorm(4*5), 4,5), 2, function(s) s/sqrt(sum(s^2) ))
main.effect = rep(0, n); for(j in 1:5){
  main.effect = main.effect + cos(X[[j]]%*% B %*%tmp[,j]/n.eval) # nonlinear effect (cosine)
}; rm(tmp)
# Z main effect on y; also depends on first 5 covariates
for(k in 1:5){
  main.effect = main.effect + cos(Z[,k])
}

# define (interaction effect) coefficient functions associted with X[[1]] and X[[2]]
beta1 =  B %*% c(0.5,0.5,0.5,0.5)
beta2 =  B %*% c(0.5,-0.5,0.5,-0.5)
# A-by-X ineraction effect on y; depends only on X[[1]] and X[[2]].
interaction.effect = (A-1.5)*( 2*sin(X[[1]]%*%beta1/n.eval) + 2*sin(X[[2]]%*%beta2/n.eval))
# A-by-Z ineraction effect on y; depends only on Z[,1] and Z[,2].
interaction.effect =  interaction.effect +  (A-1.5)*(Z[,1] + 2*sin(Z[,2]))

# generate outcome y
noise = rnorm(n, 0, 0.5)
y = main.effect  + interaction.effect + noise

var.main <- var(main.effect)
var.interaction <- var(interaction.effect)
var.noise <- var(noise)
SNR <- var.interaction/ (var.main + var.noise)
SNR  # "signal-to-noise" ratio


# train/test set splitting
train.index = 1:n.train
y.train = y[train.index]
X.train= X.test = vector("list", p);  for(j in 1:p){
X.train[[j]] = X[[j]][train.index,]
X.test[[j]]  = X[[j]][-train.index,]
}
A.train = A[train.index]
A.test  = A[-train.index]
y.train = y[train.index]
y.test  = y[-train.index]
Z.train = Z[train.index,]
Z.test  = Z[-train.index,]


# obtain an optimal regularization parameter and the corresponding model by running cv.famTEMsel().
cv.obj = cv.famTEMsel(y.train, A.train, X.train, Z.train)
lambda.opt.index = cv.obj$lambda.opt.index   # optimal regularization parameter index
cv.obj$func_norm.opt  # L2 norm of the component functions, associated with lambda.opt.index.
famTEMsel.obj = cv.obj$famTEMsel.obj  # extract the fitted model associted with lambda.opt.index.
# see also, famTEMsel() for the detail of famTEMsel.obj.

famTEMsel.obj$nonzero.index  # set of indices for the component functions estimated as nonzero
# plot the component functions estimated as nonzero
plot_famTEMsel(famTEMsel.obj, which.index = famTEMsel.obj$nonzero.index)

# make ITRs for subjects with pretreatment characteristics, X.test and Z.test
trt.rule = make_ITR_famTEMsel(famTEMsel.obj, newX = X.test, newZ = Z.test)$trt.rule
head(trt.rule)

# an (IPWE) estimate of the "value" of this particualr treatment rule, trt.rule:
mean(y.test[A.test==trt.rule])

# compare the above value to the following estimated "values" of "naive" treatment rules:
mean(y.test[A.test==1])   # a rule that assigns everyone to A=1
mean(y.test[A.test==2])   # a rule that assigns everyone to A=2

syhyunpark/famTEMsel documentation built on Feb. 6, 2020, 12:13 a.m.