Description

Carries out the targeted minimum loss estimation (TMLE) of a non-parametric variable importance measure of a continuous exposure.
Usage

tmle.npvi(obs, f = identity, nMax = 30L,
          flavor = c("learning", "superLearning"), lib = list(), nodes = 1L,
          cvControl = NULL, family = c("parsimonious", "gaussian"),
          cleverCovTheta = FALSE, bound = 1, B = 1e+05, trueGMu = NULL,
          iter = 5L, stoppingCriteria = list(mic = 0.01, div = 0.01, psi = 0.1),
          gmin = 0.05, gmax = 0.95,
          mumin = quantile(f(obs[obs[, "X"] != 0, "X"]), type = 1, probs = 0.01),
          mumax = quantile(f(obs[obs[, "X"] != 0, "X"]), type = 1, probs = 0.99),
          verbose = FALSE, tabulate = TRUE, exact = TRUE, light = TRUE)
Arguments

obs: A matrix of observations (see the Examples section).

f: A function involved in the definition of the parameter of interest; it must satisfy f(0)=0 (see Details). Defaults to identity.

nMax: An integer. Defaults to 30L.

flavor: Indicates whether the construction of the relevant features of P_n^0 and P_n^k, the (not yet targeted) initial and (targeted) successively updated estimators of the true distribution of (W,X,Y), relies on the Super Learning methodology (option "superLearning") or not (option "learning", the default value). In the former case, the SuperLearner package is required (see Details).

lib: A list specifying the library of algorithms (see Details). Defaults to list(), in which case the default library for the chosen flavor is used.

nodes: An integer. Defaults to 1L.

cvControl: NULL (the default value) or an argument controlling cross-validation.

family: Indicates whether the simulation of the conditional distribution of X given W under P_n^k (the initial estimator if k=0 or its kth update if k ≥ 1) should be based on a weighted version of the empirical measure (case "parsimonious", the default value and faster execution) or on a Gaussian model (case "gaussian").

cleverCovTheta: A logical. Defaults to FALSE.

bound: A positive numeric. Defaults to 1.

B: An integer. Defaults to 1e+05.

trueGMu: Either NULL (the default value) or the true 'g' and 'mu' features.

iter: An integer, the maximal number of iterations of the TMLE procedure. Defaults to 5L.

stoppingCriteria: A list of convergence criteria with entries 'mic', 'div' and 'psi' (see Details). Defaults to list(mic = 0.01, div = 0.01, psi = 0.1).

gmin: A positive numeric. Defaults to 0.05.

gmax: A positive numeric. Defaults to 0.95.

mumin: A numeric. Defaults to the 0.01-quantile of f(X) over the observations with X not equal to the reference value.

mumax: A numeric. Defaults to the 0.99-quantile of f(X) over the observations with X not equal to the reference value.

verbose: Prescribes the amount of information output by the function. Defaults to FALSE.

tabulate: A logical. Defaults to TRUE.

exact: A logical. Defaults to TRUE.

light: A logical. Defaults to TRUE.
Details

The parameter of interest is defined as ψ = Ψ(P) with

Ψ(P) = E_P[f(X) * (θ(X,W) - θ(0,W))] / E_P[f(X)^2],

where P is the distribution of the random vector (W,X,Y), θ(X,W) = E_P[Y|X,W], 0 is the reference value for X, and f is a user-supplied function such that f(0)=0 (e.g., f=identity, the default value).
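For instance, a user-supplied f can replace the default provided it vanishes at the reference value. A minimal sketch, with atan chosen purely for illustration:

## Any candidate 'f' must satisfy f(0) = 0, like the default f = identity.
f <- function(x) atan(x)                  # illustrative choice only
stopifnot(isTRUE(all.equal(f(0), 0)))     # sanity check before calling tmle.npvi
## The function would then be passed as tmle.npvi(obs, f = f, ...).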
The TMLE procedure stops when the maximal number of iterations, iter, is reached, or when at least one of the following criteria is met:

- The empirical mean P_n effIC(P_n^{k+1}) of the efficient influence curve at P_n^{k+1}, scaled by the estimated standard deviation of the efficient influence curve at P_n^{k+1}, is smaller in absolute value than mic.

- The total variation (TV) distance between P_n^k and P_n^{k+1} is smaller than div.

- The change between the successive values Ψ(P_n^k) and Ψ(P_n^{k+1}) is smaller than psi.
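A minimal sketch, assuming obs has been prepared as in the Examples section, that tightens the convergence criteria and allows more iterations than the defaults:

## Stricter stopping criteria and a larger iteration budget than the defaults.
npvi <- tmle.npvi(obs, f = identity, flavor = "learning", B = 5e4,
                  iter = 10L,
                  stoppingCriteria = list(mic = 0.001, div = 0.001, psi = 0.01))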
If lib is an empty list (list(), the default value) then the default algorithms for the chosen flavor are loaded: learningLib when flavor is set to "learning", or superLearningLib when flavor is set to "superLearning". A valid lib argument must mimic the structure of either learningLib or superLearningLib, depending on flavor.
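One way to start building a custom lib is to inspect the structure of the default library for the chosen flavor. The sketch below assumes that learningLib lives in the tmle.npvi namespace and looks it up defensively in case it is not exported:

library(tmle.npvi)
## Inspect the default 'learning' library so a custom 'lib' can mimic its structure.
## Whether 'learningLib' is exported or internal is an assumption here.
ns <- asNamespace("tmle.npvi")
if (exists("learningLib", envir = ns)) {
  str(get("learningLib", envir = ns), max.level = 1)
}
## A custom library (hypothetical 'myLib') would then be passed as:
## npvi <- tmle.npvi(obs, flavor = "learning", lib = myLib)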
The "superLearning" flavor requires the SuperLearner package and, by default, the e1071, gam, glmnet, polspline and randomForest packages. If family is set to "parsimonious" (recommended) then the packages sgeostat and geometry are required.
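A minimal sketch that checks whether these suggested packages are available before the corresponding options are selected:

## Collect the packages needed for the chosen flavor and family, then
## verify that each of them is installed.
chosenFlavor <- "superLearning"   # or "learning"
chosenFamily <- "parsimonious"    # or "gaussian"
needed <- character(0)
if (chosenFlavor == "superLearning") {
  needed <- c(needed, "SuperLearner", "e1071", "gam", "glmnet", "polspline", "randomForest")
}
if (chosenFamily == "parsimonious") {
  needed <- c(needed, "sgeostat", "geometry")
}
notInstalled <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]
if (length(notInstalled)) {
  stop("Please install: ", paste(notInstalled, collapse = ", "))
}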
Value

Returns an object of class "NPVI" summarizing the different steps of the TMLE procedure. The method getHistory outputs the "history" of the procedure (see getHistory). The object notably includes the following information:
obs: The matrix of observations used by the procedure (as returned by getObs).

psi: The TMLE of the parameter of interest.

psi.sd: The estimated standard deviation of the TMLE of the parameter of interest.
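A minimal sketch, assuming a fitted object npvi as in the Examples section, that reads the final estimate and its standard deviation off the history to form a 95% Wald confidence interval, using only the accessors demonstrated in the Examples:

hh <- getHistory(npvi)
psi.hat <- hh[nrow(hh), "psi"]   # final TMLE of the parameter
sic <- hh[nrow(hh), "sic"]       # standard deviation estimate used for CIs (as in the Examples)
n <- nrow(getObs(npvi))
ci <- psi.hat + c(-1, 1) * qnorm(0.975) * sic / sqrt(n)
ci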
Author(s)

Antoine Chambaz, Pierre Neuvial
References

Chambaz, A., Neuvial, P., & van der Laan, M. J. (2012). Estimation of a non-parametric variable importance measure of a continuous exposure. Electronic Journal of Statistics, 6, 1059–1099.

Chambaz, A., & Neuvial, P. (2015). tmle.npvi: targeted, integrative search of associations between DNA copy number and gene expression, accounting for DNA methylation. To appear in Bioinformatics Applications Notes.
See Also

getSample, getHistory
Examples

set.seed(12345)
##
## Simulating a data set and computing the true value of the parameter
##
## Parameters for the simulation (case 'f=identity')
O <- cbind(W=c(0.05218652, 0.01113460),
X=c(2.722713, 9.362432),
Y=c(-0.4569579, 1.2470822))
O <- rbind(NA, O)
lambda0 <- function(W) {-W}
p <- c(0, 1/2, 1/2)
omega <- c(0, 3, 3)
S <- matrix(c(10, 1, 1, 0.5), 2, 2)
## Simulating a data set of 200 i.i.d. observations
sim <- getSample(2e2, O, lambda0, p=p, omega=omega, sigma2=1, Sigma3=S)
obs <- sim$obs
## Adding (dummy) baseline covariates
V <- matrix(runif(3*nrow(obs)), ncol=3)
colnames(V) <- paste("V", 1:3, sep="")
obs <- cbind(V, obs)
## Caution! MAKING '0' THE REFERENCE VALUE FOR 'X'
X0 <- O[2,2]
obsC <- obs
obsC[, "X"] <- obsC[, "X"] - X0
obs <- obsC
## True psi and confidence intervals (case 'f=identity')
sim <- getSample(1e4, O, lambda0, p=p, omega=omega, sigma2=1, Sigma3=S)
truePsi <- sim$psi
confInt0 <- truePsi + c(-1, 1)*qnorm(.975)*sqrt(sim$varIC/nrow(sim$obs))
confInt <- truePsi + c(-1, 1)*qnorm(.975)*sqrt(sim$varIC/nrow(obs))
cat("\nCase f=identity:\n")
msg <- paste("\ttrue psi is: ", signif(truePsi, 3), "\n", sep="")
msg <- paste(msg, "\t95%-confidence interval for the approximation is: ",
signif(confInt0, 3), "\n", sep="")
msg <- paste(msg, "\toptimal 95%-confidence interval is: ",
signif(confInt, 3), "\n", sep="")
cat(msg)
##
## TMLE procedure
##
## Running the TMLE procedure
npvi <- tmle.npvi(obs, f=identity, flavor="learning", B=5e4, nMax=10)
## Summarizing its results
npvi
setConfLevel(npvi, 0.9)
npvi
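## Inspecting the convergence history of the procedure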
history <- getHistory(npvi)
print(round(history, 4))
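## Plotting the successive estimates of psi with pointwise confidence intervals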
hp <- history[, "psi"]
hs <- history[, "sic"]
hs[1] <- NA
ics <- c(-1,1) %*% t(qnorm(0.975)*hs/sqrt(nrow(getObs(npvi))))
pch <- 20
ylim <- range(c(confInt, hp, ics+hp), na.rm=TRUE)
xs <- (1:length(hs))-1
plot(xs, hp, ylim=ylim, pch=pch, xlab="Iteration", ylab=expression(psi[n]),
xaxp=c(0, length(hs)-1, length(hs)-1))
dummy <- sapply(seq(along=xs), function(x) lines(c(xs[x],xs[x]), hp[x]+ics[, x]))
abline(h=confInt, col=4)
abline(h=confInt0, col=2)