View source: R/Functions_Rsurrogate.R
R.s.estimate | R Documentation |
This function calculates the proportion of treatment effect on the primary outcome explained by the treatment effect on the surrogate marker(s). This function is intended to be used for a fully observed continuous outcome. The user can also request a variance estimate and a 95% confidence interval, both estimated using perturbation-resampling. If a confidence interval is requested, three versions are provided: a normal approximation based interval, a quantile based interval, and Fieller's confidence interval.
R.s.estimate(sone, szero, yone, yzero, var = FALSE, conf.int = FALSE, weight.perturb = NULL, number = "single", type = "robust", extrapolate = FALSE, transform = FALSE)
sone |
numeric vector or matrix; surrogate marker for treated observations, assumed to be continuous. If there are multiple surrogates then this should be a matrix with n_1 (number of treated observations) rows and n.s (number of surrogate markers) columns. |
szero |
numeric vector or matrix; surrogate marker for control observations, assumed to be continuous. If there are multiple surrogates then this should be a matrix with n_0 (number of control observations) rows and n.s (number of surrogate markers) columns. |
yone |
numeric vector; primary outcome for treated observations, assumed to be continuous. |
yzero |
numeric vector; primary outcome for control observations, assumed to be continuous. |
var |
TRUE or FALSE; indicates whether a variance estimate is requested, default is FALSE. |
conf.int |
TRUE or FALSE; indicates whether a 95% confidence interval is requested, default is FALSE. |
weight.perturb |
an (n_1+n_0) by x matrix of weights, where n_1 = length of yone and n_0 = length of yzero; used for perturbation-resampling, default is NULL. |
number |
specifies the number of surrogate markers; choices are "multiple" or "single", default is "single". |
type |
specifies the type of estimation; choices are "robust", "model", or "freedman", default is "robust". |
extrapolate |
TRUE or FALSE; indicates whether the user wants to use extrapolation, default is FALSE. |
transform |
TRUE or FALSE; indicates whether the user wants to use a transformation for the surrogate marker, default is FALSE. |
Let Y^{(1)} and Y^{(0)} denote the primary outcome under the treatment and the primary outcome under the control, respectively. Let S^{(1)} and S^{(0)} denote the surrogate marker under the treatment and the surrogate marker under the control, respectively. The residual treatment effect is defined as
Δ_S=\int_{-∞}^{∞} E(Y^{(1)}|S^{(1)}=s) dF_0(s) - \int_{-∞}^{∞} E(Y^{(0)}|S^{(0)}=s) dF_0(s),
where Δ_S(s)= E(Y^{(1)}|S^{(1)}=s)-E(Y^{(0)}|S^{(0)}=s) and F_0(\cdot) is the marginal cumulative distribution function of S^{(0)}, the surrogate marker measure under the control. The proportion of treatment effect explained by the surrogate marker, which we denote by R_S, can be expressed using a contrast between Δ_S and Δ:
R_S=\{Δ-Δ_S\}/Δ=1-Δ_S/Δ.
The definition and estimation of Δ are described in the delta.estimate documentation.
A flexible model-based approach to estimate Δ_S in the single marker setting is to specify:
E(S^{(0)})=α_0 \quad\mbox{and}\quad E(S^{(1)})-E(S^{(0)}) = α_1,
E(Y^{(0)} | S^{(0)}) = β_0 + β_1 S^{(0)} \quad \mbox{and} \quad E(Y^{(1)} | S^{(1)}) = (β_0 +β_2)+ (β_1+β_3) S^{(1)}.
It can be shown that when these models hold, Δ_S = β_2 + β_3 α_0. Thus, reasonable estimates for Δ_S and R_S using this approach would be \hat{Δ}_S = \hat{β}_2 + \hat{β}_3 \hat{α}_0 and \hat{R}_S = 1-\hat{Δ}_S / \hat{Δ}.
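As a minimal sketch of the model-based estimate, the two linear models for E(Y^{(0)} | S^{(0)}) and E(Y^{(1)} | S^{(1)}) can be fitted as a single regression with a treatment-by-surrogate interaction. The simulated data, the variable names, and the use of lm are assumptions made for illustration; this is not the package's internal code.

```r
# Simulated data (illustrative values only).
set.seed(1)
n1 <- 200; n0 <- 200
szero <- rnorm(n0, mean = 2)              # surrogate, control group
sone  <- rnorm(n1, mean = 3)              # surrogate, treated group
yzero <- 1 + 2.0 * szero + rnorm(n0)      # primary outcome, control group
yone  <- 2 + 2.5 * sone  + rnorm(n1)      # primary outcome, treated group

dat <- data.frame(y = c(yone, yzero),
                  s = c(sone, szero),
                  g = rep(c(1, 0), c(n1, n0)))

# E(Y | G, S) = beta0 + beta1 * S + beta2 * G + beta3 * G * S
fit    <- lm(y ~ s * g, data = dat)
beta2  <- coef(fit)[["g"]]
beta3  <- coef(fit)[["s:g"]]
alpha0 <- mean(szero)                     # estimate of E(S^(0))

delta.hat   <- mean(yone) - mean(yzero)   # overall treatment effect
delta.s.hat <- beta2 + beta3 * alpha0     # residual treatment effect
R.s.hat     <- 1 - delta.s.hat / delta.hat
```

Under the simulated model the true values are Δ_S = 1 + 0.5 × 2 = 2 and Δ = 9.5 − 5 = 4.5, so the estimate of R_S should land near 0.56.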
For robust estimation of Δ_S in the single marker setting, we estimate μ_1(s) = E(Y^{(1)}|S^{(1)}=s) nonparametrically using kernel smoothing:
\hat{μ}_1(s) = \frac{∑_{i=1}^{n_1} K_h\left(S_{1i}-s\right) Y_{1i}}{∑_{i=1}^{n_1} K_h\left(S_{1i}-s\right)}
where S_{1i} is the observed S^{(1)} for person i, Y_{1i} is the observed Y^{(1)} for person i, K(\cdot) is a smooth symmetric density function with finite support, K_h(\cdot)=K(\cdot/h)/h, and h is a specified bandwidth. As in most nonparametric functional estimation procedures, the choice of the smoothing parameter h is critical. To eliminate the impact of the bias of the conditional mean function on the resulting estimator, we require the standard undersmoothing assumption of h=O(n_1^{-δ}) with δ \in (1/4,1/3). To obtain an appropriate h, we first use bw.nrd to obtain h_{opt} and then let h = h_{opt}n_1^{-c_0} with c_0 = 0.25. We then estimate Δ_S as
\hat{Δ}_S= ∑_{i=1}^{n_0} \frac{\hat{μ}_1(S_{0i})- Y_{0i}}{n_0}
where S_{0i} is the observed S^{(0)} for person i and Y_{0i} is the observed Y^{(0)} for person i. Lastly, we estimate R_S as \hat{R}_S = 1-\hat{Δ}_S/\hat{Δ}.
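The robust estimator can be sketched in a few lines of base R. The simulated data and the helper name mu1.hat are hypothetical, and the Gaussian kernel (dnorm) is an assumption of this sketch chosen for simplicity; it does not have finite support, so it only approximates the kernel described above.

```r
# Simulated data (illustrative values only).
set.seed(2)
n1 <- 300; n0 <- 300
szero <- rnorm(n0, mean = 2)
sone  <- rnorm(n1, mean = 2.5)
yzero <- 1 + 2.0 * szero + rnorm(n0)
yone  <- 2 + 2.5 * sone  + rnorm(n1)

# Undersmoothed bandwidth: h = h_opt * n1^(-c0) with c0 = 0.25.
h <- bw.nrd(sone) * n1^(-0.25)

# Kernel-smoothed estimate of mu_1(s) = E(Y^(1) | S^(1) = s).
# Assumption: Gaussian kernel; the 1/h factor cancels in the ratio.
mu1.hat <- function(s) {
  w <- dnorm((sone - s) / h)
  sum(w * yone) / sum(w)
}

# Average mu1.hat over the control surrogates and subtract the control outcomes.
delta.s.hat <- mean(vapply(szero, mu1.hat, numeric(1)) - yzero)
delta.hat   <- mean(yone) - mean(yzero)
R.s.hat     <- 1 - delta.s.hat / delta.hat
```

With a compactly supported kernel, control surrogate values far from every treated value would give a 0/0 ratio, which is the situation the extrapolate argument addresses.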
This function also allows for estimation of R_S using Freedman's approach. Let Y denote the primary outcome, S denote the surrogate marker, and G denote the treatment group (0 for control, 1 for treatment). Freedman's approach to calculating the proportion of treatment effect explained by the surrogate marker is to fit the following two regression models:
E(Y|G) = γ_0 + γ_1 I(G=1) \quad \mbox{and} \quad E(Y|G, S) = γ_{0S} + γ_{1S}I(G=1) + γ_{2S} S
and to estimate the proportion of treatment effect explained, denoted by R_S, as 1-\hat{γ}_{1S}/\hat{γ}_1.
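Freedman's approach amounts to two ordinary least squares fits. The sketch below uses simulated data with hypothetical variable names; it illustrates the two regressions and is not taken from the package source.

```r
# Simulated data (illustrative values only).
set.seed(3)
n <- 400
g <- rep(c(1, 0), each = n / 2)        # treatment indicator
s <- rnorm(n, mean = 2 + g)            # surrogate marker
y <- 1 + g + 2 * s + rnorm(n)          # primary outcome

fit.unadj <- lm(y ~ g)                 # E(Y | G)
fit.adj   <- lm(y ~ g + s)             # E(Y | G, S)

gamma1  <- coef(fit.unadj)[["g"]]
gamma1S <- coef(fit.adj)[["g"]]
R.s.freedman <- 1 - gamma1S / gamma1
```

In this simulation the unadjusted effect is 1 + 2 × 1 = 3 and the adjusted effect is 1, so the estimate should be close to 2/3.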
This function also estimates R_S in a multiple marker setting. A flexible model-based approach to estimate Δ_S in the multiple marker setting is to specify models for E(Y|G, S) and E(S_j | G) for each S_j in S = \{S_1,...,S_p\} (where p is the number of surrogate markers). Without loss of generality, consider the case where there are three surrogate markers, S = \{S_1, S_2, S_3\}, and one specifies the following linear models:
E(Y^{(0)} | S^{(0)}) = β_0 + β_1 S_1^{(0)} + β_2 S_2^{(0)} + β_3 S_3^{(0)}
E(Y^{(1)} | S^{(1)}) = (β_0+β_4) + (β_1+β_5) S_1^{(1)} + (β_2+β_6) S_2^{(1)} + (β_3+β_7) S_3^{(1)}
E(S_j^{(0)}) = α_j, ~~~~j=1,2,3.
It can be shown that when these models hold
Δ_{S} = β_4 + β_5α_1 + β_6 α_2 + β_7 α_3.
Thus, reasonable estimates for Δ_{S} and R_{S} here would be easily obtained by replacing the unknown regression coefficients in the models above by their consistent estimators.
For robust estimation of Δ_S in the multiple marker setting, we use a two-stage procedure combining the model-based approach and the nonparametric estimation procedure from the single marker setting. Specifically, we use a working semiparametric model:
E(Y^{(1)}|S^{(1)}=S)=β_0 + β_1 S_1^{(1)} + β_2 S_2^{(1)} + β_3 S_3^{(1)}
and define Q^{(1)} = \hat{β}_0 + \hat{β}_1 S_1^{(1)} + \hat{β}_2 S_2^{(1)} + \hat{β}_3 S_3^{(1)} and Q^{(0)} = \hat{β}_0 + \hat{β}_1 S_1^{(0)} + \hat{β}_2 S_2^{(0)} + \hat{β}_3 S_3^{(0)} to reduce the dimension of S in the first stage. In the second stage, we apply the robust approach used in the single marker setting to Q to estimate its surrogacy.
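The two-stage procedure can be sketched as follows: fit the working linear model in the treated group, form the scores Q^{(1)} and Q^{(0)} with the same fitted coefficients, then kernel-smooth on Q. The simulated data, the Gaussian kernel, and the bandwidth rule carried over from the single marker setting are assumptions of this sketch.

```r
# Simulated data with three surrogate markers (illustrative values only).
set.seed(6)
n1 <- 300; n0 <- 300
S1 <- matrix(rnorm(n1 * 3, mean = 2.5), ncol = 3)   # treated surrogates
S0 <- matrix(rnorm(n0 * 3, mean = 2.0), ncol = 3)   # control surrogates
yone  <- drop(2 + S1 %*% c(1, 0.5, 0.5)) + rnorm(n1)
yzero <- drop(1 + S0 %*% c(1, 0.5, 0.5)) + rnorm(n0)

# Stage 1: working linear model in the treated group, then one-dimensional scores.
beta.hat <- coef(lm(yone ~ S1))
Q1 <- drop(cbind(1, S1) %*% beta.hat)
Q0 <- drop(cbind(1, S0) %*% beta.hat)

# Stage 2: single-marker robust approach applied to Q.
h <- bw.nrd(Q1) * n1^(-0.25)
mu1.hat <- function(q) {
  w <- dnorm((Q1 - q) / h)              # assumption: Gaussian kernel
  sum(w * yone) / sum(w)
}
delta.s.hat <- mean(vapply(Q0, mu1.hat, numeric(1)) - yzero)
delta.hat   <- mean(yone) - mean(yzero)
R.s.hat     <- 1 - delta.s.hat / delta.hat
```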
To use Freedman's approach in the presence of multiple markers, the markers are simply additively entered into the second regression model.
Variance estimation and confidence interval construction are performed using perturbation-resampling. Specifically, let \left\{ V^{(b)} = (V_{11}^{(b)}, ...,V_{1n_1}^{(b)}, V_{01}^{(b)}, ...,V_{0n_0}^{(b)})^T, b=1,...,D \right\} be n \times D independent copies (with n = n_1 + n_0) of a positive random variable V from a known distribution with unit mean and unit variance. Let
\hat{Δ}^{(b)} = \frac{ ∑_{i=1}^{n_1} V_{1i}^{(b)} Y_{1i}}{ ∑_{i=1}^{n_1} V_{1i}^{(b)}} - \frac{ ∑_{i=1}^{n_0} V_{0i}^{(b)} Y_{0i}}{ ∑_{i=1}^{n_0} V_{0i}^{(b)}}.
The variance of \hat{Δ} is obtained as the empirical variance of \{\hat{Δ}^{(b)}, b = 1,...,D\}. In this package, we use weights generated from an Exponential(1) distribution and use D=500. Variance estimates for \hat{Δ}_S and \hat{R}_S are calculated similarly. We construct two versions of the 95% confidence interval for each estimate: one based on a normal approximation using the estimated variance and another taking the 2.5th and 97.5th empirical percentiles of the perturbed quantities. In addition, we use Fieller's method to obtain a third confidence interval for R_S as
\left\{1-r: \frac{(\hat{Δ}_S-r\hat{Δ})^2}{\hat{σ}_{11}-2r\hat{σ}_{12}+r^2\hat{σ}_{22}} ≤ c_{α}\right\},
where \hat{Σ}=(\hat{σ}_{ij})_{1≤ i,j≤ 2} and c_α is the (1-α)th percentile of
\left\{\frac{\{\hat{Δ}^{(b)}_S-(1-\hat{R}_S)\hat{Δ}^{(b)}\}^2}{\hat{σ}_{11}-2(1-\hat{R}_S)\hat{σ}_{12}+(1-\hat{R}_S)^2\hat{σ}_{22}}, b=1, \cdots, D\right\}
where α=0.05.
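The three intervals for R_S can be sketched end to end with Exponential(1) weights and D = 500. Several choices here are assumptions for illustration: the data are simulated, Δ_S is estimated with a weighted version of the single-marker model-based approach, the helper est is hypothetical, and \hat{Σ} is taken to be the empirical covariance of the perturbed (\hat{Δ}_S, \hat{Δ}) pairs. The Fieller set is obtained by solving the quadratic inequality in r.

```r
# Simulated data (illustrative values only).
set.seed(4)
n1 <- 150; n0 <- 150; D <- 500
szero <- rnorm(n0, 2); sone <- rnorm(n1, 3)
yzero <- 1 + 2.0 * szero + rnorm(n0)
yone  <- 2 + 2.5 * sone  + rnorm(n1)
dat <- data.frame(y = c(yone, yzero), s = c(sone, szero),
                  g = rep(c(1, 0), c(n1, n0)))

# Weighted estimates of (Delta, Delta_S); unit weights give the point estimates.
est <- function(v) {
  d <- dat; d$v <- v
  t1 <- d$g == 1
  delta <- weighted.mean(d$y[t1], d$v[t1]) - weighted.mean(d$y[!t1], d$v[!t1])
  b <- coef(lm(y ~ s * g, data = d, weights = v))
  delta.s <- b[["g"]] + b[["s:g"]] * weighted.mean(d$s[!t1], d$v[!t1])
  c(delta, delta.s)
}

hat <- est(rep(1, n1 + n0))
delta.hat <- hat[1]; delta.s.hat <- hat[2]
R.s.hat <- 1 - delta.s.hat / delta.hat

# Perturbation: one column of Exponential(1) weights per replicate,
# treated rows first, matching the ordering of V^(b).
V <- matrix(rexp((n1 + n0) * D, rate = 1), nrow = n1 + n0, ncol = D)
pert <- apply(V, 2, est)
delta.b <- pert[1, ]; delta.s.b <- pert[2, ]
R.s.b <- 1 - delta.s.b / delta.b

R.s.var     <- var(R.s.b)
ci.normal   <- R.s.hat + c(-1, 1) * qnorm(0.975) * sqrt(R.s.var)
ci.quantile <- quantile(R.s.b, c(0.025, 0.975), names = FALSE)

# Fieller: {1 - r : (delta.s.hat - r * delta.hat)^2 / (s11 - 2 r s12 + r^2 s22) <= c.alpha}
Sigma <- cov(cbind(delta.s.b, delta.b))
s11 <- Sigma[1, 1]; s12 <- Sigma[1, 2]; s22 <- Sigma[2, 2]
r.hat <- 1 - R.s.hat
stat.b <- (delta.s.b - r.hat * delta.b)^2 / (s11 - 2 * r.hat * s12 + r.hat^2 * s22)
c.alpha <- quantile(stat.b, 0.95, names = FALSE)

# Quadratic in r: qa * r^2 + qb * r + qc <= 0
qa <- delta.hat^2 - c.alpha * s22
qb <- -2 * (delta.s.hat * delta.hat - c.alpha * s12)
qc <- delta.s.hat^2 - c.alpha * s11
disc <- qb^2 - 4 * qa * qc
ci.fieller <- if (qa > 0 && disc >= 0) {
  sort(1 - (-qb + c(-1, 1) * sqrt(disc)) / (2 * qa))
} else {
  c(NA_real_, NA_real_)   # the set is not a bounded interval in this case
}
```

When the leading coefficient qa is not positive, the treatment effect is not clearly separated from zero and the Fieller set is unbounded, so the sketch returns NA rather than an interval.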
Note that if the observed supports for S are not the same, then \hat{μ}_1(s) for S_{0i} = s outside the support of S_{1i} may return NA (depending on the bandwidth). If extrapolate = TRUE, then the \hat{μ}_1(s) values for these surrogate values are set to the closest non-NA value. If transform = TRUE, then S_{1i} and S_{0i} are transformed such that the new transformed values, S^{tr}_{1i} and S^{tr}_{0i}, are defined as S^{tr}_{gi} = F([S_{gi} - μ]/σ) for g=0,1, where F(\cdot) is the cumulative distribution function for a standard normal random variable, and μ and σ are the sample mean and standard deviation, respectively, of (S_{1i}, S_{0i})^T.
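Both options are simple to illustrate. The transformation below follows the formula directly using pnorm. The helper fill.nearest is hypothetical and reflects one reading of "closest non-NA value", namely the estimate at the nearest surrogate value that has a non-NA estimate; the package's own rule may differ.

```r
# Simulated surrogates (illustrative values only).
set.seed(5)
sone  <- rnorm(100, mean = 3)
szero <- rnorm(100, mean = 2)

# Transformation: standardize with the pooled mean and SD, then apply the normal CDF.
s.all <- c(sone, szero)
mu    <- mean(s.all)
sigma <- sd(s.all)
sone.tr  <- pnorm((sone  - mu) / sigma)
szero.tr <- pnorm((szero - mu) / sigma)

# Extrapolation (hypothetical helper): replace each NA estimate with the
# estimate at the nearest surrogate value whose estimate is not NA.
fill.nearest <- function(s, mu.hat) {
  ok <- !is.na(mu.hat)
  for (i in which(!ok)) {
    nearest <- which(ok)[which.min(abs(s[ok] - s[i]))]
    mu.hat[i] <- mu.hat[nearest]
  }
  mu.hat
}

fill.nearest(s = c(1, 2, 3, 4), mu.hat = c(NA, 5, 6, NA))
# returns 5 5 6 6
```

After the transformation both groups take values in (0, 1), which tends to bring the observed supports closer together.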
A list is returned:
R.s |
the estimate, \hat{R}_S, described above. |
R.s.var |
the variance estimate of \hat{R}_S; if var = TRUE or conf.int = TRUE. |
conf.int.normal.R.s |
a vector of size 2; the 95% confidence interval for \hat{R}_S based on a normal approximation; if conf.int = TRUE. |
conf.int.quantile.R.s |
a vector of size 2; the 95% confidence interval for \hat{R}_S based on sample quantiles of the perturbed values, described above; if conf.int = TRUE. |
conf.int.fieller.R.s |
a vector of size 2; the 95% confidence interval for \hat{R}_S based on Fieller's approach, described above; if conf.int = TRUE. |
For all options other than "freedman", the following are also returned:
delta |
the estimate, \hat{Δ}, described in delta.estimate documentation. |
delta.s |
the estimate, \hat{Δ}_S, described above. |
delta.var |
the variance estimate of \hat{Δ}; if var = TRUE or conf.int = TRUE. |
delta.s.var |
the variance estimate of \hat{Δ}_S; if var = TRUE or conf.int = TRUE. |
conf.int.normal.delta |
a vector of size 2; the 95% confidence interval for \hat{Δ} based on a normal approximation; if conf.int = TRUE. |
conf.int.quantile.delta |
a vector of size 2; the 95% confidence interval for \hat{Δ} based on sample quantiles of the perturbed values, described above; if conf.int = TRUE. |
conf.int.normal.delta.s |
a vector of size 2; the 95% confidence interval for \hat{Δ}_S based on a normal approximation; if conf.int = TRUE. |
conf.int.quantile.delta.s |
a vector of size 2; the 95% confidence interval for \hat{Δ}_S based on sample quantiles of the perturbed values, described above; if conf.int = TRUE. |
If the treatment effect is not significant, the user will receive the following message: "Warning: it looks like the treatment effect is not significant; may be difficult to interpret the proportion of treatment effect explained in this setting". If the treatment effect is negative, the user will receive the following message: "Warning: it looks like you need to switch the treatment groups", as this package assumes throughout that higher values are better. In the single marker case with the robust estimation approach, if the observed support of the surrogate marker for the control group is outside the observed support of the surrogate marker for the treatment group, the user will receive the following message: "Warning: observed supports do not appear equal, may need to consider a transformation or extrapolation".
Layla Parast
Freedman, L. S., Graubard, B. I., & Schatzkin, A. (1992). Statistical validation of intermediate endpoints for chronic diseases. Statistics in Medicine, 11(2), 167-178.
Parast, L., McDermott, M., & Tian, L. (2016). Robust estimation of the proportion of treatment effect explained by surrogate marker information. Statistics in Medicine, 35(10), 1637-1653.
Wang, Y., & Taylor, J. M. (2002). A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics, 58(4), 803-812.
Fieller, E. C. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society. Series B (Methodological), 175-185.
Fieller, E. C. (1940). The biological standardization of insulin. Supplement to the Journal of the Royal Statistical Society, 1-64.
data(d_example)
names(d_example)
R.s.estimate(yone = d_example$y1, yzero = d_example$y0,
             sone = d_example$s1.a, szero = d_example$s0.a,
             number = "single", type = "robust")
R.s.estimate(yone = d_example$y1, yzero = d_example$y0,
             sone = cbind(d_example$s1.a, d_example$s1.b, d_example$s1.c),
             szero = cbind(d_example$s0.a, d_example$s0.b, d_example$s0.c),
             number = "multiple", type = "model")