estimator.main: Estimator implementation
In tpgarcia/stride: STRIDE: Robust powerful mixture models


Main function for estimating distribution functions for mixture data in which the population identifiers are unknown but the probability of belonging to each population is known. The distribution functions are evaluated at the time points tval and adjust for dynamic landmark prediction, one discrete covariate (zz), and one continuous covariate (ww).

estimator.main(
  data,
  n,
  p,
  m,
  r,
  qvs,
  tval,
  tval0,
  method.label,
  z.use,
  w.use,
  update.qs,
  run.prediction.accuracy,
  do_cross_validation_AUC_BS
)

`data`	data matrix obtained from `make.data.set`
`n`	sample size, must be at least 1.
`p`	number of populations, must be at least 2.
`m`	number of different mixture proportions, must be at least 2.
`r`	numeric vector including the number of individuals in each mixture proportion group.
`qvs`	a numeric matrix of size `p` by `m` containing all possible mixture proportions (i.e., the probability of belonging to each population k, k=1,...,p).
`tval`	numeric vector of time points at which the distribution function is evaluated, all values must be non-negative.
`tval0`	numeric vector of time points representing the landmark times. All values must be non-negative and smaller than the maximum of `tval`.
`method.label`	character vector of methods implemented. This is the result from `get_method_label()`.
`z.use`	numeric vector of values of the discrete covariate Z at which to evaluate the estimated distribution function. The values of `z.use` must be in the range of the observed `zz`.
`w.use`	numeric vector of values of the continuous covariate W at which to evaluate the estimated distribution function. The values of `w.use` must be in the range of the observed `ww`.
`update.qs`	logical indicator. If TRUE, the mixture proportions `q` will be updated. This is currently not implemented.
`run.prediction.accuracy`	logical indicator. If TRUE, then we compute the prediction accuracy measures, including the area under the receiver operating characteristic curve (AUC) and the Brier Score (BS). Prediction accuracy is only valid in simulation studies where `know.true.groups`=TRUE and `true.group.identifier` is available.
`do_cross_validation_AUC_BS`	logical indicator. If TRUE, then we compute the prediction accuracy measures, including the area under the receiver operating characteristic curve (AUC) and the Brier Score (BS) using cross-validation. Prediction accuracy is only valid in simulation studies where `know.true.groups`=TRUE and `true.group.identifier` is available.
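
The constraints listed for the arguments can be checked before calling the function. The following is a minimal sketch with hypothetical toy values, not output from the package. It assumes that the entries of `r` add up to `n` and that each column of `qvs` sums to 1; both are natural readings of the descriptions above but are not stated explicitly there.

```r
# Hypothetical toy inputs; all values are illustrative only.
p <- 2                      # number of populations (at least 2)
m <- 3                      # number of mixture proportion groups (at least 2)
r <- c(40, 30, 30)          # individuals per mixture proportion group
n <- sum(r)                 # assumption: group sizes add up to the sample size

# p-by-m matrix, filled column by column:
# column j holds the population membership probabilities for group j
qvs <- matrix(c(0.9, 0.1,
                0.5, 0.5,
                0.2, 0.8), nrow = p, ncol = m)

tval  <- seq(0, 10, by = 2) # evaluation times (non-negative)
tval0 <- c(0, 4)            # landmark times, below max(tval)

# Sanity checks mirroring the argument descriptions
stopifnot(
  n >= 1, p >= 2, m >= 2,
  length(r) == m,
  all(dim(qvs) == c(p, m)),
  all(qvs >= 0 & qvs <= 1),
  isTRUE(all.equal(colSums(qvs), rep(1, m))),  # assumption: columns sum to 1
  all(tval >= 0), all(tval0 >= 0),
  all(tval0 < max(tval))
)
```

If any check fails, `stopifnot()` stops with a message naming the failing condition, which is easier to diagnose than an error raised inside the estimator.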

estimator.main returns a list containing

Ft.store: a numeric array. When run.prediction.accuracy is FALSE, the results are the estimated distribution functions for all p populations. The dimension of the array is # of methods by length(tval) by length(tval0) by length(z.use) (when z.use is non-NULL) by length(w.use) (when w.use is non-NULL) by p.

When run.prediction.accuracy is TRUE, the results are the area under the receiver operating characteristic curve (AUC) and Brier Score (BS) for the p populations. The dimension of the array is # of methods by length(tval) by length(tval0) by 2, where the last dimension stores the AUC and BS results.

Results for both the estimated distribution functions and the prediction accuracy measures (AUC, BS) are only valid when t ≥ t_0, so the arrays contain NA for any combination for which t < t_0.
St.store: a numeric array. When run.prediction.accuracy is FALSE, the results are the estimated distribution functions for all m mixture proportion groups. The dimension of the array is # of methods by length(tval) by length(tval0) by length(z.use) (when z.use is non-NULL) by length(w.use) (when w.use is non-NULL) by m.

When run.prediction.accuracy is TRUE, the results are the area under the receiver operating characteristic curve (AUC) and Brier Score (BS) for the m mixture proportion groups. The dimension of the array is # of methods by length(tval) by length(tval0) by 2, where the last dimension stores the AUC and BS results.

Results for both the estimated distribution functions and the prediction accuracy measures (AUC, BS) are only valid when t ≥ t_0, so the arrays contain NA for any combination for which t < t_0.
problem: a numeric indicator of errors in the NPNA estimator. If NULL, no error is reported. Otherwise, there is an error in the computation of the NPNA estimator.
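
To make the array layout and the NA convention concrete, the sketch below builds a mock array of the shape described for Ft.store, without covariate dimensions. The method names, dimension names, and fill value are hypothetical placeholders; the object is not produced by `estimator.main`, and the real arrays may be labelled differently.

```r
# Hypothetical labels and grids, for illustration only.
methods <- c("method_A", "method_B")   # placeholder names, not package output
tval    <- c(0, 2, 4, 6)
tval0   <- c(0, 4)
p       <- 2

# Array laid out as methods x tval x tval0 x p (no covariate dimensions here)
Ft_demo <- array(
  0.5,                                  # dummy fill value
  dim = c(length(methods), length(tval), length(tval0), p),
  dimnames = list(method     = methods,
                  t          = paste0("t", tval),
                  t0         = paste0("t0_", tval0),
                  population = paste0("pop", seq_len(p)))
)

# Mark every (t, t0) pair with t < t0 as NA
invalid <- outer(tval, tval0, "<")      # length(tval) x length(tval0) logical matrix
for (i in seq_along(tval)) {
  for (j in seq_along(tval0)) {
    if (invalid[i, j]) Ft_demo[, i, j, ] <- NA
  }
}

dim(Ft_demo)
Ft_demo["method_A", , "t0_4", "pop1"]   # entries with t < 4 are NA
```

When summarizing real output, passing `na.rm = TRUE` to functions such as `mean()` skips the combinations with t < t_0.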

We estimate nonparametric distribution functions for mixture data where the population identifiers are unknown, and the probability of belonging to a population is known (typically estimated with external data). The distribution functions are evaluated at time points tval. All estimators adjust for dynamic landmark prediction. Dynamic landmark prediction means that the distribution function is computed conditional on the survival time, T, satisfying T > t_0, where t_0 ranges over the time points in tval0. The NPNA, NPNA_avg, and NPNA_wrog estimators adjust for one discrete covariate (zz) and one continuous covariate (ww).
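
The conditioning described above can be written out explicitly. This is a sketch in notation assumed here, not taken from the package documentation: F_k denotes the unconditional distribution function of T in population k, covariates are suppressed, and 1 - F_k(t_0) is assumed to be positive.

```latex
F_k(t \mid t_0)
  = \Pr(T \le t \mid T > t_0,\ \text{population } k)
  = \frac{F_k(t) - F_k(t_0)}{1 - F_k(t_0)},
  \qquad t \ge t_0 .
```

The identity follows from the definition of conditional probability, and it only describes a meaningful prediction for t ≥ t_0, which is consistent with the returned arrays holding NA when t < t_0. For the covariate-adjusted estimators, the same conditioning would additionally involve the values in z.use and w.use.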

Garcia, T.P. and Parast, L. (2020). Dynamic landmark prediction for mixture data. Biostatistics, doi:10.1093/biostatistics/kxz052.

Garcia, T.P., Marder, K. and Wang, Y. (2017). Statistical modeling of Huntington disease onset. In Handbook of Clinical Neurology, vol 144, 3rd Series, editors Andrew Feigin and Karen E. Anderson.

Qin, J., Garcia, T.P., Ma, Y., Tang, M.X., Marder, K., and Wang, Y. (2014). Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. Annals of Applied Statistics, 8(2), 1182-1208.

Wang, Y., Garcia, T.P., and Ma, Y. (2012). Nonparametric estimation for censored mixture data with application to the Cooperative Huntington's Observational Research Trial. Journal of the American Statistical Association, 107, 1324-1338.

tpgarcia/stride documentation built on March 18, 2021, 3:42 p.m.