Outcome-dependent sampling (ODS) schemes are cost-effective ways to enhance study efficiency. In an ODS design, the exposure and covariates are observed with a probability that depends on the outcome variable. Popular ODS designs include the case-control design for a binary outcome, the case-cohort design for a time-to-event outcome, and the continuous-outcome ODS design (Zhou et al., 2002). Because ODS data are not a simple random sample of the population, standard analyses such as ordinary linear regression lead to biased estimates of the population parameters. This package implements four statistical methods for ODS designs:

1. An empirical likelihood method for analyzing the primary continuous outcome with respect to exposure variables in the continuous ODS design (Zhou et al., 2002).
2. A partial linear model for analyzing the primary outcome in the continuous ODS design (Zhou, Qin and Longnecker, 2011).
3. A method for analyzing a secondary outcome in the continuous ODS design (Pan et al., 2018).
4. An estimated likelihood method for analyzing a secondary outcome in case-cohort data (Pan et al., 2017).

The methods are described in the following references:

Zhou H, Weaver M, Qin J, Longnecker M, Wang M. (2002). A semiparametric empirical likelihood method for data from an outcome‐dependent sampling scheme with a continuous outcome. *Biometrics*, 58(2):413-421.

Zhou H, Qin G, Longnecker M. (2011). A partial linear model in the outcome‐dependent sampling setting to evaluate the effect of prenatal PCB exposure on cognitive function in children. *Biometrics*, 67(3):876-885.

Pan Y, Cai J, Kim S, Zhou H. (2017). Regression analysis for secondary response variable in a case‐cohort study. *Biometrics*.

Pan Y, Cai J, Longnecker M, Zhou H. (2018). Secondary outcome analysis for data from an outcome‐dependent sampling design. *Statistics in Medicine*, 37(15):2321-2337.

We assume that in the population, the primary outcome variable $Y$ follows the linear model:
$$
Y = \beta_{0} + \beta_{1}X + \epsilon
$$
where $X$ denotes the covariates and $\epsilon\sim N(0, \sigma^2)$. In the continuous ODS design, a simple random sample is first drawn from the full cohort, and then two supplemental samples are drawn from the tails of the $Y$ distribution, i.e., from $(-\infty, \mu_{Y} - a\sigma_{Y})$ and $(\mu_{Y} + a\sigma_{Y}, +\infty)$, where $\mu_Y$ and $\sigma_Y$ are the mean and standard deviation of $Y$ and $a$ is a constant fixed by the design. Because ODS data are not a simple random sample of the population, a naive regression analysis yields invalid estimates of the population parameters. Zhou et al. (2002) developed a maximum semiparametric empirical likelihood estimator (MSELE) for inference on the parameters of this linear model.

Function **odsmle** computes the parameter estimates, and function **se.spmle** calculates the standard errors of the MSELE.
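To make the continuous ODS sampling scheme concrete, the sketch below simulates a cohort from the linear model and draws an ODS sample: a simple random sample plus supplemental samples from both tails of $Y$. It is a minimal illustration under stated assumptions, not part of the package: the helper `draw_ods_sample`, the sample sizes, the cutpoint $a = 1$, and the parameter values are all hypothetical, and the cohort mean and standard deviation stand in for $\mu_Y$ and $\sigma_Y$. It does not call **odsmle**; it only contrasts naive OLS on the pooled ODS sample with OLS on the SRS component.

```r
# Hypothetical helper (not part of the package): draw a continuous ODS sample.
# y  : outcome values for the full cohort
# n0 : size of the simple random sample (SRS)
# n1 : size of the supplemental sample from the lower tail
# n3 : size of the supplemental sample from the upper tail
# a  : cutpoint multiplier; cohort mean and SD stand in for mu_Y and sigma_Y
draw_ods_sample <- function(y, n0, n1, n3, a = 1) {
  N <- length(y)
  srs <- sample.int(N, n0)
  mu <- mean(y)
  s <- sd(y)
  # Tail units that were not already selected into the SRS
  rest <- setdiff(seq_len(N), srs)
  lower_pool <- rest[y[rest] < mu - a * s]
  upper_pool <- rest[y[rest] > mu + a * s]
  # Index into the pools with sample.int() to avoid sample()'s length-1 pitfall
  lower <- lower_pool[sample.int(length(lower_pool), n1)]
  upper <- upper_pool[sample.int(length(upper_pool), n3)]
  list(srs = srs, lower = lower, upper = upper,
       index = c(srs, lower, upper))
}

set.seed(2002)
N <- 5000
x <- rnorm(N)
# Illustrative population values: beta0 = 1, beta1 = 0.5, sigma = 1
y <- 1 + 0.5 * x + rnorm(N)

ods <- draw_ods_sample(y, n0 = 300, n1 = 100, n3 = 100, a = 1)
idx <- ods$index

# Naive OLS treats the pooled ODS sample as if it were an SRS
naive <- coef(lm(y ~ x, data = data.frame(y = y[idx], x = x[idx])))
# OLS on the SRS component alone is valid but ignores the supplemental samples
srs_only <- coef(lm(y ~ x, data = data.frame(y = y[ods$srs], x = x[ods$srs])))
rbind(naive = naive, srs_only = srs_only)
```

Neither fit uses all of the ODS data in a valid way: the naive fit ignores the outcome-dependent selection, and the SRS-only fit discards the supplemental samples. Combining both sources correctly is what the MSELE is designed for.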

We assume that in the population, the primary outcome variable $Y$ follows the partial linear model:
$$
E(Y|X,Z)=g(X)+Z^{T}\gamma
$$
where $X$ is the expensive exposure, $Z$ denotes the other covariates, and $g(\cdot)$ is an unknown smooth function. Zhou, Qin and Longnecker (2011) considered a penalized spline method to estimate the nonparametric function $g(\cdot)$ and the regression coefficients $\gamma$ under the ODS sampling scheme.

Function **Estimate_PLMODS** computes the parameter estimates and standard errors in the partial linear model. Function **gcv_ODS** calculates the generalized cross-validation (GCV) criterion for selecting the smoothing parameter. Details are given in Zhou, Qin and Longnecker (2011).
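The general idea of a penalized spline fit with GCV-based smoothing-parameter selection can be sketched as follows. This is a generic illustration on a simple random sample and does not account for the ODS design, so it is not the estimator of Zhou, Qin and Longnecker (2011) and does not reproduce **Estimate_PLMODS** or **gcv_ODS**. The truncated-linear basis, the 20 quantile-based knots, the $\lambda$ grid, and the helper names `spline_basis` and `fit_pspline_plm` are assumptions made for the example. The GCV score used is $n\,\mathrm{RSS}/(n - \mathrm{tr}(H))^2$, where $H$ is the smoother matrix.

```r
# Hypothetical helper: truncated-linear spline basis, column k is (x - knot_k)_+
spline_basis <- function(x, knots) {
  outer(x, knots, function(xi, k) pmax(xi - k, 0))
}

# Hypothetical helper: penalized spline fit of y = g(x) + z * gamma + error.
# g is represented by an intercept, a linear term, and truncated-linear terms
# whose coefficients are shrunk by a ridge penalty lambda.
fit_pspline_plm <- function(y, x, z, lambda, knots) {
  C <- cbind(1, x, z, spline_basis(x, knots))
  p_fixed <- 2 + NCOL(z)                        # unpenalized columns
  D <- diag(c(rep(0, p_fixed), rep(1, length(knots))))
  A <- crossprod(C) + lambda * D
  theta <- drop(solve(A, crossprod(C, y)))
  H <- C %*% solve(A, t(C))                     # smoother (hat) matrix
  n <- length(y)
  rss <- sum((y - drop(C %*% theta))^2)
  df <- sum(diag(H))                            # effective degrees of freedom
  list(gamma = theta[3:p_fixed], df = df,
       gcv = n * rss / (n - df)^2)
}

set.seed(2011)
n <- 400
x <- runif(n)
z <- rnorm(n)
y <- sin(2 * pi * x) + 0.5 * z + rnorm(n, sd = 0.3)   # illustrative: gamma = 0.5
knots <- unname(quantile(x, probs = seq(0.05, 0.95, length.out = 20)))

# Evaluate GCV on a grid of smoothing parameters and keep the minimizer
lambdas <- 10^seq(-3, 3, length.out = 25)
gcv <- sapply(lambdas, function(l) fit_pspline_plm(y, x, z, l, knots)$gcv)
best <- lambdas[which.min(gcv)]
fit <- fit_pspline_plm(y, x, z, best, knots)
fit$gamma
```

Small $\lambda$ lets $g(\cdot)$ follow the data closely (large effective degrees of freedom), while large $\lambda$ shrinks it toward a straight line; GCV trades residual fit against that flexibility.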

We assume that in the population, the primary outcome $Y_1$ and the secondary outcome $Y_2$ satisfy the following conditional mean models:
$$
E(Y_1|X,Z)=\beta_0+\beta_1X+\beta_2Z
$$
$$
E(Y_2|X,Z)=\gamma_0+\gamma_1X+\gamma_2Z
$$
Pan et al. (2018) proposed an augmented inverse probability weighted (AIPW) estimating equation for analyzing the secondary outcome (parameters $\gamma_0, \gamma_1, \gamma_2$) when the data are obtained from the continuous ODS design. Function **secondary_ODS** computes the parameter estimates and standard errors for $(\beta, \gamma)$.
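Why a naive analysis of the secondary outcome can fail, and how weighting by the inverse selection probability helps, can be sketched with a simulation. This uses plain inverse probability weighting (IPW) with known inclusion probabilities, a simpler technique than the augmented estimator of Pan et al. (2018), and it does not call **secondary_ODS**. The correlation between the two error terms, the parameter values, and the Poisson-sampling approximation of the ODS design (each unit included independently with a probability that depends on $Y_1$) are assumptions made for the example.

```r
set.seed(2018)
N <- 20000
x <- rnorm(N)
z <- rbinom(N, 1, 0.5)
# Correlated errors make Y2 depend on Y1 given (X, Z)
e1 <- rnorm(N)
e2 <- 0.6 * e1 + sqrt(1 - 0.6^2) * rnorm(N)
y1 <- 1 + 0.5 * x + 0.3 * z + e1   # primary outcome (illustrative beta)
y2 <- 2 + 0.4 * x - 0.2 * z + e2   # secondary outcome, gamma = (2, 0.4, -0.2)

# Assumed Poisson-sampling approximation of the ODS design: units in the tails
# of Y1 are included with probability 0.10, all others with probability 0.02
p_incl <- ifelse(abs(y1 - mean(y1)) > sd(y1), 0.10, 0.02)
selected <- runif(N) < p_incl

dat <- data.frame(y2 = y2, x = x, z = z, w = 1 / p_incl)[selected, ]

# Unweighted fit ignores that selection depends on Y1, which is correlated with Y2
naive <- coef(lm(y2 ~ x + z, data = dat))
# IPW fit: each sampled unit represents 1 / p_incl units of the cohort
ipw <- coef(lm(y2 ~ x + z, data = dat, weights = w))
rbind(naive = naive, ipw = ipw)
```

IPW only uses the selected units and can be inefficient when the weights vary a lot; the augmentation term in an AIPW estimating equation is aimed at recovering some of that lost efficiency.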

When the primary outcome is a survival time, the case-cohort design is commonly used to enhance study efficiency. We assume that the primary outcome (survival time) follows the Cox proportional hazards model:
$$
\lambda(t|X,Y_2,Z)=\lambda_0(t)\exp(\gamma_1X+\gamma_2Y_2+\gamma_3Z)
$$
where $Y_2$ is a secondary outcome that satisfies the following linear model:
$$
Y_2 = \beta_{0} + \beta_{1}X + \beta_2Z + \epsilon
$$
where $\epsilon\sim N(0, \sigma^2)$. Pan et al. (2017) proposed a nonparametric estimated likelihood approach for analyzing the secondary outcome $Y_2$ when the data are obtained from a case-cohort study. Function **secondary_casecohort** computes the parameter estimates and standard errors for $(\beta, \gamma)$.
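The structure of case-cohort data under these two models can be sketched by simulation: a random subcohort is drawn from the full cohort, and all cases (observed failures) are added. The sketch below is illustrative only and does not implement the estimated likelihood approach of Pan et al. (2017) or call **secondary_casecohort**. The constant baseline hazard, uniform censoring, subcohort size of 300, and all parameter values are assumptions made for the example.

```r
set.seed(2017)
N <- 3000
x <- rnorm(N)
z <- rbinom(N, 1, 0.4)
# Secondary outcome: beta = (0.5, 0.3, -0.2), sigma = 1 (illustrative values)
y2 <- 0.5 + 0.3 * x - 0.2 * z + rnorm(N)
# Cox model with constant baseline hazard 0.05 and gamma = (0.4, 0.5, 0.2)
eta <- 0.4 * x + 0.5 * y2 + 0.2 * z
t_event <- rexp(N, rate = 0.05 * exp(eta))
t_cens <- runif(N, 0, 10)
time <- pmin(t_event, t_cens)
delta <- as.integer(t_event <= t_cens)   # 1 = observed failure (case)

# Case-cohort sample: a random subcohort plus all cases outside it
in_sub <- seq_len(N) %in% sample.int(N, 300)
in_cc <- in_sub | delta == 1
cc <- data.frame(time, delta, x, z, y2, in_sub)[in_cc, ]

# The subcohort is a simple random sample, so OLS on it is valid but uses few units
fit_sub <- coef(lm(y2 ~ x + z, data = cc, subset = in_sub))
# Pooling cases with the subcohort over-represents high-Y2 subjects,
# because Y2 raises the hazard (gamma2 > 0 here)
fit_pooled <- coef(lm(y2 ~ x + z, data = cc))
rbind(subcohort = fit_sub, pooled = fit_pooled)
```

The subcohort-only fit is a valid but wasteful baseline, and the pooled fit is distorted by case over-sampling; an approach that uses the cases together with the Cox model linking $Y_2$ to failure time is needed to use all of the data.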

```r
install.packages("devtools")
devtools::install_github("Yinghao-Pan/ODS")
```
