# Estimate_PLMODS: Partial linear model for ODS data

In Yinghao-Pan/ODS: Statistical Methods for Outcome-Dependent Sampling Designs

## Description

`Estimate_PLMODS` computes parameter estimates for a partial linear model under outcome-dependent sampling (ODS). See Zhou, Qin and Longnecker (2011) for details.

## Usage

```r
Estimate_PLMODS(X, Y, Z, n_f, eta00, q_s, Cpt, mu_Y, sig_Y, degree, nknots, tol, iter)
```

## Arguments

| Argument | Description |
|---|---|
| `X` | n by 1 matrix of the observed exposure variable |
| `Y` | n by 1 matrix of the observed outcome variable |
| `Z` | n by p matrix of the other covariates |
| `n_f` | `n_f = c(n0, n1, n2)`, where `n0` is the SRS sample size, `n1` is the size of the supplemental sample chosen from `(-Infty, mu_Y - a*sig_Y)`, and `n2` is the size of the supplemental sample chosen from `(mu_Y + a*sig_Y, +Infty)` |
| `eta00` | a column matrix, `eta00 = (theta^T pi^T v^T sig0_sq)^T`, where `theta = (alpha^T, gamma^T)^T`. See Zhou, Qin and Longnecker (2011) for details of this notation |
| `q_s` | smoothing parameter |
| `Cpt` | cut point `a` |
| `mu_Y` | mean of Y in the population |
| `sig_Y` | standard deviation of Y in the population |
| `degree` | degree of the truncated power spline basis; the default value is 2 |
| `nknots` | number of knots of the truncated power spline basis; the default value is 10 |
| `tol` | convergence criterion; the default value is 1e-6 |
| `iter` | maximum number of iterations; the default value is 30 |

## Details

We assume that, in the population, the primary outcome variable Y follows the partial linear model:

E(Y|X,Z)=g(X)+Z^{T}*gamma

where X is the expensive exposure and Z is the vector of other covariates. In an ODS design, a simple random sample (SRS) is taken from the full cohort, then two supplemental samples are taken from the two tails of Y, i.e. (-Infty, mu_Y - a*sig_Y) and (mu_Y + a*sig_Y, +Infty). Because ODS data are collected through a biased sampling scheme, a naive regression analysis yields biased estimates of the population parameters. Zhou, Qin and Longnecker (2011) describe a semiparametric empirical likelihood estimator for the parameters of the partial linear model.
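
The sampling scheme described above can be sketched in base R. This is only an illustration on simulated data: the cohort, the form of g, the coefficient values and the choice to draw the supplemental samples from subjects not already in the SRS are all assumptions made for the example, not details taken from the package or the paper.

```r
set.seed(1)

# Hypothetical full cohort (simulated; not the package's ods_data)
N <- 5000
X <- runif(N)                                # expensive exposure
Z <- rnorm(N)                                # other covariate
Y <- sin(2 * pi * X) + 0.5 * Z + rnorm(N)    # assumed g(X) = sin(2*pi*X)

mu_Y  <- mean(Y)
sig_Y <- sd(Y)
a     <- 1                                   # cut point, as in Cpt = 1
n0 <- 200; n1 <- 100; n2 <- 100
n_f <- c(n0, n1, n2)

# Step 1: simple random sample from the full cohort
srs <- sample(seq_len(N), n0)

# Step 2: supplemental samples from the two tails of Y
# (assumption: drawn from subjects not already in the SRS)
rest  <- setdiff(seq_len(N), srs)
lower <- rest[Y[rest] < mu_Y - a * sig_Y]
upper <- rest[Y[rest] > mu_Y + a * sig_Y]
low_s <- lower[sample.int(length(lower), n1)]
upp_s <- upper[sample.int(length(upper), n2)]

# Indices of the combined ODS sample, ordered SRS, lower tail, upper tail
ods_idx <- c(srs, low_s, upp_s)
```

Comparing `sd(Y[ods_idx])` with `sd(Y[srs])` shows how the tail samples distort the outcome distribution, which is why a naive fit to the pooled ODS sample should not be expected to recover the population parameters.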

## Value

Parameter estimates and standard errors for the partial linear model:

E(Y|X,Z)=g(X)+Z^{T}*gamma

where the unknown smooth function g is approximated by a spline function with fixed knots. The results contain the following components:

| Component | Description |
|---|---|
| `alpha` | spline coefficients |
| `gam` | other linear regression coefficients |
| `std_gam` | standard error of `gam` |
| `cov_mtxa` | covariance matrix of `alpha` |
| `step` | number of iterations required to achieve convergence |
| `pi0` | estimate of pi |
| `v0` | estimate of vtheta (the `v` component of `eta00`) |
| `sig0_sq0` | estimated error variance |
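
To make the spline approximation of g concrete, here is a minimal base R sketch of a truncated power basis with fixed knots. The helper `trunc_power_basis` is hypothetical and written only for this illustration; it is not the package's `Bfct`, and whether `Bfct` includes the intercept column or orders its columns this way is an assumption.

```r
# Hypothetical helper (not the package's Bfct): truncated power basis.
# Columns: 1, x, ..., x^degree, then (x - knot)_+^degree for each knot.
trunc_power_basis <- function(x, degree, knots) {
  poly_part <- outer(x, 0:degree, `^`)
  knot_part <- outer(x, knots, function(xx, k) pmax(xx - k, 0)^degree)
  cbind(poly_part, knot_part)
}

x     <- seq(0, 1, length.out = 5)
knots <- c(0.25, 0.75)
B     <- trunc_power_basis(x, degree = 2, knots = knots)

# With a coefficient vector alpha of length ncol(B),
# g(x) is approximated by the linear combination B %*% alpha.
alpha <- rep(1, ncol(B))   # arbitrary values, for illustration only
g_hat <- B %*% alpha
```

Under this construction the basis has `(degree + 1) + length(knots)` columns, so the number of spline coefficients grows with `nknots` while the smoothing parameter `q_s` controls how strongly the knot coefficients are penalized.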

## Examples

```r
library(ODS)
# take the example data from the ODS package
# please see the documentation for details about the data set ods_data

nknots = 10
degree = 2

# get the initial value of the parameters from standard linear regression based on SRS data
dataSRS = ods_data[1:200,]
YS = dataSRS[,1]
XS = dataSRS[,2]
ZS = dataSRS[,3:5]

knots = quantileknots(XS, nknots, 0)
# the power basis spline function
MS = Bfct(as.matrix(XS), degree, knots)
DS = cbind(MS, ZS)
theta00 = as.numeric(lm(YS ~ DS -1)$coefficients)
sig0_sq00 = var(YS - DS %*% theta00)
pi00 = c(0.15, 0.15)
v00 = c(0, 0)
eta00 = matrix(c(theta00, pi00, v00, sig0_sq00), ncol=1)
mu_Y = mean(YS)
sig_Y = sd(YS)

Y = matrix(ods_data[,1])
X = matrix(ods_data[,2])
Z = matrix(ods_data[,3:5], nrow=400)

# In this ODS data, the supplemental samples are taken from (-Infty, mu_Y-a*sig_Y)
# and (mu_Y+a*sig_Y, +Infty), where a=1
n_f = c(200, 100, 100)
Cpt = 1

# GCV selection to find the optimal smoothing parameter
q_s1 = logspace(-6, 7, 10)
gcv1 = rep(0, 10)

for (j in 1:10) {
  result = Estimate_PLMODS(X,Y,Z,n_f,eta00,q_s1[j],Cpt,mu_Y,sig_Y)
  etajj = matrix(c(result$alpha, result$gam, result$pi0, result$v0, result$sig0_sq0), ncol=1)
  gcv1[j] = gcv_ODS(X,Y,Z,n_f,etajj,q_s1[j],Cpt,mu_Y,sig_Y)
}

b = which(gcv1 == min(gcv1))
q_s = q_s1[b]
q_s

# Estimation of the partial linear model in the setting of outcome-dependent sampling
result = Estimate_PLMODS(X, Y, Z, n_f, eta00, q_s, Cpt, mu_Y, sig_Y)
result
```

Yinghao-Pan/ODS documentation built on Nov. 28, 2018, 6:14 p.m.