ccp: Fit misspecified model with measurement error

View source: R/ccp.R

Description

This function fits a categorical model to a categorized continuous predictor that is subject to measurement error. Although the methodology does not require a specific model, this package provides linear regression and logistic regression. If the main data set has no replicates, an external data set with replicates is needed. In that case the nuisance parameters (μ_x, σ^2_x, σ^2_u) must be transportable from the external data set to the main data set (see Sections 2.2.4 and 2.2.5 of Carroll et al. (2006)).

Usage

ccp(y, W_int, W_ext = NULL, C = NULL, Type, print.summary = TRUE, standardize = TRUE)

Arguments

y

A vector of the response variable. It can be binary for logistic regression, or continuous for linear regression.

W_int

An n \times R matrix of the observed covariate W from the main data set, where W_{ir} = X_i + U_{ir} and R is the number of replicates per observation. X is the true but unobserved value, and U is iid measurement error that is independent of X. If W_{int} is a vector (i.e., no replicates, with W_i = X_i + U_i), W_{ext} must be supplied.
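Replicates are what make the nuisance parameters identifiable. The sketch below shows one way to estimate them by the method of moments; it is an assumption about how such estimates can be formed, not necessarily what ccp() does internally, and the helper name moment_nuisance is hypothetical. It relies on two facts that follow from W_{ir} = X_i + U_{ir}: the within-subject sample variance estimates σ^2_u, and the variance of the subject means equals σ^2_x + σ^2_u / R.

```r
# Hypothetical helper (not part of the package): method-of-moments estimates
# of (mu_x, sigma2_x, sigma2_u) from an n x R matrix of replicates.
moment_nuisance <- function(W) {
  W <- as.matrix(W)
  R <- ncol(W)
  stopifnot(R >= 2)                    # replicates are needed to separate X from U
  wbar <- rowMeans(W)                  # subject means: Var(wbar) = s2x + s2u / R
  s2u  <- mean(apply(W, 1, var))       # pooled within-subject variance estimates s2u
  mux  <- mean(wbar)                   # E(W) = mu_x
  s2x  <- var(wbar) - s2u / R          # remove the error contribution from Var(wbar)
  c(mu_x = mux, sigma2_x = s2x, sigma2_u = s2u)
}

## Usage on simulated data with the parameter values from the Examples section
set.seed(1)
n <- 5000; R <- 2
x <- rnorm(n, -0.30, sqrt(1.01))
W <- x + matrix(rnorm(n * R, 0, sqrt(2.69)), n, R)   # x is recycled down each column
est <- moment_nuisance(W)
est
```

With a large n the three estimates should land close to the simulated values μ_x = -0.30, σ^2_x = 1.01 and σ^2_u = 2.69.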

W_ext

An N \times K matrix of the covariate W from the external data set, where W_{ik} = X_i + U_{ik} and K is the number of replicates per observation. Note that when W_{int} is a matrix, W_{ext} is ignored and the estimation is based only on W_{int}. The default is NULL.

C

A vector of 4 cut points, usually the quintiles of X, which divide X into the five categories C_1, ..., C_5. The default is NULL, in which case the cut points are calculated automatically from the estimated nuisance parameters.
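Four cut points give five categories, so "usually" here means the quintiles of X. If X \sim N(μ_x, σ^2_x), its quintiles are μ_x + σ_x z_p for p = 0.2, 0.4, 0.6, 0.8, with z_p the standard normal quantile. The sketch below assumes the default C is formed this way from the nuisance estimates; the helper names default_cuts and categorize are hypothetical.

```r
# Hypothetical helper: quintile cut points of X ~ N(mu_x, sigma2_x).
default_cuts <- function(mu_x, sigma2_x) {
  qnorm(c(0.2, 0.4, 0.6, 0.8), mean = mu_x, sd = sqrt(sigma2_x))
}

# Hypothetical helper: assign each value to a category 1 (lowest) .. 5 (highest).
# findInterval() returns 0..4 for sorted cut points, so add 1.
categorize <- function(x, cuts) findInterval(x, cuts) + 1L

cuts <- default_cuts(-0.30, 1.01)
cuts
categorize(c(-3, -0.3, 3), cuts)   # a very low value, the mean, a very high value
```

Because the normal distribution is symmetric, the middle category C_3 contains μ_x, and the second and third cut points are equidistant from it.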

Type

Model type, either "logistic" or "linear".

print.summary

Logical; if TRUE, print a summary of all estimates. The default is TRUE.

standardize

Logical; whether standardization should be performed. The default is TRUE.

Details

Let Y be the binary (0/1) response and X the continuous predictor subject to measurement error. The true model is \hbox{Pr}(Y = 1|X)=H(β_0+Xβ_1), where H(x)=\exp(x)/\{1+\exp(x)\} is the logistic distribution function. Let W=X+U be the observed, error-prone version of X, where U \sim N(0,σ^2_u) is the measurement error, X \sim N(μ_x,σ^2_x) is the unobserved true value, and U is independent of X. The misspecified model and the asymptotic theory for its parameters are derived in the manuscript "Categorizing a Continuous Predictor Subject to Measurement Error". Let I(X\in C_j) indicate that X falls in category j = 1, ..., 5, where the categories are the intervals defined by the cut points C. The categorical (but incorrect) model is \hbox{Pr}(Y=1|X)=H\{∑_{j=1}^5θ_jI(X\in C_j)\}.
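To make the categorical model concrete, the sketch below simulates X without measurement error (so X is treated as observed, which is never the case in practice), generates Y from the true logistic model, and fits \hbox{Pr}(Y=1|X)=H\{∑_{j=1}^5θ_jI(X\in C_j)\} with glm() and no intercept, so that each coefficient is one θ_j. The parameter values are borrowed from the Examples section and the cut points are assumed to be normal quintiles. This only illustrates the model form; it is not the measurement-error correction that ccp() performs.

```r
# Illustration only: fit the categorical (misspecified) logistic model to a
# simulated true X. In real data X is unobserved; ccp() addresses that case.
set.seed(2)
n <- 20000
x <- rnorm(n, -0.30, sqrt(1.01))
y <- rbinom(n, 1, plogis(-1.33 + 1.54 * x))        # true model H(beta_0 + X beta_1)
cuts <- qnorm(c(0.2, 0.4, 0.6, 0.8), -0.30, sqrt(1.01))
cat5 <- factor(findInterval(x, cuts) + 1L, levels = 1:5)
fit <- glm(y ~ 0 + cat5, family = binomial)       # no intercept: one theta_j per category
theta <- coef(fit)
theta
theta[5] - theta[1]                               # contrast between top and bottom category
```

Since the model has one free parameter per category, plogis(θ_j) reproduces the observed proportion of Y = 1 in category j, and the θ_j increase with j because β_1 > 0.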

When Y is continuous, the true model and the misspecified categorical model are the corresponding linear regressions, E(Y|X)=β_0+Xβ_1 and E(Y|X)=∑_{j=1}^5θ_jI(X\in C_j). All other assumptions on X, U and W are as described above.
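In the linear case each θ_j is simply the mean response within category j. The sketch below, again using a simulated true X and assuming unit error variance for Y (both choices are illustrative), fits the categorical linear model by least squares without an intercept and checks that the coefficients equal the category means.

```r
# Illustration only: linear analogue of the categorical model, fit to a
# simulated true X (unobserved in practice).
set.seed(3)
n <- 5000
x <- rnorm(n, -0.30, sqrt(1.01))
y <- -1.33 + 1.54 * x + rnorm(n)                  # assumed true linear model, error sd = 1
cuts <- qnorm(c(0.2, 0.4, 0.6, 0.8), -0.30, sqrt(1.01))
cat5 <- factor(findInterval(x, cuts) + 1L, levels = 1:5)
theta <- coef(lm(y ~ 0 + cat5))                   # one theta_j per category
cat_means <- as.vector(tapply(y, cat5, mean))     # mean of Y within each category
all.equal(unname(theta), cat_means)
```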

Value

A list of

theta5-theta1

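Estimate of θ_5-θ_1. In logistic regression it is the log odds ratio comparing the highest with the lowest category, commonly interpreted as a log relative risk when the outcome is rare; in linear regression it is the difference in mean response between those categories.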
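In the categorical logistic model θ_j is the log odds of Y = 1 in category j, so θ_5-θ_1 is a difference of log odds. The sketch below uses two assumed category probabilities (chosen for illustration, not taken from any data) to show that this contrast is close to, but slightly larger than, the log relative risk when the outcome is rare.

```r
# Hypothetical probabilities of Y = 1 in the lowest and highest categories.
p1 <- 0.02
p5 <- 0.05
log_or <- qlogis(p5) - qlogis(p1)   # theta_5 - theta_1 = logit(p5) - logit(p1)
log_rr <- log(p5 / p1)              # log relative risk
c(log_or = log_or, log_rr = log_rr)
```

As the probabilities grow, the two quantities drift apart, so the relative-risk reading is only an approximation.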

theta

Estimates of Θ = (θ_1, ..., θ_5).

nuisance

Estimates of nuisance parameters (μ_x, σ_x^2, σ_u^2) as well as parameters in the true model (β_0, β_1).

se.theta

Standard errors of Θ.

se.nuisance

Standard errors of nuisance parameters and parameters in the true model.

References

Betsabe Blas, Tianying Wang, Victor Kipnis, Kevin Dodd and Raymond Carroll, "Categorizing a Continuous Predictor Subject to Measurement Error" (2018+).

Examples

## This is an example using simulated EATS data

## Parameter values
mux = -0.30 #true mean of X
su2 = 2.69 #true variance of U
sx2 = 1.01 #true variance of X
lambda = sx2 / (sx2 + su2) #attenuation
b = 1.54 #beta_1
a = -1.33 #beta_0

## Sample size
n = 629
k = 2 # Number of replicates per subject in the main data set
## Generate replicated measurements W_ij = x_i + u_ij
set.seed(20173)
x = rnorm(n, mux, sqrt(sx2))
u = matrix(rnorm(n * k, 0, sqrt(su2)), n, k)
ww = matrix(rep(x, k), n, k, byrow = FALSE) + u # Matrix of observed W with replicates

## Generate the binary response y from the true logistic model
fHm <- function(x){1 / (1 + exp(-(a + b * x)))}
pr = fHm(x)
y = rbinom(n, 1, pr) # One Bernoulli draw per subject with probability pr[i]

## Apply ccp for logistic model
ccp(y = y, W_int = ww, Type = "logistic")



tianyingw/CCP documentation built on Aug. 20, 2022, 1:30 a.m.