ccp | R Documentation |
This function fits a categorical model to a categorized continuous variable that is measured with error. Although the methodology does not require a specific model, this package provides linear regression and logistic regression. If the main data set has no replicates, an external data set with replicates is needed. Further, the nuisance parameters (μ_x, σ^2_x, σ^2_u) should be transportable (see Sections 2.2.4 and 2.2.5 of Carroll et al. (2006)).
ccp(y, W_int, W_ext = NULL, C = NULL, Type, print.summary = TRUE, standardize = TRUE)
y |
A vector of the response variable. It can be binary for logistic regression, or continuous for linear regression. |
W_int |
An n \times R matrix of the covariate W (main data set), where W_{ir} = X_i + U_{ir}. R is the number of replicates per observation. X is the true but unobserved value and U is iid measurement error that is independent of X. If |
W_ext |
An N \times K matrix of the covariate W from the external data set, where W_{ik} = X_i + U_{ik}. Note that when |
C |
A vector of 4 cut points, usually representing the quartiles of X. The default is NULL, in which case the cut points are calculated automatically from the nuisance parameters. |
Type |
Model type, either "logistic" or "linear". |
print.summary |
Print a summary of all estimates. The default setting is |
standardize |
Whether standardization should be performed. The default is |
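The replicate structure W_{ir} = X_i + U_{ir} described for W_int and W_ext is what makes the nuisance parameters (μ_x, σ^2_x, σ^2_u) estimable. A minimal sketch of the standard method-of-moments estimates follows. The helper `estimate_nuisance` is hypothetical and is not part of this package; the estimator that ccp uses internally may differ.

```r
# Hypothetical helper (not part of the package): method-of-moments
# estimates of (mu_x, sigma_x^2, sigma_u^2) from an n x R replicate matrix.
estimate_nuisance <- function(W) {
  W <- as.matrix(W)
  R <- ncol(W)
  stopifnot(R >= 2)                 # replicates are required
  row_means <- rowMeans(W)
  # Pooled within-subject variance estimates sigma_u^2
  su2 <- sum((W - row_means)^2) / (nrow(W) * (R - 1))
  # Var(row mean) = sigma_x^2 + sigma_u^2 / R, so subtract the error part
  sx2 <- var(row_means) - su2 / R
  c(mu_x = mean(row_means), sigma2_x = sx2, sigma2_u = su2)
}

# Simulated replicates, assuming normal X and U as in the Details section
set.seed(1)
n <- 5000
R <- 2
x <- rnorm(n, -0.30, sqrt(1.01))
W <- x + matrix(rnorm(n * R, 0, sqrt(2.69)), n, R)
est <- estimate_nuisance(W)
print(round(est, 2))
```

With only one column in the main data set, σ^2_u cannot be separated from σ^2_x in this way, which is why an external data set with replicates is then needed.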
Let Y be the 0/1 dependent variable, and X be the continuous predictor subject to measurement error. The true model is \hbox{Pr}(Y = 1|X)=H(β_0+Xβ_1), where H(x)=\exp(x)/\{1+\exp(x)\} is the logistic distribution function. Let W be the observed variable with measurement error, W=X+U, where U \sim N(0,σ^2_u) is the measurement error and X \sim N(μ_x,σ^2_x) is the unobserved true value. The misspecified model and the asymptotic theory for the parameters are derived in the manuscript "Categorizing a Continuous Predictor Subject to Measurement Error". If I(X\in C_j) indicates that X is in category j = 1, ..., 5, the categorical (but incorrect) model is \hbox{Pr}(Y=1|X)=H\{∑_{j=1}^5θ_jI(X\in C_j)\}.
When Y is continuous, both the true model and the misspecified categorical model are linear regressions. The assumptions on X, U and W remain the same as described above.
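To make the categorical model concrete, the following sketch simulates from the true logistic model and fits \hbox{Pr}(Y=1|X)=H\{∑_jθ_jI(X\in C_j)\} using the true X, which is not observable in practice. It only illustrates what θ_1, ..., θ_5 and θ_5-θ_1 represent. It does not reproduce the measurement-error correction that ccp performs. The equal-probability cut points are an assumption made here, and the parameter values are borrowed from the example at the end of this page.

```r
set.seed(2)
n   <- 20000
mux <- -0.30; sx2 <- 1.01          # mean and variance of X
b0  <- -1.33; b1  <- 1.54          # beta_0 and beta_1 of the true model

x <- rnorm(n, mux, sqrt(sx2))
y <- rbinom(n, 1, plogis(b0 + b1 * x))   # plogis is H(.)

# Assumption: 4 cut points that split N(mu_x, sigma_x^2) into 5
# equal-probability categories C_1, ..., C_5
C <- qnorm(c(0.2, 0.4, 0.6, 0.8), mean = mux, sd = sqrt(sx2))
category <- cut(x, breaks = c(-Inf, C, Inf), labels = paste0("C", 1:5))

# No intercept, so each coefficient is one theta_j
fit   <- glm(y ~ 0 + category, family = binomial)
theta <- coef(fit)
print(theta)
print(unname(theta[5] - theta[1]))  # contrast between highest and lowest category
```

For a continuous Y, the analogous sketch would replace `glm(..., family = binomial)` with `lm(y ~ 0 + category)`.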
A list of
theta5-theta1 |
Estimate of θ_5-θ_1, interpreted as log relative risk in logistic regression. |
theta |
Estimates of Θ = (θ_1, ..., θ_5). |
nuisance |
Estimates of nuisance parameters (μ_x, σ_x^2, σ_u^2) as well as parameters in the true model (β_0, β_1). |
se.theta |
Standard errors of Θ. |
se.nuisance |
Standard errors of nuisance parameters and parameters in the true model. |
Betsabe Blas, Tianying Wang, Victor Kipnis, Kevin Dodd and Raymond Carroll, "Categorizing a Continuous Predictor Subject to Measurement Error" (2018+).
## This is an example using simulated EATS data

## Parameter values
mux = -0.30 #true mean of X
su2 = 2.69 #true variance of U
sx2 = 1.01 #true variance of X
lambda = sx2 / (sx2 + su2) #attenuation
b = 1.54 #beta_1
a = -1.33 #beta_0

## Sample size
n = 629
k = 2 # Number of replicates in external data set

## Generate data set W_ij=x_i+u_ij
set.seed(20173)
x = rnorm(n, mux, sqrt(sx2))
u = matrix(rnorm(n * k, 0, sqrt(su2)), n, k)
ww = matrix(rep(x, k), n, k, byrow = FALSE) + u # Matrix of observed W with replicates

## Generate values of the variable y
fHm <- function(x){1 / (1 + exp(-(a + b * x)))}
pr = fHm(x)
y = vector()
for(i in 1:n){y[i] = rbinom(1, 1, pr[i])}

## Apply ccp for logistic model
ccp(y = y, W_int = ww, Type = "logistic")