calreg: Regression On Unobserved Exposure Using Calibration.

View source: R/calreg.R

calreg    R Documentation

Regression On Unobserved Exposure Using Calibration.

Description

Perform a linear regression on an unobserved exposure (X) using a proxy (Z) whose relationship with the exposure has been studied using an external dataset.

Usage

calreg(
  formula,
  data,
  fitter = "lm",
  calibration,
  method = "delta",
  n.impute = 50
)

Arguments

formula

[formula] formula for the linear model relating the outcome (Y) to the exposure (X).

data

[data.frame] dataset used to fit the linear model relating the outcome (Y) to the (unobserved) exposure (X).

calibration

an lm or nls object relating a proxy (Z) to the unobserved exposure (X).

method

[character] Can be "delta" or "MI" to use, respectively, a delta method or multiple imputation.

n.impute

[integer, >0] Number of imputed datasets to be used. Only relevant when method="MI".

Details

Consider a first sample (X_i,Z_i)_{i \in \{1,\ldots,m\}} that is used to estimate \alpha in:

X = f(\alpha,Z) + \varepsilon_{\alpha}

This is the model to give to the argument calibration.

The aim is to use a second sample (Y_j,Z_j)_{j \in \{1,\ldots,n\}} to estimate \beta_1 in:

Y = \beta_0 + \beta_1 X + \varepsilon_{\beta}

The formula of this model should be given to the argument formula and the dataset to the argument data.

The exposure \(X\) is not observed in the second sample; it is computed in one of two ways:

  • based on the conditional expectation of the exposure given the proxy from the first model (method="delta").

  • based on multiple sampling of the coefficients from the first model (method="MI"). For each sample, an exposure is computed and a linear model is then estimated based on this exposure. The results are then pooled using mice::pool.
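The two approaches above can be sketched in base R. This is only an illustration under stated assumptions, not the internals of calreg: the simulated data and all object names (d1, d2, e.calib, X.hat, draws, ...) are hypothetical, the calibration is assumed linear, and Rubin's rules are applied by hand where calreg is documented to use mice::pool.

```r
## Illustrative sketch only: data and object names are assumptions.
set.seed(1)
m <- 200  # size of the calibration (first) sample
n <- 200  # size of the second sample

## First sample: exposure X and proxy Z are both observed
d1 <- data.frame(Z = runif(m, -1, 2))
d1$X <- 0.5 + 1.5 * d1$Z + rnorm(m, sd = 0.5)

## Second sample: outcome Y and proxy Z are observed, X is not
d2 <- data.frame(Z = runif(n, -1, 2))
X.true <- 0.5 + 1.5 * d2$Z + rnorm(n, sd = 0.5)
d2$Y <- 1 + 2 * X.true + rnorm(n)

## Calibration model X = f(alpha, Z) + error, here linear in Z
e.calib <- lm(X ~ Z, data = d1)

## (a) Plug-in: replace X by its conditional expectation given Z
d2$X.hat <- predict(e.calib, newdata = d2)
e.plugin <- lm(Y ~ X.hat, data = d2)

## (b) Repeated sampling of the calibration coefficients
n.draw <- 50
alpha.hat <- coef(e.calib)
L <- t(chol(vcov(e.calib)))              # L %*% t(L) equals vcov(e.calib)
design2 <- model.matrix(~ Z, data = d2)  # design matrix of the proxy
draws <- t(sapply(seq_len(n.draw), function(i) {
  ## draw alpha from N(alpha.hat, vcov) and recompute the exposure
  alpha.i <- alpha.hat + as.vector(L %*% rnorm(length(alpha.hat)))
  d.i <- data.frame(Y = d2$Y, X.i = as.vector(design2 %*% alpha.i))
  fit.i <- lm(Y ~ X.i, data = d.i)
  c(estimate = unname(coef(fit.i)["X.i"]),
    variance = vcov(fit.i)["X.i", "X.i"])
}))

## Pool the slope with Rubin's rules (by hand, instead of mice::pool)
pooled.estimate <- mean(draws[, "estimate"])
within <- mean(draws[, "variance"])   # average within-draw variance
between <- var(draws[, "estimate"])   # variance between draws
pooled.se <- sqrt(within + (1 + 1 / n.draw) * between)
```

In this sketch, coef(e.plugin) gives the plug-in point estimate, while pooled.estimate and pooled.se summarise the resampling approach. The plug-in standard error from summary(e.plugin) ignores the calibration uncertainty, which is why it should not be reported as is.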

When using the delta method, the uncertainty is decomposed into two parts:

  • one related to the finite number of observations in the second sample.

  • one related to the estimation of the parameters in the calibration model, to account for the fact that \(X\) is estimated and not observed.
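One common way to write such a decomposition is the first-order (delta method) expansion below. It is given as a sketch under assumptions, not as the exact formula implemented in calreg: the two samples are assumed independent, \hat{\beta}(\alpha) denotes the regression estimate obtained when the exposure is computed as f(\alpha,Z), and the symbols \Sigma_{\beta}, \Sigma_{\alpha} and J are notation introduced here.

```latex
\widehat{\mathrm{Var}}\big(\hat{\beta}\big)
  \approx
  \underbrace{\Sigma_{\beta}(\hat{\alpha})}_{\text{finite second sample}}
  +
  \underbrace{J \, \Sigma_{\alpha} \, J^{\top}}_{\text{calibration uncertainty}},
\qquad
J = \left.\frac{\partial \hat{\beta}(\alpha)}{\partial \alpha}\right|_{\alpha = \hat{\alpha}}
```

Here \Sigma_{\beta}(\hat{\alpha}) is the usual variance-covariance matrix of the linear model fitted with the exposure set to f(\hat{\alpha},Z), and \Sigma_{\alpha} is the variance-covariance matrix of \hat{\alpha} from the calibration model.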

Value

A data.frame containing the estimates, standard errors, confidence intervals and p-values for each regression coefficient. The output has an attribute "regression" containing the fitted linear model (ignoring the uncertainty related to the calibration) and an attribute "var.add" containing the additional variance-covariance matrix due to the calibration.

Examples

library(lava)
n <- 1e2

## linear case
mSim.lin <- lvm(fMRI ~ occ, occ ~ blood)
distribution(mSim.lin, ~blood) <- uniform.lvm(-0.9,2)

set.seed(10)
d1.lin <- sim(mSim.lin, n = n)[,c("occ","blood"),drop=FALSE]
d2.lin <- sim(mSim.lin, n = n)[,c("fMRI","blood"),drop=FALSE]

e1.lin <- lm(occ ~ blood, data = d1.lin)
res.lin <- calreg(fMRI ~ occ, data = d2.lin, calibration = e1.lin)
res.lin
summary(attr(res.lin, "regression"))$coef

## non-linear case
mSim.nlin <- lvm(fMRI ~ occ, occ[mu:0.1] ~ 0*blood)
distribution(mSim.nlin, ~blood) <- uniform.lvm(-0.9,3)
constrain(mSim.nlin, mu~blood) <- function(x){2*x/(1+x)}

set.seed(10)
d1.nlin <- sim(mSim.nlin, n = n)[,c("occ","blood"),drop=FALSE]
d2.nlin <- sim(mSim.nlin, n = n)[,c("fMRI","blood"),drop=FALSE]

## gg <- ggplot(d1.nlin, aes(x = blood)) + geom_point(aes(y = occ))

e1.nlin <- nls(occ ~ (occmax * blood)/(EC + blood), data = d1.nlin,
         start = list(occmax = 1, EC = 1))
d1.nlin$fit <- fitted(e1.nlin)

## gg + geom_line(data = d1.nlin, aes(y = fit), color = "red")

res.nlin <- calreg(fMRI ~ occ, data = d2.nlin, calibration = e1.nlin)
res.nlin
summary(attr(res.nlin, "regression"))$coef


bozenne/butils documentation built on Oct. 14, 2023, 6:19 a.m.