calreg: Regression On Unobserved Exposure Using Calibration.
In bozenne/butils: Various useful functions

calreg

R Documentation

Description

Perform a linear regression on an unobserved exposure (X) using a proxy (Z) whose relationship with the exposure has been studied using an external dataset.

Usage

calreg(
  formula,
  data,
  fitter = "lm",
  calibration,
  method = "delta",
  n.impute = 50
)

Arguments

`formula`	formula for the linear model.
`data`	[data.frame] dataset used to fit the linear model relating the outcome (Y) to the (unobserved) exposure (X).
`calibration`	an `lm` or `nls` object relating a proxy (Z) to the unobserved exposure (X).
`method`	[character] Can be `"delta"` or `"MI"` to use, respectively, a delta method or multiple imputation.
`n.impute`	[integer, >0] Number of imputed datasets to use. Only relevant when `method="MI"`.

Details

Consider a first sample \((X_i,Z_i)_{i \in \{1,\ldots,m\}}\) that is used to estimate \(\alpha\) in:

\[X = f(\alpha,Z) + \varepsilon_{\alpha}\]

This is the model to give to the argument calibration.

The aim is to use a second sample \((Y_j,Z_j)_{j \in \{1,\ldots,n\}}\) to estimate \(\beta_1\) in:

\[Y = \beta_0 + \beta_1 X + \varepsilon_{\beta}\]

The formula of this model should be given to the argument formula and the dataset to the argument data.

The exposure \(X\) is not observed in the second sample; it is computed from the proxy in one of two ways:

based on the conditional expectation of the exposure given the proxy from the first model (method="delta").
based on repeated sampling of the coefficients of the first model (method="MI"). For each draw, an exposure is computed and a linear model is fitted using this exposure. The results are then pooled using mice::pool.
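
The two strategies above can be sketched in a few lines of R. This is only an illustration on simulated data, not the code used by calreg: the variable names (d1, d2, calib, alpha.draws) and the simulation settings are hypothetical, MASS::mvrnorm is assumed to be available (MASS ships with standard R installations), and the pooling is written out by hand with Rubin's rules instead of calling mice::pool.

```r
## Sketch only: illustrates the two strategies, not the internals of calreg.
set.seed(1)
m <- 200
n <- 200

## first sample (X, Z): used to fit the calibration model
d1 <- data.frame(Z = runif(m, 0, 2))
d1$X <- 1 + 0.5 * d1$Z + rnorm(m, sd = 0.3)

## second sample (Y, Z): X drives Y but is treated as unobserved
Z2 <- runif(n, 0, 2)
X2 <- 1 + 0.5 * Z2 + rnorm(n, sd = 0.3)
d2 <- data.frame(Z = Z2, Y = 2 + 1.5 * X2 + rnorm(n))

calib <- lm(X ~ Z, data = d1)

## (a) plug-in: replace X by its conditional expectation given Z
d2$X <- predict(calib, newdata = d2)
fit.plugin <- lm(Y ~ X, data = d2)

## (b) repeated sampling of the calibration coefficients
n.impute <- 50
alpha.draws <- MASS::mvrnorm(n.impute, mu = coef(calib), Sigma = vcov(calib))
est <- se2 <- numeric(n.impute)
for (k in seq_len(n.impute)) {
  ## exposure implied by the k-th draw of the calibration coefficients
  d2$X <- alpha.draws[k, 1] + alpha.draws[k, 2] * d2$Z
  fit.k <- lm(Y ~ X, data = d2)
  est[k] <- coef(fit.k)["X"]
  se2[k] <- vcov(fit.k)["X", "X"]
}

## Rubin's rules by hand: within-imputation + inflated between-imputation variance
beta1.mi <- mean(est)
var.mi <- mean(se2) + (1 + 1 / n.impute) * var(est)
```

In this sketch, var.mi exceeds the average within-imputation variance mean(se2) by the between-imputation term, which is how the uncertainty about the calibration coefficients enters the pooled standard error.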

When using the delta method, the uncertainty is decomposed into two parts:

one related to the finite number of observations in the second sample.
one related to the estimation of the parameters in the calibration model, to account for the fact that \(X\) is estimated and not observed.
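
One common way to write such a decomposition is the first-order delta-method approximation below. It is a sketch under the assumption that the two samples are independent; the symbols \(\Sigma_{\beta}\), \(\Sigma_{\alpha}\) and \(J\) are introduced here for illustration and do not appear in the source.

```latex
\[
\widehat{\mathrm{Var}}\big(\hat{\beta}\big) \approx
\underbrace{\Sigma_{\beta}}_{\text{second sample, } \hat{\alpha} \text{ held fixed}}
+ \underbrace{J \, \Sigma_{\alpha} \, J^{\top}}_{\text{calibration}},
\qquad
J = \left. \frac{\partial \hat{\beta}(\alpha)}{\partial \alpha} \right|_{\alpha = \hat{\alpha}}
\]
```

Here \(\Sigma_{\beta}\) would be the usual variance-covariance matrix of the regression fitted on the predicted exposure, \(\Sigma_{\alpha}\) the variance-covariance matrix of the calibration coefficients, and \(\hat{\beta}(\alpha)\) the regression coefficients viewed as a function of the calibration parameters.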

Value

A data.frame containing the estimates, standard errors, confidence intervals and p-values for each regression coefficient. The output has an attribute "regression" containing the fitted linear model (ignoring the uncertainty related to the calibration) and an attribute "var.add" containing the additional variance-covariance matrix due to the calibration.
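
To give an idea of what an additional variance-covariance matrix due to the calibration can look like, the following base R sketch propagates the uncertainty of a linear calibration model through the regression using a finite-difference Jacobian. It is an illustration under stated assumptions, not the computation performed by calreg (the source does not say whether analytic or numerical derivatives are used); all names (beta.of.alpha, J, var.naive, var.extra, var.total) and the simulated data are hypothetical.

```r
## Sketch only: a numerical delta-method term, not the code used by calreg.
set.seed(2)
m <- 150
n <- 150
d1 <- data.frame(Z = runif(m, 0, 2))
d1$X <- 1 + 0.5 * d1$Z + rnorm(m, sd = 0.3)
d2 <- data.frame(Z = runif(n, 0, 2))
d2$Y <- 2 + 1.5 * (1 + 0.5 * d2$Z) + rnorm(n)

calib <- lm(X ~ Z, data = d1)

## regression coefficients as a function of the calibration parameters
beta.of.alpha <- function(alpha) {
  d2$X <- alpha[1] + alpha[2] * d2$Z  ## modifies a local copy of d2
  coef(lm(Y ~ X, data = d2))
}

## forward finite-difference Jacobian: J[i, k] = d beta_i / d alpha_k
alpha.hat <- coef(calib)
beta.hat <- beta.of.alpha(alpha.hat)
h <- 1e-6
J <- sapply(seq_along(alpha.hat), function(k) {
  alpha.h <- alpha.hat
  alpha.h[k] <- alpha.h[k] + h
  (beta.of.alpha(alpha.h) - beta.hat) / h
})

## variance ignoring the calibration, extra term, and their sum
d2$X <- predict(calib, newdata = d2)
var.naive <- vcov(lm(Y ~ X, data = d2))
var.extra <- J %*% vcov(calib) %*% t(J)
var.total <- var.naive + var.extra
```

With a linear calibration the slope of Y on the predicted exposure equals the slope of Y on Z divided by the calibration slope, so the entry J[2, 2] should be close to minus the estimated regression slope divided by the calibration slope, which gives a quick sanity check of the numerical derivative.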

Examples

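library(lava)   ## used to simulate the data
library(butils) ## calreg comes from the butils package (bozenne/butils)
n <- 1e2        ## number of observations in each sample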

## linear case
mSim.lin <- lvm(fMRI ~ occ, occ ~ blood)
distribution(mSim.lin, ~blood) <- uniform.lvm(-0.9,2)

set.seed(10)
d1.lin <- sim(mSim.lin, n = n)[,c("occ","blood"),drop=FALSE]
d2.lin <- sim(mSim.lin, n = n)[,c("fMRI","blood"),drop=FALSE]

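## calibration model fitted on the first sample
e1.lin <- lm(occ ~ blood, data = d1.lin)
## regression on the unobserved exposure in the second sample
res.lin <- calreg(fMRI ~ occ, data = d2.lin, calibration = e1.lin)
res.lin
## coefficients of the fitted model, ignoring the calibration uncertainty
summary(attr(res.lin, "regression"))$coefficients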

## non-linear case
mSim.nlin <- lvm(fMRI ~ occ, occ[mu:0.1] ~ 0*blood)
distribution(mSim.nlin, ~blood) <- uniform.lvm(-0.9,3)
constrain(mSim.nlin, mu~blood) <- function(x){2*x/(1+x)}

set.seed(10)
d1.nlin <- sim(mSim.nlin, n = n)[,c("occ","blood"),drop=FALSE]
d2.nlin <- sim(mSim.nlin, n = n)[,c("fMRI","blood"),drop=FALSE]

## gg <- ggplot(d1.nlin, aes(x = blood)) + geom_point(aes(y = occ))

e1.nlin <- nls(occ ~ (occmax * blood)/(EC + blood), data = d1.nlin,
         start = list(occmax = 1, EC = 1))
d1.nlin$fit <- fitted(e1.nlin)

## gg + geom_line(data = d1.nlin, aes(y = fit), color = "red")

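## regression on the unobserved exposure using the non-linear calibration
res.nlin <- calreg(fMRI ~ occ, data = d2.nlin, calibration = e1.nlin)
res.nlin
## coefficients of the fitted model, ignoring the calibration uncertainty
summary(attr(res.nlin, "regression"))$coefficients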

bozenne/butils documentation built on July 3, 2024, 2:34 p.m.