misclass: Misclassification of exposure or outcome
In episensr: Basic Sensitivity Analysis of Epidemiological Results

misclass

R Documentation

Misclassification of exposure or outcome

Description

misclass() and probsens() allow to provide adjusted measures of association corrected for misclassification of the exposure or the outcome.

Usage

misclass(
  case,
  exposed,
  type = c("exposure", "exposure_pv", "outcome"),
  bias_parms = NULL,
  alpha = 0.05
)

probsens(
  case,
  exposed,
  type = c("exposure", "exposure_pv", "outcome"),
  reps = 1000,
  seca = list(dist = c("constant", "uniform", "triangular", "trapezoidal", "normal",
    "beta"), parms = NULL),
  seexp = NULL,
  spca = list(dist = c("constant", "uniform", "triangular", "trapezoidal", "normal",
    "beta"), parms = NULL),
  spexp = NULL,
  corr_se = NULL,
  corr_sp = NULL,
  alpha = 0.05
)

Arguments

`case`	Outcome variable. If a variable, this variable is tabulated against.
`exposed`	Exposure variable.
`type`	Choice of misclassification: exposure: bias analysis for exposure misclassification; corrections using sensitivity and specificity: nondifferential and independent errors, exposure_pv: bias analysis for exposure misclassification; corrections using PPV/NPV: nondifferential and independent errors, outcome: bias analysis for outcome misclassification.
`bias_parms`	Vector defining the bias parameters. This vector has 4 elements between 0 and 1, in the following order: Sensitivity of exposure (when `type = "exposure"`) or outcome (when `type = "outcome"`) classification among those with the outcome (when `type = "exposure"`) or exposure (when `type = "outcome"`), Sensitivity of exposure (or outcome) classification among those without the outcome (or exposure), Specificity of exposure (or outcome) classification among those with the outcome (or exposure), and Specificity of exposure (or outcome) classification among those without the outcome (or exposure). If PPV/NPV is chosen in case of exposure misclassification, this vector is the following: Positive predictive value among those with the outcome, Positive predictive value among those without the outcome, Negative predictive value among those with the outcome, Negative predictive value among those without the outcome.
`alpha`	Significance level.
`reps`	Number of replications to run.
`seca`	List defining sensitivity among cases: The sensitivity of exposure classification among those with the outcome (when `type = "exposure"`), or The sensitivity of outcome classification among those with the exposure (when `type = "outcome"`). The first argument provides the probability distribution function (constant, uniform, triangular, trapezoidal, truncated normal, or beta) and the second its parameters as a vector. Lower and upper bounds of the truncated normal have to be between 0 and 1. constant: constant value, uniform: min, max, triangular: lower limit, upper limit, mode, trapezoidal: min, lower mode, upper mode, max, normal: lower bound, upper bound, mean, sd. beta: alpha, beta. If PPV/NPV is chosen in case of exposure misclassification, the same four (4) parameters `seca`, `seexp`, `spca`, `spexp` as for Se/Sp have to be used but with the following meaning, and only for a beta distributions and no correlation between distributions: Positive predictive value among those with the outcome, Positive predictive value among those without the outcome, Negative predictive value among those with the outcome, Negative predictive value among those without the outcome.
`seexp`	List defining sensitivity among controls: The sensitivity of exposure classification among those without the outcome (when `type = "exposure"`), or The sensitivity of outcome classification among those without the exposure (when `type = "outcome"`).
`spca`	List as above for `seca` but for specificity.
`spexp`	List as above for `seexp` but for specificity.
`corr_se`	Correlation between case and non-case sensitivities. If PPV/NPV is chosen in case of exposure misclassification, correlations are set to NULL.
`corr_sp`	Correlation between case and non-case specificities.

Value

A list with elements (for misclass()):

`obs_data`	The analyzed 2 x 2 table from the observed data.
`corr_data`	The expected observed data given the true data assuming misclassification.
`obs_measures`	A table of observed relative risk and odds ratio with confidence intervals.
`adj_measures`	A table of corrected relative risks and odds ratios.
`bias_parms`	Input bias parameters.

A list with elements (for probsens()):

`obs_data`	The analyzed 2 x 2 table from the observed data.
`obs_measures`	A table of observed relative risk and odds ratio with confidence intervals.
`adj_measures`	A table of corrected relative risks and odds ratios.
`sim_df`	Data frame of random parameters and computed values.
`reps`	Number of replications.

Simple bias analysis with `misclass()`

misclass() allows you to run a simple sensitivity analysis for disease or exposure misclassification. Confidence interval for odds ratio adjusted using sensitivity and specificity is computed as in Chu et al. (2006), for exposure misclassification.

For exposure misclassification, bias-adjusted measures are available using sensitivity and specificity, or using predictive values.

Probabilistic sensitivity analysis with `probsens()`

probsens() performs a summary-level probabilistic sensitivity analysis to correct for exposure misclassification or outcome misclassification and random error. Non-differential misclassification is assumed when only the two bias parameters seca and spca are provided. Adding the 2 parameters seexp and spexp (i.e. providing the 4 bias parameters) evaluates a differential misclassification.

For exposure misclassification, bias-adjusted measures are available using sensitivity and specificity, or using predictive values. However, only a beta distribution is available for predictive values.

Correlations between sensitivity (or specificity) of exposure classification among cases and controls can be specified and use the NORmal To Anything (NORTA) transformation (Li & Hammond, 1975).

In case of negative (<=0) adjusted count in the 2-by-2 table following given prior Se/Sp distribution(s), draws are discarded.

Updated calculations, probabilistic bias analysis

episensr 2.0.0 introduced updated calculations of probabilistic bias analyses by (1) using the NORTA transformation to define a correlation between distributions, and (2) sampling true prevalences and then sampling the adjusted cell counts rather than just using the expected cell counts from a simple quantitative bias analysis. This updated version should be preferred but if you need to run an old analysis, you can easily revert to the computation using probsens_legacy() as follows:

library(episensr)
probsens <- probsens_legacy

References

Fox, M.P, MacLehose, R.F., Lash, T.L., 2021 Applying Quantitative Bias Analysis to Epidemiologic Data, pp.141–176, 233–256, 293–308, Springer.

Li, S.T., Hammond, J.L., 1975. Generation of Pseudorandom Numbers with Specified Univariate Distributions and Correlation Coefficients. IEEE Trans Syst Man Cybern 5:557-561.

Chu, H., Zhaojie, W., Cole, S.R., Greenland, S., Sensitivity analysis of misclassification: A graphical and a Bayesian approach, Annals of Epidemiology 2006;16:834-841.

Barros, A. & Hirakata, V.N., 2003. Alternatives for Logistic Regression in Cross-sectional Studies: An Empirical Comparison of Models that Directly Estimate the Prevalence Ratio. BMC Medical Research Methodology 3:21.

McNutt, L-A, Wu, C., Xue, X., Hafner J.P., 2003. Estimating the Relative Risk in Cohort Studies and Clinical Trials of Common Outcomes. American Journal of Epidemiology 157(10):940-943.

Greenland, S. (2004). Model-based Estimation of Relative Risks and Other Epidemiologic Measures in Studies of Common Outcomes and in Case-Control Studies. American Journal of Epidemiology 160(4):301-305.

Zhou, G. (2004). A Modified Poisson Regression Approach to Prospective Studies with Binary Data. American Journal of Epidemiology 159(7):702-706.

Examples

# The data for this example come from:
# Fink, A.K., Lash,  T.L. A null association between smoking during pregnancy
# and breast cancer using Massachusetts registry data (United States).
# Cancer Causes Control 2003;14:497-503.
misclass(matrix(c(215, 1449, 668, 4296),
dimnames = list(c("Breast cancer+", "Breast cancer-"),
c("Smoker+", "Smoker-")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms = c(.78, .78, .99, .99))

misclass(matrix(c(4558, 3428, 46305, 46085),
dimnames = list(c("AMI death+", "AMI death-"),
c("Male+", "Male-")),
nrow = 2, byrow = TRUE),
type = "outcome",
bias_parms = c(.53, .53, .99, .99))

# The following example comes from Chu et al. Sensitivity analysis of
# misclassification: A graphical and a Bayesian approach.
# Annals of Epidemiology 2006;16:834-841.
misclass(matrix(c(126, 92, 71, 224),
dimnames = list(c("Case", "Control"), c("Smoker +", "Smoker -")),
nrow = 2, byrow = TRUE),
type = "exposure",
bias_parms = c(.94, .94, .97, .97))

# The next example, using PPV/NPV, comes from Bodnar et al. Validity of birth
# certificate-derived maternal weight data.
# Paediatric and Perinatal Epidemiology 2014;28:203-212.
misclass(matrix(c(599, 4978, 31175, 391851),
dimnames = list(c("Preterm", "Term"), c("Underweight", "Normal weight")),
nrow = 2, byrow = TRUE),
type = "exposure_pv",
bias_parms = c(0.65, 0.74, 1, 0.98))
#
# The data for this example come from:
# Greenland S., Salvan A., Wegman D.H., Hallock M.F., Smith T.J.
# A case-control study of cancer mortality at a transformer-assembly facility.
# Int Arch Occup Environ Health 1994; 66(1):49-54.
greenland <- matrix(c(45, 94, 257, 945), dimnames = list(c("BC+", "BC-"),
c("Smoke+", "Smoke-")), nrow = 2, byrow = TRUE)
set.seed(123)
# Exposure misclassification, non-differential
probsens(greenland, type = "exposure", reps = 20000,
seca = list("trapezoidal", c(.75, .85, .95, 1)),
spca = list("trapezoidal", c(.75, .85, .95, 1)))

# Exposure misclassification, differential
probsens(greenland, type = "exposure", reps = 20000,
seca = list("trapezoidal", c(.75, .85, .95, 1)),
seexp = list("trapezoidal", c(.7, .8, .9, .95)),
spca = list("trapezoidal", c(.75, .85, .95, 1)),
spexp = list("trapezoidal", c(.7, .8, .9, .95)),
corr_se = .8,
corr_sp = .8)

probsens(greenland, type = "exposure", reps = 20000,
seca = list("beta", c(908, 16)),
seexp = list("beta", c(156, 56)),
spca = list("beta", c(153, 6)),
spexp = list("beta", c(205, 18)),
corr_se = .8,
corr_sp = .8)

probsens(matrix(c(338, 490, 17984, 32024),
dimnames = list(c("BC+", "BC-"), c("Smoke+", "Smoke-")), nrow = 2, byrow = TRUE),
type = "exposure",
reps = 1000,
seca = list("trapezoidal", c(.8, .9, .9, 1)),
spca = list("trapezoidal", c(.8, .9, .9, 1)))

# Disease misclassification
probsens(matrix(c(173, 602, 134, 663),
dimnames = list(c("BC+", "BC-"), c("Smoke+", "Smoke-")), nrow = 2, byrow = TRUE),
type = "outcome",
reps = 20000,
seca = list("uniform", c(.8, 1)),
spca = list("uniform", c(.8, 1)))

probsens(matrix(c(338, 490, 17984, 32024),
dimnames = list(c("BC+", "BC-"), c("Smoke+", "Smoke-")), nrow = 2, byrow = TRUE),
type = "outcome",
reps = 20000,
seca = list("uniform", c(.2, .6)),
seexp = list("uniform", c(.1, .5)),
spca = list("uniform", c(.99, 1)),
spexp = list("uniform", c(.99, 1)),
corr_se = .8,
corr_sp = .8)

probsens(matrix(c(173, 602, 134, 663),
dimnames = list(c("BC+", "BC-"), c("Smoke+", "Smoke-")), nrow = 2, byrow = TRUE),
type = "outcome",
reps = 20000,
seca = list("beta", c(100, 5)),
seexp = list("beta", c(110, 10)),
spca = list("beta", c(120, 15)),
spexp = list("beta", c(130, 30)),
corr_se = .8,
corr_sp = .8)

# Fox M.P., MacLehose R.F., Lash T.L.
# SAS and R code for probabilistic quantitative bias analysis for
# misclassified binary variables and binary unmeasured confounders
# Int J Epidemiol 2023:1624-1633.
## Not run: 
fox <- matrix(c(40, 20, 60, 80),
dimnames = list(c("Diseased", "Non-diseased"), c("Exposed", "Unexposed")),
nrow = 2, byrow = TRUE)
set.seed(1234)
probsens(fox, type = "exposure", reps = 10^6,
seca = list("beta", c(25, 3)),
spca = list("trapezoidal", c(.9, .93, .97, 1)),
seexp = list("beta", c(47, 7)),
spexp = list("trapezoidal", c(.8, .83, .87, .9)),
corr_se = .8,
corr_sp = .8)

## End(Not run)

# Using PPV/NPV, from Bodnar et al. Validity of birth certificate-derived maternal
# weight data. Paediatric and Perinatal Epidemiology 2014;28:203-212.
set.seed(1234)
probsens(matrix(c(599, 4978, 31175, 391851),
dimnames = list(c("Preterm", "Term"), c("Underweight", "Normal weight")),
nrow = 2, byrow = TRUE),
type = "exposure_pv", reps = 10^6,
seca = list("beta", c(50, 27)),  ## PPV_case
spca = list("beta", c(120, .5)),  ## NPV_case
seexp = list("beta", c(132, 47)),  ## PPV_ctrl
spexp = list("beta", c(115, 2)))  ## NPV_ctrl

episensr documentation built on June 8, 2025, 10:34 a.m.