Home

/

CRAN

/

SAMBA

/

obsloglikEM: Estimate parameters in the disease model using observed data...

obsloglikEM: Estimate parameters in the disease model using observed data...
In SAMBA: Selection and Misclassification Bias Adjustment for Logistic Regression Models

Description Usage Arguments Details Value References Examples

View source: R/obsloglikEM.R

obsloglikEM jointly estimates the disease model and sensitivity model parameters using profile likelihood methods. Estimation involves an expectation-maximization algorithm.

1 2	obsloglikEM(Dstar, Z, X, start, beta0_fixed = NULL, weights = NULL, expected = TRUE, tol = 1e-06, maxit = 50)

`Dstar`	Numeric vector containing observed disease status. Should be coded as 0/1
`Z`	Numeric matrix of covariates in disease model. 'Z' should not contain an intercept
`X`	Numeric matrix of covariates in sensitivity model. Set to NULL to fit model with no covariates in sensitivity model. 'X' should not contain an intercept
`start`	Numeric vector of starting values for theta and beta (theta, beta). Theta is the parameter of the disease model, and beta is the parameter of the sensitivity model
`beta0_fixed`	Optional numeric vector of values of sensitivity model intercept to profile over. If a single value, corresponds to fixing intercept at specified value. Default is NULL
`weights`	Optional vector of patient-specific weights used for selection bias adjustment. Default is NULL
`expected`	Whether or not to calculate the covariance matrix via the expected fisher information matrix. Default is TRUE
`tol`	stop estimation when subsequent log-likelihood estimates are within this value
`maxit`	Maximum number of iterations of the estimation algorithm

We are interested in modeling the relationship between binary disease status and covariates Z using a logistic regression model. However, D may be misclassified, and our observed data may not well-represent the population of interest. In this setting, we estimate parameters from the disease model using the following modeling framework. Notation:

D: Binary disease status of interest.
D*: Observed binary disease status. Potentially a misclassified version of D. We assume D = 0 implies D* = 0.
S: Indicator for whether patient from population of interest is included in the analytical dataset.
Z: Covariates in disease model of interest.
W: Covariates in model for patient inclusion in analytical dataset (selection model).
X: Covariates in model for probability of observing disease given patient has disease (sensitivity model).

Model Structure:

Disease Model: logit(P(D=1|X)) = theta_0 + theta_Z Z
Selection Model: P(S=1|W,D)
Sensitivity Model: logit(P(D* = 1| D = 1, S = 1, X)) = beta_0 + beta_X X

A "SAMBA.fit" object with nine elements: 'param', the final estimate of the coeficients organized as (theta, beta), 'variance', the covariance matrix of the final estimate, param.seq', the sequence of estimates at each step of the EM algorithm, and 'loglik.seq', the log likelihood at each step. The rest of the elements are Dstar', 'X', 'Z', and 'weights'.

Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification Lauren J Beesley and Bhramar Mukherjee medRxiv 2019.12.26.19015859

library(SAMBA)
# These examples are generated from the vignette. See it for more details.

# Generate IPW weights from the true model
expit <- function(x) exp(x) / (1 + exp(x))
prob.WD <- expit(-0.6 + 1 * samba.df$D + 0.5 * samba.df$W)
weights <- nrow(samba.df) * (1  / prob.WD) / (sum(1 / prob.WD))

# Get initial parameter estimates
logit <- function(x) log(x / (1 - x))
fitBeta  <- glm(Dstar ~ X, binomial(), data = samba.df)
fitTheta <- glm(Dstar ~ Z, binomial(), data = samba.df)

sens <- sensitivity(samba.df$Dstar, samba.df$X, mean(samba.df$D), r = 2)
start <- c(coef(fitTheta), logit(sens$c_marg), coef(fitBeta)[2])

# Direct observed data likelihood maximization without fixed intercept
fit1 <- obsloglikEM(samba.df$Dstar, samba.df$Z, samba.df$X, start = start,
                 weights = weights)
obsloglik1 <- list(param = fit1$param, variance = diag(fit1$variance))

# Direct observed data likelihood maximization with fixed intercept
fit2   <- obsloglikEM(samba.df$Dstar, samba.df$Z, samba.df$X, start = start,
                 beta0_fixed = logit(sens$c_marg), weights = weights)
# since beta0 is fixed, its variance is NA

list(param = fit2$param, variance = diag(fit2$variance))

SAMBA documentation built on Feb. 20, 2020, 9:07 a.m.

SAMBA index

README.md Using SAMBA

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

SAMBA
Selection and Misclassification Bias Adjustment for Logistic Regression Models

obsloglikEM: Estimate parameters in the disease model using observed data...
In SAMBA: Selection and Misclassification Bias Adjustment for Logistic Regression Models

Description

Usage

Arguments

Details

Value

References

Examples

Related to obsloglikEM in SAMBA...

R Package Documentation

Browse R Packages

We want your feedback!

SAMBA Selection and Misclassification Bias Adjustment for Logistic Regression Models

obsloglikEM: Estimate parameters in the disease model using observed data... In SAMBA: Selection and Misclassification Bias Adjustment for Logistic Regression Models

Description

Usage

Arguments

Details

Value

References

Examples

Related to obsloglikEM in SAMBA...

R Package Documentation

Browse R Packages

We want your feedback!

SAMBA
Selection and Misclassification Bias Adjustment for Logistic Regression Models

obsloglikEM: Estimate parameters in the disease model using observed data...
In SAMBA: Selection and Misclassification Bias Adjustment for Logistic Regression Models