simex-package: Error or misclassification correction in models using...

Description Details Author(s) References See Also Examples

Description

Package simex is an implementation of the SIMEX–algorithm by Cook and Stephanski and the MCSIMEX–Algorithm by Küchenhoff, Mwalili and Lesaffre.

Details

Package: simex
Type: Package
Version: 1.8
Date: 2019-07-28
License: GPL 2 or above
LazyLoad: yes

The package includes first of all the implementation for the SIMEX– and MCSIMEX–Algorithms. Jackknife and asymptotic variance estimation are implemented. Various methods and analytic tools are provided for a simple and fast access to the SIMEX– and MCSIMEX–Algorithm.

Functions simex() and mcsimex() can be used on models issued from lm(), glm() with asymtotic estimation. Models from nls(), gam() (package mgcv), polr() (package MASS), lme(), nlme() (package nlme) and coxph() (package survival) can also be corrected with these algorithms, but without asymptotic estimations.

Author(s)

Wolfgang Lederer, Heidi Seibold, Helmut Küchenhoff

Maintainer: Wolfgang Lederer,wolfgang.lederer@gmail.com

References

Lederer, W. and Küchenhoff, H. (2006) A short introduction to the SIMEX and MCSIMEX. R News, 6/4, 26 – 31

See Also

simex, mcsimex, misclass

and for functions generating the initial naive models: lm, glm, nls, gam, lme, nlme, polr, coxph

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
# See example(simex) and example(mcsimex)
## Seed
set.seed(49494)

## simulating the measurement error standard deviations
sd_me1 <- 0.3
sd_me2 <- 0.4
temp <- runif(100, min = 0, max = 0.6)
sd_me_het1 <- sort(temp)
temp2 <- rnorm(100, sd = 0.1)
sd_me_het2 <- abs(sd_me_het1 + temp2)

## simulating the independent variables x (real and with measurement error):
x_real1 <- rnorm(100)
x_real2 <- rpois(100, lambda = 2)
x_real3 <- -4*x_real1 + runif(100, min = -2, max = 2)  # correlated to x_real

x_measured1 <- x_real1 + sd_me1 * rnorm(100)
x_measured2 <- x_real2 + sd_me2 * rnorm(100)
x_het1 <- x_real1 + sd_me_het1 * rnorm(100)
x_het2 <- x_real3 + sd_me_het2 * rnorm(100)

## calculating dependent variable y:
y1  <- x_real1 + rnorm(100, sd = 0.05)
y2 <- x_real1 + 2*x_real2 + rnorm(100, sd = 0.08)
y3 <- x_real1 + 2*x_real3 + rnorm(100, sd = 0.08)


### one variable with homoscedastic measurement error
(model_real <- lm(y1  ~ x_real1))

(model_naiv <- lm(y1  ~ x_measured1, x = TRUE))

(model_simex <- simex(model_naiv, SIMEXvariable = "x_measured1", measurement.error = sd_me1))
plot(model_simex)


### two variables with homoscedastic measurement errors
(model_real2 <- lm(y2 ~ x_real1 + x_real2))

(model_naiv2 <- lm(y2 ~ x_measured1 + x_measured2, x = TRUE))

(model_simex2 <- simex(model_naiv2, SIMEXvariable = c("x_measured1", "x_measured2"),
                       measurement.error = cbind(sd_me1, sd_me2)))

plot(model_simex2)


### one variable with increasing heteroscedastic measurement error
model_real

(mod_naiv1 <- lm(y1  ~ x_het1, x = TRUE))

(mod_simex1 <- simex(mod_naiv1, SIMEXvariable = "x_het1",
     measurement.error = sd_me_het1, asymptotic = FALSE))

plot(mod_simex1)

## Not run: 
### two correlated variables with heteroscedastic measurement errors
(model_real3 <- lm(y3 ~ x_real1 + x_real3))

(mod_naiv2 <- lm(y3 ~ x_het1 + x_het2, x = TRUE))

(mod_simex2 <- simex(mod_naiv2, SIMEXvariable = c("x_het1", "x_het2"),
                     measurement.error = cbind(sd_me_het1, sd_me_het2), asymptotic = FALSE))
plot(mod_simex2)


### two variables, one with homoscedastic, one with heteroscedastic measurement error
model_real2

(mod_naiv3 <- lm(y2 ~ x_measured1 + x_het2, x = TRUE))

(mod_simex3 <- simex(mod_naiv3, SIMEXvariable = c("x_measured1", "x_het2"),
                     measurement.error = cbind(sd_me1, sd_me_het2), asymptotic = FALSE))


### glm: two variables, one with homoscedastic, one with heteroscedastic measurement error
t <- x_real1 + 2*x_real2
g <- 1 / (1 + exp(-t))
u <- runif(100)
ybin <- as.numeric(u < g)


(logit_real <- glm(ybin ~ x_real1 + x_real2, family = binomial))

(logit_naiv <- glm(ybin ~ x_measured1 + x_het2, x = TRUE, family = binomial))

(logit_simex <- simex(logit_naiv, SIMEXvariable = c("x_measured1", "x_het2"),
                      measurement.error = cbind(sd_me1, sd_me_het2), asymptotic = FALSE))
summary(logit_simex)
print(logit_simex)
plot(logit_simex)

## End(Not run)

Example output

Call:
lm(formula = y1 ~ x_real1)

Coefficients:
(Intercept)      x_real1  
  -0.004752     0.994554  


Call:
lm(formula = y1 ~ x_measured1, x = TRUE)

Coefficients:
(Intercept)  x_measured1  
   -0.01217      0.90327  


Naive model:
lm(formula = y1 ~ x_measured1, x = TRUE)

SIMEX-Variables: x_measured1
Number of Simulations: 100

Coefficients:
(Intercept)  x_measured1  
   -0.01528      0.97007  


Call:
lm(formula = y2 ~ x_real1 + x_real2)

Coefficients:
(Intercept)      x_real1      x_real2  
   -0.01411      0.99094      2.00011  


Call:
lm(formula = y2 ~ x_measured1 + x_measured2, x = TRUE)

Coefficients:
(Intercept)  x_measured1  x_measured2  
     0.2050       0.8909       1.8593  


Naive model:
lm(formula = y2 ~ x_measured1 + x_measured2, x = TRUE)

SIMEX-Variables: x_measured1, x_measured2
Number of Simulations: 100

Coefficients:
(Intercept)  x_measured1  x_measured2  
   -0.04414      0.93716      1.97311  


Call:
lm(formula = y1 ~ x_real1)

Coefficients:
(Intercept)      x_real1  
  -0.004752     0.994554  


Call:
lm(formula = y1 ~ x_het1, x = TRUE)

Coefficients:
(Intercept)       x_het1  
    -0.0189       0.8513  


Naive model:
lm(formula = y1 ~ x_het1, x = TRUE)

SIMEX-Variables: x_het1
Number of Simulations: 100

Coefficients:
(Intercept)       x_het1  
  -0.002565     0.938262  


Call:
lm(formula = y3 ~ x_real1 + x_real3)

Coefficients:
(Intercept)      x_real1      x_real3  
    0.01734      0.97012      1.99334  


Call:
lm(formula = y3 ~ x_het1 + x_het2, x = TRUE)

Coefficients:
(Intercept)       x_het1       x_het2  
    -0.1016      -0.1986       1.7146  


Naive model:
lm(formula = y3 ~ x_het1 + x_het2, x = TRUE)

SIMEX-Variables: x_het1, x_het2
Number of Simulations: 100

Coefficients:
(Intercept)       x_het1       x_het2  
    -0.1096      -0.1167       1.7426  


Call:
lm(formula = y2 ~ x_real1 + x_real2)

Coefficients:
(Intercept)      x_real1      x_real2  
   -0.01411      0.99094      2.00011  


Call:
lm(formula = y2 ~ x_measured1 + x_het2, x = TRUE)

Coefficients:
(Intercept)  x_measured1       x_het2  
     3.9284       1.3993       0.1356  


Naive model:
lm(formula = y2 ~ x_measured1 + x_het2, x = TRUE)

SIMEX-Variables: x_measured1, x_het2
Number of Simulations: 100

Coefficients:
(Intercept)  x_measured1       x_het2  
     3.9008       2.1769       0.3238  


Call:  glm(formula = ybin ~ x_real1 + x_real2, family = binomial)

Coefficients:
(Intercept)      x_real1      x_real2  
     -1.088        0.690        2.895  

Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
Null Deviance:	    80.99 
Residual Deviance: 40.09 	AIC: 46.09

Call:  glm(formula = ybin ~ x_measured1 + x_het2, family = binomial, 
    x = TRUE)

Coefficients:
(Intercept)  x_measured1       x_het2  
  1.8511007    0.1799577   -0.0006873  

Degrees of Freedom: 99 Total (i.e. Null);  97 Residual
Null Deviance:	    80.99 
Residual Deviance: 80.49 	AIC: 86.49

Naive model:
glm(formula = ybin ~ x_measured1 + x_het2, family = binomial,     x = TRUE)

SIMEX-Variables: x_measured1, x_het2
Number of Simulations: 100

Coefficients:
(Intercept)  x_measured1       x_het2  
    1.83420      0.28642      0.02438  

Call:
simex(model = logit_naiv, SIMEXvariable = c("x_measured1", "x_het2"), 
    measurement.error = cbind(sd_me1, sd_me_het2), asymptotic = FALSE)

Naive model: 
glm(formula = ybin ~ x_measured1 + x_het2, family = binomial, 
    x = TRUE)

Simex variable :
                    x_measured1 x_het2           
Measurement error : "0.3"       "heteroscedastic"


Number of iterations:  100 

Residuals: 
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-0.913794  0.117733  0.130605  0.001902  0.154765  0.233736 

Coefficients: 

Jackknife variance: 
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.83420    0.29571   6.203 1.36e-08 ***
x_measured1  0.28642    1.00333   0.285    0.776    
x_het2       0.02438    0.24870   0.098    0.922    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Naive model:
glm(formula = ybin ~ x_measured1 + x_het2, family = binomial,     x = TRUE)

SIMEX-Variables: x_measured1, x_het2
Number of Simulations: 100

Coefficients:
(Intercept)  x_measured1       x_het2  
    1.83420      0.28642      0.02438  

simex documentation built on July 31, 2019, 9:03 a.m.