Zachary McCaw
`r Sys.Date()`

Data for $n=10^{3}$ subjects were simulated from the model:

$$ \eta_{i} = x_{i}'\beta \ z_{i} \sim N(\eta_{i},1) \ y_{i} = I[z_{i}>0] $$

Only the binary outcomes $y_{i}$, and not the latent variables $z_{i}$, are observed. The linear predictor $\eta_{i}$ depends on three independent covariates $(x_{1,i},x_{2,i},x_{3,i})$, drawn from the normal, $t_{3}$ and $\chi_{2}^{2}$ distributions. In each case, the distribution was centered and scaled to have mean zero and unit variance. The regression coefficient $\beta$ was chosen s.t. the approximate power of a Wald test to reject $H_{0}:\beta_{j}=0$ is $50\%$.

The function `fitProbit`

uses the parameter expanded EM algorithm to estimate $\beta$, then attempts to refine the estimate using IRLS.

For each of $R=10^{3}$ Monte Carlo data sets, the regression coefficient $\beta$ and its variance were estimated using `fitProbit`

.

```
# Observations
n = 1e3;
# MC replicates
R = 1e3;
# Regression Coefficients
gamma = (qnorm(0.975))/sqrt(n);
beta = gamma*c(1,1,1);
# Simulation
set.seed(200);
Sim = foreach(r=1:R,.combine=rbind) %do% {
# Draw covariates
x1 = rnorm(n);
x2 = rt(n,df=3)/sqrt(3);
x3 = (rchisq(n,df=2)-2)/2;
X = cbind(x1,x2,x3);
# Linear predictor
eta = X %*% beta;
# Outcome
y = simProbit(eta);
# Fit probit model
M = fitProbit(y=y,X=X);
# Coefficients
b = M$Beta;
names(b) = paste0("b",seq(from=0,to=3));
# Variance estimates
v = diag(M$Var);
names(v) = paste0("v",seq(from=0,to=3));
# Output
Out = c(b,v);
return(Out);
}
```

Estimation of the regression coefficients is unbiased.

The variance $\sigma_{j}^{2}$ of $\beta_{j}$ was estimated using component $I_{jj}$ of the Fisher information matrix. To assess the accuracy of this estimate, the mean of $\sigma_{j}^{2}$ across MC replicates was compared with the empirical variance of the point estimates:

$$
\rho_{j} = \frac{\text{Mean}*{r}\left(\sigma*{j}^{2,(r)}\right)}{\text{Var}*{r}\left(\beta*{j}^{(r)}\right)}
$$

Confidence intervals for $\rho_{j}$ were constructed using its asymptotic variance on the log scale. Estimation of the variance in $\beta_{j}$ is accurate in that the confidence interval for each $\rho_{j}$ includes one.

Data for $n=10^{3}$ subjects were simulated from the model:

$$ \tau_{i} \sim \Gamma(\nu/2,\nu/2) \ z_{i} \sim N(\eta_{i},1/\tau_{i}) \ y_{i} = I(z_{i}>0) $$

The linear predictor $\eta_{i}$ was constructed as for the probit model. In the robit model, the latent $z_{i}$ follow a marginal $t_{\nu}$ distribution.

The function `fitRobit`

uses the parameter expanded EM algorithm to estimate $\beta$

For each of $R=10^{3}$ Monte Carlo data sets, the regression coefficient $\beta$ and its variance were estimated using `fitRobit`

, allowing for up to $10$ iterations of EM.

```
# Simulation
set.seed(201);
rSim = foreach(r=1:R,.combine=rbind) %do% {
# Draw covariates
x1 = rnorm(n);
x2 = rt(n,df=3)/sqrt(3);
x3 = (rchisq(n,df=2)-2)/2;
X = cbind(x1,x2,x3);
# Linear predictor
eta = X %*% beta;
# Outcome
y = simRobit(eta=eta,nu=7);
# Fit probit model
M = fitRobit(y=y,X=X,m1=10);
# Coefficients
b = M$Beta;
names(b) = paste0("b",seq(from=0,to=3));
# Variance estimates
v = diag(M$Var);
names(v) = paste0("v",seq(from=0,to=3));
# Output
Out = c(b,v);
return(Out);
}
```

Estimation of the regression coefficients is unbiased.

For each coefficient $\beta_{j}$, the ratio $\rho_{j}$ of the mean estimated variance, across MC replicates, was compared with the empirical variance of the point estimates. Estimation of the variance in $\beta_{j}$ is accurate in so far as the confidence interval for each $\rho_{j}$ includes one.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.