knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, fig.width = 6, fig.height = 5 )

The main function of the 'gfilinreg' package, namely `gfilinreg`

, returns some
weighted samples of the generalized fiducial distribution of the parameters of
a linear regression model whose error terms are distributed according to a
Gaussian, Student, Cauchy, or a logistic distribution.

Let's have a look at a simple linear regression model with Gaussian errors:

library(gfilinreg) set.seed(666L) x <- rgamma(6L, shape = 10, rate = 1) y <- rnorm(6L, mean = x, sd = 2) dat <- data.frame(x = x, y = y) fidsamples <- gfilinreg(y ~ x, data = dat, distr = "normal", L = 100L, nthreads = 2L)

The algorithm involves a partition of the hypercube ${(0,1)}^{p+1}$, where $p$
is the dimension of the model (the number of columns of the $X$ matrix), and
the integer argument `L`

of the `gfilinreg`

function is the desired number of
subdivisions of each edge of the hypercube.

*Note.**I set nthreads = 2L because of CRAN constraints: CRAN does not allow more
than two parallel computations. This is also why I set nthreads = 2L in all
the examples of the package*.

A quick summary of the fiducial samples is provided by the `gfiSummary`

function:

```
gfiSummary(fidsamples)
```

Let's compare with `lm`

:

lmfit <- lm(y ~ x, data = dat) coefficients(lmfit) sigma(lmfit) confint(lmfit)

Excepted for the scale parameter $\sigma$, results are very similar.

One can generate simulations of fiducial predictive distributions and, based on these distributions, one can get fiducial prediction intervals:

new <- data.frame(x = c(9, 10, 11)) fidpred <- gfilinregPredictive(fidsamples, newdata = new) gfiSummary(fidpred) predict(lmfit, newdata = new, interval = "prediction")

Again, there is a strong agreement between the fiducial results and the frequentist results.

Now we perform some simulations of a "t-test model" with Cauchy errors, we store the fiducial summaries for each simulated dataset and we also store the maximum likelihood estimates thanks to the 'heavy' package. We simulate $1000$ datasets.

library(gfilinreg) library(heavy) library(data.table) nsims <- 1000L MAXLHD <- matrix(NA_real_, nrow = nsims, ncol = 3L) colnames(MAXLHD) <- c("group1", "group2", "sigma") FIDlist <- vector("list", length = nsims) group <- gl(2L, 3L) set.seed(666L) for(i in 1L:nsims){ # simulated dataset dat <- data.frame( group = group, y = c(rcauchy(3L), 2 + rcauchy(3L)) ) # max-likelihood estimates hfit <- heavyLm(y ~ 0 + group, data = dat, family = Cauchy()) MAXLHD[i, ] <- c(hfit[["coefficients"]], sqrt(hfit[["sigma2"]])) # fiducial stuff fidsamples <- gfilinreg(y ~ 0 + group, data = dat, L = 100L, distr = "cauchy") FIDlist[[i]] <- cbind( parameter = c("group1", "group2", "sigma"), as.data.table(gfiSummary(fidsamples)) ) } FID <- rbindlist(FIDlist)

The above simulations are not performed in this vignette. They are available in the package:

library(data.table) data("FID") data("MAXLHD")

Our three estimates of the `group1`

parameter (maximum likelihood,
fiducial mean, fiducial median) have a similar distribution:

library(kde1d) group1_maxlhd <- MAXLHD[, "group1"] group1_fid_mean <- FID[parameter == "group1"][["mean"]] group1_fid_median <- FID[parameter == "group1"][["median"]] kfit_maxlhd <- kde1d(group1_maxlhd, mult = 4) kfit_fid_mean <- kde1d(group1_fid_mean, mult = 4) kfit_fid_median <- kde1d(group1_fid_median, mult = 4) curve( dkde1d(x, kfit_maxlhd), from = -6, to = 6, axes = FALSE, lwd = 3, col = "red", xlab = "beta1", ylab = NA ) axis(1) curve( dkde1d(x, kfit_fid_mean), add = TRUE, lwd = 2, col = "green", lty = "dashed" ) curve( dkde1d(x, kfit_fid_median), add = TRUE, lwd = 2, col = "blue", lty = "dashed" )

The distributions of the three estimates of the `group2`

parameter are similar
as well:

group2_maxlhd <- MAXLHD[, "group2"] group2_fid_mean <- FID[parameter == "group2"][["mean"]] group2_fid_median <- FID[parameter == "group2"][["median"]] kfit_maxlhd <- kde1d(group2_maxlhd, mult = 4) kfit_fid_mean <- kde1d(group2_fid_mean, mult = 4) kfit_fid_median <- kde1d(group2_fid_median, mult = 4) curve( dkde1d(x, kfit_maxlhd), from = -4, to = 8, axes = FALSE, lwd = 3, col = "red", xlab = "beta2", ylab = NA ) axis(1) curve( dkde1d(x, kfit_fid_mean), add = TRUE, lwd = 2, col = "green", lty = "dashed" ) curve( dkde1d(x, kfit_fid_median), add = TRUE, lwd = 2, col = "blue", lty = "dashed" )

For the scale parameter `sigma`

, there are some differences. The maximum
likelihood estimates often underestimate the true value, and the true value is
close to the median of the fiducial medians:

sigma_maxlhd <- MAXLHD[, "sigma"] sigma_fid_mean <- FID[parameter == "sigma"][["mean"]] sigma_fid_median <- FID[parameter == "sigma"][["median"]] kfit_maxlhd <- kde1d(sigma_maxlhd, xmin = 0, mult = 4) kfit_fid_mean <- kde1d(sigma_fid_mean, xmin = 0, mult = 4) kfit_fid_median <- kde1d(sigma_fid_median, xmin = 0, mult = 4) curve( dkde1d(x, kfit_maxlhd), from = 0, to = 4, axes = FALSE, lwd = 2, col = "red", xlab = "sigma", ylab = NA ) axis(1) abline(v = median(sigma_maxlhd), col = "red", lwd = 2, lty = "dashed") abline(v = 1, col = "yellow", lwd = 3) # true value curve( dkde1d(x, kfit_fid_mean), add = TRUE, lwd = 2, col = "green" ) abline(v = median(sigma_fid_mean), col = "green", lwd = 2, lty = "dashed") curve( dkde1d(x, kfit_fid_median), add = TRUE, lwd = 2, col = "blue" ) abline(v = median(sigma_fid_median), col = "blue", lwd = 2, lty = "dashed")

Below are the coverage probabilities of the fiducial confidence intervals estimated from the $500$ simulations.

# group1 group1 <- FID[parameter == "group1"] mean(group1[["lwr"]] < 0) mean(0 < group1[["upr"]]) mean(group1[["lwr"]] < 0 & 0 < group1[["upr"]]) # group2 group2 <- FID[parameter == "group2"] mean(group2[["lwr"]] < 2) mean(2 < group2[["upr"]]) mean(group2[["lwr"]] < 2 & 2 < group2[["upr"]]) # sigma sigma <- FID[parameter == "sigma"] mean(sigma[["lwr"]] < 1) mean(1 < sigma[["upr"]]) mean(sigma[["lwr"]] < 1 & 1 < sigma[["upr"]])

For the parameters `group1`

and `group2`

, the coverage probabilities are close
to the nominal level. For `sigma`

, only the upper bound yields a coverage
probability close to the nominal level. The lower bound is too large on average.
But remember that the datasets we simulated are small ($3$ observations per
group).

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.