knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
predict
methodThe predict
method for mixpoissonreg
objects allows one to obtain standard errors for the means in the scale of the linear predictors. To this end one must set the
argument type
to "link" to indicate that the predicted values will be given in the scale of the linear predictors and se.fit
to TRUE
. The predict
method will return a list
with two components: fit
containing the fitted values and se.fit
containing the standard errors.
For example:
library(mixpoissonreg) fit <- mixpoissonreg(daysabs ~ gender + math + prog, data = Attendance, em_controls = list(maxit=1)) pred_fit <- predict(fit, type = "link", se.fit = TRUE) head(pred_fit$fit) head(pred_fit$se.fit)
We are using the option em_controls = list(maxit=1)
to reduce computational cost.
Since no data was passed, the predicted values were obtained with respect to the model's data.
To pass a new data, one must pass a data.frame
with the values of the covariates of interest to the newdata
argument. For instance:
predict(fit, newdata = data.frame(gender = c("male", "female"), math = c(34, 85), prog = factor(c("General", "Academic"), levels = c("General", "Academic", "Vocational"))), type = "link", se.fit = TRUE)
augment
methodOne can also obtain the standard errors in a tidy manner by using the augment method with the argument type.predict
to "link" and the argument se_fit
to TRUE
. This will return a tibble::tibble
object with several columns, including .fitted
containing the fitted values and .se.fit
containing the standard errors.
The following example shows how to use the augment
method to obtain the fitted values in the scale of the linear predictor along with the corresponding standard errors:
library(dplyr) augment(fit, type.predict = "link", se_fit = TRUE) %>% dplyr::select(.fitted, .se.fit)
predict
methodThe predict
method for mixpoissonreg
objects does not provide standard errors for the fitted values at the response scale, however it provides confidence intervals for the
means at the response scale. To this end it is enough to set the interval
argument to "confidence" since the default type of prediction is already "response". It is also possible
to change the level of confidence by setting the level
argument, which defaults to 0.95
.
For example:
fit_math <- mixpoissonreg(daysabs ~ math, data = Attendance, em_controls = list(maxit=1)) head(predict(fit_math, interval = "confidence")) head(predict(fit_math, interval = "confidence", level = 0.99))
One can, for instance, use this matrix to create a plot of the fitted values against the math
variable along with their confidence intervals using R's base graphics:
data("Attendance", package = "mixpoissonreg") new_math <- seq(0, 100, by=0.25) conf_fit <- predict(fit_math, newdata = data.frame(math = new_math), interval = "confidence") graphics::plot(Attendance$math, Attendance$daysabs, xlab = "Math score", ylab = "Days abscent") curve(exp(fit_math$coefficients$mean[1] + fit_math$coefficients$mean[2]*x), add = TRUE) graphics::lines(new_math, conf_fit[, 2], col = grDevices::rgb(0.7, 0.7, 0.7)) graphics::lines(new_math, conf_fit[, 3], col = grDevices::rgb(0.7, 0.7, 0.7)) graphics::polygon(c(new_math, rev(new_math)), c(conf_fit[, 2], rev(conf_fit[, 3])), col = grDevices::rgb(0.7, 0.7, 0.7, 0.7), border = NA)
Observe that the confidence intervals tend to be narrow. They are confidence intervals for the means, not for the response variables. The intervals that play the role of "confidence intervals" for response variables are the prediction intervals, which are much wider and provide estimate of intervals for which future observations will fall with the prescribed probability.
augment
methodThe augment
method produces confidence intervals for the mean parameters in the response scale by default. The confidence level can be changed by setting the
level
argument to the desired level.
For example, let us consider the fit_math
object and get the confidence intervals with 99% level along with the response variable daysabs
and the covariate math
:
augment(fit_math, level = 0.99) %>% select(math, daysabs, .fittedlwrconf, .fitteduprconf)
Let us now use the augment
method to produce a plot of the data, the fitted curve and the confidence intervals with confidence level of 99%:
library(ggplot2) fit_data <- augment(fit_math) %>% dplyr::select(math, daysabs) %>% dplyr::rename("Math Score" = math, "Days Abscent" = daysabs) new_math <- seq(0, 100, by=0.25) fit_int <- augment(fit_math, newdata = data.frame(math = new_math), level = 0.99) %>% dplyr::rename("Math Score" = math) %>% mutate("Days Abscent" = 0) ggplot(fit_data, aes(x = `Math Score`, y = `Days Abscent`)) + geom_point() + geom_function(fun = function(x){exp(fit_math$coefficients$mean[1] + fit_math$coefficients$mean[2]*x)}, colour = "blue") + geom_ribbon(data = fit_int, aes(ymin = .fittedlwrconf, ymax = .fitteduprconf), fill = "grey70", alpha = 0.7)
predict
methodLet us now use the predict
method to obtain prediction intervals for the reponse variables. As we did not obtain an approximate distribution for the response variable,
we obtain the prediction intervals by simulation, which is computationally intensive.
To obtain prediction intervals, one must set the interval
argument to "prediction" since the default type of prediction is already "response". It is also possible
to change the significance level by entering the level
argument, which defaults to 0.95
, to change the number of mean and prediction parameters generated by setting the nsim_pred
argument to the desired value, the default is 100, and to change the number of response variables y generated for each pair of mean and precision parameters by setting the nsim_pred_y
to the desired value, the default is 100.
For example, we will consider nsim_pred = 50
and nsim_pred_y = 50
to avoid long computations in our example:
fit_math <- mixpoissonreg(daysabs ~ math, data = Attendance, em_controls = list(maxit=1)) head(predict(fit_math, interval = "prediction", nsim_pred = 50, nsim_pred_y = 50)) head(predict(fit_math, interval = "prediction", level = 0.99, nsim_pred = 50, nsim_pred_y = 50))
Notice that the intervals are now much wider.
We will now use the above matrix to create a plot of the fitted values against the math
variable along with their prediction intervals using R's base graphics:
new_math <- seq(0, 100, by=0.25) pred_fit <- predict(fit_math, newdata = data.frame(math = new_math), interval = "prediction", nsim_pred = 50, nsim_pred_y = 50) plot(Attendance$math, Attendance$daysabs, xlab = "Math score", ylab = "Days abscent") curve(exp(fit_math$coefficients$mean[1] + fit_math$coefficients$mean[2]*x), add = TRUE) lines(new_math, pred_fit[, 2], col = grDevices::rgb(0.7, 0.7, 0.7)) lines(new_math, pred_fit[, 3], col = grDevices::rgb(0.7, 0.7, 0.7)) polygon(c(new_math, rev(new_math)), c(pred_fit[, 2], rev(pred_fit[, 3])), col = grDevices::rgb(0.7, 0.7, 0.7, 0.7), border = NA)
Observe that unlike the confidence intervals, the prediction intervals now cover the majority of the observed response variables. These intervals should be used for future observations as they are intervals such that the future response variables will fall with the prescribed probability.
augment
methodTo obtain prediction intervals using the augment
method, one must set the pred_int
argument to TRUE
. The significance level can be changed by setting the
level
argument to the desired level. To change the number of mean and prediction parameter generated, one must set the nsim_pred
argument to the desired value, the default is 100, and to change the number of response variables y generated for each pair of mean and precision parameters, one must set the nsim_pred_y
to the desired value, the default is 100.
For example, let us consider the fit_math
object and obtain prediction intervals with 99% level along with the response variable daysabs
and the covariate math
. We will
also set nsim_pred
and nsim_pred_y
to 10 to reduce computational cost.
augment(fit_math, pred_int = TRUE, level = 0.99, nsim_pred = 10, nsim_pred_y = 10) %>% select(math, daysabs, .fittedlwrpred, .fitteduprpred)
Let us now use the augment
method to produce a plot of the data, the fitted curve and the prediction intervals with the default significance level:
fit_data <- augment(fit_math) %>% dplyr::select(math, daysabs) %>% dplyr::rename("Math Score" = math, "Days Abscent" = daysabs) new_math <- seq(0, 100, by=0.25) fit_pred <- augment(fit_math, newdata = data.frame(math = new_math), pred_int = TRUE, nsim_pred = 10, nsim_pred_y = 10) %>% dplyr::rename("Math Score" = math) %>% mutate("Days Abscent" = 0) ggplot(fit_data, aes(x = `Math Score`, y = `Days Abscent`)) + geom_point() + geom_function(fun = function(x){exp(fit_math$coefficients$mean[1] + fit_math$coefficients$mean[2]*x)}, colour = "blue") + geom_ribbon(data = fit_pred, aes(ymin = .fittedlwrpred, ymax = .fitteduprpred), fill = "grey70", alpha = 0.7)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.