Conclusion

Bayesian model averaging (BMA) is widely accepted as a sound theoretical foundation for addressing model uncertainty; however, it requires eliciting priors for all parameters of each model, as well as a prior probability for each model. When prior information is unavailable or negligible relative to the information provided by the data, default priors can be used to characterize the prior distributions of model parameters. Still, BMA results may be sensitive to prior specification, including the choice of default prior.
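For reference, the standard BMA identities show where the two kinds of prior enter. The notation (candidate models M_k, parameters θ_k, data D, quantity of interest Δ) is introduced here for illustration and is an assumption, not notation taken from the paper:

```latex
p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{j=1}^{K} p(D \mid M_j)\, p(M_j)},
\qquad
p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k .
```

The model prior p(M_k) appears in the posterior model probability, and the parameter prior p(θ_k | M_k) appears in the marginal likelihood, which is why the choice of default prior can affect both the model weights and the resulting predictions.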

In this paper, the predictive performance of competing default priors was compared on the basis of mean squared error (MSE) on hold-out samples. Predictions were rendered for four model estimators: (1) the BMA model, (2) the best predictive model (BPM), (3) the highest probability model (HPM), and (4) the median probability model (MPM). The movie dataset was randomly split into a training set (80% of observations), which was used to train the model, and a test set (20% of observations), which was used to assess the quality of the resulting predictive distributions. The mean MSE was computed for each prior and estimator, for a total of 36 models. This analysis was repeated over multiple trials, each using a different random split. The prediction results and parameter estimates were evaluated for the top ten performing models.
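The evaluation procedure described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used in the study: the function names, the default of ten trials, the synthetic data, and the two toy candidates (`mean_only`, `line`) are all hypothetical stand-ins for the prior and estimator combinations.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def repeated_holdout_mse(rows, candidates, n_trials=10, train_frac=0.8, seed=0):
    """Average hold-out MSE per candidate over repeated random splits.

    rows:       list of (x, y) pairs
    candidates: dict mapping a label to fit(train_rows) -> predict(x)
    n_trials:   number of random splits (assumed value, not from the paper)
    """
    rng = random.Random(seed)
    scores = {name: [] for name in candidates}
    for _ in range(n_trials):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))  # 80% train, 20% test
        train, test = shuffled[:cut], shuffled[cut:]
        y_test = [y for _, y in test]
        for name, fit in candidates.items():
            predict = fit(train)
            scores[name].append(mse(y_test, [predict(x) for x, _ in test]))
    # Mean MSE across trials for each candidate
    return {name: mean(vals) for name, vals in scores.items()}


def fit_mean(train):
    """Toy candidate: always predict the training mean of y."""
    ybar = mean(y for _, y in train)
    return lambda x: ybar


def fit_line(train):
    """Toy candidate: simple least-squares line in one predictor."""
    xbar = mean(x for x, _ in train)
    ybar = mean(y for _, y in train)
    sxx = sum((x - xbar) ** 2 for x, _ in train)
    sxy = sum((x - xbar) * (y - ybar) for x, y in train)
    slope = sxy / sxx
    return lambda x: ybar + slope * (x - xbar)


# Synthetic data for illustration only (not the movie dataset)
data_rng = random.Random(1)
rows = [(x, 2.0 * x + data_rng.gauss(0, 1)) for x in range(50)]
results = repeated_holdout_mse(rows, {"mean_only": fit_mean, "line": fit_line})
```

Ranking the entries of `results` by value mirrors how the top ten models were selected, with each prior and estimator pair taking the place of a toy candidate.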

Overall, there was no statistically significant difference in performance among nine of the top ten models. Moreover, six of the top ten models had nearly identical parameter estimates. The critics' score, the log of the number of IMDb votes, the feature film indicator variable, the drama genre indicator variable, and the year of theatrical release appeared in each of the top ten models. The most influential variables in eight of the top ten models were:
Feature film indicator
Drama genre indicator
IMDb votes
Year of theatrical release

The Best Picture nomination variable appeared in two models (g-prior MPM and AIC MPM), and the Best Actor indicator had a slightly negative influence in the AIC MPM model.
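To see why the MPM can include variables that other estimators omit, it may help to recall how the estimators are usually defined: the HPM is the single model with the highest posterior probability, the MPM keeps every variable whose posterior inclusion probability is at least 0.5, and the BMA prediction is the posterior-weighted average over models. The sketch below assumes those standard definitions; the variable names `x1` to `x3`, the model probabilities, and the per-model predictions are made up for illustration and are not results from this study.

```python
def inclusion_probabilities(models):
    """Posterior inclusion probability of each variable: the sum of the
    posterior probabilities of all models that contain it."""
    probs = {}
    for variables, post_prob in models:
        for v in variables:
            probs[v] = probs.get(v, 0.0) + post_prob
    return probs


def highest_probability_model(models):
    """Variable set of the model with the largest posterior probability."""
    return max(models, key=lambda m: m[1])[0]


def median_probability_model(models, threshold=0.5):
    """Variables whose posterior inclusion probability meets the threshold."""
    probs = inclusion_probabilities(models)
    return frozenset(v for v, p in probs.items() if p >= threshold)


def bma_prediction(models, predictions):
    """Posterior-weighted average of the per-model predictions."""
    return sum(post_prob * predictions[variables] for variables, post_prob in models)


# Hypothetical model space: (variable set, posterior model probability)
models = [
    (frozenset({"x1", "x2"}), 0.40),
    (frozenset({"x1", "x3"}), 0.35),
    (frozenset({"x2", "x3"}), 0.25),
]
# Hypothetical point prediction from each model for one new observation
predictions = {
    frozenset({"x1", "x2"}): 6.0,
    frozenset({"x1", "x3"}): 7.0,
    frozenset({"x2", "x3"}): 5.0,
}

hpm = highest_probability_model(models)
mpm = median_probability_model(models)
bma = bma_prediction(models, predictions)
```

In this toy example the HPM contains two variables while the MPM contains all three, because each variable appears in models whose combined posterior probability exceeds 0.5. The MPM therefore need not coincide with any single high-probability model.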

In short, the choice of default prior was not decisive for either inference or prediction on this dataset. Future studies should be conducted on a range of datasets with different parameter sets to ascertain the degree to which the impact of default prior specification is data dependent.




DataScienceSalon/Bayesian-Regression documentation built on May 29, 2019, 12:06 a.m.