knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(BayesPD)
In Bayesian analysis, we sometimes need to compare samples drawn from two beta distributions, and this function simplifies that process. Supply the sample size and the parameters of the two beta distributions, and it returns the probability that a sample from one distribution is smaller than a sample from the other.
Two groups of parameters are required, for $beta(a_1,b_1)$ and $beta(a_2,b_2)$. Only one of the two groups may be a vector of length greater than 1.
beta_sample_comparison(sample_size = 9000, beta_a1 = 50:60, beta_b1 = 40:50, beta_a2 = 31, beta_b2 = 21, plot = TRUE)
The example supplies eleven pairs of $beta(a_1,b_1)$ parameters, and we want to know how samples from $beta(a_1,b_1)$ compare to samples from $beta(a_2,b_2)$ as $(a_1,b_1)$ changes.
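Conceptually, each of these probabilities is a simple Monte Carlo estimate. Below is a minimal base-R sketch of the same computation for a single parameter pair; the helper name is ours, not part of the package.

# Monte Carlo estimate of Pr(X1 < X2) for X1 ~ beta(a1, b1), X2 ~ beta(a2, b2)
manual_beta_comparison <- function(n, a1, b1, a2, b2) {
  mean(rbeta(n, a1, b1) < rbeta(n, a2, b2))
}
manual_beta_comparison(9000, 50, 40, 31, 21)  # the first pair from the example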
Given the data: the number of children of men in their 30s with and without bachelor's degrees, respectively.
Let $\theta_A$ and $\theta_B$ be the mean number of children in each group. Using a Poisson sampling model, a $gamma(2, 1)$ prior for each $\theta$, and the data, we obtain $5000$ samples of $\tilde{Y}_A$ and $\tilde{Y}_B$ from the posterior predictive distributions of the two groups. The first two figures show the Monte Carlo approximations to these two posterior predictive distributions.
out <- gamma_poisson_ppd(sample_size = 5000, gamma_a1 = 2, gamma_b1 = 1, y1 = menchild30bach, gamma_a2 = 2, gamma_b2 = 1, y2 = menchild30nobach, confidence_interval = c(0.025, 0.975), using_MCMC = TRUE, poisson_fitting_mean = 1.4)
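Behind a call like this, posterior predictive draws come from the conjugate gamma-Poisson update: with a $gamma(a, b)$ prior and counts $y_1, \ldots, y_n$, the posterior of $\theta$ is $gamma(a + \sum y_i, b + n)$. A minimal sketch, assuming menchild30bach is a numeric vector of counts as in the call above (the helper name is ours):

# S posterior predictive draws under the gamma-Poisson model
ppd_draws <- function(S, a, b, y) {
  theta <- rgamma(S, a + sum(y), b + length(y))  # posterior draws of theta
  rpois(S, theta)                                # one predictive draw per theta
}
ytilde_A <- ppd_draws(5000, 2, 1, menchild30bach)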
The black lines in the third and fourth figures are the empirical distributions, and the red lines are the Poisson distribution with mean 1.4. As we can see, the Poisson model with mean 1.4 does not fit the data well.
For each of the $5000$ $\theta_B$ values we sampled, we sample $218$ Poisson random variables and count the number of $0$s and the number of $1$s in each of the $5000$ simulated datasets. This gives two sequences of length $5000$: one counting the number of people with zero children in each posterior predictive dataset, the other counting the number of people with one child. The fifth figure plots the two sequences against one another (one on the x-axis, one on the y-axis), and the red point marks how many people in the observed dataset had zero children and one child.
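A minimal sketch of this posterior predictive check, assuming menchild30nobach is the vector of $218$ observed counts for group B:

# For each posterior draw of theta_B, simulate 218 Poisson counts and
# record how many are 0 and how many are 1
theta_B <- rgamma(5000, 2 + sum(menchild30nobach), 1 + length(menchild30nobach))
counts <- t(sapply(theta_B, function(th) {
  sim <- rpois(218, th)
  c(zeros = sum(sim == 0), ones = sum(sim == 1))
}))
plot(counts, xlab = "number of 0s", ylab = "number of 1s")
points(sum(menchild30nobach == 0), sum(menchild30nobach == 1),
       col = "red", pch = 19)  # the observed dataset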
The 95% credible interval for $\theta_B-\theta_A$ is
out$theta2_minus_theta1_quantile
And the 95% credible interval for $\tilde{Y}_B-\tilde{Y}_A$ is
out$y_tidle2_minus_ytilde1_quantile
Given two Poisson models, where the parameter of the first has a $gamma(237, 20)$ prior and the parameter of the second has a varying gamma prior, we draw $5000$ Monte Carlo samples and track how the probability that $\tilde{Y}_A>\tilde{Y}_B$ changes as the parameters of the second prior vary.
out <- posterior_predictive_sample_comparison(sample_size = 5000, gamma_a1 = 237, gamma_b1 = 20, gamma_a2 = 12*(1:100)+113, gamma_b2 = 13+(1:100), tildeya_smaller_than_tildeyb = FALSE, plot = TRUE)
The return value of the function is just the probability, one value for each setting of the second prior.
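For a single setting of the second prior, the computation reduces to comparing predictive draws. A minimal sketch (the helper name is ours, not the package's):

# Pr(Ytilde_A > Ytilde_B) when theta_A ~ gamma(a1, b1) and theta_B ~ gamma(a2, b2)
pr_ytA_gt_ytB <- function(S, a1, b1, a2, b2) {
  ytA <- rpois(S, rgamma(S, a1, b1))  # predictive draws for group A
  ytB <- rpois(S, rgamma(S, a2, b2))  # predictive draws for group B
  mean(ytA > ytB)
}
pr_ytA_gt_ytB(5000, 237, 20, 12 * 1 + 113, 13 + 1)  # the first grid point above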
Given the data: school1, school2, and school3, containing the amount of time students from three high schools spent on studying or homework during an exam period. We use the normal model with a conjugate prior distribution, in which $\mu_0 = 5$, $\sigma_0^2 = 4$, $\kappa_0 = 1$, and $\nu_0 = 2$.
out <- normal_gamma_conjugate_family(sample_size = 10000, mu_0 = 5, sigma_0_sequare = 4, kappa_0 = 1, nu_0 = 2, y.1 = school1, y.2 = school2, y.3 = school3, confidence_interval = c(0.025, 0.975))
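For one school, the draws behind this call follow the standard conjugate normal-model update. A minimal sketch, assuming school1 is a numeric vector as in the call above (all helper names are ours):

# Joint posterior draws of (theta, sigma^2) under the conjugate normal model
normal_conjugate_draws <- function(S, y, mu0, s20, k0, nu0) {
  n <- length(y); ybar <- mean(y); s2 <- var(y)
  kn  <- k0 + n
  nun <- nu0 + n
  mun <- (k0 * mu0 + n * ybar) / kn
  s2n <- (nu0 * s20 + (n - 1) * s2 + k0 * n * (ybar - mu0)^2 / kn) / nun
  sigma2 <- 1 / rgamma(S, nun / 2, nun * s2n / 2)  # inverse-gamma draws of sigma^2
  theta  <- rnorm(S, mun, sqrt(sigma2 / kn))       # theta | sigma^2, y
  list(theta = theta, sigma2 = sigma2)
}
draws1 <- normal_conjugate_draws(10000, school1, 5, 4, 1, 2)
mean(draws1$theta)  # compare with out$inference.1$theta_bar.1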
The posterior means of $\theta$ are:
out$inference.1$theta_bar.1
out$inference.2$theta_bar.2
out$inference.3$theta_bar.3
The posterior means of $\sigma$ are:
out$inference.1$sigma_bar.1
out$inference.2$sigma_bar.2
out$inference.3$sigma_bar.3
The 95% confidence intervals for the mean $\theta$ are:
out$inference.1$confidence_interval_theta.1
out$inference.2$confidence_interval_theta.2
out$inference.3$confidence_interval_theta.3
The 95% confidence intervals for the standard deviation $\sigma$ are:
out$inference.1$confidence_interval_sigma_bar.1
out$inference.2$confidence_interval_sigma_bar.2
out$inference.3$confidence_interval_sigma_bar.3
The posterior probability that $\theta_1<\theta_2<\theta_3$ (the indices 1, 2, 3 may be permuted freely) is:
out$theta_smaller.1.2.3
The posterior probability that $\tilde{Y}_1<\tilde{Y}_3<\tilde{Y}_2$ (any of the six permutations works as well) is:
out$y_tilde_smaller.1.3.2
The posterior probability that $\theta_1$ is bigger than both $\theta_2$ and $\theta_3$ is:
out$theta_biggest.1
The posterior probability that $\tilde{Y}_1$ is bigger than both $\tilde{Y}_2$ and $\tilde{Y}_3$ is:
out$y_tilde_biggest.1
Thirty-two students in a science classroom were randomly assigned to one of two study methods, $A$ and $B$, so that $n_A = n_B = 16$ students were assigned to each method. After several weeks of study, students were examined on the course material with an exam designed to give an average score of $75$ with a standard deviation of $10$. The scores for the two groups are summarized by $\bar{y}_A = 75.2$, $s_A = 7.3$ and $\bar{y}_B = 77.5$, $s_B = 8.1$. Consider independent, conjugate normal prior distributions for each of $\theta_A$ and $\theta_B$, with $\mu_0 = 75$ and $\sigma_0^2 = 100$ for both groups. For each $(\kappa_0, \nu_0) \in \{(1,1),(2,2),(4,4),(8,8),(16,16),(32,32)\}$ (or more values), obtain $Pr(\theta_A < \theta_B\vert y_A, y_B)$ via Monte Carlo sampling. The function plots this probability as a function of $\kappa_0 = \nu_0$ and returns the array of probabilities.
sensitivity_analysis(sample_size = 10000, y_bar.1 = 75.2, standard_deviation.1 = 7.3, y_bar.2 = 77.5, standard_deviation.2 = 8.1, mu_0 = 75, sigma_0_square = 100, n = 16, kappa_0 = c(1,2,4,8,16,32), nu_0 = c(1,2,4,8,16,32))
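Each point on that curve can be reproduced from the summary statistics alone, using the same conjugate update as above but with $\bar{y}$ and $s$ supplied directly. A minimal sketch for $\kappa_0 = \nu_0 = 1$ (the helper name is ours):

# Posterior draws of theta from summary statistics (ybar, s, n)
theta_draws_from_stats <- function(S, ybar, s, n, mu0, s20, k0, nu0) {
  kn  <- k0 + n; nun <- nu0 + n
  mun <- (k0 * mu0 + n * ybar) / kn
  s2n <- (nu0 * s20 + (n - 1) * s^2 + k0 * n * (ybar - mu0)^2 / kn) / nun
  sigma2 <- 1 / rgamma(S, nun / 2, nun * s2n / 2)
  rnorm(S, mun, sqrt(sigma2 / kn))
}
thA <- theta_draws_from_stats(10000, 75.2, 7.3, 16, 75, 100, 1, 1)
thB <- theta_draws_from_stats(10000, 77.5, 8.1, 16, 75, 100, 1, 1)
mean(thA < thB)  # Pr(theta_A < theta_B | y_A, y_B) at kappa_0 = nu_0 = 1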
Given the data: the number of children of men in their 30s with and without bachelor's degrees, respectively.
We'll assume Poisson sampling models for the two groups and parameterize $\theta_A$ and $\theta_B$ as $\theta_A = \theta$, $\theta_B = \theta\times\gamma$. In this parameterization, $\gamma$ represents the relative rate $\theta_B/\theta_A$. Let $\theta\sim gamma(2, 1)$ and let $\gamma\sim gamma(a_\gamma, b_\gamma)$, with $a_\gamma = b_\gamma \in \{8, 16, 32, 64, 128\}$. For each of these five values, we run a Gibbs sampler of at least $5000$ iterations, and the function plots $E[\theta_B-\theta_A\vert \boldsymbol{y}_A, \boldsymbol{y}_B]$ against $a_\gamma$.
two_prior_sensitivity(sample_size = 5000, gamma_theta_a = 2, gamma_theta_b = 1, y.1 = menchild30bach, y.2 = menchild30nobach, gamma_gamma_a = 2^(seq(3, 7)), gamma_gamma_b = 2^(seq(3, 7)))
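Both full conditionals in this parameterization are gamma distributions, so the Gibbs sampler is straightforward. A minimal sketch for the single value $a_\gamma = b_\gamma = 8$ (the function name is ours):

# Gibbs sampler with full conditionals
#   theta | gamma, y ~ gamma(a_t + sum(yA) + sum(yB), b_t + nA + nB * gamma)
#   gamma | theta, y ~ gamma(a_g + sum(yB),           b_g + nB * theta)
gibbs_theta_gamma <- function(S, yA, yB, a_t, b_t, a_g, b_g) {
  nA <- length(yA); nB <- length(yB)
  gam <- 1  # starting value
  out <- matrix(NA_real_, S, 2, dimnames = list(NULL, c("theta", "gamma")))
  for (s in seq_len(S)) {
    theta <- rgamma(1, a_t + sum(yA) + sum(yB), b_t + nA + nB * gam)
    gam   <- rgamma(1, a_g + sum(yB), b_g + nB * theta)
    out[s, ] <- c(theta, gam)
  }
  out
}
sims <- gibbs_theta_gamma(5000, menchild30bach, menchild30nobach, 2, 1, 8, 8)
mean(sims[, "theta"] * sims[, "gamma"] - sims[, "theta"])  # E[theta_B - theta_A | y]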
Given the data: a study that followed 25 married couples over a period of five years. One item of interest is the relationship between divorce rates and the various characteristics of the couples. The data divorce consist of two columns: the first column is the age differential, recorded as the man's age minus the woman's age, and the second is the binary divorce indicator. The binary variable $Y_i$ is described in terms of an explanatory variable $x_i$ via the following latent variable model:
$$
Z_i = \beta x_i + \epsilon_i \\
Y_i = \delta_{(c,\infty)}(Z_i)
$$
where $\beta$ and $c$ are unknown coefficients, $\epsilon_1, \ldots, \epsilon_n \sim$ i.i.d. $normal(0, 1)$, and $\delta_{(c,\infty)}(z) = 1$ if $z > c$ and equals zero otherwise. Assume $\beta\sim normal(0, 16)$ and $c \sim normal(0, 16)$.
Implement a Gibbs sampling scheme that approximates the joint posterior distribution of $Z$, $\beta$, and $c$. Run the Gibbs sampler long enough that the effective sample sizes of all unknown parameters, including the $Z_i$'s, are greater than 1,000.
out <- gibbs_effective(sample_size = 30000, x = divorce$V1, y = divorce$V2, tau_beta_square = 16, tau_c_square = 16)
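The scheme alternates between the latent $Z_i$'s, $\beta$, and $c$, each drawn from a (truncated) normal full conditional. A minimal sketch, assuming divorce$V1 is the age differential and divorce$V2 the divorce indicator (all helper names are ours; the last line uses the coda package):

# Draws from normal(m, sd) truncated to (lo, hi), by inverting the CDF
rtnorm <- function(m, sd, lo, hi) {
  qnorm(runif(length(m), pnorm(lo, m, sd), pnorm(hi, m, sd)), m, sd)
}
gibbs_probit <- function(S, x, y, tau2_beta, tau2_c) {
  n <- length(y)
  beta <- 0; cc <- 0  # starting values
  keep <- matrix(NA_real_, S, 2, dimnames = list(NULL, c("beta", "c")))
  for (s in seq_len(S)) {
    # z_i | beta, c, y_i: normal(beta * x_i, 1), truncated at c according to y_i
    lo <- ifelse(y == 1, cc, -Inf)
    hi <- ifelse(y == 1, Inf, cc)
    z  <- rtnorm(beta * x, 1, lo, hi)
    # beta | z: conjugate update with prior normal(0, tau2_beta)
    v    <- 1 / (1 / tau2_beta + sum(x^2))
    beta <- rnorm(1, v * sum(x * z), sqrt(v))
    # c | z, y: normal(0, tau2_c) truncated between the two outcome groups
    cc <- rtnorm(0, sqrt(tau2_c), max(z[y == 0]), min(z[y == 1]))
    keep[s, ] <- c(beta, cc)
  }
  keep
}
sims <- gibbs_probit(30000, divorce$V1, divorce$V2, 16, 16)
coda::effectiveSize(sims)  # check that the effective sample sizes exceed 1,000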
The autocorrelation function of $\beta$ is:
acf(out$beta)
The autocorrelation function of $c$ is:
acf(out$c)
The autocorrelation function of $z$ is:
acf(out$z)
A 95% posterior confidence interval for $\beta$ is:
out$beta_confidence_interval
The probability that $\beta>0$ given the data is:
out$probability