In lgaborini/rstanBF: rstanBF computes the Bayes Factor for data in specified two-level hierarchical models

\newcommand{\hr}[1]{{#1}{\text{ref}}} \newcommand{\hq}[1]{{#1}{\text{quest}}} \newcommand{\hrq}[1]{{#1}_{\text{ref,quest}}} \renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\tot}[1]{\overline{#1}} \newcommand{\simiid}{\overset{iid}{~\sim~}} \newcommand{\BF}{\mathrm{BF}} \newcommand{\indep}{~\perp!!!\perp~} \newcommand{\EE}{\mathop{\mathbb{E}}} \newcommand{\Var}{\mathop{\mathrm{Var}}} \newcommand{\dirichlet}[1]{\text{Dirichlet}{(#1)}}

set.seed(123)

knitr::opts_chunk$set(
  collapse = TRUE,
  cache = TRUE,
  comment = "#>",
  fig.width = 7,
  cache.extra = knitr::rand_seed,
  autodep=TRUE
)

library(rstanBF)
library(rsamplestudy)
library(dplyr)

Dirichlet-Dirichlet model

This package implements the Dirichlet-Dirichlet model.

In this package, the model is referred by its short name: 'DirDir'.

Model

Consider Dirichlet samples $X_i$ from $m$ different sources. Each source is sampled $n$ times.
Dirichlet parameter are also assumed to be sampled from Dirichlet distributions.

$\vec{X}_{ij} \mid \vec{\theta_i} \sim \dirichlet{\vec{\theta}_i}$ iid $\forall j = 1, \ldots, n$ with $i = 1, \ldots, m$
$\vec{\theta}_i \mid \vec{\alpha} \sim \dirichlet{\vec{\alpha}} \quad \forall i = 1, \ldots, m$

We assume that $\vec{\alpha}$ is known.

Data generation

Let's generate some data using rsamplestudy:

n <- 50        # Number of items per source
m <- 5         # Number of sources
p <- 4         # Number of variables per item

# The generating hyperparameter: components not too small
alpha <- c(1, 0.7, 1.3, 1)   
list_pop <- fun_rdirichlet_population(n, m, p, alpha = alpha)

Data:

$p = r p$
$n = r n$ samples from $m = r m$ sources: in total, $r n*m$ samples.
$\vec{\alpha} = \left( r alpha \right)$

list_pop$df_pop %>% head(10)

Sources:

list_pop$df_sources %>% head(10)

Background data

We then need to simulate a situation where two sets of samples (reference and questioned data) are recovered.
These sets are compared, and the results evaluated using the Bayesian approach.

We want to evaluate two hypotheses:

$H_1$: the samples come from the same source
$H_2$: the samples come from different sources

One can generate these sets from the full data using rsamplestudy.

Let's fix the reference source:

k_ref <- 10
k_quest <- 10
source_ref <- sample(list_pop$df_pop$source, 1)

# The reference source
source_ref

list_samples <- make_dataset_splits(list_pop$df_pop, k_ref, k_quest, source_ref = source_ref)

# The true (unseen) questioned sources
list_samples$df_questioned$source

Hyperpriors

We want to use the background data to evaluate the hyperpriors, i.e., the between-source Dirichlet parameter.

Notice that we have r nrow(list_samples$df_background) samples, with r nrow(list_pop$df_pop) as the population size. This should ensure that our hyperparameter estimates are good enough.

Within-source parameters

One can start to estimate the parameters for each source, although this is not required by the rstanBF package.

According to the model, this should be an estimate of the Dirichlet parameters $\vec{\theta}_i$.

The package contains functions to compute ML estimators of Dirichlet parameters.
For now, they are hidden from the general namespace, but can still be called if needed.

Naive estimator:

df_sources_est <- rstanBF:::fun_estimate_Dirichlet_from_samples(list_samples$df_background, use = 'naive')
df_sources_est

ML estimator:

df_sources_est_ML <- rstanBF:::fun_estimate_Dirichlet_from_samples(list_samples$df_background, use = 'ML')
df_sources_est_ML

Compare with the real sources $\vec{\theta}_i$:

list_pop$df_sources

The ML estimator is much more accurate.

Between-source parameters

We suppose that the within-source ML estimates are samples from the same hyper-source, so the procedure can be repeated.

We use again the ML estimator:

df_alpha_ML_ML <- df_sources_est_ML %>% 
  dplyr::select(-source) %>% 
  rstanBF:::fun_estimate_Dirichlet_from_single_source(use = 'ML')

df_alpha_ML_ML

Compare with the real Dirichlet hyperparameter:

$\vec{\alpha} = \left( r list_pop$alpha \right)$

rstanBF provides stanBF_elicit_hyperpriors, a function to estimate the hyperparameters according to a specified model:

list_hyper <- stanBF_elicit_hyperpriors(list_samples$df_background, 'DirDir', mode_hyperparameter = 'ML')
list_hyper

Posterior sampling

Once all is set, one can run the HMC sampler:

list_data <- stanBF_prepare_rsamplestudy_data(list_pop, list_samples)

obj_StanBF <- compute_BF_Stan(data = list_data, model = 'DirDir', hyperpriors = list_hyper)

The results:

obj_StanBF

Posteriors:

plot_posteriors(obj_StanBF)

Prior and posterior predictive checks

Get the posterior predictive samples under all hypotheses:

df_posterior_pred <- posterior_pred(obj_StanBF)
head(df_posterior_pred, 10)

lgaborini/rstanBF documentation built on March 10, 2021, 1:12 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lgaborini/rstanBF
rstanBF computes the Bayes Factor for data in specified two-level hierarchical models

In lgaborini/rstanBF: rstanBF computes the Bayes Factor for data in specified two-level hierarchical models

Dirichlet-Dirichlet model

Model

Data generation

Background data

Hyperpriors

Within-source parameters

Between-source parameters

Posterior sampling

Prior and posterior predictive checks

R Package Documentation

Browse R Packages

We want your feedback!

lgaborini/rstanBF rstanBF computes the Bayes Factor for data in specified two-level hierarchical models

In lgaborini/rstanBF: rstanBF computes the Bayes Factor for data in specified two-level hierarchical models

Dirichlet-Dirichlet model

Model

Data generation

Background data

Hyperpriors

Within-source parameters

Between-source parameters

Posterior sampling

Prior and posterior predictive checks

R Package Documentation

Browse R Packages

We want your feedback!

lgaborini/rstanBF
rstanBF computes the Bayes Factor for data in specified two-level hierarchical models