\newcommand{\simplex}[1]{\text{Simplex}{(#1)}} \newcommand{\dirichlet}[1]{\text{Dirichlet}{(#1)}} \renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\EE}{\mathop{\mathbb{E}}} \newcommand{\Var}{\mathop{\mathrm{Var}}}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(purrr)
library(rsamplestudy)
source('scatter_matrix_simplex.R')

A study consists of the parameters of a model, the data generated from those parameters (the population), and three sets of samples extracted from that population: the reference, questioned, and background items.

This package implements the generation of selected studies.
This vignette describes the Dirichlet-Dirichlet model.

The model

Consider Dirichlet samples $X_i$ from $m$ different sources, where each source is sampled $n$ times.

We assume that $\vec{\alpha}$ is known.
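For reference, a two-level hierarchy consistent with this description can be written as below. This is a sketch under assumptions: the source-level parameter $\vec{\theta}_i$ and the double index on $\vec{X}_{ij}$ are notation introduced here for illustration, not taken from the package documentation.

```latex
\begin{aligned}
\vec{X}_{ij} \mid \vec{\theta}_i &\sim \dirichlet{\vec{\theta}_i}, & j &= 1, \dots, n, \\
\vec{\theta}_i \mid \vec{\alpha} &\sim \dirichlet{\vec{\alpha}}, & i &= 1, \dots, m.
\end{aligned}
```

Here $i$ indexes the $m$ sources and $j$ indexes the $n$ items sampled from each source.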

Population generation

The population can be generated using fun_rdirichlet_population:

# Population parameters:
# Number of items per source
n <- 10
# Number of sources
m <- 20
# Number of variables per item (dimension of the Dirichlet)
p <- 4

list_pop <- fun_rdirichlet_population(n, m, p)

The output is a list whose elements include df_pop and df_sources, both previewed below.

Notice that the hyperparameter is sampled too, although it can also be fixed.

head(list_pop$df_pop)
head(list_pop$df_sources)
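To make the two-level sampling concrete, the following base-R sketch generates a small Dirichlet-Dirichlet population by hand. It is not the package's implementation: the helpers rdirichlet_one and sketch_population, the element name theta, and the column layout of the result are all hypothetical and chosen only for illustration. Dirichlet draws are obtained by normalising independent Gamma variates.

```r
# Draw one Dirichlet(alpha) vector by normalising independent Gamma variates.
rdirichlet_one <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# Hypothetical helper (not part of rsamplestudy):
# m sources, n items per source, p variables per item.
# If alpha is not supplied, it is drawn from Dirichlet(1, ..., 1).
sketch_population <- function(n, m, p, alpha = rdirichlet_one(rep(1, p))) {
  # One Dirichlet parameter vector per source (m x p matrix).
  theta <- t(replicate(m, rdirichlet_one(alpha)))
  # n items per source, each drawn from Dirichlet(theta_i).
  items <- lapply(seq_len(m), function(i) {
    x <- t(replicate(n, rdirichlet_one(theta[i, ])))
    data.frame(source = i, x)
  })
  list(alpha = alpha, theta = theta, df_pop = do.call(rbind, items))
}

set.seed(1)
pop <- sketch_population(n = 10, m = 20, p = 4)
dim(pop$df_pop)  # 200 rows (m * n items), 5 columns (source + p variables)
```

Passing an explicit alpha to sketch_population fixes the hyperparameter instead of sampling it.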

Hyperparameters

We assume that the Dirichlet hyperparameter (the level farthest from the data) comes from the uniform distribution on the $(p-1)$-simplex.
In other words, we will sample the Dirichlet hyperparameter from the $\dirichlet{\vec{1}}$ distribution.

The shortcut function in the package is fun_rdirichlet_hyperparameter:

df_diri <- purrr::map_dfr(1:300, ~ fun_rdirichlet_hyperparameter(3))
scatter_matrix_simplex(df_diri)
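The equivalence between the uniform distribution on the simplex and $\dirichlet{\vec{1}}$ can be checked with a few lines of base R. The sketch below is independent of the package: runif_simplex is a hypothetical helper that normalises independent Exp(1) variates, which is one standard way to obtain Dirichlet draws with all parameters equal to 1.

```r
# Hypothetical helper (not part of rsamplestudy):
# k uniform draws on the (p-1)-simplex, one per row.
runif_simplex <- function(k, p) {
  e <- matrix(rexp(k * p, rate = 1), nrow = k, ncol = p)
  e / rowSums(e)
}

set.seed(42)
draws <- runif_simplex(300, 3)

range(rowSums(draws))  # every row sums to 1, up to floating point
colMeans(draws)        # each component has expectation 1/3
```

With 300 draws the column means should sit close to 1/3, which is the mean of each component under $\dirichlet{\vec{1}}$ for $p = 3$.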

Partitioning

Once the population is generated, the reference/questioned/background samples must be extracted.
This is generically done using make_dataset_splits:

k_ref <- 10
k_quest <- 5

list_samples <- make_dataset_splits(list_pop$df_pop, k_ref, k_quest)
names(list_samples)
head(list_samples$df_reference)
head(list_samples$df_questioned)
head(list_samples$df_background)
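As an illustration of what such a partition involves, here is a minimal base-R sketch. It is not how make_dataset_splits is implemented, and its rules are assumptions made for this example: reference items come from a single source, questioned items are drawn from all remaining items, and everything left over forms the background. The function split_sketch, its source_ref argument, and the toy data frame are hypothetical.

```r
# Hypothetical splitter (not part of rsamplestudy).
# df must contain a column named `source`.
split_sketch <- function(df, k_ref, k_quest, source_ref = NULL) {
  # Pick the reference source at random unless it is fixed by the caller.
  if (is.null(source_ref)) {
    srcs <- unique(df$source)
    source_ref <- srcs[sample.int(length(srcs), 1)]
  }
  # Reference items: k_ref rows from the reference source.
  pool_ref <- which(df$source == source_ref)
  stopifnot(length(pool_ref) >= k_ref)
  idx_ref <- pool_ref[sample.int(length(pool_ref), k_ref)]
  # Questioned items: k_quest rows from everything not used as reference.
  idx_rest <- setdiff(seq_len(nrow(df)), idx_ref)
  stopifnot(length(idx_rest) >= k_quest)
  idx_quest <- idx_rest[sample.int(length(idx_rest), k_quest)]
  # Background items: all remaining rows.
  idx_back <- setdiff(idx_rest, idx_quest)
  list(
    df_reference  = df[idx_ref, ],
    df_questioned = df[idx_quest, ],
    df_background = df[idx_back, ]
  )
}

set.seed(7)
toy <- data.frame(source = rep(1:3, each = 10), x = runif(30))
parts <- split_sketch(toy, k_ref = 4, k_quest = 5)
sapply(parts, nrow)  # 4 reference, 5 questioned, 21 background rows
```

The three pieces are disjoint and together cover the whole population, so their row counts always add up to nrow(df).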

Source parameters

The sources used for the samples can be fixed instead of being chosen automatically; see the documentation for make_dataset_splits.



lgaborini/rsamplestudy documentation built on March 6, 2021, 3:18 p.m.