calibratedva: Performs Gibbs sampling for calibration
In jfiksel/CalibratedVA: CalibratedVA

Description

Takes estimated causes (or cause probabilities) for a representative set of deaths without labels and an unrepresentative set of deaths with labels, and estimates the calibrated cause-specific mortality fraction (CSMF).
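As a point of reference for what is being calibrated, the sketch below computes a naive, uncalibrated CSMF as the column means of a matrix of predicted cause probabilities. The cause names and numbers are invented for illustration, and this is not a call into the package itself.

```r
# Toy predicted cause probabilities for 4 unlabeled deaths and 3 causes.
# Cause names and values are made up for this illustration.
causes <- c("cause_a", "cause_b", "cause_c")
va_unlabeled <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.3, 0.3, 0.4,
    0.5, 0.5, 0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, causes)
)

# Each row is a probability vector, so it should sum to 1.
stopifnot(all(abs(rowSums(va_unlabeled) - 1) < 1e-8))

# Naive (uncalibrated) CSMF: average predicted probability per cause.
csmf_uncalibrated <- colMeans(va_unlabeled)
print(csmf_uncalibrated)
```

The calibrated estimate returned by the sampler would adjust these fractions using the labeled deaths.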

Usage

calibratedva(
  va_unlabeled,
  va_labeled = NULL,
  gold_standard = NULL,
  causes,
  method = c("mshrink", "pshrink"),
  nchains = 3,
  ndraws = 10000,
  burnin = 1000,
  thin = 1,
  pseudo_samplesize = 100,
  alpha = 5,
  beta = 0.5,
  lambda = 1,
  delta = 1,
  epsilon = 0.001,
  tau = 0.5,
  which.multimodal = "all",
  which.rhat = "all",
  print.chains = FALSE,
  init.seed = 123
)
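A minimal sketch of a call using the signature above, with simulated top-cause inputs. The cause names, sample sizes, and the helper top_cause_matrix() are assumptions made for illustration, and the sampler only runs if the CalibratedVA package happens to be installed.

```r
set.seed(42)
causes <- c("cause_a", "cause_b", "cause_c")  # invented cause names
k <- length(causes)

# Hypothetical helper: 0/1 top-cause rows taken from an identity matrix.
top_cause_matrix <- function(idx, k) diag(k)[idx, , drop = FALSE]

n_unlabeled <- 200
n_labeled   <- 50
true_idx <- sample(k, n_labeled, replace = TRUE)
# Simulated algorithm that is usually, but not always, right.
pred_idx <- ifelse(runif(n_labeled) < 0.8, true_idx,
                   sample(k, n_labeled, replace = TRUE))

va_unlabeled  <- top_cause_matrix(sample(k, n_unlabeled, replace = TRUE), k)
va_labeled    <- top_cause_matrix(pred_idx, k)
gold_standard <- top_cause_matrix(true_idx, k)
colnames(va_unlabeled) <- colnames(va_labeled) <- colnames(gold_standard) <- causes

# Only run the sampler when the package is available.
if (requireNamespace("CalibratedVA", quietly = TRUE)) {
  fit <- CalibratedVA::calibratedva(
    va_unlabeled  = va_unlabeled,
    va_labeled    = va_labeled,
    gold_standard = gold_standard,
    causes        = causes,
    method        = "mshrink",
    nchains       = 3,
    ndraws        = 2000,
    burnin        = 500
  )
  print(fit$rhat_max)
}
```

The shorter chain length here is only to keep the sketch quick; the documented defaults are 10,000 draws with 1,000 burn-in samples.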

Arguments

`va_unlabeled`	When using cause of death predictions from a single algorithm, this is a matrix in which each row gives the predicted cause of death probabilities for an individual death, for individuals without cause of death labels. Each column represents a cause. If using the top cause, one entry in each row should be 1 and the rest should be 0. When using predictions from multiple algorithms for the ensemble approach, this should be a list of matrices with algorithm predictions for the same individuals, where each element of the list holds the predictions from one algorithm. See examples for more information.
`va_labeled`	A matrix or list in the same format as `va_unlabeled`, but for individuals with labeled causes of death. If there are no individuals with labeled causes, leave as NULL.
`gold_standard`	A matrix where each row represents either the true cause for an individual with a labeled cause of death (i.e. if the label for individual i is cause j, then gold_standard[i,j] will be 1, and the other entries of that row will be 0), or the probabilities that each individual died of a certain cause. The rows of `gold_standard` should correspond to the rows of `va_labeled` (or the rows of each element of `va_labeled` if it is a list).
`causes`	A character vector with the names of the causes. These should correspond to the columns of `va_unlabeled`, `va_labeled`, and `gold_standard`.
`method`	One of either "mshrink" (default) for M-shrinkage or "pshrink" for p-shrinkage
`nchains`	The number of chains. Default is 3
`ndraws`	Number of draws in each chain. Default is 10,000
`burnin`	Number of burnin samples. Default is 1,000
`thin`	Thinning parameter. Default is 1 (no thinning).
`pseudo_samplesize`	The number of pseudo samples (T) used for the Gibbs Sampler using rounding and coarsening. Default is 100.
`alpha`	A numeric value for the alpha in the prior of gamma when using M-shrinkage. Higher values (relative to beta) lead to more shrinkage. Default is 5. If using the ensemble model, a vector of length K can be used (where K is the number of algorithms).
`beta`	A numeric value for the beta in the prior of gamma when using M-shrinkage. Default is .5.
`lambda`	A numeric value for the lambda in the prior of p for p-shrinkage. Higher values lead to more shrinkage. Default is 1.
`delta`	A numeric value for the delta in the prior of p. Only used for M-shrinkage sampling. Default is 1.
`epsilon`	A numeric value for the epsilon in the prior of M. Default is .001.
`which.multimodal`	A character specifying whether both p and M (which.multimodal = "all") should be evaluated for multimodality, or just p (which.multimodal = "p")
`which.rhat`	A character specifying whether both p and M (which.rhat = "all") should be evaluated for convergence, or just p (which.rhat = "p")
`print.chains`	A logical scalar which says whether or not you want the progress of the sampling printed to the screen. Default is FALSE
`init.seed`	The initial seed for sampling. Default is 123.
`tau`	A numeric vector for the log standard deviation for the sampling distributions of the gammas. Only used for M-shrinkage sampling. Default is 0.5.
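The ensemble input format described for `va_unlabeled` can be sketched as a named list of matrices, one per algorithm, all covering the same deaths. The algorithm names, the values, and the helper check_va_input() are hypothetical and are not part of CalibratedVA; the helper only checks the layout rules stated in the argument descriptions.

```r
causes <- c("cause_a", "cause_b", "cause_c")  # invented cause names

# Predictions from two hypothetical algorithms for the same 3 unlabeled deaths.
algo_1 <- matrix(c(1, 0, 0,
                   0, 1, 0,
                   0, 0, 1),
                 nrow = 3, byrow = TRUE, dimnames = list(NULL, causes))
algo_2 <- matrix(c(0.6, 0.3, 0.1,
                   0.2, 0.7, 0.1,
                   0.1, 0.1, 0.8),
                 nrow = 3, byrow = TRUE, dimnames = list(NULL, causes))
va_unlabeled <- list(algo_1 = algo_1, algo_2 = algo_2)

# Hypothetical helper: accepts a single matrix or a list of matrices and
# checks equal row counts, matching column names, and rows summing to 1.
check_va_input <- function(x, causes) {
  mats <- if (is.list(x)) x else list(x)
  same_rows <- length(unique(vapply(mats, nrow, integer(1)))) == 1
  cols_ok <- all(vapply(mats, function(m) identical(colnames(m), causes),
                        logical(1)))
  rows_ok <- all(vapply(mats, function(m) all(abs(rowSums(m) - 1) < 1e-8),
                        logical(1)))
  same_rows && cols_ok && rows_ok
}

print(check_va_input(va_unlabeled, causes))

# With K = 2 algorithms, alpha may be given as a vector of length K.
alpha <- c(5, 5)
```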

Value

A list with the following components.

samples: An mcmc.list object containing the posterior samples for p, M, and gamma (if using M-shrinkage)
A_U: The value of va_unlabeled used for the posterior samples
A_L: The value of va_labeled used for the posterior samples
G_L: The value of gold_standard used for the posterior samples
method: The method used for shrinkage (either mshrink or pshrink)
waic: The estimated WAIC for the calibrated posterior
waic_uncalib: The estimated WAIC for the uncalibrated posterior
multimodal: Either TRUE or FALSE, indicating whether or not the posterior samples for p (and potentially M) are multimodal
rhat_max: The maximum rhat for p (and potentially M), which can be used for evaluating convergence
alpha: The value(s) of alpha used (if method = "mshrink")
beta: The value of beta used (if method = "mshrink")
lambda: The value of lambda used (if method = "pshrink")
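One way the returned components might be inspected is sketched below on mocked values rather than a real fit. The draws matrix stands in for pooled posterior samples of p (with a real result, coda's as.matrix() on the `samples` component would give a comparable draws-by-parameter matrix); the column names and the 1.1 cutoff for `rhat_max` are assumptions, the latter being a common rule of thumb rather than something the package prescribes.

```r
set.seed(7)

# Mocked posterior draws for p over 3 causes (rows sum to 1).
# Column names "p[1]".."p[3]" are assumed, not taken from the package.
raw <- matrix(rgamma(3000, shape = rep(c(40, 45, 15), each = 1000)), ncol = 3)
draws <- raw / rowSums(raw)
colnames(draws) <- paste0("p[", 1:3, "]")

# Posterior mean and 95% credible interval for each CSMF component.
summary_p <- data.frame(
  mean  = colMeans(draws),
  lower = apply(draws, 2, quantile, probs = 0.025),
  upper = apply(draws, 2, quantile, probs = 0.975)
)
print(round(summary_p, 3))

# Mocked diagnostics shaped like the rhat_max and multimodal components.
fit_diag <- list(rhat_max = 1.01, multimodal = FALSE)

# Assumed rule of thumb: treat rhat_max below 1.1 as acceptable.
converged <- fit_diag$rhat_max < 1.1 && !fit_diag$multimodal
print(converged)
```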