# crossvalidation: Loss Calculation by Cross Validation In ganluan123/FlagAE: Flag adverse events by Bayesian methods

## Description

The functions here calculate the loss by cross validation for the Bayesian hierarchical model (see also Hier) and the Bayesian model with Ising prior (see also Ising). They can be used to select the best hyperparameters and to compare the two models.

## Usage

```r
Lossfun(aedata, PI)

kfdpar(adsl, adae, k)

CVhier(AElist, n_burn, n_iter, thin, n_adapt, n_chain,
       alpha.gamma = 3, beta.gamma = 1, alpha.theta = 3, beta.theta = 1,
       mu.gamma.0.0 = 0, tau.gamma.0.0 = 0.1, alpha.gamma.0.0 = 3,
       beta.gamma.0.0 = 1, lambda.alpha = 0.1, lambda.beta = 0.1,
       mu.theta.0.0 = 0, tau.theta.0.0 = 0.1, alpha.theta.0.0 = 3,
       beta.theta.0.0 = 1)

CVising(AElist, n_burn, n_iter, thin, alpha_ = 0.25, beta_ = 0.75,
        alpha.t = 0.25, beta.t = 0.75, alpha.c = 0.25, beta.c = 0.75,
        rho, theta)
```

## Arguments

- `aedata`: output from function preprocess
- `PI`: output from function Hiergetpi or Isinggetpi
- `k`: integer, the number of folds used to split the dataset for cross validation
- `n_burn`: number of burn-in iterations for Gibbs sampling
- `n_iter`: number of iterations for Gibbs sampling
- `thin`: thinning interval for Gibbs sampling; parameters are recorded every thin-th iteration
- `n_adapt`: integer, number of adaptations
- `n_chain`: number of MCMC chains
- `alpha_`: numeric, the alpha parameter of the beta prior shared by the treatment and control groups
- `beta_`: numeric, the beta parameter of the beta prior shared by the treatment and control groups
- `alpha.t`: numeric, the alpha parameter of the beta prior for the treatment group
- `beta.t`: numeric, the beta parameter of the beta prior for the treatment group
- `alpha.c`: numeric, the alpha parameter of the beta prior for the control group
- `beta.c`: numeric, the beta parameter of the beta prior for the control group
- `rho`: either a single number or a numeric vector whose length equals the number of rows of the data frame `aedata`. If it is a single number, all adverse events share the same hyperparameter rho. If it is a numeric vector, each AE has its own hyperparameter rho, and the order of the rho values should match the order of AEs in `aedata` (AEs in `aedata` should be ordered by b and j)
- `theta`: numeric; `rho` and `theta` are the parameters of the Ising prior
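As a small sketch of the two ways `rho` can be supplied (the data frame here is a hypothetical stand-in for the preprocessed AE data, not the package's actual output):

```r
# Hypothetical preprocessed AE data: one row per (SOC b, PT j) pair,
# ordered by b and then j, as the documentation requires.
aedata <- data.frame(b = c(1, 1, 2), j = c(1, 2, 1))

rho_scalar <- 1            # one rho shared by every adverse event
rho_vector <- c(1, 2, 1)   # one rho per AE, in the same order as rows of aedata

# A per-AE rho must have one entry per row of aedata.
stopifnot(length(rho_vector) == nrow(aedata))
```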

## Details

The loss is calculated as:

√(∑_{b,j} (Y_{bj} − N_t · t_{bj})²) / N_t + √(∑_{b,j} (X_{bj} − N_c · c_{bj})²) / N_c

Here b = 1, ..., B and j = 1, ..., k_b; Y_{bj} and X_{bj} are the numbers of subjects with an AE with PT j under SOC b in the treatment and control groups. N_t and N_c are the numbers of subjects in the treatment and control groups, respectively. t_{bj} and c_{bj} are the model-fitted incidences of an AE with PT j under SOC b in the treatment and control groups. This formula gives the loss for one iteration/sample; the final loss is the average of the losses over all iterations/samples.
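For a single posterior sample, the loss above can be computed directly. A minimal sketch in base R with toy numbers (this is an illustration of the formula, not the package's internal code):

```r
# Toy counts Y (treatment) and X (control) for three (SOC b, PT j) pairs,
# flattened over the (b, j) index.
Y <- c(5, 2, 8)
X <- c(4, 3, 6)

# Group sizes and model-fitted incidences for one posterior sample.
N_t  <- 100
N_c  <- 120
t_bj <- c(0.05, 0.02, 0.07)
c_bj <- c(0.03, 0.03, 0.05)

# Loss for this sample; the final loss averages this quantity over all samples.
loss <- sqrt(sum((Y - N_t * t_bj)^2)) / N_t +
        sqrt(sum((X - N_c * c_bj)^2)) / N_c
```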

The loss is calculated in the following way: first, the subjects' original AE dataset (output of preprocess) is randomly and evenly divided into k independent subparts. For each subpart, use that subpart as the testing dataset and the rest of the whole dataset as the training dataset. The model is trained on the training dataset, and the loss is then calculated for both the testing dataset and the training dataset. This is repeated for each subpart, and the final loss is the average of the testing losses and training losses across subparts.

Lossfun takes the AE dataset and the fitted incidence as parameters and calculates the loss based on the loss function above.

kfdpar first calls function preprocess to process the data and produce a temporary dataset, and also calls preprocess to get the whole AE dataset. The temporary dataset is then randomly divided into k equal subparts. For each subpart, that subpart is used as the testing dataset and the rest of the whole dataset as the training dataset. The function generates a list with k elements, where each element is itself a list containing two elements, named traindf and testdf: traindf is used to train the model and testdf is used to calculate the loss. The output is then passed to the cross-validation functions to calculate the loss.
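The splitting step can be sketched in base R as follows; this is a simplified stand-in for kfdpar (it skips the preprocess call and uses a toy data frame), kept only to show the traindf/testdf structure of the output:

```r
set.seed(1)
aedata <- data.frame(id = 1:10)   # stand-in for the preprocessed AE dataset
k <- 5

# Randomly assign each row to one of k roughly equal folds.
fold <- sample(rep_len(1:k, nrow(aedata)))

# For each fold i, the held-out rows form testdf and the rest form traindf.
AElist <- lapply(1:k, function(i) {
  list(traindf = aedata[fold != i, , drop = FALSE],
       testdf  = aedata[fold == i, , drop = FALSE])
})
```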

CVhier calculates the loss for Bayesian Hierarchical model.

CVising calculates the loss for the Bayesian model with Ising prior.

## Value

Lossfun returns the loss for dataset aedata based on the fitted incidence PI.
kfdpar returns a list with k elements, where each element is itself a list containing two elements, named traindf and testdf.
CVhier returns the final training and testing loss for the Bayesian hierarchical model.
CVising returns the final training and testing loss for the Bayesian model with Ising prior.