# R/chi_square_goodness_of_fit.R In BayesianFROC: FROC Analysis by Bayesian Approaches

#### Documented in chi_square_at_replicated_data_and_MCMC_samples_MRMCchi_square_goodness_of_fitchi_square_goodness_of_fit_from_input_all_paramchi_square_goodness_of_fit_from_input_all_param_MRMCChi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean

#'@title   \emph{\strong{Chi square goodness of
#' fit statistics}} at each MCMC sample w.r.t. a given dataset.
#'
#'
#'@description
#' Calculates a vector, consisting of \emph{\strong{the Goodness of Fit (Chi Square)}}
#' for a given dataset \eqn{D} and
#'  each posterior MCMC samples \eqn{\theta_i=\theta_i(D), i=1,2,3,....}, namely,
#'
#'
#'    \deqn{ \chi^2 (D|\theta_i)}
#'
#'
#'
#' for \eqn{i=1,2,3,....} and thus its dimension is the number of MCMC iterations..
#'
# here ----
#'Note that
#'  In MRMC cases, it is defined as follows.
#'
# \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}]^2}{N_L\times p_{c,m,r}}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c} -\lambda_{c+1} )\times N_{L} }\biggr).}
#' \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}(\theta)]^2}{N_L\times p_{c,m,r}(\theta)}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c}(\theta) -\lambda_{c+1}(\theta) )\times N_{L} }\biggr).}
#'where  a dataset \eqn{D} consists of the
#'pairs of the number of False Positives and the number of True
#' Positives  \eqn{ (F_{c,m,r}, H_{c,m,r}) }
#' together with the number of lesions \eqn{N_L}
#' and the number of images \eqn{N_I} and
#'  \eqn{\theta} denotes the model parameter.
#'
#'
#'
#'
#'
#'@details To calculate the chi square (goodness of fit) \eqn{\chi^2 (y|\theta)}
#' test statistics, the two variables
#' are required; one is an observed dataset \eqn{y}
#'  and the other is an estimated parameter \eqn{\theta}.
#'In the classical chi square values,
#' MLE(maximal likelihood estimator) is used
#'for an estimated parameter \eqn{\theta} in \eqn{\chi^2 (y|\theta)}.
#' However, in the Bayesian context,
#'the parameter is not deterministic and
#'we consider it is a random variable such as
#' samples from the posterior distribution.
#' And such samples are obtained in the Hamiltonian
#' Monte Carlo Simulation.
#' Thus we can calculate chi square values for each MCMC sample.
#'@return Chi squares for each MCMC sample.
#'\deqn{\chi^2 = \chi^2 (D|\theta_i),i=1,2,...,N}
#' So, the return values is a vector of length \eqn{N} which denotes
#'  the number of MCMC iterations
#'  except the warming up period.
#'  Of course if MCMC is not only one chain,
#'   then all samples of chains are used to calculate the chi square.
#'
#'   In the sequel, we use the notations
#'
#'    for a prior \eqn{\pi(\theta)},
#'
#'    posterior \eqn{\pi(\theta|D)},
#'
#'      likelihood \eqn{f(D|\theta)},
#'
#'      parameter \eqn{\theta},
#'
#'    datasets \eqn{D}, for example, we can write as follows;
#'
#'  \deqn{ \pi(\theta|D) \propto f(D|\theta) \pi(\theta).}
#'
#'
#'
#'  Let us denote the \strong{posterior MCMC samples} of
#'  size  \eqn{N} for a given data-set \eqn{D} by
#'
#'    \deqn{\theta_1, \theta_2, \theta_3,...,\theta_N}
#'
#'
#'    which are drawn from
#'    posterior \eqn{\pi(\theta|D)} of given data \eqn{D}.
#'
#'
#'Recall that the chi square goodness of fit statistics \eqn{\chi}
#'depends on the model parameter \eqn{\theta} and data \eqn{D},
#' namely,
#'
#'\deqn{\chi^2 = \chi^2 (D|\theta)}.
#'
#'The function calculates a vector
#'of length \eqn{N} whose components is given by:
#'
#'
#'
#'\deqn{\chi^2 (D|\theta_1), \chi^2 (D|\theta_2), \chi^2 (D|\theta_3),...,\chi^2 (D|\theta_N),}
#'
#'So, the return value is a vector of size  \eqn{N}.
#'
#'  As an application of this return value  \eqn{(\chi^2(D|\theta_i);i=1,...,N)},
#'   we  can calculate
#'    the posterior mean of \eqn{\chi = \chi (D|\theta)},
#'     namely, we get
#'
#'\deqn{ \chi^2 (D) =\int \chi^2 (D|\theta)  \pi(\theta|D)     d\theta.}
#'
#'
#' as its Monte Carlo integral
#'
#'
#'
#'       \deqn{\frac{1}{N} \sum _{i=1} ^N \chi^2(D|\theta_i),}
#'
#'
#' In my model, almost all example,
#'  result of calculation shows that
#'
#'
#'\deqn{  \int \chi^2 (D|\theta)  \pi(\theta|D)     d\theta > \chi^2 (D| \int \theta   \pi(\theta|D)     d\theta) }
#'
#'
#'The above inequality is true for all \eqn{D}?? I conjecture it.
#'
#' Revised 2019 August 18
#' Revised 2019 Sept. 1
#' Revised 2019 Nov 28

#'
#'
#'  Our data is \strong{2C categories}, that is,
#'
#'
#'  the number of hits        :h, h, h,...,h[C] and
#'
#'  the number of false alarms: f,f, f,...,f[C].
#'
#'
#'    Our model has \strong{C+2 parameters}, that is,
#'
#'    the thresholds of the bi normal assumption z,z,z,...,z[C] and
#'
#'    the mean and standard deviation of the signal distribution.
#'
#'
#'  So, the degree of freedom of this statistics is calculated by
#'
#'
#'      No. of categories  -  No. of parameters  - 1 = 2C-(C+2)-1 =C -3.
#'
#'This differ from Chakraborty's  result C-2. Why ? ... In Bayesian,  the degree of freedom is redandunt notion.
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'@param f A vector of positive integers,
#'representing the number of false alarms.
#' This variable was made in order to
#'  substitute the false alarms data drawn
#'  from the posterior predictive distributions.
#'In famous Gelman's book, he explain
#'how to use the test statistics in the
#'Bayesian context. In this context I need
#' to substitute the replication data from
#' the posterior predictive distributions.

#'@param h  A vector of positive integers,
#' representing the number of hits.
#' This variable was made in order to
#'  substitute the hits data drawn
#'  from the posterior predictive distributions.
#'In famous Gelman's book, he explain how
#'to use the test statistics in the Bayesian context.
#' In this context I need to substitute the replication
#'  data from the posterior predictive distributions.

#'@inheritParams fit_Bayesian_FROC
#'@inheritParams DrawCurves
#'@examples
#'
#' \dontrun{

# ####1#### ####2#### ####3#### ####4#### ####5#### ####6#### ####7#### ####8#### ####9####
#'#========================================================================================
#'#                Synthesize the MCMC samples from a dataset.
#'#========================================================================================
#'
#'        fit <- fit_Bayesian_FROC(BayesianFROC::dataList.Chakra.1,
#'                            ite = 1111,
#'                            summary =FALSE,
#'                            cha = 2)
#'
#'#========================================================================================
#'#   The chi square discrepancies are calculated by the following code
#'#========================================================================================
#'
#'          Chi.Square.for.each.MCMC.samples   <-   chi_square_goodness_of_fit(fit)
#'
#'
#'
#'
#'
#'#========================================================================================
#'# With Warning
#'#========================================================================================
#'
#'          chi_square_goodness_of_fit(fit)
#'
#'#========================================================================================
#'# Without warning
#'#========================================================================================
#'
#'           chi_square_goodness_of_fit(fit,
#'                                      h=fit@dataList$h, #' f=fit@dataList$f)
#'
#'
#'
#'
#'
#'
#'#========================================================================================
#'#  Get posterior mean of the chi square discrepancy.
#'#========================================================================================
#'
#'
#'                     m<-   mean(Chi.Square.for.each.MCMC.samples)
#'
#'
#'
#'#========================================================================================
#'# The author read at 2019 Sept. 1, it helps him. Thanks me!!
#'#
#'# Calculate the p-value for the posterior mean of the chi square discrepancy.
#'#========================================================================================
#'
#'                                  stats::pchisq(m,df=1)
#'
#'#========================================================================================
#'# Difference between chi sq. at EAP and EAP of chi sq.
#'#========================================================================================
#'
#'
#'    mean( fit@chisquare - chi_square_goodness_of_fit(fit))
#'
#'
#'}# dottest

#'
#' @export

chi_square_goodness_of_fit <- function(StanS4class,
dig=3,
h=StanS4class@dataList$h, f=StanS4class@dataList$f,
summary =FALSE

){
C <-StanS4class@dataList$C if (StanS4class@studyDesign=="MRMC") { return(message("Now, MRMC not be allowed")) } #This hits and false positives should be changed to the simultaion # from the posterior predictive distributions if (missing(h)) { if(summary) message("Since hits are missing, we use the data which is used to estimates the model parameter to run this function") h <-StanS4class@dataList$h
}

if (missing(f)==TRUE){
if(summary)  message("Since false alarms are missing, we use the data which is used to estimates the model parameter to run this function")
f <-StanS4class@dataList$f } NL <-StanS4class@dataList$NL
NI <-StanS4class@dataList$NI ModifiedPoisson <-(StanS4class@studyDesign =="srsc.per.lesion") if (ModifiedPoisson) NX <- NL if (!ModifiedPoisson) NX <- NI fit <-methods::as(StanS4class, "stanfit") MCMC <- length( rstan::get_divergent_iterations(fit) ) # = cha*(ite-war) # MCMC=(ite-war)*cha chisquare <- vector() #--------- chi ^2 -----------Start p<-rstan::extract(fit)$p
lambda<-rstan::extract(fit)$l ss <- array(0,dim = c(MCMC,C)) tt <- array(0,dim = c(MCMC,C)) for(cd in 1:C){ ss[,cd]<-(h[C+1-cd]-NL*p[,cd])^2/(NL*p[,cd]) if(!cd==C)tt[,cd]<-(f[C+1-cd]-NX*(lambda[,cd]-lambda[,cd+1]))^2/(NX*(lambda[,cd]-lambda[,cd+1])) if(cd==C) tt[,cd]<-(f[C+1-cd]-NX*(lambda[,cd]- 0))^2/(NX*(lambda[,cd]- 0)) } chi.Square.for.each.MCMC.samples <- apply(ss, 1,sum)+apply(ss,1, sum) chi.Square.for.each.MCMC.samples <- signif(chi.Square.for.each.MCMC.samples, digits = dig) return(chi.Square.for.each.MCMC.samples) } #'@title Calculates the Goodness of Fit (Chi Square) #'@description \strong{('; \eqn{\omega};')} #' The so-called \emph{chi square goodness of fit} is #' a function of data-set \eqn{y} #' and model parameter \eqn{\theta}, namely, \eqn{\chi(y|\theta)}. #' This function merely provides this. \strong{Detail.} But when the author reviews this today, I am surprised cuz #' this function depends on many variables and it will be hard to understand what it is. #' OK, I will enjoy to tell the audiences what the variables mean. #' First of all, what we should consider is only substitution of dataset \eqn{y} and #' model parameter \eqn{\theta} into \eqn{\chi(y|\theta)}. #' \eqn{y} is decomposed into \code{h,f,NI,NL} which mean the number of hits, false alarms, images and trials. #' \eqn{\theta} corresponds to \code{p, lambda}. #' Holy moly, I write this without any tips, lemonades and coffee! #' I love you. Today 2020 Oct 19, MCS symptoms is basically not bad, but, still aches in muscles, legs, why? #' for 3 years, too long to be patient. #' #' @details #' statistics for each MCMC sample with a fixed dataset. #' #' #' #' Our data is 2C categories, that is, #' #' #' the number of hits :h, h, h,...,h[C] and #' #' the number of false alarms: f,f, f,...,f[C]. #' #' #' Our model has C+2 parameters, that is, #' #' the thresholds of the bi normal assumption z,z,z,...,z[C] and #' #' the mean and standard deviation of the signal distribution. #' #' #' So, the degree of freedom of this statistics is calculated by #' #' #' 2C-(C+2)-1 =C -3. #' #'This differ from Chakraborty's result C-2. Why ? #' #' #' #' \strong{Remak on the verification of codes} #' To tell the truth, the author doubt that the calculation of ppp in this pkg is incorrect. #' But I cannot reveal where I am wrong. Or, I cannot exculde in 100% that my programming is correct. #' The result of ppp() is sometimes reasonable but sometimes it is against my cute intuition. #' Of curse, I am pretty cute, but why .... Uhnnn I am not sure wheter I am correct. #' So, ha. Today (2020 Oct 19), I checked the code, but it looked correct. #' #' #' #' #' #' #' #'@param f A vector of non-negative integers, indicating the number of false alarms. #' The reason why the author includes this variable is to substitute the false alarms from the posterior predictive distribution. #'In famous Gelman's book, he explain how to make test statistics in the Bayesian context, and it require the samples from posterior predictive distribution. #'So, in this variable author substitute the replication data from the posterior predictive distributions. #'@param h A vector of non-negative integers, indicating the number of hits. #'The reason why the author includes this variable is to substitute the false alarms from the posterior predictive distribution. #'In famous Gelman's book, we can access how to make test statistics in the Bayesian context, and it require the samples from posterior predictive distribution. #'So, using this variable author substitute the replication data from the posterior predictive distributions. #'@param p A vector of non-negative integers, indicating hit rate. A vector whose length is number of confidence levels. #'@param lambda A vector of non-negative integers, indicating False alarm rate. A vector whose length is number of confidence levels. #'@param NI An integer, representing Number of Images #'@param NL An integer, representing Number of Lesions #' @param is_print_each_ratings_wise A logical, whether result is printed on the R/R-studio console. #' #' #'@inheritParams fit_Bayesian_FROC #'@inheritParams DrawCurves #'@return A number! Not list nor data-frame nor vector! #' Only A number represent the chi square for your input data. #'@examples #'\dontrun{ # ####1#### ####2#### ####3#### ####4#### ####5#### ####6#### ####7#### ####8#### ####9#### #' #'# Makes a stanfit object (more precisely its inherited S4 class object) #' #' fit <- fit_Bayesian_FROC(BayesianFROC::dataList.Chakra.1, #' ite = 1111, #' summary =FALSE, #' cha = 2) #' #'# Calculates the chi square discrepancies (Goodness of Fit) #'# with the posterior mean as a parameter. #' #' #' NI <- fit@dataList$NI
#'   NL          <-  fit@dataList$NL #' f.observed <- fit@dataList$f
#'   h.observed  <-  fit@dataList$h #' C <- fit@dataList$C
#'
#'#      p <-  rstan::get_posterior_mean(fit, par=c("p"))
#'# lambda <- rstan::get_posterior_mean(fit, par=c("l"))
#' # Note that get_posterior_mean is not a number but a matrix when
#' # Chains is not 1.
#' # So, instead of it, we use
#' #
#'
#'   e     <- extract_EAP_CI(fit,"l",fit@dataList$C ) #' lambda <- e$l.EAP
#'
#'   e <- extract_EAP_CI(fit,"p",fit@dataList$C ) #' p <- e$p.EAP
#'
#'          Chi.Square <-   chi_square_goodness_of_fit_from_input_all_param(
#'
#'                           h   =   h.observed,
#'                           f   =   f.observed,
#'                           p   =   p,
#'                       lambda  =   lambda,
#'                           NL  =   NL,
#'                           NI  =   NI
#'                                )
#'
#'#  Get posterior mean of the chi square discrepancy.
#'
#'                     Chi.Square
#'
#' # Calculate the p-value for the posterior mean of the chi square discrepancy.
#'
#'                      stats::pchisq(Chi.Square,df=1)
#'
#' # Note that the use of pchisq is fucking in Bayesian context,
#' # so, the pretty cute author made a function to calculate p value in the Bayesian sense.
#' # It is named ppp().
#'
#'
#'
#'
#'
#'
#'}# dottest
#'
#' @export

chi_square_goodness_of_fit_from_input_all_param <- function(
# StanS4class,
h,
f,
p,
lambda,
NL ,
NI ,
ModifiedPoisson = FALSE,
dig = 3,
is_print_each_ratings_wise = FALSE
){

if (ModifiedPoisson == FALSE) NX <- NI
if (ModifiedPoisson == TRUE )  NX <- NL

C <-length(h)

chisquare <- vector()

ss <- array(0,dim = c(C))
tt <- array(0,dim = c(C))

for(cd in 1:C){
ss[cd]<-(h[C+1-cd]-NL*p[cd])^2/(NL*p[cd])

if(!cd==C)tt[cd]<-(f[C+1-cd]-NX*(lambda[cd]-lambda[cd+1]))^2/(NX*(lambda[cd]-lambda[cd+1]))
if(cd==C)  tt[cd]<-(f[C+1-cd]-NX*(lambda[cd]- 0))^2/(NX*(lambda[cd]- 0))
}
chi.Square.for.each.MCMC.samples <- sum(ss)+sum(tt)
chi.Square.for.each.MCMC.samples <- signif(chi.Square.for.each.MCMC.samples, digits = dig)

if(is_print_each_ratings_wise)  print.data.frame( data.frame(Hit_part = ss, False_Alarm_part = tt) ,row.names=FALSE)

return(chi.Square.for.each.MCMC.samples)
}

#' @title Chi square statistic (goodness of fit) in the case
#'  of MRMC at the pair of given data
#'   and each MCMC sample
#'
# @description ---------
#' @description
#'
#' Revised 2019 Oct 16.
#' Revised 2019 Nov 1.
#' Revised 2019 Nov 27.
#' Revised 2019 Dec 1.

#'
#'In the following, we explain what this function calculates.
#'
#' Let \eqn{\chi^2(y|\theta)} be a
#' \emph{chi square goodness of fit} statistic which is defined by
#'
#'
#'  ( Observed data - Expectation \eqn{)^2}/Exectation.
#'
#'
#'
#'
#'
#'
#'  In MRMC cases, it is defined as follows.
#'
#' \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}(\theta)]^2}{N_L\times p_{c,m,r}(\theta)}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c}(\theta) -\lambda_{c+1}(\theta) )\times N_{L} }\biggr).}
#'
#'where  a dataset \eqn{D} consists of the
#'pairs of the number of False Positives and the number of True
#' Positives  \eqn{ (F_{c,m,r}, H_{c,m,r}) }
#' together with the number of lesions \eqn{N_L}
#' and the number of images \eqn{N_I} and
#'  \eqn{\theta} denotes the model parameter.
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'  Note that we can rewrite the chi square as follows.
#'
#'   \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-  E_{\theta}[H_{c,m,r}] ]^2}{E_{\theta}[H_{c,m,r}]}+\frac{[F_{c,m,r}- E_{\theta}[F_{c,m,r}]   ]^2}{    E_{\theta}[F_{c,m,r}]   }\biggr).}

#'
#'So, the chi square has two terms.
#'
#'1) The first term is the difference of hit
#' and its expectation.
#'
#'2) The second term   is the differences of observed false alarms and its expectatioins.
#'
#'   In this function, we calculate each terms, separately.
#'    So, return values retain these two terms, separately.
#'
#'
#'
#' In this function, we calculate the following (I) and (II):
#'
#'
#'
#' \strong{ (I) A vector -------------------}
#'
#'
#'Let us denote a collection of
#' posterior MCMC samples for a given dataset \eqn{D} by
#'
#'\deqn{   \theta_1 ,   \theta_2 ,    \cdots , \theta_i,    \cdots ,    \theta_N,}
#' namely, each \eqn{\theta_i} is
#' synthesized from posterior
#' \eqn{\pi(\theta|D)}, \eqn{\theta_i \sim \pi(\theta|D).}

#'
#'
#' Substituing these MCMC samples into the above definition of the chi square,
#'  we obtain the following vector as a return value of this function.
#'
#'      \deqn{ \chi^2(D|\theta_1),}
#'      \deqn{ \chi^2(D|\theta_2),}
#'      \deqn{ \chi^2(D|\theta_3),}
#'      \deqn{          :         }
#'      \deqn{          :         }
#'      \deqn{ \chi^2(D|\theta_N).}
#'
#'
#

#'
#'  \strong{ (II) A mean of the above vector, namely, the posterior mean of the chi square over all MCMC samples -------------}
#'
#'
#'Using the set of chi squares  \eqn{(\chi^2(D|\theta_i);i=1,...,N)}
#'calculated for each posterior MCMC samples \eqn{\theta_i \sim \pi(\theta|D).}
#',
#'the function also calculates the posterior mean of the chi square statistic, namely,
#'
#'
#'       \deqn{\int   \chi^2(D|\theta)\pi(\theta|D) d\theta,}
#'
#'
#' by approximating it as
#'
#'       \deqn{\frac{1}{N} \sum _{i=1} ^N \chi^2(D|\theta_i),}
#'
#'where \eqn{\pi(\theta|D)} denotes the posterior probability density under the given data \eqn{D}.
#'
#'Do not confuse it with the following
#'
#'       \deqn{   \chi^2(D|\theta^*).}
#'
#' where \eqn{\theta^*} denotes the posterior estimates, i.e.,  \eqn{ \theta^* := \int \theta \pi(\theta|D) d\theta}.
# here -----

#'

# @details -----
#' @details This function is implemented by
#' vectorizations and further techinics.
#' When the author review this, I
#'  find my past work is great,...
#'  I forget that I made this.
#' But this function is great.
#'
#' Revised 2019 Nov 1
#'
#'@inheritParams fit_Bayesian_FROC
#'@inheritParams DrawCurves
#'
# @param --------
#'@param   dl_is_an_array_of_C_only_and_not_C_M_Q A Boolean, if \code{TRUE},
#'then false rate \code{lambda} simply
#'denoted by \code{l} in \R script ( \eqn{\lambda} )
#' is an vector \code{l[C]}. If false, then the false
#'  alarm rate is an array \code{l[C,M,Q]}.
#'
#'
#  @return ----
#' @return A list, calculated by each modality reader and cofidence level, and MCMC samples.
#'  A one the component of list contains \{ \eqn{\chi^2(Data|\theta_i)} ; i= 1,2,3,...n\}, where \eqn{n} is the number of MCMC iterations.
#'
#' Each component
#' of list isan array
#'  whose index indicats  \code{[MCMC, Confidence, Modality, Reader]  }.
#'
#' Each component of list is
#'   \code{an array whose index indicats [MCMC, C, M, Q]}.
#'
#' To be passed to the calculation of Posterior predictive p value,
#' I need the sum of return value,
#' that is, sum of C,M,Q and resulting quantities
#' construct a vetor whose length is a same as the number of MCMC iterations.
#' I love you. I need you. So, to calculate such quantites,
#'  the author .... will make a new function.
#'
#'
#'  Also, it retains the posterior mean of chi square statistic for
#'  an assumed occurrence of the data \eqn{D}:
#'
#'
#'    \deqn{\chi^2(Data)  = \int \chi^2(Data|\theta) \pi(\theta|D)d  \theta}

#'
#'
#'
#' @export Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean
#'
#' @examples
#'
#'
#' \dontrun{
#'
#'#========================================================================================
#'#            1) Create a fitted model object for data named  dd
#'#========================================================================================
#'
#'
#'
#'
#'      fit <- fit_Bayesian_FROC(    ite = 1111, # Number of MCMC iterations
#'                                   cha = 1,
#'                              dataList = BayesianFROC::dd # This is a MRMC dataset.
#'                               )
#'
#'
#'#========================================================================================
#'#            2) Calculate a chi square and meta data
#'#========================================================================================
#'
#'
#'
#'             a <- Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean(fit)
#'
#'
#'
#'#========================================================================================
#'#            3) Extract a chi square
#'#========================================================================================
#'
#'
#'
#'
#'
#'
#'                               chi.square  <- a$chi.square #' #' #' #' #'#======================================================================================== #'# A case of single reader is special in the programming perspective #'# 2020 Feb 24 #'#======================================================================================== #' #' #' f <- fit_Bayesian_FROC( ite = 1111, #' cha = 1, #' summary = TRUE, #' dataList = dddd, #' see = 123) #' #' Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean(f) #' #' #' #'# Revised 2019 August 19 #'# 2019 Nov 1 #' #'}# dontrun #' #' #' Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean <- function( StanS4class , summary=TRUE, dl_is_an_array_of_C_only_and_not_C_M_Q=TRUE ){ # if (StanS4class@dataList$Q==1) return(message("under construction"))

fit <-StanS4class
dataList <- StanS4class@dataList
NL <- dataList$NL NI <- dataList$NI
if (fit@ModifiedPoisson==TRUE) NX <- NL
if (fit@ModifiedPoisson==FALSE) NX <- NI

harray <- array_of_hit_and_false_alarms_from_vector(dataList = dataList)$harray farray <- array_of_hit_and_false_alarms_from_vector(dataList = dataList)$farray

ppp <- extract(fit)$ppp # Preserving array format of the declaration in stan file, but adding the first index as MCMC samples for (md in 1:dim(ppp)) {for (qd in 1:dim(ppp)){ ppp[,,md,qd] <- aperm(apply(aperm(ppp[,,md,qd]),2,rev))# <-- Very important Make a matrix such that B[i,j]=A[i,J-j] }} A <- ppp*NL B <- harray # browser() if (!StanS4class@dataList$Q==1) {C <- array(aperm(sapply(1:dim(A), function(i) A[i,,,] - B)), dim(A))
hit.term.array <- C^2/A

dl <- extract(fit)$dl # [MCMC, C] Preserving array format of the declaration in stan file, but adding the first index as MCMC samples if(dl_is_an_array_of_C_only_and_not_C_M_Q == FALSE){ for (md in 1:dim(ppp)) {for (qd in 1:dim(dl)){ dl[,,md,qd] <- aperm(apply(aperm(dl[,,md,qd]),2,rev))# <-- Very important Make a matrix such that B[i,j]=A[i,J-j] }} A <- dl*NL B <- farray C <- array(aperm(sapply(1:dim(A), function(i) A[i,,,] - B)), dim(A)) FalseAlarm.term.array <- C^2/A } if(dl_is_an_array_of_C_only_and_not_C_M_Q){ dl <- aperm(apply(aperm(dl),2,rev))# <-- Very important Make a matrix such that B[i,j]=A[i,J-j] A <- farray #[C,M,Q] B <- dl*NL #[MCMC, C] C <- array(NA, c(dim(B), dim(A))) for (h in 1 : dim(B)){ for(i in 1 : dim(A)){ C[h, i,, ] <- A[i,, ] - B[h, i] } } # C^2 FalseAlarm.term.array <- sweep(C^2, c(1,2), B,/) } }#!StanS4class@dataList$Q==1 2020 Feb 24

if (StanS4class@dataList$Q==1){ A <- A[,,,1];B <- B[,,1] #2020 Feb 24 C <- array(aperm(sapply(1:dim(A), function(i) A[i,,] - B)), dim(A)) #2020 Feb 24 hit.term.array <- C^2/A #2020 Feb 24 dl <- extract(fit)$dl # [MCMC, C]           Preserving array format of the declaration in stan file, but adding the first index as MCMC samples
if(dl_is_an_array_of_C_only_and_not_C_M_Q == FALSE){
for (md in 1:dim(ppp)) {for (qd in 1:dim(dl)){
dl[,,md,qd] <-  aperm(apply(aperm(dl[,,md,qd]),2,rev))# <--  Very important    Make a matrix such that B[i,j]=A[i,J-j]
}}

A <- dl*NL
B <- farray

C <- array(aperm(sapply(1:dim(A), function(i) A[i,,,] - B)), dim(A))

FalseAlarm.term.array <- C^2/A
}

if(dl_is_an_array_of_C_only_and_not_C_M_Q){
dl <-  aperm(apply(aperm(dl),2,rev))# <--  Very important    Make a matrix such that B[i,j]=A[i,J-j]

A <- farray #[C,M,Q]
B <- dl*NL  #[MCMC, C]

C <- array(NA, c(dim(B), dim(A)))

for (h in 1 : dim(B)){
for(i in 1 : dim(A)){
C[h, i,, ] <-  A[i,, ] - B[h, i]
}
}
# C^2

FalseAlarm.term.array <- sweep(C^2, c(1,2), B,/)
FalseAlarm.term.array <- FalseAlarm.term.array[,,,1] #2020 Feb 24
}

# browser()

}#  StanS4class@dataList$Q==1 2020 Feb 24 chi.square.array <- hit.term.array + FalseAlarm.term.array hit.term <- apply(hit.term.array, c(1), sum) FalseAlarm.term <- apply(FalseAlarm.term.array, c(1), sum) chi.square <- apply(chi.square.array, c(1), sum) df <-data.frame( chi.square.posterior.mean=mean(chi.square), hit.term.posterior.mean=mean(hit.term), FalseAlarm.term.posterior.mean=mean(FalseAlarm.term) ) if (summary==TRUE)print(knitr::kable(df, format = "pandoc")) if (summary==TRUE) message("\n* Posterior Mean of test statistics") invisible( list( chi.square=chi.square, hit.term=hit.term, FalseAlarm.term=FalseAlarm.term, df.chisquare.posterior.mean =df ) ) }#function # get_posterior_mean(fit,"ppp")[,"mean-all chains"] #' @title Chi square in the case of MRMC at a given dataset and a given parameter. #' @description Given parameter and data, the chi square is calculated. #'@inheritParams fit_Bayesian_FROC #'@inheritParams DrawCurves #' #' @return A list, contains \eqn{\chi^2(Data|\theta)}, #' where \eqn{Data} and \eqn{\theta} are specified by user. #' #' @param ppp An array of \code{[C,M,Q]}, representing hit rate, #' where \code{C,M,Q} denotes the number of confidences, modalities, #' readers, respectively. #' @param dl An vector of length \code{C M Q} representing #' false alarm rate, #' where \code{C,M,Q} denotes the number of #' confidences, modalities, readers, respectively. #' #' @export #' #' @examples #' #' #' \dontrun{ #'#======================================================================================== #'# 0) #'#======================================================================================== #' #'# Chi square depends on data and model parameter, thus what we have to do is: #'# prepare data and parameter #' #'# In the follwoing, we use data named ddd as an example to be fitted a model, #'# and use posterior mean estimates as model parameter. #'# To do so, we execute the following code #'# to run the HMC algorithm for the data named ddd #' #' #' #' fit <- fit_Bayesian_FROC( dataList = ddd, ite = 51 ) #' #' #'# In the resulting object named fit, the posterior samples are retained. #' #'#======================================================================================== #'# 1) hit rate and false alarm rate #'#======================================================================================== #' #' #' e <- extract_estimates_MRMC(fit); #' dl <- e$dl.EAP;
#'          ppp <- e$ppp.EAP; #' #' #'#======================================================================================== #'# 2) Calculates chi square using above hit rate and false alarm rate and data named ddd #'#======================================================================================== #' #' chi_square_goodness_of_fit_from_input_all_param_MRMC(ppp,dl,ddd) #' #'}# dontrun #' #' #' chi_square_goodness_of_fit_from_input_all_param_MRMC <- function( ppp, dl, dataList, summary=TRUE ){ # fit <-StanS4class # dataList <- StanS4class@dataList NL <- dataList$NL
NI <- dataList$NI m <- dataList$m
c <- dataList$c q <- dataList$q

h <- dataList$h f <- dataList$f
# h[n] ~ binomial(NL, ppp[c[n],m[n],q[n]]);
# ff[n] ~ poisson(l[c[n]]*NL);//Chakraborty's model
hit.term.vector <-vector()
FalseAlarm.term.vector <- vector()

#In case of a single reader and multple modaity case, an array of type, e.g., [3,1] is reduced to a vector of dimension 3 and it caused some error. So, the author fixed it.

if (length(dim(ppp))==2) {#2020 Feb 24
M <- dataList$M C <- dataList$C
Q <- dataList$Q #2020 Feb 24 ppp2<- array(NA,dim = c(C,M,Q))#2020 Feb 24 ppp2[,,1]<-ppp#2020 Feb 24 ppp <-ppp2#2020 Feb 24 } for (n in 1:length(h)) { hit.term.vector[n] <- (h[n]-ppp[c[n],m[n],q[n]]*NL)^2/(ppp[c[n],m[n],q[n]]*NL) } hit.term<-sum(hit.term.vector) a <- metadata_to_fit_MRMC(dataList) ff <- a$ff
for (n in 1:length(h)) {
FalseAlarm.term.vector[n] <-( f[n] - dl[c[n]]*NL)^2/(dl[c[n]]*NL)
}

FalseAlarm.term <- sum(FalseAlarm.term.vector)

chi.square <- hit.term + FalseAlarm.term

df <-data.frame(
chi.square=chi.square,
hit.term=hit.term,
FalseAlarm.term=FalseAlarm.term
)

if (summary==TRUE) print(knitr::kable(df, format = "pandoc"))
# browser()
return(
list(
chi.square=chi.square,
hit.term=hit.term,
FalseAlarm.term=FalseAlarm.term,
hit.term.vector=hit.term.vector,
FalseAlarm.term.vector=FalseAlarm.term.vector
)

)
}#function

# get_posterior_mean(fit,"ppp")[,"mean-all chains"]

#' @title chi square at replicated data drawn (only one time) from model with each MCMC samples.
#'@description
#' To pass the return  value to  the  calculator of the posterior predictive p value.
#'@inheritParams DrawCurves
#'@inheritParams fit_Bayesian_FROC
#'@param seed This is  used only in programming phase.
#'If seed is passed, then, in procedure indicator the seed is printed.
#'This parameter is only for package development.
#  @return -----
#' @return A list.
#'
#' From any given posterior MCMC samples
#'  \eqn{\theta_1,\theta_2,...,\theta_i,....,\theta_n}
#'   (provided by stanfitExtended object),
#' it calculates a return value as a vector
#'  of the form \eqn{\chi(y_i|\theta_i),i=1,2,....},
#'  where each dataset \eqn{y_i} is drawn
#'  from the corresponding
#'   likelihood \eqn{likelihood(.|\theta_i),i=1,2,...},
#'  namely,
#'
#'    \deqn{y_i \sim likelihood(.| \theta_i).}
#'
#' The return value also retains these \eqn{y_i, i=1,2,..}.
#'
#'
#'
#' Revised 2019 Dec. 2
#'
#'@details For a given dataset \eqn{D_0},
#' let us denote by \eqn{\pi(|D_0)} a posterior distribution
#' of the given data \eqn{D_0}.
#'
#' Then, we draw poterior samples.
#'
#'
#'    \deqn{\theta_1 \sim \pi(.|  D_0),}
#'    \deqn{\theta_2 \sim \pi(.|  D_0),}
#'    \deqn{\theta_3 \sim \pi(.|  D_0),}
#'    \deqn{....,}
#'    \deqn{\theta_n \sim \pi(.|  D_0).}
#'
#'
#'
#' We let \eqn{L(|\theta)}  be a likelihood function or probability law of data,
#' which is also denoted by \eqn{L(y|\theta)} for a given data \eqn{y}.
#' But, the specification of data \eqn{y} is somehow conversome,
#' thus, to denote the function
#'  sending each \eqn{y} into \eqn{L(y|\theta)},
#'   we use the notation  \eqn{L(|\theta)}.
#'
#' Now, we synthesize
#'  data-samples \eqn{(y_i;i=1,2,...,n)} in \strong{only one time drawing} from
#' the collection of likelihoods  \eqn{L(|\theta_1),L(|\theta_2),...,L(|\theta_n)}.
#'
#'
#'
#'    \deqn{y_1 \sim L(.| \theta_1),}
#'    \deqn{y_2 \sim L(.| \theta_2),}
#'    \deqn{y_3 \sim L(.| \theta_3),}
#'    \deqn{....,}
#'    \deqn{y_n \sim L(.| \theta_n).}
#'
#'
#'
#'
#'
#'    Altogether,
#'    using these pair of samples
#'    \eqn{(y_i, \theta_i), i= 1,2,...,n},
#'     we calculate the chi squares as the
#'    \strong{return value} of this function. That is,
#'
#'
#'    \deqn{\chi(y_1|\theta_1),}
#'    \deqn{\chi(y_2|\theta_2),}
#'    \deqn{\chi(y_3|\theta_3),}
#'    \deqn{....,}
#'    \deqn{\chi(y_n|\theta_n).}
#'
#'    \emph{ \strong{This is contained as a vector in the return value}},
#'
#'    so the return value is a vector whose length
#'    is the number of MCMC iterations
#'     except the burn-in period.
#'
#' Note that
#'  in MRMC cases, \deqn{\chi(y|\theta).} is defined as follows.
#'
#' \deqn{\chi^2(y|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}(\theta)]^2}{N_L\times p_{c,m,r}(\theta)}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c}(\theta) -\lambda_{c+1}(\theta) )\times N_{L} }\biggr).}
#'where  a dataset \eqn{y} consists of the
#'pairs of the number of False Positives and the number of True
#' Positives  \eqn{ (F_{c,m,r}, H_{c,m,r}) }
#' together with the number of lesions \eqn{N_L}
#' and the number of images \eqn{N_I} and
#'  \eqn{\theta} denotes the model parameter.
#'
#'
#'
#' \emph{ \strong{ Application of this return value to calculate the so-called \emph{Posterior Predictive P value.} } }
#here jul  ----

#'
#'  As will be demonstrated in the other function,
#' chaning seed, we can obtain
#'
#'
#'
#'
#'\deqn{y_{1,1},y_{1,2},y_{1,3},...,y_{1,j},....,y_{1,J} \sim L ( . |\theta_1),   }
#'\deqn{y_{2,1},y_{2,2},y_{2,3},...,y_{2,j},....,y_{2,J} \sim L ( . |\theta_2),  }
#'\deqn{y_{3,1},y_{3,2},y_{3,3},...,y_{3,j},....,y_{3,J} \sim L ( . |\theta_3),   }
#'\deqn{...,}
#'\deqn{y_{i,1},y_{i,2},y_{i,3},...,y_{i,j},....,y_{I,J} \sim L ( . |\theta_i),  }
#'\deqn{...,}
#'\deqn{y_{I,1},y_{I,2},y_{I,3},...,y_{I,j},....,y_{I,J} \sim L ( . |\theta_I).  }
#'
#' where \eqn{L ( . |\theta_i)} is a likelihood function for a model parameter \eqn{\theta_i}.
#' And thus, we calculate the chi square statistics.
#'
#'
#'\deqn{ \chi(y_{1,1}|\theta_1), \chi(y_{1,2}|\theta_1), \chi(y_{1,3}|\theta_1),..., \chi(y_{1,j}|\theta_1),...., \chi(y_{1,J}|\theta_1),}
#'\deqn{ \chi(y_{2,1}|\theta_2), \chi(y_{2,2}|\theta_2), \chi(y_{2,3}|\theta_2),..., \chi(y_{2,j}|\theta_2),...., \chi(y_{2,J}|\theta_2),}
#'\deqn{ \chi(y_{3,1}|\theta_3), \chi(y_{3,2}|\theta_3), \chi(y_{3,3}|\theta_3),..., \chi(y_{3,j}|\theta_3),...., \chi(y_{3,J}|\theta_3),}
#'\deqn{...,}
#'\deqn{ \chi(y_{i,1}|\theta_i), \chi(y_{i,2}|\theta_i), \chi(y_{i,3}|\theta_i),..., \chi(y_{i,j}|\theta_i),...., \chi(y_{I,J}|\theta_i),}
#'\deqn{...,}
#'\deqn{ \chi(y_{I,1}|\theta_I), \chi(y_{I,2}|\theta_I), \chi(y_{I,3}|\theta_I),..., \chi(y_{I,j}|\theta_I),...., \chi(y_{I,J}|\theta_I).}
#'
#'
#'
#'
#'
#'
#'whih are used when we calculate
#' the so-called \emph{Posterior Predictive P value} to test the
#' \emph{null hypothesis} that our model is fitted a data well.
#'
#'
#' Revised 2019 Sept. 8
#'
#'
#' Revised 2019 Dec. 2
#'
#' Revised 2020 March
#'
#'
#' Revised 2020 Jul

#'
#'
#'
#'
#'
#'
#' @inheritParams validation.dataset_srsc
#' @export
#'@examples
#'
#'
#' \dontrun{
#'
#'   fit <- fit_Bayesian_FROC( ite  = 1111,  dataList = ddd )
#'  a <- chi_square_at_replicated_data_and_MCMC_samples_MRMC(fit)
#'
#'  b<-a$List_of_dataList #' lapply(b, plot_FPF_and_TPF_from_a_dataset) #' #' #' #'} #' #' chi_square_at_replicated_data_and_MCMC_samples_MRMC <- function( StanS4class, summary=TRUE, seed = NA, serial.number=NA ) { fit <- StanS4class dataList <- fit@dataList NL <- fit@dataList$NL
NI       <- fit@dataList$NI e <- extract(fit) MCMC <- dim(e$mu)
List_of_dataList <- list()
chi.square       <- vector()

for (mcmc in 1:MCMC) {
mu <- e$mu[mcmc,,] v <- e$v[mcmc,,]
z  <- e$z[mcmc,] List_of_dataList[[mcmc]] <- create_dataList_MRMC( z.truth=z, #c(0.1,0.2,0.3,0.4,0.5), mu.truth = mu, #array of (M,Q), v.truth = v, #array of (M,Q), NI =NI, NL=NL, ModifiedPoisson =TRUE,#Since NI = NULL, if it is FALSE, then false alarms cannot be created 2019 August 24 seed =123, summary = FALSE ) # harray <- array_of_hit_and_false_alarms_from_vector(dataList = List_of_dataList[[mcmc]])$harray
# farray <- array_of_hit_and_false_alarms_from_vector(dataList =  List_of_dataList[[mcmc]])$farray ppp <- e$ppp[mcmc,,,]
dl <-   e$dl[mcmc,] if (summary==TRUE){ # Here is printed ----- if(is.na(seed)) message(mcmc,"/",MCMC) if(!is.na(seed)) message("seed=",seed,"/",serial.number, ", ",mcmc,"/",MCMC) } a<- chi_square_goodness_of_fit_from_input_all_param_MRMC( ppp, dl, dataList, summary=summary ) chi.square[mcmc] <- a$chi.square
# browser()
}

invisible(list(
List_of_dataList =List_of_dataList,
chi.square =chi.square

))

}


## Try the BayesianFROC package in your browser

Any scripts or data that you put into this service are public.

BayesianFROC documentation built on Jan. 13, 2021, 5:22 a.m.