R/chi_square_goodness_of_fit.R

Defines functions chi_square_at_replicated_data_and_MCMC_samples_MRMC chi_square_goodness_of_fit_from_input_all_param_MRMC Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean chi_square_goodness_of_fit_from_input_all_param chi_square_goodness_of_fit

Documented in chi_square_at_replicated_data_and_MCMC_samples_MRMC chi_square_goodness_of_fit chi_square_goodness_of_fit_from_input_all_param chi_square_goodness_of_fit_from_input_all_param_MRMC Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean

#'@title   \emph{\strong{Chi square goodness of
#' fit statistics}} at each MCMC sample w.r.t. a given dataset.
#'
#'
#'@description
#' Calculates a vector, consisting of \emph{\strong{the Goodness of Fit (Chi Square)}}
#' for a given dataset \eqn{D} and
#'  each posterior MCMC samples \eqn{\theta_i=\theta_i(D), i=1,2,3,....}, namely,
#'
#'
#'    \deqn{ \chi^2 (D|\theta_i)}
#'
#'
#'
#' for \eqn{i=1,2,3,....} and thus its dimension is the number of MCMC iterations..
#'
# here ----
#'Note that
#'  In MRMC cases, it is defined as follows.
#'
# \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}]^2}{N_L\times p_{c,m,r}}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c} -\lambda_{c+1} )\times N_{L} }\biggr).}
#' \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}(\theta)]^2}{N_L\times p_{c,m,r}(\theta)}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c}(\theta) -\lambda_{c+1}(\theta) )\times N_{L} }\biggr).}
#'where  a dataset \eqn{D} consists of the
#'pairs of the number of False Positives and the number of True
#' Positives  \eqn{ (F_{c,m,r}, H_{c,m,r}) }
#' together with the number of lesions \eqn{N_L}
#' and the number of images \eqn{N_I} and
#'  \eqn{\theta} denotes the model parameter.
#'
#'
#'
#'
#'
#'@details To calculate the chi square (goodness of fit) \eqn{\chi^2 (y|\theta)}
#' test statistics, the two variables
#' are required; one is an observed dataset \eqn{y}
#'  and the other is an estimated parameter \eqn{\theta}.
#'In the classical chi square values,
#' MLE(maximal likelihood estimator) is used
#'for an estimated parameter \eqn{\theta} in \eqn{\chi^2 (y|\theta)}.
#' However, in the Bayesian context,
#'the parameter is not deterministic and
#'we consider it is a random variable such as
#' samples from the posterior distribution.
#' And such samples are obtained in the Hamiltonian
#' Monte Carlo Simulation.
#' Thus we can calculate chi square values for each MCMC sample.
#'@return Chi squares for each MCMC sample.
#'\deqn{\chi^2 = \chi^2 (D|\theta_i),i=1,2,...,N}
#' So, the return values is a vector of length \eqn{N} which denotes
#'  the number of MCMC iterations
#'  except the warming up period.
#'  Of course if MCMC is not only one chain,
#'   then all samples of chains are used to calculate the chi square.
#'
#'   In the sequel, we use the notations
#'
#'    for a prior \eqn{\pi(\theta)},
#'
#'    posterior \eqn{\pi(\theta|D)},
#'
#'      likelihood \eqn{f(D|\theta)},
#'
#'      parameter \eqn{\theta},
#'
#'    datasets \eqn{D}, for example, we can write as follows;
#'
#'  \deqn{ \pi(\theta|D) \propto f(D|\theta) \pi(\theta).}
#'
#'
#'
#'  Let us denote the \strong{posterior MCMC samples} of
#'  size  \eqn{N} for a given data-set \eqn{D} by
#'
#'    \deqn{\theta_1, \theta_2, \theta_3,...,\theta_N}
#'
#'
#'    which are drawn from
#'    posterior \eqn{\pi(\theta|D)} of given data \eqn{D}.
#'
#'
#'Recall that the chi square goodness of fit statistics \eqn{\chi}
#'depends on the model parameter \eqn{\theta} and data \eqn{D},
#' namely,
#'
#'\deqn{\chi^2 = \chi^2 (D|\theta)}.
#'
#'The function calculates a vector
#'of length \eqn{N} whose components is given by:
#'
#'
#'
#'\deqn{\chi^2 (D|\theta_1), \chi^2 (D|\theta_2), \chi^2 (D|\theta_3),...,\chi^2 (D|\theta_N),}
#'
#'So, the return value is a vector of size  \eqn{N}.
#'
#'  As an application of this return value  \eqn{(\chi^2(D|\theta_i);i=1,...,N)},
#'   we  can calculate
#'    the posterior mean of \eqn{\chi = \chi (D|\theta)},
#'     namely, we get
#'
#'\deqn{ \chi^2 (D) =\int \chi^2 (D|\theta)  \pi(\theta|D)     d\theta.}
#'
#'
#' as its Monte Carlo integral
#'
#'
#'
#'       \deqn{\frac{1}{N} \sum _{i=1} ^N \chi^2(D|\theta_i),}
#'
#'
#' In my model, almost all example,
#'  result of calculation shows that
#'
#'
#'\deqn{  \int \chi^2 (D|\theta)  \pi(\theta|D)     d\theta > \chi^2 (D| \int \theta   \pi(\theta|D)     d\theta) }
#'
#'
#'The above inequality is true for all \eqn{D}?? I conjecture it.
#'
#' Revised 2019 August 18
#' Revised 2019 Sept. 1
#' Revised 2019 Nov 28

#'
#'
#'  Our data is \strong{2C categories}, that is,
#'
#'
#'  the number of hits        :h[1], h[2], h[3],...,h[C] and
#'
#'  the number of false alarms: f[1],f[2], f[3],...,f[C].
#'
#'
#'    Our model has \strong{C+2 parameters}, that is,
#'
#'    the thresholds of the bi normal assumption z[1],z[2],z[3],...,z[C] and
#'
#'    the mean and standard deviation of the signal distribution.
#'
#'
#'  So, the degree of freedom of this statistics is calculated by
#'
#'
#'      No. of categories  -  No. of parameters  - 1 = 2C-(C+2)-1 =C -3.
#'
#'This differ from Chakraborty's  result C-2. Why ? ... In Bayesian,  the degree of freedom is redandunt notion.
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'@param f A vector of positive integers,
#'representing the number of false alarms.
#' This variable was made in order to
#'  substitute the false alarms data drawn
#'  from the posterior predictive distributions.
#'In famous Gelman's book, he explain
#'how to use the test statistics in the
#'Bayesian context. In this context I need
#' to substitute the replication data from
#' the posterior predictive distributions.

#'@param h  A vector of positive integers,
#' representing the number of hits.
#' This variable was made in order to
#'  substitute the hits data drawn
#'  from the posterior predictive distributions.
#'In famous Gelman's book, he explain how
#'to use the test statistics in the Bayesian context.
#' In this context I need to substitute the replication
#'  data from the posterior predictive distributions.

#'@inheritParams fit_Bayesian_FROC
#'@inheritParams DrawCurves
#'@examples
#'
#' \dontrun{

# ####1#### ####2#### ####3#### ####4#### ####5#### ####6#### ####7#### ####8#### ####9####
#'#========================================================================================
#'#                Synthesize the MCMC samples from a dataset.
#'#========================================================================================
#'
#'        fit <- fit_Bayesian_FROC(BayesianFROC::dataList.Chakra.1,
#'                            ite = 1111,
#'                            summary =FALSE,
#'                            cha = 2)
#'
#'#========================================================================================
#'#   The chi square discrepancies are calculated by the following code
#'#========================================================================================
#'
#'          Chi.Square.for.each.MCMC.samples   <-   chi_square_goodness_of_fit(fit)
#'
#'
#'
#'
#'
#'#========================================================================================
#'# With Warning
#'#========================================================================================
#'
#'          chi_square_goodness_of_fit(fit)
#'
#'#========================================================================================
#'# Without warning
#'#========================================================================================
#'
#'           chi_square_goodness_of_fit(fit,
#'                                      h=fit@dataList$h,
#'                                      f=fit@dataList$f)
#'
#'
#'
#'
#'
#'
#'#========================================================================================
#'#  Get posterior mean of the chi square discrepancy.
#'#========================================================================================
#'
#'
#'                     m<-   mean(Chi.Square.for.each.MCMC.samples)
#'
#'
#'
#'#========================================================================================
#'# The author read at 2019 Sept. 1, it helps him. Thanks me!!
#'#
#'# Calculate the p-value for the posterior mean of the chi square discrepancy.
#'#========================================================================================
#'
#'                                  stats::pchisq(m,df=1)
#'
#'#========================================================================================
#'# Difference between chi sq. at EAP and EAP of chi sq.
#'#========================================================================================
#'
#'
#'    mean( fit@chisquare - chi_square_goodness_of_fit(fit))
#'
#'
#'}# dottest

#'
#' @export

chi_square_goodness_of_fit <- function(StanS4class,
                                       dig=3,
                                       h=StanS4class@dataList$h,
                                       f=StanS4class@dataList$f,
                                       summary =FALSE

){
  C <-StanS4class@dataList$C

  if (StanS4class@studyDesign=="MRMC") {
    return(message("Now, MRMC not be allowed"))
  }

  #This hits and false positives should be changed to the simultaion
  # from the posterior predictive distributions
  if (missing(h))  {
   if(summary) message("Since hits are missing, we use the data which is used to estimates the model parameter to run this function")
    h <-StanS4class@dataList$h
  }

  if (missing(f)==TRUE){
    if(summary)  message("Since false alarms are missing, we use the data which is used to estimates the model parameter to run this function")
    f <-StanS4class@dataList$f
  }

  NL <-StanS4class@dataList$NL
  NI <-StanS4class@dataList$NI
  ModifiedPoisson <-(StanS4class@studyDesign =="srsc.per.lesion")
  if (ModifiedPoisson)  NX <- NL
  if (!ModifiedPoisson)  NX <- NI

  fit <-methods::as(StanS4class, "stanfit")

  MCMC <- length(
    rstan::get_divergent_iterations(fit)
  ) # = cha*(ite-war)

  # MCMC=(ite-war)*cha
  chisquare <- vector()

  #--------- chi ^2 -----------Start
  p<-rstan::extract(fit)$p
  lambda<-rstan::extract(fit)$l


  ss <- array(0,dim = c(MCMC,C))
  tt <- array(0,dim = c(MCMC,C))
  for(cd in 1:C){
    ss[,cd]<-(h[C+1-cd]-NL*p[,cd])^2/(NL*p[,cd])

    if(!cd==C)tt[,cd]<-(f[C+1-cd]-NX*(lambda[,cd]-lambda[,cd+1]))^2/(NX*(lambda[,cd]-lambda[,cd+1]))
    if(cd==C)  tt[,cd]<-(f[C+1-cd]-NX*(lambda[,cd]- 0))^2/(NX*(lambda[,cd]- 0))

  }
  chi.Square.for.each.MCMC.samples <- apply(ss, 1,sum)+apply(ss,1, sum)
  chi.Square.for.each.MCMC.samples <- signif(chi.Square.for.each.MCMC.samples, digits = dig)
  return(chi.Square.for.each.MCMC.samples)
}











#'@title  Calculates the Goodness of Fit (Chi Square)
#'@description \strong{('; \eqn{\omega};')}
#'  The so-called \emph{chi square goodness of fit} is
#' a function of data-set \eqn{y}
#' and model parameter \eqn{\theta}, namely,  \eqn{\chi(y|\theta)}.
#' This function merely provides this. \strong{Detail.} But when the author reviews this today, I am surprised cuz
#' this function depends on many variables and it will be hard to understand what it is.
#' OK, I will enjoy to tell the audiences what the variables mean.
#' First of all, what we should consider is only substitution of  dataset \eqn{y} and
#'  model parameter \eqn{\theta} into  \eqn{\chi(y|\theta)}.
#'  \eqn{y} is decomposed into \code{h,f,NI,NL} which mean the number of hits, false alarms, images and trials.
#' \eqn{\theta} corresponds to \code{p, lambda}.
#' Holy moly, I write this without any tips, lemonades and coffee!
#' I love you. Today 2020 Oct 19, MCS symptoms is basically not bad, but, still aches in muscles, legs, why?
#' for 3 years, too long to be patient.
#'
#' @details
#' statistics for each MCMC sample with a fixed dataset.
#'
#'
#'
#'    Our data is 2C categories, that is,
#'
#'
#'  the number of hits        :h[1], h[2], h[3],...,h[C] and
#'
#'  the number of false alarms: f[1],f[2], f[3],...,f[C].
#'
#'
#'    Our model has C+2 parameters, that is,
#'
#'    the thresholds of the bi normal assumption z[1],z[2],z[3],...,z[C] and
#'
#'    the mean and standard deviation of the signal distribution.
#'
#'
#'  So, the degree of freedom of this statistics is calculated by
#'
#'
#'             2C-(C+2)-1 =C -3.
#'
#'This differ from Chakraborty's  result C-2. Why ?
#'
#'
#'
#' \strong{Remak on the verification of codes}
#' To tell the truth, the author doubt that the calculation of ppp in this pkg is incorrect.
#' But I cannot reveal where I am wrong. Or, I cannot exculde in 100% that my programming is correct.
#' The result of ppp() is sometimes reasonable but sometimes it is against my cute intuition.
#' Of  curse, I am pretty cute, but why .... Uhnnn I am not sure wheter I am correct.
#' So, ha. Today (2020 Oct 19), I checked the code, but it looked correct.
#'
#'
#'

#'
#'
#'
#'
#'@param f A vector of non-negative integers, indicating the number of false alarms.
#' The reason why the author includes this variable is to substitute the false alarms from the posterior predictive distribution.
#'In famous Gelman's book, he explain how to make test statistics in the Bayesian context,  and it require the samples from posterior predictive distribution.
#'So, in this variable author substitute the replication data from the posterior predictive distributions.

#'@param h  A vector of non-negative integers, indicating the number of hits.
#'The reason why the author includes this variable is to substitute the false alarms from the posterior predictive distribution.
#'In famous Gelman's book, we can access how to make test statistics in the Bayesian context,  and it require the samples from posterior predictive distribution.
#'So, using this variable author substitute the replication data from the posterior predictive distributions.


#'@param p  A vector of non-negative integers, indicating hit rate. A vector whose length is number of confidence levels.
#'@param lambda A vector of non-negative integers, indicating False alarm rate. A vector whose length is number of confidence levels.
#'@param NI An integer, representing Number of Images
#'@param NL An integer, representing Number of Lesions
#' @param is_print_each_ratings_wise A logical, whether result is printed on the R/R-studio console.
#'
#'
#'@inheritParams fit_Bayesian_FROC
#'@inheritParams DrawCurves
#'@return A number! Not list nor data-frame nor vector!
#' Only A number represent the chi square for your input data.
#'@examples
#'\dontrun{
# ####1#### ####2#### ####3#### ####4#### ####5#### ####6#### ####7#### ####8#### ####9####
#'
#'#  Makes a stanfit object (more precisely its inherited S4 class object)
#'
#'        fit <- fit_Bayesian_FROC(BayesianFROC::dataList.Chakra.1,
#'                            ite = 1111,
#'                            summary =FALSE,
#'                            cha = 2)
#'
#'#   Calculates the chi square discrepancies (Goodness of Fit)
#'#   with the posterior mean as a parameter.
#'
#'
#'   NI          <-  fit@dataList$NI
#'   NL          <-  fit@dataList$NL
#'   f.observed  <-  fit@dataList$f
#'   h.observed  <-  fit@dataList$h
#'   C           <-  fit@dataList$C
#'
#'#      p <-  rstan::get_posterior_mean(fit, par=c("p"))
#'# lambda <- rstan::get_posterior_mean(fit, par=c("l"))
#' # Note that get_posterior_mean is not a number but a matrix when
#' # Chains is not 1.
#' # So, instead of it, we use
#' #
#'
#'   e     <- extract_EAP_CI(fit,"l",fit@dataList$C )
#'  lambda <- e$l.EAP
#'
#'   e <- extract_EAP_CI(fit,"p",fit@dataList$C )
#'   p <- e$p.EAP
#'
#'          Chi.Square <-   chi_square_goodness_of_fit_from_input_all_param(
#'
#'                           h   =   h.observed,
#'                           f   =   f.observed,
#'                           p   =   p,
#'                       lambda  =   lambda,
#'                           NL  =   NL,
#'                           NI  =   NI
#'                                )
#'
#'#  Get posterior mean of the chi square discrepancy.
#'
#'                     Chi.Square
#'
#' # Calculate the p-value for the posterior mean of the chi square discrepancy.
#'
#'                      stats::pchisq(Chi.Square,df=1)
#'
#' # Note that the use of pchisq is fucking in Bayesian context,
#' # so, the pretty cute author made a function to calculate p value in the Bayesian sense.
#' # It is named ppp().
#'
#'
#'
#'
#'
#'
#'}# dottest
#'
#' @export

chi_square_goodness_of_fit_from_input_all_param <- function(
  # StanS4class,
  h,
  f,
  p,
  lambda,
  NL ,
  NI ,
  ModifiedPoisson = FALSE,
  dig = 3,
  is_print_each_ratings_wise = FALSE
){



  if (ModifiedPoisson == FALSE) NX <- NI
  if (ModifiedPoisson == TRUE )  NX <- NL




  C <-length(h)

  chisquare <- vector()

  ss <- array(0,dim = c(C))
  tt <- array(0,dim = c(C))


  for(cd in 1:C){
    ss[cd]<-(h[C+1-cd]-NL*p[cd])^2/(NL*p[cd])

    if(!cd==C)tt[cd]<-(f[C+1-cd]-NX*(lambda[cd]-lambda[cd+1]))^2/(NX*(lambda[cd]-lambda[cd+1]))
    if(cd==C)  tt[cd]<-(f[C+1-cd]-NX*(lambda[cd]- 0))^2/(NX*(lambda[cd]- 0))
  }
  chi.Square.for.each.MCMC.samples <- sum(ss)+sum(tt)
  chi.Square.for.each.MCMC.samples <- signif(chi.Square.for.each.MCMC.samples, digits = dig)

 if(is_print_each_ratings_wise)  print.data.frame( data.frame(Hit_part = ss, False_Alarm_part = tt) ,row.names=FALSE)

  return(chi.Square.for.each.MCMC.samples)
}









#' @title Chi square statistic (goodness of fit) in the case
#'  of MRMC at the pair of given data
#'   and each MCMC sample
#'
# @description ---------
#' @description
#'
#' Revised 2019 Oct 16.
#' Revised 2019 Nov 1.
#' Revised 2019 Nov 27.
#' Revised 2019 Dec 1.

#'
#'In the following, we explain what this function calculates.
#'
#' Let \eqn{\chi^2(y|\theta)} be a
#' \emph{chi square goodness of fit} statistic which is defined by
#'
#'
#'  ( Observed data - Expectation \eqn{)^2}/Exectation.
#'
#'
#'
#'
#'
#'
#'  In MRMC cases, it is defined as follows.
#'
#' \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}(\theta)]^2}{N_L\times p_{c,m,r}(\theta)}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c}(\theta) -\lambda_{c+1}(\theta) )\times N_{L} }\biggr).}
#'
#'where  a dataset \eqn{D} consists of the
#'pairs of the number of False Positives and the number of True
#' Positives  \eqn{ (F_{c,m,r}, H_{c,m,r}) }
#' together with the number of lesions \eqn{N_L}
#' and the number of images \eqn{N_I} and
#'  \eqn{\theta} denotes the model parameter.
#'
#'
#'
#'
#'
#'
#'
#'
#'
#'  Note that we can rewrite the chi square as follows.
#'
#'   \deqn{\chi^2(D|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-  E_{\theta}[H_{c,m,r}] ]^2}{E_{\theta}[H_{c,m,r}]}+\frac{[F_{c,m,r}- E_{\theta}[F_{c,m,r}]   ]^2}{    E_{\theta}[F_{c,m,r}]   }\biggr).}

#'
#'So, the chi square has two terms.
#'
#'1) The first term is the difference of hit
#' and its expectation.
#'
#'2) The second term   is the differences of observed false alarms and its expectatioins.
#'
#'   In this function, we calculate each terms, separately.
#'    So, return values retain these two terms, separately.
#'
#'
#'
#' In this function, we calculate the following (I) and (II):
#'
#'
#'
#' \strong{ (I) A vector -------------------}
#'
#'
#'Let us denote a collection of
#' posterior MCMC samples for a given dataset \eqn{D} by
#'
#'\deqn{   \theta_1 ,   \theta_2 ,    \cdots , \theta_i,    \cdots ,    \theta_N,}
#' namely, each \eqn{\theta_i} is
#' synthesized from posterior
#' \eqn{\pi(\theta|D)}, \eqn{\theta_i \sim \pi(\theta|D).}

#'
#'
#' Substituing these MCMC samples into the above definition of the chi square,
#'  we obtain the following vector as a return value of this function.
#'
#'      \deqn{ \chi^2(D|\theta_1),}
#'      \deqn{ \chi^2(D|\theta_2),}
#'      \deqn{ \chi^2(D|\theta_3),}
#'      \deqn{          :         }
#'      \deqn{          :         }
#'      \deqn{ \chi^2(D|\theta_N).}
#'
#'
#

#'
#'  \strong{ (II) A mean of the above vector, namely, the posterior mean of the chi square over all MCMC samples -------------}
#'
#'
#'Using the set of chi squares  \eqn{(\chi^2(D|\theta_i);i=1,...,N)}
#'calculated for each posterior MCMC samples \eqn{\theta_i \sim \pi(\theta|D).}
#',
#'the function also calculates the posterior mean of the chi square statistic, namely,
#'
#'
#'       \deqn{\int   \chi^2(D|\theta)\pi(\theta|D) d\theta,}
#'
#'
#' by approximating it as
#'
#'       \deqn{\frac{1}{N} \sum _{i=1} ^N \chi^2(D|\theta_i),}
#'
#'where \eqn{\pi(\theta|D)} denotes the posterior probability density under the given data \eqn{D}.
#'
#'Do not confuse it with the following
#'
#'       \deqn{   \chi^2(D|\theta^*).}
#'
#' where \eqn{\theta^*} denotes the posterior estimates, i.e.,  \eqn{ \theta^* := \int \theta \pi(\theta|D) d\theta}.
# here -----

#'


# @details -----
#' @details This function is implemented by
#' vectorizations and further techinics.
#' When the author review this, I
#'  find my past work is great,...
#'  I forget that I made this.
#' But this function is great.
#'
#' Revised 2019 Nov 1
#'
#'@inheritParams fit_Bayesian_FROC
#'@inheritParams DrawCurves
#'
# @param --------
#'@param   dl_is_an_array_of_C_only_and_not_C_M_Q A Boolean, if \code{TRUE},
#'then false rate \code{lambda} simply
#'denoted by \code{l} in \R script ( \eqn{\lambda} )
#' is an vector \code{l[C]}. If false, then the false
#'  alarm rate is an array \code{l[C,M,Q]}.
#'
#'
#  @return ----
#' @return A list, calculated by each modality reader and cofidence level, and MCMC samples.
#'  A one the component of list contains \{ \eqn{\chi^2(Data|\theta_i)} ; i= 1,2,3,...n\}, where \eqn{n} is the number of MCMC iterations.
#'
#' Each component
#' of list isan array
#'  whose index indicats  \code{[MCMC, Confidence, Modality, Reader]  }.
#'
#' Each component of list is
#'   \code{an array whose index indicats [MCMC, C, M, Q]}.
#'
#' To be passed to the calculation of Posterior predictive p value,
#' I need the sum of return value,
#' that is, sum of C,M,Q and resulting quantities
#' construct a vetor whose length is a same as the number of MCMC iterations.
#' I love you. I need you. So, to calculate such quantites,
#'  the author .... will make a new function.
#'
#'
#'  Also, it retains the posterior mean of chi square statistic for
#'  an assumed occurrence of the data \eqn{D}:
#'
#'
#'    \deqn{\chi^2(Data)  = \int \chi^2(Data|\theta) \pi(\theta|D)d  \theta}

#'
#'
#'
#' @export Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean
#'
#' @examples
#'
#'
#' \dontrun{
#'
#'#========================================================================================
#'#            1) Create a fitted model object for data named  dd
#'#========================================================================================
#'
#'
#'
#'
#'      fit <- fit_Bayesian_FROC(    ite = 1111, # Number of MCMC iterations
#'                                   cha = 1,
#'                              dataList = BayesianFROC::dd # This is a MRMC dataset.
#'                               )
#'
#'
#'#========================================================================================
#'#            2) Calculate a chi square and meta data
#'#========================================================================================
#'
#'
#'
#'             a <- Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean(fit)
#'
#'
#'
#'#========================================================================================
#'#            3) Extract a chi square
#'#========================================================================================
#'
#'
#'
#'
#'
#'
#'                               chi.square  <- a$chi.square
#'
#'
#'
#'
#'#========================================================================================
#'#     A case of single reader is special in the programming perspective
#'#                                                                         2020 Feb 24
#'#========================================================================================
#'
#'
#' f <- fit_Bayesian_FROC( ite  = 1111,
#'                          cha = 1,
#'                      summary = TRUE,
#'                     dataList = dddd,
#'                          see = 123)
#'
#' Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean(f)
#'
#'
#'
#'# Revised 2019 August 19
#'#         2019 Nov 1
#'
#'}# dontrun
#'
#'
#'

Chi_square_goodness_of_fit_in_case_of_MRMC_Posterior_Mean  <-  function(
  StanS4class ,
  summary=TRUE,
  dl_is_an_array_of_C_only_and_not_C_M_Q=TRUE


){
  # if (StanS4class@dataList$Q==1) return(message("under construction"))

  fit <-StanS4class
  dataList <- StanS4class@dataList
  NL <- dataList$NL
  NI <- dataList$NI
  if (fit@ModifiedPoisson==TRUE) NX <- NL
  if (fit@ModifiedPoisson==FALSE) NX <- NI

  harray <- array_of_hit_and_false_alarms_from_vector(dataList = dataList)$harray
  farray <- array_of_hit_and_false_alarms_from_vector(dataList = dataList)$farray

  ppp <- extract(fit)$ppp # Preserving array format of the declaration in stan file, but adding the first index as MCMC samples
  for (md in 1:dim(ppp)[3]) {for (qd in 1:dim(ppp)[4]){
    ppp[,,md,qd] <-  aperm(apply(aperm(ppp[,,md,qd]),2,rev))# <--  Very important    Make a matrix such that B[i,j]=A[i,J-j]
  }}



  A <- ppp*NL
  B <- harray
# browser()
if (!StanS4class@dataList$Q==1) {C <- array(aperm(sapply(1:dim(A)[1], function(i) A[i,,,] - B)), dim(A))
               hit.term.array <- C^2/A



                  dl <- extract(fit)$dl # [MCMC, C]           Preserving array format of the declaration in stan file, but adding the first index as MCMC samples
                  if(dl_is_an_array_of_C_only_and_not_C_M_Q == FALSE){
                    for (md in 1:dim(ppp)[3]) {for (qd in 1:dim(dl)[4]){
                      dl[,,md,qd] <-  aperm(apply(aperm(dl[,,md,qd]),2,rev))# <--  Very important    Make a matrix such that B[i,j]=A[i,J-j]
                    }}



                    A <- dl*NL
                    B <- farray

                    C <- array(aperm(sapply(1:dim(A)[1], function(i) A[i,,,] - B)), dim(A))

                    FalseAlarm.term.array <- C^2/A
                  }

                  if(dl_is_an_array_of_C_only_and_not_C_M_Q){
                    dl <-  aperm(apply(aperm(dl),2,rev))# <--  Very important    Make a matrix such that B[i,j]=A[i,J-j]

                    A <- farray #[C,M,Q]
                    B <- dl*NL  #[MCMC, C]



                    C <- array(NA, c(dim(B)[1], dim(A)))

                    for (h in 1 : dim(B)[1]){
                      for(i in 1 : dim(A)[1]){
                        C[h, i,, ] <-  A[i,, ] - B[h, i]
                      }
                    }
                    # C^2

                    FalseAlarm.term.array <- sweep(C^2, c(1,2), B,`/`)
                  }



}#!StanS4class@dataList$Q==1 2020 Feb 24






  if (StanS4class@dataList$Q==1){ A <-  A[,,,1];B <- B[,,1] #2020 Feb 24
                                    C <- array(aperm(sapply(1:dim(A)[1], function(i) A[i,,] - B)), dim(A)) #2020 Feb 24
                                    hit.term.array <- C^2/A #2020 Feb 24



                                    dl <- extract(fit)$dl # [MCMC, C]           Preserving array format of the declaration in stan file, but adding the first index as MCMC samples
                                    if(dl_is_an_array_of_C_only_and_not_C_M_Q == FALSE){
                                      for (md in 1:dim(ppp)[3]) {for (qd in 1:dim(dl)[4]){
                                        dl[,,md,qd] <-  aperm(apply(aperm(dl[,,md,qd]),2,rev))# <--  Very important    Make a matrix such that B[i,j]=A[i,J-j]
                                      }}



                                      A <- dl*NL
                                      B <- farray

                                      C <- array(aperm(sapply(1:dim(A)[1], function(i) A[i,,,] - B)), dim(A))

                                      FalseAlarm.term.array <- C^2/A
                                    }

                                    if(dl_is_an_array_of_C_only_and_not_C_M_Q){
                                      dl <-  aperm(apply(aperm(dl),2,rev))# <--  Very important    Make a matrix such that B[i,j]=A[i,J-j]

                                      A <- farray #[C,M,Q]
                                      B <- dl*NL  #[MCMC, C]



                                      C <- array(NA, c(dim(B)[1], dim(A)))

                                      for (h in 1 : dim(B)[1]){
                                        for(i in 1 : dim(A)[1]){
                                          C[h, i,, ] <-  A[i,, ] - B[h, i]
                                        }
                                      }
                                      # C^2

                                      FalseAlarm.term.array <- sweep(C^2, c(1,2), B,`/`)
                                      FalseAlarm.term.array <- FalseAlarm.term.array[,,,1] #2020 Feb 24
                                    }

                                    # browser()

  }#  StanS4class@dataList$Q==1 2020 Feb 24








  chi.square.array <- hit.term.array + FalseAlarm.term.array

  hit.term <- apply(hit.term.array, c(1), sum)
  FalseAlarm.term <- apply(FalseAlarm.term.array, c(1), sum)
  chi.square <- apply(chi.square.array, c(1), sum)
  df <-data.frame(
    chi.square.posterior.mean=mean(chi.square),
    hit.term.posterior.mean=mean(hit.term),
    FalseAlarm.term.posterior.mean=mean(FalseAlarm.term)
  )

  if (summary==TRUE)print(knitr::kable(df, format = "pandoc"))


  if (summary==TRUE) message("\n* Posterior Mean of test statistics")
  invisible(
    list(

      chi.square=chi.square,
      hit.term=hit.term,
      FalseAlarm.term=FalseAlarm.term,
      df.chisquare.posterior.mean =df
    )

  )
}#function



# get_posterior_mean(fit,"ppp")[,"mean-all chains"]





#' @title Chi square in the case of MRMC at a given dataset and a given parameter.
#' @description Given parameter and data, the chi square is calculated.

#'@inheritParams fit_Bayesian_FROC
#'@inheritParams DrawCurves
#'
#' @return A list, contains  \eqn{\chi^2(Data|\theta)},
#'  where \eqn{Data} and \eqn{\theta} are specified by user.
#'
#' @param ppp An array of \code{[C,M,Q]}, representing hit rate,
#' where \code{C,M,Q} denotes the number of confidences, modalities,
#'  readers, respectively.
#' @param dl An vector of length \code{C M Q} representing
#' false alarm rate,
#' where \code{C,M,Q} denotes the number of
#'  confidences, modalities, readers, respectively.
#'
#' @export
#'
#' @examples
#'
#'
#' \dontrun{
#'#========================================================================================
#'#  0)
#'#========================================================================================
#'
#'#        Chi square depends on data and model parameter, thus what we have to do is:
#'#        prepare data and parameter
#'
#'#        In the follwoing, we use data named ddd as an example to be fitted a model,
#'#        and use  posterior mean estimates as model parameter.
#'#        To do so, we execute the following code
#'#        to run the HMC algorithm for the data named ddd
#'
#'
#'
#'    fit <- fit_Bayesian_FROC(  dataList = ddd, ite = 51 )
#'
#'
#'#         In the resulting object named fit, the posterior samples are retained.
#'
#'#========================================================================================
#'#  1)   hit rate and false alarm rate
#'#========================================================================================
#'
#'
#'          e   <- extract_estimates_MRMC(fit);
#'          dl  <- e$dl.EAP;
#'          ppp <- e$ppp.EAP;
#'
#'
#'#========================================================================================
#'# 2)  Calculates chi square using above hit rate and false alarm rate and data named ddd
#'#========================================================================================
#'
#'          chi_square_goodness_of_fit_from_input_all_param_MRMC(ppp,dl,ddd)
#'
#'}# dontrun
#'
#'
#'




chi_square_goodness_of_fit_from_input_all_param_MRMC <-  function(  ppp,
                                                                    dl,
                                                                    dataList,
                                                                    summary=TRUE
){

  # fit <-StanS4class
  # dataList <- StanS4class@dataList
  NL <- dataList$NL
  NI <- dataList$NI
  m <- dataList$m
  c <- dataList$c
  q <- dataList$q

  h <- dataList$h
  f <- dataList$f
  # h[n] ~ binomial(NL, ppp[c[n],m[n],q[n]]);
  # ff[n] ~ poisson(l[c[n]]*NL);//Chakraborty's model
  hit.term.vector <-vector()
  FalseAlarm.term.vector <- vector()


  #In case of a single reader and multple modaity case, an array of type, e.g., [3,1] is reduced to a vector of dimension 3 and it caused some error. So, the author fixed it.

 if (length(dim(ppp))==2) {#2020 Feb 24
    M <- dataList$M
    C <- dataList$C
    Q <- dataList$Q
    #2020 Feb 24
    ppp2<- array(NA,dim = c(C,M,Q))#2020 Feb 24

    ppp2[,,1]<-ppp#2020 Feb 24
    ppp <-ppp2#2020 Feb 24


  }





  for (n in 1:length(h)) {
    hit.term.vector[n] <- (h[n]-ppp[c[n],m[n],q[n]]*NL)^2/(ppp[c[n],m[n],q[n]]*NL)
  }
  hit.term<-sum(hit.term.vector)

  a <- metadata_to_fit_MRMC(dataList)
  ff <- a$ff
  for (n in 1:length(h)) {
    FalseAlarm.term.vector[n] <-( f[n] - dl[c[n]]*NL)^2/(dl[c[n]]*NL)
  }

  FalseAlarm.term <- sum(FalseAlarm.term.vector)


  chi.square <- hit.term + FalseAlarm.term

  df <-data.frame(
    chi.square=chi.square,
    hit.term=hit.term,
    FalseAlarm.term=FalseAlarm.term
  )

  if (summary==TRUE) print(knitr::kable(df, format = "pandoc"))
  # browser()
  return(
    list(
      chi.square=chi.square,
      hit.term=hit.term,
      FalseAlarm.term=FalseAlarm.term,
      data.frame.of.metadata.for.chisquare =df,
      hit.term.vector=hit.term.vector,
      FalseAlarm.term.vector=FalseAlarm.term.vector
    )

  )
}#function



# get_posterior_mean(fit,"ppp")[,"mean-all chains"]




#' @title chi square at replicated data drawn (only one time) from model with each MCMC samples.
#'@description
#' To pass the return  value to  the  calculator of the posterior predictive p value.
#'@inheritParams DrawCurves
#'@inheritParams fit_Bayesian_FROC
#'@param seed This is  used only in programming phase.
#'If seed is passed, then, in procedure indicator the seed is printed.
#'This parameter is only for package development.
#  @return -----
#' @return A list.
#'
#' From any given posterior MCMC samples
#'  \eqn{\theta_1,\theta_2,...,\theta_i,....,\theta_n}
#'   (provided by stanfitExtended object),
#' it calculates a return value as a vector
#'  of the form \eqn{\chi(y_i|\theta_i),i=1,2,....},
#'  where each dataset \eqn{y_i} is drawn
#'  from the corresponding
#'   likelihood \eqn{likelihood(.|\theta_i),i=1,2,...},
#'  namely,
#'
#'    \deqn{y_i \sim likelihood(.| \theta_i).}
#'
#' The return value also retains these \eqn{y_i, i=1,2,..}.
#'
#'
#'
#' Revised 2019 Dec. 2
#'
#'@details For a given dataset \eqn{D_0},
#' let us denote by \eqn{\pi(|D_0)} a posterior distribution
#' of the given data \eqn{D_0}.
#'
#' Then, we draw poterior samples.
#'
#'
#'    \deqn{\theta_1 \sim \pi(.|  D_0),}
#'    \deqn{\theta_2 \sim \pi(.|  D_0),}
#'    \deqn{\theta_3 \sim \pi(.|  D_0),}
#'    \deqn{....,}
#'    \deqn{\theta_n \sim \pi(.|  D_0).}
#'
#'
#'
#' We let \eqn{L(|\theta)}  be a likelihood function or probability law of data,
#' which is also denoted by \eqn{L(y|\theta)} for a given data \eqn{y}.
#' But, the specification of data \eqn{y} is somehow conversome,
#' thus, to denote the function
#'  sending each \eqn{y} into \eqn{L(y|\theta)},
#'   we use the notation  \eqn{L(|\theta)}.
#'
#' Now, we synthesize
#'  data-samples \eqn{(y_i;i=1,2,...,n)} in \strong{only one time drawing} from
#' the collection of likelihoods  \eqn{L(|\theta_1),L(|\theta_2),...,L(|\theta_n)}.
#'
#'
#'
#'    \deqn{y_1 \sim L(.| \theta_1),}
#'    \deqn{y_2 \sim L(.| \theta_2),}
#'    \deqn{y_3 \sim L(.| \theta_3),}
#'    \deqn{....,}
#'    \deqn{y_n \sim L(.| \theta_n).}
#'
#'
#'
#'
#'
#'    Altogether,
#'    using these pair of samples
#'    \eqn{(y_i, \theta_i), i= 1,2,...,n},
#'     we calculate the chi squares as the
#'    \strong{return value} of this function. That is,
#'
#'
#'    \deqn{\chi(y_1|\theta_1),}
#'    \deqn{\chi(y_2|\theta_2),}
#'    \deqn{\chi(y_3|\theta_3),}
#'    \deqn{....,}
#'    \deqn{\chi(y_n|\theta_n).}
#'
#'    \emph{ \strong{This is contained as a vector in the return value}},
#'
#'    so the return value is a vector whose length
#'    is the number of MCMC iterations
#'     except the burn-in period.
#'
#' Note that
#'  in MRMC cases, \deqn{\chi(y|\theta).} is defined as follows.
#'
#' \deqn{\chi^2(y|\theta) := \sum_{r=1}^R \sum_{m=1}^M \sum_{c=1}^C \biggr( \frac{[ H_{c,m,r}-N_L\times p_{c,m,r}(\theta)]^2}{N_L\times p_{c,m,r}(\theta)}+\frac{[F_{c,m,r}-(\lambda _{c} -\lambda _{c+1} )\times N_{L}]^2}{(\lambda_{c}(\theta) -\lambda_{c+1}(\theta) )\times N_{L} }\biggr).}
#'where  a dataset \eqn{y} consists of the
#'pairs of the number of False Positives and the number of True
#' Positives  \eqn{ (F_{c,m,r}, H_{c,m,r}) }
#' together with the number of lesions \eqn{N_L}
#' and the number of images \eqn{N_I} and
#'  \eqn{\theta} denotes the model parameter.
#'
#'
#'
#' \emph{ \strong{ Application of this return value to calculate the so-called \emph{Posterior Predictive P value.} } }
#here jul  ----

#'
#'  As will be demonstrated in the other function,
#' chaning seed, we can obtain
#'
#'
#'
#'
#'\deqn{y_{1,1},y_{1,2},y_{1,3},...,y_{1,j},....,y_{1,J} \sim L ( . |\theta_1),   }
#'\deqn{y_{2,1},y_{2,2},y_{2,3},...,y_{2,j},....,y_{2,J} \sim L ( . |\theta_2),  }
#'\deqn{y_{3,1},y_{3,2},y_{3,3},...,y_{3,j},....,y_{3,J} \sim L ( . |\theta_3),   }
#'\deqn{...,}
#'\deqn{y_{i,1},y_{i,2},y_{i,3},...,y_{i,j},....,y_{I,J} \sim L ( . |\theta_i),  }
#'\deqn{...,}
#'\deqn{y_{I,1},y_{I,2},y_{I,3},...,y_{I,j},....,y_{I,J} \sim L ( . |\theta_I).  }
#'
#' where \eqn{L ( . |\theta_i)} is a likelihood function for a model parameter \eqn{\theta_i}.
#' And thus, we calculate the chi square statistics.
#'
#'
#'\deqn{ \chi(y_{1,1}|\theta_1), \chi(y_{1,2}|\theta_1), \chi(y_{1,3}|\theta_1),..., \chi(y_{1,j}|\theta_1),...., \chi(y_{1,J}|\theta_1),}
#'\deqn{ \chi(y_{2,1}|\theta_2), \chi(y_{2,2}|\theta_2), \chi(y_{2,3}|\theta_2),..., \chi(y_{2,j}|\theta_2),...., \chi(y_{2,J}|\theta_2),}
#'\deqn{ \chi(y_{3,1}|\theta_3), \chi(y_{3,2}|\theta_3), \chi(y_{3,3}|\theta_3),..., \chi(y_{3,j}|\theta_3),...., \chi(y_{3,J}|\theta_3),}
#'\deqn{...,}
#'\deqn{ \chi(y_{i,1}|\theta_i), \chi(y_{i,2}|\theta_i), \chi(y_{i,3}|\theta_i),..., \chi(y_{i,j}|\theta_i),...., \chi(y_{I,J}|\theta_i),}
#'\deqn{...,}
#'\deqn{ \chi(y_{I,1}|\theta_I), \chi(y_{I,2}|\theta_I), \chi(y_{I,3}|\theta_I),..., \chi(y_{I,j}|\theta_I),...., \chi(y_{I,J}|\theta_I).}
#'
#'
#'
#'
#'
#'
#'whih are used when we calculate
#' the so-called \emph{Posterior Predictive P value} to test the
#' \emph{null hypothesis} that our model is fitted a data well.
#'
#'
#' Revised 2019 Sept. 8
#'
#'
#' Revised 2019 Dec. 2
#'
#' Revised 2020 March
#'
#'
#' Revised 2020 Jul

#'
#'
#'
#'
#'
#'
#' @inheritParams validation.dataset_srsc
#' @export
#'@examples
#'
#'
#' \dontrun{
#'
#'   fit <- fit_Bayesian_FROC( ite  = 1111,  dataList = ddd )
#'  a <- chi_square_at_replicated_data_and_MCMC_samples_MRMC(fit)
#'
#'  b<-a$List_of_dataList
#'  lapply(b, plot_FPF_and_TPF_from_a_dataset)
#'
#'
#'
#'}
#'
#'
chi_square_at_replicated_data_and_MCMC_samples_MRMC  <- function(
  StanS4class,
  summary=TRUE,
  seed = NA,
  serial.number=NA
) {

  fit      <- StanS4class
  dataList <- fit@dataList
  NL       <- fit@dataList$NL
  NI       <- fit@dataList$NI

  e                <- extract(fit)
  MCMC             <- dim(e$mu)[1]
  List_of_dataList <- list()
  chi.square       <- vector()

  for (mcmc in 1:MCMC) {
    mu <- e$mu[mcmc,,]
    v  <- e$v[mcmc,,]
    z  <- e$z[mcmc,]

    List_of_dataList[[mcmc]]  <- create_dataList_MRMC(
      z.truth=z, #c(0.1,0.2,0.3,0.4,0.5),
      mu.truth = mu, #array of (M,Q),
      v.truth  = v, #array of (M,Q),
      NI =NI,
      NL=NL,
      ModifiedPoisson =TRUE,#Since NI = NULL, if it is FALSE, then false alarms cannot be created 2019 August 24
      seed =123,
      summary = FALSE
    )

    # harray <- array_of_hit_and_false_alarms_from_vector(dataList =  List_of_dataList[[mcmc]])$harray
    # farray <- array_of_hit_and_false_alarms_from_vector(dataList =  List_of_dataList[[mcmc]])$farray
    ppp <-   e$ppp[mcmc,,,]
    dl <-   e$dl[mcmc,]

    if (summary==TRUE){
      # Here is printed -----
      if(is.na(seed)) message(mcmc,"/",MCMC)
      if(!is.na(seed)) message("seed=",seed,"/",serial.number, ",   ",mcmc,"/",MCMC)

    }


    a<- chi_square_goodness_of_fit_from_input_all_param_MRMC(
      ppp,
      dl,
      dataList,
      summary=summary
    )
    chi.square[mcmc] <- a$chi.square
    # browser()
  }

  invisible(list(
    List_of_dataList =List_of_dataList,
    chi.square =chi.square

  ))


}

Try the BayesianFROC package in your browser

Any scripts or data that you put into this service are public.

BayesianFROC documentation built on Jan. 13, 2021, 5:22 a.m.