chi_square_goodness_of_fit: _Chi square goodness of fit statistics_ at each MCMC sample...
In BayesianFROC: FROC Analysis by Bayesian Approaches


View source: R/chi_square_goodness_of_fit.R

Calculates a vector of chi-square goodness-of-fit statistics for a given dataset D, one for each posterior MCMC sample θ_i=θ_i(D), i=1,2,3,..., namely,

χ^2 (D|θ_i)

for i=1,2,3,...; the length of the vector is therefore the number of MCMC iterations.

Note that in the MRMC case, it is defined as follows.

χ^2(D|θ) := ∑_{r=1}^R ∑_{m=1}^M ∑_{c=1}^C \biggl( \frac{[H_{c,m,r} - N_L \times p_{c,m,r}(θ)]^2}{N_L \times p_{c,m,r}(θ)} + \frac{[F_{c,m,r} - (λ_{c}(θ) - λ_{c+1}(θ)) \times N_{L}]^2}{(λ_{c}(θ) - λ_{c+1}(θ)) \times N_{L}} \biggr),

where a dataset D consists of the pairs (F_{c,m,r}, H_{c,m,r}) of the number of false positives and the number of true positives, together with the number of lesions N_L and the number of images N_I, and θ denotes the model parameter.
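The double-fraction sum above can be sketched in base R for a single parameter draw. This is only an illustration of the formula, not the package's implementation: the helper name `chisq_mrmc_sketch` and its arguments (`H`, `FA`, `p`, `dl`, `NL`) are hypothetical, and the hit rates and rate differences λ_c(θ) - λ_{c+1}(θ) are assumed to be already computed from θ.

```r
# Hypothetical helper: chi-square discrepancy for ONE parameter draw theta.
# H, FA : arrays of hits / false alarms with dims (C, M, R)
# p     : array of hit rates p_{c,m,r}(theta), same dims as H
# dl    : vector of length C holding lambda_c(theta) - lambda_{c+1}(theta)
# NL    : number of lesions
chisq_mrmc_sketch <- function(H, FA, p, dl, NL) {
  stopifnot(identical(dim(H), dim(FA)),
            identical(dim(H), dim(p)),
            length(dl) == dim(H)[1])
  expected_hits <- NL * p
  # dl varies along the first (confidence level) dimension only;
  # column-major recycling fills [c, m, r] with NL * dl[c].
  expected_fas <- array(NL * dl, dim = dim(FA))
  sum((H - expected_hits)^2 / expected_hits +
      (FA - expected_fas)^2 / expected_fas)
}

# Toy inputs (all made up): C = 3 confidence levels, M = 2, R = 2.
NL <- 10
p  <- array(c(0.1, 0.2, 0.3), dim = c(3, 2, 2))
dl <- c(0.05, 0.1, 0.2)
H  <- array(c(1, 2, 3), dim = c(3, 2, 2))
FA <- array(c(1, 1, 2), dim = c(3, 2, 2))
chi_toy <- chisq_mrmc_sketch(H, FA, p, dl, NL)
```

In the toy inputs the hits equal their expected values exactly, so only the false-alarm term contributes to `chi_toy`.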

chi_square_goodness_of_fit(
  StanS4class,
  dig = 3,
  h = StanS4class@dataList$h,
  f = StanS4class@dataList$f,
  summary = FALSE
)

`StanS4class`	An S4 object of class `stanfitExtended`, which inherits from the S4 class `stanfit`. This R object is the fitted model object returned by the function `fit_Bayesian_FROC()`. To be passed to `DrawCurves()` ... etc
`dig`	A variable to be passed to the function `rstan::sampling()` of rstan, in which it is named `...??`. A positive integer representing the number of significant digits, used in stan Cancellation. Default = 3 (per the usage above).
`h`	A vector of positive integers, representing the number of hits. This argument exists so that hits data drawn from the posterior predictive distribution can be substituted for the observed hits. Gelman's well-known book explains how to use test statistics in the Bayesian context; in that context, replicated data from the posterior predictive distribution need to be substituted here.
`f`	A vector of positive integers, representing the number of false alarms. This argument exists so that false alarm data drawn from the posterior predictive distribution can be substituted for the observed false alarms. Gelman's well-known book explains how to use test statistics in the Bayesian context; in that context, replicated data from the posterior predictive distribution need to be substituted here.
`summary`	Logical: `TRUE` or `FALSE`. Whether to print the verbose summary. If `TRUE`, the verbose summary is printed in the R console. If `FALSE`, the output is minimal. In hindsight, this argument should have been named `verbose`.
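The role of replicated data in a Bayesian test statistic can be illustrated with a small posterior predictive check. The sketch below does not call the package and does not use the FROC model: it assumes a toy Poisson model with made-up counts `y_obs` and a conjugate Gamma posterior standing in for MCMC draws. In the package, replicated hits and false alarms would presumably be supplied through `h` and `f` instead.

```r
set.seed(1)

# Toy stand-in for the FROC model: Poisson counts in 3 categories.
y_obs <- c(12, 7, 3)   # made-up observed counts
N     <- 500           # number of posterior draws

# Stand-in "posterior draws" of the category means: one row per draw.
# (Gamma(y + 1, 1) is the posterior of a Poisson mean under a flat prior.)
draws <- matrix(rgamma(N * length(y_obs),
                       shape = rep(y_obs + 1, each = N), rate = 1),
                nrow = N)

# Chi-square discrepancy of counts y against expected counts mu.
chisq_one <- function(y, mu) sum((y - mu)^2 / mu)

# Discrepancy of the observed data at each draw.
T_obs <- apply(draws, 1, function(mu) chisq_one(y_obs, mu))

# Discrepancy of replicated data, simulated from the same draw.
T_rep <- apply(draws, 1, function(mu) chisq_one(rpois(length(mu), mu), mu))

# Posterior predictive p-value: how often the replicate fits worse.
ppp <- mean(T_rep >= T_obs)
```

A value of `ppp` near 0 would suggest that the observed data fit worse than data generated by the model itself.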

To calculate the chi-square goodness-of-fit test statistic χ^2 (y|θ), two quantities are required: an observed dataset y and an estimated parameter θ. In the classical chi-square statistic, the maximum likelihood estimator (MLE) is used for the parameter θ in χ^2 (y|θ). In the Bayesian context, however, the parameter is not deterministic; we treat it as a random variable, represented by samples from the posterior distribution. Such samples are obtained by Hamiltonian Monte Carlo simulation, so we can calculate a chi-square value for each MCMC sample.

Chi squares for each MCMC sample.

χ^2 = χ^2 (D|θ_i),i=1,2,...,N

So the return value is a vector of length N, where N denotes the number of MCMC iterations excluding the warm-up period. If the MCMC run has more than one chain, the samples of all chains are used to calculate the chi-square values.

In the sequel, we use the following notation: prior π(θ), posterior π(θ|D), likelihood f(D|θ), parameter θ, and dataset D. For example, we can write

π(θ|D) \propto f(D|θ) π(θ).

Let us denote the posterior MCMC samples of size N for a given dataset D by

θ_1, θ_2, θ_3,...,θ_N

which are drawn from the posterior π(θ|D) given the data D.

Recall that the chi-square goodness-of-fit statistic χ^2 depends on the model parameter θ and the data D, namely,

χ^2 = χ^2 (D|θ)

The function calculates a vector of length N whose components are given by

χ^2 (D|θ_1), χ^2 (D|θ_2), χ^2 (D|θ_3),...,χ^2 (D|θ_N),

So, the return value is a vector of size N.
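As a rough illustration of how one chi-square value per retained draw arises, including the pooling of several chains, here is a sketch in base R. It is not the package's code: the toy Poisson discrepancy, the made-up counts `y`, and the array layout (iterations x chains x parameters) are all assumptions.

```r
set.seed(2)

n_iter  <- 100   # post-warm-up iterations per chain (assumed)
n_chain <- 2     # number of chains (assumed)
y       <- c(12, 7, 3)   # made-up observed counts
C       <- length(y)

# Stand-in for post-warm-up draws: iterations x chains x parameters.
draws <- array(rgamma(n_iter * n_chain * C,
                      shape = rep(y + 1, each = n_iter * n_chain)),
               dim = c(n_iter, n_chain, C))

# Pool the chains: one row per retained draw, one column per parameter.
theta <- matrix(draws, nrow = n_iter * n_chain, ncol = C)

# Toy discrepancy: the expected counts are the parameters themselves.
chisq_one <- function(y, mu) sum((y - mu)^2 / mu)

# One chi-square value per draw theta_i, i = 1, ..., N.
chi_vec <- vapply(seq_len(nrow(theta)),
                  function(i) chisq_one(y, theta[i, ]),
                  numeric(1))

N <- length(chi_vec)   # equals n_iter * n_chain
```

Here N is the number of iterations times the number of chains, matching the statement that the samples of all chains are used.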

As an application of this return value (χ^2(D|θ_i);i=1,...,N), we can approximate the posterior mean of χ^2 = χ^2 (D|θ), namely

χ^2 (D) =\int χ^2 (D|θ) π(θ|D) dθ,

by its Monte Carlo estimate

\frac{1}{N} ∑ _{i=1} ^N χ^2(D|θ_i).

In my model, the calculation shows in almost all examples that

\int χ^2 (D|θ) π(θ|D) dθ > χ^2 (D| \int θ π(θ|D) dθ)

Is the above inequality true for all D? I conjecture that it is.
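The comparison between the posterior mean of the chi-square and the chi-square at the posterior mean can be tried on a toy problem. The sketch below is an assumption-laden illustration, not evidence for the conjecture: it uses a toy model in which θ is the vector of expected counts itself, so that (y - θ)^2/θ is convex in θ and the inequality follows from Jensen's inequality. In the FROC model the expected counts are nonlinear functions of θ, so this argument does not carry over directly.

```r
set.seed(3)

y <- c(12, 7, 3)   # made-up observed counts
N <- 1000          # number of stand-in posterior draws

# Stand-in posterior draws of the expected counts: one row per draw.
theta <- matrix(rgamma(N * length(y), shape = rep(y + 1, each = N)),
                nrow = N)

chisq_one <- function(y, mu) sum((y - mu)^2 / mu)

# Monte Carlo estimate of the posterior mean of the chi-square.
mean_of_chisq <- mean(apply(theta, 1, function(mu) chisq_one(y, mu)))

# Chi-square evaluated at the posterior mean (EAP) of theta.
chisq_at_mean <- chisq_one(y, colMeans(theta))

gap <- mean_of_chisq - chisq_at_mean   # non-negative in this convex toy case
```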

Revised 2019 August 18 Revised 2019 Sept. 1 Revised 2019 Nov 28

Our data consist of 2C categories, that is,

the number of hits: h[1], h[2], h[3],...,h[C] and

the number of false alarms: f[1], f[2], f[3],...,f[C].

Our model has C+2 parameters, that is,

the thresholds of the binormal assumption z[1],z[2],z[3],...,z[C] and

the mean and standard deviation of the signal distribution.

So the degrees of freedom of this statistic are calculated by

No. of categories - No. of parameters - 1 = 2C-(C+2)-1 =C -3.

This differs from Chakraborty's result, C-2. Why? ... In the Bayesian context, the degrees of freedom are a redundant notion.
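The degrees-of-freedom arithmetic above, and the upper-tail probability one might attach to a chi-square value under it, can be written out in a few lines of R. The values of `C` and `chi_value` below are hypothetical, and whether a classical reference distribution is appropriate here at all is, as noted, doubtful in the Bayesian setting.

```r
C <- 5                      # hypothetical number of confidence levels

n_categories <- 2 * C       # hits and false alarms for each level
n_parameters <- C + 2       # C thresholds, plus mean and standard deviation
df <- n_categories - n_parameters - 1   # equals C - 3

chi_value <- 3.2            # hypothetical chi-square value

# Upper-tail probability of a chi-square distribution with df degrees of freedom.
p_value <- stats::pchisq(chi_value, df = df, lower.tail = FALSE)
```

Note that the formula requires C > 3 for the degrees of freedom to be positive.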

## Not run: 
#========================================================================================
#                Synthesize the MCMC samples from a dataset.
#========================================================================================

       fit <- fit_Bayesian_FROC(BayesianFROC::dataList.Chakra.1,
                           ite = 1111,
                           summary =FALSE,
                           cha = 2)

#========================================================================================
#   The chi square discrepancies are calculated by the following code
#========================================================================================

         Chi.Square.for.each.MCMC.samples   <-   chi_square_goodness_of_fit(fit)





#========================================================================================
# With Warning
#========================================================================================

         chi_square_goodness_of_fit(fit)

#========================================================================================
# Without warning
#========================================================================================

          chi_square_goodness_of_fit(fit,
                                     h=fit@dataList$h,
                                     f=fit@dataList$f)






#========================================================================================
#  Get posterior mean of the chi square discrepancy.
#========================================================================================


                    m<-   mean(Chi.Square.for.each.MCMC.samples)



#========================================================================================
# The author read at 2019 Sept. 1, it helps him. Thanks me!!
#
# Calculate the p-value for the posterior mean of the chi square discrepancy.
#========================================================================================

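# pchisq() returns the lower tail by default; a p-value needs the upper tail.
                                 stats::pchisq(m, df = 1, lower.tail = FALSE)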

#========================================================================================
# Difference between chi sq. at EAP and EAP of chi sq.
#========================================================================================


   mean( fit@chisquare - chi_square_goodness_of_fit(fit))



## End(Not run)# dottest