Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc: Simulation Based Calibration (SBC) for a single reader and a...
In BayesianFROC: FROC Analysis by Bayesian Approaches

Description Usage Arguments Details Value References Examples

Implements the SBC algorithm for a single reader and a single modality case.

Prior _________ Under Construction———-

I do not use the following prior, but instead the precise prior is defeined in the file: sbcVer2.stan. I am tired and not want to write this.

For sufficinetly small ε,

ε < \widetilde{p}_c(θ) < 1- ε,

q_c(θ) >c ε,

namely

ε < \log \frac{ Φ(θ_{c+1}) }{ Φ(θ_c) },

ε < Φ( \frac{θ_{c+1} - μ }{σ} ) -Φ( \frac{θ_{c} - μ }{σ} ). < 1-ε

We have to consider this equation.

To satisfy the condition q_c(θ) >c ε, we propose the following priors.

θ_1 \sim Unif(-111,Φ^{-1}(\exp^{- 5ε})),

θ_2 \sim Unif( Φ^{-1}(Φ(θ_1) \exp^{ε} ),Φ^{-1}(\exp^{- 4ε})),

θ_3 \sim Unif( Φ^{-1}(Φ(θ_2) \exp^{ε} ),Φ^{-1}(\exp^{- 3ε})),

θ_4 \sim Unif( Φ^{-1}(Φ(θ_3) \exp^{ε} ),Φ^{-1}(\exp^{- 2ε})),

θ_5 \sim Unif( Φ^{-1}(Φ(θ_4) \exp^{ε} ),Φ^{-1}(\exp^{- 1ε})).

To satisfy the condition ε < p_c(θ) < 1- ε, we propose the following priors for more general condition f < p_c(θ) < g, where f and g are function of ε, c, e.g., f=ε, g= 1- ε.

θ_1 \sim Unif(φ^{-1}(1-g),φ^{-1}(1-f) ),

θ_2 \sim Unif(φ^{-1}(\frac{φ(θ_1)}{1-f}),φ^{-1}(\frac{1-g}{(1-f)^1}) ),

θ_3 \sim Unif(φ^{-1}(\frac{φ(θ_2)}{1-f}),φ^{-1}(\frac{1-g}{(1-f)^1}) ),

θ_4 \sim Unif(φ^{-1}(\frac{φ(θ_3)}{1-f}),φ^{-1}(\frac{1-g}{(1-f)^1}) ),

θ_5 \sim Unif(φ^{-1}(\frac{φ(θ_4)}{1-f}),φ^{-1}(\frac{1-g}{(1-f)^1}) ),

where φ(θ):= Φ(\frac{θ - μ}{σ}) and φ^{-1}(τ):= μ + σ Φ^{-1}(τ).

To show that the above equations are well-definded, we have to show

(1) the support of the above uniform disribution is not empty

(2) the condition q_c(θ) >c ε holds.

To show (1), we have to verify

Φ^{-1}(\exp^{- cε}) - Φ^{-1}(Φ(θ_c) \exp^{ε} )

Suppose that we obtain θ_1,θ_2,\cdots,θ_{c} disributed by the above.

\exp^{- (C+1-c)ε} - Φ(θ_c) \exp^{ε}

> \exp^{- (C+1-c-1)ε} - \exp^{(C+1-c)ε} \exp^{ε}

> 0

Recall that the number of false alarms is distributed by Poisson with rate

q_c(θ) = \log \frac{ Φ(θ_{c+1}) }{ Φ(θ_c) }

Because q_c(θ) cannot be zero, but if we use non-informative priors for the model parameter θ, then some synthesized parameter gives q_c(θ)=0 which causes undesired results in SBC.

Thus, for sufficiently small fixed ε, we should assume that

q_c(θ) > cε,

namely,

ε < \log \frac{ Φ(θ_{c+1}) }{ Φ(θ_c) },

from which

Φ^{-1}(Φ(θ_c) \exp^{ε} ) < θ_{c+1},

where we assume Φ(θ_c) \exp^{ε} <1, namely, θ_c < Φ^{-1}(\exp^{- ε}).

θ_1 \sim Unif(-111,Φ^{-1}(\exp^{- ε})),

θ_2 \sim Unif(- Φ^{-1}(Φ(θ_1) \exp^{ε} ),Φ^{-1}(\exp^{- ε})),

θ_3 \sim Unif(- Φ^{-1}(Φ(θ_2) \exp^{ε} ),Φ^{-1}(\exp^{- ε})),

θ_4 \sim Unif(- Φ^{-1}(Φ(θ_3) \exp^{ε} ),Φ^{-1}(\exp^{- ε})),

θ_5 \sim Unif(- Φ^{-1}(Φ(θ_4) \exp^{ε} ),Φ^{-1}(\exp^{- ε})).

These assumptions are necessary restriction for the equation q_c(θ) > ε.

Furthermore, we should consider the Bernoulli success rate for the number of hits. Next, recall that the number of hits is distributed by the binomial distribution of rate p_c(θ) which should be in between zero and one. However, non-informative prior cannot holds this condition. Thus, we should investigate the prior such that it restricts the hit rate to be in the interval [0,1].

Recall that

p_c(θ) = Φ( \frac{θ_{c+1} - μ }{σ} ) -Φ( \frac{θ_{c} - μ }{σ} ).

We have to assume

ε < p_c(θ) < 1- ε,

from which, we obtain

ε < Φ( \frac{θ_{c+1} - μ }{σ} ) -Φ( \frac{θ_{c} - μ }{σ} ). < 1-ε

ε + Φ( \frac{θ_{c} - μ }{σ} ) < Φ( \frac{θ_{c+1} - μ }{σ} ) . < 1-ε +Φ( \frac{θ_{c} - μ }{σ} )

To go further step, we assume that

Φ( \frac{θ_{c} - μ }{σ} ) < ε,

from which, we can apply Φ^{-1} to 1-ε +Φ( \frac{θ_{c} - μ }{σ} ) . So,

\frac{θ_{c} - μ }{σ} < Φ^{-1}( ε),

and thus

θ_{c} < μ + σΦ^{-1}( ε).

Φ^{-1}( ε + Φ( \frac{θ_{c} - μ }{σ} ) )< \frac{θ_{c+1} - μ }{σ} < Φ^{-1}( 1-ε +Φ( \frac{θ_{c} - μ }{σ} ) )

μ + σ Φ^{-1}( ε + Φ( \frac{θ_{c} - μ }{σ} ) )< θ_{c+1} < μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{c} - μ }{σ} ) )

To accomplish the above, we shold assume that

θ_{c+1} \sim Uniform( μ + σ Φ^{-1}( ε + Φ( \frac{θ_{c} - μ }{σ} ) ), μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{c} - μ }{σ} ) ) ),

namely,

θ_1 \sim Unif(-111,111),

θ_{2} \sim Uniform( μ + σ Φ^{-1}( ε + Φ( \frac{θ_{1} - μ }{σ} ) ), μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{1} - μ }{σ} ) ) ),

θ_{3} \sim Uniform( μ + σ Φ^{-1}( ε + Φ( \frac{θ_{2} - μ }{σ} ) ), μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{2} - μ }{σ} ) ) ),

θ_{4} \sim Uniform( μ + σ Φ^{-1}( ε + Φ( \frac{θ_{3} - μ }{σ} ) ), μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{3} - μ }{σ} ) ) ),

θ_{5} \sim Uniform( μ + σ Φ^{-1}( ε + Φ( \frac{θ_{4} - μ }{σ} ) ), μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{4} - μ }{σ} ) ) ),

Combining the necessary conditions of hit rates and false alarm retes,

we should assume their intersections.

Set

X_c := Φ^{-1}(Φ(θ_c) \exp^{ε} ),

Y_c := μ + σ Φ^{-1}( ε + Φ( \frac{θ_{c} - μ }{σ} ) )

Z_c := μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{c} - μ }{σ} ) ) ),

then,

θ_1 \sim Unif(-111,111),

θ_2 \sim Unif( max(X_1,Y_1),Z_1),

θ_3 \sim Unif( max(X_2,Y_2),Z_2),

θ_4 \sim Unif( max(X_3,Y_3),Z_3),

θ_5 \sim Unif( max(X_4,Y_4),Z_4).

To justify these priors, we have to implement the SBC algorithm.

In the above unform distribution, the support of them should not be empty. However it is not satisfied without any restriction. So, we should require the inequality that

Φ^{-1}(Φ(θ_c) \exp^{ε} ) < μ + σ Φ^{-1}( 1-ε +Φ( \frac{θ_{c} - μ }{σ} ) ) ),

which is satisfied in sufficiently small θ_c and the continuity of this equation implies that the set of solutions of θ_c satifying the inequality is not empty. Thus we have to find the minimun of parameter θ_c^* such that it saisfies the inequality.

Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
  epsilon = 0.01,
  ite = 3333,
  NL = 259,
  NI = 57,
  C = 3,
  M = 500,
  BBB = 0.3,
  AAA = 3e-04,
  vvv = 0.3,
  vvvv = 11,
  mmm = 0,
  mmmm = 1,
  war = ite/10,
  stanModel,
  sbc_from_rstan = TRUE
)

`epsilon`	lower bound of Poisson for false positives.
`ite`	A variable to be passed to the function `rstan::sampling`() of rstan in which it is named `iter`. A positive integer representing the number of samples synthesized by Hamiltonian Monte Carlo method, and, Default = 1111
`NL`	number of lesions
`NI`	number of images
`C`	number of confidence levels
`M`	To be passed to the function `rstan::sbc`() in rstan.
`BBB`	a real
`AAA`	a real
`vvv`	a real
`vvvv`	a real
`mmm`	a real
`mmmm`	a real
`war`	A variable to be passed to the function `rstan::sampling`() of rstan in which it is named `warmup`. A positive integer representing the Burn in period, which must be less than `ite`. Defaults to war = floor(ite/5)=10000/5=2000,
`stanModel`	An object of the class stanfit of sbc. This is for the package developer.
`sbc_from_rstan`	A logical, wheter `rstan::sbc()` is used

The implementation is done using the rstan::sbc. The stan file is SBC.stan The implementation is done using the function rstan::sbc. The stan file is SBC.stan The variable in this function is a collection of parameters of priors

If we use non-informative prior, then from the prior the odd model parameter are synthesized. For example, If two thresholds z[c] and z[c+1] agree for some c, then the false alarm rate becomes zero with the following error from rstan::sbc:

failed to create the sampler; sampling not done

Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) :

Exception: poisson_rng: Rate parameter is 0, but must be > 0!

Thus, we have to use very strong prior to avoid to synthesize such odd parameters of model.

SBC is a validation algorithm for models with respect to its prior.

I cannot fined the prior in which we can fit a model to various datasets.

What is SBC?

Aim of SBC is to evaluate how the computed posteriors are incorrect. To do so, SBC algorithm makes a histogram whose uniformity indicates MCMC samples contains bias.

For example,

If histogram is concave, namely there are spikes at the boundaries of histogram, then it indicates that MCMC samples is correlated. If a histogram is convex ( \cap-shaped), then it indicates that over-dispersed posteriors relative to the true posterior.

if histogram is concave,: namely there are spikes at the boundaries of histogram,then it indicates that MCMC samples is correlated.
If a histogram is convex ( \cap-shaped),: then it indicates that over-dispersed posteriors relative to the true posterior.
If a histogram is weighted to right or left,: then posterior moves opposite direction, namely left or right respectively.

We may say that SBC is a statistical test of the null hypothesis H_0:

H_0: MCMC sampling is correct.

If the histogram is far from uniformity, then we reject H_0 and say that MCMC sampling contains bias.

Parameters of our model

w: The first threshold
dz: The difference of thresholds, that is, dz[c]:= z[c+1]-z[c]
m: Mean of signal Gaussian
v: Standard deviation (Do not confuse it with Variance) of signal Gaussian

A list of S3 class "sbc", which is an output of the function rstan::sbc() in rstan.

Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). Validating Bayesian Inference Algorithms with Simulation-Based Calibration. arXiv preprint arXiv:1804.06788. https://arxiv.org/abs/1804.06788

data Format:

A single reader and a single modality case

——————————————————————————————————

`NI=63,NL=124`	confidence level	No. of false alarms	No. of hits
In R console ->	`c`	`f`	`h`
-----------------------	-----------------------	-----------------------------	-------------
definitely present	`c[1] =` 5	`f[1] =` F_5 = 1	`h[1] =` H_5 = 41
probably present	`c[2] =` 4	`f[2] =` F_4 = 2	`h[2] =` H_4 = 22
equivocal	`c[3] =` 3	`f[3] =` F_3 = 5	`h[3] =` H_3 = 14
subtle	`c[4] =` 2	`f[4] =` F_2 = 11	`h[4] =` H_2 = 8
very subtle	`c[5] =` 1	`f[5] =` F_1 = 13	`h[5] =` H_1 = 1

—————————————————————————————————

Recall our model for the above data format;

H_5 \sim Binomial (p_5,N_L )

H_4 \sim Binomial (p_4,N_L )

H_3 \sim Binomial (p_3,N_L )

H_2 \sim Binomial (p_2,N_L )

H_1 \sim Poisson (p_1,N_L )

F_5 \sim Poisson (q_5 )

F_4 \sim Poisson (q_4 )

F_3 \sim Poisson (q_3 )

F_2 \sim Poisson (q_2 )

F_1 \sim Poisson (q_1 )

where

p_5= p_5(z_1,...z_C; μ, σ) = \int_{z5}^{∞} Gaussian(z|μ,σ)dz

p_4=p_4(z_1,...z_C; μ, σ) = \int_{z4}^{z5} Gaussian(z|μ,σ)dz

p_3=p_3(z_1,...z_C; μ, σ) = \int_{z3}^{z4} Gaussian(z|μ,σ)dz

p_2=p_2(z_1,...z_C; μ, σ) = \int_{z2}^{z3} Gaussian(z|μ,σ)dz

p_1=p_1(z_1,...z_C; μ, σ) = \int_{z1}^{z2} Gaussian(z|μ,σ)dz

q_5=q_5(z_1,...z_C) = \int_{z5}^{∞} d \log Φ (z)

q_4=q_4(z_1,...z_C) = \int_{z4}^{z5} d \log Φ (z)

q_3=q_3(z_1,...z_C) = \int_{z3}^{z4} d \log Φ (z)

q_2=q_2(z_1,...z_C) = \int_{z2}^{z3} d \log Φ (z)

q_1=q_1(z_1,...z_C) = \int_{z1}^{z2} d \log Φ (z)

Priors

z[c] \sim ?

m \sim ?

v \sim ?

In SBC, we have to specify proper priors, thus, we use the above priors. So, what reader should do is to specify the above parameters, that is, ww,www,zz,zzz,mm,mmm,vv,vvv and further a number of imagesNL and a number of lesion NI and a number of confidence levels should be specified. In the above example data format, the number of confidence level is the number of rows, and now it is 5, that is C=5.

Revised 2019 August 4

I am not statistician nor researcher nor human. My leg is gotten by death who is prurigo nodularis. Death is soon. I cannot understand, I hate statistics. I do not want to waste my time to this FROC analysis. My program is volunteer, I am no money no supported. Completely my own support or my parents. Completely my own. I am tired for this no end point running. I have not money to research or place or circumstance. No healthy condition. This program is made with my blood and pain, great pain. I no longer want to live. I hate all. Honesty.

## Not run: 
#========================================================================================
#                             SBC via rstan::sbc          Default prior
#========================================================================================
#

stanModel <- stan_model_of_sbc()

Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
NL = 11111,
NI = 11111,
stanModel = stanModel,
ite     = 323,
M       = 211,
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)



#========================================================================================
#                             SBC via rstan::sbc          Default prior
#========================================================================================


stanModel <- stan_model_of_sbc()

Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
NL = 11111,
NI = 11111,
stanModel = stanModel,
ite     = 323,
M       = 511,
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)



     Close_all_graphic_devices() # 2020 August




## End(Not run)#dontrun