validation.dataset_srsc: Errors of Estimator for any Given true parameter
In BayesianFROC: FROC Analysis by Bayesian Approaches

Description Usage Arguments Details Value Examples

This function replicates many datasets from a model with a given model parameter as truth. Then fit a model to these datasets and get estimates. Then calculates mean or variance of the difference of estimates and truth.

This function gives a validation under the assumption that we obtain a fixed true model parameter drawn from the prior.

But, you know, the author noticed that it is not sufficient because , in Bayesian, such a true parameter is merely one time sample from prior. So, what we should do is to draw randomly many samples from priors.

Well, I know, but, I am being such a couch potato. In the future, I would do, if I was alive. Even if I try this, this effort never take account by someone. But, to live, it is required. No money, no life. 2020 NOv 29

In the future, what I should do is that the following calculations.

1. Draw a sample of parameter θ from prior, namely, θ \sim π(θ)

2. Draw a sample of data x=x(θ) from a model with the sample of prior x \sim π(x|θ)

3. Draw a sample of parameter θ'=θ'(x(_θ)) from a posterior with the sample of data θ' \sim π(θ|x)

4. Calculates the integral \int | θ' - θ|^2 π(θ)dθ as the error of estimates among all priors..

validation.dataset_srsc(
  replicate.datset = 3,
  ModifiedPoisson = FALSE,
  mean.truth = 0.6,
  sd.truth = 5.3,
  z.truth = c(-0.8, 0.7, 2.38),
  NL = 259,
  NI = 57,
  ite = 1111,
  cha = 1,
  summary = TRUE,
  see = 1234,
  verbose = FALSE,
  serial.number = 1,
  base_size = 0,
  absolute.errors = TRUE
)

`replicate.datset`	A Number indicate that how many you replicate dataset from user's specified dataset.
`ModifiedPoisson`	Logical, that is `TRUE` or `FALSE`. If `ModifiedPoisson = TRUE`, then Poisson rate of false alarm is calculated *per lesion, and a model is fitted so that the FROC curve is an expected curve of points consisting of the pairs of TPF per lesion and FPF per lesion. Similarly, If `ModifiedPoisson = TRUE`, then Poisson rate of false alarm is calculated per image, and a model is fitted so that the FROC curve is an expected curve of points consisting of the pair of TPF per lesion and FPF per image. For more details, see the author's paper in which I explained per image* and per lesion. (for details of models, see vignettes , now, it is omiited from this package, because the size of vignettes are large.) If `ModifiedPoisson = TRUE`, then the False Positive Fraction (FPF) is defined as follows (F_c denotes the number of false alarms with confidence level c ) \frac{F_1+F_2+F_3+F_4+F_5}{N_L}, \frac{F_2+F_3+F_4+F_5}{N_L}, \frac{F_3+F_4+F_5}{N_L}, \frac{F_4+F_5}{N_L}, \frac{F_5}{N_L}, where N_L is a number of lesions (signal). To emphasize its denominator N_L, we also call it the False Positive Fraction (FPF) per lesion. On the other hand, if `ModifiedPoisson = FALSE` (Default), then False Positive Fraction (FPF) is given by \frac{F_1+F_2+F_3+F_4+F_5}{N_I}, \frac{F_2+F_3+F_4+F_5}{N_I}, \frac{F_3+F_4+F_5}{N_I}, \frac{F_4+F_5}{N_I}, \frac{F_5}{N_I}, where N_I is the number of images (trial). To emphasize its denominator N_I, we also call it the False Positive Fraction (FPF) per image. The model is fitted so that the estimated FROC curve can be ragraded as the expected pairs of FPF per image and TPF per lesion (`ModifiedPoisson = FALSE` ) or as the expected pairs of FPF per image and TPF per lesion (`ModifiedPoisson = TRUE`) If `ModifiedPoisson = TRUE`, then FROC curve means the expected pair of FPF per lesion and TPF. On the other hand, if `ModifiedPoisson = FALSE`, then FROC curve means the expected pair of FPF per image and TPF. So,data of FPF and TPF are changed thus, a fitted model is also changed whether `ModifiedPoisson = TRUE` or `FALSE`. In traditional FROC analysis, it uses only per images (trial). Since we can divide one image into two images or more images, number of trial is not important. And more important is per signal. So, the author also developed FROC theory to consider FROC analysis under per signal. One can see that the FROC curve is rigid with respect to change of a number of images, so, it does not matter whether `ModifiedPoisson = TRUE` or `FALSE`. This rigidity of curves means that the number of images is redundant parameter for the FROC trial and thus the author try to exclude it. Revised 2019 Dec 8 Revised 2019 Nov 25 Revised 2019 August 28
`mean.truth`	This is a parameter of the latent Gaussian assumption for the noise distribution.
`sd.truth`	This is a parameter of the latent Gaussian assumption for the noise distribution.
`z.truth`	This is a parameter of the latent Gaussian assumption for the noise distribution.
`NL`	Number of Lesions.
`NI`	Number of Images.
`ite`	A variable to be passed to the function `rstan::sampling`() of rstan in which it is named `iter`. A positive integer representing the number of samples synthesized by Hamiltonian Monte Carlo method, and, Default = 1111
`cha`	A variable to be passed to the function `rstan::sampling`() of rstan in which it is named `chains`. A positive integer representing the number of chains generated by Hamiltonian Monte Carlo method, and, Default = 1.
`summary`	Logical: `TRUE` of `FALSE`. Whether to print the verbose summary. If `TRUE` then verbose summary is printed in the R console. If `FALSE`, the output is minimal. I regret, this variable name should be verbose.
`see`	A variable to be passed to the function `rstan::sampling`() of rstan in which it is named `seed`. A positive integer representing seed used in stan, Default = 1234.
`verbose`	A logical, if `TRUE`, then the redundant summary is printed in R console. If `FALSE`, it suppresses output from this function.
`serial.number`	A positive integer or Character. This is for programming perspective. The author use this to print the serial numbre of validation. This will be used in the validation function.
`base_size`	An numeric for size of object, this is for the package developer.
`absolute.errors`	A logical specifying whether mean of errors is defined by `TRUE` \bar{ε}(θ_0,N_I,N_L)= \frac{1}{K} ∑ \| ε_k \| `FALSE` \bar{ε}(θ_0,N_I,N_L)= \frac{1}{K} ∑ ε_k

Let us denote a model parameter by θ_0 N_I by a number of images and number of lesions by N_L which are specified by user as the variables of the function.

(I) Replicates models for datasets D_1,D_2,...,D_k,...,D_K.
Draw a dataset D_k from a likelihood (model), namely: D_k \sim likelihood(|θ_0).
Draw a MCMC samples \{ θ_i (D_k)\} from a posterior, namely: θ _i \sim π(|D_k).
Calculate a posterior mean, namely: \bar{θ}(D_k) := ∑_i θ_i(D_k) .
Calculates error for D_k: ε_k:=Trueth - posterior mean estimates of D_k = |θ_0 - \bar{θ}(D_k)| (or = θ_0 - \bar{θ}(D_k), accordinly by the user specified absolute.errors ).
(II) Calculates mean of errors: mean of errors \bar{ε}(θ_0,N_I,N_L)= \frac{1}{K} ∑ ε_k

Running this function, we can see that the error \bar{ε}(θ_0,N_I,N_L) decreases monotonically as a given number of images N_I or a given number of lesions N_L increases.

Also, the scale of error also will be found. Thus this function can show how our estimates are correct. Scale of error differs for each componenet of model parameters.

Revised 2019 August 28

Return values is,

Stanfit objects: for each Replicated datasets
Errors: EAPs minus true values, in the above notations, it is \bar{ε}(θ_0,N_I,N_L)
Variances of estimators.: This calculates the vaiance of posterior means over all replicated datasets

## Not run: 
#===========================    The first example  ======================================


#   It is sufficient to run the function with default variable

   datasets <- validation.dataset_srsc()


# Today 2020 Nov 29 I have completely forgotten this function, oh it helps me. Thank me.
#=============================  The second example ======================================

#   If user does not familiar with the values of thresholds, then
#   it would be better to use the actual estimated values
#    as an example of true parameters. In the following,
#     I explain this.
# First, to get posterior mean estimates, we run the following:



  fit <- fit_Bayesian_FROC(dataList.Chakra.1,ite = 1111,summary =FALSE,cha=3)






#  Secondly, extract the expected a posterior estimates (EAPs) from the object fit
# Note that EAP is also called "posterior mean"

  z <- rstan::get_posterior_mean(fit,par=c("z"))[,"mean-all chains"]





#  Thirdly we use this z as a true value.


   datasets <- validation.dataset_srsc(z.truth = z)



#========================================================================================
#            1)             extract replicated fitted model object
#========================================================================================


    # Replicates models

    a <- validation.dataset_srsc(replicate.datset = 3,ite = 111)



    # Check convergence, in the above MCMC iterations = 111 which is too small to get
    # a convergence MCMC chain, and thus the following example will the example
    # of a non-convergent model in the r hat criteria.

    ConfirmConvergence( a$fit[[3]])


    # Check trace plot to confirm whether MCMC chain converge or not.

    stan_trace( a$fit[[3]],pars = "A")


   # Check p value, for chi square goodness of fit whose null hypothesis is that
   # the model is fitted well.

   fit@posterior_predictive_pvalue_for_chi_square_goodness_of_fit












                                          # Revised in 2019 August 29

                                          # Revised in 2020 Nov 28
                                          # It is weird, awesome,
                                          # What a fucking English,...I fix it.




#========================================================================================
#            1)            Histogram of error of postrior means for replicated datasets
#========================================================================================
#'

  a<-   validation.dataset_srsc(replicate.datset = 100)
  hist(a$error.of.AUC,breaks = 111)
  hist(a$error.of.AUC,breaks = 30)






#========================================================================================
#                             absolute.errors = FALSE generates negative biases
#========================================================================================


 validation.dataset_srsc(absolute.errors = FALSE)



#========================================================================================
#      absolute.errors = TRUE coerce negative biases to positives, i.e., L^2 norm
#========================================================================================


 validation.dataset_srsc(absolute.errors = TRUE)








#========================================================================================
#     Check each  fitted model object
#========================================================================================




a <- validation.dataset_srsc(verbose = TRUE)

a$fit[[2]]



class(a$fit[[2]])

rstan::traceplot(a$fit[[2]], pars = c("A"))






#========================================================================================
#     NaN ... why? 2021 Dec
#========================================================================================

fits <- validation.dataset_srsc()

f <-fits$fit[[1]]
rstan::extract(f)$dl
sum(rstan::extract(f)$dl)
Is.nan.in.MCMCsamples <- as.logical(!prod(!is.nan(rstan::extract(f)$dl)))
rstan::extract(f)$A[525]
a<-rstan::extract(f)$a[525]
b<-rstan::extract(f)$b[525]

Phi(  a/sqrt(b^2+1)  )
x<-rstan::extract(f)$dl[2]

a<-rstan::extract(f)$a
b<-rstan::extract(f)$b

a/(b^2+1)
Phi(a/(b^2+1))
mean( Phi(a/(b^2+1))  )

#'





## End(Not run)# dontrun