validation.dataset_srsc: Errors of Estimator for any Given true parameter

Description Usage Arguments Details Value Examples

View source: R/validation_error_srsc.R

Description

This function replicates many datasets from a model with a given model parameter as truth. Then fit a model to these datasets and get estimates. Then calculates mean or variance of the difference of estimates and truth.

This function gives a validation under the assumption that we obtain a fixed true model parameter drawn from the prior.

But, you know, the author noticed that it is not sufficient because , in Bayesian, such a true parameter is merely one time sample from prior. So, what we should do is to draw randomly many samples from priors.

Well, I know, but, I am being such a couch potato. In the future, I would do, if I was alive. Even if I try this, this effort never take account by someone. But, to live, it is required. No money, no life. 2020 NOv 29

In the future, what I should do is that the following calculations.

1. Draw a sample of parameter θ from prior, namely, θ \sim π(θ)

2. Draw a sample of data x=x(θ) from a model with the sample of prior x \sim π(x|θ)

3. Draw a sample of parameter θ'=θ'(x(_θ)) from a posterior with the sample of data θ' \sim π(θ|x)

4. Calculates the integral \int | θ' - θ|^2 π(θ)dθ as the error of estimates among all priors..

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
validation.dataset_srsc(
  replicate.datset = 3,
  ModifiedPoisson = FALSE,
  mean.truth = 0.6,
  sd.truth = 5.3,
  z.truth = c(-0.8, 0.7, 2.38),
  NL = 259,
  NI = 57,
  ite = 1111,
  cha = 1,
  summary = TRUE,
  see = 1234,
  verbose = FALSE,
  serial.number = 1,
  base_size = 0,
  absolute.errors = TRUE
)

Arguments

replicate.datset

A Number indicate that how many you replicate dataset from user's specified dataset.

ModifiedPoisson

Logical, that is TRUE or FALSE.

If ModifiedPoisson = TRUE, then Poisson rate of false alarm is calculated per lesion, and a model is fitted so that the FROC curve is an expected curve of points consisting of the pairs of TPF per lesion and FPF per lesion.

Similarly,

If ModifiedPoisson = TRUE, then Poisson rate of false alarm is calculated per image, and a model is fitted so that the FROC curve is an expected curve of points consisting of the pair of TPF per lesion and FPF per image.

For more details, see the author's paper in which I explained per image and per lesion. (for details of models, see vignettes , now, it is omiited from this package, because the size of vignettes are large.)

If ModifiedPoisson = TRUE, then the False Positive Fraction (FPF) is defined as follows (F_c denotes the number of false alarms with confidence level c )

\frac{F_1+F_2+F_3+F_4+F_5}{N_L},

\frac{F_2+F_3+F_4+F_5}{N_L},

\frac{F_3+F_4+F_5}{N_L},

\frac{F_4+F_5}{N_L},

\frac{F_5}{N_L},

where N_L is a number of lesions (signal). To emphasize its denominator N_L, we also call it the False Positive Fraction (FPF) per lesion.

On the other hand,

if ModifiedPoisson = FALSE (Default), then False Positive Fraction (FPF) is given by

\frac{F_1+F_2+F_3+F_4+F_5}{N_I},

\frac{F_2+F_3+F_4+F_5}{N_I},

\frac{F_3+F_4+F_5}{N_I},

\frac{F_4+F_5}{N_I},

\frac{F_5}{N_I},

where N_I is the number of images (trial). To emphasize its denominator N_I, we also call it the False Positive Fraction (FPF) per image.

The model is fitted so that the estimated FROC curve can be ragraded as the expected pairs of FPF per image and TPF per lesion (ModifiedPoisson = FALSE )

or as the expected pairs of FPF per image and TPF per lesion (ModifiedPoisson = TRUE)

If ModifiedPoisson = TRUE, then FROC curve means the expected pair of FPF per lesion and TPF.

On the other hand, if ModifiedPoisson = FALSE, then FROC curve means the expected pair of FPF per image and TPF.

So,data of FPF and TPF are changed thus, a fitted model is also changed whether ModifiedPoisson = TRUE or FALSE. In traditional FROC analysis, it uses only per images (trial). Since we can divide one image into two images or more images, number of trial is not important. And more important is per signal. So, the author also developed FROC theory to consider FROC analysis under per signal. One can see that the FROC curve is rigid with respect to change of a number of images, so, it does not matter whether ModifiedPoisson = TRUE or FALSE. This rigidity of curves means that the number of images is redundant parameter for the FROC trial and thus the author try to exclude it.

Revised 2019 Dec 8 Revised 2019 Nov 25 Revised 2019 August 28

mean.truth

This is a parameter of the latent Gaussian assumption for the noise distribution.

sd.truth

This is a parameter of the latent Gaussian assumption for the noise distribution.

z.truth

This is a parameter of the latent Gaussian assumption for the noise distribution.

NL

Number of Lesions.

NI

Number of Images.

ite

A variable to be passed to the function rstan::sampling() of rstan in which it is named iter. A positive integer representing the number of samples synthesized by Hamiltonian Monte Carlo method, and, Default = 1111

cha

A variable to be passed to the function rstan::sampling() of rstan in which it is named chains. A positive integer representing the number of chains generated by Hamiltonian Monte Carlo method, and, Default = 1.

summary

Logical: TRUE of FALSE. Whether to print the verbose summary. If TRUE then verbose summary is printed in the R console. If FALSE, the output is minimal. I regret, this variable name should be verbose.

see

A variable to be passed to the function rstan::sampling() of rstan in which it is named seed. A positive integer representing seed used in stan, Default = 1234.

verbose

A logical, if TRUE, then the redundant summary is printed in R console. If FALSE, it suppresses output from this function.

serial.number

A positive integer or Character. This is for programming perspective. The author use this to print the serial numbre of validation. This will be used in the validation function.

base_size

An numeric for size of object, this is for the package developer.

absolute.errors

A logical specifying whether mean of errors is defined by

TRUE

\bar{ε}(θ_0,N_I,N_L)= \frac{1}{K} ∑ | ε_k |

FALSE

\bar{ε}(θ_0,N_I,N_L)= \frac{1}{K} ∑ ε_k

Details

Let us denote a model parameter by θ_0 N_I by a number of images and number of lesions by N_L which are specified by user as the variables of the function.

(I) Replicates models for datasets D_1,D_2,...,D_k,...,D_K.
Draw a dataset D_k from a likelihood (model), namely

D_k \sim likelihood(|θ_0).

Draw a MCMC samples \{ θ_i (D_k)\} from a posterior, namely

θ _i \sim π(|D_k).

Calculate a posterior mean, namely

\bar{θ}(D_k) := ∑_i θ_i(D_k) .

Calculates error for D_k

ε_k:=Trueth - posterior mean estimates of D_k = |θ_0 - \bar{θ}(D_k)| (or = θ_0 - \bar{θ}(D_k), accordinly by the user specified absolute.errors ).

(II) Calculates mean of errors

mean of errors \bar{ε}(θ_0,N_I,N_L)= \frac{1}{K} ∑ ε_k

Running this function, we can see that the error \bar{ε}(θ_0,N_I,N_L) decreases monotonically as a given number of images N_I or a given number of lesions N_L increases.

Also, the scale of error also will be found. Thus this function can show how our estimates are correct. Scale of error differs for each componenet of model parameters.

Revised 2019 August 28

Value

Return values is,

Stanfit objects

for each Replicated datasets

Errors

EAPs minus true values, in the above notations, it is \bar{ε}(θ_0,N_I,N_L)

Variances of estimators.

This calculates the vaiance of posterior means over all replicated datasets

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
## Not run: 
#===========================    The first example  ======================================


#   It is sufficient to run the function with default variable

   datasets <- validation.dataset_srsc()


# Today 2020 Nov 29 I have completely forgotten this function, oh it helps me. Thank me.
#=============================  The second example ======================================

#   If user does not familiar with the values of thresholds, then
#   it would be better to use the actual estimated values
#    as an example of true parameters. In the following,
#     I explain this.
# First, to get posterior mean estimates, we run the following:



  fit <- fit_Bayesian_FROC(dataList.Chakra.1,ite = 1111,summary =FALSE,cha=3)






#  Secondly, extract the expected a posterior estimates (EAPs) from the object fit
# Note that EAP is also called "posterior mean"

  z <- rstan::get_posterior_mean(fit,par=c("z"))[,"mean-all chains"]





#  Thirdly we use this z as a true value.


   datasets <- validation.dataset_srsc(z.truth = z)



#========================================================================================
#            1)             extract replicated fitted model object
#========================================================================================


    # Replicates models

    a <- validation.dataset_srsc(replicate.datset = 3,ite = 111)



    # Check convergence, in the above MCMC iterations = 111 which is too small to get
    # a convergence MCMC chain, and thus the following example will the example
    # of a non-convergent model in the r hat criteria.

    ConfirmConvergence( a$fit[[3]])


    # Check trace plot to confirm whether MCMC chain converge or not.

    stan_trace( a$fit[[3]],pars = "A")


   # Check p value, for chi square goodness of fit whose null hypothesis is that
   # the model is fitted well.

   fit@posterior_predictive_pvalue_for_chi_square_goodness_of_fit












                                          # Revised in 2019 August 29

                                          # Revised in 2020 Nov 28
                                          # It is weird, awesome,
                                          # What a fucking English,...I fix it.




#========================================================================================
#            1)            Histogram of error of postrior means for replicated datasets
#========================================================================================
#'

  a<-   validation.dataset_srsc(replicate.datset = 100)
  hist(a$error.of.AUC,breaks = 111)
  hist(a$error.of.AUC,breaks = 30)






#========================================================================================
#                             absolute.errors = FALSE generates negative biases
#========================================================================================


 validation.dataset_srsc(absolute.errors = FALSE)



#========================================================================================
#      absolute.errors = TRUE coerce negative biases to positives, i.e., L^2 norm
#========================================================================================


 validation.dataset_srsc(absolute.errors = TRUE)








#========================================================================================
#     Check each  fitted model object
#========================================================================================




a <- validation.dataset_srsc(verbose = TRUE)

a$fit[[2]]



class(a$fit[[2]])

rstan::traceplot(a$fit[[2]], pars = c("A"))






#========================================================================================
#     NaN ... why? 2021 Dec
#========================================================================================

fits <- validation.dataset_srsc()

f <-fits$fit[[1]]
rstan::extract(f)$dl
sum(rstan::extract(f)$dl)
Is.nan.in.MCMCsamples <- as.logical(!prod(!is.nan(rstan::extract(f)$dl)))
rstan::extract(f)$A[525]
a<-rstan::extract(f)$a[525]
b<-rstan::extract(f)$b[525]

Phi(  a/sqrt(b^2+1)  )
x<-rstan::extract(f)$dl[2]

a<-rstan::extract(f)$a
b<-rstan::extract(f)$b

a/(b^2+1)
Phi(a/(b^2+1))
mean( Phi(a/(b^2+1))  )

#'





## End(Not run)# dontrun

BayesianFROC documentation built on Jan. 23, 2022, 9:06 a.m.