BayesianFROC: Theory of FROC Analysis via Bayesian Approaches
In BayesianFROC: FROC Analysis by Bayesian Approaches

Appendix: p value

In order to evaluate the goodness of fit of our model to the data, we used the so-called the posterior predictive p value.

In the following, we use general conventional notations. Let y_{obs} be an observed dataset and f(y|θ) be a model (likelihood) for future dataset y. We denote a prior and a posterior distribution by π(θ) and π(θ|y) \propto f(y|θ)π(θ), respectively.

In our case, the data y is a pair of hits and false alarms; that is, y=(H_1,H_2, … H_C; F_1,F_2, … F_C) and θ = (z_1,dz_1,dz_2,…,dz_{C-1},μ, σ) . We define the χ^2 discrepancy (goodness of fit statistics) to validate that our model fit the data.

T(y,θ) := ∑_{c=1,.......,C} \biggr( \frac{\bigl(H_c-N_L\times p_c(θ) \bigr)^2}{N_L\times p_c(θ)}+ \frac{\bigl(F_c- q_{c}(θ) \times N_{X}\bigr)^2}{ q_{c}(θ) \times N_{X} }\biggr).

for a single reader and a single modality.

T(y,θ) := ∑_{r=1}^R ∑_{m=1}^M ∑_{c=1}^C \biggr( \frac{(H_{c,m,r}-N_L\times p_{c,m,r}(θ))^2}{N_L\times p_{c,m,r}(θ)}+ \frac{\bigl(F_c- q_{c}(θ) \times N_{X}\bigr)^2}{ q_{c}(θ) \times N_{X} }\biggr).

for multiple readers and multiple modalities.

Note that p_c and λ _{c} depend on θ.

In classical frequentist methods, the parameter θ is a fixed estimate, e.g., the maximal likelihood estimator. However, in a Bayesian context, the parameter is not deterministic. In the following, we show the p value in the Bayesian sense.

Let y_{obs} be an observed dataset (in an FROC context, it is hits and false alarms). Then, the so-called posterior predictive p value is defined by

p_value = \int \int \, dy\, dθ\, I( T(y,θ) > T(y_{obs},θ) )f(y|θ)π(θ|y_{obs})

In order to calculate the above integral, let θ_1,θ _2, ......., θ_i,.......,θ_I be samples from the posterior distribution of y_{obs} , namely,

θ_1 \sim π(....|y_{obs} ),

.......,

θ_i \sim π(....|y_{obs} ),

.......,

θ_I \sim π(....|y_{obs} ).

we obtain a sequence of models (likelihoods), i.e., f(....|θ_1),f(....|θ_2),......., f(....|θ_n). We then draw the samples y^1_1,....,y^i_j,.......,y^I_J , such that each y^i_j is a sample from the distribution whose density function is f(....|θ_i), namely,

y^1_1,.......,y^1_j,.......,y^1_J \sim f(....|θ_1),

.......,

y^i_1,.......,y^i_j,.......,y^i_J \sim f(....|θ_i),

.......,

y^I_1,.......,y^I_j,.......,y^I_J \sim f(....|θ_I).

Using the Monte Carlo integral twice, we calculate the integral of any function φ(y,θ).

\int \int \, dy\, dθ\, φ(y,θ)f(y|θ)π(θ|y_{obs})

\approx \int \, \frac{1}{I}∑_{i=1}^I φ(y,θ_i)f(y|θ_i)\,dy

\frac{1}{IJ}∑_{i=1}^I ∑_{j=1}^J φ(y^i_j,θ_i)

In particular, substituting φ(y,θ):= I( T(y,θ) > T(y_{obs},θ) ) into the above equation, we can approximate the posterior predictive p value.

p_value \approx \frac{1}{IJ}∑_i ∑_j I( T(y^i_j,θ_i) > T(y_{obs},θ_i) )

The following two Shiny basfed GUIs are available.

fit_GUI_Shiny(): GUI for a single reader and single modality
fit_GUI_Shiny_MRMC(): GUI for multiple readers and multiple modalities

The aim of FROC analysis is to compare imaging modalities, which are imaging methods such as MRI, CT, PET, etc. We want to find an imaging method with which we can find more many lesions in radiographs.

To investigate modality comparison, we have to do a trial in order to obtain a dataset consisting of TP and FP.

Here is what this package implements.

Overview of FROC analysis

In general data-analysis such as generalized linear models, the data can be plotted such as scatter plot, and we fit a model to the data such that the model can be visualized as an expected curve of data. And we can check how model fits to data intuitively. This procedure is available in the FROC paradigm. First, FROC data are plotted as scatter plot, each point is a pair of the so-called false positive fraction (FPF) and true positive fraction (TPF). And the fitted curve to this scatter plot is called FROC curve. However, the FROC curve has an infinite area under the curve (AUC), thus we modify the curve so that the AUC of modified curve has finite AUC, more precisely between zero and one. The modified curve is called AFROC curve. Using the AUC of AFROC curve, we evaluate the observer performance. Namely, high AUC means physicians can find more lesions in x-ray films.

To compare imaging modalities such as MRI, CT, PET, etc, we do a trial from which Data arise and we fit a model to the data. Using the resulting model, we can compare modalities or evaluate the observer performance based on AUC.

In the sequel, we give a complete description about the following three terms.

Trial: from which data arise.
Data: consist of the number of TPs and FPs.
Modeling: calculates the probability law in which data (TPs and FPs) arise

Trial .

To introduce FROC trial, let us consider the following terms.

A reader (in other words, player): who is a physician or radiologist challenges to find lesions (in other words, it is called signals, targets, nodules, ...) from radiographs.
images (in other words, radiographs, x-ray films such as CT, MRI, PET,etc.): containing shadows (not necessarily caused by lesions). We assume that N_L lesions make shadows as targets. (Note that each image can contain one more lesions and this multiple signals for a single image distinct FROC trial from the ordinal ROC trial). The number of images are denoted by N_I.
A researcher (in other words, data-analyst): knows true lesion locations (signal) and she can count reader's True Positives and False Positives after his lesion finding task.

For the sake of simplicity, we consider a single reader.

Throughout this explanation, we follow the convention that readers are male and the researcher is female. So, "he" means the reader, and "she" means a data-analyst.

FROC trial and data

The following table is a dataset to be fitted a model.

Let us see how it arises.

——————————————————————————————————

	confidence level	No. of false alarms	No. of hits
		(FP:False Positive)	(TP:True Positive)
-----------------------	-----------------------	-----------------------------	-------------
definitely present	5	F_5	H_5
probably present	4	F_4	H_4
equivocal	3	F_3	H_3
subtle	2	F_2	H_2
very subtle	1	F_1	H_1

—————————————————————————————————

Suppose that Bob is a reader (physician, briefly \color{green}{\bold{B}}) and Alice is a researcher (Data-analyst, briefly \color{red}{\bold{A}}).

\color{red}{\bold{A}} "Hi, Bob."

\color{green}{\bold{B}} "Hi, Alice"

\color{red}{\bold{A}} "Now, there are radiographs."

\color{green}{\bold{B}} "What are you gonna do today?"

\color{red}{\bold{A}} "Ahem, now, I evaluate your observer performance ability, namely ability of finding lesions from radiographs."

\color{green}{\bold{B}} "Seriously? Duh..."

He was disappointed because he wanted to yada yada yada with her.

\color{red}{\bold{A}} "Find the tumors from these images and I check your answer, by assigning a true positive or a false positive to your answer."

Alice gave Bob the first image (radiograph).

\color{green}{\bold{B}} " OK! Let's start. Hmmm ... Hmmm... It seems to me that the first image contains two suspicious tumors."

\color{red}{\bold{A}} " Locarize your two suspicious tumors locations in the first image."

She gave him a pen.

\color{green}{\bold{B}} "OK! ... Swish, Swish"

He marked two locations in the first image.

\color{red}{\bold{A}} "In addition, assign your confidence levels to your two suspicous tumors."

\color{green}{\bold{B}} "How?"

\color{red}{\bold{A}} "It is a number, 1,2,3,4,5. If you think a shadow is definitely tumor, then you choose 5. Similary, 4 is probably, ...., 2 is subtle, 1 is very subtle."

	confidence level
-----------------------	-----------------------
definitely present	5
probably present	4
equivocal	3
subtle	2
very subtle	1

\color{green}{\bold{B}} " OK! Now, I doubt two shadows are tumors, thus I need two ratings. I think that one is absolutely tumor, so I rate 5 for this shodow. On the other hand, for the another shadow, I think that it is probably a tumor, so I rate 3 for it."

Swish, Swish, He rated for his two suspicious locations. Namely, he associated his confidence levels for his locarizing shadows.

\color{red}{\bold{A}} "Let's check your answer for the first image! Your first suspicous tumor with rating 5 is correctly locarized."

\color{green}{\bold{B}} "I did it! Yay! Hooray!! Woohoo!!! Booyah!!!!"

\color{red}{\bold{A}} " But your second suspicious shadow locarized with rating 3 is not correct, so,..., it is not a tumor."

\color{green}{\bold{B}} "Oops, I did it."

\color{red}{\bold{A}} "Moreover, in the first image, there are several tumors being not detected and we ignore them in this FROC trial."

\color{green}{\bold{B}} "Oopsies. Gah!"

\color{red}{\bold{A}} "So, now, you have one hit with rating 5 and one false alarm with rating 3 as following table. Next, we will work for the second images."

FROC data of the first image

——————————————————————————————————

	confidence level	No. of false alarms	No. of hits
		(FP:False Positive)	(TP:True Positive)
-----------------------	-----------------------	-----------------------------	-------------
definitely present	5	F_5= 0	H_5 = 1 <- attention please
probably present	4	F_4= 0	H_4 = 0
equivocal	3	F_3 = 1 <- attention please	H_3 = 0
subtle	2	F_2= 0	H_2 = 0
very subtle	1	F_1= 0	H_1= 0

—————————————————————————————————

Alice gave Bob the second image (radiograph).

\color{green}{\bold{B}} "In the second image, I think there are three suspicious shadows."

\color{red}{\bold{A}} "OK, locarize your suspicious locaitons."

\color{green}{\bold{B}} "Swish Swish Swish"

Bob locarized his three suspicious locations.

\color{red}{\bold{A}} "OK, rate your confidence level for each locarized shadow."

\color{green}{\bold{B}} "The first shadows is 3, the second shadow is 5, the third shodows is 2."

\color{red}{\bold{A}} "OK, I check your answer. So, the answer is true, true, false."

\color{green}{\bold{B}} " Uh-huh, .... mm hm"

\color{red}{\bold{A}} "So, in the second image you have one hits with confidence level 3 and one hits with rating 5 and one false alarms with rating 2. Combining the first image and the second image, now, you have two hits with rating 5, and one hit with rating 3, and one false alarm with rating 2 and one false alarm with rating 3. Next, we consider the third image."

FROC data of the 1st and 2nd images

——————————————————————————————————

	confidence level	No. of false alarms	No. of hits
		(FP:False Positive)	(TP:True Positive)
-----------------------	-----------------------	-----------------------------	-------------
definitely present	5	F_5= 0	H_5 = 1 + \color{red}{\bold{1}}
probably present	4	F_4= 0	H_4 = 0
equivocal	3	F_3 = 1	H_3 = \color{red}{\bold{1}}
subtle	2	F_2= \color{red}{\bold{1}}	H_2 = 0
very subtle	1	F_1= 0	H_1= 0

————————————————————————————————— #'

Alice and Bob did this trial for all images, and they summarized the number of hits and false alarms in the following table.

FROC data over all images

—————————————————————————————————

`NI=63,NL=124`	confidence level	No. of false alarms	No. of hits
In R console ->	`c`	`f`	`h`
-----------------------	-----------------------	-----------------------------	-------------
definitely present	`c[1] =` 5	`f[1] =` F_5 = 1	`h[1] =` H_5 = 41
probably present	`c[2] =` 4	`f[2] =` F_4 = 2	`h[2] =` H_4 = 22
equivocal	`c[3] =` 3	`f[3] =` F_3 = 5	`h[3] =` H_3 = 14
subtle	`c[4] =` 2	`f[4] =` F_2 = 11	`h[4] =` H_2 = 8
very subtle	`c[5] =` 1	`f[5] =` F_1 = 13	`h[5] =` H_1 = 1

—————————————————————————————————

\color{red}{\bold{A}} "Phew, I summarized the evaluation in the following table"

\color{green}{\bold{B}} ."How kind of you!"

\color{red}{\bold{A}} "Phew, you are finished for the day. Sayonara, Bob!"

\color{green}{\bold{B}} "Boo!"

He was impatient because, today, he wanted to yada yada yada with her.

\color{green}{\bold{B}} "Hey, Alice"

\color{red}{\bold{A}} "!?"

\color{green}{\bold{B}} "Hey, I am done at work now, so I am free to yada yada yada with you today!!"

\color{red}{\bold{A}} "Eww, today, I cannot, cuz I have to fit a FROC model to the data and draw a fitted FROC curve and calculate AUC to evaluate your observer performance ability!"

\color{green}{\bold{B}} "Ugh,..., Duh ...."

Unfortunately, Bob's yada yada plan was a complete failure. Amen.

1. First trial start: The researcher gives the reader the first image which contains suspicious shadows, each of which is noise or lesion.
2. LESION FINDING TASK for the first image (trial): The reader marks (localizes) his suspicious locations of shadow (multiple answer is allowed) each of which is also assigned a integer indicating his confidence levels (if he thinks some shadow is obviously a lesion, then he gives a higher integer with respect to the shadow). So, reader marks two things: location and confidence for each suspicious shadow.
3. Second trial and LESION FINDING TASK for the second image (trial): The researcher gives the reader the second image and reader does the above LESION FINDING TASK for the second image.
4. repeat this trial for all images.: The reader do the LESION FINDING TASK for all images
5. evaluation of TP and FP: The researcher count the number of their true marking positions (hit) and false making positions (false alarm).

Consequently, we obtain the following table.

Example data and its Format:

A single reader and a single modality case

—————————————————————————————————

`NI=63,NL=124`	confidence level	No. of false alarms	No. of hits
In R console ->	`c`	`f`	`h`
-----------------------	-----------------------	-----------------------------	-------------
definitely present	`c[1] =` 5	`f[1] =` F_5 = 1	`h[1] =` H_5 = 41
probably present	`c[2] =` 4	`f[2] =` F_4 = 2	`h[2] =` H_4 = 22
equivocal	`c[3] =` 3	`f[3] =` F_3 = 5	`h[3] =` H_3 = 14
subtle	`c[4] =` 2	`f[4] =` F_2 = 11	`h[4] =` H_2 = 8
very subtle	`c[5] =` 1	`f[5] =` F_1 = 13	`h[5] =` H_1 = 1

—————————————————————————————————

We use two notations for the same number of FPs, e.g., one is f[1] and the other is F_5. We use the former f[1] for programming and the later F_5 is used for descriptions of the theory.

This is the biggest failure of my programming. I regretted that I should be define so that f[c] is F_c for all c. Too late to fix. Ha,,, I regret...damn.

By the R code BayesianFROC::viewdata(BayesianFROC::dataList.Chakra.1.with.explanation), we can see example data named "dataList.Chakra.1.with.explanation".

Modeling 1. Traditional way Let us denotes the model parameter to be estimated by θ_c, μ, σ.

Define

p_c(θ):= \int ^{θ_{c+1}}_{θ_c} Gaussian(z|μ, σ) dz,

q_c(θ):= \int ^{θ_{c+1}}_{θ_c} \frac{d}{dz} \log Φ(z) dz.

Note that θ_0 := - ∞.

We extend the vector from (H_c)_{c=1,2,...,C} to (H_c)_{c=0,1,2,...,C}, where H_0:= N_L - (H_1+H_2+...+H_C).

Then, we assume

(H_c)_{c=0,1,2,...,C} \sim \color{red}{Multinomial}((p_c)_{c=0,1,2,...,C} )

and

F_c \sim Poisson(q_c(θ)N_I ).

Recall that N_I denotes the number of images (radiographs, such as X-ray films) and N_L the number of lesions (signals, nodules,).

Finish! Very simple! fuck! Gratias! We should credo in unum model. Here, we use the logic of latent variable, so .... I am tired .... you know what it is. Dona nobis pacem.

This is a very important and the author will copy and paste this in three times ha.

Modeling 1. Traditional way Let us denotes the model parameter to be estimated by θ_c, μ, σ.

Define

p_c(θ):= \int ^{θ_{c+1}}_{θ_c} Gaussian(z|μ, σ) dz,

q_c(θ):= \int ^{θ_{c+1}}_{θ_c} \frac{d}{dz} \log Φ(z) dz.

Note that θ_0 := - ∞.

We extend the vector from (H_c)_{c=1,2,...,C} to (H_c)_{c=0,1,2,...,C}, where H_0:= N_L - (H_1+H_2+...+H_C).

Then, we assume

(H_c)_{c=0,1,2,...,C} \sim \color{red}{Multinomial}((p_c)_{c=0,1,2,...,C} )

and

F_c \sim Poisson(q_c(θ)N_I ).

Recall that N_I denotes the number of images (radiographs, such as X-ray films) and N_L the number of lesions (signals, nodules,).

Finish! Very simple! Gratias! But We should not credo in unum model. Here, we use the logic of latent variable, so .... I am tired .... you know what it is. Dona nobis pacem. Modeling 1. Traditional way Let us denotes the model parameter to be estimated by θ_c, μ, σ.

Define

p_c(θ):= \int ^{θ_{c+1}}_{θ_c} Gaussian(z|μ, σ) dz,

q_c(θ):= \int ^{θ_{c+1}}_{θ_c} \frac{d}{dz} \log Φ(z) dz.

Note that θ_0 := - ∞.

We extend the vector from (H_c)_{c=1,2,...,C} to (H_c)_{c=0,1,2,...,C}, where H_0:= N_L - (H_1+H_2+...+H_C).

Then, we assume

(H_c)_{c=0,1,2,...,C} \sim \color{red}{Multinomial}((p_c)_{c=0,1,2,...,C} )

and

F_c \sim Poisson(q_c(θ)N_I ).

Recall that N_I denotes the number of images (radiographs, such as X-ray films) and N_L the number of lesions (signals, nodules,).

Finish! Very simple! fuck! Gratias! We should credo in unum model. Here, we use the logic of latent variable, so .... I am tired .... you know what it is. Dona nobis pacem. Modeling 1. Traditional way Let us denotes the model parameter to be estimated by θ_c, μ, σ.

Define

p_c(θ):= \int ^{θ_{c+1}}_{θ_c} Gaussian(z|μ, σ) dz,

q_c(θ):= \int ^{θ_{c+1}}_{θ_c} \frac{d}{dz} \log Φ(z) dz.

Note that θ_0 := - ∞.

We extend the vector from (H_c)_{c=1,2,...,C} to (H_c)_{c=0,1,2,...,C}, where H_0:= N_L - (H_1+H_2+...+H_C).

Then, we assume

(H_c)_{c=0,1,2,...,C} \sim \color{red}{Multinomial}((p_c)_{c=0,1,2,...,C} )

and

F_c \sim Poisson(q_c(θ)N_I ).

Recall that N_I denotes the number of images (radiographs, such as X-ray films) and N_L the number of lesions (signals, nodules,).

Finish! Very simple! fuck! Gratias! We should credo in unum model. Here, we use the logic of latent variable, so .... I am tired .... you know what it is. Dona nobis pacem. #'

Modeling 2 the author'S redandunt way

Our goal now is to define a model of the random variables H_c,F_c, namely, to give a family of probability law of H_c,F_c.

Let

X (ω):= (H_1(ω),H_2(ω),H_3(ω),H_4(ω),H_5(ω),F_1(ω),F_2(ω),F_3(ω),F_4(ω),F_5(ω))

be a random variable from a probability space ( Ω, σ - field,P_{truth}) to N^{10} which denotes the set of 10-dimensional non-negative integers, where ω denotes an element of Ω.

We have to find a family of probability spaces, consisting three tuples ( Ω, σ - field, P_{θ}). In other words, what we want is to define a familiy of likelihoods (π(x|θ))_{θ \in Θ} such that for any event E of a subset of N^{10}, such that the following equation holds

P_{θ}(X^{-1}E) = \int _{E} π(x | θ) dx.

where X^{-1}E denotes the pre-image of E and x is an element of N^{10} as a realization of the random variable X. The quantity of the last equation is the so-called image measure (or push-forward measure) of the random variable X. The space Ω is abstract, on the otherhand the space of non-negative integers are very familiar, so we use the push-forward measure rather than the measure on Ω. More explicitly, if we write the realization of the random variable X by x =(h,f)= (h_1,h_2,h_3,h_4,h_5,f_1,f_2,f_3,f_4,f_5), then the above equation is

P_{θ}(X^{-1}E) = \int _{E} π(h_1,h_2,h_3,h_4,h_5,f_1,f_2,f_3,f_4,f_5 | θ) dh_1 h_2 h_3 h_4 h_5 f_1 f_2 f_3 f_4 f_5 .

or briefly

P_{θ}(X^{-1}E) = \int _{E} π(h,f | θ) dh df.

This is an elementary formula of push-forward measure. In this package, using Stan, we esimates the parameter θ^* so that the two probability measures P_{θ^*} and P_{truth} is close in some sense. Many statisical methods use the Kullback-Leibler divergence to evaluate the distance of the probability measures. Of course, we can never know the probability measure P_{truth} belongs to the familiy of models P_{θ} or not.

Ha, .... multiple chemical sensitivity is very very very very ....very.

Modeling by reducing to easy case as a first step

First, we shall discuss our model rigorously (ignore the confidence). First, to simplify our argument, first we reduce the FP and TP dataset from H_c,F_c to H,F by ignoring the confidence level. Suppose that there are N_L targets (signal), and radiological context, target is lesion. Suppose that a radiologist try to find these lesions from radiographs. Suppose that now, the reader fined H lesions from radiographs which contains N_L lesions, then it is natural to assume that

H \sim Binomial(θ_H, N_L)

where, θ_H denotes the Bernoulli success rate is one of parameter for our model, which should be estimated. Of course 0<θ_H <1.

In addition, suppose that the reader fails F times, namely, the reader marked F locations in radiographs each of which is not a true lesion location. In other words, the reader marked F false positives. Then it is natural to assume that

F \sim Poisson(θ_F)

where, θ_F is also an another parameter of model, which should be estimated from given data. So, our model has a vector θ_H, θ_F as a model parameter.

The above two is very simple, since data is only H,F, indicating the number of TP and the number of FP.

Unfortunately, the FROC data is more complex than above, namely, we have to take account the confidence levels, and so we have to make a model for data F_c H_c,c=1,...,5 instead of the above simplified data H,F. That is, reader answers with his confidence level for each suspicious location, which is usually an integer such as 1,2,3,4,5.

We give a probability law for the random variables F_c and H_c for c=1,...,5.

Suppose that there are N_L targets, and radiological context, each target is a lesion contained in N_I Radiographs. Suppose that a radiologist try to find lesions. Suppose that now, he found H_c lesions with his c-th confidence, then we assume that each random variable H_c is distributed by the following law.

H_5 \sim Binomial(p_5(θ), N_L)

H_4 \sim Binomial(\frac{p_4(θ)}{1-p_5(θ) }, N_L - H_5)

H_3 \sim Binomial(\frac{p_3(θ)}{1-p_5(θ)-p_4(θ) }, N_L - H_5 - H_4)

H_2 \sim Binomial(\frac{p_2(θ)}{1-p_5(θ)-p_4(θ)-p_3(θ) }, N_L - H_5 - H_4 -H_3)

H_1 \sim Binomial(\frac{p_1(θ)}{1-p_5(θ)-p_4(θ)-p_3(θ)-p_2(θ) }, N_L - H_5 - H_4 -H_3 -H_2)

where, hit rates p_1(θ), p_2(θ), p_3(θ), p_4(θ) and p_5(θ) are some functions of a model parameter θ. We also denote them simply by p_c instead of p_c(θ),c=1,2,3,4,5. In addition, suppose that the reader fails in F_c times with his c-th confidence, that is, the reader locarized F_c false locations in radiographs with his c-th confidence. Then it is natural to assume that

F_5 \sim Poisson(q_5(θ)N_X)

F_4 \sim Poisson(q_4(θ)N_X)

F_3 \sim Poisson(q_3(θ)N_X)

F_2 \sim Poisson(q_2(θ)N_X)

F_1 \sim Poisson(q_1(θ)N_X)

where, N_X = N_I or N_L and we fix it for the duration of the paper.

The false rates q_1(θ), q_2(θ), q_3(θ), q_4(θ) and q_5(θ) are functions of a parameter of model.

The above model gives the probability law for the the random variables H_c,F_c, c=1,2,..,C, indicating the number of TP and the number of FP for each confidence level c=1,2,..,C.

We define p_c(θ) and q_c(θ) in terms of the model parameter μ, σ, θ_c, c=1,2,..,C.

p_c(θ) = \int ^{θ_{c+1}}_{θ_c} Gaussian(z|μ, σ) dz

q_c(θ)= \int ^{θ_{c+1}}_{θ_c} \frac{d}{dz} \log Φ(z) dz

We use the abbriviations p_c and q_c for p_c(θ) and q_c(θ).

For any given dataset, we will estimate the model parameter vector θ;

θ = (θ_1,θ_2,...,θ_C; μ,σ).

Intuitively, the reason why we choose such functions for p_c(θ) is the assumption that each lesion is equipped with i.i.d. latent variable, X distributed by Gaussian(z|μ, σ) , and if X associated to some lesion falls into the interval θ_c <X< θ_{c+1}, then we consider that the reader marks this lesion with his c-th confidence level. In order to emphasize that each X is associated to some l-th lesion, l=1,2,...,N_L we denote the latent variable by X_l for the l-th lesion instead the latent decision variable X. Here, we uses latent to means that the variable X cannot be observed. Since the latent variable relates decision of reader, and thus, in this context the latent variable is called a decision variable.

Similarly, suppose that each image (radiograph) is associated some latent variable Y distributed by N_I \frac{d}{dz} Φ(z) and if the Y associated to some image falls into interval the interval θ_c <Y<θ_{c+1}, then we consider that the reader will false decision with his c-th confidence level for the image.

Fundamental equations

The reason why we use the hit rates such as \frac{p_2}{1-p_5-p_4-p_3} instead of p_c is that it ensures the equality E[\frac{H_c}{N_L}] = p_c. This equality is very important to establish Bayesian FROC theory so that it is compatible with the classical FROC theory. As an immediate consequence of the definition of hit rates, we have,

E[\frac{H_c}{N_L}] = p_c,

E[\frac{F_c}{N_X}] = q_c,

where E denotes the expectation and N_X is the number of lesion or the number of images and q_c is a false alarm rate, namely, F_c ~ Poisson(N_X q_c).

More precisely or to express the above with model parameter explicitly, we should rewrite it as follows.

E_{θ}[\frac{H_c}{N_L}] = p_c(θ),

E_{θ}[\frac{F_c}{N_X}] = q_c(θ),

where E_{θ}[X] denotes the expectation of a random variable X with the likelihood π(ω|θ) for data ω parameter θ, namely,

E_{θ}[X] := \int X(ω)P_{θ}(d ω) = \int x π(x|θ)dx

So, the above two equations are rewritten as follows.

E_{θ}[\frac{H_c}{N_L}] := \int \frac{H_c(ω)}{N_L}P_{θ}(d ω) = \int \frac{h_c}{N_L} π(h,f|θ)dhdf = p_c(θ),

E_{θ}[\frac{F_c}{N_X}] := \int \frac{F_c(ω)}{N_X}P_{θ}(d ω) = \int \frac{f_c}{N_L} π(h,f|θ)dhdf = q_c(θ).

What redundant explanation!

These two family of equations are most important one, and the author made this model to satisfy this. Using these equations, we can define the FROC curve such that the curve can be interpreted as the points of expectations.

We call these equations the fundamental equations of FROC analysis. Using this, we can calculates the expectations of FPF and TPF in the later.

The new model by the author is a generative model: The classical model can not synthesize dataset so that the total number of hits is bounded from above by the number of lesions.
Love: The new model is made with great love of the author and poor condition and poor books (to tell the truth, I did not read any books when I made a prototype) without any support of money.
A details of model: The formulation of hit rate differs from the classical theory.
The new model excludes the number of images: The formulation of false rate differs from the classical theory and it allows us to exclude the number of images from modeling.
A multiple chemical sensitivity: The author diseased the serious , so,,,, the author is a patient of the chemical sensitivity, which make his life of quality much lower.
A multiple chemical sensitivity: The author diseased the serious , so,,,, the author is a patient of the chemical sensitivity, which make his life of quality much lower.

#' Using the above two equations, we can establish the alternative Bayesian FROC theory preserving classical notions and formulas.

To fit a model to any dataset, we use the code:

fit_Bayesian_FROC(): Fit a model to data
dataList.Chakra.2: Example data in Chakraborty 1989 paper
dataList.Chakra.3: Example data in Chakraborty 1989 paper
dataList.Chakra.4: Example data in Chakraborty 1989 paper

Priors on the Model Parameter.

Recall that our model has the following parameter.

θ = (θ_1,θ_2,...,θ_C; μ,σ).

In this section, we give priors on this parameter. Only one necessarily prior is to ensure the monotonicity on the thresholds parameters.

θ_1 < θ_2 < ... < θ_C.

To give this monotonicity, we have to assume .... UNDER CONSTRUCTION

Recall that the number of false alarms is distributed by Poisson with rate

q_c(θ) = \log \frac{ Φ(θ_{c+1})}{Φ(θ_c)}

Visualization of TP, FP by FPF, TPF

How to visualize our data constructed by hit and false alarms, that is, TP and FP? Traditionally, the so-called FPF;False Positive Fraction and TPT:True Positive Fraction are used. Recall that our data format:

A single reader and a single modality case auxiliary: number of images and lesions NI, NL ——————————————————————————————————

	confidence level	No. of false alarms	No. of hits
		(FP:False Positive)	(TP:True Positive)
-----------------------	-----------------------	-----------------------------	-------------
definitely present	5	F_5	H_5
probably present	4	F_4	H_4
equivocal	3	F_3	H_3
subtle	2	F_2	H_2
very subtle	1	F_1	H_1

—————————————————————————————————

In the above table, we introduce two kinds of random variables F_c H_c c=1,2,3,4,5 which are non-negative integers and please keep in mind the notations because, from now on, we use them frequently throughout this paper.

Recall that FPF ( False Positive Fraction) is defined as follows;

FPF(5):= \frac{F_5}{N_I},

FPF(4):= \frac{F_4+F_5}{N_I},

FPF(3):= \frac{F_3+F_4+F_5}{N_I},

FPF(2):= \frac{F_2+F_3+F_4+F_5}{N_I},

FPF(1):= \frac{F_1+F_2+F_3+F_4+F_5}{N_I}.

Similarly, TPF ( True Positive Fraction) is defined as follows;

TPF(5):= \frac{H_5}{N_L},

TPF(4):= \frac{H_4+H_5}{N_L},

TPF(3):= \frac{H_3+H_4+H_5}{N_L},

TPF(2):= \frac{H_2+H_3+H_4+H_5}{N_L},

TPF(1):= \frac{H_1+H_2+H_3+H_4+H_5}{N_L}.

Combining TPF and FPF, we obtain the pairs.

( FPF(1), TPF(1) ),

( FPF(2), TPF(2) ),

( FPF(3), TPF(3) ),

( FPF(4), TPF(4) ),

( FPF(5), TPF(5) ).

Plotting these five points in a two-dimensional plain, we can visualize our dataset..

In addition, connecting these points by lines, we obtain the so-called empirical FROC curve.

interpretation of the empirical FROC curve

In fact, if a reader (physician) has a high signal detection ability, namely, he can find more lesions in Radiographs (image), then the number of TPs denoted by H_1,H_2,H_3,H_4,H_5 will be more and more greater. Thus, the

TPF(1),TPF(2),TPF(3),TPF(4),TPF(5)

is also greater. Consequently, the points

( FPF(1), TPF(1) ),

( FPF(2), TPF(2) ),

( FPF(3), TPF(3) ),

( FPF(4), TPF(4) ),

( FPF(5), TPF(5) ).

are located in upper positions. This indicates that the high observer performance leads the empirical FROC curve to be more upper positions in the plane.

Visualization of our model by curve

In this section, we provides the so-called FROC curve which is our desired visualization of estimated model. Roughly speaking, an FROC curve is expected pairs of FPF and TPF. Namely, the points of FPF and TPF will be on FROC curve if model is well fitting to data. So, comparing the FROC curve and the FPF and TPF, we can evaluate our goodness of fit.

In the above, ha,... I want to die.

Define x(c), y(c), c = 1,2,3,4,5 by the expectations of FPF and TPF, respectively, namely,

x(c):= E[ FPF(c) ],

y(c):= E[ TPF(c) ].

for c = 1,2,3,4,5.

Using the formulas E_{θ}[\frac{H_c}{N_L}] = p_c(θ), E_{θ}[\frac{F_c}{N_X}] = q_c(θ),, we can rewrite them in terms of the parameters μ, σ of the latent Gaussian, as follows.

x(c) = E[ FPF(c) ] =\int^∞_{θ_c}\frac{d}{dz} \log Φ(z) dz = - \log Φ(θ_c),

y(c) = E[ TPF(c) ] =\int^∞_{θ_c} Gaussian(z|μ, σ) dz = Φ(\frac{θ_c - μ}{σ}).

From the first equation, we obtain that θ_c = Φ^{-1}(\exp(-x(c))). Substituting this into the second equation, it follows that

y(c) = Φ(\frac{ Φ^{-1}(\exp(-x(c))) - μ}{σ}).

This implies that the set of points (x(c), y(c)), c = 1,2,3,4,5 consisting of all expectations for the pair of FPF and TPF is contained in the following set:

\{(x,y) | y= Φ(\frac{ Φ^{-1}(\exp(-x) - μ}{σ})\}.

We can regard this set as an image of smooth curves, Namely, here we define the so-called FROC curve as a map from 1-dimensional Euclidean space to 2-dimensional Euclidean space, mapping each t>0 to

(x(t),y(t) ) =(t, Φ(\frac{ Φ^{-1}(\exp(-t)) - μ}{σ}) )

Because x(t)=t,t>0 is not bounded, the area under the FROC curve is infinity.

To calculates alternative notion of AUC in the ordinal ROC theory, we define the so-called AFROC curve:

(ξ(t),η(t) ) =(1-e^{-t}, Φ(\frac{ Φ^{-1}(\exp(-t)) - μ}{σ}) )

which contained in the rectangular space [0,1]^2. The area Under the (AFROC) curve (briefly, we call it AUC) represents the observer performance. For example, if radiologist detects more lesions with small False Positives (FPs), then AUC would be high.

Using the parameter of the signal distribution, we express AUC as follows,

AUC = \int η d ξ = \frac{ μ / σ}{ √{1+ 1/σ^2} }.

Introducing new parameter a:= μ / σ and b:= 1 / σ, we can also write

AUC = \frac{ a }{ √{1+ b^2} }.

Generalized Model

Until now, we use the following two

p_c(θ)= \int ^{θ_{c+1}}_{θ_c} Gaussian(z|μ, σ) dz

q_c(θ)= \int ^{θ_{c+1}}_{θ_c} \frac{d}{dz} \log Φ(z) dz

for hit rates and false alarm rates.

However, the explicit representations of these integrands of p_c(θ),q_c(θ) are not determined in a prior manner. So, such explicit representations are redundant for a general theory. So, to simplify our argument in the following, we use general notations P(z|θ_P), Q(z|θ_Q) instead of the above two integrands Gaussian(z|μ, σ) and \frac{d}{dz} \log Φ(z), and rewrite them as follows,

p_c(θ)= \int ^{θ_{c+1}}_{θ_c} P(z|θ_P) dz,

q_c(θ)= \int ^{θ_{c+1}}_{θ_c} Q(z|θ_Q ) dz.

In the sequel, we assume that P(z|θ_P) is a probability density function (namely, its total integral is one) and Q(z|θ_Q) is a positive function (not necessarily to be a probability function). Namely,

\int P(z|θ_P) dz = 1,

for all θ_P and

Q(z|θ_Q ) > 0,

for all z and θ_Q.

A single reader and a single modality

—————————————————————————————————

`NI=63,NL=124`	confidence level	No. of false alarms	No. of hits
In R console ->	`c`	`f`	`h`
-----------------------	-----------------------	-----------------------------	-------------
definitely present	`c[1] =` 5	`f[1] =` F_5 = 1	`h[1] =` H_5 = 41
probably present	`c[2] =` 4	`f[2] =` F_4 = 2	`h[2] =` H_4 = 22
equivocal	`c[3] =` 3	`f[3] =` F_3 = 5	`h[3] =` H_3 = 14
subtle	`c[4] =` 2	`f[4] =` F_2 = 11	`h[4] =` H_2 = 8
very subtle	`c[5] =` 1	`f[5] =` F_1 = 13	`h[5] =` H_1 = 1

—————————————————————————————————

We give a probability law for the random variables F_c H_c,c=1,...,5.

Suppose that there are N_L targets, and radiological context, each target is a lesion contained in some Radiograph as a shadow. Suppose that a radiologist try to find lesions for N_I radiographs. Suppose that now, the radiologist fined H_c lesions with his c-th confidence, then we assume that

H_5 \sim Binomial(p_5(θ), N_L)

H_4 \sim Binomial(\frac{p_4(θ)}{1-p_5(θ) }, N_L - H_5)

H_3 \sim Binomial(\frac{p_3(θ)}{1-p_5(θ)-p_4(θ) }, N_L - H_5 - H_4)

H_2 \sim Binomial(\frac{p_2(θ)}{1-p_5(θ)-p_4(θ)-p_3(θ) }, N_L - H_5 - H_4 -H_3)

H_1 \sim Binomial(\frac{p_1(θ)}{1-p_5(θ)-p_4(θ)-p_3(θ)-p_2(θ) }, N_L - H_5 - H_4 -H_3 -H_2)

where, hit rates p_1(θ), p_2(θ), p_3(θ), p_4(θ) and p_5(θ) are functions of a model parameter θ. In addition, suppose that the reader fails F_c times with his c-th confidence, that is, the reader marked F_c false positives. Then it natural to assume that

F_5 \sim Poisson(q_5(θ)N_X)

F_4 \sim Poisson(q_4(θ)N_X)

F_3 \sim Poisson(q_3(θ)N_X)

F_2 \sim Poisson(q_2(θ)N_X)

F_1 \sim Poisson(q_1(θ)N_X)

where, N_X = N_I or N_L false rates q_1(θ), q_2(θ), q_3(θ), q_4(θ) and q_5(θ) are functions of a parameter of model.

The above model calculates the event of the data H_c,F_c, c=1,2,..,C arises, indicating the number of TP and the number of FP.

We use Gaussian distributions for the functions p_c(θ) and q_c(θ) as follows.

p_c(θ)= \int ^{θ_{c+1}}_{θ_c} P(z|θ_P) dz

q_c(θ)= \int ^{θ_{c+1}}_{θ_c} Q(z|θ_Q ) dz

where the model parameter vector is

θ = (θ_1,θ_2,...,θ_C; θ_P,θ_Q).

Recall that FPF is defined as follows;

FPF(5):= \frac{F_5 }{N_I},

FPF(4):= \frac{F_4+F_5 }{N_I},

FPF(3):= \frac{F_3+F_4+F_5 }{N_I},

FPF(2):= \frac{F_2+F_3+F_4+F_5 }{N_I},

FPF(1):= \frac{F_1+F_2+F_3+F_4+F_5}{N_I}.

Similarly, TPF is defined as follows;

TPF(5):= \frac{H_5 }{N_L},

TPF(4):= \frac{H_4+H_5 }{N_L},

TPF(3):= \frac{H_3+H_4+H_5 }{N_L},

TPF(2):= \frac{H_2+H_3+H_4+H_5 }{N_L},

TPF(1):= \frac{H_1+H_2+H_3+H_4+H_5}{N_L}.

Combining TPF and FPF, we obtain the pairs.

( FPF(1), TPF(1) ),

( FPF(2), TPF(2) ),

( FPF(3), TPF(3) ),

( FPF(4), TPF(4) ),

( FPF(5), TPF(5) ).

Plotting these five points in a 2-dimensional plain, we can visualize our dataset.

Visualization of a generalized model by curve

In this section, we provide the so-called FROC curve which is our desired visualization of estimated model. Roughly speaking, an FROC curve is expected pairs of FPF and TPF. Namely, the points of FPF and TPF will be on FROC curve if model is well fitting to data. So, comparing the FROC curve and the FPF and TPF, we can evaluate our goodness of fit.

Let c = 1,2,3,4,5.

Define

x(c):= E[ FPF(c) ],

y(c):= E[ TPF(c) ].

Using the fundamental equations E_{θ}[\frac{H_c}{N_L}] = p_c(θ), E_{θ}[\frac{F_c}{N_X}] = q_c(θ),

y(c) = E[ TPF(c) ] =\int^∞_{θ_c} P(x|θ_P)dx =: Ψ_P( θ_c ),

x(c) = E[ FPF(c) ] =\int^∞_{θ_c} Q(x|θ_Q)dx =: Ψ_Q( θ_c ),

where Ψ_P and Ψ_Q denote the cumulative functions of the functions P and Q, respectively. ( That is, Ψ_P(x):=\int_x^∞ P(t)dt and Ψ_Q(x):=\int_x^∞ Q(t)dt.)

Note that we assume that P is a probability density function but Q is not. So, Ψ_P is a cumulative distribution function, but Ψ_Q is not a cumulative ‘distribution’ function.

This implies that all expectations for the pair of FPF and TPF, namely (x(c),y(c)) = (E[ FPF(c) ] , E[ TPF(c) ]) , is on the following set:

\{(x(t),y(t)) | x(t)= Ψ_Q(t), y(t)= Ψ_P(t), t >0 \}.

We can regard this set as the image of the smooth curve which is called the generalized FROC curve in this manuscript.

From the first equation, we obtain that θ_c = Ψ_Q^{-1}( x(c) ). Substituting this into the second equation, we obtain that

y(c) = Ψ_P(Ψ_Q^{-1}( x(c) ) ).

This implies that all exceptions for the pair of FPF and TPF is on the set:

\{(x,y) | y= Ψ_P(Ψ_Q^{-1}( x )). \}.

We can regard this set as an image of smooth curves.

(x(t),y(t) ) =(t, Ψ_P(Ψ_Q^{-1}( t )) )

Sine x(t)=t,t>0 is not bounded, the area under the FROC curve is infinity.

To calculates alternative notion of AUC in the ordinal ROC theory, we define the so-called AFROC curve:

(ξ(t),η(t) ) =(1-e^{-t}, Ψ_P(Ψ_Q^{-1}( x )) )

MRMC Model for Multiple Readers and Multiple Modalities (MRMC)

—————————————————————————————————————-

`NI=63,NL=124`	modality ID	reader ID	confidence	No. of FPs	No. of TP
In R console ->	`m`	`q`	`c`	`f`	`h`
----------------------	-----------------------	--------------	-----------------------	------------	--------------------
definitely present	1	1	`c[1] =` 5	`f[1] =` F_{1,1,5}	`h[1] =` H_{1,1,5}
probably present	1	1	`c[2] =` 4	`f[2] =` F_{1,1,4}	`h[2] =` H_{1,1,4}
equivocal	1	1	`c[3] =` 3	`f[3] =` F_{1,1,3}	`h[3] =` H_{1,1,3}
subtle	1	1	`c[4] =` 2	`f[4] =` F_{1,1,2}	`h[4] =` H_{1,1,2}
very subtle	1	1	`c[5] =` 1	`f[5] =` F_{1,1,1}	`h[5] =` H_{1,1,1}
definitely present	1	2	`c[6] =` 5	`f[6] =` F_{1,2,5}	`h[6] =` H_{1,2,5}
probably present	1	2	`c[7] =` 4	`f[7] =` F_{1,2,4}	`h[7] =` H_{1,2,4}
equivocal	1	2	`c[8] =` 3	`f[8] =` F_{1,2,3}	`h[8] =` H_{1,2,3}
subtle	1	2	`c[9] =` 2	`f[9] =` F_{1,2,2}	`h[9] =` H_{1,2,2}
very subtle	1	2	`c[10] =` 1	`f[10] =` F_{1,2,1}	`h[10] =` H_{1,2,1}
definitely present	2	1	`c[11] =` 5	`f[11] =` F_{2,1,5}	`h[11] =` H_{2,1,5}
probably present	2	1	`c[12] =` 4	`f[12] =` F_{2,1,4}	`h[12] =` H_{2,1,4}
equivocal	2	1	`c[13] =` 3	`f[13] =` F_{2,1,3}	`h[13] =` H_{2,1,3}
subtle	2	1	`c[14] =` 2	`f[14] =` F_{2,1,2}	`h[14] =` H_{2,1,2}
very subtle	2	1	`c[15] =` 1	`f[15] =` F_{2,1,1}	`h[15] =` H_{2,1,1}
definitely present	2	2	`c[16] =` 5	`f[16] =` F_{2,2,5}	`h[16] =` H_{2,2,5}
probably present	2	2	`c[17] =` 4	`f[17] =` F_{2,2,4}	`h[17] =` H_{2,2,4}
equivocal	2	2	`c[18] =` 3	`f[18] =` F_{2,2,3}	`h[18] =` H_{2,2,3}
subtle	2	2	`c[19] =` 2	`f[19] =` F_{2,2,2}	`h[19] =` H_{2,2,2}
very subtle	2	2	`c[20] =` 1	`f[20] =` F_{2,2,1}	`h[20] =` H_{2,2,1}

—————————————————————————————————————-

An example data in this package

R codes

R object named dd is an example data, and to show the above table format, execute the following codes

library(BayesianFROC);viewdata(dd)

In this section we use the abbreviation MRMC which means Multiple Readers and Multiple Modalities. In MRMC, Observer performance ability has individualities caused by readers and modalities. Once we includes these individual differences in our Bayesian model, such model will give us an answer for the modality comparison issues.

The author implements several models for MRMC.

1) Non hierarchical MRMC model

2) hierarchical MRMC model

3) A Single reader and multiple modalities model

I am a patient of Multiple Chemical Sensitivity (CS) which cause inflammations in the brain and it makes me hard to write this. I know there are many mistakes. When I read my writing, I always find and fix. Please forgive me, because CS makes me foolish.

MRMC model Without hyper parameter

To include heterogeneity caused by readers and modalities, the author first made a hierarchical model. However, the model has divergent transitions in MCMC iterations. Thus the author also made a non-hierarchical model in which the author removed the hyper parameters to get more stable MCMC simulation and he confirmed that the new model is divergent free with my fake data.

In MRMC models, the model parameter is a vector denoted by

θ = (θ_1,θ_2,...,θ_C; μ,σ),

where each θ_i (i= 1,2,...,C) is a real number and μ,σ are (M, R)-matrices whose components are denoted by

μ_{1,1},μ_{1,2},μ_{1,3},...,μ_{1,r},....,μ_{1,R},

μ_{2,1},μ_{2,2},μ_{2,3},...,μ_{2,r},....,μ_{2,R},

μ_{3,1},μ_{3,2},μ_{3,3},...,μ_{3,r},....,μ_{3,R},

...,

μ_{m,1},μ_{m,2},μ_{m,3},...,μ_{m,r},....,μ_{m,R},

...,

μ_{M,1},μ_{M,2},μ_{M,3},...,μ_{M,r},....,μ_{M,R},

and

σ_{1,1},σ_{1,2},σ_{1,3},...,σ_{1,r},....,σ_{1,R},

σ_{2,1},σ_{2,2},σ_{2,3},...,σ_{2,r},....,σ_{2,R},

σ_{3,1},σ_{3,2},σ_{3,3},...,σ_{3,r},....,σ_{3,R},

...,

σ_{m,1},σ_{m,2},σ_{m,3},...,σ_{m,r},....,σ_{m,R},

...,

σ_{M,1},σ_{M,2},σ_{M,3},...,σ_{M,r},....,σ_{M,R},

where the subscripts m and r indicate the m-th modality and the r-th reader, respectively.

Note that we use the notation θ for

θ = (θ_1,θ_2,...,θ_C; μ,σ),

and do not confuse it with

(θ_1,θ_2,...,θ_C).

Using the model parameter θ , we can define AUC associated with each pair of reader and modality as follows.

AUC_{m,r} = \frac{ μ_{m,r} / σ_{m,r}}{ √{1+ 1/σ_{m,r}^2} }.

Furthermore, we can extract the efficacy of modality.

AUC_{m} = \frac{1}{R} ∑_{r=1}^R AUC_{m,r},

which is also denoted by A[m],m=1,2,...,M in the R console (or R studio console) and retained in the R object of the S4 class (the so-called stanfit or its extended class).

Using A[m],m=1,2,...,M, we can compare modalities such as MRI, CT, PET, etc. Note that if our trial use x-ray films taken by MRI and CT, then M=2. If images are taken by MRI, CT, PET, then M=3. So, A[m],m=1,2,...,M is a function of the model parameter. In Bayesian sense, the estimates are posterior samples and thus, A[m],m=1,2,...,M are obtained as MCMC samples. Using these, we can calculate posterior probabilities of any events. This is the author's main scheme. Ha,,, I want to

Of course, these AUCs are defined as the area under the AFROC curve for the r th reader and the m th modality. The so-called FROC curve for the r th reader and the m th modality is a map from 1-dimensional Euclidean space to 2-dimensional Euclidean space, mapping each t>0 to

(x_{m,r} (t),y_{m,r} (t) ) =(t, Φ(\frac{ Φ^{-1}(\exp(-t)) - μ_{m,r} }{σ_{m,r} }) )

Because x(t)=t,t>0 is not bounded, the area under the FROC curve is infinity.

To calculates alternative notion of AUC in the ordinal ROC theory, we define the so-called AFROC curve:

(ξ_{m,r} (t),η_{m,r} (t) ) =(1-e^{-t}, Φ(\frac{ Φ^{-1}(\exp(-t)) - μ_{m,r} }{σ_{m,r} }) )

which contained in the rectangular space [0,1]^2.

Probability law of hits

In the sequel, the subscripts m,r mean the m-th modality and the r-th reader, respectively.

Random variables of hits are distributed as follows.

H_{5,m,r} \sim Binomial (p_{5,m,r}(θ), N_L ),

where the notation H_{5,m,r} denotes the number of hits (TPs) with confidence level 5 of the m-th modality for the r th reader.

Now, the H_{5,m,r} targets (signals, lesions) are found by the reader (radiologist), and the residue of targes, i.e., number of remaining targets is N_L - H_{5,m,r}.

Thus, the number of hits with the 4-th confidence level H_{4,m,r} should be drawn from the binomial distribution with remaining targets whose number is N_L - H_{5,m,r} and thus

H_{4,m,r} \sim Binomial (\frac{p_{4,m,r}(θ)}{1-p_{5,m,r}(θ)}, N_L - H_{5,m,r}).

Similarly,

H_{3,m,r} \sim Binomial (\frac{p_{3,m,r}(θ)}{1-p_{5,m,r}(θ)-p_{4,m,r}(θ)}, N_L - H_{5,m,r} -H_{4,m,r}).

H_{2,m,r} \sim Binomial (\frac{p_{2,m,r}(θ)}{1-p_{5,m,r}(θ)-p_{4,m,r}(θ)-p_{3,m,r}(θ)}, N_L - H_{5,m,r} -H_{4,m,r}-H_{3,m,r}).

H_{1,m,r} \sim Binomial (\frac{p_{1,m,r}(θ)}{1-p_{5,m,r}(θ)-p_{4,m,r}(θ)-p_{3,m,r}(θ)-p_{2,m,r}(θ)}, N_L - H_{5,m,r} -H_{4,m,r}-H_{3,m,r}-H_{2,m,r}).

Probability law of false alarms

Let N_X be the one of the followings and fix it.

1) N_X = N_L (The number of lesions), if ModifiedPoisson = TRUE.

2) N_X = N_I (The number of images), if ModifiedPoisson = FALSE.

Using N_X, we assume the following,

F_{5,m,r} \sim Poisson(q_{5}(θ) N_X ),

F_{4,m,r} \sim Poisson( q_{4}(θ) N_X ),

F_{3,m,r} \sim Poisson( q_{3}(θ) N_X ),

F_{2,m,r} \sim Poisson( q_{2}(θ) N_X ),

F_{1,m,r} \sim Poisson( q_{1}(θ) N_X ),

where subscripts m,r mean the m-th modality and the r-th reader, respectively.

The rate p_{c,m,r}(θ) and q_{c}(θ) are calculated from the model parameter θ.

We use a Gaussian distribution and the cumulative distribution function Φ() of the standard Gaussian for the functions p_{c,m,r}(θ) and q_c(θ) as following manner.

p_{c,m,r}(θ)= \int ^{θ_{c+1}}_{θ_c} Normal(z|μ_{c,m,r},v_{c,m,r}) dz

q_{c}(θ)= \int ^{θ_{c+1}}_{θ_c} \frac{d}{dz} \log Φ(z) dz

where the model parameter vector is

θ = (θ_1,θ_2,...,θ_C; θ_P,θ_Q).

Specifying a model parameter θ = (θ_1,θ_2,...,θ_C; θ_P,θ_Q). we can make a fake dataset consisting of hit data H_{c,m,r} false alarm data F_{c,m,r} for each c,m,r. So, our model is a generative model and this is a crucial difference between our model and the classical one.

Without hyper parameter MRMC model

A Non-Centered Implementation

AA[md,qd] ~ Normal(A[md],hyper_v[qd])

Non centered version is the following:

AA_tilde[md,qd] ~ Normal(0,1)

AA[md,qd] = A[md]+hyper_v[qd]*AA_tilde

But, the AA[md,qd] is already defined as follows.

AA[md,qd]=Phi( (mu[md,qd]/v[md,qd])/sqrt((1/v[md,qd])^2+1) );

Thus usual non centered model cannot be implemented.

The assumption

AA[md,qd] ~ Normal(A[md],hyper_v[qd])

is an approximation. So, this model is not correct. I am not sure whether the approximation worsen my model.

The hyper parameters have been in use for more than 2 years in this package. However it caused divergent transitions. Thus the author made a new model without these hyper parameters.

Example dataset is dd and ddd and dddd and ddddd and ...etc. ———————————————————————————————

Validation of model via SBC

SBC tests the Null hypothesis that the MCMC sampling is correct by using some rank statistic which synthesizes a histogram. If this hits gram is not uniformly distributed, then we reject the null hypothesis, and we conclude that our MCMC sampling contains bias.

Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). Validating Bayesian Inference Algorithms with Simulation-Based Calibration. arXiv preprint arXiv:1804.06788. https://arxiv.org/abs/1804.06788

Validation of model via Posterior Predictive p value

Let θ_1,θ_2,...,θ_n be MCMC samples from a posterior distribution π(.| D) for a given dataset D. Let L(y| θ_i) be a likelihood function for a dataset y and model parameter θ. Let

y^i_j \sim L(| θ_i).

For any real-valued function φ=φ(y,θ), we can calculates its integral with the posterior predictive measure as the approximation of two steps Monte Carlo integral as follows.

\int \int φ(y,θ) L(y| θ) π(θ|y)dy dθ

=\int Σ_i φ(y,θ_i) L(y| θ_i) dy

=Σ_j Σ_i φ( y^i_j,θ_i) L( y^i_j| θ_i).

Using φ= 1(T(y,θ)> T(y,θ_{observed})), we obtain the so-called posterior predictive p value. (The author hates this notion.)

In my opinion, this criteria is not clear whether it is reliable quantities for evaluations.

Validation of model; Comparison between truth and estimates of fake data-sets which are drawn using the truth.

I think this is the most fundamental and intuitive validation.

Under Construction

—————————————————- Appendix: —– Terminology ——-

hit: which is also called True Positive: TP, which is denoted with each confidence level, c=1,2,3,...,C as follows: H_1,H,2,...,H_C or h=c(h[1],h[2],...,h[C]), where h[1]=H_C corresponds a number of hit with most high confidence level
False alarm: which is also called False Positive: FP , which is denoted with each confidence, c=1,2,3,...,C levels as follows: F_1,F,2,...,F_C or f=c(f[1],f[2],...,f[C]), where f[1]=F_C corresponds a number of false alarms with most high confidence level
Modality: Imaging methods, such as MRI, CT, PET,...etc. In another context, it means efficacy of treatment.
Reader: is a radiologist, physician, who try to detect lesions from radiographs. For a single image, reader can answer multiple suspicious shadows and he assigns to each suspicious shadows his or her confidence level. So, the reader localizes and rates for each suspicious shadows. A data analyst evaluates whether each reader's localization of lesion is true or false. Note that a single image can synthesize multiple-false positives or multiple true positives. Such a multiplicity distincts FROC analysis with ordinal ROC analysis.
Image: is a radiograph taken by MRI, CT, PET, etc.
Modality comparison: The question that which modality (MRI, CT, PET, ... etc) is best to detect lesions in radiographs? In order to answer this question, the FROC analysis exists.
hit rate: Each lesion can synthesize a hit of confidence level c according to Bernoulli distribution with probability of p_c, which call hit rate (of c)
false alarm rate: Each image synthesize a false alarm (False Positive: FP) of confidence level c according to Poisson distribution with probability of lambda_c, which call false alarm rate (of c) or simply false rate.
Number of images: which is denoted by N_I. An image means a radiograph or an X ray film, including shadows, each of which is caused by lesions or noise. Namely, each radiograph does not necessarily includes lesions.
Number of lesions: Suppose that there are N_I radiographs. Then by summing the number of lesions over all radiographs, we obtain the number of lesion N_L.
FROC curve: alternative notion of ROC curve in FROC context.
AFROC curve: Alternative-FROC curve, whose area under the curve indicates observer performance. Since area under the FROC curve is infinity, we use this area under the AFROC curve instead of the area under the FROC curve.
AUC: A real number between 0 and 1, indicating how many lesions radiologist can detect from radiographs. It is the area under the AFROC curve. In ROC context, AUC should be greater than 0.5, but in FROC context, the interpretation of AUC is not same as that in ROC context. For example, AUC =0.5 does not means that it is sames as the most bad observer performance.
Chi square: The difference of expectation minus observation, namely it is estimates minus actual observed data. Smaller is better.
Posterior Predictive P value (PPP): This is a posterior predictive probability of the event that a test statistic is greater than its observed value. The author implements the χ^2 goodness of fit as a test statistic and in this context, if the PPP is small then we reject the null hypothesis that the model is well fit to data. The author hates this traditional bitch.
FPF:False Positive Fraction: Cumulative sum of false alarms (FPs) divided by the number of Images or the number of lesions. Using FPFs as x and TPFs as y, we can visualize FPs and TPs.
TPF:True Positive Fraction: Cumulative sum of hits (TPs) decided by the number of Lesions (signals, targets). Using FPFs as x and TPFs as y, we can visualize FPs and TPs.