ONEST
In ONEST: Observers Needed to Evaluate Subjective Tests

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(ONEST)

1 General Information

The Observers Needed to Evaluate Subjective Tests (ONEST) software implements the statistical method of Reisenbichler et al. (2020^[Reisenbichler, E. S., Han, G., Bellizzi, A., Bossuyt, V., Brock, J., Cole, K., Fadare, O., Hameed, O., Hanley, K., Harrison, B. T., Kuba, M. G., Ly, A., Miller, D., Podoll, M., Roden, A. C., Singh, K., Sanders, M. A., Wei, S., Wen, H., Pelekanou, V., Yaghoobi, V., Ahmed, F., Pusztai, L., and Rimm, D. L. (2020) “Prospective multi-institutional evaluation of pathologist assessment of PD-L1 assays for patient selection in triple negative breast cancer,” Mod Pathol, DOI: 10.1038/s41379-020-0544-x, PMID: 32300181.]) to determine the minimum number of evaluators needed to estimate agreement involving a large number of raters. This method could be used by regulatory agencies, such as the FDA, when evaluating agreement levels of a newly proposed subjective laboratory test. Input to the program should be binary (1/0) pathology data, where “0” may stand for negative and “1” for positive. The example datasets in this software are from Rimm et al. (2017^[Rimm, D. L., Han, G., Taube, J. M., Yi, E. S., Bridge, J. A., Flieder, D. B., Homer, R., West, W. W., Wu, H., Roden, A. C., Fujimoto, J., Yu, H., Anders, R., Kowalewski, A., Rivard, C., Rehman, J., Batenchuk, C., Burns, V., Hirsch, F. R., and Wistuba, I. I. (2017) “A Prospective, Multi-institutional, Pathologist-Based Assessment of 4 Immunohistochemistry Assays for PD-L1 Expression in Non-Small Cell Lung Cancer,” JAMA Oncol, 3(8), 1051-1058, DOI: 10.1001/jamaoncol.2017.0013, PMID: 28278348.]) (the SP142 assay) and Reisenbichler et al. (2020). This program runs in R version 3.5.0 and above.

2 Model and Inference

We briefly introduce the statistical model and inference implemented by this program. The quantity of interest is the proportion of concordant (i.e., identical) reads among a group of raters, where the group size can be two or more. Let p^+^ denote the proportion of tissue cases that will always be evaluated positive by all the raters, and p^-^ the proportion that will always be evaluated negative. Among the remaining proportion 1-p^+^-p^-^ of cases, which could be rated either positive or negative, each case has probability p of being rated positive by any pathologist. Then the proportion of consistent reads among k pathologists can be written as p^(k)^ = p^+^ + p^-^ + (1-p^+^-p^-^)[p^k^ + (1-p)^k^].
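As a minimal sketch of the formula for p^(k)^, the following base R function evaluates the model-based agreement for several group sizes. The function name `agreement_prop` and the parameter values are illustrative assumptions, not part of the ONEST package.

```r
# Model-based proportion of identical reads among k raters:
# p^(k) = p_plus + p_minus + (1 - p_plus - p_minus) * (p^k + (1 - p)^k)
# `agreement_prop` is a hypothetical helper name used only for this sketch.
agreement_prop <- function(k, p_plus, p_minus, p) {
  stopifnot(p_plus >= 0, p_minus >= 0, p_plus + p_minus <= 1,
            p >= 0, p <= 1)
  p_plus + p_minus + (1 - p_plus - p_minus) * (p^k + (1 - p)^k)
}

# Made-up parameter values, chosen only to show the shape of the curve
agreement <- agreement_prop(2:6, p_plus = 0.3, p_minus = 0.4, p = 0.5)
round(agreement, 4)
```

Under these assumed values the agreement decreases as raters are added and levels off toward p^+^ + p^-^, which is the behaviour the model is built to capture.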

Let i denote the minimal sufficient number of pathologists, in the sense that i is the smallest integer satisfying p^(i)^ - p^(i+1)^ < pᵟ with a large probability (e.g., 95%), where pᵟ is a threshold on the change in the percentage agreement due to including one additional pathologist. Let p~c~ = p^+^ + p^-^.

The statistical inference is based on the joint likelihood function of the parameters p^+^, p^-^, and p. For n cases and k pathologists, we have the data {y~ij~; i=1,…,n, j=1,…,k}. Each observation y~ij~ is binary, where y~ij~ = 1 if the read is positive and y~ij~ = 0 if the read is negative. The probabilities of y~ij~ = 1 and y~ij~ = 0 can be written as P(y~ij~=1) = p^+^ + p(1-p^+^-p^-^) and P(y~ij~=0) = p^-^ + (1-p)(1-p^+^-p^-^), respectively. We assume all {y~ij~} are independently and identically distributed. The likelihood function can be written as L(p, p^+^, p^-^|{y~ij~}) = [p^+^ + p(1-p^+^-p^-^)]^T^ [p^-^ + (1-p)(1-p^+^-p^-^)]^nk-T^, where T is the total number of reads equal to 1 among all nk reads. With k pathologists, we let n~c~ denote the number of consistent reads among n cases, so n~c~ ~ Bin(n, p~c~). Similarly, we have n^+^ ~ Bin(n, p^+^) and n^-^ ~ Bin(n, p^-^), where n^+^ and n^-^ denote the numbers of cases that all pathologists read positive and negative, respectively.
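The quantities T, n^+^, and n^-^ and the likelihood above can be sketched in base R as follows. The toy matrix and the names `T_pos`, `n_plus`, `n_minus`, and `loglik` are assumptions made for illustration, and the sketch assumes a complete matrix with no missing reads.

```r
# Toy binary reads (made-up values): 4 cases (rows) by 3 raters (columns)
y <- matrix(c(1, 1, 1,
              0, 0, 0,
              1, 0, 1,
              0, 0, 1), nrow = 4, byrow = TRUE)

n <- nrow(y)
k <- ncol(y)
T_pos   <- sum(y)                 # T: total number of reads equal to 1
n_plus  <- sum(rowSums(y) == k)   # cases read positive by every rater
n_minus <- sum(rowSums(y) == 0)   # cases read negative by every rater

# Log of L(p, p_plus, p_minus | {y_ij}) as written in the text
loglik <- function(p, p_plus, p_minus) {
  q1 <- p_plus + p * (1 - p_plus - p_minus)  # P(y_ij = 1)
  q0 <- 1 - q1                               # equals p_minus + (1 - p)(1 - p_plus - p_minus)
  T_pos * log(q1) + (n * k - T_pos) * log(q0)
}

loglik(p = 0.5, p_plus = n_plus / n, p_minus = n_minus / n)
```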

Based on binomial maximum likelihood estimation, the estimates satisfy p^+^ = n^+^/n, p^-^ = n^-^/n, and p^+^ + p(1-p^+^-p^-^) = T/(nk), so that p = [T/(nk) - p^+^]/(1-p^+^-p^-^). We then estimate p^(k)^ by plugging the estimates of {p~c~, p} into the equation p^(k)^ = p~c~ + (1-p^+^-p^-^)[p^k^ + (1-p)^k^]. We define the objective function as D^(i)^ = p^(i)^ - p^(i+1)^ = (1-p^+^-p^-^)[p^i^(1-p) + p(1-p)^i^]. The estimate of p depends on the product of n and k, and the estimate of p~c~ is n~c~/n. We use 95% as the probability threshold. Based on the central limit theorem, the asymptotic 95% lower bound of p~c~ is n~c~/n - 1.645[n~c~(n-n~c~)/n^3^]^1/2^. By plugging in this lower bound of p~c~, we can compute the upper bound of D^(i)^ at the 95% confidence level. If the upper bound of D^(i)^ is less than pᵟ, we conclude that i is the sufficient number of pathologists.
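The plug-in steps above can be sketched in base R as follows. This is one reading of the text, not the package's implementation: the function name `onest_sketch`, the threshold `p_delta = 0.01`, the cap `max_i = 20`, and the toy data are all assumptions, and the sketch assumes a complete matrix with no missing reads.

```r
# Sketch of the plug-in estimates and the upper bound of D^(i).
# `onest_sketch`, `p_delta`, and `max_i` are hypothetical names and defaults.
onest_sketch <- function(y, p_delta = 0.01, max_i = 20) {
  n <- nrow(y)
  k <- ncol(y)
  rs <- rowSums(y)
  p_plus  <- sum(rs == k) / n            # n_plus / n
  p_minus <- sum(rs == 0) / n            # n_minus / n
  p_c     <- p_plus + p_minus            # n_c / n
  stopifnot(p_c < 1)                     # otherwise p is not identifiable
  p       <- (sum(y) / (n * k) - p_plus) / (1 - p_c)
  n_c     <- n * p_c
  # Asymptotic 95% lower bound of p_c from the text
  p_c_low <- p_c - 1.645 * sqrt(n_c * (n - n_c) / n^3)
  i       <- 2:max_i
  shape   <- p^i * (1 - p) + p * (1 - p)^i
  D       <- (1 - p_c) * shape           # D^(i) at the point estimates
  D_high  <- (1 - p_c_low) * shape       # upper bound: plug in the lower bound of p_c
  ok      <- which(D_high < p_delta)
  list(p = p, p_plus = p_plus, p_minus = p_minus,
       D = D, D_high = D_high,
       min_i = if (length(ok) > 0) i[ok[1]] else NA_integer_)
}

# Toy binary reads (made-up values): 4 cases by 3 raters
y <- matrix(c(1, 1, 1,
              0, 0, 0,
              1, 0, 1,
              0, 0, 1), nrow = 4, byrow = TRUE)
res <- onest_sketch(y)
res$min_i
```

With so few cases the lower bound of p~c~ is very loose, so the upper bound of D^(i)^ falls below the threshold only for a fairly large i; real datasets with more cases give tighter bounds.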

3 Inputs and Outputs

3.1 Inputs

This software has one driver function, ONEST_main. The input to ONEST_main is:

‘data’ = dataset, a matrix containing the binary pathology data. Each row is the data from one case, and each column is the data from one rater. Missing values are allowed and can be denoted as NA or left blank. If there are n cases and k raters, the input ‘data’ is a matrix with dimension n by k.
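As an illustration of the expected layout, the sketch below builds a small n by k binary matrix with one missing read and checks its contents. The values and the row and column names are made up for this example.

```r
# Hypothetical reads from 3 raters on 5 cases; NA marks a missing read
reads <- matrix(c(1, 0, 1,
                  0, 0, 0,
                  1, 1, NA,
                  0, 1, 0,
                  1, 1, 1),
                ncol = 3, byrow = TRUE,
                dimnames = list(paste0("case", 1:5), paste0("rater", 1:3)))

# Basic checks: a matrix holding only 0, 1, or NA
stopifnot(is.matrix(reads), all(reads %in% c(0, 1, NA)))

dim(reads)              # n cases by k raters
colSums(is.na(reads))   # number of missing reads per rater
```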

3.2 Outputs

Meanings of the output values are listed below.

consist_p: a vector of length k-1, giving the proportion of identical reads among a set of pathologists. For example, the first element of “consist_p” is the estimate of the agreement percentage for 2 raters. The (k-1)-th element is the estimate of the agreement percentage for k raters.
consist_low: a vector of length k-1, giving the lower bound of the agreement percentage at the 95% confidence level, corresponding to “consist_p”.
diff_consist: a vector of length k-2, giving the differences between consecutive elements of “consist_p”. For example, the first element of “diff_consist” is the estimated difference in the agreement percentage after increasing from 2 to 3 raters. The (k-2)-th element is the difference in the agreement percentage after increasing from k-1 to k raters.
diff_high: a vector of length k-2, giving the upper bound of the change in the agreement percentage at the 95% confidence level, corresponding to “diff_consist”.
size_case: number of cases n.
size_rater: number of raters k.
p: the probability of being rated positive among the proportion '1-p_plus-p_minus' of cases.
p_plus: proportion of the cases rated positive by all raters.
p_minus: proportion of the cases rated negative by all raters.
empirical: a matrix of dimension k-1 by 3, containing the empirical estimate of the agreement percentage and the empirical 95% confidence interval (CI) of the agreement percentage with equal tail probabilities on the two sides. The empirical estimate and CI are calculated by permuting the raters with 1000 random permutations and using the mean, 2.5^th^ percentile, and 97.5^th^ percentile.

All the outputs are saved in the following structure.

consistency: This output includes “consist_p” and “consist_low”, where the data are used to plot figure(5).
difference: This output includes “diff_consist” and “diff_high”, where the data are used to plot figure(6), which can be used to determine the minimum number of evaluators needed to estimate agreement.
estimates: This output includes the ONEST estimates “size_case”, “size_rater”, “p”, “p_plus”, and “p_minus”.
empirical: This output has the empirical estimation data for plotting figure(3). The first and third columns are the 2.5% and 97.5% lower and upper bounds of the empirical CI, respectively. The second column is the estimated agreement percentage using the empirical mean.
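The empirical estimate described above (1000 random permutations of the raters, summarized by the mean and the 2.5th and 97.5th percentiles) can be sketched in base R as follows. This is an illustrative reading of the description, not the package's code: the simulated data, the seed, and the helper name `agree_first_m` are assumptions, and no missing reads are handled.

```r
set.seed(1)  # reproducible permutations

# Simulated binary reads (hypothetical data): 30 cases by 6 raters
y <- matrix(rbinom(30 * 6, size = 1, prob = 0.4), nrow = 30)
k <- ncol(y)

# Proportion of cases on which the first m raters (in column order) all agree
agree_first_m <- function(mat, m) {
  rs <- rowSums(mat[, 1:m, drop = FALSE])
  mean(rs == 0 | rs == m)
}

# 1000 random permutations of the raters; each gives one agreement curve for m = 2..k
curves <- replicate(1000, {
  perm <- y[, sample(k), drop = FALSE]
  vapply(2:k, function(m) agree_first_m(perm, m), numeric(1))
})

# Rows correspond to m = 2..k; columns are the 2.5th percentile, mean, 97.5th percentile
empirical <- cbind(lower = apply(curves, 1, quantile, probs = 0.025),
                   mean  = rowMeans(curves),
                   upper = apply(curves, 1, quantile, probs = 0.975))
empirical
```

The column order follows the description of the “empirical” output: lower bound, mean, upper bound.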

4 Example with dataset sp142_bin

The dataset "sp142_bin" is a pathology dataset of triple negative breast cancer from Reisenbichler et al. (2020), stored as a 68 by 18 matrix. An element in position (i, j) with a value of 0 means a negative evaluation of the i-th case by the j-th rater, and a value of 1 means a positive evaluation.

Details about other datasets in the package can be found in the reference manual.

4.1 Load data

library(ONEST)
data("sp142_bin")

4.2 Plot the data and get the outputs

The following code is equivalent to ONEST_main(sp142_bin). It is used here only to decrease the time needed to build the vignettes, and it can only be applied to the example dataset sp142_bin. In practice, please use the ONEST_main function instead.

# figure(1): Plot of the agreement percentage in the order of columns in the inputs;
# figure(2): Plot of the 100 randomly chosen permutations;
# figure(3): Plot of the empirical confidence interval;
# figure(4): Barchart: the x axis is the case number and the Y axis is the number of pathologists that called that case positive, sorted from lowest to highest on the y axis;
# figure(5): Plot of the proportion of identical reads among a set of pathologists;
# figure(6): Plot of the difference between the proportion of identical reads among a set of pathologists;

# ONEST_main(sp142_bin)
data('empirical')
ONEST_vignettes(sp142_bin,empirical)

4.3 The ONEST score test

A small p-value from this score test indicates significant evidence that the observers’ agreement will converge to a non-zero proportion.

data("sp142_bin")
ONEST_inflation_test(sp142_bin)

4.4 Code to run other examples

# (1) With example dataset sp263_bin:
# data("sp263_bin")
# ONEST_main(sp263_bin)
# ONEST_inflation_test(sp263_bin)

# (2) With example dataset NCCN_sp142:
# data("NCCN_sp142")
# ONEST_main(NCCN_sp142)
# ONEST_inflation_test(NCCN_sp142)

# (3) With example dataset NCCN_sp142_t:
# data("NCCN_sp142_t")
# ONEST_main(NCCN_sp142_t)
# ONEST_inflation_test(NCCN_sp142_t)

# (4) With example dataset NCCN_22c3_t:
# data("NCCN_22c3_t")
# ONEST_main(NCCN_22c3_t)
# ONEST_inflation_test(NCCN_22c3_t)

References

ONEST documentation built on July 27, 2021, 1:06 a.m.
