ONEST"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ONEST)

1 General Information

The Observers Needed to Evaluate Subjective Tests software implements a statistical method in Reisenbichler et al. (2020^[Reisenbichler, E. S., Han, G., Bellizzi, A., Bossuyt, V., Brock, J., Cole, K., Fadare, O., Hameed, O., Hanley, K., Harrison, B. T., Kuba, M. G., Ly, A., Miller, D., Podoll, M., Roden, A. C., Singh, K., Sanders, M. A., Wei, S., Wen, H., Pelekanou, V., Yaghoobi, V., Ahmed, F., Pusztai, L., and Rimm, D. L. (2020) “Prospective multi-institutional evaluation of pathologist assessment of PD-L1 assays for patient selection in triple negative breast cancer," Mod Pathol, DOI: 10.1038/s41379-020-0544-x; PMID: 32300181.]), to determine the minimum number of evaluators needed to estimate agreement involving a large number of raters. This method could be utilized by regulatory agencies, such as the FDA, when evaluating agreement levels of a newly proposed subjective laboratory test. Input to the program should be binary(1/0) pathology data, where “0” may stand for negative and “1” for positive. The example datasets in this software are from Rimm et al. (2017^[Rimm, D. L., Han, G., Taube, J. M., Yi, E. S., Bridge, J. A., Flieder, D. B., Homer, R., West, W. W., Wu, H., Roden, A. C., Fujimoto, J., Yu, H., Anders, R., Kowalewski, A., Rivard, C., Rehman, J., Batenchuk, C., Burns, V., Hirsch, F. R., and Wistuba,, II (2017) “A Prospective, Multi-institutional, Pathologist-Based Assessment of 4 Immunohistochemistry Assays for PD-L1 Expression in Non-Small Cell Lung Cancer," JAMA Oncol, 3(8), 1051-1058, DOI: 10.1001/jamaoncol.2017.0013, PMID: 28278348.]) (the SP142 assay), and Reisenbichler et al. 2020. This program can run in R version 3.5.0 and above.

2 Model and Inference

We briefly introduce the statistical model and inference implemented by this program. Let p denote the proportion of concordant (i.e., identical) reads among a group of raters, and the group size can be two or more. We let “p^+^” denote the proportion of tissue cases that will always be evaluated positive by all the raters, and “p^-^” a proportion that will always be evaluated negative. Among the proportion of “1-p^+^-p^-^” cases that could be rated either positive or negative, each case has the probability “p” of being rated positive from any pathologist. Then the proportion of consistent reads among k pathologists can be written as p^(k)^ = p^+^+p^-^+(1-p^+^-p^-^)[p^k^+(1-p)^k^].

Let “I” denote the minimal sufficient number of pathologists in the sense that “I” is the minimum integer value to satisfy p ^(i)^ - p^(i+1)^ < pᵟ with a large probability (e.g., 95%), where pᵟ is a threshold of the change in the percentage agreement due to including one additional pathologist. Let p~c~ = p^+^ + p^-^.

The statistical inference is based on the joint likelihood function of parameters p^+^, p^-^, and p. For n cases and k pathologists, we have the data {y~ij~; i=1,…,n, j=1,…,k}. Each observation y~ij~ is binary, where y~ij~ =1 if the read is positive and y~ij~ =0 if the read is negative. The probabilities of y~ij~=1 and y~ij~=0 can be written as P(y~ij~=1) = p^+^+ p(1-p^+^-p^-^) and P(y~ij~=0) = p^-^+ (1-p)(1-p^+^-p^-^), respectively. We assume all {yij} are independently and identically distributed. The likelihood function can be written as L(p, p^+^, p-|{y~ij~}) = [p^+^+ p(1-p^+^-p^-^)]^T^ [p^-^+ (1-p)(1-p^+^-p^-^)]^nk-T^, where T is the total number of reading equal to 1 among all “nk” reads. With k pathologists, we let n~c~ denote the number of consistent reads among n cases, so n~c~ ~ Bin(n, p~c~). Similarly, we have n^+^ ~ Bin(n, p^+^) and n^-^ ~ Bin(n, p^-^), where n^+^ and n^-^ denote the numbers of cases that all pathologists read positive and negative, respectively.

Based on the binomial maximum likelihood estimation, the estimates are p^+^ = n^+^/n, p^-^ = n^-^/n, p^+^+ p(1-p^+^-p^-^) = T/(nk), and p = [T/(nk) - p^+^]/(1-p^+^-p^-^). We then estimate p by plugging the estimates of {p~c~ , p} into the equation p ^(k)^ = p~c~ +(1-p^+^-p^-^)[p^k^+(1-p)^k^]. We define the objective function as D^(i)^ = p ^(i)^ - p^(i+1)^=(1-p^+^-p^-^)[p^i^(1-p)+ p(1-p)^i^]. The estimate of “p” depends on the product of n and k, and the estimate of p~c~ is n~c~/n. We use 95% as the probability threshold. Based on the central limit theorem, the asymptotic 95% lower bound of p~c~ is: n~c~/n-1.645[n~c~(n-n~c~)/n^3^]^1/2^. By plugging in this lower bound of p~c~ we can compute the upper bound of D^(i)^ with 95% confidence level. If the upper bound of D^(i)^ is less than pᵟ. We conclude “i” is the sufficient number of pathologists.

3 Inputs and Outputs

3.1 Inputs

This software has one driver file ONEST_main. Input to ONEST_main include

3.2 Outputs

Meanings of the output values are listed below.

All the outputs were saved in the following structure.

4 Example with dataset sp142_bin

The dataset "sp142_bin" is a pathology dataset of triple negative breast cancer in Reisenbichler et al. (2020) in a 68 by 18 matrix. An element in position (i, j) having value of 0 means negative for the i-th case, j-th rater, and a value of 1 means a positive evaluation.

Details about other datasets in the package can be found in the reference manual.

4.1 Load data

library(ONEST)
data("sp142_bin")

4.2 Plot the data and get the outputs

The following code is equivalent to ONEST_main(sp142_bin) and can only be applied to the example dataset sp142_bin to decrease the time to build the vignettes. Please use the ONEST_main function instead in practice.

# figure(1): Plot of the agreement percentage in the order of columns in the inputs;
# figure(2): Plot of the 100 randomly chosen permutations;
# figure(3): Plot of the empirical confidence interval;
# figure(4): Barchart: the x axis is the case number and the Y axis is the number of pathologists that called that case positive, sorted from lowest to highest on the y axis;
# figure(5): Plot of the proportion of identical reads among a set of pathologists;
# figure(6): Plot of the difference between the proportion of identical reads among a set of pathologists;

# ONEST_main(sp142_bin)
data('empirical')
ONEST_vignettes(sp142_bin,empirical)

4.3 The ONEST score test

A small p-value from this score test indicates significant evidence that the observers’ agreement will converge to a non-zero proportion.

data("sp142_bin")
ONEST_inflation_test(sp142_bin)

4.4 Code to run other examples

# (1) With example dataset sp263_bin:
# data("sp263_bin") ONEST_main(sp263_bin) ONEST_inflation_test(sp263_bin)

# (2) With example dataset NCNN_sp142:
# data("NCCN_sp142") ONEST_main(NCCN_sp142) ONEST_inflation_test(NCCN_sp142)

# (3) With example dataset NCNN_sp142_t:
# data("NCCN_sp142_t") ONEST_main(NCCN_sp142_t) ONEST_inflation_test(NCCN_sp142_t)

# (4) With example dataset NCCN_22c3_t:
# data("NCCN_22c3_t") ONEST_main(NCCN_22c3_t) ONEST_inflation_test(NCCN_22c3_t)

References



Try the ONEST package in your browser

Any scripts or data that you put into this service are public.

ONEST documentation built on July 27, 2021, 1:06 a.m.