sgt_Delta: Smooth Good-Toulmin estimate of Delta(t), the (expected)...

View source: R/sgt_Delta.R

Description

Smooth Good-Toulmin estimate of Δ(t), the (expected) number of new variants in a future (test) cohort that is t times as large as the training cohort.

Usage

sgt_Delta(counts = NULL, r = NULL, N_r = NULL, m, t = 1, adj = TRUE)

Arguments

counts

vector of counts or frequencies of the observed variants.

r

unique frequencies, i.e. the distinct values that occur among the variant counts.

N_r

frequency of frequency r, i.e. the number of variants observed exactly r times. Must align element-wise with r.

m

training cohort size.

t

positive scalar. The ratio of the future (test) cohort size to the training cohort size. Defaults to 1.

adj

logical. Should the Orlitsky et al. (2016) adjustment be used? Defaults to TRUE. Ignored if t <= 1, where the original Good-Toulmin estimate is computed.

Details

Computes the original Good-Toulmin (1956) estimate of Δ(t) if t <= 1. If t > 1, computes the Efron-Thisted estimate (if adj = FALSE) or the Efron-Thisted estimate with the Orlitsky et al. (2016) adjustment (if adj = TRUE). The returned estimate also carries an approximate standard error as the attribute "se", computed using the formula in Efron and Thisted (1976, equation 5.2).
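For intuition about the t <= 1 case, the plain (unsmoothed) Good-Toulmin estimate can be written as the alternating sum over r >= 1 of (-1)^(r+1) * t^r * N_r. The sketch below is an illustration only, not the package's implementation: gt_delta_sketch is a hypothetical helper name, the toy counts are made up, and no standard error or smoothing for t > 1 is attempted.

```r
# Hypothetical helper (not part of variantprobs): plain Good-Toulmin
# estimate of Delta(t) computed from raw variant counts.
gt_delta_sketch <- function(counts, t = 1) {
  stopifnot(is.numeric(counts), all(counts >= 1), t > 0)
  tab <- table(counts)             # frequency of frequencies
  r   <- as.integer(names(tab))    # distinct observed frequencies
  N_r <- as.vector(tab)            # number of variants seen exactly r times
  # Alternating series: t * N_1 - t^2 * N_2 + t^3 * N_3 - ...
  sum((-1)^(r + 1) * t^r * N_r)
}

# Toy data: three variants seen once, two seen twice, one seen three times
toy_counts <- c(1, 1, 1, 2, 2, 3)
gt_delta_sketch(toy_counts, t = 1)    # 3 - 2 + 1 = 2
gt_delta_sketch(toy_counts, t = 0.5)  # 1.5 - 0.5 + 0.125 = 1.125
```

For t > 1 the terms t^r grow geometrically, so this raw series becomes unstable; that instability is what the smoothed estimates described above are meant to address.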

Note

Either (a) counts, or (b) r and N_r must be provided.
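The two input forms carry the same information: r and N_r are the frequency-of-frequencies summary of counts. A minimal sketch of the conversion follows, using made-up toy counts and a hypothetical cohort size m. The call to sgt_Delta is guarded so the snippet also runs when the package is not installed, and the two calls are expected (not verified here) to give the same estimate.

```r
# Toy counts (made up for illustration): one entry per observed variant
counts <- c(1, 1, 1, 2, 2, 3)

# Summarise counts as frequency of frequencies
tab <- table(counts)
r   <- as.integer(names(tab))  # distinct frequencies: 1, 2, 3
N_r <- as.integer(tab)         # variants seen exactly r times: 3, 2, 1

# Sanity checks linking the two representations
stopifnot(sum(N_r) == length(counts))   # number of distinct variants
stopifnot(sum(r * N_r) == sum(counts))  # total number of observations

m <- 10  # hypothetical training cohort size

# Option (a) and option (b) from the note above, run only if the
# package is available
if (requireNamespace("variantprobs", quietly = TRUE)) {
  est_a <- variantprobs::sgt_Delta(counts = counts, m = m, t = 1)
  est_b <- variantprobs::sgt_Delta(r = r, N_r = N_r, m = m, t = 1)
}
```

Passing r and N_r can be convenient when only the summary table is stored, for example when the per-variant counts are too large to keep in memory.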

References

Good, I. J., & Toulmin, G. H. (1956). The number of new species, and the increase in population coverage, when a sample is increased. Biometrika, 43(1–2), 45–63. https://doi.org/10.1093/biomet/43.1-2.45.

Efron, B., & Thisted, R. (1976). Estimating the Number of Unseen Species: How Many Words Did Shakespeare Know? Biometrika, 63(3), 435–447. Retrieved from http://www.jstor.org/stable/2335721.

Orlitsky, A., Suresh, A. T., & Wu, Y. (2016). Optimal prediction of the number of unseen species. Proceedings of the National Academy of Sciences, 113(47), 13283–13288. https://doi.org/10.1073/pnas.1607774113

Examples

## Not run: 
# load the package and the tcga data
library(variantprobs)
data("tcga")
tcga <- data.table::setDT(tcga)

# calculate variant frequencies
var_freq <- tcga[,
                 .(v_f = length(unique(patient_id))),
                 by = .(Hugo_Symbol, Variant)
                 ]

# calculate cohort size
m <- length(unique(tcga$patient_id))


# SGT Delta(t) estimate for t = 0.5, 1, 10
sgt_Delta(counts = var_freq$v_f, m = m, t = 0.5)
sgt_Delta(counts = var_freq$v_f, m = m, t = 1)
sgt_Delta(counts = var_freq$v_f, m = m, t = 10)

## End(Not run)

c7rishi/variantprobs documentation built on June 23, 2020, 7:42 a.m.