goodturing_probs: Computing Good Turing probabilities of encountering gene...


View source: R/goodturing_probs.R

Description

Computes Good-Turing probabilities of encountering gene variants (including previously unobserved variants), based on gene mutation frequencies in a training cohort.
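As a rough illustration of the idea, the textbook (unsmoothed) Turing estimator can be sketched as below. The helper `turing_estimate()` is hypothetical and not part of the package. It is only a sketch of the classical formulas r* = (r + 1) N_{r+1} / N_r and P(unseen) = N_1 / total, and makes no claim about how `goodturing_probs` smooths the counts or uses the cohort size `m`.

```r
# Hypothetical helper (not from variantprobs): unsmoothed Turing estimator.
# r   : unique observed frequencies
# N_r : number of variants observed exactly r times
turing_estimate <- function(r, N_r) {
  total <- sum(r * N_r)                  # total number of observations
  # N_{r+1} for each r; 0 when no variant has frequency r + 1
  N_next <- N_r[match(r + 1, r)]
  N_next[is.na(N_next)] <- 0
  r_star <- (r + 1) * N_next / N_r       # adjusted frequencies
  N1 <- sum(N_r[r == 1])                 # number of singletons
  list(r = r, r_star = r_star, p_unseen = N1 / total)
}

# Toy input: 3 variants seen once, 2 seen twice, 1 seen five times
res <- turing_estimate(r = c(1, 2, 5), N_r = c(3, 2, 1))
res$p_unseen  # 3 / 12 = 0.25
res$r_star    # 4/3, 0, 0
```

The zeros in `r_star` arise wherever N_{r+1} is 0, which is why practical Good-Turing implementations smooth N_r before applying the formula.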

Usage

goodturing_probs(
  counts = NULL,
  r = NULL,
  N_r = NULL,
  m = NULL,
  conf = 1.96,
  N0min = 0,
  N0 = NULL,
  N12_imp = 1,
  N = NULL
)

Arguments

counts

vector of counts or frequencies of the observed variants.

r

vector of unique frequencies, i.e. the distinct values occurring in counts.

N_r

frequency of frequency r, i.e. the number of variants observed exactly r times.
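The relationship between `counts`, `r` and `N_r` can be sketched with base R. The toy counts below are made up for illustration, and the tabulation is an assumption about how the three arguments relate, not necessarily what the function does internally.

```r
# Toy counts (hypothetical): number of patients carrying each variant
counts <- c(v1 = 1, v2 = 1, v3 = 2, v4 = 5, v5 = 1, v6 = 2)

# Tabulate how many variants share each count
freq_tab <- table(counts)

r   <- as.integer(names(freq_tab))  # unique frequencies: 1, 2, 5
N_r <- as.integer(freq_tab)         # variants seen exactly r times: 3, 2, 1

# Sanity check: sum of r * N_r recovers the total of all counts
sum(r * N_r) == sum(counts)
```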

m

training cohort size.

N0min

the minimum value of N0, if known, to be used while estimating N0. Ignored if N0 is not NULL.

N0

the total number of unobserved variants. If NULL, N0 is estimated using Chao's formula.

N12_imp

imputed value of N1 and N2 if either of them is 0. Defaults to 1.
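One way N0min, N0 and N12_imp could interact is sketched below, assuming the common form of Chao's estimator, N0 = N1^2 / (2 * N2). The helper `chao_N0()` is hypothetical, and both the imputation step and the lower bound are assumptions about the behaviour described above, not the package's actual code.

```r
# Hypothetical helper (not from variantprobs): Chao-type estimate of the
# number of unobserved variants, N0.
chao_N0 <- function(r, N_r, N12_imp = 1, N0min = 0) {
  # Number of variants seen exactly once (N1) and exactly twice (N2)
  N1 <- sum(N_r[r == 1])
  N2 <- sum(N_r[r == 2])
  # Assumed behaviour: replace a zero N1 or N2 by the imputed value
  if (N1 == 0) N1 <- N12_imp
  if (N2 == 0) N2 <- N12_imp
  # Chao's estimate, bounded below by N0min
  max(N1^2 / (2 * N2), N0min)
}

chao_N0(r = c(1, 2, 5), N_r = c(3, 2, 1))  # 3^2 / (2 * 2) = 2.25
chao_N0(r = 3, N_r = 4)                    # N1, N2 imputed as 1: 0.5
chao_N0(r = 3, N_r = 4, N0min = 2)         # lower bound applies: 2
```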

N

the total number of variants, i.e. sum(N_r) over r >= 0. Used to compute N0. Ignored if N0 is provided.

Examples

## Not run: 
# load the package and the tcga data
library(variantprobs)
data("tcga")
tcga <- data.table::setDT(tcga)

# calculate variant frequencies for KRAS
var_freq <- tcga[Hugo_Symbol == "KRAS",
                 .(v_f = length(unique(patient_id))),
                 by = .(Hugo_Symbol, Variant)
                 ]
v_f <- var_freq$v_f
names(v_f) <- var_freq$Variant

# calculate cohort size
m <- length(unique(tcga$patient_id))

# Good Turing estimates
goodturing_probs(counts = v_f, m = m)

## End(Not run)

c7rishi/variantprobs documentation built on June 23, 2020, 7:42 a.m.