add_counts: Adding pseudo counts to the number of mutations
In MultinomialMutations/MultinomialMutations: Fast multinomial regression for large cancer mutation datasets


The function add_counts adds pseudo counts to the number of mutations to incorporate prior information and avoid counts of zero.

add_counts(x, reference = NULL, categorical, make.integers = F)

`x`	data frame with a set of count columns named NO, I, VA, VG (and YES).
`reference`	reference data frame of the same format. If reference=NULL, x is used. Using a reference data frame other than x is currently not implemented.
`categorical`	names of the categorical variables. This vector should not include cancer_type and sample_id.
`make.integers`	logical. Should the final output contain integer counts only?

The function uses functionality from the data.table package.

Let c_1, ..., c_n be a set of categorical variables and v_1, ..., v_m the remaining (categorical or continuous) explanatory variables in the dataset, except for the two variables sample_id and cancer_type.

First, the function checks whether each mutation type has a positive count (denoted by nI, nVA and nVG) for each combination of c_1 x ... x c_n x sample_id. If any of these counts is zero, pseudo counts are added to nNO, nI, nVA and nVG.
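The zero-count check described above can be illustrated with a small base R sketch. This is not the package's implementation (which relies on data.table); the toy data frame, the variable `v` and all values are invented for illustration, and only the column names NO, I, VA, VG, sample_id and strong follow the documented format.

```r
# Toy data (invented): two samples, one categorical variable "strong",
# and one further explanatory variable "v" (hypothetical).
toy <- data.frame(
  sample_id = rep(c("s1", "s2"), each = 4),
  strong    = rep(c(0, 0, 1, 1), times = 2),
  v         = rep(c("a", "b"), times = 4),
  NO        = c(100, 80, 120, 90, 110, 70, 95, 85),
  I         = c(2, 0, 0, 0, 1, 3, 0, 1),
  VA        = c(1, 1, 0, 2, 0, 1, 2, 0),
  VG        = c(3, 2, 1, 1, 1, 0, 1, 1)
)

# Sum the counts over each combination of the categorical variable and sample_id.
per_combo <- aggregate(
  toy[, c("NO", "I", "VA", "VG")],
  by  = list(strong = toy$strong, sample_id = toy$sample_id),
  FUN = sum
)

# Flag combinations in which at least one mutation type has a zero count.
per_combo$zero <- rowSums(per_combo[, c("I", "VA", "VG")] == 0) > 0
per_combo
```

In this toy data only the combination strong = 1, sample_id = "s1" is flagged, because its summed count of I is zero.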

The pseudo counts are obtained from nNO, nI, nVA and nVG for each combination of c_1 x ... x c_n x v_1 x ... x v_m x cancer_type and added to the observed counts for each combination c_1 x ... x c_n x v_1 x ... x v_m x sample_id. The pseudo counts and the observed counts are weighted equally. The new sum nNO + nI + nVA + nVG is the same as originally, so the number of sites of a specific category is preserved and the size of the genome doesn't change.
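One way to read the equal weighting is sketched below for a single combination. This is an assumption about the arithmetic, not the package's code: the cancer-type counts are rescaled to the sample's number of sites and then averaged with the observed counts. All numbers are invented.

```r
# Observed counts for one sample in one combination (invented); 100 sites in total.
obs <- c(NO = 97, I = 0, VA = 1, VG = 2)

# Counts pooled over the sample's cancer type in the same combination (invented).
ref <- c(NO = 9400, I = 200, VA = 150, VG = 250)

# Rescale the cancer-type counts to the sample's number of sites (assumed step).
pseudo <- ref / sum(ref) * sum(obs)

# Weight observed counts and pseudo counts equally.
new <- (obs + pseudo) / 2
new

# The total number of sites is unchanged, and no count is zero any more.
all.equal(sum(new), sum(obs))
all(new > 0)
```

Under this reading the site total stays at 100, while the number of mutations (I + VA + VG) changes from 3 to 4.5 in the toy data.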

If make.integers=T is used, this only holds approximately, because after adjusting to the number of observed sites, the counts are rounded up with the ceiling function (to avoid zero counts). This avoids very small non-integer counts that can induce the same numerical problems as zero counts, but it also increases the total counts and can cause substantial biases.
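The effect of rounding up can be seen on a small invented vector of non-integer counts. The sketch only demonstrates what ceiling() does to the totals and makes no claim about the package's internal steps.

```r
# Non-integer counts as they might result from adding pseudo counts (invented).
adjusted <- c(NO = 95.5, I = 1, VA = 1.25, VG = 2.25)

# Round every count up to the next integer.
rounded <- ceiling(adjusted)
rounded

# Number of sites before and after rounding.
sum(adjusted)   # 100
sum(rounded)    # 102

# Number of mutations (I + VA + VG) before and after rounding.
sum(adjusted[c("I", "VA", "VG")])   # 4.5
sum(rounded[c("I", "VA", "VG")])    # 6
```

Rounding up can never decrease a count, so the totals only grow; the relative increase is largest for the small mutation counts.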

Note that the number of mutations per sample is not preserved.

A data frame (or data table) of the exact same format as the input table x with an additional logical column 'zero' (indicating the addition of pseudocounts because of a zero mutation count).

Johanna Bertl

Bertl, J.; Guo, Q.; Rasmussen, M. J.; Besenbacher, S.; Nielsen, M. M.; Hornshøj, H.; Pedersen, J. S. & Hobolth, A. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data. bioRxiv, 2017. doi: https://doi.org/10.1101/122879. http://www.biorxiv.org/content/early/2017/06/21/122879

add_counts_pres_mut – does the same, but preserving the number of mutations per sample.

# Adding a prior to the example data

data(cancermutations)
newdata = add_counts(cancermutations, categorical=c("strong", "neighbors"), make.integers=TRUE)

# Looking at a sample with few mutations to see the effect of the imputation: 

sample02 = cancermutations[cancermutations$sample_id=="GBM_TCGA_02_2483_01A" & cancermutations$strong==1 & cancermutations$neighbors=="TG",]
new02 = newdata[newdata$sample_id=="GBM_TCGA_02_2483_01A" & newdata$strong==1 & newdata$neighbors=="TG",]

# number of mutations before adding the prior:
sum(sample02$YES)
# number of mutations after adding the prior:
sum(new02$YES)

MultinomialMutations/MultinomialMutations documentation built on May 22, 2019, 4:39 p.m.
