README.md
In MultinomialMutations/MultinomialMutations: Fast multinomial regression for large cancer mutation datasets

MultinomialMutations

The R package MultinomialMutations was written to estimate a multinomial logistic regression model on the mutations (C to T, C to G, C to A, etc.) observed in cancer genomes, with genomic and epigenomic variables that describe the local environment, like nucleotide context, DNase1 hypersensitivity, etc, as regressors. Of course, other R packages for multinomial regression are available, but the huge data (hundreds or thousands of cancer genomes, with each genome consisting of approximately 3 billion positions) makes it necessary to implement a faster algorithm that takes advantage of the sparsity of the design matrix.

In the main function of the package, fast_multinom, the multinomial regression model with M categories is divided into M-1 binomial regression models. In each of them, parameters are estimated with the function MatrixModels from the contributed R-package MatrixModels. Next, the covariance matrix is computed for the joint model as described in Begg & Gray (1984). This is the core part of the R-package. It is a flexible function that is not restricted to the modeling of mutations, but it can be used for multinomial regression in a very general setting. The usage of the function mimics other regression functions like lm and glm and it also comes with a predict function.

For the analysis of the mutation data obtained from Fredriksson et al. (2014), data preparation and imputation functions have been added (although most of the data preparation is handled by -- Guo's repo --; sumstats_orig, count_table_prep_multinom, mimp, add_counts), as well as an example dataset (cancermutations). Furthermore, the deviance loss can be computed for variable selection by cross validation (deviance_loss) and estimates can be pooled after multiple imputation (pooling). Nested contrast matrices can be created with the functions nested_sum_contrasts and nested_treatment_contrasts.

More details about the data and model are found in Bertl, J., Guo, Q. et al. (2017). The scripts and wrapper functions used for the analysis in the paper are included in the package in the folder inst/Bertl_et_al_2017 and are explained in more detail below. The data will be made available. -- figshare --

Download (clone) the R package into a folder, say downloadfolder, and then type in R:

install.packages("downloadfolder/MultinomialMutations", repos=NULL)

Alternatively, install directly from github with the R package devtools:

install.packages("devtools")
library(devtools)
install_github("MultinomialMutations/MultinomialMutations")

The package depends on the packages psych, methods, Matrix, MatrixModels, data.table, reshape2, rmarkdown. Note that the data preparation functions depend on data.table and reshape2, whereas the estimation function fast_multinom depends on Matrix and MatrixModels.

For each genomic position, 4 outcomes are possible: no mutation (NO), transition (VA), transversion (VG) to a G:C basepair and transversion to a A:T basepair (VT). When the model includes the variable strong (dummy variable for G:C basepairs), this translates to a strand-symmetric mutation model.

In this example, we estimate parameters to quantify the impact of the APOBEC mutation signature on the different mutation types. A cancer type specific intercept is included to take into account that the mutation rate varies among cancer types.

# Load example data:
data(cancermutations)

# # The APOBEC mutation signature is only relevant for transitions (VA) and transversions to a G:C basepair (VG). Construct the corresponding subset of parameters for the 3 binomial models VA vs NO, VG vs NO and VT vs NO:
subs = matrix(T, ncol=3, nrow=4)
subs[3,2] = F

# Fit a multinomial logistic regression model to the mutation rate with the explanatory variables `strong` (C or G position), `apobec` (TpCpA or Tp**C**pT position) and a cancer type specific intercept.
fit = fast_multinom(cbind(NO, I, VA, VG) ~ strong + apobec + cancer_type, data = cancermutations, refLevel=1, loglik=T, predictions=T, VC=T, subsetmatrix=subs)

# Predictions on the data that was used for fitting (only available, because predictions=T in the function fast_multinom):
head(predict.fast_multinom(fit))

-- make data availabe on figshare and fill in here --

Bertl, J.; Guo, Q.; Rasmussen, M. J.; Besenbacher, S; Nielsen, M. M.; Hornshøj, H.; Pedersen, J. S. & Hobolth, A. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data. bioRxiv, 2017. doi: https://doi.org/10.1101/122879

Juul, M.; Bertl, J.; Guo, Q.; Nielsen, M. M.; Świtnicki, M.; Hornshøj, H.; Madsen, T.; Hobolth, A. & Pedersen, J. S. Non-coding cancer driver candidates identified with a sample- and position-specific model of the somatic mutation rate. eLife, 2017, 6, e21778. https://doi.org/10.7554/eLife.21778

Begg, C. B. & Gray, R. Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika, 1984, 71, 11-18

Fredriksson, N. J.; Ny, L.; Nilsson, J. A. & Larsson, E. Systematic Analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nature Genetics, 2014, 46, 1258-1263

MultinomialMutations/MultinomialMutations documentation built on May 22, 2019, 4:39 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MultinomialMutations/MultinomialMutations
Fast multinomial regression for large cancer mutation datasets

README.md
In MultinomialMutations/MultinomialMutations: Fast multinomial regression for large cancer mutation datasets

MultinomialMutations

Introduction

Installation

Small usage example

Scripts used for the analysis in Bertl, J., Guo, Q. et al (2017)

References

The package is used in:

Literature:

R Package Documentation

Browse R Packages

We want your feedback!

MultinomialMutations/MultinomialMutations Fast multinomial regression for large cancer mutation datasets

README.md In MultinomialMutations/MultinomialMutations: Fast multinomial regression for large cancer mutation datasets

MultinomialMutations

Introduction

Installation

Small usage example

Scripts used for the analysis in Bertl, J., Guo, Q. et al (2017)

References

The package is used in:

Literature:

R Package Documentation

Browse R Packages

We want your feedback!

MultinomialMutations/MultinomialMutations
Fast multinomial regression for large cancer mutation datasets

README.md
In MultinomialMutations/MultinomialMutations: Fast multinomial regression for large cancer mutation datasets