count_table_prep_multinom: Count table for multinomial regression
In MultinomialMutations/MultinomialMutations: Fast multinomial regression for large cancer mutation datasets

Description Usage Arguments Details Value Author(s) References See Also Examples

count_table_prep_multinom imputes missing values and reshapes the raw count table for multinomial logistic regression on the strand-symmetric mutation model.

count_table_prep_multinom(count_table, m, impute = T, strong = T, CpG = T,
  apobec = T, neighbors = T, DNase1_dummy = F, expression_dummy = F,
  data_source = "fredriksson")

`count_table`	data frame. Raw count table. See examples.
`m`	integer. Number of multiple imputations. 0 means no imputation.
`impute`	logical. Impute missing values or remove them? See details.
`strong`	logical. Should the variable strong be computed?
`CpG`	logical.
`apobec`	logical.
`neighbors`	logical.
`DNase1_dummy`	logical. Should a dummy variable be computed for DNase1 peaks? If yes, NAs in the original variable DNase1 are replaced by zero. This means that DNase1 := DNase1*I(DNase1_dummy==1).
`expression_dummy`	logical. Should a dummy variable be computed for expression measure available? If yes, NAs in the original variable expression are replaced by zero.
`data_source`	character. Either "fredriksson" or "pcawg".
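
The effect of DNase1_dummy described above can be illustrated on a toy vector. This is only a sketch in base R, not the package's code: the values are made up, and it assumes the dummy is 1 where a DNase1 value is available and 0 where it is missing.

```r
# Toy DNase1 values; NA marks sites without a measurement (made-up numbers).
DNase1 <- c(0.8, NA, 0.3, NA)

# Assumed encoding: the dummy is 1 where a value is available, 0 where it is missing.
DNase1_dummy <- as.integer(!is.na(DNase1))

# Replace the NAs by zero, so that DNase1 equals DNase1 * I(DNase1_dummy == 1).
DNase1[is.na(DNase1)] <- 0

print(data.frame(DNase1, DNase1_dummy))
```

In a regression, the dummy then absorbs the difference between sites with and without a measurement, while the zero-filled variable only contributes where a value was observed.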

If impute=T and m>=1, the missing data are imputed m times. If impute=T and m=0, the missing values are left untouched and kept as NA. If impute=F, the value of m is ignored and only the complete cases are returned (using the function complete.cases).
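
As a rough illustration of the impute=F case, the following base R sketch applies complete.cases to a toy data frame. The column names and values are hypothetical and much simpler than a real count table; the function's internal handling may differ.

```r
# Hypothetical toy table with missing values in two explanatory variables.
toy <- data.frame(
  count      = c(3L, 0L, 5L, 2L),
  expression = c(1.2, NA, 0.7, NA),
  DNase1     = c(0.4, 0.1, NA, 0.9)
)

# complete.cases() returns TRUE for rows without any NA.
keep <- complete.cases(toy)

# Keep only the complete rows, as in the impute = FALSE case.
toy_complete <- toy[keep, ]

# Number of missing values per column.
n_missing <- colSums(is.na(toy))

print(toy_complete)
print(n_missing)
```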

The packages data.table and reshape2 are used for efficient and fast handling of the large mutation datasets. Multiple imputation is handled by the function mimp.

Note that this function uses a few small functions from small_dataprep_functions.R.

The Fredriksson and PCAWG datasets are handled in the same way, apart from the location information that is removed from the cancer type in the PCAWG set.

A list that consists of the following elements:

imputed: a list of length min(m, 1) of imputed or complete data.tables (data.frames)
missing: a named vector giving the number of sites with missing values
total_count: an integer value giving the total number of sites (to compute proportions of missing sites)
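
The elements missing and total_count can be combined to obtain the proportion of sites with missing values. The sketch below uses a hand-built list with the same element names as described above; the numbers and the variable names inside missing are invented for illustration and do not come from the package.

```r
# Hypothetical stand-in for the returned list (element names as documented above).
res <- list(
  imputed     = list(data.frame(count = c(3L, 5L))),
  missing     = c(expression = 120L, DNase1 = 45L),
  total_count = 1000L
)

# Proportion of sites with missing values, per variable.
prop_missing <- res$missing / res$total_count

print(prop_missing)
```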

Johanna Bertl & Malene Juul

Bertl, J.; Guo, Q.; Rasmussen, M. J.; Besenbacher, S.; Nielsen, M. M.; Hornshøj, H.; Pedersen, J. S. & Hobolth, A. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data. bioRxiv, 2017. doi: https://doi.org/10.1101/122879 http://www.biorxiv.org/content/early/2017/06/21/122879

cancermutations

# This is how the example dataset cancermutations was created (data(cancermutations)).

# use system.file to find the raw dataset that was installed along with the package:
location = system.file("extdata", "set0", package = "multinomutils")
count.raw = read.table(file=location, header = T, as.is=T)

# data preparation with imputation (on a subset of the data, for speed -- this still takes a few minutes!)
set.seed(1234)
count.raw.sub = count.raw[sample.int(nrow(count.raw), 1000),]
count.imp = count_table_prep_multinom(count.raw.sub, 2)
# Note that imputation only works if there is more than one cancer type in the dataset.

# data preparation without imputation, but with expression dummy variable:
count.noimp = count_table_prep_multinom(count.raw, 0, expression_dummy=T)

# complete cases only
count.complete = count_table_prep_multinom(count.raw, m=5, impute=F)
# Note that this doesn't work on a very small subset of the data in which, after removal of the incomplete cases, not all 4 mutation types are present.
# This is similar to the example dataset cancermutations. The code to create this dataset is in data-raw/cancermutations.R

MultinomialMutations/MultinomialMutations documentation built on May 22, 2019, 4:39 p.m.
