mimp: Multiple imputation for count data
In MultinomialMutations/MultinomialMutations: Fast multinomial regression for large cancer mutation datasets


The function mimp produces multiply imputed datasets for categorical count data.

mimp(x, to.impute, m)

`x`	data frame with ncol(x) - 1 categorical columns and one column of counts, named "count". The categorical columns do not need to be factors, but note that numeric vectors are handled as if they were categorical.
`to.impute`	character vector with the names of the columns in which missing values should be imputed. The function tests whether these columns actually contain missing values.
`m`	number of multiple imputations

A count dataset consists of a set of categorical variables and a column of counts (named "count"). It is not necessary that each combination of categories is present in the dataset. The imputation is conducted for each variable independently by sampling from its empirical distribution function.
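The idea of imputing one variable from its empirical distribution can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper `impute_column` and the toy data are hypothetical, and the sketch assumes that the empirical distribution is weighted by the counts, that each incomplete row's count is split across categories with a multinomial draw, and that only one column contains missing values.

```r
# Toy count table: two categorical columns plus a "count" column.
x <- data.frame(
  cancer_type = c("KICH", "KICH", "LGG", "LGG", "LGG"),
  strand      = c("+", "-", "+", NA, "-"),
  count       = c(40, 10, 30, 20, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of MultinomialMutations): impute one column
# by splitting the count of each incomplete row across the observed
# categories. Assumes all other columns are complete.
impute_column <- function(x, col) {
  miss <- is.na(x[[col]])
  if (!any(miss)) return(x)
  obs <- x[!miss, , drop = FALSE]
  # Count-weighted empirical distribution of the observed categories.
  w <- tapply(obs$count, obs[[col]], sum)
  p <- w / sum(w)
  pieces <- lapply(which(miss), function(i) {
    # Distribute the row's count over the categories.
    n <- as.vector(rmultinom(1, size = x$count[i], prob = p))
    row <- x[rep(i, length(p)), , drop = FALSE]
    row[[col]] <- names(p)
    row$count <- n
    row[n > 0, , drop = FALSE]
  })
  out <- rbind(obs, do.call(rbind, pieces))
  # Merge rows that now describe the same combination of categories.
  aggregate(count ~ ., data = out, FUN = sum)
}

set.seed(1)
imputed <- impute_column(x, "strand")
```

In this sketch the total count is preserved, while the number of rows can differ from the input because an incomplete row may be split into several rows or merged into an existing one.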

Functionalities from the package data.table are used to speed up computations.

Sophisticated and flexible R packages for multiple imputation are available. I wrote this function because the functions from the package cat, e.g. em.cat, can't handle the large counts that we have in our dataset (cat uses the integer class to communicate with an underlying Fortran function, which can cause integer overflows), and the package mice doesn't support count data.
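The overflow problem mentioned above is a general property of R's 32-bit integer class and can be reproduced without any package. The snippet below is only an illustration of that limit; it does not call cat or mimp, and the variable names are arbitrary.

```r
# Largest value that R's integer class can represent.
big <- .Machine$integer.max              # 2147483647

# Integer arithmetic beyond that limit yields NA (with a warning).
overflow <- suppressWarnings(big + 1L)   # NA

# Storing the count as a double avoids the overflow.
as_double <- as.numeric(big) + 1         # 2147483648
```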

If there are no missing values in the specified columns, the function returns nothing and throws a warning. Otherwise, it returns a list of m complete data frames. Note that each of them has the same total count as the original data frame, but usually not the same number of rows.

Johanna Bertl

imp.cat, mice

# fread() and the := syntax below come from data.table
library(data.table)

# use a subset of the raw dataset that was installed along with the package:
location = system.file("extdata", "set0", package = "multinomutils")
count_table = fread(file=location)
count_table = count_table[count_table$cancer_type %in% c("KICH", "LGG")]
count_table[,left:=NULL]
count_table[,right:=NULL]

# number of missing values per column of count_table
missing = apply(count_table, 2, FUN = function(x) sum(is.na(x)))
names(missing) = names(count_table)

# 3-fold multiple imputation
count_table_imp = mimp(x = count_table, to.impute = names(missing[missing>0]), m = 3)

MultinomialMutations/MultinomialMutations documentation built on May 22, 2019, 4:39 p.m.
