estimateSimParams: Estimate negative binomial parameters from real data

View source: R/sim.R

Description

This function reads a table of read counts from real RNA-Seq data (preferably with more than 20 samples, so that the estimates are as accurate as possible) and calculates a population of count means and dispersion parameters. These can be used to simulate an RNA-Seq dataset with synthetic genes by drawing from a negative binomial distribution. The function works as described in (Soneson and Delorenzi, BMC Bioinformatics, 2013) and (Robles et al., BMC Genomics, 2012).
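To illustrate how a population of means and dispersions can drive a simulation, the sketch below draws synthetic counts with base R's rnbinom(). The muHat and phiHat values are invented placeholders, not output of estimateSimParams, and whether the package's own simulator uses exactly this call is an assumption. The parametrization size = 1/phi is R's standard negative binomial mean/dispersion form, giving a variance of mu + phi * mu^2.

```r
# Invented mean/dispersion pairs for four genes (placeholders); in a real
# workflow these would come from the muHat and phiHat members of the result.
muHat <- c(10, 50, 200, 1000)
phiHat <- c(0.5, 0.2, 0.1, 0.05)

set.seed(42)  # make the random draws reproducible
nSamples <- 5

# rnbinom() takes the mean via `mu`; with size = 1/phi the variance is
# mu + phi * mu^2, so phi plays the role of the NB dispersion.
simCounts <- t(mapply(function(mu, phi) {
    rnbinom(nSamples, size = 1 / phi, mu = mu)
}, muHat, phiHat))
dimnames(simCounts) <- list(paste0("gene", seq_along(muHat)),
    paste0("sample", seq_len(nSamples)))
```

Each row of simCounts is one synthetic gene; larger phi values produce visibly noisier rows for the same mean.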

Usage

    estimateSimParams(realCounts, libsizeGt = 3e+6,
        rowmeansGt = 5, eps = 1e-11, rc = NULL, draw = FALSE)

Arguments

realCounts

a tab-delimited text file with real RNA-Seq read counts. See Details.

libsizeGt

a library size below which samples are excluded from parameter estimation (default: 3000000).

rowmeansGt

a row mean (the mean count over samples for each gene) below which genes are excluded from parameter estimation (default: 5).

eps

the convergence tolerance for the optimize function (default: 1e-11).

rc

when the optimization runs in parallel, the fraction of the available cores to use.

draw

logical; whether to plot the estimated simulation parameters (mean and dispersion). Defaults to FALSE (no mean-dispersion scatterplot is drawn).
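The way libsizeGt, rowmeansGt, eps and draw could enter an estimation run is sketched below in base R. This is a hypothetical illustration (estimateSketch is an invented name), not the metaseqR2 implementation: the filtering order, the strict "greater than" comparisons suggested by the argument names, and the per-gene maximum-likelihood fit of the dispersion with the mean fixed at the row mean are all assumptions. The rc argument is not modelled; the sketch runs sequentially.

```r
# Hypothetical sketch of the estimation steps; NOT the package's code.
estimateSketch <- function(counts, libsizeGt = 3e+6, rowmeansGt = 5,
    eps = 1e-11, draw = FALSE) {
    # Assumption: drop samples whose library size does not exceed libsizeGt
    counts <- counts[, colSums(counts) > libsizeGt, drop = FALSE]
    # Assumption: drop genes whose mean over kept samples does not exceed rowmeansGt
    counts <- counts[rowMeans(counts) > rowmeansGt, , drop = FALSE]

    muHat <- rowMeans(counts)
    # Per-gene dispersion: minimise the NB negative log-likelihood over phi,
    # with the mean fixed at muHat, using optimize() with tolerance eps
    phiHat <- vapply(seq_len(nrow(counts)), function(i) {
        nll <- function(phi) {
            -sum(dnbinom(counts[i, ], size = 1 / phi, mu = muHat[i], log = TRUE))
        }
        optimize(nll, interval = c(1e-8, 10), tol = eps)$minimum
    }, numeric(1))
    names(phiHat) <- rownames(counts)

    if (draw) {
        plot(log10(muHat), log10(phiHat), pch = 20,
            xlab = "log10(mean)", ylab = "log10(dispersion)")
    }
    list(muHat = muHat, phiHat = phiHat)
}

# Toy usage with simulated (not real) data and thresholds scaled down to fit it
set.seed(42)
toy <- matrix(rnbinom(200 * 6, size = 1 / 0.2, mu = 50), nrow = 200,
    dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
toy[, 6] <- 0  # an empty sample that the library size filter removes
est <- estimateSketch(toy, libsizeGt = 1000, rowmeansGt = 5)
```

Fixing the mean at the sample row mean keeps each fit one-dimensional, which is why a scalar optimizer such as optimize() suffices here.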

Details

The realCounts file must contain a unique gene identifier (e.g. an Ensembl accession) in the first column, and every other column must contain the read counts of one sample for each gene. Each column must be named with a unique sample identifier. Example count tables can be found in the ReCount database http://bowtie-bio.sourceforge.net/recount/.
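A file in this layout can be produced and read back with base R as sketched below. The counts are invented, and read.delim() with row.names = 1 is simply one standard way to load such a table; how estimateSimParams itself reads the file is not shown here.

```r
# Write a small tab-delimited file in the layout described above:
# gene identifiers in the first column, one named column per sample.
# The count values are invented for illustration.
countFile <- tempfile(fileext = ".txt")
writeLines(c(
    "gene_id\tsampleA\tsampleB\tsampleC",
    "ENSG00000000003\t120\t98\t143",
    "ENSG00000000005\t0\t2\t1",
    "ENSG00000000419\t560\t610\t590"
), countFile)

# Read it back: the first column becomes the row names, the rest the counts
counts <- as.matrix(read.delim(countFile, row.names = 1, check.names = FALSE))
```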

Parameter estimation also involves extensive random sampling. To guarantee reproducible results, call set.seed before any calculations. By default, the seed is set to 42 when the metaseqR2 package is loaded.
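The effect of set.seed on a sampling-based computation can be seen in the small sketch below. The bootstrap mean (bootMean, a hypothetical helper) merely stands in for the random sampling done during estimation; resetting the same seed before each run reproduces the result exactly.

```r
# Toy computation driven by random sampling: a bootstrap estimate of a mean
y <- c(12, 30, 7, 22, 15, 40, 9, 18)
bootMean <- function(x, B = 200) {
    mean(replicate(B, mean(sample(x, replace = TRUE))))
}

set.seed(42)
run1 <- bootMean(y)
set.seed(42)       # same seed again, so the same random stream
run2 <- bootMean(y)
```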

Value

A named list with two members: muHat, which contains the negative binomial mean estimates, and phiHat, which contains the dispersion estimates.

Author(s)

Panagiotis Moulos

Examples

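# Example count data from an internal package helper
dataMatrix <- metaseqR2:::exampleCountData(2000)
# Use a library size cutoff lower than the default (3e+6)
parList <- estimateSimParams(dataMatrix, libsizeGt = 3e+4)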

pmoulos/metaseqR2 documentation built on May 20, 2024, 5:48 a.m.