estimate.sim.params: Estimate negative binomial parameters from real data


View source: R/metaseqr.sim.R

Description

This function reads a read counts table containing real RNA-Seq data (preferably with more than 20 samples, so that the estimates are as accurate as possible) and calculates a population of count means and dispersion parameters. These can be used to simulate an RNA-Seq dataset with synthetic genes by drawing from a negative binomial distribution. The function works in the same way as described in (Soneson and Delorenzi, BMC Bioinformatics, 2013) and (Robles et al., BMC Genomics, 2012).

Usage

    estimate.sim.params(real.counts, libsize.gt = 3e+6,
        rowmeans.gt = 5, eps = 1e-11,
        restrict.cores = 0.1, seed = 42, draw = FALSE)

Arguments

real.counts

a tab-delimited text file with real RNA-Seq data. The first column must contain a unique gene name (e.g. Ensembl accession) and all other columns must contain read counts for each gene. Each column must be named with a unique sample identifier. See examples in the ReCount database http://bowtie-bio.sourceforge.net/recount/.
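To illustrate the expected layout, the following sketch writes a tiny table in that format to a temporary file and reads it back. The gene identifiers, sample names, and counts are made up for illustration, and `read.delim` is used here only to check the layout; it is not necessarily how the function parses its input.

```r
# Hypothetical toy table: 3 genes x 4 samples (all values are made up)
toy <- data.frame(
    gene = c("ENSMUSG00000000001", "ENSMUSG00000000028", "ENSMUSG00000000037"),
    sample_A = c(369L, 0L, 12L),
    sample_B = c(744L, 1L, 7L),
    sample_C = c(287L, 0L, 15L),
    sample_D = c(769L, 2L, 9L)
)

# Write as tab-delimited text: gene names first, one named column per sample
toy.file <- tempfile(fileext = ".txt")
write.table(toy, toy.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with the first column as row names and check uniqueness
counts <- as.matrix(read.delim(toy.file, row.names = 1, check.names = FALSE))
stopifnot(!anyDuplicated(rownames(counts)), !anyDuplicated(colnames(counts)))
```

A file that passes these checks has the shape described above: unique gene names in the first column and one uniquely named count column per sample.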

libsize.gt

a library size below which samples are excluded from parameter estimation (default: 3000000).

rowmeans.gt

a row mean (mean count over samples for each gene) below which genes are excluded from parameter estimation (default: 5).
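The effect of the two thresholds can be sketched on a toy matrix. This is an illustration only: the order of the two filters (samples first, then genes) and the strict `>` comparison are assumptions suggested by the `.gt` suffix, not behaviour confirmed by this page, and the toy `libsize.gt` is far smaller than the real default.

```r
# Toy count matrix, genes in rows and samples in columns (made-up values)
counts <- matrix(
    c(100, 200, 1, 150,
        0,   1, 0,   2,
       50,  80, 2,  60),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3", "s4"))
)

libsize.gt <- 10   # toy threshold; the documented default is 3e+6
rowmeans.gt <- 5   # same as the documented default

# Assumed step 1: drop samples whose library size (column sum) is too small
keep.samples <- colSums(counts) > libsize.gt
filtered <- counts[, keep.samples, drop = FALSE]

# Assumed step 2: drop genes whose mean count over remaining samples is too small
keep.genes <- rowMeans(filtered) > rowmeans.gt
filtered <- filtered[keep.genes, , drop = FALSE]

print(filtered)
```

In this toy case sample `s3` (library size 3) and gene `g2` (mean count 1) are removed, leaving a 2 x 3 matrix.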

eps

the tolerance for the convergence of the optimize function. Defaults to 1e-11.
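As a rough illustration of what such a tolerance controls, the sketch below estimates the dispersion of a single simulated gene by minimizing the negative binomial negative log-likelihood with base R's `optimize`. The objective function, the search interval, and fixing the mean at the sample mean are all assumptions made for this example; the page does not state what objective the function optimizes internally.

```r
set.seed(42)

# Simulated counts for one gene: mean 50, dispersion 0.2 (size = 1 / dispersion)
y <- rnbinom(200, mu = 50, size = 1 / 0.2)
mu <- mean(y)

# Negative log-likelihood as a function of the dispersion phi (assumed objective)
negloglik <- function(phi) {
    -sum(dnbinom(y, mu = mu, size = 1 / phi, log = TRUE))
}

# 'tol' plays the role of 'eps'; the search interval is an arbitrary choice
eps <- 1e-11
fit <- optimize(negloglik, interval = c(1e-6, 10), tol = eps)
phi.est <- fit$minimum
print(phi.est)
```

A tolerance this small is much stricter than the default of `optimize` (`.Machine$double.eps^0.25`), trading extra iterations for a more precisely located optimum.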

restrict.cores

in case of parallel optimization, the fraction of the available cores to use (default: 0.1).
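One plausible way to turn such a fraction into a core count is sketched below. The rounding rule (`ceiling`) and the fallback to a single core are assumptions for illustration, not a statement of how the function computes it.

```r
library(parallel)

restrict.cores <- 0.1

# detectCores() can return NA on some platforms; fall back to one core
n.avail <- detectCores()
if (is.na(n.avail)) n.avail <- 1L

# Assumed rule: round up, and never use fewer than one core
n.use <- max(1L, ceiling(restrict.cores * n.avail))
print(n.use)
```

With this rule, a machine reporting 16 cores would use 2 of them at the default fraction.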

seed

a seed to use with random number generation for reproducibility (default: 42).

draw

boolean to determine whether to plot the estimated simulation parameters (mean and dispersion) or not. Defaults to FALSE (do not draw a mean-dispersion scatterplot).

Value

A named list with two members: mu.hat, which contains the negative binomial mean estimates, and phi.hat, which contains the dispersion estimates.
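The sketch below shows one way a list of this shape might be used to draw synthetic counts. The `par.list` values are made-up stand-ins for real output, and two assumptions are made: that each mean is paired with the dispersion at the same position, and that dispersion follows the common parameterization var = mu + phi * mu^2, so that the `size` argument of `rnbinom` is `1 / phi`.

```r
set.seed(42)

# Hypothetical stand-in for the returned list (values are made up)
par.list <- list(
    mu.hat = c(12.5, 80.2, 310.7, 5.9, 1500.3),
    phi.hat = c(0.35, 0.12, 0.08, 0.60, 0.05)
)

n.genes <- 100
n.samples <- 6

# Sample (mean, dispersion) pairs jointly, with replacement
idx <- sample(seq_along(par.list$mu.hat), n.genes, replace = TRUE)
mu <- par.list$mu.hat[idx]
phi <- par.list$phi.hat[idx]

# Draw counts; the matrix is filled column by column, so row i uses mu[i], phi[i]
sim <- matrix(
    rnbinom(n.genes * n.samples,
        mu = rep(mu, times = n.samples),
        size = rep(1 / phi, times = n.samples)),
    nrow = n.genes, ncol = n.samples
)
print(dim(sim))
```

Sampling the pairs jointly keeps the mean-dispersion relationship of the real data, which is the reason to estimate both from the same genes.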

Author(s)

Panagiotis Moulos

Examples

# Download the file "bottomly_count_table.txt" locally from
# the ReCount database
download.file(paste("http://bowtie-bio.sourceforge.net/",
    "recount/countTables/bottomly_count_table.txt",sep=""),
    destfile="~/bottomly_count_table.txt")
# Estimate simulation parameters
par.list <- estimate.sim.params("~/bottomly_count_table.txt")

metaseqR documentation built on Nov. 8, 2020, 5:57 p.m.