npSeq.Main: Discover differentially expressed genes using a nonparametric...


View source: R/npSeq.Main.R

Description

Discover significant (differentially expressed) genes and estimate false discovery rates using the nonparametric method described in Jun Li and Robert Tibshirani (2011).
This is the main function of the package.

Usage

npSeq.Main(dat, para=list())

Arguments

dat

a list with elements:
n: count matrix of original (not normalized) counts. Each row holds the counts for one gene, and each column holds the counts from one experiment. Every element should be a non-negative integer.
y: outcome vector. For twoclass data: '1', '2' for the two classes. For multiclass data: '1', '2', ..., 'K' for K classes. For quant data: real numbers. For survi data: real numbers (survival times).
type: "twoclass", "multiclass", "quant", or "survi".
gname (optional): the names of the genes.
gamma (optional): censoring statuses. '1' for observed (died), '0' for censored.
delta (optional): true significance of each gene. TRUE for significant, FALSE for not significant. This is known only for simulated data. When delta is not NULL, true false discovery rates are calculated and returned.
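
As an illustration of the structure described above, a minimal sketch of building dat by hand for two-class data follows. The counts, labels, and gene names are made-up toy values, and the consistency checks are a suggestion rather than something the package is known to perform.

```r
# Hypothetical toy data: 4 genes (rows) x 6 experiments (columns).
set.seed(1)
n <- matrix(rpois(24, lambda = 10), nrow = 4, ncol = 6)

dat <- list(
  n     = n,                      # raw, unnormalized counts
  y     = c(1, 1, 1, 2, 2, 2),    # class label of each experiment
  type  = "twoclass",
  gname = paste0("gene", 1:4)     # optional gene names (made up)
)

# Suggested sanity checks before calling npSeq.Main(dat)
stopifnot(
  all(dat$n >= 0), all(dat$n == round(dat$n)),   # non-negative integers
  length(dat$y) == ncol(dat$n),                  # one outcome per experiment
  length(dat$gname) == nrow(dat$n)               # one name per gene
)
```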

para

a list with elements (all of them are optional):
npermu: number of permutations used to estimate FDR. Default value: 100.
nsam: number of resamplings. Default value: 20.
sam.meth: resampling method: '1' for subsampling, '2' for Poisson sampling. Default value: 2.
seed: random seed for resampling. Default value: 20.
ct.sum: genes whose total number of reads across all experiments is <= ct.sum are not considered for differential expression detection. Default value: 5.
ct.mean: genes whose mean number of reads across all experiments is <= ct.mean are not considered for differential expression detection. Default value: 0.5.
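
The ct.sum and ct.mean rules can be pictured with the base R sketch below. It is only an illustration of the documented thresholds on made-up counts. It assumes that a gene is dropped when either rule applies, and it is not taken from the package source.

```r
# Hypothetical counts: 4 genes (rows) x 4 experiments (columns)
n <- rbind(
  g1 = c(0, 1, 0, 1),     # sum 2,  mean 0.5   -> dropped by both rules
  g2 = c(2, 2, 1, 1),     # sum 6,  mean 1.5   -> kept
  g3 = c(5, 0, 0, 0),     # sum 5,  mean 1.25  -> dropped (sum <= ct.sum)
  g4 = c(10, 12, 9, 11)   # sum 42, mean 10.5  -> kept
)

# The documented defaults, written out explicitly
para <- list(npermu = 100, nsam = 20, sam.meth = 2, seed = 20,
             ct.sum = 5, ct.mean = 0.5)

# Assumption: a gene is considered only if it passes both thresholds
keep <- rowSums(n) > para$ct.sum & rowMeans(n) > para$ct.mean
kept.genes <- rownames(n)[keep]

# With the package loaded, the same list would be passed as:
# np.fdr <- npSeq.Main(dat, para = para)
```

Because every element of para is optional, a list containing only the values to override, for example list(npermu = 200), should be enough.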

Value

A data frame containing the following columns. Each row corresponds to a gene. The genes are sorted from the most significant to the least significant.

nc

number of significant genes called.

gname

the sorted gene names.

tt

The statistics of the genes.

pval

Permutation-based p-values of the genes.

fdr

Estimated false discovery rate.

log.fc

Estimated log fold change of the genes. Only available for twoclass outcomes.

tfdr

True false discovery rate. Only available when dat$delta is not NULL.
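
A short sketch of how the returned table might be used to pick genes at a chosen FDR cutoff. The data frame below is a mock with made-up values standing in for the output of npSeq.Main. The 0.05 cutoff and the assumption that nc equals the row position are illustrative only.

```r
# Mock result with the documented columns (all values are made up)
np.fdr <- data.frame(
  nc    = 1:4,                          # assumed: genes called up to this row
  gname = c("g7", "g2", "g9", "g4"),
  tt    = c(5.1, 4.3, 2.0, 0.4),
  pval  = c(0.001, 0.004, 0.08, 0.60),
  fdr   = c(0.01, 0.02, 0.15, 0.70),
  stringsAsFactors = FALSE
)

# Keep the genes whose estimated FDR is at most 5%
sig <- np.fdr[np.fdr$fdr <= 0.05, ]
sig$gname
```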

Author(s)

Jun Li

References

Jun Li and Robert Tibshirani (2011). Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. To appear, Statistical Methods in Medical Research.

Jun Li, Daniela M. Witten, Iain Johnstone, Robert Tibshirani (2011). Normalization, testing, and false discovery rate estimation for RNA-sequencing data. To appear, Biostatistics.

Examples

## two-class negative binomial-distributed data with outliers,
## 8 samples in each class
dat <- npSeq.Simu.Data(list(type='twoclass', NGENE=1000, option=4, NSAM=c(8, 8)))
np.fdr <- npSeq.Main(dat)

## 4-class Poisson-distributed data with outliers,
## 3 samples in each class
dat <- npSeq.Simu.Data(list(type='multiclass', NGENE=1000, option=3, NSAM=c(3, 3, 3, 3)))
np.fdr <- npSeq.Main(dat)

## quantitative negative binomial-distributed data with outliers,
## 12 samples in total
dat <- npSeq.Simu.Data(list(type='quant', NGENE=1000, option=4, NSAM=12))
np.fdr <- npSeq.Main(dat)

## survival negative binomial-distributed data with outliers,
## 12 samples in total
dat <- npSeq.Simu.Data(list(type='survi', NGENE=1000, option=4, NSAM=12))
np.fdr <- npSeq.Main(dat)

joey711/npSeq documentation built on May 19, 2019, 3:01 p.m.