iDEG: Identification of *i*ndividualized *D*ifferentially...

Description

Identify differentially expressed genes between two conditions when only one transcriptome is collected for each condition.

Usage

iDEG(baseline, case, normalization = F, dataDistribution = c("NB",
  "Poisson"), numBin = 100, rankBaseline = T, estBaseline = F,
  estSize = F, spar = NULL, plot = 0, constDisp = T, df = 7,
  nulltype = 1, pct = 1e-04)

Arguments

baseline

a vector of gene expression levels of the baseline transcriptome (e.g., healthy tissue)

case

a vector of gene expression levels of the case transcriptome (e.g., tumor tissue)

normalization

a logical variable indicating if normalization has been done

dataDistribution

the distributional assumption of the RNA-Seq data under analysis. Possible values are 'Poisson' and 'NB' (negative binomial). Default is 'NB'.

numBin

number of bins into which all genes are grouped. Default is 100.

rankBaseline

if True, iDEG groups all genes based on the gene expression levels of the baseline transcriptome. If False, iDEG groups all genes based on the average gene expression levels of the baseline and case transcriptomes.
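
To make the two grouping options concrete, here is a minimal base-R sketch of rank-based binning. It is not iDEG's internal code: the helper `bin_genes` and the simulated inputs are hypothetical, and equal-sized bins are an assumption made only for illustration.

```r
# Illustrative sketch only; iDEG's internal binning may differ.
set.seed(1)
mu       <- rexp(1000, 1/500) + 1
baseline <- rnbinom(n = 1000, size = 60, mu = mu)
case     <- rnbinom(n = 1000, size = 60, mu = mu)
numBin   <- 100

# Hypothetical helper: assign each gene to one of numBin rank-based bins.
bin_genes <- function(x, numBin) {
  # ties.method = "first" gives every gene a unique rank in 1..length(x)
  r <- rank(x, ties.method = "first")
  # map ranks onto bin labels 1..numBin of (nearly) equal size
  ceiling(r * numBin / length(x))
}

# Analogue of rankBaseline = TRUE: rank on the baseline transcriptome alone
bin_by_baseline <- bin_genes(baseline, numBin)
# Analogue of rankBaseline = FALSE: rank on the average of both transcriptomes
bin_by_average  <- bin_genes((baseline + case) / 2, numBin)

table(bin_by_baseline)[1:5]
```

Genes with similar expression levels land in the same bin under either choice; the two labelings differ mainly for genes whose case expression departs strongly from baseline.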

estBaseline

if True, compute the dispersion parameter using only the baseline transcriptome

estSize

if True, the size parameter is estimated from each bin; if False, the dispersion parameter is estimated from each bin.
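
The relation between the two parameters can be checked numerically. The sketch below assumes the common convention that dispersion is the reciprocal of the `size` argument of `rnbinom`, so that the variance is mu + mu^2/size; whether iDEG uses exactly this convention is an assumption here, and the method-of-moments estimator is shown only for illustration.

```r
# Illustrative sketch: negative binomial size vs. dispersion (assumed = 1/size).
set.seed(1)
mu   <- 500
size <- 60
x <- rnbinom(n = 1e5, size = size, mu = mu)

# Under R's (size, mu) parameterization the variance is mu + mu^2 / size
theoretical_var <- mu + mu^2 / size

# Method-of-moments estimates from the sample mean and variance
size_hat <- mean(x)^2 / (var(x) - mean(x))
disp_hat <- 1 / size_hat

c(empirical_var = var(x), theoretical_var = theoretical_var,
  size_hat = size_hat, disp_hat = disp_hat)
```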

spar

smoothing parameter used to fit a smoothing spline, typically (but not necessarily) in (0,1]. The coefficient lambda of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of ‘spar’; see help(smooth.spline) for details.
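
The effect of ‘spar’ can be seen by calling `stats::smooth.spline` directly. This is a sketch on made-up data, not a reproduction of how iDEG fits its spline internally.

```r
# Illustrative sketch: how spar controls the smoothness of smooth.spline().
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)

fit_auto   <- smooth.spline(x, y)             # spar = NULL: chosen by (generalized) cross-validation
fit_rough  <- smooth.spline(x, y, spar = 0.2) # small spar: wigglier fit
fit_smooth <- smooth.spline(x, y, spar = 1.0) # large spar: smoother fit

# Larger spar means larger lambda and fewer effective degrees of freedom
c(lambda_rough = fit_rough$lambda, lambda_smooth = fit_smooth$lambda)
c(df_rough = fit_rough$df, df_smooth = fit_smooth$df)
```

Leaving `spar = NULL` lets `smooth.spline` pick the smoothing level itself, which is a reasonable starting point before tuning by hand.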

plot

plots desired. 0 gives no plots. 1 gives single plot showing the histogram of zz and fitted densities f and p0*f0.

constDisp

if True, iDEG assumes the dispersion is constant across all genes. If False, iDEG assumes the dispersion is a smooth function of the expression mean.

df

the degrees of freedom used for estimating the marginal distribution.

nulltype

type of null distribution assumed in computing the probability of gene differential expression. 0 is the theoretical null N(0,1), 1 is maximum likelihood estimation.

pct

the percentage of genes excluded from fitting the two-group mixture model.

Value

'iDEG' produces a list containing the following elements:

results

a table of iDEG results for each gene. The first two columns are the gene expression values of the two transcriptomes provided by the user. The third column is the local false discovery rate, which is used to assess the probability that a gene is differentially expressed. The fourth column is the statistic used to compute the local false discovery rate, and can be used as an effect size.
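
A results table with this layout can be filtered by column position. The sketch below uses a small mock table with made-up values and hypothetical column names in place of real iDEG output. It assumes the usual convention that a small local false discovery rate marks a likely differentially expressed gene, and the 0.2 cutoff is arbitrary.

```r
# Mock stand-in for the 'results' element; all values are invented for illustration.
mock_results <- data.frame(
  baseline  = c(120, 15, 980, 430),       # column 1: baseline expression
  case      = c(125, 160, 1010, 45),      # column 2: case expression
  local_fdr = c(0.95, 0.01, 0.88, 0.03),  # column 3: local false discovery rate
  stat      = c(0.2, 6.1, 0.4, -5.7)      # column 4: statistic / effect size
)
rownames(mock_results) <- paste0("gene", 1:4)

threshold <- 0.2  # arbitrary cutoff, chosen only for this example

# Keep genes whose local fdr (third column) falls below the cutoff
candidates <- mock_results[mock_results[[3]] < threshold, ]
# Order the candidates by absolute effect size (fourth column), largest first
candidates <- candidates[order(abs(candidates[[4]]), decreasing = TRUE), ]
candidates
```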

sizeHat

When the assumption of constant dispersion across genes is made, this is a single estimate of the common dispersion. When the assumption of non-constant dispersion is made, this is a vector of estimates for the dispersion parameter of each gene.

Examples

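library(iDEG)
set.seed(1)
# Simulate mean expression levels for 20,000 genes
exp_mean1 <- rexp(20000, 1/500) + 1
exp_mean2 <- exp_mean1
# Make the first 100 genes differentially expressed (10-fold increase)
exp_mean2[1:100] <- exp_mean2[1:100] * 10
# Draw one negative binomial transcriptome per condition
transcriptome1 <- rnbinom(n = length(exp_mean1), size = 60, mu = exp_mean1)
transcriptome2 <- rnbinom(n = length(exp_mean2), size = 60, mu = exp_mean2)
# Run iDEG with its default settings
res <- iDEG(transcriptome1, transcriptome2)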

QikeLi/iDEG documentation built on May 17, 2019, 6:34 p.m.