EBMultiTest: Using EM algorithm to calculate the posterior probabilities...

View source: R/EBMultiTest.R

EBMultiTest    R Documentation

Using the EM algorithm to calculate the posterior probabilities of patterns of interest in a multiple-condition study

Description

'EBMultiTest' is built on the NB-Beta empirical Bayes model. It uses the EM algorithm to compute the posterior probability of each pattern of interest.

Usage

EBMultiTest(Data, NgVector = NULL, Conditions, sizeFactors, uc = 0, AllParti = NULL, fast = T,
    Alpha = NULL, Beta = NULL, Qtrm = 1, QtrmCut = 0, maxround = 50, 
    step1 = 1e-06, step2 = 0.01, thre = log(2), sthre = 0, 
    filter = 10, stopthre = 1e-04, nequal = 2)

Arguments

Data

A data matrix containing expression values for each transcript (gene or isoform level), with transcripts in rows and samples in columns.

NgVector

A vector indicating the uncertainty group assignment of each isoform. For example, if the number of isoforms in the host gene is used to define the uncertainty groups, an isoform in a gene with 2 isoforms has Ng = 2. The length of this vector should equal the number of rows in Data. For gene-level data, NgVector can be left as NULL.
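As a minimal sketch, assuming the isoform-to-gene mapping is available as two character vectors (IsoNames and GeneNames are hypothetical names used only here), the number of isoforms in each host gene can be turned into an Ng vector with base R:

```r
# Hypothetical isoform-to-gene mapping (one entry per row of Data)
IsoNames  <- c("I1", "I2", "I3", "I4", "I5", "I6")
GeneNames <- c("G1", "G1", "G2", "G3", "G3", "G3")

# Count isoforms per gene, then look up the count for each isoform's host gene
IsoPerGene <- table(GeneNames)
NgVector <- as.vector(IsoPerGene[GeneNames])
names(NgVector) <- IsoNames

NgVector
# I1 I2 I3 I4 I5 I6
#  2  2  1  3  3  3
```

The resulting vector is in the same order as IsoNames, so it must match the row order of Data before being passed to EBMultiTest.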

Conditions

A vector indicating the condition to which each sample belongs, in the same order as the columns of Data.

sizeFactors

The normalization factors: a vector of lane-specific values whose length equals the number of samples, in the same order as the columns of Data.
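The example at the end of this page obtains sizeFactors from MedianNorm. As a rough sketch of the general median-of-ratios idea (not necessarily MedianNorm's exact implementation), the base-R code below computes one factor per sample and then divides each column of the data by its factor, the kind of adjustment described for DataNorm. The helper median_ratio_factors is hypothetical.

```r
# Hypothetical helper illustrating median-of-ratios size factors
median_ratio_factors <- function(Data) {
  log_data <- log(Data)
  # log geometric mean of each transcript across samples (-Inf if any zero)
  log_geo_mean <- rowMeans(log_data)
  keep <- is.finite(log_geo_mean)
  # per sample: median ratio of its values to each transcript's geometric mean
  apply(log_data[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_mean[keep])))
}

Data <- matrix(c(10, 20, 30, 0,
                 20, 40, 60, 5), ncol = 2,
               dimnames = list(paste0("T", 1:4), c("S1", "S2")))
sf <- median_ratio_factors(Data)

# Divide each column by its sample's factor
DataNorm <- sweep(Data, 2, sf, "/")
```

Transcripts with a zero in any sample (T4 here) are skipped when computing the factors but are still normalized afterwards.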

uc

Number of uncertain positions, at the unit (transcript) level.

AllParti

User-specified set of partitions: a matrix in which each row represents a partition.
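To make the idea of a partition concrete, the sketch below enumerates every way of grouping K conditions into equal-mean blocks, one partition per row, using restricted growth strings (each row starts at 1, and each later entry is at most one more than the largest entry before it). The helper all_partitions is hypothetical, and the exact coding EBMultiTest expects for AllParti (column order, label values) is an assumption here.

```r
# Hypothetical helper: all set partitions of K conditions, one per row.
# Conditions sharing a label are treated as having equal means.
all_partitions <- function(K) {
  parts <- list(1L)
  if (K > 1) {
    for (k in 2:K) {
      # extend each partial partition with an existing label or one new label
      parts <- unlist(lapply(parts, function(p)
        lapply(seq_len(max(p) + 1L), function(v) c(p, v))),
        recursive = FALSE)
    }
  }
  do.call(rbind, parts)
}

Parti <- all_partitions(3)
colnames(Parti) <- c("C1", "C2", "C3")
Parti
#      C1 C2 C3
# [1,]  1  1  1
# [2,]  1  1  2
# [3,]  1  2  1
# [4,]  1  2  2
# [5,]  1  2  3
```

The number of rows is the Bell number of K (5 for K = 3, 15 for K = 4) and grows quickly, so a smaller, user-chosen subset of rows can be supplied instead.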

fast

Logical; whether to use fast EBSeq (TRUE) or full EBSeq (FALSE).

Alpha

Starting value of the hyperparameter alpha.

Beta

Starting value of the hyperparameter beta.

Qtrm, QtrmCut

Transcripts whose Qtrm-th quantile is <= QtrmCut are removed before testing. The defaults are Qtrm = 1 and QtrmCut = 0; with these settings, transcripts with all zeros are not tested.
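The filtering rule can be sketched in base R as follows. The helper passes_qtrm is hypothetical, and whether EBMultiTest applies the rule to raw or normalized values is not stated on this page; the toy Data here holds raw counts.

```r
# Hypothetical helper: TRUE for transcripts that pass the Qtrm/QtrmCut filter
passes_qtrm <- function(Data, Qtrm = 1, QtrmCut = 0) {
  q <- apply(Data, 1, quantile, probs = Qtrm, names = FALSE)
  q > QtrmCut
}

Data <- rbind(T1 = c(5, 0, 3, 8),
              T2 = c(0, 0, 0, 0),   # all zeros: removed under the defaults
              T3 = c(0, 0, 1, 0))

passes_qtrm(Data)                              # T1 TRUE, T2 FALSE, T3 TRUE
passes_qtrm(Data, Qtrm = 0.75, QtrmCut = 2)    # T1 TRUE, T2 FALSE, T3 FALSE
```

With Qtrm = 1 the quantile is the row maximum, so for non-negative counts the default cut removes exactly the all-zero rows; lowering Qtrm or raising QtrmCut makes the filter stricter.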

maxround

Maximum number of iterations (default 50). Users should always check convergence by inspecting Alpha and Beta in the output; if the hyperparameter estimates have not converged within 50 iterations, a larger value is recommended.

step1

Step size for the gradient ascent update of alpha.

step2

Step size for the gradient ascent update of beta.

thre

Threshold used to determine the state of a position (default log(2)).
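This page does not spell out how thre is applied. One plausible reading, stated here purely as an assumption, is that a position is a pair of adjacent conditions and its state is decided by comparing the absolute log fold change of their means against thre (the default log(2) corresponds to a two-fold change). The hypothetical helper position_states classifies positions that way:

```r
# Hypothetical helper: classify each adjacent pair of condition means as
# "EE" (equal), "Up" or "Down", using a log fold change threshold `thre`.
position_states <- function(means, thre = log(2)) {
  lfc <- diff(log(means))  # log(mean[k + 1] / mean[k]) for each position
  ifelse(abs(lfc) <= thre, "EE", ifelse(lfc > 0, "Up", "Down"))
}

position_states(c(C1 = 10, C2 = 12, C3 = 40, C4 = 15))
```

Here the C1 to C2 change (1.2-fold) falls under the threshold and is called equal, while the larger changes to C3 and then C4 are called Up and Down.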

sthre

Shrinkage threshold for iterative pruning during the EM updates.

filter

Filter threshold for low-expression units.

stopthre

Stopping threshold for the EM algorithm.
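The roles of step1, step2, maxround and stopthre can be illustrated with a toy problem. The sketch below fits a Beta(alpha, beta) distribution directly to a set of q values by gradient ascent on the log-likelihood, with a separate step size for each parameter, stopping when the improvement falls below stopthre or after maxround rounds. This is a simplification: EBMultiTest updates its hyperparameters inside the EM iterations of the NB-Beta model, which is not reproduced here. The function names, the step-halving safeguard and the default values (tuned for this toy objective and not comparable to EBMultiTest's defaults) are all assumptions.

```r
# Log-likelihood of Beta(a, b) for values q strictly inside (0, 1)
beta_loglik <- function(a, b, q) {
  sum((a - 1) * log(q) + (b - 1) * log1p(-q)) - length(q) * lbeta(a, b)
}

# Hypothetical sketch: gradient ascent on (alpha, beta) with separate step
# sizes, stopped by a log-likelihood tolerance or a maximum number of rounds.
fit_beta_ga <- function(q, alpha = 1, beta = 1, step1 = 0.5, step2 = 0.5,
                        maxround = 5000, stopthre = 1e-10) {
  ll <- beta_loglik(alpha, beta, q)
  for (iter in seq_len(maxround)) {
    # gradient of the average log-likelihood with respect to alpha and beta
    dg <- digamma(alpha + beta)
    grad_a <- mean(log(q)) - (digamma(alpha) - dg)
    grad_b <- mean(log1p(-q)) - (digamma(beta) - dg)
    shrink <- 1
    a_new <- alpha + step1 * grad_a
    b_new <- beta + step2 * grad_b
    # safeguard (an assumption, not from EBSeq): halve the step until the
    # update keeps both parameters positive and does not lower the likelihood
    while (a_new <= 0 || b_new <= 0 || beta_loglik(a_new, b_new, q) < ll) {
      shrink <- shrink / 2
      if (shrink < 1e-12) {
        return(list(alpha = alpha, beta = beta, loglik = ll, rounds = iter))
      }
      a_new <- alpha + shrink * step1 * grad_a
      b_new <- beta + shrink * step2 * grad_b
    }
    ll_new <- beta_loglik(a_new, b_new, q)
    converged <- abs(ll_new - ll) < stopthre
    alpha <- a_new
    beta <- b_new
    ll <- ll_new
    if (converged) break
  }
  list(alpha = alpha, beta = beta, loglik = ll, rounds = iter)
}

q <- qbeta(ppoints(200), 2, 5)  # deterministic values spread like Beta(2, 5)
fit <- fit_beta_ga(q)
c(alpha = fit$alpha, beta = fit$beta, rounds = fit$rounds)
```

Smaller step sizes make each round safer but slower, so more rounds may be needed before the stopping threshold is met; this is the same trade-off the maxround entry describes for checking convergence of Alpha and Beta.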

nequal

When a chain of equal states contains more than nequal equal states, the equalhandle algorithm is used to further check the homogeneity of the group means.
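Assuming the states of consecutive positions are stored as a character vector with "EE" marking an equal state (a hypothetical encoding), chains of equal states longer than nequal can be located with rle(). The equalhandle algorithm itself is not shown; long_equal_runs is an illustrative helper.

```r
# Hypothetical helper: find runs of "EE" (equal) states longer than nequal.
long_equal_runs <- function(states, nequal = 2) {
  r <- rle(states)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  hit <- r$values == "EE" & r$lengths > nequal
  data.frame(start = starts[hit], end = ends[hit])
}

states <- c("Up", "EE", "EE", "EE", "Down", "EE", "EE")
long_equal_runs(states, nequal = 2)
#   start end
# 1     2   4
```

Only the three-position run exceeds nequal = 2; the final run of two equal states would be flagged only with nequal = 1.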

Value

Alpha

Fitted parameter alpha of the prior beta distribution.

Beta

Fitted parameter beta of the prior beta distribution.

P

Global proportion of DE patterns.
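For intuition about how a global proportion such as P is estimated: in a standard finite-mixture EM, the M-step sets each component's proportion to the average posterior probability of that component across observations. The toy sketch below shows that generic update; it is not taken from the EBSeq source.

```r
# Toy posteriors for 4 transcripts (rows) over 2 patterns (columns);
# each row sums to 1
PP <- matrix(c(0.9, 0.1,
               0.6, 0.4,
               0.2, 0.8,
               0.3, 0.7), ncol = 2, byrow = TRUE)

# Generic mixture-model M-step: proportion = mean posterior probability
P_new <- colMeans(PP)
P_new
```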

RList

The fitted values of r for each transcript.

MeanList

The mean of each transcript (across conditions).

VarList

The variance of each transcript (across conditions).

QList

The fitted q values of each transcript within each condition.

Mean

The mean of each transcript within each condition (adjusted by normalization factors).

Var

The estimated variance of each transcript within each condition (adjusted by normalization factors).
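Under the negative binomial parameterization used in the EBSeq model (taken here as an assumption about how r and q are defined), a count with parameters r and q has mean r(1 - q)/q and variance r(1 - q)/q^2, so q = Mean/Var and r = Mean * q / (1 - q). The sketch below applies these method-of-moments relations to toy values to show how Mean, Var, QList and RList relate; it does not reproduce EBMultiTest's fitting.

```r
# Method-of-moments mapping from (mean, variance) to NB(r, q), assuming
# mean = r * (1 - q) / q and variance = r * (1 - q) / q^2 (requires var > mean).
nb_moments_to_rq <- function(mean, var) {
  q <- mean / var
  r <- mean * q / (1 - q)   # equivalently mean^2 / (var - mean)
  list(q = q, r = r)
}

nb_moments_to_rq(mean = c(10, 50), var = c(20, 300))
```

Both toy transcripts give r = 10 but different q (0.5 and 1/6), which mirrors the structure of one r per transcript (RList) and condition-specific q values (QList).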

PoolVar

The variance of each transcript: the pooled value of the within-condition variance estimates.

DataNorm

Normalized expression matrix.

Iso

Same as NgVector.

AllZeroIndex

The transcripts with expression 0 in all samples (these are not tested).

PPMat

The Posterior Probability of following each pattern (columns) for each transcript (rows). Transcripts with expression 0 for all samples are not shown in this matrix.
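GetMultiPP, listed under See Also, is intended for post-processing this output. The base-R lines below only sketch one common step, picking for each transcript the pattern with the highest posterior probability; the toy matrix and its pattern names are made up.

```r
# Toy posterior probability matrix: 3 transcripts (rows) x 3 patterns (columns)
PPMat <- matrix(c(0.90, 0.05, 0.05,
                  0.10, 0.70, 0.20,
                  0.30, 0.30, 0.40),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("T1", "T2", "T3"),
                                c("Pattern1", "Pattern2", "Pattern3")))

# Most likely pattern for each transcript (the first column wins on ties)
MAP <- colnames(PPMat)[max.col(PPMat, ties.method = "first")]
names(MAP) <- rownames(PPMat)
MAP
```

A low maximum, as for T3, signals that the data do not clearly favour any single pattern.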

AllParti

The selected patterns.

PPMatWith0

The Posterior Probability of following each pattern (columns) for each transcript (rows). Transcripts with expression 0 for all samples are included in this matrix with PP(any_pattern) = NA. The transcripts are in exactly the same order as in the input data.
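The relationship between PPMat and PPMatWith0 can be sketched as follows, assuming AllZeroIndex holds row indices into the input data and that PPMat lists the tested transcripts in input order. The helper expand_pp and the toy values are hypothetical.

```r
# Hypothetical helper: expand PP rows for tested transcripts back to the full
# input order, inserting NA rows for the untested (all-zero) transcripts.
expand_pp <- function(PPMat, AllZeroIndex, n_total) {
  full <- matrix(NA_real_, nrow = n_total, ncol = ncol(PPMat),
                 dimnames = list(NULL, colnames(PPMat)))
  tested <- setdiff(seq_len(n_total), AllZeroIndex)
  full[tested, ] <- PPMat
  full
}

PPMat <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("Pattern1", "Pattern2")))

# Input had 3 transcripts; the 2nd was all zeros and was not tested
expand_pp(PPMat, AllZeroIndex = 2, n_total = 3)
```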

Conditions

The input conditions.

NumUC

The number of uncertain positions at each unit.

Author(s)

Ning Leng, Xiuyu Ma

References

Ning Leng, John A. Dawson, James A. Thomson, Victor Ruotti, Anna I. Rissman, Bart M.G. Smits, Jill D. Haag, Michael N. Gould, Ron M. Stewart, and Christina Kendziorski. EBSeq: An empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics (2013)

See Also

EBTest, GetMultiPP, GetMultiFC

Examples

data(MultiGeneMat)
Conditions = c("C1","C1","C2","C2","C3","C3")
MultiSize = MedianNorm(MultiGeneMat)
MultiOut = EBMultiTest(MultiGeneMat,Conditions=Conditions,uc = 2,
                     sizeFactors=MultiSize)
MultiPP = GetMultiPP(MultiOut)

wiscstatman/EBSeq documentation built on June 3, 2023, 7:34 a.m.