DEGseq: DEGseq: Identify Differentially Expressed Genes from RNA-seq...

Description Usage Arguments References See Also Examples

View source: R/MainFunctionWrap.R

Description

This function is used to identify differentially expressed genes from RNA-seq data. It takes uniquely mapped reads from RNA-seq data for the two samples with a gene annotation as input. So users should map the reads (obtained from sequencing libraries of the samples) to the corresponding genome in advance.

Usage

1
2
3
4
5
6
7
DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", readLength=32,
       strandInfo=FALSE, refFlat, groupLabel1="group1", groupLabel2="group2",
       method=c("LRT", "CTR", "FET", "MARS", "MATR", "FC"), 
       pValue=1e-3, zScore=4, qValue=1e-3, foldChange=4, thresholdKind=1,
       outputDir="none", normalMethod=c("none", "loess", "median"),
       depthKind=1, replicate1="none", replicate2="none",
       replicateLabel1="replicate1", replicateLabel2="replicate2")

Arguments

mapResultBatch1

vector containing uniquely mapping result files for technical replicates of sample1 (or replicate1 when method="CTR").

mapResultBatch2

vector containing uniquely mapping result files for technical replicates of sample2 (or replicate2 when method="CTR").

fileFormat

file format: "bed" or "eland".
example of "bed" format: chr12 7 38 readID 2 +
example of "eland" format: readID chr12.fa 7 U2 F
Note: The field separator character is TAB. And the files must follow the format as one of the examples.

readLength

the length of the reads (only used if fileFormat="eland").

strandInfo

whether the strand information was retained during the cloning of the cDNAs.

  • "TRUE" : retained,

  • "FALSE": not retained.

refFlat

gene annotation file in UCSC refFlat format.
See http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat.

groupLabel1

label of group1 on the plots.

groupLabel2

label of group2 on the plots.

method

method to identify differentially expressed genes. Possible methods are:

  • "LRT": Likelihood Ratio Test (Marioni et al. 2008),

  • "CTR": Check whether the variation between two Technical Replicates can be explained by the random sampling model (Wang et al. 2009),

  • "FET": Fisher's Exact Test (Joshua et al. 2009),

  • "MARS": MA-plot-based method with Random Sampling model (Wang et al. 2009),

  • "MATR": MA-plot-based method with Technical Replicates (Wang et al. 2009),

  • "FC" : Fold-Change threshold on MA-plot.

pValue

pValue threshold (for the methods: LRT, FET, MARS, MATR).
only used when thresholdKind=1.

zScore

zScore threshold (for the methods: MARS, MATR).
only used when thresholdKind=2.

qValue

qValue threshold (for the methods: LRT, FET, MARS, MATR).
only used when thresholdKind=3 or thresholdKind=4.

thresholdKind

the kind of threshold. Possible kinds are:

  • 1: pValue threshold,

  • 2: zScore threshold,

  • 3: qValue threshold (Benjamini et al. 1995),

  • 4: qValue threshold (Storey et al. 2003),

  • 5: qValue threshold (Storey et al. 2003) and Fold-Change threshold on MA-plot are both required (can be used only when method="MARS").

foldChange

fold change threshold on MA-plot (for the method: FC).

outputDir

the output directory.

normalMethod

the normalization method: "none", "loess", "median" (Yang,Y.H. et al. 2002).
recommend: "none".

depthKind

1: take the total number of reads uniquely mapped to genome as the depth for each replicate,
0: take the total number of reads uniquely mapped to all annotated genes as the depth for each replicate.
We recommend taking depthKind=1, especially when the genes in annotation file are part of all genes.

replicate1

files containing uniquely mapped reads obtained from replicate batch1 (only used when method="MATR").

replicate2

files containing uniquely mapped reads obtained from replicate batch2 (only used when method="MATR").

replicateLabel1

label of replicate batch1 on the plots (only used when method="MATR").

replicateLabel2

label of replicate batch2 on the plots (only used when method="MATR").

References

Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289-300.

Jiang,H. and Wong,W.H. (2009) Statistical inferences for isoform expression in RNA-seq. Bioinformatics, 25, 1026-1032.

Bloom,J.S. et al. (2009) Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics, 10, 221.

Marioni,J.C. et al. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res., 18, 1509-1517.

Storey,J.D. and Tibshirani,R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100, 9440-9445.

Wang,L.K. and et al. (2010) DEGseq: an R package for identifying differentially expressed genes from RNA-seq data, Bioinformatics 26, 136 - 138.

Yang,Y.H. et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30, e15.

See Also

DEGexp, getGeneExp, readGeneExp, kidneyChr21.bed, liverChr21.bed, refFlatChr21.

Examples

1
2
3
4
5
6
7
8
9
  kidneyR1L1 <- system.file("extdata", "kidneyChr21.bed.txt", package="DEGseq")
  liverR1L2  <- system.file("extdata", "liverChr21.bed.txt", package="DEGseq")
  refFlat    <- system.file("extdata", "refFlatChr21.txt", package="DEGseq")
  mapResultBatch1 <- c(kidneyR1L1)  ## only use the data from kidneyR1L1 and liverR1L2
  mapResultBatch2 <- c(liverR1L2)
  outputDir <- file.path(tempdir(), "DEGseqExample")
  DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", refFlat=refFlat,
         outputDir=outputDir, method="LRT")
  cat("outputDir:", outputDir, "\n")

Example output

Loading required package: qvalue
Loading required package: samr
Please wait...

mapResultBatch1: 
	 /usr/local/lib/R/site-library/DEGseq/extdata/kidneyChr21.bed.txt 
mapResultBatch2: 
	 /usr/local/lib/R/site-library/DEGseq/extdata/liverChr21.bed.txt 
file format: bed 
refFlat:  /usr/local/lib/R/site-library/DEGseq/extdata/refFlatChr21.txt 
Ignore the strand information when count the reads mapped to genes!
Count the number of reads mapped to each gene ... 
This will take several minutes, please wait patiently!
Please wait...

SampleFiles:
	/usr/local/lib/R/site-library/DEGseq/extdata/kidneyChr21.bed.txt
Count the number of reads mapped to each gene.
This will take several minutes.
Please wait ...
total 259 unique genes

processed 0 reads (kidneyChr21.bed.txt)
processed 10000 reads (kidneyChr21.bed.txt)
processed 20000 reads (kidneyChr21.bed.txt)
processed 30000 reads (kidneyChr21.bed.txt)
processed 34304 reads (kidneyChr21.bed.txt) 
total used 0.140310 seconds!
Please wait...

SampleFiles:
	/usr/local/lib/R/site-library/DEGseq/extdata/liverChr21.bed.txt
Count the number of reads mapped to each gene.
This will take several minutes.
Please wait ...
total 259 unique genes

processed 0 reads (liverChr21.bed.txt)
processed 10000 reads (liverChr21.bed.txt)
processed 20000 reads (liverChr21.bed.txt)
processed 30000 reads (liverChr21.bed.txt)
processed 30729 reads (liverChr21.bed.txt) 
total used 0.091963 seconds!
Please wait...
gene id column in geneExpMatrix1 for sample1:  1 
expression value column(s) in geneExpMatrix1: 2 
total number of reads uniquely mapped to genome obtained from sample1: 34304 
gene id column in geneExpMatrix2 for sample2:  1 
expression value column(s) in geneExpMatrix2: 2 
total number of reads uniquely mapped to genome obtained from sample2: 30729 

method to identify differentially expressed genes:  LRT 
pValue threshold: 0.001 
output directory: /work/tmp/tmp/Rtmp0Lb5H8/DEGseqExample 

Please wait ...
Identifying differentially expressed genes ...
Please wait patiently ...
output ...

Done ...
The results can be observed in directory:  /work/tmp/tmp/Rtmp0Lb5H8/DEGseqExample 
outputDir: /work/tmp/tmp/Rtmp0Lb5H8/DEGseqExample 

DEGseq documentation built on Nov. 8, 2020, 5:33 p.m.