getGeneExp: getGeneExp: Count the number of reads and calculate the RPKM...

Description Usage Arguments Note References See Also Examples

View source: R/MainFunctionWrap.R

Description

This function is used to count the number of reads and calculate the RPKM for each gene. It takes uniquely mapped reads from RNA-seq data for a sample with a gene annotation file as input. So users should map the reads (obtained from sequencing library of the sample) to the corresponding genome in advance.

Usage

1
2
getGeneExp(mapResultBatch, fileFormat="bed", readLength=32, strandInfo=FALSE,
           refFlat, output=paste(mapResultBatch[1],".exp",sep=""), min.overlapPercent=1)

Arguments

mapResultBatch

vector containing uniquely mapping result files for a sample.
Note: The sample can have multiple technical replicates.

fileFormat

file format: "bed" or "eland".
example of "bed" format: chr12 7 38 readID 2 +
example of "eland" format: readID chr12.fa 7 U2 F
Note: The field separator character is TAB. And the files must follow the format as one of the examples.

readLength

the length of the reads (only used if fileFormat="eland").

strandInfo

whether the strand information was retained during the cloning of the cDNAs.

  • "TRUE" : retained,

  • "FALSE": not retained.

refFlat

gene annotation file in UCSC refFlat format.
See http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat.

output

the output file.

min.overlapPercent

the minimum percentage of the overlapping length for a read and an exon over the length of the read itself, for counting this read from the exon. should be <=1.
0: at least 1 bp overlap between a read and an exon.

Note

This function sums up the numbers of reads coming from all exons of a specific gene (according to the known gene annotation) as the gene expression value. The exons may include the 5'-UTR, protein coding region, and 3'-UTR of a gene. All introns are ignored for a gene for the sequenced reads are from the spliced transcript library. If a read falls in an exon (usually, a read is shorter than an exon), the read count for this exon plus 1. If a read is crossing the boundary of an exon, users can tune the parameter min.overlapPercent, which is the minimum percentage of the overlapping length for a read and an exon over the length of the read itself, for counting this read from the exon. The method use the union of all possible exons for calculating the length for each gene.

References

Mortazavi,A. et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat. Methods, 5, 621-628.

See Also

DEGexp, DEGseq, readGeneExp, kidneyChr21.bed, liverChr21.bed, refFlatChr21.

Examples

1
2
3
4
5
6
7
  kidneyR1L1 <- system.file("extdata", "kidneyChr21.bed.txt", package="DEGseq")
  refFlat    <- system.file("extdata", "refFlatChr21.txt", package="DEGseq")
  mapResultBatch <- list(kidneyR1L1)
  output <- file.path(tempdir(), "kidneyChr21.bed.exp")
  exp <- getGeneExp(mapResultBatch, refFlat=refFlat, output=output)
  write.table(exp[30:35,], row.names=FALSE)
  cat("output: ", output, "\n")

Example output

Loading required package: qvalue
Loading required package: samr
Loading required package: impute
Loading required package: matrixStats
Please wait...

SampleFiles:
	/usr/local/lib/R/site-library/DEGseq/extdata/kidneyChr21.bed.txt
Count the number of reads mapped to each gene.
This will take several minutes.
Please wait ...
total 259 unique genes

processed 0 reads (kidneyChr21.bed.txt)
processed 10000 reads (kidneyChr21.bed.txt)
processed 20000 reads (kidneyChr21.bed.txt)
processed 30000 reads (kidneyChr21.bed.txt)
processed 34304 reads (kidneyChr21.bed.txt) 
total used 0.053236 seconds!
"geneName" "kidneyChr21.bed.txt.raw.counts." "kidneyChr21.bed.txt.RPKM." "kidneyChr21.bed.txt.all.reads."
"C21orf131" "0" "0" "34304"
"C21orf15" "0" "0" "34304"
"C21orf2" "51" "665.789" "34304"
"C21orf29" "5" "36.742" "34304"
"C21orf33" "466" "8149.02" "34304"
"C21orf34" "4" "93.2836" "34304"
output:  /work/tmp/tmp/RtmpSGRdCx/kidneyChr21.bed.exp 

DEGseq documentation built on Nov. 8, 2020, 5:33 p.m.