stallingIndex: returns a list with average read count on TSS, gene body, and...
In kamalfartiyal84/compEpiTools: Tools for computational epigenomics

Description Usage Arguments Details Value References Examples

based on a TxDb, and a list of Pol2 ChIP-seq samples with a GRange corresponding to the peak call, returns a list whose elements contain the stalling index and average read count on TSS and gene body for a user-defined list of genes

1
2
3

stallingIndex(BAMlist,inputList=NULL, peakGRlist, peakGB=FALSE, 
  genesList, transcriptDB, countMode="transcript", upstream=300, 
  downstream=300, cutoff=600, elongationOffset=0)

`BAMlist`	List of characters; paths to BAM files containing ChIP-seq aligned reads
`inputList`	List of characters (optional); paths to BAM files containing ChIP-seq aligned reads of inputs relative to the ChIP-seq samples in BAMlist
`peakGRlist`	List of GRanges; Pol2 peaks relative to the ChIP-seq samples in BAMlist. Only genes with a peak on the TSS will be used for counting
`peakGB`	logical; whether or not to consider only genes which have also a Pol2 peak on the gene body. Deafult=FALSE
`genesList`	List of characters; Entrez gene IDs defining the genes where the stalling index will be computed
`transcriptDB`	An object of class transcriptDb
`countMode`	either 'transcript' or 'average' or 'gene'. Genomic units used to count reads: 'transcript' considers all transcripts separately, 'average' averages over different isoforms of a gene, 'gene' considers the longest isoform only. Default='transcript'
`upstream`	numeric; upstream end of the TSS interval where reads will be counted. Default=300
`downstream`	numeric; downstream end of the TSS interval where reads will be counted. Default=300
`cutoff`	numeric; minimum length required for a gene to be considered for the counts. Default=600
`elongationOffset`	numeric; downstream end of the gene body interval where reads will be counted. Default=0

Given a set of Pol2 ChIP-seq samples, computes the average base coverage on TSS and gene body and their ratio (Pol2 stalling index) computed over specific sets of genes for a list of samples.

For each sample, reads will be counted only for genes contained in the corresponding element of the genesList, and only for those genes having a Pol2 peak (given in peakGRlist as list of GRanges) on the TSS region (defined through upstream and downstream). If peakGB is set to TRUE, a Pol2 peak on the body of a gene will be required. The lists of genes peakGRlist must be provided in terms of entrezIDs.

Reads can be counted on individual gene isoforms (countMode='transcript'), averaging over isoforms (countMode='average') or taking the longest isoform for each gene (countMode='gene').

The read counts will be performed on the intervals [TSS-upstream, TSS+downstream] (TSS) and [TSS+downstream, TES+elongationOffset] (genebody), where upstream, downstream, elongationOffset are user-defined parameters. Moreover, only genes having a length bigger than the user-defined parameter cutoff will be considered.

BAM files have to be associated to the corresponding index .bai files. Please refer to the documentation of samtools on how to create them.

A list of matrices, each containing TSS counts, gene body counts and stalling index for each sample

http://genomics.iit.it/groups/computational-epigenomics.html

  require(TxDb.Mmusculus.UCSC.mm9.knownGene)
  require(org.Mm.eg.db)
  txdb <- TxDb.Mmusculus.UCSC.mm9.knownGene
  isActiveSeq(txdb) <- c(rep(FALSE,18), TRUE, rep(FALSE, length(isActiveSeq(txdb))-19))
  # pointing to Pol2 BAM file
  # BAM file from the GEO GSM1234478 sample, limited to chr19:3200000-4000000
  Pol2bam <- system.file("extdata", "Pol2.bam", package="compEpiTools")
  # loading Pol2 peaks as a GRanges object
  # built based on the BED file from the GEO GSM1234478 sample
  # limited to chr19:3200000-4000000
  Pol2GR <- system.file("extdata", "Pol2GR.Rda", package="compEpiTools")
  load(Pol2GR)
  egs <- distanceFromTSS(Pol2GR, txdb=txdb)
  egs <- unique(egs$nearest_gene_id)
  SI_matrix <- stallingIndex(BAMlist=list(Pol2bam), peakGRlist=list(Pol2GR), 
    genesList=list(egs), transcriptDB=txdb, countMode='gene')
  restoreSeqlevels(txdb)