profileGenicFeatures: Calculate metadata profiles across genic features
In fagostini/Mimir: Metadata profiles for NGS data

Description Usage Arguments Value Examples

View source: R/profileGenicFeatures.R

This function calculates metadata profiles across genic features using a procedure similar to that used in (https://www.sciencedirect.com/science/article/pii/S1097276519303533?via

profileGenicFeatures(genicRegions = NULL, sampleObject = NULL,
  TxDb = NULL, tx2gene = NULL, bins = c(20, 100, 70, 100),
  weightCol = NULL, ignoreStrand = FALSE, dropEmpty = TRUE,
  normType = c("density", "max", "none"), collapseBy = c("region",
  "gene", "transcript", "recursive"), verbose = TRUE)

`genicRegions`	A named list of `GenomicRanges` objects.
`sampleObject`	A `GenomicRanges` or `GenomicAlignments` object.
`TxDb`	A `GenomicFeatures` object. Required if `genicRegions` or `tx2gene` are not provided. It must contain `GenomeInfoDb` information.
`tx2gene`	A `data.frame` object. The first column must be of transcript identifiers, while the second must be of gene identifiers. Additional columns will be discarded.
`bins`	An ordered integer vector (must be greater equal that the length of `genicRegions`). The vector order determines the number of bins for each region. If more bins than regions are provided, the additional will be ignored. Default order: 5'UTR, CDS, 3'UTR, Intron.
`weightCol`	A single character string. This must be the name of an integer column in the `sampleObject` object.
`ignoreStrand`	When set to 'TRUE', the strand information in `sampleObject` is ignored. This does not affect the features in `genicRegions`.
`dropEmpty`	When set to 'TRUE', the transcripts with no signal in any of their sub-regions will be discarded. When set to 'FALSE', all values of these regions will be set to 0.
`normType`	A character string indicating which region normalising method to use. One of 'density' (default), 'max', 'none': can be abbreviated. Depending on the chosen method the values of each region are normalised using the sum ('density'), the maximum ('max') of the values across the region, or not normalised at all ('none').
`collapseBy`	A character string indicating which summarising method to use. One of "region" (default), "gene", "transcript", "recursive": can be abbreviated. Profiles are always calculated per transcript (tx_id), and reported as such when this parameter is set to 'transcript'. When set to 'gene', profiles are collapsed (using sum, mean and sd) by gene_id, while with 'region' they are collapsed in the same way but by region_id. When set to 'recursive', profiles are first collapse by gene_id and then by region_id, and sum, mean (pooled) and sd (pooled) are reported. Please, be aware that each method will result in a different output, as the number and names of the columns by the chosen method.
`verbose`	When set to 'TRUE', the function prints diagnostic messages.

A data.table of the normalised binned coverage across the genic features. Names and number of columns are determined by collapseBy.

library("TxDb.Dmelanogaster.UCSC.dm3.ensGene")
TxDb = TxDb.Dmelanogaster.UCSC.dm3.ensGene

query = extractGenicFeatures(TxDb)

library("pasillaBamSubset")
library("GenomicAlignments")

fl1 <- untreated1_chr4()
subject = readGAlignments(fl1)

profile = profileGenicFeatures(genicRegions=query, sampleObject=subject, TxDb=TxDb)

library("ggplot2")

ggplot(profile, aes(x=bin, y=Mean, colour=region_id)) + 
  geom_line() +
  geom_vline(xintercept=c(20.5, 120.5), linetype="dashed", colour="grey30") +
  scale_x_continuous("Relative position",
       breaks=c(10.5, 70.5, 155.5), label=c("5'-UTR", "CDS", "3'-UTR")) +
  scale_y_continuous("Average normalised signal") +
  coord_cartesian(xlim=c(0, 190)) +
  theme_bw() +
  theme(legend.position=c(0.9, 0.8), legend.background=element_blank()) +
  guides(colour=guide_legend(title=""))