This package provides an interface to estimate transcript abundances of any samples quantified by the aligner Rail-RNA. This method is a non-negative least squares (NNLS) estimation that infers the number of reads that originated from each transcript of the coding portion of the GencodeV25 transcriptome. The model does not require raw aligned BAM files, but is content with compressed coverage statistics (coverage of the genome at a basepair level and across annotated junctions) primarily stored in bigwig formats.
The more than 70,000 samples compiled by recount2 already have transcript expression pre-computed by this method and is directly accessible. To replicate the abundance estimation of any SRA project in recount2, the user only needs to supply the SRA project id. Otherwise, users can supply the necessary information as outlined further below to utilize this package to carry out transcript abundance estimation.
For the projects currently curated in recount2, to access quantified transcript abundances use:
project = 'DRP000366' path = getRseTx(project, download_path=paste0(getwd(), '/rse_tx.RData')) load(path)
To re-run the NNLS model on the samples in recount2 follow:
library(recountNNLS) ## Specify a SRA project and download the relevant path data from recount2 project = 'SRP063581' pheno = processPheno(project) ## Main NNLS workhorse function to create a RSE of transcript abundance rse_tx = recountNNLS(pheno)
Please see vignette vignette('recountNNLS', package='recountNNLS').
The output of recountNNLS() is a RangedSummarizedExperiment where:
rowRanges() can be used to access a GRangesList annotation of the transcriptsassays() returns a list of length 2 that can be access with $$fragments returns a matrix of estimated transcript abundance on a number of reads scale$se returns a matrix of estimated standard error of the abundance$score returns a matrix of estimated scores$df returns a matrix of degrees of freedom for statistical inferencecolData() returns a DataFrame that contains the metadata for this project obtained from SRA and processed by processPheno().rowData() returns a DataFrame containing information on transcript names, gene, names, transcript lengths, and number of exons in transcript.colnames() returns the sample ids (runs) in the project, and corresponds to each column of the assays() matrices.rownames() returns the transcript ids, and corresponds to each row of the assays() matrices.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.