r6objDocuStat <- rqudocuhelper::R6ClassDocuStatus$new() r6objDocuStat$setProject(psProject = "DEXSeq") # r6objDocuStat$setVersion(psVersion = "0.0.900") # r6objDocuStat$setDate(psDate = "31.03.2016") # r6objDocuStat$setAuthor(psAuthor = "pvr") # r6objDocuStat$setStatus(psStatus = "Init") # r6objDocuStat$setVersion(psVersion = "0.0.901") # r6objDocuStat$setDate(psDate = "01.04.2016") # r6objDocuStat$setAuthor(psAuthor = "peter") # r6objDocuStat$setStatus(psStatus = "First changes") # r6objDocuStat$setVersion(psVersion = "0.0.902") # r6objDocuStat$setDate(psDate = "22.04.2016") # r6objDocuStat$setAuthor(psAuthor = "peter") # r6objDocuStat$setStatus(psStatus = "Adding description on analysis") # r6objDocuStat$setVersion(psVersion = "0.0.903") # r6objDocuStat$setStatus(psStatus = "Adding more details on the analysis") # r6objDocuStat$setVersion(psVersion = "0.0.904") # r6objDocuStat$setStatus(psStatus = "Starting with report, added TODO-list") # r6objDocuStat$setVersion(psVersion = "0.0.905") # r6objDocuStat$setStatus(psStatus = "Adding annotation info to results") # r6objDocuStat$setVersion(psVersion = "0.0.906") # r6objDocuStat$setStatus(psStatus = "Extending description on preparation") # r6objDocuStat$setVersion(psVersion = "0.0.907") # r6objDocuStat$setStatus(psStatus = "Interpretation of multiple gene-ids") # r6objDocuStat$setVersion(psVersion = "0.0.908") # r6objDocuStat$setStatus(psStatus = "Annotations via biomaRt") # r6objDocuStat$setVersion(psVersion = "0.0.909") # r6objDocuStat$setStatus(psStatus = "Check assignment of treatment and control") r6objDocuStat$setVersion(psVersion = "0.0.910") r6objDocuStat$setStatus(psStatus = "Added subsection on assignment of treatment and control") r6objDocuStat$include_doc_stat()
This document describes how to assess differential exon usage using the R/BioC-Package DEXSeq
. The description in this document follows closely the vignette of DEXSeq
which is available from \url{http://bioconductor.org/packages/release/bioc/html/DEXSeq.html} and also from within R using the command
browseVignettes(package = "DEXSeq")
On the BioC website for DEXSeq
the publication by @ARH2012 is mentioned as the reference to be cited for the package. This reference can also be obtained by
citation(package = "DEXSeq")
Differential exon usage (DEU) can be analysed by comparing RNA-Seq
data from different tissues or under different experimental conditions. DEU is defined as the changes in relative usage of exons caused by experimental conditions.
Since the basic nature of RNA-Seq
data is counts of molecules within a given experimental sample, the measure of differential exon usage amounts to being a function of the count differences between the different experimental conditions. The R/BioC-package DEXSeq
implements a statistical method based on generalized linear models which offers to control false discovery rate by taking biological variation into account.
People familiar with R/BioC-Package DESeq2
will recognize that the usage of DEXSeq
is very similar to DESeq2
.
\pagebreak
Before being able to analyse differential exon usage, one has to do the following preparations steps.
DEXSeq
works with sam files as input for the reads. In case aligned reads are available in bam format, they can be converted using samtools
. DEXSeq
vignette.DEXSeq
are used for counting the how many exons occur in a given read file sample.For this document, we use the dataset that was used in the vignette of DEXSeq
. It consists of read counts for pasillaGenes and pasillaExons in Drosophila. More information on the used dataset can be obtained from the vignette that comes with the pasilla
-package using
browseVignettes(package = "pasilla")
Preparatory steps to load the data consisting of count files and an annotation file and to setup the relevant modelling objects which is a dataframe relating the samples to the experimental conditions, are shown below.
if (!require(pasilla)) { source("https://bioconductor.org/biocLite.R") biocLite("pasilla") require(pasilla) }
The count data and the annotation file are taken from the package and the sample table dataframe is constructed manually.
inDir <- system.file("extdata", package="pasilla") list.files(inDir) countFiles <- list.files(inDir, pattern="fb.txt$", full.names=TRUE) (flattenedFile <- list.files(inDir, pattern="gff$", full.names=TRUE)) sampleTable <- data.frame( row.names = c( "treated1", "treated2", "treated3", "untreated1", "untreated2", "untreated3", "untreated4" ), condition = c("knockdown", "knockdown", "knockdown", "control", "control", "control", "control" ), libType = c( "single-end", "paired-end", "paired-end", "single-end", "single-end", "paired-end", "paired-end" ) )
The general procedure of the analysis starts with constructing a DEXSeqDataSet
using the function DEXSeq::DEXSeqDataSetFromHTSeq
. This function requires four arguments
sDxdFn <- file.path(system.file(package = "AlternativeSplicing", "inst", "extdata", "pasilla"), "dxd.Rd") bRunAnalysis <- TRUE if (file.exists(sDxdFn)) { suppressPackageStartupMessages( library( "DEXSeq" ) ) load(file = sDxdFn) bRunAnalysis <- FALSE }
suppressPackageStartupMessages( library( "DEXSeq" ) ) dxd <- DEXSeqDataSetFromHTSeq( countFiles, sampleData=sampleTable, design= ~ sample + exon + condition:exon, flattenedfile=flattenedFile )
Once a DEXSeqDataSet
object is created there are different accessor functions that can be used to inspect the content of a DEXSeqDataSet
object. The annotation and all the other information about a DEXSeqDataSet
can be obtained using the colData()
function.
colData(dxd)
As seen above the result of colData()
is a dataframe showing available information about the DEXSeqDataSet
object. The column entitled condition
is of particular interest with respect to the assignment of treatment and control samples. The content of the condition column is of type factor and it defines the way how the comparisions are done. Using the statement
dxd$condition
gives a list of the conditions that are used in the model for accessing DEU. It is important that the content of this column is of type factor. This can be verified using the statement
is.factor(dxd$condition)
For our comparison it is important to notice that the later used model fitting routines will assume that the first level is always the control or the untreated condition. The levels of a given factor can be shown using the function levels()
as follows
levels(dxd$condition)
This shows that condition r levels(dxd$condition)[1]
will be used as control condition. We can change the control condition using the function relevel()
. The statement
dxd$condition <- relevel(dxd$condition, "knockdown")
would change the control condition to be "knockdown" instead of "control".
Counts data can be viewed using the counts()
function
head( counts(dxd), 5 )
From the output of the functions colData()
and counts()
, we can see that there are 14 columns, although we have only seven samples. The first seven columns are the number of reads mapping to the specific exonic regions and the last seven columns correspond to the sum of counts mapping to the rest of the exons from the same gene on each sample. The two groups of counts are termed this
and others
respectively by DEXSeq
. The distribution of column indices into the two groups can be viewd by
split( seq_len(ncol(dxd)), colData(dxd)$exon )
In case, we are only interested in the counts of the this
group, the following function brings up the first five lines of all feature counts
head( featureCounts(dxd), 5 )
Details on counting bins can accessed using the function rowData()
.
head( rowRanges(dxd), 3 )
The function sampleAnnotation
gives details about the design table and the sample annotations.
sampleAnnotation( dxd )
The following analysis steps are always done following the same principle. The current DEXSeqDataSet
object is passed to a function. And the function result is again assiged to the same DEXSeqDataSet
object.
Package DEXSeq
contains a wrapper function called DEXSeq
that does all the analysis steps that will be described below in one go. Hence the only thing that needs to be done is to create an instance of a DEXSeqDataSet
object using function DEXSeqDataSetFromHTSeq()
as shown above and call function DEXSeq()
on that DEXSeqDataSet
object. This will return a DEXSeqResults
object.
In what follows, each single step of the analysis is described in more details
Different samples may show different sequencing depths which might lead to a coverage bias. So called size factors
are estimated which measure relative sequencing depth. Size factor estimation in DEXSeq is done the same way as in DESeq and DESeq2 which is implemented in the function estimateSizeFactors
dxd <- estimateSizeFactors( dxd )
Differential exon usage is based on separating variability of the count data into sources that are related to biological variation and into non-biological factors. Information on the amount of variability due to non-biological factors which is also known as noise is inferred from biological replicates in an experimental data set and is characterized by the so-clled dispersion. Typically the number of replicates is too small to estimated dispersion parameters on the level of individual exons. Therefore information is shared across exons and genes in an intensity-dependent manner, to get reliable dispersion estimates.
So far only simple one-way designs are considered, i.e. samples are grouped according to a common experimental condition which is indicated by the condition
factor of the sample table. Samples that share the same level of this one experimental condition are considered to be replicates. It is possible to accomodated more complicated experimental designs than just a single condition factor. The discussion of such models is deferred to a later section
In DEXSeq
dispersions are estimated the same way as in package DESeq2
. Internally the functions of DESeq2
are called, adapting the parameters to the specific caes of the DEXSeq
model. In short, per-exon dispersions are computed using a Cox-Reid adjusted profile likelihood estimation.
The results are reported using the R-package ReporteRs
. This package offers a number of functions to generate on the one hand MS-based documents and on the other hand also HTML pages. The report that is generated for the DEXSeq results is very simple. It consists of titles, paragraphs Tables and Links to either other pages or to images which are directly shown in the report.
For the gene-level results so far, we just inserted a link to the page that is generated by DEXSeq::DEXSeqHTML()
. It might be more attractive to construct a separate table with relevant genes of interest directly on the main report page and use the links to the plots for each gene that are produced by DEXSeq::DEXSeqHTML()
.
The reports generated by DEXSeq
annotate the results using ENSEMBL gene-ids. As those Ids might not be informative enough, some additional annotations are added. The minimal set of additional annotation that is added so far consists of the gene_name
and of the description
. Both items of information are taken from an annotation file that is given by the S4 Object describing the genome reference build. The name of the annotation file is automatically assigned using the construct
refBuild = "Mus_musculus/Ensembl/GRCm38.PatchesSkipped/Annotation/Version-2015-06-25" param <- ezParam(list(refBuild=refBuild))
This creates an item called refBuild
in the parameter list param
. That list item is an S4 object which has many slots one of which is called refAnnotationFile
which stored the relevant annotation file belonging to the refBuild
that was specified above. In short the following statement returns the name of the annotation file.
sGnAnFn <- param[['ezRef']]@refAnnotationFile
The annotation file is read into a dataframe and from that dataframe, informations in columns gene_name
and description
are extracted for the set of genes that are in the result table. Since the granularity of the information in the annotation file goes down to single transcripts, a gene_id
is very likely to apprear several times. For that reason we extract the unique entries in from the annotation file. If there are multiple entries per gene, we paste the information together using "|" as a separator.
In the genetable that summarizes the results of a DEXSeq analysis, some genes are grouped together and their gene-names are separated by a "+" sign. An example of this is shown below (see red arrows).
rcoursetools::insertOdgAsPdf(psOdgFileStem = "GeneIdGroup")
According to \url{https://support.bioconductor.org/p/46294/} this means that multiple genes are merged and the reason for this is that the merged genes seam to share exons which are not obvious to be assigned to a single gene.
The HTML-Report generating function DEXSeqHTML()
has additional arguments that let us specify additional annotation detail for the results. The additional annotation is collected using functionality that is provided by the package biomaRt
. The following example statement for calling DEXSeqHTML()
with information specifying queries to a biomart was copied from \url{https://www.biostars.org/p/95550}:
DEXSeqHTML( ecs, geneIDs=NULL, fitExpToVar="condition", FDR=0.1, color=c("#FF000080","#0000FF80"), mart=ensembl_mart, filter="ensembl_gene_id", attributes=c("external_gene_id","description") )
What is important in the above call is that argument mart
is of type "Mart", i.e., the result of the statment class(mart)
must be "Mart". The easiest way to create such a "Mart"-object is to use the function useMart()
from package biomaRt
. This can be done as follows.
ensembl_mart <- biomaRt::useMart(biomart = "ensembl", dataset="mmusculus_gene_ensembl")
The arguments filter
and attributes
must be consistent with the results of what is returned by functions listFilters()
and listAttributes()
, respectively. Those arguments determine the parameters of the query being on the filters which operate as selectors on the rows and the attributes which select specific columns.
Specifying the additional biomart related arguments to DEXSeqHTML()
leads to additional columns in the genetable. For each attribute that was specified, there is an additional column in the genetable created by DEXSeqHTML()
.
\pagebreak
sessionInfo()
\pagebreak
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.