Identifies clusters in a DESeq2 object using the `degPatterns` function from DEGreport.
This tool clusters elements that show a similar pattern or behavior across groups of samples using the `degPatterns` function. Two clustering approaches are supported: `diana` and `ConsensusClusterPlus`.
The most common use case for `degPatterns` is to cluster genes in an RNA-Seq differential expression analysis; however, it can also be used in ChIP-seq for differential peak binding and in metagenomics for differential abundance studies.
This tool has the following required inputs:
deseq2Object
DESeq2 object in RDS format. This object is generated by running the `DESeq` function from DESeq2.
explanatoryVar
Explanatory variable in the DESeq2 object. In an experimental study, this is the variable manipulated by the researcher (e.g., `time`). It is used on the x-axis of the plots and must exist in the DESeq2 object.
If the `designFile` shown below was used to run a differential expression analysis with DESeq2, the variable `time` could be used as `explanatoryVar` when running `degPatterns`.
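| sample | tissue | sex | time |
|:--------- |:------ |:--- |:---- |
| sample_1 | bone | M | 1st |
| sample_2 | bone | F | 2nd |
| sample_3 | bone | M | 3rd |
| sample_4 | bone | M | 1st |
| sample_5 | bone | F | 2nd |
| sample_6 | bone | M | 3rd |
| sample_7 | liver | F | 1st |
| sample_8 | liver | M | 2nd |
| sample_9 | liver | F | 3rd |
| sample_10 | liver | F | 1st |
| sample_11 | liver | M | 2nd |
| sample_12 | liver | F | 3rd |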
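Before submitting a job, it can help to confirm that the chosen variable really is a column of the design table. The following is a minimal sketch of such a check, not part of the tool: it assumes the design file is tab-delimited, and the helper `variable_levels` is a hypothetical name introduced here for illustration. The tool itself checks the variable against the DESeq2 object, which this sketch does not read.

```python
import csv
import io

# Design table from the example above, assumed to be tab-delimited.
DESIGN_ROWS = [
    ("sample", "tissue", "sex", "time"),
    ("sample_1", "bone", "M", "1st"),
    ("sample_2", "bone", "F", "2nd"),
    ("sample_3", "bone", "M", "3rd"),
    ("sample_4", "bone", "M", "1st"),
    ("sample_5", "bone", "F", "2nd"),
    ("sample_6", "bone", "M", "3rd"),
    ("sample_7", "liver", "F", "1st"),
    ("sample_8", "liver", "M", "2nd"),
    ("sample_9", "liver", "F", "3rd"),
    ("sample_10", "liver", "F", "1st"),
    ("sample_11", "liver", "M", "2nd"),
    ("sample_12", "liver", "F", "3rd"),
]
DESIGN_TSV = "\n".join("\t".join(row) for row in DESIGN_ROWS)


def variable_levels(design_tsv, variable):
    """Return the distinct levels of `variable`, in order of first appearance.

    Hypothetical helper: raises ValueError if the column is absent.
    """
    reader = csv.DictReader(io.StringIO(design_tsv), delimiter="\t")
    if variable not in (reader.fieldnames or []):
        raise ValueError(f"{variable!r} is not a column of the design table")
    levels = []
    for row in reader:
        if row[variable] not in levels:
            levels.append(row[variable])
    return levels


print(variable_levels(DESIGN_TSV, "time"))  # ['1st', '2nd', '3rd']
```

The returned levels are the values that would appear along the x-axis of the plots when `time` is used as `explanatoryVar`.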
The tool provides additional configuration through the following optional inputs:
qValue
Q-value threshold to filter elements from the DESeq2 object (default: `0.05`).
log2FoldChange
Minimum absolute log2 fold-change threshold to filter elements from the DESeq2 object (default: `0`).
minElements
Minimum number of elements to form a cluster (default: `15`).
groupVar
Group variable in the DESeq2 object. It determines how samples are grouped (e.g., male/female, control/mutant). This variable must exist in the DESeq2 object.
If the `designFile` shown below was used to run a differential expression analysis with DESeq2, the variables `tissue` or `sex` could be used as `groupVar` when running `degPatterns`.
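| sample | tissue | sex | time |
|:--------- |:------ |:--- |:---- |
| sample_1 | bone | M | 1st |
| sample_2 | bone | F | 2nd |
| sample_3 | bone | M | 3rd |
| sample_4 | bone | M | 1st |
| sample_5 | bone | F | 2nd |
| sample_6 | bone | M | 3rd |
| sample_7 | liver | F | 1st |
| sample_8 | liver | M | 2nd |
| sample_9 | liver | F | 3rd |
| sample_10 | liver | F | 1st |
| sample_11 | liver | M | 2nd |
| sample_12 | liver | F | 3rd |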
clusterMethod
Clustering method to use, either `diana` or `ConsensusClusterPlus` (default: `diana`).
removeOutliers
Remove outliers from the cluster distribution (default: `true`).
scale
Scale the DESeq2 normalized count matrix (default: `true`).
plotsPerColumn
Number of plots per column in the `summaryPlot` output file. The maximum number of plots allowed per column is `5` (default: `2`).
plotsPerRow
Number of plots per row in the `summaryPlot` output file. The maximum number of plots allowed per row is `5` (default: `2`).
outputPrefix
Prefix name for the output files (default: `deg-patterns`).
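To make the effect of `qValue` and `log2FoldChange` concrete, here is a minimal sketch of threshold filtering. It is not the tool's implementation. The function name `select_elements`, the use of the adjusted p-value as the quantity compared against `qValue`, and the exact comparison operators (`<` and `>=`) are assumptions. The input rows are made-up toy values, not real results.

```python
def select_elements(results, q_value=0.05, log2_fold_change=0.0):
    """Return the IDs of elements that pass both thresholds.

    `results` is an iterable of (element_id, adjusted_p_value, log2_fc)
    tuples. Elements without an adjusted p-value (None) are skipped.
    The comparison operators are an assumption made for this sketch.
    """
    kept = []
    for element_id, padj, log2_fc in results:
        if padj is None:
            continue
        if padj < q_value and abs(log2_fc) >= log2_fold_change:
            kept.append(element_id)
    return kept


# Toy values for illustration only.
toy_results = [
    ("geneA", 0.001, 2.5),
    ("geneB", 0.20, 3.0),
    ("geneC", 0.01, -0.05),
    ("geneD", None, 1.2),
    ("geneE", 0.04, -1.1),
]

print(select_elements(toy_results))
# ['geneA', 'geneC', 'geneE']
print(select_elements(toy_results, q_value=0.5, log2_fold_change=0.1))
# ['geneA', 'geneB', 'geneE']
```

Loosening `qValue` admits `geneB`, while raising `log2FoldChange` removes `geneC`, whose absolute fold change is below the new minimum.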
This tool will always provide the following outputs:
degReport
Tab-delimited table describing the association between each element and the assigned cluster.

| Column name | Definition |
|:------------------- |:---------------------------------------------------------------------------------------------------------------------------------- |
| [element_ID] | Unique identifier for each element, e.g., geneID and sampleID. |
| merge | Interaction specified in the designFormula while running DESeq2. |
| value | Z-score using the mean and the standard deviation from elements at the same merge level. This is the value displayed on the y-axis of the plots. |
| [metadata_columns] | Columns representing the metadata associated with the particular element, e.g., for a gene it can be tissue, time, and sex. |
| sizeFactor | Geometric mean considering elements with the same `merge` value (i.e., the `sizeFactor` will be the same for all elements at the same `explanatoryVar` and `groupVar` level). |
| cluster | Cluster assigned to the particular element. |
elementClusterMap
Tab-delimited table describing which cluster has been assigned to each element. It has two columns: `id`, the element identifier, and `cluster`, the assigned cluster.
clusterCount
Tab-delimited table summarizing the number of elements per cluster. It has two columns: `cluster`, the cluster ID, and `count`, the number of elements assigned to that cluster.
summaryPlot
PDF file displaying plots of all identified clusters. The x-axis shows the levels of the `explanatoryVar` input, while the y-axis shows the Z-score range. A Z-score expresses how far a value lies from the mean, measured in standard deviations: a value of zero is equal to the mean, while a value of +1 or -1 is one standard deviation above or below the mean, respectively. An example of the plot is shown below:
clusterPlots
Compressed `.tar.gz` file with individual Plotly cluster plots.
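Two relationships described in the outputs above can be sketched in a few lines: how a Z-score is derived from a set of values, and how per-cluster counts follow from an element-to-cluster map. This is an illustration under stated assumptions, not the tool's code. The function names are hypothetical, the inputs are toy values, and the use of the sample standard deviation (the default behavior of R's `scale()`) is an assumption about how the scores are computed.

```python
from collections import Counter
from statistics import mean, stdev


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample
    standard deviation (assumed here, matching R's scale() default)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        raise ValueError("values are constant; Z-scores are undefined")
    return [(v - center) / spread for v in values]


def cluster_counts(element_to_cluster):
    """Tally how many elements were assigned to each cluster."""
    return dict(Counter(element_to_cluster.values()))


# Toy values: the mean is 4 and the sample standard deviation is 2.
print(z_scores([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]

# Toy element-to-cluster map in the spirit of elementClusterMap.
toy_map = {"geneA": 1, "geneB": 1, "geneC": 2}
print(cluster_counts(toy_map))  # {1: 2, 2: 1}
```

The first result matches the interpretation given for `summaryPlot`: the value equal to the mean scores zero, and the values one standard deviation away score -1 and +1.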
This tool expects a DESeq2 object in RDS format, which can be generated by running the matrix-to-deseq2 tool found in the BatchX marketplace. The RDS object generated from Example 4 (i.e., `time_series.deseq2.RDS`) is used here as input.
If the RDS object is not yet on BatchX, use the following command to upload it to the BatchX file system:
bx cp time_series.deseq2.RDS bx://test/degreports/
Submit a job to cluster genes that show similar expression patterns across `time` in two different `tissue` groups:
bx submit lpantano-team@degreport/deg-patterns:0.0.4 '{
"deseq2Object": "bx://test/degreports/time_series.deseq2.RDS",
"explanatoryVar": "time",
"qValue": 0.5,
"log2FoldChange": 0.1,
"plotsPerColumn": 4,
"plotsPerRow": 4,
"removeOutliers": false,
"groupVar": "tissue",
"outputPrefix": "time-tissue",
"minElements": 1
}'
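The same inputs can also be assembled programmatically, which makes it easy to check them against the constraints documented above before submitting. The sketch below is an illustration only: `build_submit_command` is a hypothetical helper, the lower bound of 1 on the plot grid is an assumption, and the only `bx` syntax it relies on is the `bx submit <tool> '<json>'` form shown in this example. It builds the command string and does not run it.

```python
import json
import shlex

TOOL = "lpantano-team@degreport/deg-patterns:0.0.4"
REQUIRED = ("deseq2Object", "explanatoryVar")
CLUSTER_METHODS = ("diana", "ConsensusClusterPlus")
MAX_PLOTS = 5  # documented maximum per row and per column


def build_submit_command(inputs):
    """Validate the job inputs and return the `bx submit` command string."""
    for key in REQUIRED:
        if key not in inputs:
            raise ValueError(f"missing required input: {key}")
    for key in ("plotsPerColumn", "plotsPerRow"):
        # The lower bound of 1 is an assumption; only the maximum is documented.
        if key in inputs and not 1 <= inputs[key] <= MAX_PLOTS:
            raise ValueError(f"{key} must be between 1 and {MAX_PLOTS}")
    if inputs.get("clusterMethod", "diana") not in CLUSTER_METHODS:
        raise ValueError(f"clusterMethod must be one of {CLUSTER_METHODS}")
    # shlex.quote wraps the JSON in single quotes for the shell.
    return f"bx submit {TOOL} {shlex.quote(json.dumps(inputs))}"


job_inputs = {
    "deseq2Object": "bx://test/degreports/time_series.deseq2.RDS",
    "explanatoryVar": "time",
    "qValue": 0.5,
    "log2FoldChange": 0.1,
    "plotsPerColumn": 4,
    "plotsPerRow": 4,
    "removeOutliers": False,
    "groupVar": "tissue",
    "outputPrefix": "time-tissue",
    "minElements": 1,
}

print(build_submit_command(job_inputs))
```

Serializing with `json.dumps` converts Python's `False` to the JSON literal `false`, so the printed command carries the same inputs as the job shown above.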
The above job creates two clusters, which look like the following when visualized:
DESeq2: 1.36.0
DEGreport: 1.32.0
dplyr: 1.0.9