knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE, fig.align="left", fig.show="hold", fig.keep='all')
AskoR is a pipeline for the analysis of gene expression data, using edgeR.
Several steps are performed: data filters (cpm method), normalize this filtered data, look at the correlation of our data, run differential expression analysis, compare contrast, GO enrichment and co-expression.
You'll find a test set in the inst/extdata/input folder. It'll be used for the rest of the documentation. Contents of input folder:
IMPORTANT : All input files must be in a folder named input (case sensitive).
You have a count file per sample, they can be in text or csv format. In this case, you will have to fill in the Samples file with the path and name of the count files for each sample in a "file" column. This one can contain several columns according to the counting tools used, you will have to inform the following parameters:
parameters$col_genes
column number with GeneId (default 1)parameters$col_counts
column number with count (default 7)parameters$sep
column separator (default "\t" ) Example of count file:
Geneid | Chr | Start | End | Strand | Length | Counts :-|:-|-:|-:|:-:|-:|-: Gene_000001 | Random_Chr_001 | 1692 | 1907 | - | 215 | 0 Gene_000002 | Random_Chr_001 | 6641 | 8705 | - | 2064 | 43.5 Gene_000003 | Random_Chr_001 | 9228 | 9569 | - | 341 | 8 Gene_000004 | Random_Chr_001 | 12009 | 13155 | - | 1146 | 781 Gene_000005 | Random_Chr_001 | 15242 | 15844 | + | 602 | 16 Gene_000006 | Random_Chr_001 | 16304 | 19834 | + | 3530 | 9 Gene_000007 | Random_Chr_001 | 20595 | 21625 | - | 1030 | 13.83 Gene_000008 | Random_Chr_001 | 22377 | 23461 | - | 1084 | 565.33 ... |
The corresponding parameters:
parameters$col_genes=1 parameters$col_counts=7 parameters$sep="\t"
In this example column 7 contains the counts and the gene identifiers are in the first column, it's a tabulate file so the column separator is
It is also possible to have a table, tabulated file, grouping the counts for each gene in each sample, in text or csv format. The Samples file should not contain a file column, you will have to fill in the name of the count file: parameters$fileofcount
.
Example of a count matrix, the column separator is a tabulation:
Geneid | AC1R1 | AC1R2 | AC1R3 | BC1R1 | BC1R2 | BC1R3 | ... :-|-:|-:|-:|-:|-:|-:|:-: Gene_000001 | 0 | 1 | 0 | 0 | 0 | 1 | ... Gene_000002 | 43.5 | 25.33 | 31.5 | 27.5 | 29.5 | 29 | ... Gene_000003 | 8 | 4 | 5 | 30 | 16 | 13 | ... Gene_000004 | 781 | 412 | 626 | 558 | 538 | 346 | ... Gene_000005 | 16 | 7 | 13 | 9 | 8 | 6 | ... Gene_000006 | 9 | 4 | 5 | 21 | 15 | 12 | ... ... |
This tabulated file describes the design of experiments. The first and second columns are mandatory and are named "sample" and "condition". You may have several other columns. The contents of the condition column will be the same as in the Contrast file.
The column "color" is optional, it allows to predefined the color of the sample in the graphs. If it is absent askoR will assign colors itself.
The column "file" is mandatory if you have samples counts files. In the example below, these files are grouped in a "counts" folder. You do not need to specify the name of the "input" folder (i.e. input/counts/AC1R1_counts.txt) since, by default, it will search for it.
Don't forget to fill in the name of your Samples file: parameters$sample_file
, no need to specify the "input" folder.
Example of a Samples.txt file:
sample | condition | genotype | treatment | color | file :-|:-|:-|:-|:-|:- AC1R1 | AC1 | A | C1 | darkorchid2 | counts/AC1R1_counts.txt AC1R2 | AC1 | A | C1 | darkorchid2 | counts/AC1R2_counts.txt AC1R3 | AC1 | A | C1 | darkorchid2 | counts/AC1R3_counts.txt BC1R1 | BC1 | B | C1 | saddlebrown | counts/BC1R1_counts.txt BC1R2 | BC1 | B | C1 | saddlebrown | counts/BC1R2_counts.txt ... |
This tabulated file indicates contrasts you wish to make between your different conditions.
The first column corresponds to the condition column of the Samples file, then the others are columns the comparisons to be made in the form ConditionXvsConditionY. Then under these columns, ConditionX will be noted + and ConditionY will be noted -, the rest 0. You will have to fill in the name of your file: parameters$contrast_file
Example of contrasts file:
Condition | AC1vsAC2 | AC1vsAC3 | AC2vsAC3 | BC1vsBC2 | ... :-|:-|:-|:-|:-|:- AC1 | + | + | 0 | 0 | ... AC2 | - | 0 | + | 0 | ... AC3 | 0 | - | - | 0 | ... BC1 | 0 | 0 | 0 | + | ... BC2 | 0 | 0 | 0 | - | ... ... |
This tabulated file contains the annotations of your genes, it is optional. It can contain several columns but the first one must be the gene identifier. You will have to fill in the name of your file: parameters$annotation
.
Example of annotation file:
SeqName | Description :-|:- Gene_000001 | hypothetical protein pbra 009537 Gene_000002 | hypothetical protein pbra 009324 Gene_000003 | histone-lysine n-methyltransferase nsd2 Gene_000004 | hypothetical protein pbra 009496 ... |
This tabulated file will be WITHOUT HEADER, the first column contains the gene identifier and the second column contains all the corresponding GOs separated by a comma. This file is optional, you will have to fill in its name: parameters$geneID2GO_file
. Cf. GO enrichment Section
Example of GOs annotation file:
| :-|:- Gene_000001 | GO:0003676,GO:0015074 Gene_000002 | GO:0003676,GO:0015074 Gene_000003 | GO:0005488,GO:0006807,GO:0016740,GO:0043170,GO:0044238 Gene_000005 | GO:0005525,GO:0005525,GO:0005525 ... |
All the generated files and images will be in a folder named by default "DE_analysis", you can change this name: parameters$analysis_name
.
Now that we have our input files, look at the script, you should have these first lines:
# Path to askoR file library(askoR) # Sets defaults parameters parameters<-Asko_start()
Don't forget to r}place the paths, the first one to the askoR.R script and the second one to your working directory (containing the input folder).
Once this step has been completed, you will be able to indicate the names of the analysis files:
# output directory name (default DE_analysis) parameters$analysis_name="DEG_test" # input files: # matrix of different contrasts desired parameters$contrast_file = "Contrasts.txt" # file containing the functional annotations for each gene parameters$annotation = "Genes_annotations.txt" # GO annotation files parameters$geneID2GO_file = "GO_annotations.txt"
# matrix of count for all samples/conditions parameters$fileofcount = "CountsMatrix.txt" # file describing all samples parameters$sample_file = "Samples_CountsMatrix.txt"
# file describing all samples parameters$sample_file = "Samples_CountsFiles.txt" # column with the gene names (default 1) parameters$col_genes = 1 # column with the counts values (default 7) parameters$col_counts = 7 # field separator (default "\t") parameters$sep = "\t"
We are informed that two samples, "AC3R2" and "BC3R3", had problems during the experiments, it is requested to extract it from our analysis. No need to redo all the files, just use the parameter: parameters$rm_sample
. You can provide a list of samples "c("sample1","sample2","sample3",...), or a single sample c("sample1"). In the same way, if you only want to work on a part of your samples, you can use parameters$select_sample
.
# delete sample AC3R2 parameters$rm_sample = c("AC3R2","BC3R3")
It's time to load your data:
data<-loadData(parameters)
rm(list=ls()) library(askoR) parameters<-Asko_start() parameters$dir_path="../inst/extdata/" parameters$analysis_name="DEG_test" parameters$fileofcount = "CountsMatrix.txt" parameters$sample_file = "Samples_CountsMatrix.txt" parameters$contrast_file = "Contrasts.txt" parameters$annotation = "Genes_annotations.txt" parameters$geneID2GO_file = "GO_annotations.txt" parameters$rm_sample = c("AC3R2","BC3R3") data<-loadData(parameters)
You can see that two folders have been created in DEG_test:
Then the samples and conditions that have been loaded are displayed. These have been loaded into a structure called "data". Some commands to display your data:
# Displays all samples recorded data$samples # Displays all contrast recorded data$contrast # Displays design experiment data$design # Displays the first 5 lines and 8 columns of counts table. data$dge$counts[1:5,1:8] # Total number of genes: dim(data$dge$counts)[1] # Total number of samples: dim(data$dge$counts)[2]
The next step is to generate the files describing your experiences for Askomics. Even if you don't plan to use Askomics, this command is mandatory because it generates a data structure "asko_data" that will be used in the further analysis.
asko_data<-asko3c(data, parameters)
For the filters the CPM method is used, you can set the cutoff values you want to:
# CPM's threshold parameters$threshold_cpm = 0.5 # minimum of sample which are upper to cpm threshold parameters$replicate_cpm = 3 # we have 3 replicates
# run filtering asko_filt<-GEfilt(data, parameters) # Total number of filtered genes: dim(asko_filt$counts)[1]
The filtered data is saved in a structure called here: asko_filt. In the folder DEG_test/images/, you should find the images representing your data before and after filtering.
knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_boxplot_logcpm_before_filtering.png") knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_boxplot_logcpm_after_filtering.png") knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_barplot_logcpm_before_filtering.png") knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_barplot_logcpm_after_filtering.png")
knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_raw_data_1.png") knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_filtered_data_1.png")
You notice that the legend of the density graphs is very low compared to the graph. You can correct this with the options parameters$densinset
which modifies the position of the legend, it is also possible to define the number of columns with parameters$legendcol
. Finally, restart "GEfilt" function.
# Set position the legend in bottom density graphe parameters$densinset = 0.20 # Set numbers of column for legends parameters$legendcol = 8 # run filtering asko_filt<-GEfilt(data, parameters)
knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_raw_data.png") knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_filtered_data.png")
Once the filters have been made, we can proceed to the normalization of the data. At this step, you can generate file with normalize factor values for each sample parameters$norm_factor=TRUE
and/or generate with normalize counts parameters$norm_counts=TRUE
.
# run normalization asko_norm<-GEnorm(asko_filt, asko_data, data, parameters)
Normalized data is saved in a structure called here : asko_norm. In the folder DEG_test/images/, you should find the images representing your data after normalization. Two files are automatically generated because they will be used for co-expression analysis: "DEG_test_CPMNormCounts.txt" and "DEG_test_CPM_NormMeanCounts.txt".
knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_boxplot_logcpm_after_norm.png")
knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_heatmap_CPMcounts_per_sample.png") knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_heatmap_meanCounts_per_condi.png")
From the filtered and normalized data, we can re-correlate the correlation between our samples.
GEcorr(asko_norm,parameters)
Several graphics will be saved in the DEG_test/images/ folder, including MDS and PCA plots. Axis1 vs axis2 differentiate our A and B samples.
knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_MDS_corr_axe1_2.png") knitr::include_graphics("../inst/extdata/DEG_test/images/DEG_test_PCA_axe1_2.png")
The differential expression analysis can be started. We will play with the following parameters:
# FDR threshold parameters$threshold_FDR = 0.05 # logFC threshold parameters$threshold_logFC = 0 # normalization method parameters$normal_method = "TMM" # p-value adjust method parameters$p_adj_method = "BH" # GLM method parameters$glm = "lrt"
You can decide to get the Volcano or Mean-Difference Plots for each contrast:
# Mean-Difference Plot of Expression Data (aka MA plot) parameters$plotMD = TRUE # Volcano plot for a specified coefficient/contrast of a linear model parameters$plotVO = TRUE
Once our parameters are defined, we can start the analysis.
# run differential expression analysis resDEG<-DEanalysis(asko_norm, data, asko_data, parameters)
For each contrast, you will find the number of over- or under-expressed genes. Genotype B does not show any major effects of the treatment, unlike genotype A. We also observe a certain number of differentially expressed genes between the genotypes.
A file named "DEG_test_summary_DE" is located in the DEG_test/ folder, which contains for each gene whether it is over-expressed (1) or under-expressed (-1) or neutral (0) for a given contrast. If you had provided an annotation file, these will be found in the last columns.
First lines of the :
| | AC1vsAC2 | AC1vsAC3 | ... | AC2vsBC2 | AC3vsBC3 | Description :-|:-:|:-:|:-:|:-:|:-:|:- Gene_000002 | 0 | 0 | ... | 0 | 0 | hypothetical protei... Gene_000003 | -1 | 0 | ... | 0 | 0 | histone-lysine n-m... Gene_000004 | 1 | 1 | ... | -1 | 0 | hypothetical prote... ... |
You'll find in "DEG_test/images/" directory, les Volcano, MD plots and heatmap for each contrast.
knitr::include_graphics("../inst/extdata/DEG_test/images/AC3vsBC3_MeanDifference_of_ExpressionData.png") knitr::include_graphics("../inst/extdata/DEG_test/images/AC3vsBC3_VolcanoPlot.png")
knitr::include_graphics("../inst/extdata/DEG_test/images/AC3vsBC3_topDGE_heatmap.png")
You can compare your lists of differentially expressed genes using two methods: Venn diagrams or Upset graphs. Venn diagrams allow you to compare up to 4 lists while Upset allows you to make wider comparisons. However, if you have too many lists to display the graph may be unreadable.
To display the Venn diagrams, you need to specify the type of comparison wanted parameters$VD
:
Next, you must provide a list of the comparisons to display: parameters$compaVD
. For exemple :
# this create 1 venn diagram parameters$compaVD=c("Ctrast1-Ctrast2-Ctrast3") # this create 3 venn diagrams parameters$compaVD=c("Ctrast1-Ctrast2-Ctrast3", "Ctrast4-Ctrast5-Ctrast6", "Ctrast7-Ctrast8-Ctrast9")
Be careful, with the VD="both" you will only have to provide two contrasts. Example:
# this create 1 venn diagram parameters$compaVD=c("Ctrast1-Ctrast2") # this create 3 venn diagrams parameters$compaVD=c("Ctrast1-Ctrast2", "Ctrast1-Ctrast3", "Ctrast2-Ctrast3")
With our data, we will make 3 Venn diagrams for the different types (all, up and down).
parameters$compaVD = c("AC1vsAC2-AC1vsAC3-AC2vsAC3", "BC1vsBC2-BC1vsBC3-BC2vsBC3", "AC1vsBC1-AC2vsBC2-AC3vsBC3") # graph type "all" parameters$VD = "all" VD(resDEG, parameters, asko_data) # graph type "up" parameters$VD = "up" VD(resDEG, parameters, asko_data) # graph type "down" parameters$VD = "down" VD(resDEG, parameters, asko_data)
To use the VD="both" option, we can provide list of two contrasts.
# graph type "both" parameters$compaVD = c("AC1vsBC1-AC2vsBC2", "AC1vsBC1-AC3vsBC3", "AC2vsBC2-AC3vsBC3") parameters$VD = "both" VD(resDEG, parameters, asko_data)
All graphs will appear in a folder named "DEG_test/vennDiagram/". Some example of venn diagrams :
knitr::include_graphics("../inst/extdata/DEG_test/vennDiagram/AC1vsBC1-AC2vsBC2-AC3vsBC3_all.png") knitr::include_graphics("../inst/extdata/DEG_test/vennDiagram/AC1vsBC1-AC2vsBC2-AC3vsBC3_up.png") knitr::include_graphics("../inst/extdata/DEG_test/vennDiagram/AC1vsBC1-AC2vsBC2-AC3vsBC3_down.png") knitr::include_graphics("../inst/extdata/DEG_test/vennDiagram/AC1vsBC1-AC2vsBC2_mixed.png")
You can display all contrast, you just need to specify the type of comparison wanted parameters$upset_basic
:
- "all" : Create chart for all differentially expressed genes
- "up" : Create chart for gene expressed UP
- "down" : Create chart for gene expressed DOWN
- "mixed" : Create chart for gene expressed UP and DOWN (in the same graph)
- NULL : Don't make graphs
You can display multiples graphs based on list of contrast parameters$upset_list
, you need to precise the type of comparison parameters$upset_type
. Example:
# Precise type of comparison: all, down, up, mixed. parameters$upset_type = "all" # Give a list of contrast, for example: # this create 1 graphs parameters$upset_list = c("Ctrast1-Ctrast2-Ctrast3") # this create 3 graphs parameters$upset_list = c("Ctrast1-Ctrast2-Ctrast3", "Ctrast4-Ctrast5-Ctrast6", "Ctrast1-Ctrast2-Ctrast3-Ctrast4-Ctrast5")
With our data, we will make several upset charts for the different types (all, up, down and mixed), with all contrast and list of contrast.
parameters$upset_list = c("AC1vsAC2-AC1vsAC3-AC2vsAC3", "BC1vsBC2-BC1vsBC3-BC2vsBC3", "AC1vsBC1-AC2vsBC2-AC3vsBC3") # graphs type "all" parameters$upset_basic = "all" # all contrast parameters$upset_type = "all" # list of contrast UpSetGraph(resDEG, data, parameters) # graphs type "mixed" parameters$upset_basic = "mixed" # all contrast parameters$upset_type = "mixed" # list of contrast UpSetGraph(resDEG, data, parameters) # graphs type "up" parameters$upset_basic = "up" # all contrast parameters$upset_type = "up" # list of contrast UpSetGraph(resDEG, data, parameters) # graphs type "down" parameters$upset_basic = "down" # all contrast parameters$upset_type = "down" # list of contrast UpSetGraph(resDEG, data, parameters)
An "DEG_test/UpSetR_graphs/" directory will be created with two subdirectories "DEG_test/ UpSetR_graphs/Global_upset/" and "DEG_test/UpSetR_graphs/ Subset_upset/".
Some example of upset graphs (from subset "AC1vsBC1-AC2vsBC2-AC3vsBC3"):
knitr::include_graphics("../inst/extdata/DEG_test/UpSetR_graphs/Subset_upset/DEG_test_UpSetR_AC1vsBC1-AC2vsBC2-AC3vsBC3_allDEG.png") knitr::include_graphics("../inst/extdata/DEG_test/UpSetR_graphs/Subset_upset/DEG_test_UpSetR_AC1vsBC1-AC2vsBC2-AC3vsBC3_upDEG.png") knitr::include_graphics("../inst/extdata/DEG_test/UpSetR_graphs/Subset_upset/DEG_test_UpSetR_AC1vsBC1-AC2vsBC2-AC3vsBC3_downDEG.png") knitr::include_graphics("../inst/extdata/DEG_test/UpSetR_graphs/Subset_upset/DEG_test_UpSetR_AC1vsBC1-AC2vsBC2-AC3vsBC3_mixedDEG.png")
We uses the GOs annotations file to perform enrichment analysis on differentially expressed gene. For this, you define :
parameters$GO_threshold
the significant threshold used to filter p-valuesparameters$GO_max_top_terms
the maximum number of GO terms plotparameters$GO_min_num_genes
the minimum number of genes for each GO termsparameters$GO
gene set chosen for analysis 'up', 'down', 'both' (up+down)parameters$GO_algo
algorithms for runTest function ("classic", "elim", "weight", "weight01", "lea", "parentchild")parameters$GO_stats
statistical tests for runTest function ("fisher", "ks", "t", "globaltest", "sum", "ks.ties")parameters$Ratio_threshold
the min ratio for display GO in graphAfter that, we can run Go enrichment analysis:
# Parameters parameters$GO_threshold = 0.05 parameters$GO_max_top_terms = 10 parameters$GO_min_num_genes = 10 parameters$GO = "both" parameters$GO_algo = "weight01" parameters$GO_stats = "fisher" parameters$Ratio_threshold = 1 # run analysis GOenrichment(resDEG, data, parameters)
An "DEG_test/GO_images/" directory will be created with all GO images and tables of statistics.
Example of graph:
knitr::include_graphics("../inst/extdata/DEG_test/GO_images/AC1vsBC1_Pvalue_BUBBLESgraph.png") knitr::include_graphics("../inst/extdata/DEG_test/GO_images/AC1vsBC1_Ratio_BUBBLESgraph.png")
Example of one statistical table:
GO.ID | Term | Annotated | Significant | Expected | statisticTest | Ratio | GO_cat | -----------|---------------------|-----------|-------------|----------|---------------|--------------|--------| GO:0003735 | structural const... | 135 | 36 | 10.81 | 4.6e-11 | 3.3302497687 | MF | GO:0000155 | phosphorelay sen... | 22 | 7 | 1.76 | 0.0012 | 3.9772727272 | MF | GO:0003729 | mRNA binding | 13 | 5 | 1.04 | 0.0024 | 4.8076923076 | MF | GO:0036094 | small molecule b... | 1327 | 105 | 106.24 | 0.0038 | 0.9883283132 | MF | ... |
Explications of some columns:
The function and the tutorial are being written/finalized. Please wait!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.