rankGenes,scRNAseq-method | R Documentation |
This function searches marker genes for each cluster.
rankGenes(theObject, column="clusters", writeMarkerGenes=FALSE)
theObject |
An Object of class scRNASeq for which the count matrix was normalized (see ?normaliseCountMatrix), tSNE were calculated (see ?generateTSNECoordinates), dbScan was run (see ?runDBSCAN), cells were clustered (see ?clusterCellsInternal), as clusters themselves (see ?calculateClustersSimilarity). |
column |
Name of the column with a clustering result. Default="clusters" |
writeMarkerGenes |
If TRUE, output one list of marker genes per cluster in the output directory defined in theObject and in the sub-directory 'marker_genes'. Default=FALSE. |
To understand the nature of the consensus clusters identified by CONCLUS, it is essential to identify genes which could be classified as marker genes for each cluster. To this aim, each gene should be "associated" to a particular cluster. This association is performed by looking at upregulated genes in a particular cluster compared to the others (multiple comparisons).
The function rankGenes performs multiple comparisons of all genes from theObject and rank them according to a score reflecting a FDR power.
For each table corresponding to a particular consensus cluster, the first column is a gene name. The following columns represent adjusted p-values (FDR) of a one-tailed T-test between the considered cluster and all others.
Top genes with significant FDR in most of the comparisons can be assumed as positive markers of a cluster. The column mean_log10_fdr is the mean power of FDR in all comparisons; the column n_05 is the number of comparisons in which the gene was significantly upregulated. The score for marker genes is the average power of FDR among all comparisons for a cluster multiplied to weights taken from the clustersSimilarityMatrix + 0.05. Taking into account both FDRs of all comparisons and clustersSimilarityMatrix allows us to keep the balance between highlighting markers for individual clusters and their 'families' which makes the final heatmap as informative as possible.
Note: Adding 0.05 to the clustersSimilarityMatrix in calculating the score helps avoiding the following problem: in case you have a cluster very different from all others, it will have the value 1 on the diagonal and 0 similarities to all others groups in the clustersSimilarityMatrix. So all weights for that cluster will be zeros meaning that the score would also be zero and genes will be ordered in alphabetical order in the corresponding marker genes list file.
For a cluster k and a gene G, a scoreG was defined in the following way:
scoreG= sum((-log10(fdrk, i + epsilon)*weightk,i) / nClusters-1)
Where
1. fdrk,i is an adjusted p-value obtained by comparing expression of G in
cluster k versus expression of G in cluster i.
2. weightk,i is a similarity between these two groups taken from the element
in the clustersSimilarityMatrix.
3. nClusters is a number of consensus clusters given to the rankGenes().
4. epsilon = 10-300 is a small number which does not influence the ranking
and added to avoid an error when fdr is equal to zero.
5. k = [1,…,nClusters].
6. I = ([1,…,nClusters]exceptfor[k]).
An object of class scRNASeq with its markerGenesList slot updated.
Ilyess RACHEDI, based on code by Polina PAVLOVICH and Nicolas DESCOSTES.
retrieveTopClustersMarkers retrieveGenesInfo
## Object scr containing the results of previous steps load(system.file("extdata/scrFull.Rdat", package="conclus")) ## Ranking genes scr <- rankGenes(scr)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.