runCONCLUS: runCONCLUS

runCONCLUS    R Documentation

runCONCLUS

Description

This function is a wrapper to run the whole CONCLUS workflow. See details.

Usage

runCONCLUS(
        ## General parameters
        outputDirectory, experimentName, countMatrix, species, cores=2,
        clusteringMethod="ward.D2", exportAllResults=TRUE,
        orderClusters=FALSE, clusToAdd=NA, silentPlot=TRUE,

        ## Normalisation parameters
        sizes=c(20,40,60,80,100), rowMetaData=NULL, columnsMetaData = NULL,
        alreadyCellFiltered=FALSE, runQuickCluster=TRUE, info=TRUE,

        ## tSNE parameters
        randomSeed = 42, PCs=c(4, 6, 8, 10, 20, 40, 50),
        perplexities=c(30,40), writeOutputTSne = FALSE,

        ## Dbscan parameters
        epsilon=c(1.3, 1.4, 1.5), minPoints=c(3, 4), writeOutputDbScan=FALSE,

        ## Cell Similarity matrix parameters
        clusterNumber=10, deepSplit=4,

        ## Rank genes parameters
        columnRankGenes="clusters", writeOutputRankGenes=FALSE,

        ## Retrieving top markers parameters
        nTopMarkers=10, removeDuplicates = TRUE, writeTopMarkers=FALSE,

        ## Retrieving genes infos parameters
        groupBy="clusters", orderGenes="initial", getUniprot=TRUE,
        saveInfos=FALSE,

        ## plotCellSimilarity parameters
        colorPalette="default", statePalette="default", writeCSM=FALSE,
        widthCSM=7, heightCSM=6,

        ## plotClusteredTSNE parameters
        savePlotCTSNE=FALSE, widthPlotClustTSNE=6, heightPlotClustTSNE=5,
        tSNENb=NA,

        ## plotCellHeatmap parameters
        meanCentered=TRUE, orderGenesCH=FALSE, savePlotCH=FALSE, widthCH=10,
        heightCH=8.5, clusterCols=FALSE,

        ## plotClustersSimilarity parameters
        savePlotClustSM=FALSE, widthPlotClustSM=7, heightPlotClustSM=5.5)

Arguments

outputDirectory

Directory to which results should be written. This needs to be defined even if you choose to not output any results.

experimentName

String of the name of the experiment.

countMatrix

Matrix containing the raw counts.

species

Character string of the species of interest. Should be "mouse" or "human". Other organisms can be added on demand.

cores

Maximum number of jobs that CONCLUS can run in parallel. This parameter is used by ?generateTSNECoordinates, ?runDBSCAN, ?clusterCellsInternal, and ?retrieveGenesInfo. Default=2.

clusteringMethod

Clustering method passed to the hclust() function. See ?hclust for a list of methods. This parameter is used by ?clusterCellsInternal, ?calculateClustersSimilarity, ?plotCellSimilarity, ?plotClusteredTSNE, ?plotCellHeatmap, and ?plotClustersSimilarity. Default = "ward.D2".

exportAllResults

If TRUE, save all results of CONCLUS. See ?exportResults for details. Default=TRUE.

orderClusters

If TRUE, clusters in the cell similarity matrix and in the cluster similarity matrix will be ordered by name. Default = FALSE.

clusToAdd

If not NA, defines the clustering to be used in theObject. This is particularly useful when one wants to compare the clustering performance of different tools. It should be a data frame having two columns 'clusters' and 'cells'. Default=NA.

silentPlot

Boolean. If TRUE, the figures are not displayed on the R graphics device. Default=TRUE.

sizes

Vector of pool sizes passed to the sizes argument of scran::computeSumFactors(), which is used by ?normaliseCountMatrix. Default = c(20, 40, 60, 80, 100).

rowMetaData

Data frame containing gene information. Default is NULL. See ?normaliseCountMatrix.

columnsMetaData

Data frame containing cell information. Default is NULL. See ?normaliseCountMatrix.

alreadyCellFiltered

If TRUE, quality check and filtering will not be applied during the normalization of the count matrix. See ?normaliseCountMatrix.

runQuickCluster

If TRUE, the scran::quickCluster() function will be applied. It usually improves the normalization for medium-size count matrices. However, it is not recommended for datasets with fewer than 200 cells and may take too long for datasets with more than 10000 cells. Default=TRUE. See ?normaliseCountMatrix.

info

Logical. If TRUE, additional annotations like ensembl_gene_id, go_id, name_1006, chromosome_name and gene_biotype are added to the row data, for all the genes from the count matrix with ENSEMBL IDs or SYMBOL ID. Default: TRUE.

randomSeed

Seed used to generate the tSNE. Default = 42. See ?generateTSNECoordinates.

PCs

Vector of first principal components. For example, to take ranges 1:5 and 1:10 write c(5, 10). Default = c(4, 6, 8, 10, 20, 40, 50). See ?generateTSNECoordinates.

perplexities

A vector of perplexity (t-SNE parameter). See ?generateTSNECoordinates for details. Default = c(30, 40).

writeOutputTSne

If TRUE, write the tsne parameters to the output directory defined in theObject. Default = FALSE. Ignored if exportAllResults=TRUE.

epsilon

Reachability distance parameter of fpc::dbscan() function. See Ester et al. (1996) for more details. Default = c(1.3, 1.4, 1.5).

minPoints

Minimum number of reachable points parameter of the fpc::dbscan() function. See Ester et al. (1996) for more details. Default = c(3, 4).

writeOutputDbScan

If TRUE, write the results of the dbScan clustering to the output directory defined in theObject, in the sub-directory output_tables. Default = FALSE. Ignored if exportAllResults=TRUE.

clusterNumber

Exact number of clusters. If NULL, the number of clusters is determined automatically. Default = 10. See ?clusterCellsInternal.

deepSplit

Intuitive level of clustering depth. Options are 1, 2, 3, 4. See ?clusterCellsInternal. Default = 4.

columnRankGenes

Name of the column with a clustering result. See ?rankGenes. Default="clusters".

writeOutputRankGenes

If TRUE, output one list of marker genes per cluster in the output directory defined in theObject and in the sub-directory 'marker_genes'. Default=FALSE. Ignored if exportAllResults=TRUE.

nTopMarkers

Number of marker genes to retrieve per cluster. See ?retrieveTopClustersMarkers. Default=10.

removeDuplicates

If TRUE, duplicated markers are removed from the lists. See ?retrieveTopClustersMarkers. Default=TRUE.

writeTopMarkers

If TRUE, writes one list per cluster in the output folder defined in theObject, and in the sub-directory marker_genes/markers_lists. Default=FALSE. Ignored if exportAllResults=TRUE.

groupBy

A column in the input table used for grouping the genes in the output tables. This option is useful if a table contains genes from different clusters. See ?retrieveGenesInfo. Default = "clusters".

orderGenes

If "initial" then the order of genes will not be changed. The other option is "alphabetical". See ?retrieveGenesInfo. Default="initial".

getUniprot

Boolean, whether to get information from UniProt or not. See ?retrieveGenesInfo. Default = TRUE.

saveInfos

If TRUE, save the genes infos table in the directory defined in theObject (?getOutputDirectory) and in the sub-directory 'marker_genes/saveGenesInfo'. Default=FALSE. Ignored if exportAllResults=TRUE.

colorPalette

A vector of colors for clusters. This parameter is used by all plotting methods. Default = "default". See ?plotClustersSimilarity for details.

statePalette

A vector of colors for states or conditions. This parameter is used by all plotting functions except ?plotClusteredTSNE. See ?plotClustersSimilarity for details.

writeCSM

If TRUE, the cells similarity heatmap is saved in the directory defined in theObject (?getOutputDirectory) and in the sub-directory "pictures". Default=FALSE. Ignored if exportAllResults=TRUE.

widthCSM

Width of the plot in the pdf file. See ?pdf for more details. Default = 7.

heightCSM

Height of the plot in the pdf file. See ?pdf for more details. Default = 6.

savePlotCTSNE

If TRUE, the heatmap of the clustered tSNE is saved in the directory defined in theObject (?getOutputDirectory) and in the sub-directory "pictures/tSNE_pictures". Default=FALSE. Ignored if exportAllResults=TRUE.

widthPlotClustTSNE

Width of the clustered tSNE plot in the pdf file. See ?pdf for more details. Default = 6.

heightPlotClustTSNE

Height of the clustered tSNE plot in the pdf file. See ?pdf for more details. Default = 5.

tSNENb

Number of the tSNE to plot. If NA, all tSNE solutions are plotted (14 tSNE by default). Default=NA.

meanCentered

Boolean indicating if mean centering should be applied to the expression matrix. See ?plotCellHeatmap. Default = TRUE.

orderGenesCH

Boolean, should the heatmap be structured by gene. See ?plotCellHeatmap. Default=FALSE.

savePlotCH

If TRUE save the cell heatmap in pdf format. The heatmap is saved in the output directory defined in theObject (?getOutputDirectory) and in the sub-directory 'pictures'. Default=FALSE. Ignored if exportAllResults=TRUE.

widthCH

Width of the cell heatmap saved in ?pdf. Default = 10.

heightCH

Height of the cell heatmap saved in ?pdf. Default = 8.5.

clusterCols

If TRUE, the columns representing the clusters are also taken into account in the hierarchical clustering of the cell heatmap. Default=FALSE.

savePlotClustSM

If TRUE, save the cluster similarity heatmap in pdf format. The heatmap is saved in the output directory defined in theObject (?getOutputDirectory) and in the sub-directory 'pictures'. Default=FALSE. Ignored if exportAllResults=TRUE.

widthPlotClustSM

Width of the clusters similarity heatmap in the pdf file. See ?pdf for more details. Default = 7.

heightPlotClustSM

Height of the clusters similarity heatmap in the pdf file. See ?pdf for more details. Default = 5.5.
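
The size of the default parameter grid can be worked out with base R. The sketch below assumes that every entry of PCs is combined with every perplexity (which agrees with the "14 tSNE by default" noted for tSNENb) and, as a further assumption not stated on this page, that each epsilon/minPoints pair is applied to each t-SNE. All variable names other than the four arguments are illustrative.

```r
## Default values copied from the Usage section
PCs <- c(4, 6, 8, 10, 20, 40, 50)
perplexities <- c(30, 40)
epsilon <- c(1.3, 1.4, 1.5)
minPoints <- c(3, 4)

## Each entry of PCs denotes a range of leading components: 4 means 1:4
pcRanges <- lapply(PCs, seq_len)

## One t-SNE per (PCs, perplexity) pair
tsneGrid <- expand.grid(PCs=PCs, perplexity=perplexities)
nTSNE <- nrow(tsneGrid)

## Assumption: one DBSCAN run per (t-SNE, epsilon, minPoints) triple
nDbscan <- nTSNE * length(epsilon) * length(minPoints)

cat("t-SNE solutions:", nTSNE, "\n")
cat("DBSCAN runs under the stated assumption:", nDbscan, "\n")
```

Widening any of the four vectors multiplies the number of runs, which is worth keeping in mind when choosing cores.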

Details

CONCLUS is a tool for robust clustering and positive marker features selection of single-cell RNA-seq (sc-RNA-seq) datasets. Of note, CONCLUS does not cover the preprocessing steps of sequencing files obtained following next-generation sequencing.

CONCLUS is organized into the following steps:

1) Generation of multiple t-SNE plots with a range of parameters, including different selections of genes extracted from PCA.
2) Use of the Density-based spatial clustering of applications with noise (DBSCAN) algorithm for identification of clusters in each generated t-SNE plot.
3) All DBSCAN results are combined into a cell similarity matrix.
4) The cell similarity matrix is used to define "CONSENSUS" clusters conserved across the previously defined clustering solutions.
5) Identification of marker genes for each consensus cluster.
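
The idea behind steps 3 and 4 can be illustrated on toy data with base R. This is only a sketch of the general consensus principle, not the implementation used by the package: it assumes that the similarity of two cells is the fraction of clustering solutions in which they share a label, and that consensus clusters are obtained by hierarchical clustering on one minus that similarity. The labels and helper names are made up for the illustration.

```r
## Toy input: three clustering solutions for six cells (arbitrary labels)
solutions <- list(
    c(1, 1, 1, 2, 2, 2),
    c(1, 1, 2, 2, 2, 2),
    c(2, 2, 2, 1, 1, 1)
)
cells <- paste0("cell", 1:6)

## Co-clustering indicator for one solution: 1 if two cells share a label
coCluster <- function(labels) outer(labels, labels, "==") * 1

## Cell similarity: fraction of solutions in which two cells co-cluster
similarity <- Reduce(`+`, lapply(solutions, coCluster)) / length(solutions)
dimnames(similarity) <- list(cells, cells)

## Consensus clusters: hierarchical clustering on 1 - similarity,
## using the same linkage as the clusteringMethod default
tree <- hclust(as.dist(1 - similarity), method="ward.D2")
consensus <- cutree(tree, k=2)
print(consensus)
```

In this toy case the third cell is assigned differently by one of the three solutions, yet it still ends up with the first two cells in the consensus.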

This wrapper function performs the following steps:

1) Building the single-cell RNA-Seq object. See ?scRNAseq-class.
2) Performing the normalization. See ?normaliseCountMatrix.
3) Calculating all tSNEs. See ?generateTSNECoordinates.
4) Clustering with DbScan. See ?runDBSCAN.
5) Computing the cells similarity matrix. See ?clusterCellsInternal.
6) Computing the clusters similarity matrix. If clusToAdd is not NA, add the provided clustering. See ?calculateClustersSimilarity and ?addClustering.
7) Ranking genes. See ?rankGenes.
8) Getting marker genes. See ?retrieveTopClustersMarkers.
9) Getting genes info. See ?retrieveGenesInfo.
10) Plot the cell similarity matrix. See ?plotCellSimilarity.
11) Plot clustered tSNE. See ?plotClusteredTSNE.
12) Plot the cell heatmap. See ?plotCellHeatmap.
13) Plot the clusters similarity heatmap. See ?plotClustersSimilarity.
14) Exporting all results to outputDirectory if exportAllResults=TRUE. See ?exportResults.
15) Return an object containing all the results provided by CONCLUS.
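
For step 6, a clustering supplied through clusToAdd has to be a data frame with the two columns 'clusters' and 'cells'. The sketch below builds such a table from made-up labels and checks its shape. The helper isValidClusToAdd is hypothetical and not part of conclus, and the sketch assumes that the cell names should match those of the count matrix, which this page does not state explicitly.

```r
## Hypothetical labels from another clustering tool, one per cell
cellNames <- paste0("cell", 1:6)
externalLabels <- c(1, 1, 2, 2, 3, 3)

## Two columns named 'clusters' and 'cells', as described for clusToAdd
clusToAdd <- data.frame(clusters=externalLabels, cells=cellNames,
                        stringsAsFactors=FALSE)

## Illustrative check (not part of conclus): expected columns, unique cells
isValidClusToAdd <- function(df) {
    is.data.frame(df) &&
        setequal(colnames(df), c("clusters", "cells")) &&
        !anyDuplicated(df$cells)
}

stopifnot(isValidClusToAdd(clusToAdd))
```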

If exportAllResults=TRUE, the sub-folder 'pictures' of your outputDirectory contains all tSNE with dbscan coloration (sub-folder 'tSNE_pictures'), the cell similarity matrix ('Test_cells_correlation_X_clusters.pdf'), the cell heatmap ('Test_clustersX_meanCenteredTRUE_orderClustersFALSE_orderGenesFALSE markrsPerCluster.pdf'), and the cluster similarity matrix ('Test_clusters_similarity_10_clusters.pdf'). You will also find in the sub-folder 'Results':

+ '1_MatrixInfo': The normalized count matrix and its meta-data for both rows and columns.
+ '2_TSNECoordinates': The tSNE coordinates for each parameter of principal components (PCs) and perplexities.
+ '3_dbScan': The different clusters given by DBscan according to different parameters. Each file gives a cluster number for each cell.
+ '4_CellSimilarityMatrix': The matrix underlying the cells similarity heatmap.
+ '5_ClusterSimilarityMatrix': The matrix underlying the clusters similarity heatmap.
+ '6_ConclusResult': A table containing the result of the consensus clustering. This table contains two columns: clusters and cells.
+ '7_fullMarkers': Files containing markers for each cluster, defined by the consensus clustering.
+ '8_TopMarkers': Files containing the top markers for each cluster (10 per cluster with the default nTopMarkers).
+ '9_genesInfos': Files containing gene information for the top markers defined in the previous folder.
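
After a run, the presence of these sub-folders can be checked with base R. The helper checkResultsFolders below is hypothetical and not part of conclus. Because this page does not spell out the full path of 'Results' relative to outputDirectory, the helper takes that path as an explicit argument. The demonstration uses a mock directory tree rather than real CONCLUS output.

```r
## Sub-folders of 'Results' listed above
expectedFolders <- c("1_MatrixInfo", "2_TSNECoordinates", "3_dbScan",
                    "4_CellSimilarityMatrix", "5_ClusterSimilarityMatrix",
                    "6_ConclusResult", "7_fullMarkers", "8_TopMarkers",
                    "9_genesInfos")

## Hypothetical helper: report which expected sub-folders exist
checkResultsFolders <- function(resultsDir, folders=expectedFolders) {
    present <- dir.exists(file.path(resultsDir, folders))
    names(present) <- folders
    present
}

## Demonstration on a mock tree holding only the first three sub-folders
mockResults <- file.path(tempfile("conclusDemo"), "Results")
for (f in expectedFolders[1:3])
    dir.create(file.path(mockResults, f), recursive=TRUE)

status <- checkResultsFolders(mockResults)
print(names(status)[!status])  # sub-folders missing from the mock tree

## Remove the mock tree
unlink(dirname(mockResults), recursive=TRUE)
```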

Value

An scRNAseq object containing the similarity matrices and the marker genes.

Author(s)

Nicolas Descostes

Examples

experimentName <- "Bergiers"
outputDirectory <- "YourOutputDirectory"
species <- "mouse"

## Load the count matrix
countmatrixPath <- system.file("extdata/countMatrix.tsv", package="conclus")
countMatrix <- loadDataOrMatrix(file=countmatrixPath, type="countMatrix",
                                ignoreCellNumber=TRUE)

## Load the coldata
coldataPath <- system.file("extdata/colData.tsv", package="conclus")
columnsMetaData <- loadDataOrMatrix(file=coldataPath, type="coldata",
                                    columnID="cell_ID")

## Use runCONCLUS
## These parameters are tweaked to fit our example data and reduce
## computing time. Please consider using the default parameters or
## parameters adjusted to your dataset.
scr <- runCONCLUS(outputDirectory, experimentName, countMatrix, species,
        columnsMetaData=columnsMetaData, perplexities=c(2,3), tSNENb=1,
        PCs=c(4,5,6,7,8,9,10), epsilon=c(380, 390, 400), minPoints=c(2,3),
        clusterNumber=2)

## Remove the results
unlink(outputDirectory, recursive=TRUE)


ilyessr/conclus documentation built on April 8, 2022, 1:43 p.m.