UniformClusters | R Documentation |
These functions take a COTAN object as input and handle the task of
dividing the dataset into Uniform Clusters, that is, clusters with
homogeneous gene expression. This condition is checked by calculating
the GDI of the cluster and verifying that no more than a small fraction
of the genes have their GDI level above the given GDIThreshold.
GDIPlot(
objCOTAN,
genes,
condition = "",
statType = "S",
GDIThreshold = 1.43,
GDIIn = NULL
)
cellsUniformClustering(
objCOTAN,
GDIThreshold = 1.43,
ratioAboveThreshold = 0.01,
cores = 1L,
maxIterations = 25L,
optimizeForSpeed = TRUE,
deviceStr = "cuda",
initialClusters = NULL,
initialResolution = 0.8,
useDEA = TRUE,
distance = NULL,
hclustMethod = "ward.D2",
saveObj = TRUE,
outDir = "."
)
isClusterUniform(
GDIThreshold,
ratioAboveThreshold,
ratioQuantile,
fractionAbove,
usedGDIThreshold,
usedRatioAbove
)
checkClusterUniformity(
objCOTAN,
clusterName,
cells,
GDIThreshold = 1.43,
ratioAboveThreshold = 0.01,
cores = 1L,
optimizeForSpeed = TRUE,
deviceStr = "cuda",
saveObj = TRUE,
outDir = "."
)
mergeUniformCellsClusters(
objCOTAN,
clusters = NULL,
GDIThreshold = 1.43,
ratioAboveThreshold = 0.01,
batchSize = 0L,
allCheckResults = data.frame(),
cores = 1L,
optimizeForSpeed = TRUE,
deviceStr = "cuda",
useDEA = TRUE,
distance = NULL,
hclustMethod = "ward.D2",
saveObj = TRUE,
outDir = "."
)
objCOTAN |
a COTAN object |
genes |
a named |
condition |
a string corresponding to the condition/sample (it is used only for the title). |
statType |
type of statistic to be used. Default is "S": Pearson's chi-squared test statistics. "G" is G-test statistics |
GDIThreshold |
the threshold level that discriminates uniform clusters.
It defaults to 1.43 |
GDIIn |
when the |
ratioAboveThreshold |
the fraction of genes allowed to be above the GDIThreshold |
cores |
number of cores to use. Default is 1. |
maxIterations |
maximum number of re-clustering iterations. It defaults to 25 |
optimizeForSpeed |
Boolean; when |
deviceStr |
On the |
initialClusters |
an existing clusterization to use as starting point: the clusters deemed uniform will be kept and the rest processed as normal |
initialResolution |
a number indicating how refined the clusters are
before checking for uniformity. It defaults to 0.8 |
useDEA |
Boolean indicating whether to use the DEA to define the distance; alternatively it will use the average Zero-One counts, that is faster but less precise. |
distance |
type of distance to use. Default is |
hclustMethod |
The default is "ward.D2" |
saveObj |
Boolean flag; when |
outDir |
an existing directory for the analysis output. The effective output will be placed in a sub-folder. |
ratioQuantile |
the |
fractionAbove |
the fraction of genes above the |
usedGDIThreshold |
the threshold level actually used to calculate the fourth argument |
usedRatioAbove |
the fraction of genes actually used to calculate the third argument |
clusterName |
the tag of the cluster |
cells |
the cells belonging to the cluster |
clusters |
The clusterization to merge. If not given the last available clusterization will be used, as it is probably the most significant! |
batchSize |
Number of pairs to test in a single round. If none of them
succeeds the merge stops. Defaults to |
allCheckResults |
An optional |
GDIPlot() directly evaluates and plots the GDI for a sample.
cellsUniformClustering() finds a uniform clusterization by means of the
GDI. Once a preliminary clusterization is obtained from the Seurat-package
methods, each cluster is checked for uniformity via the function
checkClusterUniformity(). Once all clusters are checked, all cells from
the non-uniform clusters are pooled together for another iteration of the
entire process, until all clusters are deemed uniform. If only a few
cells are left out (≤ 50), those are flagged as "-1" and the process is
stopped.
isClusterUniform() takes in the current thresholds and uses them to
check whether the calculated cluster parameters are sufficient to
determine whether the cluster is uniform and, if so, returns the
corresponding answer.
checkClusterUniformity() takes a COTAN object and a cluster of cells and
checks whether the latter is uniform by GDI. The function runs COTAN to
check whether the GDI is lower than the given GDIThreshold (1.43 by
default) for all but at most ratioAboveThreshold (1% by default) of the
genes. If the GDI turns out to be too high for too many genes, the
cluster is deemed non-uniform.
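The criterion above can be illustrated on a plain numeric vector of per-gene GDI values. This is only a minimal sketch in base R: the helper name sketchUniformityCheck() is hypothetical and not part of COTAN, it assumes the GDI values have already been computed, and the real checkClusterUniformity() may differ in details such as how values exactly at the threshold are treated.

```r
# Hypothetical helper (NOT a COTAN function): applies the uniformity
# criterion to an already computed numeric vector of per-gene GDI values.
sketchUniformityCheck <- function(gdi,
                                  GDIThreshold = 1.43,
                                  ratioAboveThreshold = 0.01) {
  # fraction of genes whose GDI lies strictly above the threshold
  fractionAbove <- mean(gdi > GDIThreshold)
  # GDI level at the high quantile associated with the allowed ratio
  ratioQuantile <- unname(quantile(gdi, probs = 1 - ratioAboveThreshold))
  list(
    isUniform     = fractionAbove <= ratioAboveThreshold,
    fractionAbove = fractionAbove,
    ratioQuantile = ratioQuantile
  )
}

# Toy data: 200 genes, only one of them above the threshold
toyGDI <- c(rep(1.2, 199L), 1.6)
toyResult <- sketchUniformityCheck(toyGDI)
print(toyResult[["isUniform"]])
```

With the toy vector only 0.5% of the genes exceed 1.43, so the sketch reports the cluster as uniform; raising the share of high-GDI genes above 1% would flip the answer.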
mergeUniformCellsClusters() takes in a uniform clusterization and
iteratively checks whether merging two nearby clusters would still form
a uniform cluster. Multiple thresholds will be used, from 1.37 up to the
given one, in order to prioritize merging the best-fitting pairs.
This function uses the cosine distance to establish the nearest cluster
pairs. It will use the checkClusterUniformity() function to check
whether the merged clusters are uniform. The function will stop once
no tested pairs of clusters are mergeable after testing all pairs in a
single batch.
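As an illustration of how a cosine distance can rank candidate pairs, the sketch below uses base R only. The cluster profiles are made-up numbers, the helper cosineDist() is hypothetical, and nothing here is taken from the COTAN implementation; it just shows the general technique of computing pairwise cosine distances and feeding them to stats::hclust() with the "ward.D2" method.

```r
# Made-up cluster profiles: one row per cluster, one column per feature
profiles <- rbind(A = c(1, 0, 2),
                  B = c(2, 0, 4.1),
                  C = c(0, 3, 0.5))

# Hypothetical helper: pairwise cosine distance between the rows of a matrix
cosineDist <- function(m) {
  norms <- sqrt(rowSums(m^2))
  similarity <- (m %*% t(m)) / (norms %o% norms)
  as.dist(1 - similarity)
}

d <- cosineDist(profiles)

# Hierarchical clustering on the cosine distances
tree <- hclust(d, method = "ward.D2")

# The closest pair is the first candidate one would test for merging
dm <- as.matrix(d)
diag(dm) <- Inf
idx <- which(dm == min(dm), arr.ind = TRUE)[1L, ]
closestPair <- rownames(dm)[idx]
print(closestPair)
```

In this toy case clusters A and B point in almost the same direction, so they form the closest pair and would be the first candidates for a merge check.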
GDIPlot() returns a ggplot2 object with a point for each gene, where the
ordinates are the GDI levels and the abscissas are the average gene
expression (log scaled). Also marked are the given threshold (in red)
and the 50% and 75% quantiles (in blue).
cellsUniformClustering() returns a list with 2 elements:
"clusters": the newly found cluster labels array
"coex": the associated COEX data.frame
isClusterUniform() returns a single Boolean value when it is possible to
decide the answer with the given information and NA otherwise.
checkClusterUniformity() returns a list with:
"isUniform": a flag indicating whether the cluster is uniform
"fractionAbove": the fraction of genes with GDI above the threshold
"ratioQuantile": the GDI level at the high quantile associated with the given ratio
"size": the number of cells in the cluster
"GDIThreshold": the GDI threshold used
"ratioAboveThreshold": the fraction of genes above the threshold allowed in uniform clusters
mergeUniformCellsClusters() returns a list with:
"clusters": the merged cluster labels array
"coex": the associated COEX data.frame
data("test.dataset")
objCOTAN <- automaticCOTANObjectCreation(raw = test.dataset,
GEO = "S",
sequencingMethod = "10X",
sampleCondition = "Test",
cores = 6L,
saveObj = FALSE)
groupMarkers <- list(G1 = c("g-000010", "g-000020", "g-000030"),
G2 = c("g-000300", "g-000330"),
G3 = c("g-000510", "g-000530", "g-000550",
"g-000570", "g-000590"))
gdiPlot <- GDIPlot(objCOTAN, genes = groupMarkers, condition = "test")
plot(gdiPlot)
## Here we override the default GDI threshold as a way to speed up
## calculations, as a higher threshold implies less stringent uniformity.
## In real applications it might be appropriate to change the threshold
## in cases of relatively low gene/cell numbers, or in cases when a
## rough clusterization is needed in the early stages of the analysis
##
splitList <- cellsUniformClustering(objCOTAN, cores = 6L,
optimizeForSpeed = TRUE,
deviceStr = "cuda",
initialResolution = 0.8,
GDIThreshold = 1.46, saveObj = FALSE)
clusters <- splitList[["clusters"]]
firstCluster <- getCells(objCOTAN)[clusters %in% clusters[[1L]]]
firstClusterIsUniform <-
checkClusterUniformity(objCOTAN, GDIThreshold = 1.46,
ratioAboveThreshold = 0.01,
clusterName = clusters[[1L]], cells = firstCluster,
cores = 6L, optimizeForSpeed = TRUE,
deviceStr = "cuda", saveObj = FALSE)[["isUniform"]]
objCOTAN <- addClusterization(objCOTAN,
clName = "split",
clusters = clusters)
objCOTAN <- addClusterizationCoex(objCOTAN,
clName = "split",
coexDF = splitList[["coex"]])
identical(reorderClusterization(objCOTAN)[["clusters"]], clusters)
mergedList <- mergeUniformCellsClusters(objCOTAN,
GDIThreshold = 1.43,
ratioAboveThreshold = 0.02,
batchSize = 2L,
clusters = clusters,
cores = 6L,
optimizeForSpeed = TRUE,
deviceStr = "cpu",
distance = "cosine",
hclustMethod = "ward.D2",
saveObj = FALSE)
objCOTAN <- addClusterization(objCOTAN,
clName = "merged",
clusters = mergedList[["clusters"]],
coexDF = mergedList[["coex"]])
identical(reorderClusterization(objCOTAN), mergedList)