optimize: Dubious cells detector under tSNE and UMAP for a single...
In JSB-UCLA/scDED: Single-cell Dubious Embedding Detector

View source: R/scDEED.R

optimize

R Documentation

Dubious cells detector under tSNE and UMAP for a single hyperparameter setting

Description

A wrapper function for the scDEED method. It is similar to the function scDEED, but only check_duplicates and rerun have defaults, and it can only handle one hyperparameter setting. Additionally, results.PCA must be provided (in scDEED, this is calculated internally). You can obtain the same results by running scDEED at a single hyperparameter setting.

Usage

optimize(input_data, input_data.permuted, pre_embedding, reduction.method, K,
                    n, m, perplexity, results.PCA, similarity_percent, dubious_cutoff,
                    trustworthy_cutoff, check_duplicates = T, rerun = T)
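
For illustration, a call with every argument spelled out might look like the sketch below. The wrapper name `run_single_setting` and its argument names are hypothetical. The namespace prefix assumes the installed package is named `scDEED` and exports `optimize`; qualifying the call is prudent because base R's stats package also provides a function called `optimize`. The values 0.5, 0.05 and 0.95 are the ones mentioned in the argument descriptions. This page does not say whether `n` and `m` are ignored under 'tsne' (or `perplexity` under 'umap'), so the sketch passes all three.

```r
# Hypothetical convenience wrapper: fixes the cutoffs and the embedding method,
# and passes everything else through to optimize() by name.
run_single_setting <- function(seurat_obj, seurat_obj_permuted, distances,
                               pre_embedding, K, n, m, perplexity) {
  # Assumption: the package namespace is 'scDEED' and optimize() is exported.
  # The prefix avoids picking up stats::optimize by accident.
  scDEED::optimize(
    input_data          = seurat_obj,           # Seurat object
    input_data.permuted = seurat_obj_permuted,  # Seurat object with permuted data
    pre_embedding       = pre_embedding,
    reduction.method    = "umap",
    K                   = K,
    n                   = n,                    # n.neighbors in UMAP
    m                   = m,                    # min.dist in UMAP
    perplexity          = perplexity,           # perplexity in t-SNE
    results.PCA         = distances,            # named list of distance matrices
    similarity_percent  = 0.5,
    dubious_cutoff      = 0.05,
    trustworthy_cutoff  = 0.95,
    check_duplicates    = TRUE,
    rerun               = TRUE
  )
}
```

Defining the wrapper does not load the package; the call is only resolved when `run_single_setting` is invoked with real Seurat objects.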

Arguments

`input_data`	a Seurat object
`input_data.permuted`	a Seurat object containing permuted data
`pre_embedding`	the slot to use as input for t-SNE and UMAP. If users would like to use a different pre-embedding space, they can add this to the Seurat object and specify the name here.
`reduction.method`	Which dimension reduction method to use; currently the package is only set up for 'tsne' or 'umap'
`K`	number of principal components
`n`	input for the n.neighbors parameter in UMAP
`m`	input for the min.dist parameter in UMAP
`perplexity`	input for the perplexity parameter in tSNE
`results.PCA`	A named list containing the cell-cell distance matrices calculated in the pre-embedding space for the original (name = "pre_embedding_distances") and permuted data (name = "pre_embedding_distances_permuted").
`similarity_percent`	The fraction of cells to consider in the similarity score calculations (scDEED's default is 0.5; optimize has no default, so a value must be supplied). scDEED uses the nearest floor(number of cells * similarity_percent) neighbors in the similarity score calculations. Intuitively, a higher similarity_percent considers more cells as neighbors (emphasis on global preservation), while a lower similarity_percent considers fewer cells (emphasis on local preservation).
`dubious_cutoff`	The cutoff for dubious cells (scDEED's default is 0.05; a value must be supplied here). Cells with scores worse (lower) than the dubious_cutoff percentile of null scores will be considered dubious. A lower dubious_cutoff means that cells must have lower scores to be considered dubious. A higher dubious_cutoff means that cells can score higher and still be considered dubious. It plays a role similar to the significance level in hypothesis testing.
`trustworthy_cutoff`	The cutoff for trustworthy cells (scDEED's default is 0.95; a value must be supplied here). Cells with scores better (higher) than the trustworthy_cutoff percentile of null scores will be considered trustworthy. A lower trustworthy_cutoff means that cells do not have to score as high to be considered trustworthy. A higher trustworthy_cutoff means that cells need to score higher to be considered trustworthy. It plays a role similar to the significance level in hypothesis testing.
`check_duplicates`	This is an argument to `Seurat::RunTSNE`. Default = T. If there are duplicates in the data, t-SNE will not proceed. If the user believes there are true biological duplicates in the data, they may change this setting to F.
`rerun`	This is a time-saving argument (default = T). If the user has already performed dimension reduction and would only like to check the results of that dimension reduction, they can use rerun=F so that scDEED does not re-run the embedding method on the data. In most cases rerun=T is appropriate: when optimizing hyperparameters, the function needs to re-run the embedding method for each setting.
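
The roles of `similarity_percent`, `dubious_cutoff` and `trustworthy_cutoff` can be illustrated with a small base-R sketch. This is not scDEED's implementation: the helper names are hypothetical, the scores are made up, and the use of `stats::quantile` with its default interpolation is an assumption about how the percentile of the null scores is taken.

```r
# Number of nearest neighbors used in the similarity score,
# following floor(number of cells * similarity_percent).
n_similarity_neighbors <- function(n_cells, similarity_percent) {
  floor(n_cells * similarity_percent)
}

# Label each cell by comparing its score with quantiles of the null scores.
# Hypothetical helper; the real package may compute thresholds differently.
classify_cells <- function(scores, null_scores, dubious_cutoff, trustworthy_cutoff) {
  lower <- unname(stats::quantile(null_scores, probs = dubious_cutoff))
  upper <- unname(stats::quantile(null_scores, probs = trustworthy_cutoff))
  labels <- rep("intermediate", length(scores))
  labels[scores < lower] <- "dubious"       # worse than the lower threshold
  labels[scores > upper] <- "trustworthy"   # better than the upper threshold
  labels
}

# Made-up null scores (evenly spaced on [0, 1]) and three made-up cell scores.
null_scores <- seq(0, 1, by = 0.01)
cell_scores <- c(0.01, 0.50, 0.99)
cell_labels <- classify_cells(cell_scores, null_scores,
                              dubious_cutoff = 0.05, trustworthy_cutoff = 0.95)
neighbors_half    <- n_similarity_neighbors(2000, 0.5)
neighbors_quarter <- n_similarity_neighbors(2000, 0.25)
```

With these toy inputs, raising `dubious_cutoff` moves the lower threshold up, so more cells fall into the dubious group; raising `trustworthy_cutoff` moves the upper threshold up, so fewer cells qualify as trustworthy.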

Value

a vector of 4 items containing: (1) the number of dubious cells; (2) the indices of the dubious cells, separated by commas; (3) the indices of the trustworthy cells, separated by commas; (4) the indices of the intermediate cells, separated by commas

If one of the categories (dubious, trustworthy, or intermediate) is empty, the entry is 'none'
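
Because the indices come back as comma-separated strings, a small parser can make the result easier to work with. The sketch below is a hypothetical helper, not part of the package; it assumes the indices are integers and that an empty category is reported exactly as 'none'. The example vector is made up for illustration.

```r
# Hypothetical helper: turn the 4-item result into a named list.
parse_optimize_result <- function(res) {
  stopifnot(length(res) == 4)
  parse_indices <- function(x) {
    x <- trimws(as.character(x))
    if (identical(x, "none")) return(integer(0))  # empty category
    as.integer(trimws(strsplit(x, ",", fixed = TRUE)[[1]]))
  }
  list(
    n_dubious    = as.integer(res[[1]]),
    dubious      = parse_indices(res[[2]]),
    trustworthy  = parse_indices(res[[3]]),
    intermediate = parse_indices(res[[4]])
  )
}

# Made-up result in the documented shape: 2 dubious cells, no intermediate cells.
example_res <- c("2", "3,7", "1,2,4,5", "none")
parsed <- parse_optimize_result(example_res)
```

The parsed index vectors can then be used directly for subsetting, for example to highlight dubious cells in a plot of the embedding.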

JSB-UCLA/scDED documentation built on Feb. 8, 2025, 11:12 a.m.