In dputhier/dbfmcl: The scigenex package (Single-Cell Informative GENe Explorer)

select_genes

R Documentation

Selects informative genes based on k-nearest neighbour analysis.

Description

This function selects genes based on k-nearest neighbour analysis. It takes a Seurat object or a gene expression matrix as input and computes, for each gene/feature, the distance to its k-th nearest neighbour (DKNN). A DKNN threshold is then set based on permutation analysis and FDR computation.
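
The idea described above can be illustrated with a small base R sketch. It is not the scigenex implementation: the toy matrix, the helper names `dknn` and `fdr_at`, the row-wise shuffling used as the permutation step, and the way the threshold is picked are all assumptions made for illustration. The way the package builds its simulated DKNN values (see `noise_level`) may differ.

```r
# Illustrative sketch only; not the scigenex implementation.
set.seed(123)

# Toy expression matrix: 60 genes (rows) x 30 cells (columns).
# The first 20 genes share a common signal; the others are pure noise.
n_genes <- 60
n_cells <- 30
signal <- rnorm(n_cells)
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes)
expr[1:20, ] <- expr[1:20, ] * 0.3 +
  matrix(signal, nrow = 20, ncol = n_cells, byrow = TRUE)
rownames(expr) <- paste0("gene", seq_len(n_genes))

# Hypothetical helper: distance to the k-th nearest neighbour of each gene,
# using 1 - Pearson correlation as the distance.
dknn <- function(mat, k) {
  d <- 1 - cor(t(mat), method = "pearson")
  diag(d) <- Inf  # a gene is not its own neighbour
  apply(d, 1, function(x) sort(x)[k])
}

k <- 5
observed <- dknn(expr, k)

# Assumed permutation step: shuffle each gene across cells independently,
# which destroys gene-gene co-expression.
permuted <- t(apply(expr, 1, sample))
simulated <- dknn(permuted, k)

# Hypothetical helper: empirical FDR at DKNN threshold t, i.e. the number of
# simulated values passing the threshold over the number of observed ones.
fdr_at <- function(t) {
  n_obs <- sum(observed <= t)
  if (n_obs == 0) return(0)
  min(1, sum(simulated <= t) / n_obs)
}

# Keep the largest observed DKNN whose estimated FDR stays under the target.
target_fdr <- 0.05
candidates <- sort(observed)
ok <- vapply(candidates, fdr_at, numeric(1)) <= target_fdr
threshold <- if (any(ok)) max(candidates[ok]) else -Inf
selected <- names(observed)[observed <= threshold]
```

Genes that belong to a co-expressed group have close neighbours and therefore a small DKNN, whereas isolated (noisy) genes have a large one, which is why a threshold on DKNN can separate the two.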

Usage

select_genes(
  data = NULL,
  distance_method = c("pearson", "cosine", "euclidean", "spearman", "kendall"),
  noise_level = 5e-05,
  k = 80,
  row_sum = 1,
  fdr = 5e-05,
  which_slot = c("data", "sct", "counts"),
  no_dknn_filter = FALSE,
  no_anti_cor = FALSE,
  seed = 123
)

Arguments

`data`	A matrix, data.frame or Seurat object.
`distance_method`	A character string indicating the method for computing distances (one of "pearson", "cosine", "euclidean", "spearman" or "kendall").
`noise_level`	Controls the fraction of genes with high DKNN (i.e. noise) whose neighborhood (i.e. associated distances) is used to compute simulated DKNN values. A value of 0 means that all genes are used. A value close to 1 means that only genes with high DKNN (i.e. close to noise) are used.
`k`	An integer specifying the size of the neighborhood.
`row_sum`	A feature/gene whose row sum is below this threshold will be discarded. Use -Inf to keep all genes.
`fdr`	A numeric value indicating the false discovery rate threshold (range: 0 to 1).
`which_slot`	A character string indicating which slot to use from the input scRNA-seq object (one of "data", "sct" or "counts").
`no_dknn_filter`	A logical indicating whether to skip the k-nearest-neighbour (DKNN) filter. If TRUE, all genes are kept for the next steps.
`no_anti_cor`	If TRUE, correlations below 0 are set to zero (applies to "pearson", "cosine", "spearman" and "kendall"). This may increase the relative weight of positive correlations (as true anti-correlation may be rare).
`seed`	An integer specifying the random seed to use.
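
As a rough illustration of what `row_sum` and `no_anti_cor` describe, here is a minimal base R sketch. It is an assumption about the general idea, not the package's code: the toy matrix and the variable names `kept`, `r_clamped` and `d` are hypothetical, and only the Pearson case is shown.

```r
# Illustrative sketch only; not the scigenex implementation.
set.seed(1)

# Toy matrix: 7 expressed genes and 1 unexpressed gene, 6 cells.
expr <- rbind(
  matrix(1 + rexp(7 * 6), nrow = 7),
  rep(0, 6)
)
dimnames(expr) <- list(paste0("gene", 1:8), paste0("cell", 1:6))

# row_sum: genes whose row sum is below the threshold are discarded.
row_sum <- 1
kept <- expr[rowSums(expr) >= row_sum, , drop = FALSE]

# Correlation between genes (rows).
r <- cor(t(kept), method = "pearson")

# no_anti_cor = TRUE: correlations below 0 are set to zero before
# converting to a distance, so anti-correlated genes are treated
# like uncorrelated ones.
r_clamped <- pmax(r, 0)
d <- 1 - r_clamped
```

With the clamping applied, the correlation-based distance lies between 0 and 1 instead of between 0 and 2.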

Value

A ClusterSet object.

Author(s)

Julie Bavais, Sebastien Nin, Lionel Spinelli and Denis Puthier

References

- Lopez F., Textoris J., Bergon A., Didier G., Remy E., Granjeaud S., Imbert J., Nguyen C. and Puthier D. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE, 2008;3(12):e4001.

Examples


# Restrict verbosity to info messages only.
set_verbosity(1)

# Load a dataset
load_example_dataset("7871581/files/pbmc3k_medium")

# Select informative genes
res <- select_genes(pbmc3k_medium,
                    distance_method = "pearson",
                    row_sum = 5)

# Result is a ClusterSet object
is(res)
slotNames(res)

# The selected genes
nrow(res)
head(row_names(res))

dputhier/dbfmcl documentation built on April 17, 2025, 4:41 a.m.