select_genes: Selects informative genes based on k-nearest neighbour...

View source: R/select_genes.R

select_genes    R Documentation

Selects informative genes based on k-nearest neighbour analysis.

Description

This function selects genes based on k-nearest neighbour analysis. It takes a Seurat object or a gene expression matrix as input and computes, for each gene/feature, the distance to its k-nearest neighbour (DKNN). A selection threshold on DKNN is then set based on permutation analysis and FDR computation.
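The DKNN idea can be illustrated with a minimal base R sketch on a toy matrix. This is not the package's implementation: the `1 - r` distance transformation, the reading of DKNN as the distance to the k-th nearest neighbour, and the names `expr`, `dist_mat` and `dknn` are all assumptions made for illustration.

```r
# Toy expression matrix: 6 genes (rows) x 8 cells (columns).
set.seed(123)
expr <- matrix(rnorm(6 * 8), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8)))

# Gene-to-gene distance derived from Pearson correlation
# (assumption: distance = 1 - r).
dist_mat <- 1 - cor(t(expr), method = "pearson")

# For each gene, the distance to its k-th nearest neighbour,
# excluding the gene itself (assumed definition of DKNN).
k <- 2
dknn <- vapply(seq_len(nrow(dist_mat)),
               function(i) sort(dist_mat[i, -i])[[k]],
               numeric(1))
names(dknn) <- rownames(dist_mat)

# Genes in dense neighbourhoods have low DKNN values.
print(round(dknn, 3))
```

A gene that belongs to a group of co-expressed genes is expected to have a small DKNN, whereas a gene behaving like noise has distant neighbours and a large DKNN.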

Usage

select_genes(
  data = NULL,
  distance_method = c("pearson", "cosine", "euclidean", "spearman", "kendall"),
  noise_level = 5e-05,
  k = 80,
  row_sum = 1,
  fdr = 0.005,
  which_slot = c("data", "sct", "counts"),
  no_dknn_filter = FALSE,
  no_anti_cor = FALSE,
  seed = 123
)

Arguments

data

A matrix, data.frame or Seurat object.

distance_method

A character string indicating the method for computing distances (one of "pearson", "cosine", "euclidean", "spearman" or "kendall").

noise_level

This parameter controls the fraction of genes with high DKNN (i.e. noise) whose neighborhood (i.e. associated distances) will be used to compute simulated DKNN values. A value of 0 means all genes are used. A value close to 1 means only genes with high DKNN (i.e. close to noise) are used.

k

An integer specifying the size of the neighborhood.

row_sum

A feature/gene whose row sum is below this threshold will be discarded. Use -Inf to keep all genes.
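As a hedged illustration of the `row_sum` filter described above, the base R sketch below drops genes whose row sum is below the threshold. The toy matrix `counts` and the use of `>=` to keep genes at exactly the threshold are assumptions, not the package's code.

```r
# Toy count matrix: 3 genes (rows) x 3 cells (columns).
counts <- matrix(c(0, 0, 1,
                   2, 3, 4,
                   0, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"), NULL))

# Discard genes whose row sum is below the threshold
# (assumption: a gene exactly at the threshold is kept).
row_sum <- 1
kept <- counts[rowSums(counts) >= row_sum, , drop = FALSE]

# With -Inf, no gene can fall below the threshold, so all are kept.
kept_all <- counts[rowSums(counts) >= -Inf, , drop = FALSE]
```

Here `geneC`, which is never detected, is the only gene removed with `row_sum = 1`.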

fdr

A numeric value indicating the false discovery rate threshold (range: 0 to 100).

which_slot

A character string indicating which slot to use from the input scRNA-seq object (one of "data", "sct" or "counts").

no_dknn_filter

A logical indicating whether to skip the k-nearest-neighbour (DKNN) filter. If TRUE, all genes are kept for the next steps.

no_anti_cor

If TRUE, correlations below 0 are set to zero (applies to "pearson", "cosine", "spearman" and "kendall"). This may increase the relative weight of positive correlations (as true anti-correlation may be rare).
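The effect of `no_anti_cor = TRUE` can be sketched in base R as clipping negative correlations to zero before converting them to distances. The toy matrix `cor_mat` and the `1 - r` distance are assumptions made for illustration, not the package's internal code.

```r
# Toy gene-gene correlation matrix with some negative values.
cor_mat <- matrix(c( 1.0, -0.4,  0.7,
                    -0.4,  1.0, -0.1,
                     0.7, -0.1,  1.0),
                  nrow = 3, byrow = TRUE)

# Set correlations below 0 to zero; pmax() keeps the matrix dimensions.
clipped <- pmax(cor_mat, 0)

# Assumed distance transformation: anti-correlated gene pairs now
# sit at distance 1, like uncorrelated pairs.
dist_clipped <- 1 - clipped
```

After clipping, anti-correlated and uncorrelated gene pairs are treated alike, so only positive correlations bring genes closer together.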

seed

An integer specifying the random seed to use.

Value

A ClusterSet object.

Author(s)

Julie Bavais, Sebastien Nin, Lionel Spinelli and Denis Puthier

References

- Lopez F., Textoris J., Bergon A., Didier G., Remy E., Granjeaud S., Imbert J., Nguyen C. and Puthier D. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database. PLoS ONE, 2008;3(12):e4001.

Examples


# Restrict verbosity to info messages only.
set_verbosity(1)

# Load a dataset
load_example_dataset("7871581/files/pbmc3k_medium")

# Select informative genes
res <- select_genes(pbmc3k_medium,
                    distance_method = "pearson",
                    row_sum = 5)

# Result is a ClusterSet object
is(res)
slotNames(res)

# The selected genes
nrow(res)
head(row_names(res))


dputhier/scigenex documentation built on May 31, 2024, 8:59 a.m.