gene_search: gene_search

View source: R/gene_search.R

gene_search    R Documentation

gene_search

Description

Main function of the package. Returns the optimal gene library of the selected size.

Usage

gene_search(
  sce,
  genes_base = NULL,
  n_genes_total,
  batch = NULL,
  n.neigh = 5,
  p.minkowski = 3,
  nPC.selection = NULL,
  nPC.all = 50,
  genes.discard = NULL,
  genes.discard_prefix = NULL,
  verbose = TRUE,
  stat_all = NULL
)

Arguments

sce

SingleCellExperiment object containing gene counts matrix (stored in 'logcounts' assay).

genes_base

Character vector specifying the base genes used to construct the first Selection graph. Default genes_base=NULL, i.e. no base genes are supplied.

n_genes_total

Scalar specifying total number of genes to be selected (this includes base genes).

batch

Name of the field in colData(sce) that specifies the batch. Default batch=NULL, i.e. no batch is used.

n.neigh

Positive integer > 1, specifying number of neighbors to use for kNN-graph. Default n.neigh=5.

p.minkowski

Order of Minkowski distance. Default p.minkowski=3.
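
For reference, the Minkowski distance of order p between two vectors x and y is (sum_i |x_i - y_i|^p)^(1/p). The following is a minimal base-R sketch on made-up vectors. It only illustrates the distance itself and makes no claim about how gene_search applies it internally.

```r
# Toy vectors, made up for illustration.
x = c(1, 2, 3)
y = c(2, 4, 6)
p = 3  # same order as the documented default p.minkowski = 3

# Direct evaluation of the formula.
d_manual = sum(abs(x - y)^p)^(1/p)

# The same quantity via stats::dist.
d_builtin = as.numeric(dist(rbind(x, y), method = "minkowski", p = p))
```

With p = 2 this reduces to the Euclidean distance; larger p gives more weight to the largest coordinate-wise differences.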

nPC.selection

Scalar specifying the number of PCs to use for Selection graphs. Default nPC.selection=NULL. We advise setting it to 50 if length(genes.selection) > 50.

nPC.all

Scalar specifying the number of PCs to use for the True graph. Default nPC.all=50.

genes.discard

Character vector containing genes to be excluded from the candidates. Note that they will still be used for graph construction; to exclude them from graph construction as well, remove them from the sce object beforehand. Default genes.discard=NULL, i.e. no genes are discarded.

genes.discard_prefix

Character vector containing prefixes of genes to be excluded (e.g. Rpl for L ribosomal proteins). Note that they will still be used for graph construction; to exclude them from graph construction as well, remove them from the sce object beforehand. Default genes.discard_prefix=NULL, i.e. no genes are discarded.
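
To make the idea of prefix-based exclusion concrete, here is a small base-R sketch. The gene names and prefixes are made up, and the matching rule (a literal, case-sensitive match at the start of the name) is an assumption; the package may implement it differently.

```r
# Hypothetical gene names and prefixes, made up for illustration.
genes = c("Rpl3", "Rpl10a", "Rps6", "Actb", "Gapdh")
prefixes = c("Rpl", "Rps")

# TRUE for every gene that starts with at least one of the prefixes.
# Assumption: literal, case-sensitive prefix match.
has_prefix = Reduce(`|`, lapply(prefixes, function(pr) startsWith(genes, pr)))

genes_excluded = genes[has_prefix]     # would be dropped from the candidates
genes_candidates = genes[!has_prefix]  # would remain eligible for selection
```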

verbose

Boolean specifying whether intermediate output should be printed. Default verbose=TRUE.

stat_all

If the True graph and the corresponding Minkowski distances have been calculated prior to the search, provide them here. This is useful when gene_search is run repeatedly (e.g. to select multiple libraries with different inputs such as n_genes_total and genes_base). Ensure that colnames = c("gene", "dist_all"). Default stat_all=NULL, i.e. this information is not supplied.
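
Since stat_all must carry exactly the column names "gene" and "dist_all", a quick check before reusing a precomputed table can catch mistakes early. The sketch below is an illustration only: check_stat_all is a hypothetical helper that is not part of geneBasisR, the values are made up, and treating stat_all as a data.frame is an assumption.

```r
# Hypothetical helper (not part of geneBasisR): checks the shape of stat_all.
# Assumption: stat_all is a data.frame with exactly these two columns.
check_stat_all = function(stat_all) {
  is.data.frame(stat_all) &&
    identical(colnames(stat_all), c("gene", "dist_all"))
}

# Made-up values standing in for precomputed distances.
stat_ok = data.frame(gene = c("1", "2", "3"), dist_all = c(0.8, 1.2, 0.5))
stat_bad = data.frame(genes = c("1", "2"), distance = c(0.8, 1.2))

ok_result = check_stat_all(stat_ok)
bad_result = check_stat_all(stat_bad)
```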

Value

data.frame containing the selected genes and their corresponding ranks. If genes_base is supplied, the ranks of the base genes follow the order in which they appear in the supplied character vector.

Examples

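library(SingleCellExperiment)
n_row = 1000
n_col = 100
# Simulated logcounts matrix: n_row genes x n_col cells
sce = SingleCellExperiment(assays = list(logcounts = matrix(rnorm(n_row*n_col), ncol=n_col)))
# Use character (not factor or integer) gene and cell names
rownames(sce) = as.character(1:n_row)
colnames(sce) = as.character(1:n_col)
sce$cell = colnames(sce)
# Select a library of 5 genes, with no base genes supplied
out = gene_search(sce, n_genes_total = 5)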


MarioniLab/geneBasisR documentation built on June 30, 2023, 2:04 p.m.