buildMappingBasedMarkerPanel: Greedy algorithm for building marker gene panel

View source: R/markerGenesAndMapping.r

buildMappingBasedMarkerPanel    R Documentation

Greedy algorithm for building marker gene panel

Description

This is the primary function for building a marker gene panel. It works greedily, one gene at a time: at each iteration, the most informative remaining gene is added to the existing gene panel.
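The greedy idea can be illustrated with a minimal base R sketch. This is not the package's implementation: the helpers `fractionCorrect` and `greedyPanelSketch`, the simulated data, and the use of plain Pearson correlation against per-cluster medians are all assumptions made for illustration, and the speed-ups controlled by subSamp, maxFcGene, and qMin are omitted.

```r
# Hypothetical helper (not part of mfishtools): fraction of cells whose
# best-correlated cluster median, using only the genes in `panel`,
# matches the cell's assigned cluster.
fractionCorrect <- function(mapDat, medianDat, clusters, panel) {
  corMat <- cor(mapDat[panel, , drop = FALSE], medianDat[panel, , drop = FALSE])
  corMat[is.na(corMat)] <- -1  # treat undefined correlations as worst case
  mapped <- colnames(medianDat)[max.col(corMat, ties.method = "first")]
  mean(mapped == as.character(clusters))
}

# Hypothetical greedy loop: repeatedly add the candidate gene that gives
# the highest fraction of correctly mapped cells.
greedyPanelSketch <- function(mapDat, clusters, panelSize, startPanel) {
  clusters <- factor(clusters)
  # Per-cluster median expression (genes x clusters)
  medianDat <- sapply(levels(clusters), function(cl)
    apply(mapDat[, clusters == cl, drop = FALSE], 1, median))
  panel <- startPanel
  while (length(panel) < panelSize) {
    candidates <- setdiff(rownames(mapDat), panel)
    scores <- sapply(candidates, function(g)
      fractionCorrect(mapDat, medianDat, clusters, c(panel, g)))
    panel <- c(panel, candidates[which.max(scores)])
  }
  panel
}

# Simulated genes x cells matrix with three clusters (illustrative only)
set.seed(10)
clusters <- rep(c("A", "B", "C"), each = 20)
mapDat <- matrix(rnorm(30 * length(clusters)), nrow = 30,
                 dimnames = list(paste0("gene", 1:30),
                                 paste0("cell", seq_along(clusters))))
# Make one gene strongly informative for each cluster
mapDat["gene1", clusters == "A"] <- mapDat["gene1", clusters == "A"] + 4
mapDat["gene2", clusters == "B"] <- mapDat["gene2", clusters == "B"] + 4
mapDat["gene3", clusters == "C"] <- mapDat["gene3", clusters == "C"] + 4

panel <- greedyPanelSketch(mapDat, clusters, panelSize = 5,
                           startPanel = c("gene1", "gene2"))
print(panel)
```

Because genes are appended in the order they are chosen, the result is an ordered vector, which mirrors the ordered character vector described under Value.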

Usage

buildMappingBasedMarkerPanel(
  mapDat,
  medianDat = NA,
  clustersF = NA,
  panelSize = 50,
  subSamp = 20,
  maxFcGene = 1000,
  qMin = 0.75,
  seed = 10,
  currentPanel = NULL,
  panelMin = 5,
  writeText = TRUE,
  corMapping = TRUE,
  optimize = "FractionCorrect",
  clusterDistance = NULL,
  clusterGenes = NULL,
  dend = NULL,
  percentSubset = 100
)

Arguments

mapDat

normalized data of the mapping (=reference) data set.

medianDat

representative value for each leaf (cluster). If not provided, it is calculated from mapDat and clustersF.

clustersF

cluster calls for each cell.

panelSize

number of genes to include in the marker gene panel

subSamp

number of random nuclei to select from each cluster (to increase speed); set as NA to not subsample

maxFcGene

maximum number of genes to consider at each iteration (to increase speed)

qMin

minimum quantile for fold change comparison (between 0 and 1, higher = more specific marker genes are included)

seed

random seed, for reproducibility

currentPanel

starting panel. Default is NULL.

panelMin

if the current panel contains fewer genes than this, the top-ranked genes by fold change (fc) are used to set the starting panel. Cannot be less than 2.

writeText

should gene names and marker scores be output (default TRUE)

corMapping

if TRUE (default) map by correlation; otherwise, map by Euclidean distance (not recommended)

optimize

if 'FractionCorrect' (default), will seek to maximize the fraction of cells correctly mapping to final clusters; if 'CorrelationDistance', will seek to minimize the total distance between actual cluster calls and mapped clusters; if 'DendrogramHeight', will seek to minimize the total dendrogram height between actual cluster calls and mapped clusters

clusterDistance

only used if optimize='CorrelationDistance'; a matrix (or vector) of cluster distances. Will be calculated if NULL and clusterGenes is provided. (NOTE: order must be the same as medianDat and/or have column and row names corresponding to clusters in clustersF)

clusterGenes

a vector of genes used to calculate the cluster distance. Only used if optimize='CorrelationDistance' and clusterDistance=NULL.

dend

dendrogram; only used if optimize='DendrogramHeight'. Will error out if not provided.

percentSubset

percentage of the possible genes to consider at each iteration. The default of 100 considers all of them; lower values subset the set of possible genes to speed up the calculation.
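To make the optimize options more concrete, the following sketch scores one set of mapped cluster calls under two of the criteria. The toy vectors `actual` and `mapped` and the distance values are invented for illustration, and the scoring expressions show the general idea only, not necessarily the package's exact calculation.

```r
# Toy cluster calls (illustrative values only)
actual <- c("A", "A", "B", "B", "C", "C")  # assigned clusters
mapped <- c("A", "B", "B", "B", "C", "A")  # clusters the cells map to

# Hypothetical cluster distance matrix with row and column names matching
# the cluster names, as the clusterDistance argument requires
clusterDistance <- matrix(c(0.0, 0.2, 0.9,
                            0.2, 0.0, 0.8,
                            0.9, 0.8, 0.0),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# 'FractionCorrect': fraction of cells mapped to their own cluster (maximize)
fracCorrect <- mean(actual == mapped)

# 'CorrelationDistance': total distance between actual and mapped clusters
# (minimize); a mismatch between similar clusters costs less than one
# between dissimilar clusters
totalDist <- sum(clusterDistance[cbind(actual, mapped)])

print(fracCorrect)
print(totalDist)
```

Here two of the six cells are mis-mapped, but the A-to-B error contributes much less to the total distance than the C-to-A error, which is the distinction a distance-based criterion captures and a fraction-correct criterion does not.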

Value

an ordered character vector corresponding to the marker gene panel


AllenInstitute/mfishtools documentation built on July 5, 2023, 4:20 p.m.