restrictCorpus: Restrict to informative words (genes) for topic modeling

View source: R/functions.R

restrictCorpus R Documentation

Restrict to informative words (genes) for topic modeling

Description

Identifies overdispersed genes across pixels to use as informative words (genes) in topic modeling. The overdispersed genes can also be restricted to those that occur in more than and/or fewer than selected fractions of pixels in the corpus. By default, only the top 1000 overdispersed genes (see nTopOD) are kept so that the corpus stays a reasonable size.
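The fraction-of-pixels restriction described above can be sketched in base R. This is only an illustration, not the package's implementation: the toy matrix, the threshold values, and the handling of genes that sit exactly on a threshold are assumptions.

```r
# Toy genes x pixels count matrix (hypothetical data, not from the package)
counts <- rbind(
  geneA = c(1, 2, 1, 3, 1, 2, 1, 1, 2, 1),  # detected in 10 of 10 pixels
  geneB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 4),  # detected in 1 of 10 pixels
  geneC = c(0, 3, 0, 2, 0, 1, 0, 0, 5, 0)   # detected in 4 of 10 pixels
)
colnames(counts) <- paste0("pixel", 1:10)

# Fraction of pixels in which each gene has a non-zero count
frac <- rowMeans(counts > 0)

# Illustrative thresholds (not the function defaults)
removeAbove <- 0.95
removeBelow <- 0.2

# Keep genes that are neither too ubiquitous nor too rare.
# Assumption: genes exactly on a threshold are kept.
keep <- frac <= removeAbove & frac >= removeBelow
restricted <- counts[keep, , drop = FALSE]
rownames(restricted)
```

In this toy case only geneC falls between the two thresholds, so it is the only gene retained.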

Usage

restrictCorpus(
  counts,
  removeAbove = 1,
  removeBelow = 0.05,
  alpha = 0.05,
  nTopOD = 1000,
  plot = FALSE,
  verbose = TRUE
)

Arguments

counts

genes x pixels gene count matrix

removeAbove

remove overdispersed genes that are present in more than this fraction of pixels (default: 1.0)

removeBelow

remove overdispersed genes that are present in fewer than this fraction of pixels (default: 0.05)

alpha

alpha parameter passed to getOverdispersedGenes(). Higher values are less stringent and return more overdispersed genes (default: 0.05)

nTopOD

number of top overdispersed genes to use (integer; default: 1000). If fewer overdispersed genes than this are found, all of them are used. Set to NA to use all overdispersed genes.

plot

if TRUE, produce histogram plots of genes per pixel and pixels per gene for the overdispersed genes and after corpus restriction (default: FALSE)

verbose

if TRUE, print progress messages (default: TRUE)
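The nTopOD behaviour described above (cap at the top N, use everything when fewer are available or when NA is given) can be illustrated with a small base R helper. The function selectTop() and the score vector are hypothetical and exist only for this sketch; the package may rank genes differently.

```r
# Hypothetical helper: pick the top-scoring gene names, mimicking the
# documented nTopOD semantics. Not part of STdeconvolve.
selectTop <- function(scores, nTopOD = 1000) {
  ranked <- names(sort(scores, decreasing = TRUE))
  if (is.na(nTopOD)) {
    return(ranked)        # NA: use all genes
  }
  head(ranked, nTopOD)    # head() returns everything if fewer than nTopOD
}

# Made-up overdispersion scores for three genes
scores <- c(geneA = 2.5, geneB = 0.7, geneC = 4.1)

top2 <- selectTop(scores, nTopOD = 2)
allDefault <- selectTop(scores)          # fewer than 1000 genes: all are kept
allNA <- selectTop(scores, nTopOD = NA)  # NA: all are kept
```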

Value

a genes x pixels count matrix containing only the genes that remain after filtering

Examples

data(mOB)
corpus <- restrictCorpus(counts = mOB$counts)
corpus
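A call with non-default settings might look like the following. It uses only the arguments listed under Usage; the specific values are arbitrary choices for illustration, and the requireNamespace() guard is added here so the snippet does nothing when STdeconvolve is not installed.

```r
# Stricter restriction than the defaults; values chosen only for illustration
if (requireNamespace("STdeconvolve", quietly = TRUE)) {
  library(STdeconvolve)
  data(mOB)
  corpus <- restrictCorpus(
    counts = mOB$counts,
    removeAbove = 0.95,  # drop genes detected in more than 95% of pixels
    removeBelow = 0.1,   # drop genes detected in fewer than 10% of pixels
    nTopOD = 500,        # keep at most the top 500 overdispersed genes
    verbose = FALSE
  )
  dim(corpus)            # genes x pixels after restriction
}
```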


JEFworks-Lab/STdeconvolve documentation built on Nov. 14, 2024, 7:24 p.m.