fitLDA: Find the optimal number of cell-types K for the LDA model

View source: R/functions.R

fitLDA R Documentation

Find the optimal number of cell-types K for the LDA model

Description

topicmodels::LDA requires its input as a slam simple_triplet_matrix (docs x words), such as one created with slam::as.simple_triplet_matrix. Access a given model in the returned list via lda$models$k. The models are objects from the R package "topicmodels", and they have slots with additional information.
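As a rough illustration of the input format described above, the sketch below converts a small simulated count matrix to a simple_triplet_matrix and fits a single LDA model directly with topicmodels. It assumes the slam and topicmodels packages are installed. The toy matrix and the names toy, stm, fit and theta are made up for this example; this is not necessarily how fitLDA prepares its input internally.

```r
library(slam)         # provides the sparse simple_triplet_matrix class
library(topicmodels)  # provides LDA()

set.seed(0)
# Simulated toy counts (illustration only): 20 pixels (rows) x 30 genes (columns)
toy <- matrix(rpois(20 * 30, lambda = 3), nrow = 20,
              dimnames = list(paste0("pixel", 1:20), paste0("gene", 1:30)))

# Convert the dense matrix to the sparse docs x words format
stm <- slam::as.simple_triplet_matrix(toy)

# Fit one LDA model with K = 3 topics (cell-types)
fit <- topicmodels::LDA(stm, k = 3, control = list(seed = 0))

# Slots of the fitted model carry additional information
fit@k               # number of topics
theta <- fit@gamma  # per-pixel topic proportions (pixels x topics)
dim(theta)
```

Each row of theta sums to 1, so it can be read as the estimated cell-type proportions of one pixel.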

Usage

fitLDA(
  counts,
  Ks = seq(2, 10, by = 2),
  seed = 0,
  perc.rare.thresh = 0.05,
  ncores = 1,
  plot = TRUE,
  verbose = TRUE
)

Arguments

counts

Gene expression counts with pixels as rows and genes as columns

Ks

Vector of K values (numbers of cell-types) to fit models with (default: seq(2, 10, by = 2))

seed

Random seed

perc.rare.thresh

Threshold on the mean pixel proportion of a cell-type. For each K, the number of deconvolved cell-types with a mean pixel proportion below this fraction is recorded and used to assess the performance of the fitted model. (default: 0.05)

ncores

Number of cores for parallelization (default: 1). Suggested: parallel::detectCores()

plot

Boolean for plotting (default: TRUE)

verbose

Boolean for verbosity (default: TRUE)
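The rare cell-type count controlled by perc.rare.thresh can be sketched in base R as follows. The proportion matrix theta holds made-up values, and the variable names are illustrative assumptions rather than the package's internal code.

```r
# Toy pixel x cell-type proportion matrix (made-up values; each row sums to 1)
theta <- rbind(
  c(0.60, 0.37, 0.03),
  c(0.50, 0.48, 0.02),
  c(0.70, 0.26, 0.04),
  c(0.55, 0.44, 0.01)
)
colnames(theta) <- c("ct1", "ct2", "ct3")

perc.rare.thresh <- 0.05

# Mean proportion of each cell-type across all pixels
meanProps <- colMeans(theta)

# Number of cell-types whose mean pixel proportion is below the threshold
numRare <- sum(meanProps < perc.rare.thresh)
numRare
```

Here only ct3 has a mean proportion (0.025) below 0.05, so one cell-type is counted as rare.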

Value

A list that contains

  • models: each fitted LDA model for a given K

  • kneedOptK: the optimal K based on the Kneed algorithm

  • minOptK: the optimal K based on minimum perplexity

  • ctPropOptK: suggested upper bound on K. The K at which the number of returned cell-types with mean proportion < perc.rare.thresh starts to increase steadily.

  • numRare: number of cell-types with mean pixel proportion < perc.rare.thresh for each K

  • perplexities: perplexity scores for each model

  • fitCorpus: the corpus that was used to fit each model

  • testCorpus: the corpus used to compute model perplexity.
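To make the minimum-based choice of K concrete, the base R sketch below picks the K with the lowest perplexity from a vector of scores and looks up the matching entry in a list. The perplexity values are invented for illustration. The placeholder list models, and the assumption that it is named by K, are hypothetical and may not match the structure returned by fitLDA.

```r
Ks <- seq(2, 10, by = 2)

# Made-up perplexity scores, one per K (illustration only)
perplexities <- c(310.2, 284.7, 279.1, 281.5, 288.0)
names(perplexities) <- Ks

# K with the lowest perplexity
minOptK <- Ks[which.min(perplexities)]
minOptK

# Hypothetical stand-in for a list of fitted models, assumed to be named by K
models <- setNames(as.list(paste0("model_K", Ks)), Ks)
models[[as.character(minOptK)]]
```

Looking entries up by as.character(K) avoids confusing a K value with a positional index in the list.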

Examples

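library(STdeconvolve)

# Load the mouse olfactory bulb dataset shipped with the package
data(mOB)
pos <- mOB$pos
cd <- mOB$counts
# Filter the raw counts, then restrict the genes used as the corpus
counts <- cleanCounts(cd, min.lib.size = 100)
corpus <- restrictCorpus(counts, removeAbove = 1.0, removeBelow = 0.05)
# fitLDA expects pixels as rows and genes as columns, so transpose the corpus
ldas <- fitLDA(t(as.matrix(corpus)), Ks = 3, ncores = 2)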


JEFworks-Lab/STdeconvolve documentation built on Nov. 14, 2024, 7:24 p.m.