runCGSModels: Run Latent Dirichlet Allocation with a Collapsed Gibbs...

View source: R/RunModels.R

runCGSModels    R Documentation

Run Latent Dirichlet Allocation with a Collapsed Gibbs Sampler in a cisTopic object

Description

Run Latent Dirichlet Allocation with a Collapsed Gibbs Sampler in a given cisTopic object.

Usage

runCGSModels(
  object,
  topic = c(2, 10, 20, 30, 40, 50),
  nCores = 1,
  seed = 123,
  iterations = 500,
  burnin = 250,
  alpha = 50,
  alphaByTopic = TRUE,
  beta = 0.1,
  returnType = "allModels",
  addModels = TRUE,
  tmp = NULL,
  ...
)

Arguments

object

Initialized cisTopic object.

topic

Integer or vector of integers indicating the number of topics in the model(s). By default, it is a vector with 2, 10, 20, 30, 40 and 50 topics. We recommend trying several values if possible and selecting the model with the highest log likelihood.

nCores

Number of cores to use. By default, it is 1. If several models with different numbers of topics are being tested, we recommend increasing it to the number of models tested (or to the capacity of the machine). Parallelization is done with snow.

seed

Seed for the assignment initialization, used to make results reproducible.

iterations

Number of iterations over the data set. By default, 500 iterations are run. However, we advise using logLikelihoodByIter to check whether the log likelihood of the model has stabilized with these parameters.

burnin

Number of iterations to discard from the assignment counting. By default, 250 iterations are discarded. This number has to be lower than the number of iterations.

alpha

Scalar value indicating the (symmetric) Dirichlet hyperparameter for topic proportions. By default, it is set to 50.

alphaByTopic

Logical indicating whether the scalar given in alpha has to be divided by the number of topics. By default, it is set to TRUE.

beta

Scalar value indicating the (symmetric) Dirichlet hyperparameter for topic multinomials. By default, it is set to 0.1.

returnType

Defines what is returned to the cisTopic object: either 'allModels' or 'selectedModel'. 'allModels' stores a list with all the fitted models (as lists) in object@models, while 'selectedModel' stores the model with the best log likelihood in object@selected.model and a data frame with the log likelihood of the other models in object@log.lik. By default, this function returns all models to allow model selection afterwards; however, note that if the number of models or the size of the data is large, returning all models may require a lot of memory.

addModels

Whether new models should be added to a pre-existing list of models or should overwrite it. If TRUE, parameters are set to match the existing models.

tmp

Folder to save intermediate models.

...

See lda.collapsed.gibbs.sampler from the package lda.

Details

The selected parameters are adapted from Griffiths & Steyvers (2004).
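As a rough illustration of how the alpha and alphaByTopic defaults interact, the sketch below computes the per-model Dirichlet hyperparameter for the default topic vector. It assumes the division is a plain alpha / nTopics, as the argument description states; the name effectiveAlpha is hypothetical and is not part of cisTopic.

```r
# Default values taken from the Usage section above
alpha <- 50
topic <- c(2, 10, 20, 30, 40, 50)
alphaByTopic <- TRUE

# Assumption: with alphaByTopic = TRUE, alpha is divided by the number of topics;
# otherwise the scalar is used unchanged for every model
effectiveAlpha <- if (alphaByTopic) alpha / topic else rep(alpha, length(topic))
names(effectiveAlpha) <- paste0(topic, "_topics")

print(effectiveAlpha)
```

Under this assumption, models with more topics get a smaller hyperparameter for topic proportions, while beta stays at 0.1 for every model.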

Value

Returns a cisTopic object with the models stored in object@models. If returnType = "selectedModel", only the best model based on log likelihood is returned, in object@selected.model, and the log likelihood values of the remaining models are stored in object@log.lik.
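The selection rule described here (keep the model with the highest log likelihood) can be sketched on a plain data frame. The values below are made up purely for illustration, and the column names topics and LL are hypothetical; the actual layout of object@log.lik is not documented on this page.

```r
# Toy log likelihood table: invented numbers, NOT results from a real run
logLikTable <- data.frame(
  topics = c(2, 10, 20, 30, 40, 50),
  LL     = c(-5.20e6, -4.90e6, -4.70e6, -4.65e6, -4.66e6, -4.70e6)
)

# Pick the row with the highest log likelihood
bestRow <- logLikTable[which.max(logLikTable$LL), ]
bestTopics <- bestRow$topics

# Everything else is what would remain as "other models"
otherModels <- logLikTable[-which.max(logLikTable$LL), ]
```

In this toy table the 30-topic row is selected, leaving five rows for the remaining models.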

Examples

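library(cisTopic)

# Paths to the BAM files and to the BED file with the regions
bamfiles <- c('example_1.bam', 'example_2.bam', 'example_3.bam')
regions <- 'example.bed'
# Build the cisTopic object, then fit the models with the default settings
cisTopicObject <- createcisTopicObjectFromBAM(bamfiles, regions)
cisTopicObject <- runCGSModels(cisTopicObject)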
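One possible way to follow the nCores recommendation (one core per model, capped by the capacity of the machine) is sketched below. It relies on parallel::detectCores() from base R; the variable availableCores is hypothetical, and the final call is left commented out because it needs the cisTopic object from the example above.

```r
# Models to test (default topic vector from the Usage section)
topic <- c(2, 10, 20, 30, 40, 50)

# detectCores() can return NA on some platforms, so fall back to a single core
availableCores <- parallel::detectCores()
if (is.na(availableCores)) availableCores <- 1L

# Use one core per model, but never more than the machine offers
nCores <- min(length(topic), availableCores)

# cisTopicObject <- runCGSModels(cisTopicObject, topic = topic, nCores = nCores)
```

On a shared machine it may be preferable to cap nCores below the detected number of cores.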

aertslab/cisTopic documentation built on April 6, 2024, 9:31 p.m.