FitLsaModel: Fit a topic model using Latent Semantic Analysis


View source: R/topic_modeling_core.R

Description

A wrapper for RSpectra::svds that returns a nicely-formatted latent semantic analysis topic model.

Usage

FitLsaModel(dtm, k, calc_coherence = TRUE, return_all = FALSE, ...)

Arguments

dtm

A document term matrix of class Matrix::dgCMatrix

k

Number of topics

calc_coherence

Should the probabilistic coherence of each topic be calculated after the model is trained? Defaults to TRUE.

return_all

Should all objects returned from RSpectra::svds be included in the result? Defaults to FALSE.

...

Other arguments to pass to svds through its opts parameter.

Details

Latent semantic analysis (LSA) uses singular value decomposition (SVD) to factor the document term matrix (DTM). In many LSA applications, TF-IDF weights are applied to the DTM before model fitting. However, this is not strictly necessary; the model can also be fit to raw counts.
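The factorization described above can be sketched on a toy matrix. This is a minimal illustration using base R's svd rather than RSpectra::svds, so it is not the package's implementation; dtm_toy, u_k, d_k, v_k and the other names are made up for the example.

```r
# Toy document term matrix: 4 documents (rows) by 5 terms (columns).
dtm_toy <- matrix(
  c(2, 1, 0, 0, 0,
    1, 2, 0, 0, 1,
    0, 0, 3, 1, 0,
    0, 0, 1, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("term", 1:5))
)

k <- 2  # number of latent dimensions ("topics") to keep

# Full SVD with base R: dtm_toy = U diag(d) V^T
decomp <- svd(dtm_toy)

# Keep only the k largest singular values and their vectors.
u_k <- decomp$u[, 1:k, drop = FALSE]  # documents x k
d_k <- decomp$d[1:k]                  # k singular values
v_k <- decomp$v[, 1:k, drop = FALSE]  # terms x k

# Rank-k approximation of the original matrix.
dtm_approx <- u_k %*% diag(d_k, nrow = k) %*% t(v_k)

# The Frobenius-norm error of the rank-k approximation equals the
# root sum of squares of the singular values that were dropped.
approx_error <- sqrt(sum((dtm_toy - dtm_approx)^2))
dropped <- sqrt(sum(decomp$d[-(1:k)]^2))
all.equal(approx_error, dropped)
```

A truncated solver such as RSpectra::svds computes only the k leading singular triplets instead of the full decomposition, which is what makes this approach practical for large sparse DTMs.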

Value

Returns a list with a minimum of three objects: phi, theta, and sv. The rows of phi index topics and the columns index tokens. The rows of theta index documents and the columns index topics. sv is a vector of singular values.
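To make the orientation of these objects concrete, the sketch below builds a stand-in list with the same layout from a base R svd call. toy_model is hypothetical and is not output from FitLsaModel; the topic labels and the mapping of the SVD factors onto phi and theta are assumptions chosen to match the description above.

```r
set.seed(1)

n_docs  <- 6
n_terms <- 8
k       <- 3

# Random toy counts standing in for a real document term matrix.
dtm_toy <- matrix(
  rpois(n_docs * n_terms, lambda = 1),
  nrow = n_docs,
  dimnames = list(paste0("doc", 1:n_docs), paste0("term", 1:n_terms))
)

# Ask base R for only the first k left and right singular vectors.
decomp <- svd(dtm_toy, nu = k, nv = k)

# Arrange the factors as described: phi is topics x tokens,
# theta is documents x topics, sv holds the k singular values.
toy_model <- list(
  phi   = t(decomp$v),
  theta = decomp$u,
  sv    = decomp$d[1:k]
)

topic_names <- paste0("t_", 1:k)  # illustrative labels only
dimnames(toy_model$phi)   <- list(topic_names, colnames(dtm_toy))
dimnames(toy_model$theta) <- list(rownames(dtm_toy), topic_names)

dim(toy_model$phi)    # k rows (topics), n_terms columns (tokens)
dim(toy_model$theta)  # n_docs rows (documents), k columns (topics)
length(toy_model$sv)  # k
```

With this layout, theta %*% diag(sv) %*% phi has the same dimensions as the input matrix and gives its rank-k approximation.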

Examples

# Attach the package so the bundled data set and FitLsaModel are available
library(textmineR)

# Load a pre-formatted dtm
data(nih_sample_dtm)

# Convert raw word counts to TF-IDF frequency weights
idf <- log(nrow(nih_sample_dtm) / Matrix::colSums(nih_sample_dtm > 0))

dtm_tfidf <- Matrix::t(nih_sample_dtm) * idf

dtm_tfidf <- Matrix::t(dtm_tfidf)

# Fit an LSA model
model <- FitLsaModel(dtm = dtm_tfidf, k = 5)

str(model)

textmineR documentation built on June 28, 2021, 9:08 a.m.