FitLsaModel: Fit a topic model using Latent Semantic Analysis


Description

A wrapper for RSpectra::svds that returns a nicely-formatted latent semantic analysis topic model.

Usage

FitLsaModel(dtm, k, return_all = FALSE, ...)

Arguments

dtm

A document term matrix of class Matrix::dgCMatrix

k

Number of topics

return_all

Should all objects returned from RSpectra::svds be returned here? Defaults to FALSE.

...

Additional arguments passed to RSpectra::svds through its opts parameter.

Details

Latent semantic analysis (LSA) uses singular value decomposition to factor the document term matrix (DTM). In many LSA applications, TF-IDF weights are applied to the DTM before the model is fit, but this is not strictly necessary.
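As a sketch of the factorization described above, the following uses notation that is not part of the original documentation: X is the n x m document term matrix (n documents, m tokens), k is the number of topics, and df_j is the number of documents containing token j. Reading U_k as theta, the transpose of V_k as phi, and the diagonal of Sigma_k as sv is inferred from the dimensions given under Value; it is an assumption, not a statement about the implementation.

```latex
% Rank-k truncated singular value decomposition of the DTM
X \approx U_k \Sigma_k V_k^{\top}, \qquad
U_k \in \mathbb{R}^{n \times k}, \quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k), \quad
V_k \in \mathbb{R}^{m \times k}

% TF-IDF weighting, matching the example on this page
\mathrm{idf}_j = \log\frac{n}{\mathrm{df}_j}, \qquad
\tilde{x}_{ij} = x_{ij} \, \mathrm{idf}_j
```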

Value

Returns a list with a minimum of three objects: phi, theta, and sv. The rows of phi index topics and the columns index tokens. The rows of theta index documents and the columns index topics. sv is a vector of singular values.
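To make the shapes of these objects concrete, here is a minimal sketch that uses base R's svd() on a small dense matrix rather than FitLsaModel itself. The toy matrix, the names toy_dtm and sketch, and the "t_" topic labels are hypothetical, and the mapping of singular vectors onto theta and phi is an assumption based on the dimensions described above.

```r
# Toy 4 x 5 document term matrix (made-up counts, for illustration only)
toy_dtm <- matrix(
  c(2, 0, 1, 0, 0,
    1, 1, 0, 0, 0,
    0, 0, 0, 3, 1,
    0, 1, 0, 1, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc_", 1:4), paste0("term_", 1:5))
)

k <- 2  # number of topics

# Base R svd(); nu and nv limit how many singular vectors are returned
decomp <- svd(toy_dtm, nu = k, nv = k)

# Assumed mapping onto the objects described above
sketch <- list(
  sv    = decomp$d[seq_len(k)],  # svd() returns every singular value; keep the first k
  theta = decomp$u,              # documents x topics
  phi   = t(decomp$v)            # topics x tokens
)

# Label rows and columns (hypothetical topic names)
dimnames(sketch$theta) <- list(rownames(toy_dtm), paste0("t_", seq_len(k)))
dimnames(sketch$phi)   <- list(paste0("t_", seq_len(k)), colnames(toy_dtm))

dim(sketch$theta)  # documents x topics
dim(sketch$phi)    # topics x tokens
length(sketch$sv)  # one singular value per topic
```

For a large sparse DTM, a truncated solver such as RSpectra::svds avoids computing the full decomposition, which is presumably why the function wraps it.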

Examples

# Load a pre-formatted dtm 
data(nih_sample_dtm) 

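# Convert raw word counts to TF-IDF weights
# idf per term: log(number of documents / number of documents containing the term)
idf <- log(nrow(nih_sample_dtm) / Matrix::colSums(nih_sample_dtm > 0))

# Transpose so terms are rows; idf then recycles down each column, scaling each term's row
dtm_tfidf <- Matrix::t(nih_sample_dtm) * idf

# Transpose back to documents x terms
dtm_tfidf <- Matrix::t(dtm_tfidf)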

# Fit an LSA model
model <- FitLsaModel(dtm = dtm_tfidf, k = 5)

str(model)

ChengMengli/topic documentation built on May 31, 2019, 8:44 p.m.