tune_LDA: Find the optimal number of topics


Description

Tunes over a specified range of topic numbers and finds the optimal number of topics according to each of four criterion measures ("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014").

Usage

tune_LDA(df, k_range, seed_num = 731, mc.cores = 4L, verbose = TRUE,
  q = NULL)

Arguments

df

The document-term data_frame output from create_DTM

k_range

A vector of integers specifying the candidate numbers of topics to evaluate

seed_num

An integer specifying the random seed used by the algorithm. Defaults to 731

mc.cores

An integer specifying how many cores to use for the computation. Defaults to 4

verbose

A logical specifying whether to display progress while the algorithm is running. Defaults to TRUE

q

A numeric specifying the tf-idf quantile used to remove words. If NULL (the default), no words are removed
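To make the q argument more concrete, here is a minimal base-R sketch of quantile-based tf-idf filtering on a toy document-term matrix. The toy matrix, the mean tf-idf weighting, and the choice to drop terms below the q-th quantile are all assumptions made for illustration; the documentation above does not state how tune_LDA computes tf-idf or which side of the quantile is removed.

```r
# Toy document-term matrix (hypothetical data): rows are documents, columns are terms.
dtm <- matrix(
  c(2, 0, 1, 3,
    0, 1, 1, 2,
    1, 0, 0, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), c("apple", "banana", "cherry", "the"))
)

# Mean tf-idf per term: term frequency normalised by document length,
# weighted by log2(number of documents / document frequency).
tf  <- dtm / rowSums(dtm)
idf <- log2(nrow(dtm) / colSums(dtm > 0))
term_tfidf <- colMeans(sweep(tf, 2, idf, `*`))

# ASSUMPTION: terms scoring below the q-th quantile are the ones removed.
q <- 0.1
keep <- term_tfidf >= quantile(term_tfidf, probs = q)
dtm_filtered <- dtm[, keep, drop = FALSE]

colnames(dtm_filtered)
```

In this toy case the term "the" appears in every document, so its idf (and hence its tf-idf) is zero and it is the only term filtered out.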

Value

A list containing a matrix that reports the criterion measures for each number of topics, the optimal number of topics according to each of Griffiths2004, CaoJuan2009, Arun2010, and Deveaud2014, and a plot of the measure matrix
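As a rough illustration of how an optimal number can be read off a measure matrix, the sketch below uses the usual convention for these four metrics: Griffiths2004 and Deveaud2014 are maximised, while CaoJuan2009 and Arun2010 are minimised. The data frame, its column names, and all values are made up for the example; they are not output from tune_LDA, and the layout of the real measure matrix may differ.

```r
# Hypothetical criterion values for k = 2..6 (illustrative numbers only).
measures <- data.frame(
  topics        = 2:6,
  Griffiths2004 = c(-5200, -5050, -4980, -5010, -5100),
  CaoJuan2009   = c(0.30, 0.22, 0.18, 0.20, 0.25),
  Arun2010      = c(40, 31, 27, 25, 29),
  Deveaud2014   = c(1.1, 1.6, 1.9, 1.7, 1.4)
)

# Pick the best k per criterion: maximise two measures, minimise the other two.
best_k <- c(
  Griffiths2004 = measures$topics[which.max(measures$Griffiths2004)],
  CaoJuan2009   = measures$topics[which.min(measures$CaoJuan2009)],
  Arun2010      = measures$topics[which.min(measures$Arun2010)],
  Deveaud2014   = measures$topics[which.max(measures$Deveaud2014)]
)

print(best_k)
```

The criteria need not agree with one another, which is presumably why each optimum is reported separately alongside the plot.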

Author(s)

Jiacheng He

Examples

tune_result <- tune_LDA(text, k_range = 2:15, q = 0.1)
print(tune_result$k_plot)

JiachengHe/TextAnalysis documentation built on May 28, 2019, 7:51 a.m.