View source: R/agg_document_stability.R
Detects informative and uninformative components to compute aggregate document stability. Performs a chi-square test to evaluate document stability. It also computes an F-test to further evaluate deviation from the optimal model.
agg_document_stability(
  lda_models,
  weighted_dfm,
  optimal_model,
  q = 0.8,
  alpha = 0.05,
  smoothed = TRUE,
  do_plot = TRUE
)
lda_models |
A list of ordered LDA models as estimated by
|
weighted_dfm |
A weighted |
optimal_model |
A number corresponding to the optimal topic model. |
q |
Sets the cutoff for important words as a quantile of the expected cumulative probability of word weights. Defaults to 0.80, meaning that the function keeps the words covering 80% of the distribution mass and leaves out the remaining 20%. |
alpha |
Alpha level used to identify informative words from the Cumulative Distribution Function over the cosine similarities in the Topic Word Weights matrix. Defaults to 0.05. |
smoothed |
A logical to control whether the test is performed on each
document for each LDA model or on the smoothed chi-square statistic.
The smoothed statistic is the aggregated version, which gives the overall
behavior across all documents in the corpus. Default is TRUE. |
do_plot |
A logical to control whether to plot the chi-square statistic and the F-statistic
as functions of the number of topics. Defaults to TRUE. |
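The cutoff described for q can be illustrated with a small base R sketch. This is only one reading of the argument description, not the package's implementation: the word names and weights below are made up, and the variable names (word_weights, cumulative_mass, important_words) are hypothetical.

```r
# Hypothetical word weights for a single topic (illustrative values only,
# not produced by the package).
word_weights <- c(alpha = 40, beta = 25, gamma = 15,
                  delta = 10, epsilon = 6, zeta = 4)

# Cutoff on the cumulative mass, mirroring the default q = 0.8.
q <- 0.8

# Sort the weights in decreasing order and accumulate their normalized mass.
sorted_weights <- sort(word_weights, decreasing = TRUE)
cumulative_mass <- cumsum(sorted_weights) / sum(sorted_weights)

# Keep the smallest set of top-ranked words whose cumulative mass reaches q.
n_keep <- unname(which(cumulative_mass >= q)[1])
important_words <- names(sorted_weights)[seq_len(n_keep)]

print(important_words)
```

With these made-up weights, the three heaviest words already cover 80% of the mass, so the remaining words would be treated as falling in the excluded 20%.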
A data.table containing the following columns:
|
An integer giving the number of topics. |
|
An integer document id as given in the original corpus. |
|
A numeric giving the standardized chi-square statistic for the informative component. |
|
A numeric giving the standardized chi-square statistic for the uninformative component. |
|
A numeric giving the p-value of the chi-square test over the informative component. |
|
A numeric giving the p-value of the chi-square test over the uninformative component. |
|
A numeric giving the standardized F statistic
of the ratio |
|
A numeric giving the p-value of the F test. |
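As a reminder of how p-values of this kind are generally obtained in base R, the sketch below computes upper-tail probabilities for a chi-square statistic and an F statistic. The statistics and degrees of freedom are invented for illustration, and the package may standardize its statistics and choose degrees of freedom differently.

```r
# Hypothetical test statistics and degrees of freedom (assumptions made
# for illustration; they do not come from the package).
chi_sq_stat <- 12.3
chi_sq_df <- 5

f_stat <- 2.1
f_df1 <- 5
f_df2 <- 40

# Upper-tail p-values from the reference distributions.
p_chi_sq <- pchisq(chi_sq_stat, df = chi_sq_df, lower.tail = FALSE)
p_f <- pf(f_stat, df1 = f_df1, df2 = f_df2, lower.tail = FALSE)

# Compare each p-value against a significance level of 0.05.
significance_level <- 0.05
reject_chi_sq <- p_chi_sq < significance_level
reject_f <- p_f < significance_level

print(c(p_chi_sq = p_chi_sq, p_f = p_f))
```

Small p-values in the returned table can be read the same way: they indicate that the corresponding statistic is unlikely under the reference distribution.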
Francesco Grossetti francesco.grossetti@unibocconi.it.
Craig M. Lewis craig.lewis@owen.vanderbilt.edu
Lewis, C. and Grossetti, F. (2019 - forthcoming):
A Statistical Approach for Optimal Topic Model Identification.
## Not run:
test4 <- agg_document_stability(
  lda_models = lda_list,
  weighted_dfm = weighted_dfm,
  smoothed = TRUE,
  do_plot = TRUE
)
## End(Not run)