agg_document_stability: Compute aggregate document stability and F-test
In contefranz/OpTop: Optimal topic specification for latent dirichlet allocation models

View source: R/agg_document_stability.R

Detects informative and uninformative components to compute aggregate document stability. Performs a chi-square test to evaluate document stability and computes an F-test to further evaluate deviation from the optimal model.

agg_document_stability(
  lda_models,
  weighted_dfm,
  optimal_model,
  q = 0.8,
  alpha = 0.05,
  smoothed = TRUE,
  do_plot = TRUE
)

`lda_models`	A list of ordered LDA models as estimated by `LDA`. The LDA models must be in ascending order according to the number of topics.
`weighted_dfm`	A weighted `dfm` containing word proportions. It is recommended that `weighted_dfm` has the corresponding internal variable that can be accessed with `docid`. See ?`optimal_topic` for more details.
`optimal_model`	A number corresponding to the optimal topic model.
`q`	Cutoff for important words, given as a quantile of the expected cumulative probability of word weights. Defaults to 0.80, meaning that the function covers 80% of the distribution mass and leaves out the remaining 20%.
`alpha`	Alpha level used to identify informative words from the Cumulative Distribution Function over the cosine similarities in the Topic Word Weights matrix. Defaults to 0.05.
`smoothed`	A logical controlling whether the test is performed on each document for each LDA model or on the smoothed chi-square statistic. The smoothed statistic is the aggregated version, which gives the overall behavior across all documents in the corpus. Defaults to `TRUE`.
`do_plot`	A logical controlling whether to plot the chi-square statistic and the F-statistic as functions of the number of topics. Defaults to `TRUE`.
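
To make the role of `q` concrete, the following is a minimal sketch of a cumulative-mass cutoff. It is not OpTop's internal code: the word names and weights are hypothetical, and the rule "keep the heaviest words until their cumulative mass reaches `q`" is an assumption based only on the argument description above.

```r
# Hypothetical word weights for a single topic (illustrative only, not OpTop output).
word_weights <- c(w1 = 0.35, w2 = 0.30, w3 = 0.20,
                  w4 = 0.08, w5 = 0.05, w6 = 0.02)
q <- 0.8

# Sort from heaviest to lightest and accumulate the probability mass.
sorted   <- sort(word_weights, decreasing = TRUE)
cum_mass <- cumsum(sorted)

# Assumed rule: keep the shortest prefix of words whose cumulative mass reaches q.
n_keep        <- which(cum_mass >= q)[1]
informative   <- names(sorted)[seq_len(n_keep)]
uninformative <- names(sorted)[-seq_len(n_keep)]
```

Under this assumed rule, raising `q` moves more words into the informative component and leaves fewer in the uninformative one.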

A data.table containing the following columns:

`topic`	An integer giving the number of topics.
`id_doc`	An integer document id as given in the original corpus.
`chisq_inform_std`	A numeric giving the standardized chi-square statistic for the informative component.
`chisq_uninform_std`	A numeric giving the standardized chi-square statistic for the uninformative component.
`pval_inform`	A numeric giving the p-value of the chi-square test over the informative component.
`pval_uninform`	A numeric giving the p-value of the chi-square test over the uninformative component.
`Fstat`	A numeric giving the standardized F statistic of the ratio `chisq_inform_std`/`chisq_uninform_std`.
`pval_Fstat`	A numeric giving the p-value of the F test.
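
The relationship between the last two columns and the chi-square columns can be sketched as a standard F-test on a ratio of standardized chi-square statistics. This is an illustration under assumptions, not the package's implementation: the numeric values and the degrees of freedom (`df_inform`, `df_uninform`) are hypothetical, and "standardized" is assumed here to mean the chi-square statistic divided by its degrees of freedom.

```r
# Hypothetical standardized chi-square statistics (assumed: statistic / degrees of freedom).
chisq_inform_std   <- 2.4
chisq_uninform_std <- 1.2

# Hypothetical degrees of freedom; OpTop determines its own internally.
df_inform   <- 30
df_uninform <- 120

# F statistic as the ratio documented for the `Fstat` column.
Fstat <- chisq_inform_std / chisq_uninform_std

# Upper-tail p-value from the F distribution (base R `stats::pf`).
pval_Fstat <- pf(Fstat, df1 = df_inform, df2 = df_uninform, lower.tail = FALSE)
```

A small `pval_Fstat` in this sketch indicates that the informative component deviates more than the uninformative one would suggest by chance.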

Francesco Grossetti francesco.grossetti@unibocconi.it.

Craig M. Lewis craig.lewis@owen.vanderbilt.edu

Lewis, C. and Grossetti, F. (2019 - forthcoming):
A Statistical Approach for Optimal Topic Model Identification.

`LDA`, `data.table`

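## Not run: 
# `lda_list` and `weighted_dfm` must already exist in the session.
# Note: `optimal_model` has no default in the usage above and is not supplied here.
test4 <- agg_document_stability(
  lda_models = lda_list,
  weighted_dfm = weighted_dfm,
  smoothed = TRUE,
  do_plot = TRUE
)

## End(Not run)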

contefranz/OpTop documentation built on Feb. 14, 2022, 7:04 p.m.
