knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
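This vignette outlines the intended use of the textclass package in its current form. The functions covered here plot term and n-gram frequencies, construct a document-term matrix, plot the topics of an LDA model, and evaluate and rank candidate numbers of topics.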
This package includes the AFICAdata dataset, a data frame of all Information Technology contracts for installation support from the Air Force Installation Contracting Agency.
The term_frequency function allows the user to plot term or n-gram frequency, where n is the number of consecutive terms to treat as a single unit when counting.
library(textclass)
term_frequency(AFICAdata, n=3)
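To make the idea of n-gram frequency concrete, the following is a minimal base R sketch of counting n-grams of consecutive terms. It is not the implementation of term_frequency; the toy documents, the tokenization rule, and the helper name make_ngrams are all assumptions made for illustration.

```r
# Toy documents (hypothetical; not taken from AFICAdata)
docs <- c("network support services for base",
          "network support services contract")

# Hypothetical helper: build all n-grams of consecutive terms in one document
make_ngrams <- function(text, n) {
  # Assumed tokenization: lower-case, split on anything that is not a letter or digit
  tokens <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count every trigram (n = 3) across all documents, most frequent first
ngrams <- unlist(lapply(docs, make_ngrams, n = 3))
ngram_counts <- sort(table(ngrams), decreasing = TRUE)
ngram_counts
```

With n = 1 the same helper counts single terms, which is why one function can serve both term and n-gram frequency.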
The tidyDTM function allows the user to construct a document-term matrix (DTM) for a given sparsity.
# tidyDTM(data, sparsity)
dtm <- tidyDTM(AFICAdata, 0.98)
dtm
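As a rough illustration of what a sparsity threshold does, the base R sketch below drops terms that are missing from too many documents. It assumes that sparsity means the largest allowed fraction of documents in which a term is absent; whether tidyDTM applies exactly this rule is not confirmed by this vignette. The count matrix is toy data.

```r
# Toy document-term count matrix (hypothetical): 4 documents, 3 terms.
# matrix() fills column by column, so each row of numbers below is one term.
counts <- matrix(c(1, 0, 2, 1,   # "network": present in 3 of 4 documents
                   0, 0, 0, 1,   # "radar":   present in 1 of 4 documents
                   1, 1, 1, 1),  # "support": present in all documents
                 nrow = 4,
                 dimnames = list(paste0("doc", 1:4),
                                 c("network", "radar", "support")))

# Fraction of documents in which each term is absent
term_sparsity <- colMeans(counts == 0)

# Assumed rule: keep only terms whose sparsity is below the threshold
max_sparsity <- 0.5
reduced <- counts[, term_sparsity < max_sparsity, drop = FALSE]
colnames(reduced)
```

Under this assumption, a threshold close to 1 (such as the 0.98 used above) removes only terms that appear in very few documents.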
The plot_topics function allows the user to plot the topics from a Latent Dirichlet Allocation (LDA) model. The plot shows the terms most strongly associated with each topic, as measured by beta. LDA is an external function with which the user creates the LDA model and defines the number of topics requested.
lda <- LDA(dtm, k = 4)
plot_topics(lda)
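The selection behind a plot of this kind can be sketched in base R: for each topic, take the terms with the largest beta. The beta matrix below is made up for illustration and is not output from a fitted model, and the sketch does not claim to reproduce how plot_topics extracts or draws the terms.

```r
# Hypothetical per-topic term probabilities (beta); each row sums to 1
beta <- matrix(c(0.50, 0.30, 0.15, 0.05,
                 0.05, 0.10, 0.25, 0.60),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("topic1", "topic2"),
                               c("network", "support", "radar", "maintenance")))

# For each topic, keep the top_n terms with the highest beta
top_n <- 2
top_terms <- lapply(rownames(beta), function(topic) {
  p <- beta[topic, ]
  colnames(beta)[order(p, decreasing = TRUE)][seq_len(top_n)]
})
names(top_terms) <- rownames(beta)
top_terms
```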
The functions optimal_topics, normalize_metrics, and plot_optimal_topics allow the user to evaluate the structure of topics for every number of topics k from 2 to 30. The metrics used are those proposed by Cao and Deveaud. The Cao metric must be minimized for optimality, while the Deveaud metric is intended to be maximized. The computation takes a long time, particularly on a machine with fewer than four cores. For that reason, the package includes pre-calculated optimal topic data, topic_data.
# returns raw values for optimal topic analysis
# values <- optimal_topics(dtm)
values <- topic_data
# normalizes the values
norm_values <- normalize_metrics(values)
# plots the optimal topic analysis metrics
plot_optimal_topics(norm_values)
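One common way to put two metrics on a comparable scale is min-max rescaling to the range 0 to 1. The sketch below shows that technique in base R. It is an assumption that normalize_metrics works this way, and the column names, the helper rescale01, and the values are hypothetical rather than real package output.

```r
# Hypothetical raw metric values for k = 2..6 (not real package output)
raw <- data.frame(k = 2:6,
                  cao = c(0.40, 0.25, 0.10, 0.20, 0.35),
                  deveaud = c(1.0, 1.8, 2.4, 2.0, 1.2))

# Hypothetical helper: min-max rescaling of a numeric vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

rescaled <- data.frame(k = raw$k,
                       cao = rescale01(raw$cao),
                       deveaud = rescale01(raw$deveaud))
rescaled
```

After rescaling, the best Cao value is 0 and the best Deveaud value is 1, so the two curves can be read on one axis.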
Because the Cao metric must be minimized and the Deveaud metric maximized, the rank_topics function provides a table that ranks the candidate numbers of topics by the best balance between the two metrics, so the analyst can assess all possible points of optimality.
# creates a table of rank ordered topic numbers
rank_topics(norm_values)
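One possible reading of "best balance" is the difference between the normalized Deveaud value (higher is better) and the normalized Cao value (lower is better). The base R sketch below ranks candidate values of k by that difference. The scoring rule, the column names, and the numbers are assumptions for illustration and may differ from what rank_topics computes.

```r
# Hypothetical normalized metrics for k = 2..6 (not real package output)
scores <- data.frame(k = 2:6,
                     cao = c(1.0, 0.5, 0.0, 0.3, 0.8),
                     deveaud = c(0.0, 0.6, 1.0, 0.7, 0.1))

# Assumed balance score: reward high Deveaud, penalize high Cao
scores$balance <- scores$deveaud - scores$cao

# Rank candidate topic numbers from best to worst balance
ranked <- scores[order(scores$balance, decreasing = TRUE), ]
ranked$rank <- seq_len(nrow(ranked))
ranked
```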