View source: R/MovieGroupProcess.R
MovieGroupProcess | R Documentation |
This function implements the Movie Group Process described by Yin and Wang in their 2014 paper, "A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering".
MovieGroupProcess( data, text, K, alpha = 0.1, beta = 0.1, iter = 30, repeat_words = FALSE, r_stopwords = TRUE )
data |
A data frame. |
text |
The name of the column in the data frame that contains the text to cluster. Supply the column name unquoted. |
K |
The upper limit for the number of topics. The function will automatically condense and remove empty clusters. |
alpha |
A tuning parameter ranging from 0 to 1 that controls a document's affinity for a larger cluster. Default is 0.1. |
beta |
A tuning parameter ranging from 0 to 1 that controls a document's affinity for a more similar cluster. Default is 0.1. |
iter |
The upper limit for the number of iterations to perform. The function will terminate earlier if a stable solution is found. Default is set at 30. |
repeat_words |
A logical value indicating whether the documents contain repeated words. If TRUE, the function uses Algorithm 4 from Yin and Wang's paper; if FALSE, it uses Algorithm 3. Default is FALSE. |
r_stopwords |
A logical value indicating whether stop words should be removed. Default is TRUE. |
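To make the roles of alpha, beta and repeat_words more concrete, the following is a minimal sketch of the cluster-assignment score given in Yin and Wang's paper. It is not taken from this package's source: the function name cluster_scores and all of its argument names are hypothetical, and treating every word as occurring once when repeat_words is FALSE is an assumption about how the simpler formula is applied.

```r
# Unnormalised probability of placing one document in each of K clusters.
# Hypothetical helper for illustration only; not the package's internal code.
cluster_scores <- function(doc_counts, m_z, n_z, n_zw, D, V,
                           alpha = 0.1, beta = 0.1, repeat_words = FALSE) {
  # doc_counts: named integer vector, word -> occurrences in this document
  # m_z:  number of documents in each cluster (this document excluded)
  # n_z:  number of words in each cluster (this document excluded)
  # n_zw: K x V matrix of per-cluster word counts, columns named by word
  # D:    total number of documents; V: vocabulary size
  K <- length(m_z)
  # Assumption: without repeated words, each word counts once per document.
  if (!repeat_words) doc_counts[] <- 1L
  N_d <- sum(doc_counts)
  vapply(seq_len(K), function(z) {
    # alpha term: larger clusters receive a higher score
    size_term <- (m_z[z] + alpha) / (D - 1 + K * alpha)
    # beta term: clusters that already contain the document's words score higher
    num <- 1
    for (w in names(doc_counts)) {
      j <- seq_len(doc_counts[[w]])
      num <- num * prod(n_zw[z, w] + beta + j - 1)
    }
    i <- seq_len(N_d)
    den <- prod(n_z[z] + V * beta + i - 1)
    size_term * num / den
  }, numeric(1))
}

# Toy state: 3 documents overall, vocabulary {a, b, c}, 2 candidate clusters.
n_zw <- matrix(c(2, 1, 1,
                 0, 0, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b", "c")))
scores <- cluster_scores(doc_counts = c(a = 1L, b = 1L),
                         m_z = c(2, 0), n_z = c(4, 0), n_zw = n_zw,
                         D = 3, V = 3)
print(scores / sum(scores))  # normalised assignment probabilities
```

In this sketch a smaller alpha shrinks the score of an empty cluster toward zero, and a smaller beta penalises clusters that share few words with the document more heavily.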
The function returns the original data frame with an additional column containing the cluster assignment for each row of text.
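A call might look like the sketch below, which follows the usage signature shown above. The data frame reviews and its column names are made up for illustration. The call is guarded so the snippet runs even when the package providing MovieGroupProcess is not attached, and set.seed is included on the assumption that the sampler draws from R's random number generator.

```r
# Hypothetical short-text data; names and contents are illustrative only.
reviews <- data.frame(
  id = 1:6,
  comment = c("great acting and a clever plot",
              "the plot was clever and well paced",
              "terrible sound and blurry picture",
              "picture quality was blurry throughout",
              "loved the acting",
              "sound was terrible"),
  stringsAsFactors = FALSE
)

# Run the clustering only if the function is available in this session.
if (exists("MovieGroupProcess")) {
  set.seed(42)  # assumption: makes the random assignments reproducible
  clustered <- MovieGroupProcess(
    data = reviews,
    text = comment,        # column name passed unquoted
    K = 4,                 # upper limit; empty clusters are dropped
    alpha = 0.1,
    beta = 0.1,
    iter = 30,
    repeat_words = FALSE,
    r_stopwords = TRUE
  )
  str(clustered)  # original columns plus the added cluster column
}
```

Because K is only an upper limit, the number of distinct clusters in the returned column can be smaller than the value supplied.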