MovieGroupProcess: Movie Group Process Function

View source: R/MovieGroupProcess.R

Description

This function implements the Movie Group Process described by Yin and Wang in their 2014 paper (A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering).

Usage

MovieGroupProcess(
  data,
  text,
  K,
  alpha = 0.1,
  beta = 0.1,
  iter = 30,
  repeat_words = FALSE,
  r_stopwords = TRUE
)

Arguments

data

A data frame.

text

The name of the column in the data frame that contains the text to cluster. Pass the column name unquoted.

K

The upper limit on the number of clusters (topics). The function automatically removes empty clusters and condenses the rest.

alpha

A tuning parameter between 0 and 1 that controls a document's affinity for a larger cluster. The default is 0.1.

beta

A tuning parameter between 0 and 1 that controls a document's affinity for a more similar cluster. The default is 0.1.

iter

The upper limit for the number of iterations to perform. The function will terminate earlier if a stable solution is found. Default is set at 30.

repeat_words

A logical vector indicating whether the documents contain repeated words. If TRUE, the function uses Algorithm 4 from Yin and Wang's paper; if FALSE, it uses Algorithm 3 from their paper. Default is set to FALSE.

r_stopwords

A logical vector indicating whether stop words should be removed. Default is set at TRUE.
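
How alpha, beta and repeat_words interact can be illustrated with the cluster-assignment score used in Dirichlet multinomial mixture sampling. The code below is a minimal sketch written for this page, not the package's source: the helper cluster_scores() and all of its arguments are hypothetical, and it assumes that with repeat_words = FALSE each distinct word in a document is counted once.

```r
# Hypothetical helper (not part of the package): unnormalised scores for
# assigning one document to each of K clusters.
#   doc_counts : named integer vector, word -> count in the document
#   m          : documents per cluster, with this document excluded (length K)
#   n_zw       : K x V matrix of word counts per cluster, columns named by word
#   D          : total number of documents
cluster_scores <- function(doc_counts, m, n_zw, D, alpha = 0.1, beta = 0.1,
                           repeat_words = FALSE) {
  K <- length(m)
  V <- ncol(n_zw)
  n_z <- rowSums(n_zw)                     # total words per cluster
  if (!repeat_words) doc_counts[] <- 1L    # assumption: count each word once
  N_d <- sum(doc_counts)
  words <- names(doc_counts)

  log_score <- vapply(seq_len(K), function(z) {
    # alpha term: favours clusters that already hold many documents
    size_term <- log(m[z] + alpha) - log(D - 1 + K * alpha)
    # beta term: favours clusters that already contain the document's words
    num <- sum(vapply(words, function(w) {
      sum(log(n_zw[z, w] + beta + seq_len(doc_counts[[w]]) - 1))
    }, numeric(1)))
    den <- sum(log(n_z[z] + V * beta + seq_len(N_d) - 1))
    size_term + num - den
  }, numeric(1))

  exp(log_score - max(log_score))          # rescaled so the best cluster is 1
}

# Two equally sized clusters: one built from "film", one from "goal".
n_zw <- matrix(c(5, 0,
                 0, 5), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("film", "goal")))
s <- cluster_scores(c(film = 2L), m = c(3, 3), n_zw = n_zw, D = 7,
                    repeat_words = TRUE)
s / sum(s)   # assignment probabilities for the two clusters
```

In this sketch a larger alpha flattens the size term, so small or empty clusters remain competitive, while a larger beta flattens the word term, so a document is less tied to clusters that already share its vocabulary.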

Value

The function returns the original data frame with an additional column containing cluster assignments for each row of text.
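
Examples

A call following the Usage signature might look like the sketch below. It assumes the package installs from the repository named in this page's footer and loads under the name mgp, and that MovieGroupProcess is exported; the reviews data frame and its column names are made up for illustration. The name of the added cluster column is not documented here, so the example only inspects the structure of the result.

```r
# Assumption: installable from GitHub and loadable as "mgp".
# remotes::install_github("jason-hanser/mgp")

# Made-up example data: one short text per row.
reviews <- data.frame(
  id = 1:6,
  review = c("great acting and a moving story",
             "the story was slow but the acting was great",
             "terrible plot and wooden acting",
             "late goal wins the match",
             "the striker scored a late goal",
             "a dull match with no goal"),
  stringsAsFactors = FALSE
)

if (requireNamespace("mgp", quietly = TRUE)) {
  set.seed(1)  # the sampler is random; this helps only if it uses R's RNG
  clustered <- mgp::MovieGroupProcess(
    data = reviews,
    text = review,        # column name passed unquoted
    K = 4,                # upper limit; empty clusters are removed
    alpha = 0.1,
    beta = 0.1,
    iter = 30,
    repeat_words = FALSE,
    r_stopwords = TRUE
  )
  str(clustered)          # original columns plus one cluster column
}
```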


jason-hanser/mgp documentation built on Aug. 6, 2022, 3:24 a.m.