BullsEyeR: Topic Modelling for Content curation Cognizant...


Description

This package provides three categories of functions: frequency analysis of word tokens, creation of a document-term matrix (DTM), and topic modelling using latent Dirichlet allocation (LDA).

FreqAnalysis()

Frequency analysis of word tokens: returns a data frame of words and their frequencies after initial preprocessing, sparsity control and TF-IDF (term frequency-inverse document frequency) analysis have been performed. Words from the high-frequency end of this list can then be picked as custom stop words.
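The steps FreqAnalysis() describes can be illustrated with a small sketch. It is written in Python rather than R and does not reproduce the package's API. The names `freq_analysis` and `preprocess`, the tiny stop-word list, the `max_sparsity` threshold and the rule of keeping terms whose mean TF-IDF is at least the median are all assumptions made for illustration.

```python
import math
import re
import statistics
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the package's own list.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def preprocess(doc, custom_stop_words=()):
    """Lowercase, keep runs of ASCII letters (which drops punctuation and
    digits), and remove standard plus custom stop words."""
    stop = STOP_WORDS | {w.lower() for w in custom_stop_words}
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in stop]


def freq_analysis(docs, max_sparsity=0.99, custom_stop_words=()):
    """Hypothetical helper: return (word, frequency) pairs, most frequent first."""
    tokenized = [preprocess(d, custom_stop_words) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Sparsity control: drop terms missing from more than max_sparsity of docs.
    kept = {t for t, c in df.items() if 1 - c / n_docs <= max_sparsity}
    # Within-document term frequency, collected per term.
    tf = {}
    for toks in tokenized:
        for t, c in Counter(toks).items():
            if t in kept:
                tf.setdefault(t, []).append(c / len(toks))
    # Mean term frequency scaled by log2 inverse document frequency.
    tfidf = {t: statistics.mean(v) * math.log2(n_docs / df[t]) for t, v in tf.items()}
    # Assumption: keep only terms whose TF-IDF is at least the median score.
    cutoff = statistics.median(tfidf.values()) if tfidf else 0.0
    totals = Counter(t for toks in tokenized for t in toks
                     if t in tfidf and tfidf[t] >= cutoff)
    return totals.most_common()


docs = [
    "The report covers data pipelines and data quality.",
    "Data quality checks run on every pipeline.",
    "Topic models summarise the report archive.",
]
print(freq_analysis(docs))
```

Terms near the top of the returned list that carry little meaning for this particular corpus are the natural candidates to pass back in as custom stop words.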

createDTM()

Creation of a document-term matrix: repeats the preprocessing of the first step, this time also removing the custom stop words, drops any empty documents and returns a document-term matrix (DTM). This DTM is used to find the optimal number of topics for LDA modelling with 'FindTopicsNumber' from the 'ldatuning' package.
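Here is a minimal sketch of the createDTM() step, again in Python and with hypothetical names (`create_dtm`, `STOP_WORDS`). It rebuilds the tokens with the custom stop words removed and drops documents that end up empty. It then returns a dense count matrix together with its vocabulary and the indices of the dropped documents. The package itself works with R document-term matrix objects, so this only mirrors the idea.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the package's own list.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def create_dtm(docs, custom_stop_words=()):
    """Build a document-term count matrix after removing standard and custom
    stop words; documents left empty are dropped and their indices reported."""
    stop = STOP_WORDS | {w.lower() for w in custom_stop_words}
    tokenized = [[t for t in re.findall(r"[a-z]+", d.lower()) if t not in stop]
                 for d in docs]
    empty_rows = [i for i, toks in enumerate(tokenized) if not toks]
    kept = [toks for toks in tokenized if toks]
    vocab = sorted({t for toks in kept for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    # One row per non-empty document, one column per vocabulary term.
    dtm = [[0] * len(vocab) for _ in kept]
    for row, toks in zip(dtm, kept):
        for t in toks:
            row[index[t]] += 1
    return dtm, vocab, empty_rows


dtm, vocab, empty_rows = create_dtm(
    ["Cloud migration plan", "The and of", "Cloud cost review 2019"])
print(vocab, dtm, empty_rows)
```

The second document contains only stop words, so it is reported in `empty_rows` rather than becoming an all-zero row. An all-zero row would be a problem for LDA fitting, which is presumably why empty documents are removed before the topic-number search.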

BullsEye()

Topic modelling: performs preprocessing, including removal of the custom stop words, then builds a unigram LDA topic model, with or without stemming, using the number of topics selected with 'ldatuning'. Returns:

EmptyRows

A list of documents that are empty (zero length) after preprocessing.

Topics

A data frame with the top 20 terms of each topic discovered by LDA.
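The following plain-Python sketch shows what the modelling step does: it fits a unigram LDA by collapsed Gibbs sampling. Several details are assumptions, and the package's own estimation method and defaults may differ:

- the function name `bulls_eye_sketch`;
- its parameters `alpha`, `beta`, `iters` and `seed`;
- the toy `crude_stem` suffix stripper, which stands in for a real stemmer;
- the choice of Gibbs sampling itself.

The sketch returns the two components listed above as plain Python lists rather than an R list and data frame.

```python
import random
import re

# Assumption: a tiny illustrative stop-word list, not the package's own list.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def bulls_eye_sketch(docs, k, custom_stop_words=(), stem=False, top_n=20,
                     iters=200, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical helper: unigram LDA fitted by collapsed Gibbs sampling.
    Returns (empty_rows, topics): indices of documents left empty after
    preprocessing, and the top_n terms of each of the k topics."""
    stop = STOP_WORDS | {w.lower() for w in custom_stop_words}
    tokenized = []
    for d in docs:
        toks = [t for t in re.findall(r"[a-z]+", d.lower()) if t not in stop]
        tokenized.append([crude_stem(t) for t in toks] if stem else toks)
    empty_rows = [i for i, toks in enumerate(tokenized) if not toks]
    corpus = [toks for toks in tokenized if toks]
    vocab = sorted({t for toks in corpus for t in toks})
    w_id = {t: j for j, t in enumerate(vocab)}
    V = len(vocab)

    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * k for _ in corpus]
    n_kw = [[0] * V for _ in range(k)]
    n_k = [0] * k
    z = []  # current topic assignment of every token
    for d, toks in enumerate(corpus):
        z_d = []
        for t in toks:
            topic = rng.randrange(k)
            z_d.append(topic)
            n_dk[d][topic] += 1
            n_kw[topic][w_id[t]] += 1
            n_k[topic] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, toks in enumerate(corpus):
            for i, t in enumerate(toks):
                w, old = w_id[t], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][old] -= 1
                n_kw[old][w] -= 1
                n_k[old] -= 1
                # Unnormalised full conditional p(z = j | all other assignments).
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                           / (n_k[j] + V * beta) for j in range(k)]
                new = rng.choices(range(k), weights=weights)[0]
                z[d][i] = new
                n_dk[d][new] += 1
                n_kw[new][w] += 1
                n_k[new] += 1

    # Summarise each topic by its most frequently assigned words.
    topics = []
    for j in range(k):
        ranked = sorted(range(V), key=lambda w: -n_kw[j][w])
        topics.append([vocab[w] for w in ranked[:top_n]])
    return empty_rows, topics


docs = ["Cloud migration and cloud cost", "",
        "Topic models for topic discovery", "Cloud cost review",
        "Topic models summarise documents"]
empty_rows, topics = bulls_eye_sketch(docs, k=2, top_n=3, iters=50, seed=1)
print(empty_rows, topics)
```

Here `k` plays the role of the topic number picked with 'ldatuning'. The sampler resamples each token's topic in proportion to (n_dk + alpha)(n_kw + beta) / (n_k + V*beta). Ranking each topic's words by their counts gives the analogue of the top-20 terms reported in Topics.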


BullsEyeR documentation built on May 1, 2019, 6:36 p.m.