textmineR: Functions for Text Mining and Topic Modeling

An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly-formatted input and give similarly-formatted output. Also provides functions for analyzing and diagnosing topic models.

Author: Thomas Jones [aut, cre], William Doane [ctb]
Date of publication: 2017-04-07 06:07:04 UTC
Maintainer: Thomas Jones <jones.thos.w@gmail.com>
License: GPL (>= 3)

CalcHellingerDist: Calculate Hellinger Distance

CalcJSDivergence: Calculate Jensen-Shannon Divergence

CalcLikelihood: Calculate the log likelihood of a document term matrix given...

CalcPhiPrime: Calculate a matrix whose rows represent P(topic_i|tokens)

CalcProbCoherence: Probabilistic coherence of topics

CalcTopicModelR2: Calculate the R-squared of a topic model.

Cluster2TopicModel: Represent a document clustering as a topic model

CorrectS: Function to remove some forms of pluralization.

CreateDtm: Convert a character vector to a document term matrix.

CreateTcm: Convert a character vector to a term co-occurrence matrix.

DepluralizeDtm: Run the CorrectS function on columns of a document term...

Dtm2Docs: Convert a DTM to a Character Vector of documents

Dtm2Tcm: Turn a document term matrix into a term co-occurrence matrix

Files2Vec: Function for reading text files into R

FitCtmModel: Fit a Correlated Topic Model

FitLdaModel: Fit a topic model using Latent Dirichlet Allocation

FitLsaModel: Fit a topic model using Latent Semantic Analysis

FormatRawLdaOutput: Format Raw Output from 'lda.collapsed.gibbs.sampler'

GetPhiPrime: Calculate a matrix whose rows represent P(topic_i|tokens)

GetProbableTerms: Get cluster labels using a "more probable" method of terms

GetTopTerms: Get Top Terms for each topic from a topic model

GetVocabFromDtm: Reconstruct a 'text2vec::vocabulary' object from a document...

HellDist: Hellinger Distance

InternalFunctions: Internal helper functions for 'textmineR'

JSD: Jensen-Shannon Divergence

LabelTopics: Get some topic labels using a "more probable" method of terms

nih: Abstracts and metadata from NIH research grants awarded in...

RecursiveRbind: Recursively call rBind from the Matrix package.

TermDocFreq: Get term frequencies and document frequencies from a document...

TmParallelApply: An OS-independent parallel version of 'lapply'

Vec2Dtm: Convert a character vector to a document term matrix of class...
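
The index above lists two distance measures, Hellinger distance (CalcHellingerDist, HellDist) and Jensen-Shannon divergence (CalcJSDivergence, JSD). As a reminder of what these quantities are, here is a minimal base R sketch of the standard textbook definitions for two discrete probability vectors. It does not call textmineR and is not the package's implementation. The names hellinger_sketch, kl_sketch, and jsd_sketch are hypothetical, and the use of the natural logarithm is an assumption.

```r
# Hypothetical helpers for illustration only; none of these are part of textmineR.

# Hellinger distance between two discrete probability vectors p and q.
# The result lies in [0, 1]: 0 for identical vectors, 1 for disjoint support.
hellinger_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  sqrt(sum((sqrt(p) - sqrt(q))^2) / 2)
}

# Kullback-Leibler divergence, using the convention 0 * log(0 / q) = 0.
kl_sketch <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Jensen-Shannon divergence: the mean KL divergence of p and q
# to their midpoint distribution m (natural log assumed).
jsd_sketch <- function(p, q) {
  stopifnot(length(p) == length(q))
  m <- (p + q) / 2
  (kl_sketch(p, m) + kl_sketch(q, m)) / 2
}

# Two toy distributions over a three-term vocabulary.
p <- c(0.5, 0.5, 0)
q <- c(0, 0.5, 0.5)
print(hellinger_sketch(p, q))
print(jsd_sketch(p, q))
```

Both measures are symmetric in p and q and are zero when the two vectors are identical, which is why they are commonly used to compare topic or document distributions.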


