textmineR: Functions for Text Mining and Topic Modeling

An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly formatted input and give similarly formatted output. Also provides functions for analyzing and diagnosing topic models.

Install the latest version of this package by entering the following in R:

install.packages("textmineR")

Author: Thomas Jones [aut, cre], William Doane [ctb]
Date of publication: 2017-04-07 06:07:04 UTC
Maintainer: Thomas Jones <jones.thos.w@gmail.com>
License: GPL (>= 3)


Man pages

CalcHellingerDist: Calculate Hellinger Distance

CalcJSDivergence: Calculate Jensen-Shannon Divergence

CalcLikelihood: Calculate the log likelihood of a document term matrix given...

CalcPhiPrime: Calculate a matrix whose rows represent P(topic_i|tokens)

CalcProbCoherence: Probabilistic coherence of topics

CalcTopicModelR2: Calculate the R-squared of a topic model.

Cluster2TopicModel: Represent a document clustering as a topic model

CorrectS: Function to remove some forms of pluralization.

CreateDtm: Convert a character vector to a document term matrix.

CreateTcm: Convert a character vector to a term co-occurrence matrix.

DepluralizeDtm: Run the CorrectS function on columns of a document term...

Dtm2Docs: Convert a DTM to a Character Vector of documents

Dtm2Tcm: Turn a document term matrix into a term co-occurrence matrix

Files2Vec: Function for reading text files into R

FitCtmModel: Fit a Correlated Topic Model

FitLdaModel: Fit a topic model using Latent Dirichlet Allocation

FitLsaModel: Fit a topic model using Latent Semantic Analysis

FormatRawLdaOutput: Format Raw Output from 'lda.collapsed.gibbs.sampler'

GetPhiPrime: Calculate a matrix whose rows represent P(topic_i|tokens)

GetProbableTerms: Get cluster labels using a "more probable" method of terms

GetTopTerms: Get Top Terms for each topic from a topic model

GetVocabFromDtm: Reconstruct a 'text2vec::vocabulary' object from a document...

HellDist: Hellinger Distance

InternalFunctions: Internal helper functions for 'textmineR'

JSD: Jensen-Shannon Divergence

LabelTopics: Get some topic labels using a "more probable" method of terms

nih: Abstracts and metadata from NIH research grants awarded in...

RecursiveRbind: Recursively call rBind from the Matrix package.

TermDocFreq: Get term frequencies and document frequencies from a document...

TmParallelApply: An OS-independent parallel version of 'lapply'

Vec2Dtm: Convert a character vector to a document term matrix of class...


CalcHellingerDist Man page
CalcJSDivergence Man page
CalcLikelihood Man page
CalcLikelihoodC Man page
CalcPhiPrime Man page
CalcProbCoherence Man page
CalcSumSquares Man page
CalcTopicModelR2 Man page
Cluster2TopicModel Man page
CorrectS Man page
CreateDtm Man page
CreateTcm Man page
DepluralizeDtm Man page
Dtm2Docs Man page
Dtm2DocsC Man page
Dtm2Tcm Man page
Files2Vec Man page
FitCtmModel Man page
FitLdaModel Man page
FitLsaModel Man page
FormatRawLdaOutput Man page
GetPhiPrime Man page
GetProbableTerms Man page
GetTopTerms Man page
GetVocabFromDtm Man page
HellDist Man page
Hellinger_cpp Man page
HellingerMat Man page
JSD Man page
JSD_cpp Man page
JSDmat Man page
LabelTopics Man page
nih Man page
nih_sample Man page
nih_sample_dtm Man page
nih_sample_topic_model Man page
RecursiveRbind Man page
TermDocFreq Man page
TmParallelApply Man page
Vec2Dtm Man page

