tm: Text Mining Package

A framework for text mining applications within R.

Author: Ingo Feinerer [aut, cre], Kurt Hornik [aut], Artifex Software, Inc. [ctb, cph] (pdf_info.ps taken from GPL Ghostscript)
Date of publication: 2017-03-02 17:45:01
Maintainer: Ingo Feinerer <feinerer@logic.at>
License: GPL-3
Version: 0.7-1
http://tm.r-forge.r-project.org/

Man pages

acq: 50 Exemplary News Articles from the Reuters-21578 Data Set of...

combine: Combine Corpora, Documents, Term-Document Matrices, and Term...

content_transformer: Content Transformers

Corpus: Corpora

crude: 20 Exemplary News Articles from the Reuters-21578 Data Set of...

DataframeSource: Data Frame Source

DirSource: Directory Source

Docs: Access Document IDs and Terms

findAssocs: Find Associations in a Term-Document Matrix

findFreqTerms: Find Frequent Terms

findMostFreqTerms: Find Most Frequent Terms

foreign: Read Document-Term Matrices

getTokenizers: Tokenizers

getTransformations: Transformations

hpc: Parallelized 'lapply'

inspect: Inspect Objects

matrix: Term-Document Matrix

meta: Metadata Management

PCorpus: Permanent Corpora

PlainTextDocument: Plain Text Documents

plot: Visualize a Term-Document Matrix

readDOC: Read In a MS Word Document

Reader: Readers

readPDF: Read In a PDF Document

readPlain: Read In a Text Document

readRCV1: Read In a Reuters Corpus Volume 1 Document

readReut21578XML: Read In a Reuters-21578 XML Document

readTabular: Read In a Text Document

readTagged: Read In a POS-Tagged Word Text Document

readXML: Read In an XML Document

removeNumbers: Remove Numbers from a Text Document

removePunctuation: Remove Punctuation Marks from a Text Document

removeSparseTerms: Remove Sparse Terms from a Term-Document Matrix

removeWords: Remove Words from a Text Document

SimpleCorpus: Simple Corpora

Source: Sources

stemCompletion: Complete Stems

stemDocument: Stem Words

stopwords: Stopwords

stripWhitespace: Strip Whitespace from a Text Document

termFreq: Term Frequency Vector

TextDocument: Text Documents

tm_filter: Filter and Index Functions on Corpora

tm_map: Transformations on Corpora

tm_reduce: Combine Transformations

tm_term_score: Compute Score for Matching Terms

tokenizer: Tokenizers

URISource: Uniform Resource Identifier Source

VCorpus: Volatile Corpora

VectorSource: Vector Source

weightBin: Weight Binary

WeightFunction: Weighting Function

weightSMART: SMART Weightings

weightTf: Weight by Term Frequency

weightTfIdf: Weight by Term Frequency - Inverse Document Frequency

writeCorpus: Write a Corpus to Disk

XMLSource: XML Source

XMLTextDocument: XML Text Documents

Zipf_n_Heaps: Explore Corpus Term Frequency Characteristics

ZipSource: ZIP File Source

Functions

acq Man page
as.DocumentTermMatrix Man page
as.TermDocumentMatrix Man page
as.VCorpus Man page
close.SimpleSource Man page
content_transformer Man page
Corpus Man page
crude Man page
c.TermDocumentMatrix Man page
c.term_frequency Man page
c.TextDocument Man page
c.VCorpus Man page
DataframeSource Man page
DirSource Man page
Docs Man page
DocumentTermMatrix Man page
DublinCore Man page
DublinCore<- Man page
eoi Man page
eoi.SimpleSource Man page
findAssocs Man page
findAssocs.DocumentTermMatrix Man page
findAssocs.TermDocumentMatrix Man page
findFreqTerms Man page
findMostFreqTerms Man page
findMostFreqTerms.DocumentTermMatrix Man page
findMostFreqTerms.TermDocumentMatrix Man page
findMostFreqTerms.term_frequency Man page
FunctionGenerator Man page
getElem Man page
getElem.DataframeSource Man page
getElem.DirSource Man page
getElem.URISource Man page
getElem.VectorSource Man page
getElem.XMLSource Man page
getReaders Man page
getSources Man page
getTokenizers Man page
getTransformations Man page
Heaps_plot Man page
inspect Man page
inspect.PCorpus Man page
inspect.TermDocumentMatrix Man page
inspect.TextDocument Man page
inspect.VCorpus Man page
length.SimpleSource Man page
MC_tokenizer Man page
meta Man page
meta<-.PCorpus Man page
meta.PCorpus Man page
meta<-.PlainTextDocument Man page
meta.PlainTextDocument Man page
meta<-.SimpleCorpus Man page
meta.SimpleCorpus Man page
meta<-.VCorpus Man page
meta.VCorpus Man page
meta<-.XMLTextDocument Man page
meta.XMLTextDocument Man page
nDocs Man page
nTerms Man page
open.SimpleSource Man page
PCorpus Man page
pGetElem Man page
pGetElem.DataframeSource Man page
pGetElem.DirSource Man page
pGetElem.URISource Man page
pGetElem.VectorSource Man page
PlainTextDocument Man page
plot.TermDocumentMatrix Man page
readDOC Man page
read_dtm_Blei_et_al Man page
read_dtm_MC Man page
reader Man page
Reader Man page
reader.SimpleSource Man page
readPDF Man page
readPlain Man page
readRCV1 Man page
readRCV1asPlain Man page
readReut21578XML Man page
readReut21578XMLasPlain Man page
readTabular Man page
readTagged Man page
readXML Man page
removeNumbers Man page
removeNumbers.PlainTextDocument Man page
removePunctuation Man page
removePunctuation.character Man page
removePunctuation.PlainTextDocument Man page
removeSparseTerms Man page
removeWords Man page
removeWords.character Man page
removeWords.PlainTextDocument Man page
scan_tokenizer Man page
SimpleCorpus Man page
SimpleSource Man page
Source Man page
stemCompletion Man page
stemDocument Man page
stemDocument.character Man page
stemDocument.PlainTextDocument Man page
stepNext Man page
stepNext.SimpleSource Man page
stopwords Man page
stripWhitespace Man page
stripWhitespace.PlainTextDocument Man page
TermDocumentMatrix Man page
termFreq Man page
Terms Man page
TextDocument Man page
tm_filter Man page
tm_filter.PCorpus Man page
tm_filter.SimpleCorpus Man page
tm_filter.VCorpus Man page
tm_index Man page
tm_index.PCorpus Man page
tm_index.SimpleCorpus Man page
tm_index.VCorpus Man page
tm_map Man page
tm_map.PCorpus Man page
tm_map.SimpleCorpus Man page
tm_map.VCorpus Man page
tm_parLapply Man page
tm_parLapply_engine Man page
tm_reduce Man page
tm_term_score Man page
tm_term_score.DocumentTermMatrix Man page
tm_term_score.PlainTextDocument Man page
tm_term_score.TermDocumentMatrix Man page
tm_term_score.term_frequency Man page
URISource Man page
VCorpus Man page
VectorSource Man page
weightBin Man page
WeightFunction Man page
weightSMART Man page
weightTf Man page
weightTfIdf Man page
writeCorpus Man page
XMLSource Man page
XMLTextDocument Man page
Zipf_plot Man page
ZipSource Man page

Files

tm
tm/inst
tm/inst/CITATION
tm/inst/NEWS.Rd
tm/inst/ghostscript
tm/inst/ghostscript/pdf_info.ps
tm/inst/doc
tm/inst/doc/tm.pdf
tm/inst/doc/extensions.Rnw
tm/inst/doc/extensions.R
tm/inst/doc/tm.Rnw
tm/inst/doc/tm.R
tm/inst/doc/extensions.pdf
tm/inst/texts
tm/inst/texts/crude
tm/inst/texts/crude/reut-00004.xml
tm/inst/texts/crude/reut-00009.xml
tm/inst/texts/crude/reut-00008.xml
tm/inst/texts/crude/reut-00014.xml
tm/inst/texts/crude/reut-00001.xml
tm/inst/texts/crude/reut-00022.xml
tm/inst/texts/crude/reut-00007.xml
tm/inst/texts/crude/reut-00002.xml
tm/inst/texts/crude/reut-00023.xml
tm/inst/texts/crude/reut-00016.xml
tm/inst/texts/crude/reut-00005.xml
tm/inst/texts/crude/reut-00011.xml
tm/inst/texts/crude/reut-00015.xml
tm/inst/texts/crude/reut-00010.xml
tm/inst/texts/crude/reut-00012.xml
tm/inst/texts/crude/reut-00006.xml
tm/inst/texts/crude/reut-00013.xml
tm/inst/texts/crude/reut-00019.xml
tm/inst/texts/crude/reut-00021.xml
tm/inst/texts/crude/reut-00018.xml
tm/inst/texts/acq
tm/inst/texts/acq/reut-00042.xml
tm/inst/texts/acq/reut-00004.xml
tm/inst/texts/acq/reut-00035.xml
tm/inst/texts/acq/reut-00024.xml
tm/inst/texts/acq/reut-00009.xml
tm/inst/texts/acq/reut-00031.xml
tm/inst/texts/acq/reut-00056.xml
tm/inst/texts/acq/reut-00051.xml
tm/inst/texts/acq/reut-00008.xml
tm/inst/texts/acq/reut-00014.xml
tm/inst/texts/acq/reut-00001.xml
tm/inst/texts/acq/reut-00022.xml
tm/inst/texts/acq/reut-00026.xml
tm/inst/texts/acq/reut-00007.xml
tm/inst/texts/acq/reut-00045.xml
tm/inst/texts/acq/reut-00002.xml
tm/inst/texts/acq/reut-00029.xml
tm/inst/texts/acq/reut-00030.xml
tm/inst/texts/acq/reut-00027.xml
tm/inst/texts/acq/reut-00023.xml
tm/inst/texts/acq/reut-00048.xml
tm/inst/texts/acq/reut-00016.xml
tm/inst/texts/acq/reut-00017.xml
tm/inst/texts/acq/reut-00047.xml
tm/inst/texts/acq/reut-00028.xml
tm/inst/texts/acq/reut-00043.xml
tm/inst/texts/acq/reut-00005.xml
tm/inst/texts/acq/reut-00049.xml
tm/inst/texts/acq/reut-00052.xml
tm/inst/texts/acq/reut-00011.xml
tm/inst/texts/acq/reut-00015.xml
tm/inst/texts/acq/reut-00050.xml
tm/inst/texts/acq/reut-00053.xml
tm/inst/texts/acq/reut-00010.xml
tm/inst/texts/acq/reut-00046.xml
tm/inst/texts/acq/reut-00034.xml
tm/inst/texts/acq/reut-00020.xml
tm/inst/texts/acq/reut-00012.xml
tm/inst/texts/acq/reut-00025.xml
tm/inst/texts/acq/reut-00006.xml
tm/inst/texts/acq/reut-00032.xml
tm/inst/texts/acq/reut-00003.xml
tm/inst/texts/acq/reut-00036.xml
tm/inst/texts/acq/reut-00013.xml
tm/inst/texts/acq/reut-00055.xml
tm/inst/texts/acq/reut-00040.xml
tm/inst/texts/acq/reut-00039.xml
tm/inst/texts/acq/reut-00054.xml
tm/inst/texts/acq/reut-00021.xml
tm/inst/texts/acq/reut-00018.xml
tm/inst/texts/rcv1_2330.xml
tm/inst/texts/reuters-21578.xml
tm/inst/texts/custom.xml
tm/inst/texts/loremipsum.txt
tm/inst/texts/txt
tm/inst/texts/txt/ovid_4.txt
tm/inst/texts/txt/ovid_2.txt
tm/inst/texts/txt/ovid_3.txt
tm/inst/texts/txt/ovid_1.txt
tm/inst/texts/txt/ovid_5.txt
tm/inst/stopwords
tm/inst/stopwords/portuguese.dat
tm/inst/stopwords/french.dat
tm/inst/stopwords/hungarian.dat
tm/inst/stopwords/swedish.dat
tm/inst/stopwords/norwegian.dat
tm/inst/stopwords/russian.dat
tm/inst/stopwords/italian.dat
tm/inst/stopwords/english.dat
tm/inst/stopwords/dutch.dat
tm/inst/stopwords/finnish.dat
tm/inst/stopwords/german.dat
tm/inst/stopwords/catalan.dat
tm/inst/stopwords/romanian.dat
tm/inst/stopwords/danish.dat
tm/inst/stopwords/SMART.dat
tm/inst/stopwords/spanish.dat
tm/src
tm/src/copy.c
tm/src/tdm.cpp
tm/src/init.c
tm/src/RcppExports.cpp
tm/NAMESPACE
tm/data
tm/data/crude.rda
tm/data/acq.rda
tm/R
tm/R/utils.R
tm/R/stopwords.R
tm/R/plot.R
tm/R/foreign.R
tm/R/score.R
tm/R/filter.R
tm/R/meta.R
tm/R/corpus.R
tm/R/RcppExports.R
tm/R/weight.R
tm/R/hpc.R
tm/R/doc.R
tm/R/source.R
tm/R/transform.R
tm/R/pdftools.R
tm/R/complete.R
tm/R/matrix.R
tm/R/reader.R
tm/R/tokenizer.R
tm/vignettes
tm/vignettes/extensions.Rnw
tm/vignettes/tm.Rnw
tm/vignettes/references.bib
tm/MD5
tm/build
tm/build/vignette.rds
tm/DESCRIPTION
tm/man
tm/man/meta.Rd
tm/man/tm_map.Rd
tm/man/removeWords.Rd
tm/man/Docs.Rd
tm/man/readXML.Rd
tm/man/Corpus.Rd
tm/man/PCorpus.Rd
tm/man/readDOC.Rd
tm/man/combine.Rd
tm/man/getTokenizers.Rd
tm/man/foreign.Rd
tm/man/PlainTextDocument.Rd
tm/man/VectorSource.Rd
tm/man/readTagged.Rd
tm/man/matrix.Rd
tm/man/tm_filter.Rd
tm/man/SimpleCorpus.Rd
tm/man/stripWhitespace.Rd
tm/man/XMLSource.Rd
tm/man/tm_reduce.Rd
tm/man/readReut21578XML.Rd
tm/man/URISource.Rd
tm/man/hpc.Rd
tm/man/removeSparseTerms.Rd
tm/man/weightTfIdf.Rd
tm/man/Zipf_n_Heaps.Rd
tm/man/crude.Rd
tm/man/getTransformations.Rd
tm/man/content_transformer.Rd
tm/man/readPlain.Rd
tm/man/readPDF.Rd
tm/man/tm_term_score.Rd
tm/man/removeNumbers.Rd
tm/man/findFreqTerms.Rd
tm/man/DataframeSource.Rd
tm/man/weightBin.Rd
tm/man/weightTf.Rd
tm/man/weightSMART.Rd
tm/man/removePunctuation.Rd
tm/man/readRCV1.Rd
tm/man/VCorpus.Rd
tm/man/stemCompletion.Rd
tm/man/TextDocument.Rd
tm/man/acq.Rd
tm/man/writeCorpus.Rd
tm/man/stemDocument.Rd
tm/man/Reader.Rd
tm/man/tokenizer.Rd
tm/man/termFreq.Rd
tm/man/findAssocs.Rd
tm/man/Source.Rd
tm/man/XMLTextDocument.Rd
tm/man/plot.Rd
tm/man/inspect.Rd
tm/man/stopwords.Rd
tm/man/DirSource.Rd
tm/man/readTabular.Rd
tm/man/WeightFunction.Rd
tm/man/ZipSource.Rd
tm/man/findMostFreqTerms.Rd