tm: Text Mining Package

A framework for text mining applications within R.

Author: Ingo Feinerer [aut, cre], Kurt Hornik [aut], Artifex Software, Inc. [ctb, cph] (pdf_info.ps taken from GPL Ghostscript)
Date of publication: 2015-07-03 10:43:07
Maintainer: Ingo Feinerer <feinerer@logic.at>
License: GPL-3
Version: 0.6-2
http://tm.r-forge.r-project.org/

Man pages

acq: 50 Exemplary News Articles from the Reuters-21578 Data Set of...

combine: Combine Corpora, Documents, Term-Document Matrices, and Term...

content_transformer: Content Transformers

Corpus: Corpora

crude: 20 Exemplary News Articles from the Reuters-21578 Data Set of...

DataframeSource: Data Frame Source

DirSource: Directory Source

Docs: Access Document IDs and Terms

findAssocs: Find Associations in a Term-Document Matrix

findFreqTerms: Find Frequent Terms

foreign: Read Document-Term Matrices

getTokenizers: Tokenizers

getTransformations: Transformations

inspect: Inspect Objects

matrix: Term-Document Matrix

meta: Metadata Management

PCorpus: Permanent Corpora

PlainTextDocument: Plain Text Documents

plot: Visualize a Term-Document Matrix

readDOC: Read In a MS Word Document

Reader: Readers

readPDF: Read In a PDF Document

readPlain: Read In a Text Document

readRCV1: Read In a Reuters Corpus Volume 1 Document

readReut21578XML: Read In a Reuters-21578 XML Document

readTabular: Read In a Text Document

readTagged: Read In a POS-Tagged Word Text Document

readXML: Read In an XML Document

removeNumbers: Remove Numbers from a Text Document

removePunctuation: Remove Punctuation Marks from a Text Document

removeSparseTerms: Remove Sparse Terms from a Term-Document Matrix

removeWords: Remove Words from a Text Document

Source: Sources

stemCompletion: Complete Stems

stemDocument: Stem Words

stopwords: Stopwords

stripWhitespace: Strip Whitespace from a Text Document

termFreq: Term Frequency Vector

TextDocument: Text Documents

tm_filter: Filter and Index Functions on Corpora

tm_map: Transformations on Corpora

tm_reduce: Combine Transformations

tm_term_score: Compute Score for Matching Terms

tokenizer: Tokenizers

URISource: Uniform Resource Identifier Source

VCorpus: Volatile Corpora

VectorSource: Vector Source

weightBin: Weight Binary

WeightFunction: Weighting Function

weightSMART: SMART Weightings

weightTf: Weight by Term Frequency

weightTfIdf: Weight by Term Frequency - Inverse Document Frequency

writeCorpus: Write a Corpus to Disk

XMLSource: XML Source

XMLTextDocument: XML Text Documents

Zipf_n_Heaps: Explore Corpus Term Frequency Characteristics

ZipSource: ZIP File Source

Files in this package

tm
tm/inst
tm/inst/CITATION
tm/inst/NEWS.Rd
tm/inst/ghostscript
tm/inst/ghostscript/pdf_info.ps
tm/inst/doc
tm/inst/doc/tm.pdf
tm/inst/doc/extensions.Rnw
tm/inst/doc/extensions.R
tm/inst/doc/tm.Rnw
tm/inst/doc/tm.R
tm/inst/doc/extensions.pdf
tm/inst/texts
tm/inst/texts/crude
tm/inst/texts/crude/reut-00004.xml
tm/inst/texts/crude/reut-00009.xml
tm/inst/texts/crude/reut-00008.xml
tm/inst/texts/crude/reut-00014.xml
tm/inst/texts/crude/reut-00001.xml
tm/inst/texts/crude/reut-00022.xml
tm/inst/texts/crude/reut-00007.xml
tm/inst/texts/crude/reut-00002.xml
tm/inst/texts/crude/reut-00023.xml
tm/inst/texts/crude/reut-00016.xml
tm/inst/texts/crude/reut-00005.xml
tm/inst/texts/crude/reut-00011.xml
tm/inst/texts/crude/reut-00015.xml
tm/inst/texts/crude/reut-00010.xml
tm/inst/texts/crude/reut-00012.xml
tm/inst/texts/crude/reut-00006.xml
tm/inst/texts/crude/reut-00013.xml
tm/inst/texts/crude/reut-00019.xml
tm/inst/texts/crude/reut-00021.xml
tm/inst/texts/crude/reut-00018.xml
tm/inst/texts/acq
tm/inst/texts/acq/reut-00042.xml
tm/inst/texts/acq/reut-00004.xml
tm/inst/texts/acq/reut-00035.xml
tm/inst/texts/acq/reut-00024.xml
tm/inst/texts/acq/reut-00009.xml
tm/inst/texts/acq/reut-00031.xml
tm/inst/texts/acq/reut-00056.xml
tm/inst/texts/acq/reut-00051.xml
tm/inst/texts/acq/reut-00008.xml
tm/inst/texts/acq/reut-00014.xml
tm/inst/texts/acq/reut-00001.xml
tm/inst/texts/acq/reut-00022.xml
tm/inst/texts/acq/reut-00026.xml
tm/inst/texts/acq/reut-00007.xml
tm/inst/texts/acq/reut-00045.xml
tm/inst/texts/acq/reut-00002.xml
tm/inst/texts/acq/reut-00029.xml
tm/inst/texts/acq/reut-00030.xml
tm/inst/texts/acq/reut-00027.xml
tm/inst/texts/acq/reut-00023.xml
tm/inst/texts/acq/reut-00048.xml
tm/inst/texts/acq/reut-00016.xml
tm/inst/texts/acq/reut-00017.xml
tm/inst/texts/acq/reut-00047.xml
tm/inst/texts/acq/reut-00028.xml
tm/inst/texts/acq/reut-00043.xml
tm/inst/texts/acq/reut-00005.xml
tm/inst/texts/acq/reut-00049.xml
tm/inst/texts/acq/reut-00052.xml
tm/inst/texts/acq/reut-00011.xml
tm/inst/texts/acq/reut-00015.xml
tm/inst/texts/acq/reut-00050.xml
tm/inst/texts/acq/reut-00053.xml
tm/inst/texts/acq/reut-00010.xml
tm/inst/texts/acq/reut-00046.xml
tm/inst/texts/acq/reut-00034.xml
tm/inst/texts/acq/reut-00020.xml
tm/inst/texts/acq/reut-00012.xml
tm/inst/texts/acq/reut-00025.xml
tm/inst/texts/acq/reut-00006.xml
tm/inst/texts/acq/reut-00032.xml
tm/inst/texts/acq/reut-00003.xml
tm/inst/texts/acq/reut-00036.xml
tm/inst/texts/acq/reut-00013.xml
tm/inst/texts/acq/reut-00055.xml
tm/inst/texts/acq/reut-00040.xml
tm/inst/texts/acq/reut-00039.xml
tm/inst/texts/acq/reut-00054.xml
tm/inst/texts/acq/reut-00021.xml
tm/inst/texts/acq/reut-00018.xml
tm/inst/texts/rcv1_2330.xml
tm/inst/texts/reuters-21578.xml
tm/inst/texts/custom.xml
tm/inst/texts/loremipsum.txt
tm/inst/texts/txt
tm/inst/texts/txt/ovid_4.txt
tm/inst/texts/txt/ovid_2.txt
tm/inst/texts/txt/ovid_3.txt
tm/inst/texts/txt/ovid_1.txt
tm/inst/texts/txt/ovid_5.txt
tm/inst/stopwords
tm/inst/stopwords/portuguese.dat
tm/inst/stopwords/french.dat
tm/inst/stopwords/hungarian.dat
tm/inst/stopwords/swedish.dat
tm/inst/stopwords/norwegian.dat
tm/inst/stopwords/russian.dat
tm/inst/stopwords/italian.dat
tm/inst/stopwords/english.dat
tm/inst/stopwords/dutch.dat
tm/inst/stopwords/finnish.dat
tm/inst/stopwords/german.dat
tm/inst/stopwords/catalan.dat
tm/inst/stopwords/romanian.dat
tm/inst/stopwords/danish.dat
tm/inst/stopwords/SMART.dat
tm/inst/stopwords/spanish.dat
tm/src
tm/src/copy.c
tm/NAMESPACE
tm/data
tm/data/crude.rda
tm/data/acq.rda
tm/R
tm/R/utils.R
tm/R/stopwords.R
tm/R/plot.R
tm/R/foreign.R
tm/R/score.R
tm/R/filter.R
tm/R/meta.R
tm/R/corpus.R
tm/R/weight.R
tm/R/doc.R
tm/R/source.R
tm/R/transform.R
tm/R/pdftools.R
tm/R/complete.R
tm/R/matrix.R
tm/R/reader.R
tm/R/tokenizer.R
tm/vignettes
tm/vignettes/extensions.Rnw
tm/vignettes/tm.Rnw
tm/vignettes/references.bib
tm/MD5
tm/build
tm/build/vignette.rds
tm/DESCRIPTION
tm/man
tm/man/meta.Rd
tm/man/tm_map.Rd
tm/man/removeWords.Rd
tm/man/Docs.Rd
tm/man/readXML.Rd
tm/man/Corpus.Rd
tm/man/PCorpus.Rd
tm/man/readDOC.Rd
tm/man/combine.Rd
tm/man/getTokenizers.Rd
tm/man/foreign.Rd
tm/man/PlainTextDocument.Rd
tm/man/VectorSource.Rd
tm/man/readTagged.Rd
tm/man/matrix.Rd
tm/man/tm_filter.Rd
tm/man/stripWhitespace.Rd
tm/man/XMLSource.Rd
tm/man/tm_reduce.Rd
tm/man/readReut21578XML.Rd
tm/man/URISource.Rd
tm/man/removeSparseTerms.Rd
tm/man/weightTfIdf.Rd
tm/man/Zipf_n_Heaps.Rd
tm/man/crude.Rd
tm/man/getTransformations.Rd
tm/man/content_transformer.Rd
tm/man/readPlain.Rd
tm/man/readPDF.Rd
tm/man/tm_term_score.Rd
tm/man/removeNumbers.Rd
tm/man/findFreqTerms.Rd
tm/man/DataframeSource.Rd
tm/man/weightBin.Rd
tm/man/weightTf.Rd
tm/man/weightSMART.Rd
tm/man/removePunctuation.Rd
tm/man/readRCV1.Rd
tm/man/VCorpus.Rd
tm/man/stemCompletion.Rd
tm/man/TextDocument.Rd
tm/man/acq.Rd
tm/man/writeCorpus.Rd
tm/man/stemDocument.Rd
tm/man/Reader.Rd
tm/man/tokenizer.Rd
tm/man/termFreq.Rd
tm/man/findAssocs.Rd
tm/man/Source.Rd
tm/man/XMLTextDocument.Rd
tm/man/plot.Rd
tm/man/inspect.Rd
tm/man/stopwords.Rd
tm/man/DirSource.Rd
tm/man/readTabular.Rd
tm/man/WeightFunction.Rd
tm/man/ZipSource.Rd
