tm: Text Mining Package
Version 0.7-1

A framework for text mining applications within R.


Author: Ingo Feinerer [aut, cre], Kurt Hornik [aut], Artifex Software, Inc. [ctb, cph] (pdf_info.ps taken from GPL Ghostscript)
Date of publication: 2017-03-02 17:45:01
Maintainer: Ingo Feinerer <feinerer@logic.at>
License: GPL-3
Version: 0.7-1
URL: http://tm.r-forge.r-project.org/
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("tm")
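
Once the package is installed, a first session might look like the sketch below. It is only an illustration, not taken from the package documentation: the two toy documents and the variable names (docs, corpus, dtm) are assumptions made up for this example, and the choice and order of transformations is one plausible preprocessing chain among many. All functions called are ones listed in this index.

```r
# Minimal sketch: assumes tm is already installed (see install.packages above).
library(tm)

# Hypothetical toy documents, invented for illustration only.
docs <- c("Crude oil prices rose sharply.",
          "Oil producers agreed to cut output.")

# Build a volatile (in-memory) corpus from a character vector.
corpus <- VCorpus(VectorSource(docs))

# Apply a few transformations; tolower is a base R function, so it is
# wrapped with content_transformer() to keep the documents as tm documents.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Build a document-term matrix (documents as rows, terms as columns).
dtm <- DocumentTermMatrix(corpus)

# Print a summary of the matrix and list terms occurring at least twice overall.
inspect(dtm)
findFreqTerms(dtm, lowfreq = 2)
```

A VCorpus is used here because it accepts every transformation shown; whether SimpleCorpus or PCorpus would suit better depends on the data size and storage needs.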

Man pages

acq: 50 Exemplary News Articles from the Reuters-21578 Data Set of...
combine: Combine Corpora, Documents, Term-Document Matrices, and Term...
content_transformer: Content Transformers
Corpus: Corpora
crude: 20 Exemplary News Articles from the Reuters-21578 Data Set of...
DataframeSource: Data Frame Source
DirSource: Directory Source
Docs: Access Document IDs and Terms
findAssocs: Find Associations in a Term-Document Matrix
findFreqTerms: Find Frequent Terms
findMostFreqTerms: Find Most Frequent Terms
foreign: Read Document-Term Matrices
getTokenizers: Tokenizers
getTransformations: Transformations
hpc: Parallelized 'lapply'
inspect: Inspect Objects
matrix: Term-Document Matrix
meta: Metadata Management
PCorpus: Permanent Corpora
PlainTextDocument: Plain Text Documents
plot: Visualize a Term-Document Matrix
readDOC: Read In a MS Word Document
Reader: Readers
readPDF: Read In a PDF Document
readPlain: Read In a Text Document
readRCV1: Read In a Reuters Corpus Volume 1 Document
readReut21578XML: Read In a Reuters-21578 XML Document
readTabular: Read In a Text Document
readTagged: Read In a POS-Tagged Word Text Document
readXML: Read In an XML Document
removeNumbers: Remove Numbers from a Text Document
removePunctuation: Remove Punctuation Marks from a Text Document
removeSparseTerms: Remove Sparse Terms from a Term-Document Matrix
removeWords: Remove Words from a Text Document
SimpleCorpus: Simple Corpora
Source: Sources
stemCompletion: Complete Stems
stemDocument: Stem Words
stopwords: Stopwords
stripWhitespace: Strip Whitespace from a Text Document
termFreq: Term Frequency Vector
TextDocument: Text Documents
tm_filter: Filter and Index Functions on Corpora
tm_map: Transformations on Corpora
tm_reduce: Combine Transformations
tm_term_score: Compute Score for Matching Terms
tokenizer: Tokenizers
URISource: Uniform Resource Identifier Source
VCorpus: Volatile Corpora
VectorSource: Vector Source
weightBin: Weight Binary
WeightFunction: Weighting Function
weightSMART: SMART Weightings
weightTf: Weight by Term Frequency
weightTfIdf: Weight by Term Frequency - Inverse Document Frequency
writeCorpus: Write a Corpus to Disk
XMLSource: XML Source
XMLTextDocument: XML Text Documents
Zipf_n_Heaps: Explore Corpus Term Frequency Characteristics
ZipSource: ZIP File Source

Functions

CategorizedDocumentTermMatrix Source code
Corpus Man page Source code
CorpusMeta Source code
DataframeSource Man page Source code
DirSource Man page Source code
Docs Man page Source code
DocumentTermMatrix Man page Source code
DublinCore Man page Source code
DublinCore<- Man page
FunctionGenerator Man page Source code
Heaps_plot Man page Source code
MC_tokenizer Man page
PCorpus Man page Source code
PDF_Date_to_POSIXt Source code
PlainTextDocument Man page Source code
Reader Man page
SimpleCorpus Man page Source code
SimpleSource Man page Source code
SimpleTripletMatrix Source code
Source Man page
TermDocumentMatrix Man page Source code
TermDocumentMatrix.SimpleCorpus Source code
Terms Man page Source code
TextDocument Man page
TextDocumentMeta Source code
URISource Man page Source code
VCorpus Man page Source code
VectorSource Man page Source code
WeightFunction Man page Source code
XMLSource Man page Source code
XMLTextDocument Man page Source code
ZipSource Man page Source code
Zipf_plot Man page Source code
acq Man page
as.DocumentTermMatrix Man page Source code
as.DocumentTermMatrix.DocumentTermMatrix Source code
as.DocumentTermMatrix.TermDocumentMatrix Source code
as.DocumentTermMatrix.default Source code
as.TermDocumentMatrix Man page Source code
as.TermDocumentMatrix.DocumentTermMatrix Source code
as.TermDocumentMatrix.TermDocumentMatrix Source code
as.TermDocumentMatrix.default Source code
as.VCorpus Man page Source code
as.VCorpus.list Source code
as.character.PlainTextDocument Source code
as.character.XMLTextDocument Source code
as.list.SimpleCorpus Source code
c.DocumentTermMatrix Source code
c.TermDocumentMatrix Man page Source code
c.TextDocument Man page Source code
c.VCorpus Man page Source code
c.term_frequency Man page Source code
close.SimpleSource Man page
close.ZipSource Source code
content.PCorpus Source code
content.PlainTextDocument Source code
content.SimpleCorpus Source code
content.VCorpus Source code
content.XMLTextDocument Source code
content_transformer Man page Source code
crude Man page
cum_vocabulary_size Source code
eoi Man page Source code
eoi.SimpleSource Man page Source code
filter_global_bounds Source code
findAssocs Man page Source code
findAssocs.DocumentTermMatrix Man page Source code
findAssocs.TermDocumentMatrix Man page Source code
findAssocs.matrix Source code
findFreqTerms Man page Source code
findMostFreqTerms Man page Source code
findMostFreqTerms.DocumentTermMatrix Man page Source code
findMostFreqTerms.TermDocumentMatrix Man page Source code
findMostFreqTerms.term_frequency Man page Source code
format.PlainTextDocument Source code
format.VCorpus Source code
format_TextDocument Source code
getElem Man page Source code
getElem.DataframeSource Man page Source code
getElem.DirSource Man page Source code
getElem.URISource Man page Source code
getElem.VectorSource Man page Source code
getElem.XMLSource Man page Source code
getElem.ZipSource Source code
getReaders Man page Source code
getSources Man page Source code
getTokenizers Man page Source code
getTransformations Man page Source code
inspect Man page Source code
inspect.PCorpus Man page
inspect.TermDocumentMatrix Man page
inspect.TextDocument Man page Source code
inspect.VCorpus Man page Source code
length.SimpleSource Man page Source code
length.VCorpus Source code
map_name_index Source code
materialize Source code
meta Man page
meta.PCorpus Man page
meta.PlainTextDocument Man page Source code
meta.SimpleCorpus Man page Source code
meta.VCorpus Man page
meta.XMLTextDocument Man page
meta<-.PCorpus Man page
meta<-.PlainTextDocument Man page
meta<-.SimpleCorpus Man page
meta<-.VCorpus Man page
meta<-.XMLTextDocument Man page
nDocs Man page Source code
nTerms Man page Source code
names.VCorpus Source code
open.SimpleSource Man page
open.ZipSource Source code
outer_union Source code
pGetElem Man page Source code
pGetElem.DataframeSource Man page Source code
pGetElem.DirSource Man page Source code
pGetElem.URISource Man page Source code
pGetElem.VectorSource Man page Source code
pGetElem.ZipSource Source code
pdf_info_via_gs Source code
pdf_info_via_xpdf Source code
pdf_text_via_gs Source code
plot.TermDocumentMatrix Man page
prepareReader Source code
print.TextDocumentMeta Source code
print_via_format Source code
processURI Source code
readContent Source code
readDOC Man page Source code
readPDF Man page Source code
readPlain Man page Source code
readRCV1 Man page
readRCV1asPlain Man page
readReut21578XML Man page
readReut21578XMLasPlain Man page
readTabular Man page Source code
readTagged Man page Source code
readXML Man page Source code
read_all_bytes Source code
read_dtm_Blei_et_al Man page Source code
read_dtm_MC Man page Source code
reader Man page Source code
reader.SimpleSource Man page Source code
removeNumbers Man page Source code
removeNumbers.PlainTextDocument Man page
removeNumbers.character Source code
removePunctuation Man page Source code
removePunctuation.PlainTextDocument Man page
removePunctuation.character Man page Source code
removeSparseTerms Man page Source code
removeWords Man page Source code
removeWords.PlainTextDocument Man page
removeWords.character Man page Source code
sample.TermDocumentMatrix Source code
scan_tokenizer Man page
stemCompletion Man page Source code
stemDocument Man page Source code
stemDocument.PlainTextDocument Man page Source code
stemDocument.character Man page Source code
stepNext Man page Source code
stepNext.SimpleSource Man page Source code
stopwords Man page
stripWhitespace Man page Source code
stripWhitespace.PlainTextDocument Man page
stripWhitespace.character Source code
table Source code
tdm Source code
termFreq Man page Source code
tm_filter Man page Source code
tm_filter.PCorpus Man page
tm_filter.SimpleCorpus Man page
tm_filter.VCorpus Man page Source code
tm_index Man page Source code
tm_index.PCorpus Man page
tm_index.SimpleCorpus Man page
tm_index.VCorpus Man page Source code
tm_map Man page Source code
tm_map.PCorpus Man page Source code
tm_map.SimpleCorpus Man page Source code
tm_map.VCorpus Man page Source code
tm_parLapply Man page Source code
tm_parLapply_engine Man page
tm_reduce Man page Source code
tm_term_score Man page Source code
tm_term_score.DocumentTermMatrix Man page Source code
tm_term_score.PlainTextDocument Man page Source code
tm_term_score.TermDocumentMatrix Man page Source code
tm_term_score.term_frequency Man page Source code
weightBin Man page
weightSMART Man page
weightTf Man page
weightTfIdf Man page
writeCorpus Man page Source code
xml_content Source code
xml_value_if_not_null Source code

Files

inst
inst/CITATION
inst/NEWS.Rd
inst/ghostscript
inst/ghostscript/pdf_info.ps
inst/doc
inst/doc/tm.pdf
inst/doc/extensions.Rnw
inst/doc/extensions.R
inst/doc/tm.Rnw
inst/doc/tm.R
inst/doc/extensions.pdf
inst/texts
inst/texts/crude
inst/texts/crude/reut-00004.xml
inst/texts/crude/reut-00009.xml
inst/texts/crude/reut-00008.xml
inst/texts/crude/reut-00014.xml
inst/texts/crude/reut-00001.xml
inst/texts/crude/reut-00022.xml
inst/texts/crude/reut-00007.xml
inst/texts/crude/reut-00002.xml
inst/texts/crude/reut-00023.xml
inst/texts/crude/reut-00016.xml
inst/texts/crude/reut-00005.xml
inst/texts/crude/reut-00011.xml
inst/texts/crude/reut-00015.xml
inst/texts/crude/reut-00010.xml
inst/texts/crude/reut-00012.xml
inst/texts/crude/reut-00006.xml
inst/texts/crude/reut-00013.xml
inst/texts/crude/reut-00019.xml
inst/texts/crude/reut-00021.xml
inst/texts/crude/reut-00018.xml
inst/texts/acq
inst/texts/acq/reut-00042.xml
inst/texts/acq/reut-00004.xml
inst/texts/acq/reut-00035.xml
inst/texts/acq/reut-00024.xml
inst/texts/acq/reut-00009.xml
inst/texts/acq/reut-00031.xml
inst/texts/acq/reut-00056.xml
inst/texts/acq/reut-00051.xml
inst/texts/acq/reut-00008.xml
inst/texts/acq/reut-00014.xml
inst/texts/acq/reut-00001.xml
inst/texts/acq/reut-00022.xml
inst/texts/acq/reut-00026.xml
inst/texts/acq/reut-00007.xml
inst/texts/acq/reut-00045.xml
inst/texts/acq/reut-00002.xml
inst/texts/acq/reut-00029.xml
inst/texts/acq/reut-00030.xml
inst/texts/acq/reut-00027.xml
inst/texts/acq/reut-00023.xml
inst/texts/acq/reut-00048.xml
inst/texts/acq/reut-00016.xml
inst/texts/acq/reut-00017.xml
inst/texts/acq/reut-00047.xml
inst/texts/acq/reut-00028.xml
inst/texts/acq/reut-00043.xml
inst/texts/acq/reut-00005.xml
inst/texts/acq/reut-00049.xml
inst/texts/acq/reut-00052.xml
inst/texts/acq/reut-00011.xml
inst/texts/acq/reut-00015.xml
inst/texts/acq/reut-00050.xml
inst/texts/acq/reut-00053.xml
inst/texts/acq/reut-00010.xml
inst/texts/acq/reut-00046.xml
inst/texts/acq/reut-00034.xml
inst/texts/acq/reut-00020.xml
inst/texts/acq/reut-00012.xml
inst/texts/acq/reut-00025.xml
inst/texts/acq/reut-00006.xml
inst/texts/acq/reut-00032.xml
inst/texts/acq/reut-00003.xml
inst/texts/acq/reut-00036.xml
inst/texts/acq/reut-00013.xml
inst/texts/acq/reut-00055.xml
inst/texts/acq/reut-00040.xml
inst/texts/acq/reut-00039.xml
inst/texts/acq/reut-00054.xml
inst/texts/acq/reut-00021.xml
inst/texts/acq/reut-00018.xml
inst/texts/rcv1_2330.xml
inst/texts/reuters-21578.xml
inst/texts/custom.xml
inst/texts/loremipsum.txt
inst/texts/txt
inst/texts/txt/ovid_4.txt
inst/texts/txt/ovid_2.txt
inst/texts/txt/ovid_3.txt
inst/texts/txt/ovid_1.txt
inst/texts/txt/ovid_5.txt
inst/stopwords
inst/stopwords/portuguese.dat
inst/stopwords/french.dat
inst/stopwords/hungarian.dat
inst/stopwords/swedish.dat
inst/stopwords/norwegian.dat
inst/stopwords/russian.dat
inst/stopwords/italian.dat
inst/stopwords/english.dat
inst/stopwords/dutch.dat
inst/stopwords/finnish.dat
inst/stopwords/german.dat
inst/stopwords/catalan.dat
inst/stopwords/romanian.dat
inst/stopwords/danish.dat
inst/stopwords/SMART.dat
inst/stopwords/spanish.dat
src
src/copy.c
src/tdm.cpp
src/init.c
src/RcppExports.cpp
NAMESPACE
data
data/crude.rda
data/acq.rda
R
R/utils.R
R/stopwords.R
R/plot.R
R/foreign.R
R/score.R
R/filter.R
R/meta.R
R/corpus.R
R/RcppExports.R
R/weight.R
R/hpc.R
R/doc.R
R/source.R
R/transform.R
R/pdftools.R
R/complete.R
R/matrix.R
R/reader.R
R/tokenizer.R
vignettes
vignettes/extensions.Rnw
vignettes/tm.Rnw
vignettes/references.bib
MD5
build
build/vignette.rds
DESCRIPTION
man
man/meta.Rd
man/tm_map.Rd
man/removeWords.Rd
man/Docs.Rd
man/readXML.Rd
man/Corpus.Rd
man/PCorpus.Rd
man/readDOC.Rd
man/combine.Rd
man/getTokenizers.Rd
man/foreign.Rd
man/PlainTextDocument.Rd
man/VectorSource.Rd
man/readTagged.Rd
man/matrix.Rd
man/tm_filter.Rd
man/SimpleCorpus.Rd
man/stripWhitespace.Rd
man/XMLSource.Rd
man/tm_reduce.Rd
man/readReut21578XML.Rd
man/URISource.Rd
man/hpc.Rd
man/removeSparseTerms.Rd
man/weightTfIdf.Rd
man/Zipf_n_Heaps.Rd
man/crude.Rd
man/getTransformations.Rd
man/content_transformer.Rd
man/readPlain.Rd
man/readPDF.Rd
man/tm_term_score.Rd
man/removeNumbers.Rd
man/findFreqTerms.Rd
man/DataframeSource.Rd
man/weightBin.Rd
man/weightTf.Rd
man/weightSMART.Rd
man/removePunctuation.Rd
man/readRCV1.Rd
man/VCorpus.Rd
man/stemCompletion.Rd
man/TextDocument.Rd
man/acq.Rd
man/writeCorpus.Rd
man/stemDocument.Rd
man/Reader.Rd
man/tokenizer.Rd
man/termFreq.Rd
man/findAssocs.Rd
man/Source.Rd
man/XMLTextDocument.Rd
man/plot.Rd
man/inspect.Rd
man/stopwords.Rd
man/DirSource.Rd
man/readTabular.Rd
man/WeightFunction.Rd
man/ZipSource.Rd
man/findMostFreqTerms.Rd
tm documentation built on May 20, 2017, 4:40 a.m.