TextDocument: Text Documents

TextDocument    R Documentation

Text Documents

Description

Representing and computing on text documents.

Details

Text documents are documents containing (natural language) text. The tm package employs the infrastructure provided by package NLP and represents text documents via the virtual S3 class TextDocument. Actual S3 text document classes (such as PlainTextDocument) then extend this virtual base class.
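The class relationship can be checked directly on a document object. The following is a minimal sketch, assuming package tm is installed; the sample sentence is made up for illustration.

```r
library(tm)  # also attaches NLP, which tm depends on

# Create a plain text document from an illustrative character string.
doc <- PlainTextDocument("Text documents contain natural language text.")

# The S3 class vector lists the concrete class and the virtual base class.
class(doc)

# Code written against the base class can test for it with inherits().
inherits(doc, "TextDocument")
```

Testing with inherits() rather than comparing class(doc) to a fixed string keeps such code working for any extension class.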

All extension classes must provide an as.character method that extracts the natural language text of a document in a “suitable” (not necessarily structured) form, as well as content and meta methods for accessing the (possibly raw) document content and metadata.
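As an illustration of these requirements, here is a minimal sketch of a user-defined extension class. The class name NoteDocument, its constructor, and its internal list layout are hypothetical and not part of tm or NLP; only the content and meta generics come from package NLP.

```r
library(NLP)  # provides the content() and meta() generics

# Hypothetical constructor: keeps the raw lines and a metadata list,
# and extends the virtual base class TextDocument.
NoteDocument <- function(lines, meta = list()) {
  structure(list(content = lines, meta = meta),
            class = c("NoteDocument", "TextDocument"))
}

# as.character: the natural language text as a single string.
as.character.NoteDocument <- function(x, ...) {
  paste(x$content, collapse = "\n")
}

# content: the (possibly raw) document content, here the unjoined lines.
content.NoteDocument <- function(x) x$content

# meta: all metadata, or a single entry when a tag is given.
meta.NoteDocument <- function(x, tag = NULL, ...) {
  if (is.null(tag)) x$meta else x$meta[[tag]]
}

note <- NoteDocument(c("first line", "second line"),
                     meta = list(author = "A. Writer"))

as.character(note)            # "first line\nsecond line"
content(note)                 # the two raw lines
meta(note, tag = "author")    # "A. Writer"
```

Here as.character and content deliberately return different shapes (one joined string versus the raw lines), which is one way to read the distinction the paragraph above draws between extracted text and raw content.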

See Also

PlainTextDocument and XMLTextDocument for the text document classes provided by package tm.

TextDocument for text documents in package NLP.


tm documentation built on Sept. 11, 2024, 6:47 p.m.