| Reader | R Documentation |
Creating readers.
getReaders()
Readers are functions for extracting textual content and metadata out
of elements delivered by a Source, and for constructing a
TextDocument. A reader must accept the following arguments in
its signature:
elem: a named list with the components content and
uri (as delivered by a Source via
getElem or pGetElem).
language: a character string giving the language.
id: a character giving a unique identifier for the created text document.
The element elem is typically provided by a source, whereas the language
and the identifier are normally provided by a corpus constructor (for cases
where elem$content does not supply these two essential
items).
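The signature described above can be illustrated with a minimal sketch. The name readSketch is hypothetical and not part of tm, and the sketch returns a plain list as a stand-in so that it runs without tm installed; an actual reader would construct a TextDocument at that point.

```r
# Hypothetical reader, not part of tm: it only illustrates the
# (elem, language, id) signature a reader must accept.
readSketch <- function(elem, language, id) {
  # Assumption: a plain list stands in for the TextDocument that a real
  # reader would construct here.
  list(
    content = as.character(elem$content),
    meta = list(id = id, language = language, origin = elem$uri)
  )
}

# An element shaped like the one a Source delivers: content plus uri.
elem <- list(content = c("First line.", "Second line."),
             uri = "file://example.txt")

# The language and identifier are passed in the way a corpus constructor would.
doc <- readSketch(elem, language = "en", id = "doc-1")
```

The reader takes everything it needs from its three arguments, which is what lets a corpus constructor call any reader in the same way.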
In case a reader expects configuration arguments we can use a function
generator. A function generator is indicated by inheriting from class
FunctionGenerator and function. It allows us to process
additional arguments, store them in an environment, return a reader function
with the well-defined signature described above, and still be able to access
the additional arguments via lexical scoping. All corpus constructors in
package tm check the reader function for being a function generator and
if so apply it to yield the reader with the expected signature.
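The function generator mechanism can be sketched in base R as follows. The names readJoined and sep are hypothetical and not part of tm, the class is set by hand to mirror the inheritance described above, and the returned list again stands in for a TextDocument. The final lines imitate the check a corpus constructor performs; they are an illustration, not tm's actual code.

```r
# Hypothetical generator, not part of tm: 'sep' is a configuration argument.
readJoined <- function(sep = "\n") {
  # 'sep' lives in this call's environment; the returned reader reaches it
  # through lexical scoping while keeping the (elem, language, id) signature.
  function(elem, language, id) {
    # Assumption: a plain list stands in for a TextDocument.
    list(
      content = paste(elem$content, collapse = sep),
      meta = list(id = id, language = language, origin = elem$uri)
    )
  }
}
# Mark the generator as inheriting from FunctionGenerator and function.
class(readJoined) <- c("FunctionGenerator", "function")

# Imitation of the constructor-side check: if the supplied reader is a
# function generator, apply it to obtain the actual reader.
reader <- readJoined
if (inherits(reader, "FunctionGenerator")) reader <- reader(sep = " ")

doc <- reader(list(content = c("a", "b"), uri = "file://x.txt"),
              language = "en", id = "doc-1")
```

Because the configuration is captured once when the generator is applied, the resulting reader can be called per element without passing sep again.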
For getReaders(), a character vector with readers provided by package
tm.
readDOC, readPDF, readPlain,
readRCV1, readRCV1asPlain,
readReut21578XML, readReut21578XMLasPlain,
and readXML.