DataframeSource: Data Frame Source
In tm: Text Mining Package


Create a data frame source.

DataframeSource(x)

`x`	A data frame giving the texts and metadata.

A data frame source interprets each row of the data frame `x` as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a UTF-8 encoded string representing the document's content. Any additional columns are optional and are used as document-level metadata.
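
The layout described above can be checked before the data frame is passed to DataframeSource. The following is a minimal sketch in base R, not part of tm: the helper `check_source_frame()`, the data frame `raw`, and its `author` column are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of tm): checks the layout described above.
check_source_frame <- function(x) {
  stopifnot(is.data.frame(x), ncol(x) >= 2)
  if (!identical(names(x)[1:2], c("doc_id", "text")))
    stop("the first two columns must be named 'doc_id' and 'text'")
  if (!is.character(x$doc_id) || anyDuplicated(x$doc_id) > 0)
    stop("'doc_id' must hold unique strings")
  if (!is.character(x$text))
    stop("'text' must be a character column")
  invisible(x)
}

# Hypothetical input whose columns are in the wrong order.
raw <- data.frame(author = c("A", "B"),
                  text   = c("First text.", "Second text."),
                  doc_id = c("doc_1", "doc_2"),
                  stringsAsFactors = FALSE)

# Move doc_id and text to the front; remaining columns become metadata.
docs <- raw[, c("doc_id", "text", setdiff(names(raw), c("doc_id", "text")))]
docs$text <- enc2utf8(docs$text)  # convert the text column to UTF-8
check_source_frame(docs)
```

Reordering the columns first, rather than renaming them in place, keeps every extra column available as document-level metadata.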

An object inheriting from DataframeSource, SimpleSource, and Source.

Source for basic information on the source infrastructure employed by package tm, and meta for types of metadata.

readtext (in package readtext) for reading in texts in multiple formats, producing output suitable to be processed by DataframeSource.

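library(tm)

# Two documents; dmeta1 and dmeta2 are extra columns holding metadata
docs <- data.frame(doc_id = c("doc_1", "doc_2"),
                   text = c("This is a text.", "This another one."),
                   dmeta1 = 1:2, dmeta2 = letters[1:2],
                   stringsAsFactors = FALSE)
# The outer parentheses print the source after creating it
(ds <- DataframeSource(docs))
# Build a corpus from the source, then show it and its metadata
x <- Corpus(ds)
inspect(x)
meta(x)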

Loading required package: NLP
$encoding
[1] ""

$length
[1] 2

$position
[1] 0

$reader
function (elem, language, id) 
{
    if (!is.null(elem$uri)) 
        id <- basename(elem$uri)
    PlainTextDocument(elem$content, id = id, language = language)
}
<environment: namespace:tm>

$content
  doc_id              text dmeta1 dmeta2
1  doc_1   This is a text.      1      a
2  doc_2 This another one.      2      b

attr(,"class")
[1] "DataframeSource" "SimpleSource"    "Source"         
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 2

[[1]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 22

[[2]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 24

data frame with 0 columns and 2 rows