Create a data frame source.
DataframeSource(x)
x    A data frame giving the texts and metadata.
A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a UTF-8 encoded string representing the document's content. Optional additional columns are used as document level metadata.
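The column requirements above can be checked before a data frame is handed to DataframeSource. The following is a minimal sketch in base R only; check_source_columns is a hypothetical helper written for illustration and is not part of package tm, and the data frames good and bad are made-up inputs.

```r
# Hypothetical helper (not part of tm): reports violations of the column
# layout described above. Returns a character vector of problems, which is
# empty when the data frame looks suitable.
check_source_columns <- function(x) {
  stopifnot(is.data.frame(x), ncol(x) >= 2)
  problems <- character(0)
  if (names(x)[1] != "doc_id")
    problems <- c(problems, 'first column must be named "doc_id"')
  if (names(x)[2] != "text")
    problems <- c(problems, 'second column must be named "text"')
  # anyDuplicated() returns 0 when all values are distinct
  if (anyDuplicated(x[[1]]) > 0)
    problems <- c(problems, "document identifiers must be unique")
  problems
}

# Illustrative inputs: one well-formed, one with the first two columns swapped
good <- data.frame(doc_id = c("doc_1", "doc_2"),
                   text = c("First text.", "Second text."),
                   author = c("A", "B"),
                   stringsAsFactors = FALSE)
bad <- good[, c("text", "doc_id", "author")]

check_source_columns(good)
check_source_columns(bad)
```

The helper only inspects names and identifier uniqueness. If the text may not be UTF-8 encoded, base R's enc2utf8() can be applied to the "text" column first.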
An object inheriting from DataframeSource, SimpleSource, and Source.
Source for basic information on the source infrastructure employed by package tm, and meta for types of metadata.
readtext for reading text in multiple formats into a form suitable for processing by DataframeSource.
library(tm)
docs <- data.frame(doc_id = c("doc_1", "doc_2"),
                   text = c("This is a text.", "This another one."),
                   dmeta1 = 1:2, dmeta2 = letters[1:2],
                   stringsAsFactors = FALSE)
(ds <- DataframeSource(docs))
x <- Corpus(ds)
inspect(x)
meta(x)
Loading required package: NLP
$encoding
[1] ""
$length
[1] 2
$position
[1] 0
$reader
function (elem, language, id)
{
    if (!is.null(elem$uri))
        id <- basename(elem$uri)
    PlainTextDocument(elem$content, id = id, language = language)
}
<environment: namespace:tm>
$content
  doc_id              text dmeta1 dmeta2
1  doc_1   This is a text.      1      a
2  doc_2 This another one.      2      b
attr(,"class")
[1] "DataframeSource" "SimpleSource" "Source"
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 2
[[1]]
<<PlainTextDocument>>
Metadata: 7
Content: chars: 22
[[2]]
<<PlainTextDocument>>
Metadata: 7
Content: chars: 24
data frame with 0 columns and 2 rows