Description Usage Arguments Value References See Also Examples
Read in a Reuters Corpus Volume 1 XML document.
readRCV1(elem, language, id)
readRCV1asPlain(elem, language, id)
elem      a named list with the component content, which must hold the document to be read in.
language  a string giving the language.
id        Not used.
An XMLTextDocument for readRCV1, or a PlainTextDocument for readRCV1asPlain, representing the text and metadata extracted from elem$content.
Lewis, D. D.; Yang, Y.; Rose, T.; and Li, F. (2004). RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, 5, 361–397. https://www.jmlr.org/papers/volume5/lewis04a/lewis04a.pdf
Reader for basic information on the reader infrastructure employed by package tm.
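As a rough illustration of how these functions plug into the reader infrastructure, the sketch below passes readRCV1asPlain as the reader when building a corpus. It assumes that the installed tm package ships a sample file named rcv1_2330.xml in its texts directory; the variable names texts_dir, src, and corpus are illustrative only.

```r
library(tm)

# Assumption: tm's "texts" directory contains the sample file rcv1_2330.xml.
texts_dir <- system.file("texts", package = "tm")

# Restrict the source to the RCV1 sample and hand the reader raw bytes.
src <- DirSource(texts_dir, pattern = "^rcv1_.*\\.xml$", mode = "binary")

# The reader is called once per source element with elem, language, and id.
corpus <- VCorpus(src,
                  readerControl = list(reader = readRCV1asPlain,
                                       language = "en"))

# Each element should be a PlainTextDocument holding the article text.
content(corpus[[1]])
```

Swapping in readRCV1 as the reader should instead yield documents that keep the parsed XML as their content.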
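A minimal sketch of a call consistent with the output that follows, assuming that the sample file rcv1_2330.xml is available in the texts directory of the installed tm package. The names f, f_bin, and rcv1 are illustrative, and the id value is arbitrary because that argument is not used.

```r
library(tm)

# Assumption: the sample file rcv1_2330.xml ships in tm's "texts" directory.
f <- system.file("texts", "rcv1_2330.xml", package = "tm")
stopifnot(nzchar(f))

# Read the file as raw bytes; the reader expects them in elem$content.
f_bin <- readBin(f, raw(), file.size(f))

# Parse the news item into an XMLTextDocument.
rcv1 <- readRCV1(elem = list(content = f_bin), language = "en", id = "id1")

# Show the parsed XML content, then the extracted metadata.
content(rcv1)
meta(rcv1)
```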
Loading required package: NLP
{xml_document}
<newsitem itemid="2330" id="root" date="1996-08-20" lang="en">
[1] <title>USA: Tylan stock jumps; weighs sale of company.</title>
[2] <headline>Tylan stock jumps; weighs sale of company.</headline>
[3] <dateline>SAN DIEGO</dateline>
[4] <text>\n <p>The stock of Tylan General Inc. jumped Tuesday after the mak ...
[5] <copyright>(c) Reuters Limited 1996</copyright>
[6] <metadata>\n <codes class="bip:countries:1.0">\n <code code="USA"> </ ...
author :
datetimestamp: 1996-08-20
description :
heading : USA: Tylan stock jumps; weighs sale of company.
id : 2330
language : en
origin : Reuters Corpus Volume 1
publisher : Reuters Holdings Plc
topics : c("C15", "C152", "C18", "C181", "CCAT")
industries : I34420
countries : USA