tm.plugin.factiva-package: A plug-in for the tm text mining framework to import articles...


Description

This package provides a tm Source to create corpora from articles exported from Dow Jones's Factiva content provider as XML or HTML files.

Details

Typical usage is to create a corpus from an XML or HTML file exported from Factiva (here called myFactivaArticles.xml). Setting language=NA lets the language of each article be set automatically from the information provided by Factiva:

    # Load tm and the Factiva plug-in
    library(tm)
    library(tm.plugin.factiva)

    # Import corpus
    source <- FactivaSource("myFactivaArticles.xml")
    corpus <- Corpus(source, readerControl = list(language = NA))

    # See how many articles were imported
    corpus

    # See the contents of the first article and its meta-data
    inspect(corpus[1])
    meta(corpus[[1]])

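To check which language was assigned to each article when language=NA is used, the "language" meta-data tag of every document can be tabulated and then used to subset the corpus. This is a minimal sketch: myFactivaArticles.xml is the same hypothetical export file as above, and the "en" code assumes that Factiva languages end up stored as two-letter ISO 639-1 codes, which you should verify on your own data.

```r
library(tm)
library(tm.plugin.factiva)

# Hypothetical export file, as in the example above
corpus <- Corpus(FactivaSource("myFactivaArticles.xml"),
                 readerControl = list(language = NA))

# Language tag stored for each article (filled from Factiva's data)
langs <- sapply(corpus, meta, "language")
print(table(langs, useNA = "ifany"))

# Keep only English-language articles; the "en" code is an assumption
corpus_en <- corpus[langs %in% "en"]
length(corpus_en)
```

Indexing with a logical vector works like subsetting a list, so corpus_en is still a corpus that can be passed to other tm functions such as DocumentTermMatrix().
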
Currently, HTML files are only supported if they were saved in French. Please send the maintainer examples of Factiva HTML files in your language if you want it to be supported.

See FactivaSource (?FactivaSource) for more details and real examples.

Author(s)

Milan Bouchet-Valat <nalimilan@club.fr>

References

http://global.factiva.com/


tm.plugin.factiva documentation built on Oct. 30, 2019, 11:23 a.m.