AlcesteSource: Alceste Source

Description

Construct a source for an input containing a set of texts saved in the Alceste format in a single text file.

Usage

  AlcesteSource(x, encoding = "auto")

Arguments

x

Either a character string giving the path to the file, or a connection.

encoding

A character string: if non-empty, declares the encoding used when reading the file, so that the character data can be re-encoded. See the ‘Encoding’ section of the help for file. The default, “auto”, uses stri_enc_detect (from the stringi package) to try to guess the encoding; if detection fails, the native encoding is used.

Details

An Alceste-formatted file stores several texts in a single file. Each text is introduced by a line starting with “***” or digits, followed by starred variables (see the link under See Also). These variables are stored as document meta-data, which can be accessed via the meta function.

Currently, “theme” lines starting with “-*” are ignored.
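
The layout described above can be illustrated with a minimal base R sketch. It is not the parser used by readAlceste. The sample lines, the parse_vars helper, and the rule that a starred variable without a value is read as TRUE are assumptions made for illustration (the last one is inferred from the boolean1 field in the example output).

```r
# Hypothetical sample in the Alceste layout (not the package's test file).
lines <- c(
  "**** *var1_1 *var2_levela *boolean1",
  "First text, first line.",
  "-*theme1",
  "First text, second line.",
  "**** *var1_2 *var2_levelb",
  "Second text."
)

# Header lines start with "***" or digits; theme lines start with "-*".
is_header <- grepl("^(\\*\\*\\*|[0-9])", lines)
is_theme  <- grepl("^-\\*", lines)

# Each header opens a new text; theme lines are dropped from the body.
doc_id <- cumsum(is_header)
keep   <- !is_header & !is_theme
body   <- split(lines[keep], doc_id[keep])

# Hypothetical helper: turn "*name_value" tokens into a named list.
parse_vars <- function(header) {
  tokens <- strsplit(header, "[[:space:]]+")[[1]]
  tokens <- tokens[grepl("^\\*[^*]", tokens)]  # keep "*name..." tokens only
  tokens <- sub("^\\*", "", tokens)
  has_value <- grepl("_", tokens, fixed = TRUE)
  name  <- sub("_.*$", "", tokens)
  # Assumption: a variable given without a value is treated as TRUE.
  value <- ifelse(has_value, sub("^[^_]*_", "", tokens), "TRUE")
  setNames(as.list(value), name)
}

vars <- lapply(lines[is_header], parse_vars)
str(vars[[1]])
```

Here body holds the text lines of each document and vars holds the starred variables of each header, which is the information that AlcesteSource exposes as document meta-data.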

Value

An object of class AlcesteSource, which extends the class Source and represents a set of articles from an Alceste file.

Author(s)

Milan Bouchet-Valat

See Also

http://www.image-zafar.com/sites/default/files/telechargements/formatage_alceste.pdf (in French) about the Alceste format

readAlceste for the function actually parsing individual articles.

getSources to list available sources.

Examples

    library(tm)
    library(tm.plugin.alceste)
    file <- system.file("texts", "alceste_test.txt", 
                        package = "tm.plugin.alceste")
    corpus <- Corpus(AlcesteSource(file))

    # See the contents of the documents
    inspect(corpus)

    # See meta-data associated with first article
    meta(corpus[[1]])

Example output

Loading required package: NLP
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 2

[[1]]
<<PlainTextDocument>>
Metadata:  10
Content:  chars: 127

[[2]]
<<PlainTextDocument>>
Metadata:  10
Content:  chars: 28

  author       : character(0)
  datetimestamp: 2020-02-20 00:00:03
  description  : character(0)
  heading      : character(0)
  id           : 1
  language     : en
  origin       : character(0)
  var1         : 1
  var2         : levela
  boolean1     : TRUE

tm.plugin.alceste documentation built on May 1, 2019, 10:30 p.m.