readSZ: Read the SZ corpus
In Docma-TU/tmT: Textmining Tools

View source: R/readSZ.R

readSZ

R Documentation

Read the SZ corpus

Description

Reads the XML files of the SZ corpus and separates the text from the meta data.

Usage

readSZ(path = getwd(), file = list.files(path = path, pattern =
  "*.xml$", full.names = FALSE, recursive = TRUE, ignore.case = TRUE),
  do.meta = TRUE, do.text = TRUE)

Arguments

`path`	Path where the data files are.
`file`	Character vector with the names of the XML files, relative to `path`.
`do.meta`	Logical: Should the algorithm collect meta data?
`do.text`	Logical: Should the algorithm collect text data?
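
The default for `file` is an ordinary `list.files` call, so it can be tried on its own. The following is a minimal sketch using only base R; the directory and file names are invented for illustration and are not part of the SZ corpus.

```r
# Build a throwaway directory; all file names here are made up for illustration.
path <- file.path(tempdir(), "sz_demo")
dir.create(file.path(path, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(path, c("a.xml", "sub/b.XML", "notes.txt")))

# Same call as the default value of `file` in the Usage section:
# recursive search below `path`, case-insensitive match on the file name.
file <- list.files(path = path, pattern = "*.xml$", full.names = FALSE,
                   recursive = TRUE, ignore.case = TRUE)
print(file)
```

Because `full.names = FALSE`, the returned names are relative to `path`, which is why `path` and `file` are passed together.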

Value

`meta`	Meta data with the fields id, date, rubrik, page, AnzChar, AnzWoerter, dachzeile, title, zwischentitel, untertitel.
`text`	Text (paragraph by paragraph).

Examples


##---- Should be DIRECTLY executable !! ----
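
A hedged usage sketch, since the page itself gives no worked example. The corpus directory `path/to/SZ` is a placeholder, and the code assumes that the result is a list with the elements `meta` and `text` named in the Value section; the page does not confirm that structure. The call is skipped when the package or the directory is missing.

```r
# Hypothetical corpus location; replace with the directory holding your SZ XML files.
sz_path <- "path/to/SZ"

if (requireNamespace("tmT", quietly = TRUE) && dir.exists(sz_path)) {
  # Read both meta data and text, using the default file listing below sz_path.
  sz <- tmT::readSZ(path = sz_path, do.meta = TRUE, do.text = TRUE)

  # Assumption: the result is a list with the elements `meta` and `text`.
  str(sz$meta)
  str(sz$text, list.len = 3)
} else {
  message("tmT or the corpus directory is not available; nothing was read.")
}
```

Setting `do.meta = FALSE` or `do.text = FALSE` should skip the corresponding part, according to the argument descriptions above.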

Docma-TU/tmT documentation built on May 5, 2022, 12:45 a.m.