readHBWiWo: Read the HB WiWo Corpus
In Docma-TU/tmT: Textmining Tools

View source: R/readHBWiWo.R

Description

Reads the XML files of the HB WiWo corpus and separates the text from the metadata.

Usage

readHBWiWo(path = getwd(), file = list.files(path = path, pattern =
  "*.xml$", full.names = FALSE, recursive = TRUE), do.meta = TRUE,
  do.text = TRUE)

Arguments

`path`	Character string: path to the directory that contains the data files.
`file`	Character vector of XML file names, relative to `path`. By default, all files below `path` matching `"*.xml$"`, searched recursively.
`do.meta`	Logical: should the metadata be extracted?
`do.text`	Logical: should the text be extracted?
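The default value of `file` comes from base R's `list.files()`: every file below `path` whose name ends in `.xml`, searched recursively and returned relative to `path`. The sketch below reproduces that lookup on a throwaway directory, so you can check which files would be picked up before calling `readHBWiWo()`. It is not part of the package, the file names are hypothetical, and it uses the anchored regex `"\\.xml$"` (a literal `.xml` suffix) rather than the documented default pattern `"*.xml$"`.

```r
# Throwaway directory tree with hypothetical file names, for illustration only
root <- tempfile("hbwiwo_")
dir.create(file.path(root, "sub"), recursive = TRUE)
invisible(file.create(file.path(root, c("a.xml", "notes.txt", "sub/b.xml"))))

# Same kind of lookup as the default `file` argument: recursive search,
# names relative to `root`; uses the anchored regex "\\.xml$" instead of "*.xml$"
xml_files <- list.files(path = root, pattern = "\\.xml$",
                        full.names = FALSE, recursive = TRUE)

# xml_files holds "a.xml" and "sub/b.xml"; notes.txt is skipped
print(xml_files)
```

Because the names are relative, the same vector can be passed as `file` together with `path = root`.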

Value

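`meta`	Metadata fields: id, source, date, title, abstract, dachzeile (the kicker line above the headline).
`text`	The text of each document.
`metamult`	Metadata fields that can hold several values per document: person, company, industry, country, author, category, klassifikation (classification; several values possible), thema (topic), sachgruppe (subject group), serie (series).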