readHBWiWo: Read the HB WiWo Corpus

View source: R/readHBWiWo.R

readHBWiWo R Documentation

Read the HB WiWo Corpus

Description

Reads the XML files from the HB WiWo corpus and separates the text from the meta data.

Usage

readHBWiWo(path = getwd(), file = list.files(path = path, pattern =
  "*.xml$", full.names = FALSE, recursive = TRUE), do.meta = TRUE,
  do.text = TRUE)

Arguments

path

Character string with the path to the directory that contains the data files.

file

Character vector with the names of the XML files, relative to path.

do.meta

Logical: Should the algorithm collect meta data?

do.text

Logical: Should the algorithm collect text data?
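
The default value of file in the Usage section can be tried out on its own in base R. The following is a minimal sketch on a throwaway directory; the directory layout and file names are hypothetical and only illustrate which names the default expression would pick up.

```r
# Build a temporary directory with a hypothetical layout (not the real corpus).
corpus_dir <- tempfile("hbwiwo_demo_")
dir.create(file.path(corpus_dir, "sub"), recursive = TRUE)
file.create(file.path(corpus_dir, c("a.xml", "notes.txt", "sub/b.xml")))

# Same expression as the default of the `file` argument,
# with `path` set explicitly instead of getwd().
xml_files <- list.files(path = corpus_dir, pattern = "*.xml$",
                        full.names = FALSE, recursive = TRUE)
print(xml_files)
```

Because full.names = FALSE and recursive = TRUE, the names are relative to path and files in subdirectories keep their subdirectory prefix. Note that list.files interprets pattern as a regular expression, not as a shell glob.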

Value

meta

id, source, date, title, abstract, dachzeile

text

Text

metamult

person, company, industry, country, author, category, klassifikation (multiple values possible), thema, sachgruppe, serie
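
Examples

A minimal usage sketch, not taken from the package itself. It assumes that the tmT package is installed, that the corpus XML files sit in the hypothetical directory "data/HBWiWo", and that the result can be accessed as a list with the components named in the Value section. The call is skipped if either the package or the directory is missing.

```r
# Hypothetical location of the corpus files; adjust to your own setup.
corpus_dir <- "data/HBWiWo"

if (requireNamespace("tmT", quietly = TRUE) && dir.exists(corpus_dir)) {
  # Read both meta data and text (the defaults, spelled out for clarity).
  corpus <- tmT::readHBWiWo(path = corpus_dir, do.meta = TRUE, do.text = TRUE)

  # Inspect the components listed under Value
  # (assumed here to be list elements).
  str(corpus$meta, max.level = 1)
  str(corpus$text, max.level = 1, list.len = 3)
  str(corpus$metamult, max.level = 1)
}
```

Setting do.text = FALSE or do.meta = FALSE restricts the call to one kind of data, according to the argument descriptions above.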


Docma-TU/tmT documentation built on May 5, 2022, 12:45 a.m.