readSZ | R Documentation |
Reads the XML files from the SZ corpus and separates the text and meta data.
readSZ(path = getwd(), file = list.files(path = path, pattern = "*.xml$", full.names = FALSE, recursive = TRUE, ignore.case = TRUE), do.meta = TRUE, do.text = TRUE)
path | Path where the data files are. |
file | Character vector with the names of the XML files. |
do.meta | Logical: Should the algorithm collect meta data? |
do.text | Logical: Should the algorithm collect text data? |
meta | id, date, rubrik, page, AnzChar, AnzWoerter, dachzeile, title, zwischentitel, untertitel |
text | Text (by paragraph) |
##---- Should be DIRECTLY executable !! ----
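The sketch below illustrates how the default value of the `file` argument selects files. It uses only base R. The directory `sz_demo` and the file names in it are hypothetical and contain no real SZ data. The commented-out `readSZ` call at the end assumes that the result is a list with components `meta` and `text`, as the value description suggests; treat that as an assumption rather than a confirmed interface.

```r
# Create a throwaway directory with a few empty files
# (directory and file names are made up for illustration).
corpus_dir <- file.path(tempdir(), "sz_demo")
dir.create(file.path(corpus_dir, "2010"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(corpus_dir, c("a.xml", "B.XML", "notes.txt", "2010/c.xml")))

# Same expression as the default for `file` in the usage line:
# recursive search, case-insensitive match on names ending in "xml".
files <- list.files(path = corpus_dir, pattern = "*.xml$", full.names = FALSE,
                    recursive = TRUE, ignore.case = TRUE)
print(files)

## Not run: needs readSZ and real SZ corpus files, not the empty demo files above.
# sz <- readSZ(path = corpus_dir, file = files, do.meta = TRUE, do.text = TRUE)
# str(sz$meta)  # assumed: meta data with the columns listed above
# str(sz$text)  # assumed: text, by paragraph
```

Because `full.names = FALSE`, the returned names are relative to `path`, so the same `path` should be passed together with `file`.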