Removes XML tags (removeXML), removes or resolves HTML tags (removeHTML), and converts German umlauts to a standardized form (removeUmlauts).
removeXML(x)
removeUmlauts(x)
removeHTML(
  x,
  dec = TRUE,
  hex = TRUE,
  entity = TRUE,
  symbolList = c(1:4, 9, 13, 15, 16),
  delete = TRUE,
  symbols = FALSE
)
x | Character: Vector or list of character vectors.
dec | Logical: If
hex | Logical: If
entity | Logical: If
symbolList | Numeric vector to choose from the 16 ISO-8859 lists (ISO-8859-12 does not exist, so list 12 is empty).
delete | Logical: If
symbols | Logical: If
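As an illustration of what resolving numeric HTML entities involves, here is a minimal base R sketch. It assumes that dec and hex refer to decimal (`&#248;`) and hexadecimal (`&#xf8;`) character references; the helper name `resolve_numeric_entities` is hypothetical and this is not the package's implementation. Named entities and the ISO-8859 symbol lists are not covered.

```r
# Hypothetical helper (not part of the package): resolve decimal and
# hexadecimal HTML character references using base R only.
resolve_numeric_entities <- function(x) {
  # Decimal references such as &#248;
  m <- gregexpr("&#[0-9]+;", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    codes <- as.integer(sub("&#([0-9]+);", "\\1", hits))
    vapply(codes, intToUtf8, character(1))
  })
  # Hexadecimal references such as &#xf8;
  m <- gregexpr("&#[xX][0-9A-Fa-f]+;", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    codes <- strtoi(sub("&#[xX]([0-9A-Fa-f]+);", "\\1", hits), base = 16L)
    vapply(codes, intToUtf8, character(1))
  })
  x
}

resolve_numeric_entities("&#248; &#xf8;")
```

Both references decode to the same code point (U+00F8), so the two notations yield the same character.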
The choice of u.type should take the language of the corpus into account, because in some languages replacing umlauts can change the meaning of a word.
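To make the idea of a standardized form concrete, the following base R sketch transliterates umlauts using the common German convention (ä to ae, ö to oe, ü to ue, ß to ss). The helper name `replace_umlauts` and the mapping are assumptions for illustration; this is not the package's implementation and it does not model u.type.

```r
# Hypothetical helper (not part of the package): transliterate German
# umlauts and sharp s with a fixed lookup table.
replace_umlauts <- function(x) {
  map <- c(
    "\u00e4" = "ae", "\u00f6" = "oe", "\u00fc" = "ue",
    "\u00c4" = "Ae", "\u00d6" = "Oe", "\u00dc" = "Ue",
    "\u00df" = "ss"
  )
  for (ch in names(map)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex
    x <- gsub(ch, map[[ch]], x, fixed = TRUE)
  }
  x
}

replace_umlauts("Bl\u00fchende Apfelb\u00e4ume")
# "Bluehende Apfelbaeume"
```

A fixed table like this is only appropriate for languages that share the convention, which is why the corpus language matters.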
To change which columns removeXML is applied to, use the argument xmlAction in readTextmeta.
Adjusted character string or list, depending on input.
xml <- "<text>Some <b>important</b> text</text>"
removeXML(xml)

# The same character (ø) as a decimal, hexadecimal, and named entity
x <- "&#248; &#xf8; &oslash;"
removeHTML(x = x, symbolList = 1, dec = TRUE, hex = FALSE, entity = FALSE, delete = FALSE)
removeHTML(x = x, symbolList = c(1, 3))

y <- c("Bl\UFChende Apfelb\UE4ume")
removeUmlauts(y)
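For comparison, stripping tags from the XML example above can be approximated in base R with a single regular expression. This is a rough sketch under the assumption that tags contain no `>` inside attribute values; the helper name `strip_tags` is hypothetical, and removeXML may handle additional cases that this does not.

```r
# Hypothetical helper (not part of the package): delete anything that
# looks like <...> from each element of a character vector.
strip_tags <- function(x) {
  gsub("<[^>]+>", "", x)
}

strip_tags("<text>Some <b>important</b> text</text>")
# "Some important text"
```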