View source: R/delete.markup.R
delete.markup | R Documentation |
Function for removing markup tags (e.g. HTML, XML) from a string of characters. All XML markup is assumed to be compliant with the TEI guidelines (https://tei-c.org/).
delete.markup(input.text, markup.type = "plain")
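input.text | a character string or character vector containing the markup tags to be deleted. |
markup.type | the type of markup to remove. The usage and examples on this page show the values "plain" (the default), "html", "xml" and "xml.drama". |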
This function needs to be used carefully: while a document formatted in compliance with the TEI guidelines will be parsed flawlessly, cleaning up an HTML page harvested at random from the web may have side effects, e.g. footers, disclaimers, etc. will not be removed and will remain in the resulting text.
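The caveat above can be illustrated with a minimal base-R sketch of regex-based tag stripping. This is not the code used by delete.markup: the helper name strip.tags.sketch and its drop.elements argument are hypothetical and introduced only for illustration, and the approach (two passes of gsub) is an assumption about how such cleaning might be done in general.

```r
# Hypothetical helper, NOT the stylo implementation: a base-R sketch of
# regex-based tag stripping, written only to illustrate the caveat above.
strip.tags.sketch <- function(input.text, drop.elements = character(0)) {
  # collapse a multi-element character vector into a single string
  text <- paste(input.text, collapse = " ")
  # remove the listed elements together with their content,
  # e.g. everything between <note> and </note>
  for (element in drop.elements) {
    pattern <- paste0("(?s)<", element, "(\\s[^>]*)?>.*?</", element, ">")
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  # remove all remaining tags, keeping the text between them
  gsub("<[^>]+>", "", text)
}

# a footer is ordinary text wrapped in tags, so it survives the stripping
page <- "<html><body><p>Gallia est omnis divisa.</p><div class=\"footer\">Terms of use</div></body></html>"
strip.tags.sketch(page)
# "Gallia est omnis divisa.Terms of use"

# dropping whole elements removes their content as well
strip.tags.sketch("Gallia<note>Gallia: Gaul.</note> est omnis <emph>divisa</emph> in partes tres",
                  drop.elements = "note")
# "Gallia est omnis divisa in partes tres"
```

Under these assumptions, only elements that are named explicitly lose their content; anything else, including page footers, is kept as plain text. This is why randomly harvested HTML usually needs manual inspection after cleaning.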
Maciej Eder, Mike Kestemont
load.corpus, txt.to.words, txt.to.words.ext, txt.to.features
delete.markup("Gallia est omnis <i>divisa</i> in partes tres",
markup.type = "html")
delete.markup("Gallia<note>Gallia: Gaul.</note> est omnis
<emph>divisa</emph> in partes tres", markup.type = "xml")
delete.markup("<speaker>Hamlet</speaker>Words, words, words...",
markup.type = "xml.drama")