View source: R/delete.markup.R
Function for removing markup tags (e.g. HTML, XML) from a string of characters. All XML markup is assumed to be compliant with the TEI guidelines (https://tei-c.org/).
delete.markup(input.text, markup.type = "plain")
input.text: any string of characters (e.g. a character vector) containing markup tags that have to be deleted.
markup.type: the kind of markup to remove. The default is "plain"; the examples on this page also use "html", "xml", and "xml.drama".
This function should be used with care: a document formatted in compliance with the TEI guidelines will be parsed flawlessly, but cleaning up an HTML page harvested at random from the web may have side effects, e.g. footers, disclaimers, etc. will not be removed.
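The caveat about web pages can be illustrated with a minimal base-R sketch. This is not stylo's implementation: `strip_tags` is a hypothetical helper that deletes anything that looks like a tag, and the HTML fragment is invented for illustration. It shows why tag removal alone leaves footer text behind.

```r
# Hypothetical helper, NOT part of stylo: delete anything that looks like a tag.
strip_tags <- function(x) gsub("<[^>]+>", "", x)

# Invented HTML fragment with a footer element after the actual text.
page <- '<html><body><p>Gallia est omnis divisa</p><div class="footer">Terms of use</div></body></html>'

cleaned <- strip_tags(page)
print(cleaned)
# The tags are gone, but the footer wording is still in the result,
# because tag stripping cannot tell body text from boilerplate text.
```

If a page carries this kind of boilerplate, it is probably safer to cut it out before the text goes into any further analysis.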
Maciej Eder, Mike Kestemont
load.corpus, txt.to.words, txt.to.words.ext, txt.to.features
delete.markup("Gallia est omnis <i>divisa</i> in partes tres",
              markup.type = "html")
delete.markup("Gallia<note>Gallia: Gaul.</note> est omnis
<emph>divisa</emph> in partes tres", markup.type = "xml")
delete.markup("<speaker>Hamlet</speaker>Words, words, words...",
              markup.type = "xml.drama")
### stylo version: 0.6.9 ###
If you plan to cite this software (please do!), use the following reference:
Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R:
a package for computational text analysis. R Journal 8(1): 107-121.
<https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>
To get full BibTeX entry, type: citation("stylo")
[1] "Gallia est omnis divisa in partes tres"
[1] "Gallia est omnis \n divisa in partes tres"
[1] "Words, words, words..."
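In the "xml" example above, the text inside <note> disappears along with its tags, while the text inside <emph> is kept. The difference can be sketched in base R as two passes. This is only an assumption about the general technique, not stylo's actual code, and `drop_notes_then_tags` is a hypothetical name.

```r
# Hypothetical helper, NOT part of stylo: two-pass clean-up of a TEI-like string.
drop_notes_then_tags <- function(x) {
  # Pass 1: delete <note> elements together with their content
  # (non-greedy match; assumes the note does not span a line break).
  x <- gsub("<note>.*?</note>", "", x, perl = TRUE)
  # Pass 2: delete the remaining tags but keep the text they enclose.
  gsub("<[^>]+>", "", x)
}

no_notes <- drop_notes_then_tags(
  "Gallia<note>Gallia: Gaul.</note> est omnis <emph>divisa</emph> in partes tres"
)
print(no_notes)
```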