unescape_markup: Clean up xml or html markup tags and formatting
In slin30/wzMisc: Miscellaneous functions by WZ

unescape_markup

R Documentation

Clean up xml or html markup tags and formatting

Description

This is a minor modification of the approach described at http://stackoverflow.com/questions/5060076/convert-html-character-entity-encoding-in-r; all credit is due to that source.

This function calls either xml2::read_xml() or xml2::read_html(), depending on the value passed to what_ml. The default, if not specified, is html.
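
The dispatch between the two parsers can be sketched as follows. This is a minimal illustration, not the package's actual source: `unescape_one` is a hypothetical helper that handles a single string, and wrapping the input in a dummy `<x>` element is an assumption borrowed from the linked Stack Overflow approach.

```r
library(xml2)

# Hypothetical single-string helper; NOT the wzMisc implementation.
unescape_one <- function(s, what_ml = c("html", "xml")) {
  what_ml <- match.arg(what_ml)            # defaults to "html"
  wrapped <- paste0("<x>", s, "</x>")      # dummy wrapper so the input parses as markup
  doc <- switch(what_ml,
    html = xml2::read_html(wrapped),
    xml  = xml2::read_xml(wrapped)
  )
  xml2::xml_text(doc)                      # text content with tags and entities resolved
}

unescape_one("<i>in-situ</i> electron microscopy")
# "in-situ electron microscopy"
unescape_one("a &lt; b", what_ml = "xml")
# "a < b"
```

One practical difference between the two modes: the XML parser only knows the predefined XML entities, so HTML-only entities in the input generally call for the html setting.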

If called with iconv_encoding = TRUE, x is processed by iconv(), which may or may not change x. To minimize surprises, and in particular because the function may return early when no unescaping is required, iconv_encoding is FALSE by default; in that case, any arguments that would be passed to iconv() via ... are ignored.
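
To illustrate why iconv() "may or may not change x", here is a small base R sketch that is independent of unescape_markup(). The `from` and `to` values are examples of the kind of arguments that could be supplied via `...`; how the function actually forwards them is not shown here and is an assumption.

```r
# A latin1-encoded string: iconv() re-encodes it to UTF-8.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
y <- iconv(x, from = "latin1", to = "UTF-8")
Encoding(y)
# "UTF-8"

# A pure ASCII string: the same call leaves the content unchanged.
z <- iconv("plain ascii", from = "latin1", to = "UTF-8")
identical(z, "plain ascii")
# TRUE
```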

Usage

unescape_markup(x, what_ml = c("html", "xml"), iconv_encoding = FALSE, ...)

Arguments

`x`	A character vector; the input you wish to unescape.
`what_ml`	One of `html` or `xml`, denoting how the content should be parsed. Defaults to `html`.
`iconv_encoding`	A logical vector of length 1. Should the input be processed via `iconv`? Defaults to `FALSE`.
`...`	Optional. Additional arguments passed to `iconv`; used only when `iconv_encoding` is `TRUE`.

Details

Useful when dealing with parts of strings enclosed in '< >' within a character vector.

Value

A character vector of the same length as x, with the contents of x unescaped. By default, if no unescaping was required, x is returned as is.

Note

The xml2 functions this relies upon are not vectorized (they target a different use case, so no criticism of those functions is implied). This function handles vector inputs of length >1 through vapply(), and should maintain a reasonable level of performance by first subsetting to only those elements of x where <.+> is present. If only a few elements of x require unescaping, performance should be acceptable: runtime grows with the number of elements that need processing, not solely as a function of length(x).
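
The subset-then-vapply() pattern described in this note can be sketched as below. This is an assumption-laden illustration, not the package's source: `unescape_vec` is a hypothetical name, only the html path is shown, and the `<x>` wrapper follows the linked Stack Overflow approach.

```r
library(xml2)

# Hypothetical vectorized wrapper; NOT the wzMisc implementation.
unescape_vec <- function(x) {
  needs <- grepl("<.+>", x)                # elements that contain markup
  if (!any(needs)) return(x)               # early return: nothing to unescape
  x[needs] <- vapply(
    x[needs],
    function(s) xml2::xml_text(xml2::read_html(paste0("<x>", s, "</x>"))),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE
  )
  x
}

x <- c("plain text", "<i>in-situ</i> electron microscopy")
unescape_vec(x)
# "plain text" "in-situ electron microscopy"
```

Only the second element is parsed here; the first is passed through untouched, which is why cost tracks the number of matching elements rather than length(x).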

Examples

x <- "<i>in-situ</i> electron microscopy"
unescape_markup(x)

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.
