drop_tags: Drop all XML tags, or XML-like tags, from a character vector


Description

The function drop_tags takes a character vector x as input and returns a copy of x from which all XML-like tags have been removed. If half_tags_too = TRUE, any half tag at the beginning or end of x is also removed.

This function is not truly XML-aware. It uses a very simple definition of a ‘tag’: any character sequence that starts with < and ends with >. Between < and >, any sequence of zero or more characters is accepted, as long as it does not contain >.
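
The tag definition above can be written as a single regular expression. The following is a minimal sketch in base R, not the package's actual implementation; the helper name strip_complete_tags is hypothetical and only illustrates which substrings count as complete tags under this definition.

```r
# Hypothetical helper (not part of mclm): removes every substring that
# starts with "<", ends with ">", and contains no ">" in between.
strip_complete_tags <- function(x) {
  gsub("<[^>]*>", "", x)
}

strip_complete_tags("<w pos='Det'>An</w> <w pos='N'>example</w>")
# "An example"

# Zero characters between "<" and ">" still counts as a tag.
strip_complete_tags("a<>b")
# "ab"

# Because the definition is not XML-aware, ordinary text between a
# "<" and a later ">" is treated as a tag and removed as well.
strip_complete_tags("1 < 2 and 3 > 2")
# "1  2"
```

The last call shows the practical consequence of the simple definition: text that contains literal < and > characters may lose more than markup.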

Usage

drop_tags(x,
          half_tags_too = TRUE) 

Arguments

x

A character vector from which the tags are to be removed.

half_tags_too

If TRUE, half tags at the beginning or the end of x are also removed; if FALSE, only complete tags are removed.
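
The effect of half_tags_too can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper name strip_tags_sketch is hypothetical, and it assumes that a leading half tag is any text from the start of a string through a > that is not preceded by <, and that a trailing half tag is a final < that is never closed by >.

```r
# Hypothetical helper (not part of mclm) illustrating the two settings.
strip_tags_sketch <- function(x, half_tags_too = TRUE) {
  if (half_tags_too) {
    # Assumed leading half tag: from the start of the string through the
    # last ">" that occurs before any "<".
    x <- sub("^[^<]*>", "", x)
    # Assumed trailing half tag: a "<" with no ">" after it, up to the
    # end of the string.
    x <- sub("<[^>]*$", "", x)
  }
  # Complete tags: "<", then any characters except ">", then ">".
  gsub("<[^>]*>", "", x)
}

snippet <- "id='3'/><w pos='Det'>An</w> <w pos='N'>example</w> <w"

strip_tags_sketch(snippet)
# "An example "

strip_tags_sketch(snippet, half_tags_too = FALSE)
# "id='3'/>An example <w"
```

In this sketch the half tags are stripped before the complete tags, so the leading pattern cannot accidentally consume text that only becomes adjacent to the start of the string once complete tags are gone.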

Value

A character vector: a copy of x from which all tags have been removed.

Examples

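# An XML fragment that starts and ends in the middle of a tag
xml_snippet <- "id='3'/><w pos='Det'>An</w> <w pos='N'>example</w> <w"

# Remove complete tags and the half tags at both ends (the default)
drop_tags(xml_snippet)

# Remove complete tags only
drop_tags(xml_snippet, half_tags_too = FALSE)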

wai-wong-reimagine/mclm documentation built on May 16, 2019, 9:12 p.m.