clean_html: Clean HTML and whitespace from a string

Description


This function uses a sequence of three regular expressions to strip HTML out of a block of text.

The first regular expression is "(&[a-z]*;|<.*?>)". Its first alternative matches an HTML entity: an & followed by zero or more lowercase letters and a ; (for example &nbsp; or &amp;). Its second alternative matches a tag: a < followed by any characters, matched lazily up to the first >. Each matched substring is replaced with a single space.

The second regular expression is "\s+". It matches each run of one or more whitespace characters and reduces it to a single space.

The last regular expression is "^\s+|\s+$". It matches whitespace at the beginning or end of the text and removes it.
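The three steps above can be sketched in R as a chain of gsub() calls. This is a minimal illustration reconstructed from the description, not the package's actual source: the name clean_html_sketch is hypothetical, and passing perl = TRUE (so that the lazy .*? and \s are interpreted by PCRE) is an assumption.

```r
# Hypothetical re-implementation of the three-step cleanup described above.
# clean_html_sketch is an illustrative name, not the QualtricsTools source.
clean_html_sketch <- function(text) {
  # Step 1: replace lowercase HTML entities (&name;) and tags (<...>) with a space.
  # perl = TRUE is an assumption; it makes PCRE handle the lazy .*? quantifier.
  text <- gsub("(&[a-z]*;|<.*?>)", " ", text, perl = TRUE)
  # Step 2: collapse every run of one or more whitespace characters into one space.
  text <- gsub("\\s+", " ", text, perl = TRUE)
  # Step 3: remove whitespace at the start and end of the string.
  text <- gsub("^\\s+|\\s+$", "", text, perl = TRUE)
  text
}

clean_html_sketch("<p>Hello&nbsp;<b>world</b></p>\n\n  ")
# returns "Hello world"
```

Two consequences of these patterns are worth keeping in mind. Because the entity alternative only allows lowercase letters, numeric or mixed-case references such as "&#39;" or "&Eacute;" pass through unchanged. And because any text between a < and the next > is treated as a tag, a plain-text comparison such as "a < b and c > d" is reduced to "a d".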





Arguments

A character string that may contain HTML tags, HTML entities, or extraneous whitespace to be stripped.


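Value

The input text with HTML tags and entities replaced, internal whitespace collapsed to single spaces, and leading and trailing whitespace removed.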

ctesta01/QualtricsTools documentation built on May 14, 2019, 12:27 p.m.