txx_text_to_xml: Transform semi-structured text into an XML structure



Description

Transforms character strings into an XML structure by identifying known words or phrases that indicate the start of a new section. These words or phrases (known as tags) must be supplied beforehand. The output can then be analyzed with other packages, such as XML or xml2, to extract the sections of interest.
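The tag-based approach described above can be illustrated with a minimal base R sketch. It is not the textured package's implementation: the function name `sketch_text_to_xml`, the `<entry>` wrapper element, deriving element names by stripping non-alphanumeric characters from each tag, and discarding any text before the first tag are all assumptions made for illustration.

```r
# Hypothetical sketch, NOT the textured implementation of txx_text_to_xml.
# Splits each string at the supplied tags and wraps each section in an element.
sketch_text_to_xml <- function(strings, tags) {
  # Escape characters that are special in XML text content
  escape_xml <- function(x) {
    x <- gsub("&", "&amp;", x, fixed = TRUE)
    x <- gsub("<", "&lt;", x, fixed = TRUE)
    gsub(">", "&gt;", x, fixed = TRUE)
  }
  # Assumption: element names are the tag with non-alphanumerics removed,
  # e.g. "Name:" becomes "Name"
  element_name <- function(tag) {
    nm <- gsub("[^A-Za-z0-9_]", "", tag)
    if (!grepl("^[A-Za-z_]", nm)) nm <- paste0("tag_", nm)
    nm
  }
  vapply(strings, function(s) {
    starts <- integer(0)
    ends <- integer(0)
    elems <- character(0)
    # Search tags in the order given; earlier tags claim their text first
    for (tag in tags) {
      hits <- gregexpr(tag, s, fixed = TRUE)[[1]]
      if (hits[1] == -1) next  # tag not present in this string
      for (h in hits) {
        e <- h + nchar(tag) - 1
        # Skip matches inside a region already claimed by an earlier tag
        if (any(h <= ends & e >= starts)) next
        starts <- c(starts, h)
        ends <- c(ends, e)
        elems <- c(elems, element_name(tag))
      }
    }
    if (length(starts) == 0) return("<entry></entry>")
    o <- order(starts)
    starts <- starts[o]
    ends <- ends[o]
    elems <- elems[o]
    # Each section runs from the end of its tag to the start of the next tag;
    # any text before the first tag is dropped (an assumption of this sketch)
    next_start <- c(starts[-1], nchar(s) + 1)
    body <- trimws(substring(s, ends + 1, next_start - 1))
    paste0("<entry>",
           paste0("<", elems, ">", escape_xml(body), "</", elems, ">",
                  collapse = ""),
           "</entry>")
  }, character(1), USE.NAMES = FALSE)
}

sketch_text_to_xml("Name: Alice Age:40 Address:43 Maple Street",
                   tags = c("Name:", "Age:", "Address:"))
# returns "<entry><Name>Alice</Name><Age>40</Age><Address>43 Maple Street</Address></entry>"
```

Because overlapping matches are skipped, the order in which tags are listed affects the result in this sketch as well.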

Usage

txx_text_to_xml(strings, tags)

Arguments

strings

A vector of character strings containing the semi-structured text. Each string should represent a single entry (e.g. a single letter).

tags

The character strings that mark the start of a section, e.g. "Diagnosis:" or "SUMMARY:". Tags may vary from letter to letter, so include all variants. Order matters: tags are searched for in the order given. If a tag is contained within a longer tag, put the longer tag first so that it is matched first.
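Why tag order matters can be seen with a small base R sketch. The helper `claim_tag_positions` is hypothetical and not taken from the package source; it only assumes that tags are searched in the order given and that a match is skipped when it overlaps text already claimed by an earlier tag.

```r
# Hypothetical helper: records where each tag starts, searching tags in the
# order given and skipping matches that overlap a region already claimed.
claim_tag_positions <- function(s, tags) {
  starts <- integer(0)
  ends <- integer(0)
  found <- character(0)
  for (tag in tags) {
    hits <- gregexpr(tag, s, fixed = TRUE)[[1]]
    if (hits[1] == -1) next  # tag not present in this string
    for (h in hits) {
      e <- h + nchar(tag) - 1
      if (any(h <= ends & e >= starts)) next  # inside an earlier match
      starts <- c(starts, h)
      ends <- c(ends, e)
      found <- c(found, tag)
    }
  }
  o <- order(starts)
  setNames(starts[o], found[o])
}

s <- "Diagnosis: flu Secondary Diagnosis: asthma"

# Longer tag first: each tag is found once, in its proper place
claim_tag_positions(s, c("Secondary Diagnosis:", "Diagnosis:"))
# Diagnosis: at 1, Secondary Diagnosis: at 16

# Shorter tag first: "Diagnosis:" also matches inside "Secondary Diagnosis:",
# so the longer tag is never found
claim_tag_positions(s, c("Diagnosis:", "Secondary Diagnosis:"))
# Diagnosis: at 1, Diagnosis: at 26
```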

Value

A vector of character strings with the text transformed into an XML structure.

Examples

txx_text_to_xml(strings = "Name: Alice Age:40 Address:43 Maple Street",
                tags = c("Name:", "Age:", "Address:"))
                
txx_text_to_xml(strings = 
                    c("Name: Alice Age:40 Address:43 Maple Street",
                      "Name: Bob Address: 44 Maple Street Age:41 Weight:100kg"),
                tags = c("Name:", "Age:", "Address:", "Weight:"))

michael-ccccc/textured documentation built on Dec. 21, 2021, 5:56 p.m.