separate_text: Separate all matching text into multiple rows
In tidypmc: Parse Full Text XML Documents from PubMed Central

separate_text

R Documentation

Separate all matching text into multiple rows

Description

Find every match of `pattern` in a text column and return the matches as separate rows, one row per match.

Usage

separate_text(txt, pattern, column = "text")

Arguments

`txt`	a tibble, usually the output of `pmc_text`
`pattern`	either a regular expression or a vector of words to find in the text
`column`	name of the column to search, default "text"

Value

a tibble

Note

`pattern` is passed to `grepl` and `stringr::str_extract_all`.
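The Note says matching relies on `grepl` (to find rows that match) and `str_extract_all` (to pull out every match). The base-R sketch below illustrates that filter-then-extract idea on a toy data frame. It is not tidypmc's implementation: the function name `separate_text_sketch`, the whole-word alternation used for a vector of words, and the `match` output column are assumptions for illustration, and it uses `gregexpr`/`regmatches` in place of `stringr::str_extract_all`.

```r
# Hypothetical sketch, NOT tidypmc's code: keep rows whose `column` matches
# `pattern`, then return one row per match using base R regex functions.
separate_text_sketch <- function(txt, pattern, column = "text") {
  if (!column %in% names(txt)) stop("column '", column, "' not found")
  # Assumption: a vector of words becomes a whole-word alternation regex
  if (length(pattern) > 1) {
    pattern <- paste0("\\b(", paste(pattern, collapse = "|"), ")\\b")
  }
  keep <- grepl(pattern, txt[[column]], perl = TRUE)  # rows with >= 1 match
  sub <- txt[keep, , drop = FALSE]
  found <- regmatches(sub[[column]],
                      gregexpr(pattern, sub[[column]], perl = TRUE))
  # repeat each kept row once per match, then attach the matched strings
  out <- sub[rep(seq_len(nrow(sub)), lengths(found)), , drop = FALSE]
  out$match <- as.character(unlist(found))
  rownames(out) <- NULL
  out
}

# Toy input standing in for the output of pmc_text
txt <- data.frame(
  section = c("Intro", "Methods", "Results"),
  text = c("The hmu and ybt loci", "No gene names here",
           "yfe was induced; ybt too"),
  stringsAsFactors = FALSE
)
hits <- separate_text_sketch(txt, c("hmu", "ybt", "yfe", "yfu"))
hits
```

Here the "Methods" row has no match and is dropped, while the "Intro" and "Results" rows each appear twice, once for each gene name found in them.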

Author(s)

Chris Stubben

Examples

# doc <- pmc_xml("PMC2231364")
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml",
        package = "tidypmc"))
txt <- pmc_text(doc)
separate_text(txt, "[ATCGN]{5,}")
separate_text(txt, "\\([A-Z]{3,6}s?\\)")
# pattern can be a vector of words
separate_text(txt, c("hmu", "ybt", "yfe", "yfu"))
# separate_refs and separate_tags wrap separate_text and add a step that expands matched ranges
separate_refs(txt)
separate_tags(txt, "YPO")

tidypmc documentation built on Sept. 11, 2024, 7:17 p.m.