pmc_text | R Documentation |
Split section paragraph tags into a table with subsection titles and
sentences using tokenize_sentences
pmc_text(doc, sentence = TRUE)
doc | an XML document, for example one returned by pmc_xml or xml2::read_xml |
sentence | split paragraphs into sentences, default TRUE |
a tibble with the section, paragraph and sentence numbers, and text
Subsections may be nested to arbitrary depths, and this function returns the entire path to the subsection title as a delimited string like "Results; Predicted functions; Pathogenicity". Tables, figures and formulas that are nested in section paragraphs are removed, superscripted references are replaced with brackets, and any other superscripts or subscripts are separated with ^ and _.
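The delimited section path can be taken apart again with base R. The following is a minimal sketch, not part of the package: it assumes the delimiter is exactly "; " as in the example string above, and the variable names (path, parts, top_level, leaf, depth) are illustrative only.

```r
# Example path copied from the description above
path <- "Results; Predicted functions; Pathogenicity"

# Split on the literal "; " delimiter (assumed from that example string)
parts <- strsplit(path, "; ", fixed = TRUE)[[1]]

top_level <- parts[1]              # outermost section title
leaf      <- parts[length(parts)]  # innermost subsection title
depth     <- length(parts)         # nesting depth of the subsection
```

A section title that itself contains "; " would be split incorrectly by this approach, so it is worth checking the titles in a given document first.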
Chris Stubben
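# Alternative: doc <- pmc_xml("PMC2231364")
# Read the example article shipped with the package
doc <- xml2::read_xml(system.file("extdata/PMC2231364.xml",
  package = "tidypmc"
))
# Split the article into one row per sentence
txt <- pmc_text(doc)
txt
# Number of sentences in each section, largest first
dplyr::count(txt, section, sort = TRUE)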
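For readers without the package installed, the shape of the result can be explored on a stand-in table. This is only a sketch under stated assumptions: the column names paragraph, sentence and text are inferred from the description of the return value (only section appears in the example above), and every row of txt_toy is invented for illustration.

```r
# Toy stand-in for the pmc_text() result. Column names other than `section`
# are assumptions, and the rows are invented, not taken from PMC2231364.
txt_toy <- data.frame(
  section   = c("Abstract", "Abstract", "Results; Predicted functions"),
  paragraph = c(1L, 1L, 2L),
  sentence  = c(1L, 2L, 1L),
  text      = c("First sentence.", "Second sentence.", "Another sentence."),
  stringsAsFactors = FALSE
)

# Keep only rows whose section path starts with "Results"
results_rows <- txt_toy[startsWith(txt_toy$section, "Results"), ]

# Rows per section, largest first (a base R analogue of dplyr::count with sort = TRUE)
n_by_section <- sort(table(txt_toy$section), decreasing = TRUE)
```

The same two steps should carry over to the real txt object if its columns match these assumed names.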