epm_parse_record: Extract Information from a Raw PubMed Record.

View source: R/epm_all_fx.R

epm_parse_record    R Documentation

Extract Information from a Raw PubMed Record.

Description

Read a raw PubMed record, identify XML tags, extract information, and cast it into a structured 'data.frame'. The expected input is an XML-tag-decorated string corresponding to a single PubMed record. Information about the article title, authors, affiliations, journal name and abbreviation, publication date, references, and keywords is returned.

Usage

epm_parse_record(
  pubmedArticle,
  max_authors = 15,
  autofill_address = TRUE,
  compact_output = TRUE,
  include_abstract = TRUE,
  max_references = 1000,
  ref_id_type = "pmid"
)

Arguments

pubmedArticle

String, an XML-tag-decorated raw PubMed record (a single record).

max_authors

Numeric, maximum number of authors to retrieve. If this is set to -1, only the last author is extracted. If this is set to 1, only the first author is returned. If this is set to 2, the first and the last authors are extracted. If this is set to any other positive number (n), up to the leading (n-1) authors are retrieved together with the last author. If this is set to a number larger than the number of authors in a record, all authors are returned. At least 1 author has to be retrieved, so a value of 0 is not accepted and is coerced to -1.

autofill_address

Logical, shall author affiliations be propagated within each record to fill missing values.

compact_output

Logical, shall record data be returned in a compact format where each row is a single record and author names are collapsed together. If 'FALSE', each row corresponds to a single author of the publication and the record-specific data are recycled for all included authors.

include_abstract

Logical, shall abstract text be included in the output data.frame. If 'FALSE', the abstract text column is populated with a missing value.

max_references

Numeric, maximum number of references to return (for each PubMed record).

ref_id_type

String, must be one of the following values: c('pmid', 'doi'). Type of identifier used to describe citation references.
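
The selection rule described for 'max_authors' can be illustrated with a small, self-contained sketch. The helper 'select_authors' below is hypothetical and is not part of easyPubMed; it only mirrors the documented behavior on a plain character vector of author names. Treating every negative value like -1 is an assumption, since the documentation only describes -1.

```r
# Hypothetical helper (not an easyPubMed function): mimics the documented
# 'max_authors' rule on a character vector of author names.
select_authors <- function(authors, max_authors = 15) {
  k <- length(authors)
  if (k == 0) return(authors)

  # A value of 0 is not accepted and is coerced to -1 (documented behavior).
  if (max_authors == 0) max_authors <- -1

  # -1: only the last author.
  # Assumption: other negative values are handled the same way.
  if (max_authors < 0) return(authors[k])

  # 1: only the first author.
  if (max_authors == 1) return(authors[1])

  # A limit at or above the author count returns all authors.
  if (max_authors >= k) return(authors)

  # Otherwise: the leading (n - 1) authors plus the last author.
  c(authors[seq_len(max_authors - 1)], authors[k])
}

a <- c("A", "B", "C", "D", "E")
select_authors(a, 3)   # "A" "B" "E"
select_authors(a, 2)   # "A" "E"
select_authors(a, -1)  # "E"
select_authors(a, 10)  # "A" "B" "C" "D" "E"
```

With n = 2 the general rule (n - 1 leading authors plus the last one) already yields the first and the last author, so only -1, 0, and 1 need special handling in this sketch.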

Value

A data.frame including information extracted from a raw XML PubMed record.
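
The difference between the two layouts controlled by 'compact_output' can be sketched with base R alone. The record, the column names, and the "; " separator below are illustrative assumptions and do not reproduce the actual columns returned by epm_parse_record.

```r
# Hypothetical record; field names and values are for illustration only.
record <- list(
  pmid    = "00000001",
  title   = "An example title",
  authors = c("Doe J", "Roe R", "Poe E")
)

# Compact layout: one row per record, author names collapsed into one string.
# The "; " separator is an assumption.
compact <- data.frame(
  pmid    = record$pmid,
  title   = record$title,
  authors = paste(record$authors, collapse = "; "),
  stringsAsFactors = FALSE
)

# Long layout: one row per author; the record-level fields (length 1)
# are recycled by data.frame() across all author rows.
long <- data.frame(
  pmid   = record$pmid,
  title  = record$title,
  author = record$authors,
  stringsAsFactors = FALSE
)

nrow(compact)  # 1
nrow(long)     # 3
```

The compact layout is convenient when each record should stay on one row, while the long layout makes it easier to filter or count by individual author.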

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

data(epm_samples)
x <- epm_samples$bladder_cancer_2018$demo_data_03$raw[[1]]
epm_parse_record(x)

dami82/easyPubMed documentation built on Jan. 4, 2024, 6:21 a.m.