epm_parse: Extract Information from a Raw PubMed Record.

View source: R/epm_all_fx.R

epm_parse    R Documentation

Description

Read a raw PubMed record, identify XML tags, extract information, and cast it into a structured data.frame. The expected input is an XML-tag-decorated string corresponding to a single PubMed record. Information about article title, authors, affiliations, journal name and abbreviation, publication date, references, and keywords is returned.

Usage

epm_parse(
  x,
  max_authors = 10,
  autofill_address = TRUE,
  compact_output = TRUE,
  include_abstract = TRUE,
  max_references = 150,
  ref_id_type = "doi",
  verbose = TRUE
)

Arguments

x

An 'easyPubMed' object. The object must include raw records (n>0) downloaded in the 'xml' format.

max_authors

Numeric, maximum number of authors to retrieve. If this is set to -1, only the last author is extracted. If this is set to 1, only the first author is returned. If this is set to 2, the first and the last authors are extracted. If this is set to any other positive number (n), up to the leading (n-1) authors are retrieved together with the last author. If this is set to a number larger than the number of authors in a record, all authors are returned. Note that at least 1 author has to be retrieved; therefore, a value of 0 is not accepted (it is coerced to -1).

autofill_address

Logical, shall author affiliations be propagated within each record to fill missing values.

compact_output

Logical, shall record data be returned in a compact format where each row is a single record and author names are collapsed together. If 'FALSE', each row corresponds to a single author of the publication and the record-specific data are recycled for all included authors (legacy approach).

include_abstract

Logical, shall abstract text be included in the output data.frame. If 'FALSE', the abstract text column is populated with a missing value.

max_references

Numeric, maximum number of references to return (for each PubMed record).

ref_id_type

String, must be one of the following values: c('pmid', 'doi'). Type of identifier used to describe citation references.

verbose

Logical, shall details about the progress of the operation be printed to console.
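
The selection rule described for max_authors can be easier to follow as code. The sketch below is only an illustration of the documented behavior on a plain character vector; select_authors() is a hypothetical helper and is not part of easyPubMed, and treating every negative value like -1 is an assumption, since only -1 and 0 are documented.

```r
# Hypothetical helper (NOT an easyPubMed function): mimics the documented
# max_authors rule on a character vector of author names.
select_authors <- function(authors, max_authors = 10) {
  n_auth <- length(authors)
  if (n_auth == 0) return(authors)

  # A value of 0 is not accepted; the documentation says it is coerced to -1.
  if (max_authors == 0) max_authors <- -1

  # Assumption: any negative value is handled like -1 (last author only).
  if (max_authors < 0) return(authors[n_auth])

  # 1 means first author only.
  if (max_authors == 1) return(authors[1])

  # A limit at or above the number of authors returns everyone.
  if (max_authors >= n_auth) return(authors)

  # Otherwise: the leading (max_authors - 1) authors plus the last author.
  c(authors[seq_len(max_authors - 1)], authors[n_auth])
}

auth <- c("A", "B", "C", "D", "E")
select_authors(auth, -1)  # last author only
select_authors(auth, 2)   # first and last authors
select_authors(auth, 3)   # two leading authors plus the last author
select_authors(auth, 10)  # all authors
```

Note that with this rule the last author is always kept once max_authors is 2 or more, so the limit trims authors from the middle of the list rather than from the end.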

Value

An easyPubMed object including a data.frame ('data' slot) that stores information extracted from its raw XML PubMed records.
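
To make the difference between the two output layouts controlled by compact_output more concrete, here is a toy sketch using base R only. The column names (pmid, title, author) and the "; " separator are assumptions chosen for illustration; they are not taken from the package and may not match the actual columns of the 'data' slot.

```r
# Toy table in the per-author (legacy) layout: one row per author, with
# record-level fields recycled. Column names are hypothetical.
long_df <- data.frame(
  pmid   = c("111", "111", "222"),
  title  = c("Paper A", "Paper A", "Paper B"),
  author = c("Smith J", "Lee K", "Rossi M"),
  stringsAsFactors = FALSE
)

# Compact layout: one row per record, author names collapsed into one string.
compact_df <- aggregate(
  author ~ pmid + title,
  data = long_df,
  FUN = paste,
  collapse = "; "
)

print(long_df)
print(compact_df)
```

The compact layout is convenient for record-level summaries, while the per-author layout is easier to filter or count by individual author.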

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x = x, format = 'xml')
  x <- epm_parse(x, include_abstract = FALSE, max_authors = 1)
  get_epm_data(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
 


dami82/easyPubMed documentation built on Jan. 4, 2024, 6:21 a.m.