parse_batches: Read and parse doc of PubMed records and extract specified...


View source: R/parse_batches.R

Description

Read and parse a document of PubMed records and extract the specified datatypes as CSV files.

Usage

parse_batches(
  input_dir,
  pmids = NULL,
  datatypes = c("table", "abstract", "databanks", "authors", "mesh", "keywords",
    "pubtypes"),
  file_name = "pubmed",
  suffix = NULL,
  dir = here::here(),
  subdir = dir,
  quiet = FALSE,
  return = FALSE
)
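
A minimal call might look like the sketch below. The two paths are hypothetical, and the example assumes that the pubmedparser package is installed and that input_dir already holds records saved by fetch_batch; otherwise the call is skipped. Creating the output directory beforehand is a precaution, since this page does not say whether parse_batches creates it.

```r
# Hypothetical locations; adjust to your project.
input_dir <- file.path("data", "raw", "pubmed")        # assumed: output of fetch_batch
output_dir <- file.path("data", "processed", "pubmed") # assumed: where csv's should go

# Arguments as documented on this page; only two datatypes are requested.
args <- list(
  input_dir = input_dir,
  datatypes = c("abstract", "mesh"),
  dir = output_dir,   # subdir defaults to dir
  quiet = TRUE,
  return = FALSE
)

# Run only when the package and the input are actually available.
if (requireNamespace("pubmedparser", quietly = TRUE) && file.exists(input_dir)) {
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  do.call(pubmedparser::parse_batches, args)
}
```

With return = FALSE, the call is made for its side effects only, so nothing needs to be assigned.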

Arguments

input_dir

Filepath of the doc containing a batch of unparsed PubMed records, such as the output of fetch_batch.

pmids

Vector of pmids. Defaults to NULL. If pmids are not provided by the user, the pmids are saved as an .rds file.

datatypes

Types of data to extract from the xml. Each type must have a corresponding "pubmed_" function ("table", "abstract", "databanks", "authors", "mesh", "keywords", "pubtypes").

file_name

Base for file names, used together with suffix. Defaults to "pubmed". If both file_name and suffix are equal to their defaults, the doc is checked against the filename patterns of files generated by fetch_batch (either "YYYY-MM-DD_FILENAME_SUFFIX.txt" or "YYYY-MM-DD_FILENAME.txt"), and any available values are extracted.

suffix

Suffix for file names. For example, record numbers. Defaults to NULL.

dir

Directory for saving files (the log file and pmids.rds and, depending on subdir, the extracted CSV files). Defaults to the project root (here::here()).

subdir

Directory for saving the extracted CSV files. Defaults to dir.

quiet

Whether to silence messages in console. Defaults to FALSE.

return

Whether to return the parsed xml. Defaults to FALSE, since complete batches may be too large to hold in memory and often only the side-effect CSV files are of interest. If TRUE, returns a list whose length is the number of files in input_dir, with each element containing a parsed xml document.
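
The filename check described under file_name can be illustrated with a small base R helper. This is only a sketch of the two documented patterns, not the package's own implementation: parse_fetch_name is a hypothetical function, and it assumes that neither FILENAME nor SUFFIX contains an underscore.

```r
# Hypothetical helper: split a fetch_batch-style file name into its parts.
# Returns NULL when the name matches neither documented pattern.
parse_fetch_name <- function(path) {
  name <- basename(path)
  if (!grepl("\\.txt$", name)) return(NULL)

  stem <- sub("\\.txt$", "", name)
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]

  # The first part must look like YYYY-MM-DD.
  if (length(parts) < 2 || !grepl("^\\d{4}-\\d{2}-\\d{2}$", parts[1])) return(NULL)

  if (length(parts) == 2) {
    # "YYYY-MM-DD_FILENAME.txt"
    list(date = parts[1], file_name = parts[2], suffix = NA_character_)
  } else if (length(parts) == 3) {
    # "YYYY-MM-DD_FILENAME_SUFFIX.txt"
    list(date = parts[1], file_name = parts[2], suffix = parts[3])
  } else {
    NULL  # extra underscores: ambiguous under the assumption above
  }
}

with_suffix <- parse_fetch_name("2021-02-18_pubmed_00001-01000.txt")
without_suffix <- parse_fetch_name("2021-02-18_pubmed.txt")
```

Here with_suffix carries both a file_name and a suffix, while without_suffix has a missing suffix.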

Value

If return = TRUE, the parsed xml with names = pmids. As a side effect, the specified datatypes are written as CSV files.
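
Because the CSV files are the main output, a follow-up step usually reads them back in. The sketch below shows one way to do that in base R. The directory, the placeholder file, and its column names are assumptions made so the example runs on its own; files are matched by extension only, because the exact output file names are not documented on this page.

```r
# Stand-in for the subdir passed to parse_batches (hypothetical location).
csv_dir <- file.path(tempdir(), "pubmed-csv")
dir.create(csv_dir, recursive = TRUE, showWarnings = FALSE)

# Placeholder file so the sketch is self-contained; real files come from parse_batches.
write.csv(
  data.frame(pmid = c("1", "2"), abstract = c("First.", "Second.")),
  file.path(csv_dir, "example_abstract.csv"),
  row.names = FALSE
)

# Collect every csv in the directory and read each into a data frame.
# Reading all columns as character keeps pmids from being converted to numbers.
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(csv_files, read.csv, colClasses = "character")
names(tables) <- basename(csv_files)
```

Each element of tables is then a data frame named after the file it came from.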


maia-sh/pubmedparser documentation built on Feb. 18, 2021, 11:44 a.m.