parse_pubchem_compound: Parse the JSON or XML compound data from PubChem

View source: R/15_PUBCHEM.R

parse_pubchem_compound R Documentation

Parse the JSON or XML compound data from PubChem

Description

Parse the JSON or XML compound data from PubChem

Usage

parse_pubchem_compound(file_name)

Arguments

file_name

File name of the PubChem compound data to parse (JSON or XML).
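Because file_name may point to either JSON or XML data, it can help to check the extension before parsing. The following is a minimal sketch under stated assumptions: detect_pubchem_format is a hypothetical helper, not part of massdatabase, the file names are made up, and whether parse_pubchem_compound itself accepts gzipped input is not stated on this page.

```r
# Hypothetical helper (not part of massdatabase): report whether a file name
# looks like JSON or XML, ignoring an optional ".gz" suffix.
detect_pubchem_format <- function(file_name) {
  # Drop a trailing ".gz" so "x.xml.gz" is treated like "x.xml"
  base_name <- sub("\\.gz$", "", file_name, ignore.case = TRUE)
  extension <- tolower(tools::file_ext(base_name))
  if (!extension %in% c("json", "xml")) {
    stop("Expected a .json or .xml file, got: ", file_name)
  }
  extension
}

# Made-up file names, for illustration only
detect_pubchem_format("compound_batch.xml.gz")
detect_pubchem_format("compound_batch.JSON")
```

A check like this fails early with a readable message instead of surfacing a parser error later.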

Details

#' @title Read the xml database from download_pubchem_compound function
#' @description Read the xml database from download_pubchem_compound function
#' @author Xiaotao Shen
#' shenxt1990@outlook.com
#' @param file should be xml format
#' @param path Default is ".". Should be the same as in the download_pubchem_compound function.
#' @return A list
#' @importFrom magrittr %>%
#' @importFrom plyr dlply .
#' @importFrom readr read_delim
#' @importFrom dplyr mutate bind_rows select distinct rename full_join filter
#' @importFrom tidyr pivot_wider
#' @importFrom purrr map
#' @importFrom XML xmlParse
#' @importFrom R.utils gunzip isGzipped
#' @importFrom utils untar
#' @importFrom xml2 read_xml
#' @export
read_pubchem_xml <- function(file, path = ".") {
  # Uncompress the file first if it is gzipped
  if (R.utils::isGzipped(file.path(path, "data", file))) {
    message("Uncompressing data...")
    R.utils::gunzip(file.path(path, "data", file))
    message("Done")
  }

  message("Reading data, it may take a while...")
  result <- xml2::read_xml(stringr::str_replace(file.path(path, "data", file), "\\.gz", ""))
  message("Done")

  message("Parsing data, it may take a while...")
  result <- XML::xmlParse(result)
  message("Done")

  result <- XML::xmlToList(result)

  message("Organizing...")
  pb <- progress::progress_bar$new(total = length(result))

  # Turn each parsed record into a one-row data frame; NULL if that fails
  compound_list <-
    seq_len(length(result)) %>%
    purrr::map(function(i) {
      pb$tick()
      x <- result[[i]]
      row <- tryCatch(
        matrix(x[[4]], nrow = 1) %>% as.data.frame(),
        error = function(e) NULL
      )
      if (is.null(row)) {
        return(NULL)
      }
      colnames(row) <- names(x[[4]])
      row
    })
  message("Done.")
  return(compound_list)
}
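The organizing step in the code above converts each parsed record into a one-row data frame and returns NULL for records that cannot be converted. A minimal base-R sketch of that pattern follows; the records list, its field names, and the record_to_row helper are hypothetical stand-ins for illustration, not PubChem data or massdatabase functions.

```r
# Hypothetical records standing in for entries of a parsed list
records <- list(
  list(id = "id_1", name = "compound_a"),
  list(id = "id_2", name = "compound_b"),
  NULL  # an entry that cannot be converted
)

# Convert one record to a one-row data frame, or NULL if that is not possible
record_to_row <- function(x) {
  if (is.null(x) || length(x) == 0) {
    return(NULL)
  }
  tryCatch(
    {
      row <- as.data.frame(matrix(unlist(x), nrow = 1), stringsAsFactors = FALSE)
      colnames(row) <- names(x)
      row
    },
    error = function(e) NULL  # a handler must be a function
  )
}

rows <- lapply(records, record_to_row)
rows <- Filter(Negate(is.null), rows)    # drop entries that failed
compound_table <- do.call(rbind, rows)   # stack the rows into one data frame
```

Dropping the NULL entries before binding keeps one bad record from aborting the whole conversion.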

Value

A data frame.

Author(s)

Xiaotao Shen shenxt1990@outlook.com


tidymass/massdatabase documentation built on Sept. 10, 2023, 10:35 p.m.