parsePubChemBioassay: Parse PubChem Bioassay Data


View source: R/loadingData.R

Description

Parses a PubChem Bioassay experimental result from two required files (a CSV file of results and an XML description file) into a bioassay object.

Usage

parsePubChemBioassay(aid, csvFile, xmlFile, duplicates = "drop",
    missingCid = "drop", scoreRegex = "inhibition|ic50|ki|gi50|ec50|ed50|lc50")

Arguments

aid

The assay identifier (aid) for the assay to be parsed.

csvFile

A CSV file for a given assay, as downloaded from PubChem Bioassay.

xmlFile

An XML description file for a given assay, as downloaded from PubChem Bioassay.

duplicates

Specifies how duplicate CIDs within the same assay are handled. If 'drop', only the first occurrence of each duplicated CID is kept and a warning is issued. If FALSE, processing stops with an error when duplicates are present. If TRUE, duplicates are included without warning, which may cause erroneous results in other bioassayR functions that assume a unique CID list for each assay.

missingCid

Either 'drop' or the logical value FALSE. If FALSE, processing stops with an error when any input compound has an empty CID string. If 'drop', a warning is issued and these compounds are skipped.

scoreRegex

A regular expression (Perl-compatible, case-insensitive) matched against the column names in the CSV header to identify relevant score columns. If any columns match, the first matching column is used in place of 'PUBCHEM_ACTIVITY_SCORE' and its name is stored as the assay's scoring method. The default identifies most PubChem Bioassays that contain protein target inhibition data. If a matching column contains only empty or non-numeric results, the next matching column is used automatically.
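
The selection rule described for scoreRegex can be illustrated in base R. This is only a sketch of the documented behavior, not the package's internal code: the header names, the `results` list, and the helper `hasNumeric` are hypothetical and chosen for illustration.

```r
## Hypothetical CSV header entries; real PubChem exports differ per assay.
header <- c("PUBCHEM_CID", "PUBCHEM_ACTIVITY_OUTCOME",
            "PUBCHEM_ACTIVITY_SCORE", "Comment_IC50", "IC50_uM", "Hill_slope")

## Hypothetical result values for the two entries that will match.
results <- list(
    Comment_IC50 = c("", "n/a", ""),
    IC50_uM      = c("1.2", "35", "")
)

## Default pattern from the Usage section.
scoreRegex <- "inhibition|ic50|ki|gi50|ec50|ed50|lc50"

## Perl-compatible, case-insensitive match against the header names.
candidates <- grep(scoreRegex, header, perl = TRUE, ignore.case = TRUE,
                   value = TRUE)

## Hypothetical helper: TRUE if at least one value parses as a number.
hasNumeric <- function(x) any(!is.na(suppressWarnings(as.numeric(x))))

## Take the first candidate with numeric content; otherwise fall back
## to the standard PubChem activity score.
usable <- Filter(function(name) hasNumeric(results[[name]]), candidates)
scoringColumn <- if (length(usable) > 0) usable[1] else "PUBCHEM_ACTIVITY_SCORE"
```

Here both `Comment_IC50` and `IC50_uM` match the pattern, but the first holds no numeric values, so the sketch moves on to `IC50_uM`. Note that short alternatives such as `ki` match any header name containing that substring, so a narrower pattern may be preferable for some assays.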

Value

A bioassay object containing the loaded data.

Author(s)

Tyler Backman

References

http://pubchem.ncbi.nlm.nih.gov NCBI PubChem

Examples

## get sample data locations
extdata_dir <- system.file("extdata", package="bioassayR")
assayDescriptionFile <- file.path(extdata_dir, "exampleAssay.xml")
activityScoresFile <- file.path(extdata_dir, "exampleScores.csv")

## parse files
myAssay <- parsePubChemBioassay("1000", activityScoresFile, assayDescriptionFile)
myAssay

Example output

class:		 bioassay 
aid:		 1000 
source_id:	 PubChem BioAssay 
assay_type:	 confirmatory 
organism:	 NA 
scoring:	 IC50 
targets:	 116516899 
target_types:	 protein 
total scores:	 57 
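
The example above relies on the default arguments. As a hedged variant, the same sample files can be parsed with stricter settings, using only the arguments documented in the Arguments section. Whether the bundled example data triggers an error under these settings is not stated in this page, so the call is wrapped in tryCatch; the variable name `strictAssay` and the narrower pattern "ic50" are illustrative choices.

```r
## Assumes bioassayR is installed; the block is skipped otherwise.
if (requireNamespace("bioassayR", quietly = TRUE)) {
    extdata_dir <- system.file("extdata", package = "bioassayR")
    assayDescriptionFile <- file.path(extdata_dir, "exampleAssay.xml")
    activityScoresFile <- file.path(extdata_dir, "exampleScores.csv")

    ## Stop on duplicate or empty CIDs instead of dropping them, and
    ## restrict score detection to header names containing "ic50".
    strictAssay <- tryCatch(
        bioassayR::parsePubChemBioassay("1000", activityScoresFile,
            assayDescriptionFile, duplicates = FALSE, missingCid = FALSE,
            scoreRegex = "ic50"),
        error = function(e) {
            message("Strict parse failed: ", conditionMessage(e))
            NULL
        })
}
```

Stopping on errors rather than dropping records can be useful when building a database from many assays, since silently skipped compounds are otherwise visible only as warnings.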

bioassayR documentation built on March 1, 2021, 2 a.m.