An R interface to the ProteomeXchange repository

suppressPackageStartupMessages(library("BiocStyle"))
suppressPackageStartupMessages(library("Biostrings"))

Introduction

The goal of the rpx package is to provide programmatic access to proteomics data from R, in particular to the ProteomeXchange (PX) central repository (see http://www.proteomexchange.org/ and http://central.proteomexchange.org/).

Vizcaino J.A. et al. ProteomeXchange: globally co-ordinated proteomics data submission and dissemination, Nature Biotechnology 2014, 32, 223-226, doi:10.1038/nbt.2839.

Additional repositories are likely to be added in the future.

The rpx package

PXDataset objects

The central object that handles data access is the PXDataset class. An instance is created by passing a valid PX experiment identifier to the PXDataset() constructor.

library("rpx")
id <- "PXD000001"
px <- PXDataset(id)
px

Data and meta-data

Several attributes can be extracted from a PXDataset instance, as described below.

The experiment identifier that was originally used to create the PXDataset instance can be extracted with the pxid() method:

pxid(px)

The file transfer URL where the data files can be accessed can be queried with the pxurl() method:

pxurl(px)

The species from which the data were generated can be obtained by calling the pxtax() function:

pxtax(px)

Relevant bibliographic references can be queried with the pxref method:

strwrap(pxref(px))

All files available for the PX experiment can be obtained with the pxfiles method:

pxfiles(px)

The complete or partial data set can be downloaded with the pxget() function. The function takes an instance of class PXDataset as its first, mandatory argument.

The next argument, list, specifies what files to download. If missing, a menu is printed and the user can select a file. If set to "all", all files of the experiment are downloaded. Alternatively, numerics or logicals can also be used to subset the relevant files to be downloaded based on the pxfiles(.) output.

f <- pxget(px, "PXD000001_mztab.txt")
f
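
The numeric and logical forms of the list argument follow ordinary R subsetting of the pxfiles() output. As a minimal sketch, assuming a made-up vector of file names standing in for pxfiles(px) (the names below are hypothetical, not the actual content of PXD000001), the three selection styles resolve to the same file:

```r
## Hypothetical file names standing in for the output of pxfiles(px)
files <- c("README.txt", "sample.raw", "sample_mztab.txt", "proteins.fasta")

## Select by name, by numeric index, and by logical vector
by_name    <- "sample_mztab.txt"
by_index   <- grep("mztab", files)     # position(s) of the matching files
by_logical <- grepl("mztab", files)    # TRUE/FALSE for each file

## All three selections resolve to the same file
sel_name    <- files[files %in% by_name]
sel_index   <- files[by_index]
sel_logical <- files[by_logical]
stopifnot(identical(sel_name, sel_index),
          identical(sel_index, sel_logical))
```

With a real dataset, and going by the description of the list argument above, such an index would be passed as the second argument, for instance pxget(px, grep("mztab", pxfiles(px))).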

The rpx package makes use of the BiocFileCache package to avoid repeatedly downloading files. When downloaded, files are cached, i.e. stored centrally in the package's cache directory. The next time the pxget() function attempts to get that file, it is retrieved directly from the cache instead of being downloaded again.
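
The caching behaviour described above can be illustrated with a toy example in base R. This is only a sketch of the general idea, not how rpx or BiocFileCache is implemented: cached_get() is a hypothetical helper, and a local file copy stands in for a download.

```r
## Toy file cache: copy a source file into a cache directory only once.
## 'cached_get' is a hypothetical helper, not part of rpx or BiocFileCache.
cached_get <- function(src, cache_dir) {
    dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
    dest <- file.path(cache_dir, basename(src))
    hit <- file.exists(dest)
    if (!hit) {
        ## Cache miss: fetch the file (a local copy stands in for a download)
        file.copy(src, dest)
    }
    list(path = dest, from_cache = hit)
}

## A temporary file stands in for a remote resource
src <- tempfile(fileext = ".txt")
writeLines("example content", src)
cache_dir <- file.path(tempdir(), "toy_cache")
unlink(cache_dir, recursive = TRUE)    # start from an empty cache

first  <- cached_get(src, cache_dir)   # miss: file is copied into the cache
second <- cached_get(src, cache_dir)   # hit: the cached copy is reused
```

The second call returns the same path as the first without copying anything, which mirrors what a repeated pxget() call is described as doing for a file that is already cached.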

Finally, a list of recent PX additions and updates can be obtained using the pxannounced() function:

pxannounced()

A simple use-case

Below, we download the fasta file from the PXD000001 dataset and load it with the Biostrings package.

fas <- grep("fasta", pxfiles(px), value = TRUE)
fas
f <- pxget(px, fas)
f ## files available in the rpx cache
library("Biostrings")
readAAStringSet(f)

Questions and help

Either post questions on the Bioconductor support forum or open a GitHub issue.

Session information

sessionInfo()



rpx documentation built on March 14, 2021, 6:02 p.m.