suppressPackageStartupMessages(library("BiocStyle"))
suppressPackageStartupMessages(library("Biostrings"))

Introduction

The goal of the rpx package is to provide programmatic access to proteomics data from R, in particular to the ProteomeXchange (Vizcaino J.A. et al., 2014) central repository (see http://www.proteomexchange.org/ and http://central.proteomexchange.org/). Additional repositories are likely to be added in the future.

The rpx package

PXDataset objects

The central object that handles data access is the PXDataset (version 2) class. An instance of this class can be created by passing a valid PX experiment identifier to the PXDataset() constructor.

library("rpx")
id <- "PXD000001"
px <- PXDataset(id)
px

Data and meta-data

Several attributes can be extracted from a PXDataset object, as described below.

The experiment identifier that was originally used to create the object can be extracted with the pxid() method:

pxid(px)

The file transfer URL from which the data files can be accessed can be queried with the pxurl() method:

pxurl(px)

The species from which the data were generated can be obtained by calling the pxtax() function:

pxtax(px)

Relevant bibliographic references can be queried with the pxref() method:

strwrap(pxref(px))

All files available for the PX experiment can be listed with the pxfiles() method:

pxfiles(px)

The complete or partial data set can be downloaded with the pxget() function. The function takes a PXDataset object as its first, mandatory argument.

The next argument, list, specifies which files to download. If it is missing, a menu is printed and the user can select a file. If it is set to "all", all files of the experiment are downloaded. One or more file names, their indices, or a logical vector can also be used to download specific files.

f <- pxget(px, "F063721.dat-mztab.txt")
f
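
The other selector forms described above can be sketched with base R alone. The snippet below uses a made-up file listing in place of pxfiles(px) (only the mzTab file name comes from this vignette; the other names are hypothetical) and shows how a name, an index and a logical vector describe the same selection. The commented pxget() calls at the end assume that each form is accepted as the list argument, as stated above.

```r
# Hypothetical file listing standing in for pxfiles(px); apart from the
# mzTab file used in this vignette, the names are invented for illustration.
fls <- c("README.txt",
         "F063721.dat-mztab.txt",
         "example_proteins.fasta",
         "example_run.raw")

# Three ways of describing the same selection
by_name    <- "F063721.dat-mztab.txt"
by_index   <- match(by_name, fls)   # position of the file in the listing
by_logical <- fls == by_name        # one TRUE/FALSE per file

# Each selector picks the same file from the listing
stopifnot(identical(fls[by_index], by_name),
          identical(fls[by_logical], by_name))

# With a real PXDataset object, any of these could be passed as `list`:
# f <- pxget(px, by_name)
# f <- pxget(px, by_index)
# f <- pxget(px, by_logical)
# f <- pxget(px, "all")   # every file of the experiment
```

Selecting by name is probably the most robust choice in scripts, since indices and logical vectors depend on the order in which the repository lists the files.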

The rpx package makes use of the BiocFileCache package to avoid repeatedly downloading data. When PXDataset objects are created and project files are downloaded, they are stored in the package's central cache or in a user-defined cache. The next time the project is instantiated with PXDataset() or a project file is downloaded with pxget(), existing artefacts are retrieved from the cache instead of being created or downloaded from the remote server again. See ?rpxCache for details about caching.
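
As a minimal sketch of how the cache can be inspected, the snippet below assumes that rpxCache(), called without arguments, returns the BiocFileCache object that rpx uses by default; ?rpxCache is the authoritative description. The bfccache() and bfcinfo() functions come from BiocFileCache, not from rpx.

```r
library("rpx")
library("BiocFileCache")

# Assumption: rpxCache() returns the package's default cache as a
# BiocFileCache object (see ?rpxCache).
cache <- rpxCache()

# Directory on disk where the cached artefacts are stored
bfccache(cache)

# One row per cached resource (project objects and downloaded files)
bfcinfo(cache)
```

Looking at the cache this way can help confirm that a second call to PXDataset() or pxget() reuses a local copy rather than contacting the server.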

A simple use-case

Below, we download the FASTA file from the PXD000001 dataset and load it with the Biostrings package.

fas <- grep("fasta", pxfiles(px), value = TRUE)
fas
f <- pxget(px, fas) ## file available in the rpx cache
f
library("Biostrings")
readAAStringSet(f)

Questions and help

Either post questions on the Bioconductor support forum or open a GitHub issue.

Session information

sessionInfo()


lgatto/rpx documentation built on Oct. 2, 2023, 9:15 p.m.