PXDataset2: New PXDataset (v2) to find and download proteomics data
In lgatto/rpx: R Interface to the ProteomeXchange Repository

Description

The rpx package provides the infrastructure to access, store and retrieve information for ProteomeXchange (PX) data sets. This is achieved with PXDataset2 objects, which are created with the PXDataset2() constructor. The constructor takes the unique ProteomeXchange project identifier as input.

The new PXDataset2 class supersedes the previous and now deprecated PXDataset version.

Usage

PXDataset2(id, cache = rpxCache())

PXDataset(id, cache = rpxCache())

## S4 method for signature 'PXDataset2'
pxid(object)

## S4 method for signature 'PXDataset2'
pxurl(object)

## S4 method for signature 'PXDataset2'
pxtax(object)

## S4 method for signature 'PXDataset2'
pxref(object)

pxtitle(object)

pxinstruments(object)

pxSubmissionDate(object)

pxPublicationDate(object)

pxptms(object)

pxprotocols(object, which = c("project", "samples", "data"))

## S4 method for signature 'PXDataset2'
pxfiles(object, n = 10, as.vector = TRUE)

## S4 method for signature 'PXDataset2'
pxCacheInfo(object)

## S4 method for signature 'PXDataset2'
pxget(object, list, cache = rpxCache())

Arguments

`id`	`character(1)` containing a valid ProteomeXchange identifier.
`cache`	Object of class `BiocFileCache`. Default is to use the central `rpx` cache returned by `rpxCache()`, but users can use their own cache. See `rpxCache()` for details.
`object`	An instance of class `PXDataset2`.
`which`	`character()` with one or multiple protocols defined as `"project"`, `"samples"` and `"data"`.
`n`	`integer(1)` indicating the number of files to be printed.
`as.vector`	`logical(1)` defining if the output should be a vector of character with filenames (default) or a data.frame with additional details about each file.
`list`	`character()`, `numeric()` or `logical()` defining the project files to be downloaded. This list of files can be retrieved with `pxfiles()`.

Details

The rpx package uses caching to store ProteomeXchange projects and project files. When creating an object with PXDataset2(), the cache is first queried for the project's identifier. If a unique hit is found, the project is retrieved and returned. If no matching project identifier is found, the remote resource is accessed to create the new PXDataset2 object, which is then cached before being returned to the user. The same mechanism is applied when project files are requested.

Caching is supported by the BiocFileCache package. The PXDataset2() constructor and the pxget() function can be passed an instance of class BiocFileCache that defines the cache. The default is to use the package-wide cache defined in rpxCache(). For more details on how to manage the cache (for example if some files need to be deleted), please refer to the BiocFileCache package vignette and documentation. See also rpxCache() for additional details.
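
The caching mechanism described above could be exercised as in the following minimal sketch. It assumes network access to ProteomeXchange and an installed BiocFileCache package. The variable name my_cache and the choice of a temporary directory are illustrative assumptions, not part of the package.

```r
library("rpx")
library("BiocFileCache")

## Illustrative assumption: a throw-away cache in a temporary directory,
## used instead of the package-wide cache returned by rpxCache().
my_cache <- BiocFileCache(tempfile(), ask = FALSE)

## First call: the identifier is not yet in my_cache, so the project is
## created from the remote resource and then cached.
px <- PXDataset2("PXD000001", cache = my_cache)

## Second call: the identifier is now found in my_cache, so the cached
## project is retrieved and returned.
px_again <- PXDataset2("PXD000001", cache = my_cache)

## Print the resource identifier and cache location of the project.
pxCacheInfo(px)

## List the resources currently stored in the custom cache.
bfcinfo(my_cache)
```

A dedicated cache like this can be convenient for testing, since it keeps the central rpx cache untouched.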

Value

PXDataset2() returns a cached PXDataset2 object. It thus also modifies the cache used for project caching, as defined by the cache argument.

Slots

px_id: character(1) containing the dataset's unique ProteomeXchange identifier, as used to create the object.
px_rid: character(1) storing the cached resource name in the BiocFileCache instance stored in cachepath.
px_title: character(1) with the project's title.
px_url: character(1) with the project's URL.
px_doi: character(1) with the project's DOI.
px_ref: character containing the project's reference(s).
px_ref_doi: character containing the project's reference DOIs.
px_pubmed: character containing the project's reference PubMed identifier.
px_files: data.frame containing information about the project files, including file names, URIs and types. The files are retrieved from the project's README.txt file.
px_tax: character (typically of length 1) containing the taxonomy of the sample.
px_metadata: list containing the project's metadata, as downloaded from the ProteomeXchange site. All slots but px_files are populated from this one.
cachepath: character(1) storing the path to the cache the project object is stored in.

Accessors

pxfiles(object, n = 10, as.vector = TRUE): by default, invisibly returns all the project file names. The function prints the first n files, specifying whether they are local or remote (based on the cache the object is stored in). The printing can be suppressed by wrapping the call in suppressMessages(). If as.vector is set to FALSE, it returns a data.frame with variables ID, NAME, URI, TYPE, MAPPINGS and PXID. Note that the variables and their content will depend on the rpx version that was installed when these objects were created and cached.
pxget(object, list, cache): list is a vector defining the files to be downloaded. If list = "all", all files are downloaded. The file names, as returned by pxfiles(), can also be used. Alternatively, a logical or numeric index can be used. If missing, the file to be downloaded can be selected from a menu.

The argument cache can be passed to define the cache. The default cache is the package's default, as returned by rpxCache().
pxtax(object): returns the taxonomic name of object.
pxurl(object): returns the base URL on the ProteomeXchange server where the project files reside.
pxCacheInfo(object, cache): prints and invisibly returns object's caching information from cache (default is rpxCache()). The return value is a named vector of length two containing the resource identifier and the cache location.
pxtitle(object): returns the project's title.
pxref(object): returns the project's bibliographic reference(s).
pxinstruments(object): returns the instrument(s) used to acquire the data.
pxptms(object): returns the PTMs searched for in the experiment.
pxprotocols(object, which): returns a list with the project description, sample processing and/or data processing protocols.
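
The accessors listed above might be combined as in the following sketch, which assumes network access to ProteomeXchange. The file name is the one used in the Examples section. The regular expression and the variable names are illustrative assumptions.

```r
library("rpx")

px <- PXDataset2("PXD000001")

## All file names as a character vector; suppressMessages() silences
## the printed summary of the first n files.
fls <- suppressMessages(pxfiles(px))

## The same files as a data.frame with additional details.
fls_df <- suppressMessages(pxfiles(px, as.vector = FALSE))

## Select a file to download by name ...
fas <- pxget(px, "erwinia_carotovora.fasta")

## ... or with a logical index built from the file names
## (illustrative pattern matching files ending in ".fasta").
is_fasta <- grepl("\\.fasta$", fls)
fas2 <- pxget(px, is_fasta)

## Project-level metadata.
pxtitle(px)
pxinstruments(px)
pxprotocols(px, which = "samples")
```

Because downloaded files are cached, repeating a pxget() call for the same file should retrieve it from the cache rather than from the remote server.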

Author(s)

Laurent Gatto

References

Vizcaino J.A. et al. 'ProteomeXchange: globally co-ordinated proteomics data submission and dissemination', Nature Biotechnology 2014, 32, 223-226, doi:10.1038/nbt.2839.

Source repository for the ProteomeXchange project: https://code.google.com/p/proteomexchange/

Examples


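## Create (or retrieve from the cache) the project object
px <- PXDataset("PXD000001")
px

## Project metadata accessors
pxtax(px)
pxurl(px)
pxref(px)

## Project files, as a character vector or as a data.frame
pxfiles(px)
pxfiles(px, as.vector = FALSE)

## Caching information for the project
pxCacheInfo(px)

## Download (or retrieve from the cache) the fasta file
fas <- pxget(px, "erwinia_carotovora.fasta")
fas

## Read the protein sequences
library("Biostrings")
readAAStringSet(fas)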

lgatto/rpx documentation built on Feb. 11, 2025, 3:31 a.m.