PXDataset2: New PXDataset (v2) to find and download proteomics data

View source: R/px2.R

PXDataset2    R Documentation

Description

The rpx package provides the infrastructure to access, store and retrieve information for ProteomeXchange (PX) data sets. This is achieved with PXDataset2 objects, which are created with the PXDataset2() constructor; it takes the unique ProteomeXchange project identifier as input.

The new PXDataset2 class supersedes the previous, now deprecated, PXDataset version.

Usage

PXDataset2(id, cache = rpxCache())

PXDataset(id, cache = rpxCache())

## S4 method for signature 'PXDataset2'
pxid(object)

## S4 method for signature 'PXDataset2'
pxurl(object)

## S4 method for signature 'PXDataset2'
pxtax(object)

## S4 method for signature 'PXDataset2'
pxref(object)

pxtitle(object)

pxinstruments(object)

pxSubmissionDate(object)

pxPublicationDate(object)

pxptms(object)

pxprotocols(object, which = c("project", "samples", "data"))

## S4 method for signature 'PXDataset2'
pxfiles(object, n = 10, as.vector = TRUE)

## S4 method for signature 'PXDataset2'
pxCacheInfo(object)

## S4 method for signature 'PXDataset2'
pxget(object, list, cache = rpxCache())

Arguments

id

character(1) containing a valid ProteomeXchange identifier.

cache

Object of class BiocFileCache. Default is to use the central rpx cache returned by rpxCache(), but users can use their own cache. See rpxCache() for details.

object

An instance of class PXDataset2.

which

character() naming one or more protocols to return: "project", "samples" and/or "data".

n

integer(1) indicating the number of files to be printed.

as.vector

logical(1) defining if the output should be a vector of character with filenames (default) or a data.frame with additional details about each file.

list

character(), numeric() or logical() defining the project files to be downloaded. The list of available files can be retrieved with pxfiles().

Details

The rpx package uses caching to store ProteomeXchange projects and project files. When creating an object with PXDataset2(), the cache is first queried for the project's identifier. If a unique hit is found, the project is retrieved and returned. If no matching project identifier is found, the remote resource is accessed to create the new PXDataset2 object, which is then cached before being returned to the user. The same mechanism is applied when project files are requested.

Caching is supported by the BiocFileCache package. The PXDataset2() constructor and the pxget() function can be passed an instance of class BiocFileCache that defines the cache. The default is to use the package-wide cache defined in rpxCache(). For more details on how to manage the cache (for example if some files need to be deleted), please refer to the BiocFileCache package vignette and documentation. See also rpxCache() for additional details.
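The caching behaviour described above can be sketched as follows. This is a minimal illustration, assuming the rpx and BiocFileCache packages are installed and the ProteomeXchange servers are reachable; the temporary cache directory and the names my_cache and px_again are choices made for this example only.

```r
library("rpx")
library("BiocFileCache")

## A throw-away cache in a temporary directory (illustrative choice);
## ask = FALSE avoids the interactive prompt when the cache is created.
my_cache <- BiocFileCache(tempfile(), ask = FALSE)

## First call: the identifier is not yet in my_cache, so the project is
## built from the remote resource and then stored in my_cache.
px <- PXDataset2("PXD000001", cache = my_cache)

## Second call: the identifier is now found in my_cache and the
## stored project is returned.
px_again <- PXDataset2("PXD000001", cache = my_cache)

## Report where the project object is cached.
pxCacheInfo(px)
```

Passing the same my_cache object to pxget() should keep the downloaded project files in the same cache as the project object.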

Value

PXDataset2() returns a cached PXDataset2 object. It thus also modifies the cache used for project caching, as defined by the cache argument.

Slots

px_id

character(1) containing the dataset's unique ProteomeXchange identifier, as used to create the object.

px_rid

character(1) storing the cached resource name in the BiocFileCache instance stored in cachepath.

px_title

character(1) with the project's title.

px_url

character(1) with the project's URL.

px_doi

character(1) with the project's DOI.

px_ref

character containing the project's reference(s).

px_ref_doi

character containing the project's reference DOIs.

px_pubmed

character containing the project's reference PubMed identifier.

px_files

data.frame containing information about the project files, including file names, URIs and types. The files are retrieved from the project's README.txt file.

px_tax

character (typically of length 1) containing the taxonomy of the sample.

px_metadata

list containing the project's metadata, as downloaded from the ProteomeXchange site. All slots but px_files are populated from this one.

cachepath

character(1) storing the path to the cache the project object is stored in.

Accessors

  • pxfiles(object, n = 10, as.vector = TRUE): by default, invisibly returns all the project file names. The function prints the first n files, specifying whether they are local or remote (based on the cache the object is stored in). The printing can be silenced by wrapping the call in suppressMessages(). If as.vector is set to FALSE, it returns a data.frame with variables ID, NAME, URI, TYPE, MAPPINGS and PXID. Note that the variables and their content will depend on the rpx version that was installed when these objects were created and cached.

  • pxget(object, list, cache): list is a vector defining the files to be downloaded. If list = "all", all files are downloaded. The file names, as returned by pxfiles() can also be used. Alternatively, a logical or numeric index can be used. If missing, the file to be downloaded can be selected from a menu.

    The argument cache can be passed to define the cache to use. The default cache is the package's default, as returned by rpxCache().

  • pxtax(object): returns the taxonomic name of object.

  • pxurl(object): returns the base URL on the ProteomeXchange server where the project files reside.

  • pxCacheInfo(object, cache): prints and invisibly returns object's caching information from cache (default is rpxCache()). The return value is a named vector of length two containing the resource identifier and the cache location.

  • pxtitle(object): returns the project's title.

  • pxref(object): returns the project's bibliographic reference(s).

  • pxinstruments(object): returns the instrument(s) used to acquire the data.

  • pxptms(object): returns the PTMs searched for in the experiment.

  • pxprotocols(object, which): returns a list with the project description, sample processing and/or data processing protocols.
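The accessors listed above can be combined as in the following sketch. It assumes rpx is installed and the ProteomeXchange servers are reachable (or the project is already cached); the variable names ftab, fnames and is_fasta are illustrative only, and selecting files by a ".fasta" suffix is just one possible way to build a logical index.

```r
library("rpx")

## Create the project object, or retrieve it from the default cache.
px <- PXDataset2("PXD000001")

## Project-level metadata.
pxtitle(px)
pxinstruments(px)
pxprotocols(px, which = "samples")

## File information as a data.frame and as a character vector;
## suppressMessages() silences the printed file listing.
ftab <- suppressMessages(pxfiles(px, as.vector = FALSE))
fnames <- suppressMessages(pxfiles(px))

## Select files with a logical index: here, only the FASTA file(s).
is_fasta <- grepl("\\.fasta$", fnames)
fas <- pxget(px, is_fasta)
fas
```

A numeric index such as pxget(px, 1) or a file name taken from fnames could be used in the same way.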

Author(s)

Laurent Gatto

References

Vizcaino J.A. et al. 'ProteomeXchange: globally co-ordinated proteomics data submission and dissemination', Nature Biotechnology 2014, 32, 223 – 226, doi:10.1038/nbt.2839.

Source repository for the ProteomeXchange project: https://code.google.com/p/proteomexchange/

Examples


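## Create the project object, or retrieve it from the cache
px <- PXDataset("PXD000001")
px

## Project metadata
pxtax(px)
pxurl(px)
pxref(px)

## Project files, as a character vector or as a data.frame
pxfiles(px)
pxfiles(px, as.vector = FALSE)

pxCacheInfo(px)

## Download the fasta file (or retrieve it from the cache) and read it
fas <- pxget(px, "erwinia_carotovora.fasta")
fas
library("Biostrings")
readAAStringSet(fas)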

lgatto/rpx documentation built on Oct. 2, 2023, 9:15 p.m.