Description Objects from the Class Details Developement notes Slots Extends Methods Functions Author(s) References See Also Examples
The Proteins class encapsulates data and meta-data for
proteomics experiments. The class stores the protein sequences as well
as specific subsets of interest, typically peptides, as ranges. The
Proteins instances, the sequence and peptide slots are
described by their respective metadata attributes.
Objects can be created using its constructor Proteins. The
constructor either takes a fasta file name as first argument,
an EnsDb object or a named uniprotIds
argument with valid UniProt accession numbers (not yet implemented).
The Proteins constructor with the EnsDb loads protein data
directly from the EnsDb object. The additional arguments
filter, loadProteinDomains, columns and
fetchLRG allow to additionally specify if only proteins
matching a certain criteria should be fetched, whether all protein
domains should be added as pranges, optional additional
annotation columns that should be retrieved and whether proteins from
Locus Reference Genes (LRG) should also be retrieved from the database.
An instance of class Proteins is characterised by one or
multiple protein sequences that can be accessed as AAStringSet
with the aa accessor. Sequence-specific annotation, such as
accession numbers, protein and gene names, ... is available with the
acols method. General metadata such as the
data of creation of the instance are stored as a list returned
by the metadata accessor, which would typically contain a
created character that documents when the object was created, a
reference genome descriptor, a UniProtRelease with the
release data of the UniProt database and possibly others.
Each sequence of a Proteins instance can also be characterised
by a set of specific ranges describing peptides of interest. These
peptide features can be extracted as an AAStringSetList,
where each protein sequence contains 0 or more peptide features. These
peptides features are encode as ranges along the original proteins
sequences (a list of IRanges) that can be extracted with
the pranges function. These peptide features have their own
metadata describing for example peptide identification scores, number
of missed cleavages, ... available with the pcols methods.
See also the Pbase-data vignette.
The Proteins constructor with argument file being an
EnsDb object allows to retrieve protein
sequences along with all their related protein domains from an
EnsDb annotation database. The optional filter argument
can be used to fetch only proteins matching the defined filtering criteria
from the database. The filter argument takes an object
extending the AnnotationFilter
class, an AnnotationFilterList
combining such objects or a filter expression in form of a
formula. See the
AnnotationFilter and
proteins documentation for more details.
Additional annotation columns from the database that should be
retrieved from the database and included into acols can be
specified with the columns argument. The
listColumns can be used to list all available
annotation columns from the database.
Ensembl protein IDs will be used as the names of the returned
Proteins object. See the vignette from the ensembldb
package for an overview of supported filters or below for some
examples.
Since version 0.2.0, addIdentificationData supports multiple
identification file names to be added to a Proteins instance
(argument renamed filenames) using either mzID or
mzR. Added new Pparams parametrisation infrastructure.
See news(package = "Pbase") for a description of all changes.
Other possible metadata fields: Uniprot.sw, biomaRt
instances.
metadata: Object of class "list" containing
global metadata, accessed with metadata.
aa: Object of class "AAStringSet" storing the
protein sequences, accessed with aa.
.__classVersion__: Object of class "Versions"
documenting the class verions. Intended for developer use and
debugging.
Class "Versioned", directly.
signature(x = "Proteins"): Returns an
AAStringSet instance representing the
sequences of the proteins.
signature(x = "Proteins"): ...
signature(x = "Proteins"): ...
signature(x = "Proteins"): Returns a
list of global metadata of the instance x, including
data of instance creation or, if created from a set of UnitProt
identifiers (see constructors above), the UniProt version and
UnitProt.WS version number.
signature(x = "Proteins"): Returns a
DataFrame of protein metadata.
signature(x = "Proteins"): Returns a
list of feature metadata.
signature(x = "Proteins"): Returns the
names of the sequences metadata.
signature(x = "Proteins"): Returns the
names of the peptide feature metadata.
signature(x = "Proteins"): Returns the
protein sequence names defined as UniProt accession numbers.
signature(x = "Proteins"): Returns the
protein sequence names defined as UniProt accession numbers.
It is just a synonym for seqnames.
signature(x = "Proteins"): Returns the number
of proteins.
signature(x = "Proteins", i = "ANY", j = "missing"):
Creates a subset of the Proteins instance.
signature(x = "Proteins", i = "ANY", j = "missing"):
Returns an AAString instance representing the
sequence of the selected protein.
signature(x = "Proteins", mass = "numeric", len
= "numeric", ...): ...
signature(x = "Proteins", enzym = "character",
missedCleavages = "numeric"):
Cleaves all proteins using the enzym rule while allowing
missedCleavages missing cleavages. Please see
cleave for details.
signature(object = "Proteins",
id = "character", rmEmptyRanges = "logical", par =
"Pparams"): Adds identification data from an IdentMzMl file (id)
to the Proteins object. If rmEmptyRanges is TRUE
proteins without any identification data are removed. See
Pparams for further settings.
signature(object = "Proteins",
filenames = "character", rmEmptyRanges = "logical", par =
"Pparams"): Adds identification data from a fasta file (filenames)
to the Proteins object. Please note that both fasta files (the origin
of the Proteins object and the ones given in filenames) must
share the same Uniprot accession numbers.
If rmEmptyRanges is TRUE
proteins without any identification data are removed. See
Pparams for further settings.
signature(x = "Proteins", y = "missing"): Plots
all proteins and associated peptides using the
Gviz/Pviz infrastructure.
signature(object = "Proteins"): Displays object
summary as text.
signature(x = "Proteins")
: removes proteins with empty peptide ranges.
signature(x = "Proteins"): returns a
modified Proteins object. pcols(x) gains a "Proteotypic"
logical column, indicating of the peptide is proteotypic or now.
signature(pattern = "Proteins"):
calulates the coverage of proteins. pcols(x) gains a
"Coverage" numeric column.
signature(x = "Proteins", missedCleavages =
"numeric"): Tests whether a Protein object was cleaved
already.
Laurent Gatto <lg390@cam.ac.uk>, Sebastian Gibb <mail@sebastiangibb.de> and Johannes Rainer <johannes.rainer@eurac.edu>
Definition of the UniProt fasta comment format: http://www.uniprot.org/help/fasta-headers
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | ## Create a Protein object reading all proteins from a fasta file.
fastaFiles <- list.files(system.file("extdata", package = "Pbase"),
pattern = "fasta", full.names = TRUE)
p <- Proteins(fastaFiles)
p
metadata(p)
## Adding custom metadata
metadata(p, "Comment") <- "I love R"
metadata(p)
## Plotting
plot(p[1:5], from = 1, to = 30)
## Cleaving
pp <- cleave(p[1:100])
pp <- proteotypic(pp)
pp
pcols(pp[1:2])
plot(pp[1:2], from = 20, to = 30)
## Protein coverage
pp <- proteinCoverage(pp)
avarLabels(pp)
acols(pp)$Coverage
pp
## Add indentification data
idfile <- system.file("extdata/Thermo_Hela_PRTC_selected.mzid",
package = "Pbase")
p <- addIdentificationData(p, idfile)
pranges(p)
pfeatures(p)
plot(p[1])
plot(p[1], # the first protein has 36 peptides
fill = c(rep("orange", 13), rep("steelblue", 13)))
## Retrieve a Proteins object from an EnsDb object: first load the annotation
## database for all human genes defined in Ensembl version 86.
library(ensembldb)
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86
## Define a filter to retrieve all genes from chromosome Y
sqnf <- SeqNameFilter("Y")
## Retrieve the proteins without protein domains but specify to retrieve in
## addition the transcript biotype for the encoding transcripts and the gene
## names
prts <- Proteins(edb, filter = sqnf, loadProteinDomains = FALSE,
columns = c("tx_biotype", "gene_name"))
aa(prts)
acols(prts)
## The listColumns method lists all available columns from the database.
listColumns(edb)
## Load all proteins from the gene ZBTB16 including all protein domains from
## the database. Here we pass the filter criteria as a formula to the method
prts <- Proteins(edb, filter = ~ gene_name == "ZBTB16")
## List available pranges
pcols(prts)
## Access the protein domains
pcols(prts)$ProteinDomains
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.