View source: R/extract_metal_binders.R
extract_metal_binders (R Documentation)
Information on metal-binding proteins is extracted from UniProt data retrieved with fetch_uniprot() and from QuickGO data retrieved with fetch_quickgo().
extract_metal_binders(
data_uniprot,
data_quickgo,
data_chebi = NULL,
data_chebi_relation = NULL,
data_eco = NULL,
data_eco_relation = NULL,
show_progress = TRUE
)
data_uniprot
: a data frame containing at least the ft_binding, cc_cofactor and cc_catalytic_activity columns, as retrieved with fetch_uniprot().
data_quickgo
: a data frame containing molecular function gene ontology information for at least the proteins of interest. This data should be obtained by calling fetch_quickgo() with ontology_annotations = "molecular_function".
data_chebi
: optional, a data frame that can be manually obtained with fetch_chebi(stars = c(2, 3)).
data_chebi_relation
: optional, a data frame that can be manually obtained with fetch_chebi(relation = TRUE).
data_eco
: optional, a data frame that contains evidence and conclusion ontology (ECO) data that can be obtained by calling fetch_eco().
data_eco_relation
: optional, a data frame that contains relational evidence and conclusion ontology data that can be obtained by calling fetch_eco(return_relation = TRUE).
show_progress
: a logical value that specifies whether progress will be shown (default is TRUE).
A data frame containing information on protein metal binding state. It contains the following columns:
accession
: UniProt protein identifier.
most_specific_id
: ChEBI ID that is most specific for the position after combining information from all sources.
Can be multiple IDs separated by "," if a position appears multiple times due to multiple fitting IDs.
most_specific_id_name
: The name of the ID in the most_specific_id column. This information is based on ChEBI.
ligand_identifier
: A ligand identifier that is unique per ligand per protein. It consists of the ligand ID and
ligand name. The ligand ID counts the number of ligands of the same type per protein.
ligand_position
: The amino acid position of the residue interacting with the ligand.
binding_mode
: Contains information about the way the amino acid residue interacts with the ligand. If it is
"covalent", the residue is not in direct contact with the metal but only with the cofactor that binds the metal.
metal_function
: Contains information about the function of the metal, e.g. "catalytic".
metal_id_part
: Contains a ChEBI ID that identifies the metal part of the ligand. This is always the metal atom.
metal_id_part_name
: The name of the ID in the metal_id_part column. This information is based on ChEBI.
note
: Contains notes associated with information based on cofactors.
chebi_id
: Contains the original ChEBI IDs the information is based on.
source
: Contains the sources of the information. This can consist of "binding", "cofactor", "catalytic_activity"
and "go_term".
eco
: If the annotation is based on evidence, it is annotated with an ECO ID. The information is split by source.
eco_type
: The ECO identifier falls into either the "manual_assertion" group for manually curated annotations or the
"automatic_assertion" group for automatically generated annotations. If there is no evidence, the annotation is labelled
"automatic_assertion". The information is split by source.
evidence_source
: The original sources (e.g. literature, PDB) of evidence annotations, split by source.
reaction
: Contains information about the chemical reaction catalysed by the protein that involves the metal.
Can contain the EC ID, Rhea ID, direction-specific Rhea ID, direction of the reaction and evidence for the direction.
go_term
: Contains gene ontology terms if there are any metal related ones associated with the annotation.
go_name
: Contains gene ontology names if there are any metal related ones associated with the annotation.
assigned_by
: Contains information about the source of the gene ontology term assignment.
database
: Contains information about the source of the ChEBI annotation associated with gene ontology terms.
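As a quick structural check, the columns documented above can be compared against a data frame returned by extract_metal_binders(). The sketch below is a minimal base R example; the helper missing_metal_columns() and the toy data frame (with made-up values) are hypothetical and not part of protti.

```r
# Hypothetical helper (not part of protti): lists which documented output
# columns of extract_metal_binders() are absent from a data frame.
documented_columns <- c(
  "accession", "most_specific_id", "most_specific_id_name",
  "ligand_identifier", "ligand_position", "binding_mode",
  "metal_function", "metal_id_part", "metal_id_part_name", "note",
  "chebi_id", "source", "eco", "eco_type", "evidence_source",
  "reaction", "go_term", "go_name", "assigned_by", "database"
)

missing_metal_columns <- function(df) {
  # setdiff() keeps the documented order of the missing names
  setdiff(documented_columns, colnames(df))
}

# Toy data frame with made-up values and only two of the columns
toy <- data.frame(
  accession = "P00393",
  most_specific_id = "CHEBI:29105",
  stringsAsFactors = FALSE
)
missing_metal_columns(toy)
```

On a real result, an empty character vector would indicate that every documented column is present.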
For each protein identifier, the data frame contains information on the bound ligand as well as on its position, if known. Since information about metal ligands can come from multiple sources, additional information (e.g. evidence) is nested in the returned data frame. To unnest it, take the following steps.
First, note that the most_specific_id column can contain multiple IDs. This means that one position cannot be uniquely attributed to one specific ligand, even with the same ligand_identifier. In the most_specific_id column these instances are separated by ","; in all other columns the corresponding information is separated by "||".
Next, split the information based on its source (this does not refer to the source column, which can be removed from the data frame). Certain columns are associated with specific sources (e.g. go_term is associated with the "go_term" source). Values of columns that are not relevant for a given source should be replaced with NA.
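The first splitting step (multiple IDs in most_specific_id) can be sketched in base R as follows. This is a minimal sketch, not protti's own implementation: the helper split_most_specific() is hypothetical, and it assumes that the "||"-separated parts in each other column appear in the same order as the ","-separated IDs in most_specific_id. The toy values are made up for illustration.

```r
# Hypothetical helper (not part of protti): expands each row into one row
# per comma-separated most_specific_id. The columns in other_cols are assumed
# to hold the same number of "||"-separated parts, in the same order.
split_most_specific <- function(df, other_cols) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    ids <- strsplit(df$most_specific_id[i], ",", fixed = TRUE)[[1]]
    # Repeat the current row once per ID
    out <- df[rep(i, length(ids)), , drop = FALSE]
    out$most_specific_id <- ids
    for (col in other_cols) {
      # fixed = TRUE treats "||" literally instead of as a regex
      parts <- strsplit(df[[col]][i], "||", fixed = TRUE)[[1]]
      # A single undivided value is recycled; otherwise lengths must match
      if (length(parts) == 1) parts <- rep(parts, length(ids))
      stopifnot(length(parts) == length(ids))
      out[[col]] <- parts
    }
    out
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Toy input with made-up values
toy <- data.frame(
  accession = c("P00393", "P06129"),
  most_specific_id = c("CHEBI:29105,CHEBI:29033", "CHEBI:29035"),
  chebi_id = c("CHEBI:29105||CHEBI:29033", "CHEBI:29035"),
  stringsAsFactors = FALSE
)
split_most_specific(toy, other_cols = "chebi_id")
```

With tidyverse tooling, tidyr::separate_rows() is a common alternative for this kind of parallel row expansion.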
Since a most_specific_id can have multiple chebi_ids associated with it, the chebi_id column and its associated columns, in which information is separated by "|", need to be unnested next. Afterwards, evidence and additional information can be unnested by splitting first on ";;" and then on ";".
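The "|" step and the two-level ";;"/";" step can be sketched in the same way. This is again a hypothetical base R sketch with made-up toy values: split_by_pipe() and split_nested() are illustrative names, the separator order follows the text above, and the sketch assumes that the "|"-separated parts of the associated columns line up with those of chebi_id.

```r
# Hypothetical helper (not part of protti): splits chebi_id and its associated
# columns in parallel on "|". The first column in cols (chebi_id) sets the
# number of output rows per input row.
split_by_pipe <- function(df, cols) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    parts <- lapply(cols, function(col) strsplit(df[[col]][i], "|", fixed = TRUE)[[1]])
    n <- length(parts[[1]])
    out <- df[rep(i, n), , drop = FALSE]
    for (j in seq_along(cols)) {
      p <- parts[[j]]
      # Recycle a single undivided value; otherwise require aligned parts
      if (length(p) == 1) p <- rep(p, n)
      stopifnot(length(p) == n)
      out[[cols[j]]] <- p
    }
    out
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Hypothetical helper: splits one value on ";;" first, then each piece on ";",
# returning a list of character vectors.
split_nested <- function(x) {
  outer <- strsplit(x, ";;", fixed = TRUE)[[1]]
  lapply(outer, function(piece) strsplit(piece, ";", fixed = TRUE)[[1]])
}

# Toy input with made-up values
toy <- data.frame(
  accession = "P00393",
  chebi_id = "CHEBI:29105|CHEBI:29033",
  eco = "ECO:0000269;ECO:0007744|ECO:0000255",
  stringsAsFactors = FALSE
)
by_chebi <- split_by_pipe(toy, cols = c("chebi_id", "eco"))
split_nested("ECO:0000269;ECO:0007744;;ECO:0000255")
```

If only a flat vector of values is needed after the nested split, the returned list can be passed to unlist().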
# Create example data
uniprot_ids <- c("P00393", "P06129", "A0A0C5Q309", "A0A0C9VD04")
## UniProt data
data_uniprot <- fetch_uniprot(
uniprot_ids = uniprot_ids,
columns = c(
"ft_binding",
"cc_cofactor",
"cc_catalytic_activity"
)
)
## QuickGO data
data_quickgo <- fetch_quickgo(
id_annotations = uniprot_ids,
ontology_annotations = "molecular_function"
)
## ChEBI data (2 and 3 star entries)
data_chebi <- fetch_chebi(stars = c(2, 3))
data_chebi_relation <- fetch_chebi(relation = TRUE)
## ECO data
eco <- fetch_eco()
eco_relation <- fetch_eco(return_relation = TRUE)
# Extract metal binding information
metal_info <- extract_metal_binders(
data_uniprot = data_uniprot,
data_quickgo = data_quickgo,
data_chebi = data_chebi,
data_chebi_relation = data_chebi_relation,
data_eco = eco,
data_eco_relation = eco_relation
)
metal_info