Find the first ID for each protein that matches a known UniProt ID.
dat | data frame, protein expression data
IDcol | character, name of column that has the UniProt IDs
aa_file | character, name of file with additional amino acid compositions
updates_file | character, name of file with old to new ID mappings
check_IDs is used to check for known UniProt IDs and to update obsolete IDs. The source IDs should be provided in the IDcol column of dat; multiple IDs for one protein can be separated by a semicolon. The function keeps the first “known” ID for each protein, which must be present in one of these groups:
The human_aa dataset of amino acid compositions.
Old UniProt IDs that are mapped to new UniProt IDs in uniprot_updates or in updates_file if specified.
IDs of proteins in aa_file, which lists amino acid compositions in the format described for human_aa (see extdata/protein/human_extra.csv for an example and thermo$protein for more details).
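The selection rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: first_known_id, known, and updates are hypothetical names, and the IDs OLD001 and NEW001 are made up for the example.

```r
# Hypothetical helper illustrating the "first known ID" rule; not the package's code.
# ids:     character vector, each element holding one or more IDs separated by ";"
# known:   character vector of IDs considered known (stand-in for human_aa and aa_file)
# updates: named character vector mapping old IDs (names) to new IDs (values)
first_known_id <- function(ids, known, updates = character(0)) {
  vapply(strsplit(ids, ";", fixed = TRUE), function(x) {
    # Replace obsolete IDs that appear in the old-to-new mapping
    is_old <- x %in% names(updates)
    x[is_old] <- unname(updates[x[is_old]])
    # Treat updated IDs as known, then keep the first known ID (NA if none)
    hit <- x[x %in% c(known, unname(updates))]
    if (length(hit) > 0) hit[[1]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

# Made-up inputs: OLD001 and NEW001 are placeholders, not real UniProt IDs
known <- c("P61247", "P46777", "P60174")
updates <- c(OLD001 = "NEW001")
ids <- c("P61247;PXXXXX", "PYYYYY;P46777;P60174", "PZZZZZ", "OLD001")
result <- first_known_id(ids, known, updates)
```

In this sketch an entry with no recognized ID yields NA, and an obsolete ID is returned in its updated form, which mirrors the behavior described for the IDcol column.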
dat is returned with possibly changed values in the column designated by IDcol; old IDs are replaced with new ones, the first known ID for each protein is kept, then proteins with no known IDs are assigned NA.
This function is used by the pdat_ functions, where it is called before cleanup.
# Make up some data for this example
ID <- c("P61247;PXXXXX", "PYYYYY;P46777;P60174", "PZZZZZ")
dat <- data.frame(ID = ID, stringsAsFactors = FALSE)
# Get the first known ID for each protein; the third one is NA
check_IDs(dat, "ID")
# Update an old ID
dat <- data.frame(Entry = "P50224", stringsAsFactors = FALSE)
check_IDs(dat, "Entry")