check_IDs: Check UniProt IDs
In canprot: Chemical Metrics of Differentially Expressed Proteins


Find the first ID for each protein that matches a known UniProt ID.

check_IDs(dat, IDcol, aa_file = NULL, updates_file = NULL)

`dat`	data frame, protein expression data
`IDcol`	character, name of column that has the UniProt IDs
`aa_file`	character, name of file with additional amino acid compositions
`updates_file`	character, name of file with old to new ID mappings

check_IDs is used to check for known UniProt IDs and to update obsolete IDs. The source IDs should be provided in the IDcol column of dat; multiple IDs for one protein can be separated by a semicolon.

The function keeps the first “known” ID for each protein, which must be present in one of these groups:

The human_aa dataset of amino acid compositions.
Old UniProt IDs that are mapped to new UniProt IDs in uniprot_updates or in updates_file if specified.
IDs of proteins in aa_file, which lists amino acid compositions in the format described for human_aa (see extdata/protein/human_extra.csv for an example and thermo$protein for more details).

dat is returned with possibly changed values in the column designated by IDcol: old IDs are replaced with new ones, only the first known ID for each protein is kept, and proteins with no known ID are assigned NA.
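The selection rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the function first_known_id, the known_ids vector, the updates mapping, and all IDs in it are made up for illustration, and the real function looks IDs up in human_aa, uniprot_updates, and the optional files instead.

```r
# Hypothetical stand-ins for the lookup tables (all IDs are made up)
known_ids <- c("A00001", "A00002")   # IDs with known amino acid compositions
updates <- c(OLD001 = "A00003")      # names are old IDs, values are new IDs

# Illustrative helper, not part of canprot
first_known_id <- function(ids, known_ids, updates) {
  # Old IDs count as known because they can be mapped to new IDs
  known <- c(known_ids, names(updates))
  # Split each entry on semicolons to get the candidate IDs per protein
  parts <- strsplit(ids, ";", fixed = TRUE)
  # Keep the first known candidate, or NA if none is known
  out <- vapply(parts, function(p) {
    hit <- p[p %in% known]
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
  # Replace old IDs with their new counterparts
  is_old <- out %in% names(updates)
  out[is_old] <- unname(updates[out[is_old]])
  out
}

ids <- c("A00001;XXXXXX", "YYYYYY;OLD001;A00002", "ZZZZZZ")
result <- first_known_id(ids, known_ids, updates)
print(result)
```

In this sketch the second entry resolves to the old ID first and is then updated, while the third entry has no known ID and becomes NA, mirroring the behavior described for check_IDs.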

This function is used by the pdat_ functions, where it is called before cleanup.

# Make up some data for this example
ID <- c("P61247;PXXXXX", "PYYYYY;P46777;P60174", "PZZZZZ")
dat <- data.frame(ID = ID, stringsAsFactors = FALSE)
# Get the first known ID for each protein; the third one is NA
check_IDs(dat, "ID")

# Update an old ID
dat <- data.frame(Entry = "P50224", stringsAsFactors = FALSE)
check_IDs(dat, "Entry")