getUniprotIDs: Retrieve UniProt IDs via ID and Cluster Mappings

View source: R/drugTargetAnnotations_Fct.R

Description

For a set of query IDs (e.g. Ensembl gene IDs), this function returns the corresponding UniProt IDs based on two independent approaches: ID mappings (IDMs) and sequence similarity nearest neighbors (SSNNs) using UNIREF clusters. Note that the query IDs passed via 'keys' (e.g. Ensembl gene IDs) are reliably retained in the SSNN results only when 'chunksize=1', because batch queries for protein clusters with 'UniProt.ws' often drop the query IDs. For this reason, the result contains an extra 'QueryID' column when 'chunksize=1', but not for any other value.
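Why one key per query preserves the query IDs can be illustrated with a small offline sketch. It assumes a hypothetical helper, lookup_cluster(), that stands in for a single UniRef cluster lookup (it is not part of drugTargetInteractions or UniProt.ws and only returns toy rows). Each key is queried on its own, and every returned row is tagged with the key that produced it, which is the kind of 'QueryID' bookkeeping described above; it is not the package's actual implementation.

```r
## Hypothetical stand-in for one UniRef cluster lookup. NOT a function of
## drugTargetInteractions or UniProt.ws; it returns two toy rows per key
## so that the sketch runs without network access.
lookup_cluster <- function(key) {
  data.frame(ClusterMember = paste0(key, "_member", seq_len(2)),
             stringsAsFactors = FALSE)
}

## Query the keys one at a time and record which key produced each row.
## In a batch query the rows of several keys come back together, so this
## per-row provenance could not be attached afterwards.
query_one_by_one <- function(keys) {
  per_key <- lapply(keys, function(k) {
    res <- lookup_cluster(k)
    if (nrow(res) == 0) return(NULL)  # skip keys without hits
    data.frame(QueryID = k, res, stringsAsFactors = FALSE)
  })
  do.call(rbind, per_key)  # NULL entries are dropped by rbind
}

res <- query_one_by_one(c("ENSG00000145700", "ENSG00000135441"))
```

The cost of this design is one request per key, which is why batching (a larger 'chunksize') is faster but loses the key-to-result link.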

The getParalogs function is similar, but it uses paralogs from biomaRt instead of UNIREF clusters.

Usage

getUniprotIDs(taxId = 9606, kt = "ENSEMBL", keys, seq_cluster = "UNIREF90", chunksize=20)

Arguments

taxId

NCBI taxonomy ID of the organism (default 9606, i.e. human).

kt

Type of the query IDs; should be either "ENSEMBL" or "UNIPROTKB".

keys

Character vector of query IDs of the type given by kt.

seq_cluster

UniRef cluster level to use; should be one of 'UNIREF100', 'UNIREF90' or 'UNIREF50'.

chunksize

Queries are sent in batches; this parameter sets the number of query IDs per batch (default 20). Set it to 1 to retain the query IDs in the SSNN results (see Description).
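A minimal sketch of how a vector of query IDs can be split into batches of at most 'chunksize' elements, using base R only. make_batches() is a hypothetical helper for illustration, not a function exported by drugTargetInteractions, and the package may organize its batches differently.

```r
## Split query IDs into consecutive batches of at most `chunksize` elements.
## make_batches() is a hypothetical helper, not part of drugTargetInteractions.
make_batches <- function(keys, chunksize = 20) {
  stopifnot(chunksize >= 1)
  ## ceiling(i / chunksize) assigns IDs 1..chunksize to batch 1, and so on
  split(keys, ceiling(seq_along(keys) / chunksize))
}

keys <- sprintf("ENSG%011d", 1:45)  # 45 toy Ensembl-style IDs
batches <- make_batches(keys, chunksize = 20)
lengths(batches)  # batch sizes
```

With the default of 20, the 45 toy IDs form batches of 20, 20 and 5; with chunksize = 1 every batch holds a single ID, which is the setting under which the query IDs are kept.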

Value

Returns a list of data frames containing the query results.

Author(s)

Thomas Girke

See Also

getParalogs, UniProt.ws

Examples

library(drugTargetInteractions)
## Queries the UniProt web service, so an internet connection is required
## Example Ensembl gene IDs (some IDs occur more than once)
keys <- c("ENSG00000145700", "ENSG00000135441", "ENSG00000120071", "ENSG00000120088", "ENSG00000185829", "ENSG00000185829", "ENSG00000185829", "ENSG00000238083", "ENSG00000012061", "ENSG00000104856", "ENSG00000104936", "ENSG00000117877", "ENSG00000130202", "ENSG00000130202", "ENSG00000142252", "ENSG00000189114", "ENSG00000234906")
res_list100 <- getUniprotIDs(taxId=9606, kt="ENSEMBL", keys=keys, seq_cluster="UNIREF100")


girke-lab/drugTargetInteractions documentation built on Oct. 10, 2022, 10:35 p.m.