Downstream analyses typically work with gene symbols, but one gene symbol may map to multiple UniProt or RefSeq IDs corresponding to different isoforms. This utility retains only the longest isoform per gene.
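The selection rule described above can be sketched in base R. This is only an illustration of the idea, not the package's actual implementation: the function name `keep_longest_sketch`, the toy table, and the tie-breaking rule (first isoform in sort order wins) are all assumptions.

```r
# Illustrative sketch only; NOT the vp.misc implementation.
keep_longest_sketch <- function(ids, gene_id_col, isoform_id_col, isoform_len_col) {
  # Unique gene / isoform / length combinations
  iso <- unique(ids[, c(gene_id_col, isoform_id_col, isoform_len_col)])
  # Sort so that the longest isoform of each gene comes first
  iso <- iso[order(iso[[gene_id_col]], -iso[[isoform_len_col]]), ]
  # Keep the first (longest) isoform per gene; ties go to the first in sort order (assumption)
  longest <- iso[!duplicated(iso[[gene_id_col]]), ]
  # Retain only the rows of 'ids' that belong to the selected isoforms
  keep <- paste(ids[[gene_id_col]], ids[[isoform_id_col]]) %in%
    paste(longest[[gene_id_col]], longest[[isoform_id_col]])
  ids[keep, , drop = FALSE]
}

# Made-up toy data: two genes, two isoforms each, one isoform seen in two rows
toy <- data.frame(
  Gene    = c("GeneA", "GeneA", "GeneB", "GeneB", "GeneB"),
  Isoform = c("A-1", "A-2", "B-1", "B-2", "B-2"),
  Length  = c(300, 250, 100, 400, 400),
  Peptide = c("pep1", "pep2", "pep3", "pep4", "pep5"),
  stringsAsFactors = FALSE
)
toy_longest <- keep_longest_sketch(toy, "Gene", "Isoform", "Length")
```

Note that rows are filtered rather than collapsed: every row belonging to the selected isoform is kept, so a gene can still contribute several rows (for example, several peptides) after filtering.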
keep_longest_isoform_per_gene(
  ids,
  gene_id_col,
  isoform_id_col,
  isoform_len_col
)
ids | data.frame object. Must contain the 3 columns described below.
gene_id_col | character. Name of the column with gene IDs in the 'ids' object.
isoform_id_col | character. Name of the column with protein isoform IDs in the 'ids' object.
isoform_len_col | character. Name of the column with protein isoform lengths in the 'ids' object.
library(vp.misc)
library(Biostrings)
library(dplyr)

# FASTA
fasta_file_name <- system.file("extdata/FASTAs",
                               "rattus_norvegics_uniprot_2018_09.fasta.gz",
                               package = "vp.misc")
fst <- readAAStringSet(fasta_file_name, format = "fasta",
                       nrec = -1L, skip = 0L, use.names = TRUE)

# extracting UniProt Accessions
names(fst) <- sub("^.*\\|(.*)\\|.*$", "\\1", names(fst))

# loads the 'ids' object used below
data(phospho_identifications_rat)
ids_with_sites <- map_PTM_sites(ids, fst, "UniProtAccFull", "Peptide", "*")

# Adding gene annotation. Note, this is rat data searched against UniProt.
# 10116 is rat taxonomy ID
URL <- "http://www.uniprot.org/uniprot/?query=organism:10116&columns=id,genes(PREFERRED)&format=tab"
ids_with_sites <- read.delim(URL, check.names = FALSE, stringsAsFactors = FALSE) %>%
  rename(GeneMain = "Gene names (primary )",
         UniProtAcc = "Entry") %>%
  inner_join(ids_with_sites, ., by = "UniProtAcc")

# number of rows before filtering
nrow(ids_with_sites)
ids_with_sites <- keep_longest_isoform_per_gene(ids_with_sites,
                                                "GeneMain", "UniProtAccFull", "ProtLength")
# number of rows after filtering
nrow(ids_with_sites)
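Comparing `nrow()` before and after shows how many rows were dropped, but not whether each gene now maps to a single isoform. A minimal sketch of such a check follows; the helper name `isoforms_per_gene` and the toy table are hypothetical and not part of vp.misc.

```r
# Hypothetical helper: number of distinct isoform IDs per gene
isoforms_per_gene <- function(ids, gene_id_col, isoform_id_col) {
  tapply(ids[[isoform_id_col]], ids[[gene_id_col]],
         function(x) length(unique(x)))
}

# Made-up table standing in for an already filtered result
filtered <- data.frame(
  GeneMain       = c("GeneA", "GeneB", "GeneB"),
  UniProtAccFull = c("A-1", "B-2", "B-2"),
  stringsAsFactors = FALSE
)
counts <- isoforms_per_gene(filtered, "GeneMain", "UniProtAccFull")

# TRUE when every gene is represented by exactly one isoform
one_isoform_each <- all(counts == 1)
```

The same call could be applied to `ids_with_sites` with the `"GeneMain"` and `"UniProtAccFull"` columns used in the example.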