knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ncbi)

Find gene names of membrane proteins

At the NCBI website (https://www.ncbi.nlm.nih.gov/), click on the 'Gene' database, then search for 'Membrane protein', for the organism Homo sapiens, or use the API https://www.ncbi.nlm.nih.gov/gene?term=(membrane%20protein)%20AND%20homo%20sapiens%5BOrganism%5D to get 1123 hits.

Search for "((membrane protein) AND Homo sapiens[ORGN]) AND alive[prop]" in the Gene database using the rentrez R package:

all_gene_ids <- get_all_human_membrane_protein_gene_ids()
length(all_gene_ids)

Here we find 1123 matches again. We did have to expand the query, by adding AND alive[prop] to only show the alive entries, a thing the web interface does by default.

From here, we select the first six:

gene_ids <- head(all_gene_ids)
gene_ids

From our gene IDs, we can get the gene names:

gene_names <- get_gene_names_from_human_gene_ids(gene_ids)

From a gene name, find a SNP

In this example, we'll use the TNF gene.

testthat::expect_true("TNF" %in% gene_names)
gene_name <- "TNF"
testthat::expect_equal("TNF", gene_name)

On the NCBI website, use the SNP database and search for the gene name 'TNF', or use this API call: https://www.ncbi.nlm.nih.gov/snp/?term=TNF%5BGene%20Name%5D

From R, we do:

snp_ids <- get_snp_ids_from_gene_name(gene_name)
head(snp_ids)

From a SNP, get the protein sequence and location

From a SNP, that is not translated

We'll use the SNP ID, 1583051968:

testthat::expect_true("1583051968" %in% snp_ids)
hgvs <- get_snp_variations_in_protein_from_snp_id("1583051968")
tryCatch(is_hgvs_in_tmh(hgvs), error = function(e) print(e))

To get the protein sequence using the NCBI website, search for 1583051968 (or rs1583051968, the rs denotes it's a SNP), or use the API call \code{https://www.ncbi.nlm.nih.gov/snp/?term=1583051968}

Clicking on the rs takes us to https://www.ncbi.nlm.nih.gov/snp/rs1583051968. Scrolling down gives the genomic context:

As the report shows no green (nor red, nor blue) band, means that this SNP does not modify a translation product. Zooming out comfirms this:

From a SNP, that is transcripted

We'll use another SNP ID, 1583051188:

testthat::expect_true("1583051188" %in% snp_ids)
hgvs <- get_snp_variations_in_protein_from_snp_id("1583051188")
tryCatch(is_hgvs_in_tmh(hgvs), error = function(e) print(e))

At NCBI website, at https://www.ncbi.nlm.nih.gov/snp/rs1583051188 we can see it is transcribed to mRNA (but not to protein):

From a SNP, that is transcripted and translated into indentical protein

We'll use another SNP ID, 1583050033:

testthat::expect_true("1583050033" %in% snp_ids)

At https://www.ncbi.nlm.nih.gov/snp/rs1583050033 one can see there is a protein

Hovering over the protein (that is, the red bar), we see that the protein is called NP_000585.2 and that our SNP acts on the 196 amino acid.

Now, to do the same thing from R:

hgvs <- get_snp_variations_in_protein_from_snp_id("1583050033")
tryCatch(is_hgvs_in_tmh(hgvs), error = function(e) print(e))

The error message is clear: the SNP did not cause a mutation in the protein.

From a SNP, that is transcripted and translated into indentical protein

We'll use another SNP ID, 1583049783:

Here the protein is actually changed.

From this variation, we now measure where in the protein the mutation occurs:

testthat::expect_true("1583049783" %in% snp_ids)
hgvs <- get_snp_variations_in_protein_from_snp_id("1583049783")
in_in_tmh <- NA
tryCatch(in_in_tmh <- is_hgvs_in_tmh(hgvs), error = function(e) print(e))
in_in_tmh

At https://www.ncbi.nlm.nih.gov/snp/rs1583049783 one can see there is a protein

Hovering over the protein (that is, the red bar), we see that the protein is called NP_000585.2 and that our SNP acts on the 144 amino acid.



richelbilderbeek/ncbi documentation built on July 9, 2023, 3:51 a.m.