knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ncbi)
At the NCBI website (https://www.ncbi.nlm.nih.gov/), click on the 'Gene' database, then search for 'Membrane protein' for the organism Homo sapiens, or use the API call https://www.ncbi.nlm.nih.gov/gene?term=(membrane%20protein)%20AND%20homo%20sapiens%5BOrganism%5D to get 1123 hits.
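The query inside that URL is easier to read once it is percent-decoded. A small sketch using base R's `utils::URLdecode`; the variable names are only illustrative:

```r
# Percent-encoded search term, copied from the URL above
encoded_term <- "(membrane%20protein)%20AND%20homo%20sapiens%5BOrganism%5D"

# Decode it to see the query as one would type it into the search box
decoded_term <- utils::URLdecode(encoded_term)
print(decoded_term)
```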
Search for "((membrane protein) AND Homo sapiens[ORGN]) AND alive[prop]" in the Gene database using the rentrez R package:
all_gene_ids <- get_all_human_membrane_protein_gene_ids()
length(all_gene_ids)
Here we find 1123 matches again. We had to extend the query with AND alive[prop] to show only the live entries, which the web interface does by default.
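As a rough sketch of what such a search could look like when calling rentrez directly: the helper names `build_gene_term` and `search_gene_ids` are hypothetical, and this is not claimed to be how `get_all_human_membrane_protein_gene_ids` is implemented. `rentrez::entrez_search` is a real rentrez function; the `retmax` value is an assumption, chosen to exceed the 1123 hits.

```r
# Hypothetical helper: assemble the Entrez query string used above
build_gene_term <- function(keyword, organism, alive_only = TRUE) {
  term <- paste0("((", keyword, ") AND ", organism, "[ORGN])")
  if (alive_only) {
    # The web interface applies this filter by default,
    # so the query has to add it explicitly
    term <- paste0(term, " AND alive[prop]")
  }
  term
}

# Hypothetical helper: run the search. It needs the rentrez package and
# internet access, so it is defined here but not called.
search_gene_ids <- function(term, retmax = 2000) {
  results <- rentrez::entrez_search(db = "gene", term = term, retmax = retmax)
  results$ids
}

term <- build_gene_term("membrane protein", "Homo sapiens")
print(term)
```

Keeping the query builder separate from the network call makes the query string easy to inspect and compare with the one typed into the web interface.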
From here, we select the first six:
gene_ids <- head(all_gene_ids)
gene_ids
From our gene IDs, we can get the gene names:
gene_names <- get_gene_names_from_human_gene_ids(gene_ids)
In this example, we'll use the TNF gene.
testthat::expect_true("TNF" %in% gene_names)
gene_name <- "TNF"
testthat::expect_equal("TNF", gene_name)
On the NCBI website, use the SNP database and search for the gene name 'TNF', or use this API call: https://www.ncbi.nlm.nih.gov/snp/?term=TNF%5BGene%20Name%5D
From R, we do:
snp_ids <- get_snp_ids_from_gene_name(gene_name)
head(snp_ids)
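The web search URL shown earlier follows directly from the gene name. A minimal sketch using base R; the function name `build_snp_search_url` is made up for this illustration and is not part of the package:

```r
# Hypothetical helper: build the SNP database search URL for a gene name
build_snp_search_url <- function(gene_name) {
  # Restrict the search to the 'Gene Name' field
  term <- paste0(gene_name, "[Gene Name]")
  # reserved = TRUE also percent-encodes the square brackets
  paste0(
    "https://www.ncbi.nlm.nih.gov/snp/?term=",
    utils::URLencode(term, reserved = TRUE)
  )
}

url <- build_snp_search_url("TNF")
print(url)
```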
We'll use the SNP ID, 1583051968:
testthat::expect_true("1583051968" %in% snp_ids)
hgvs <- get_snp_variations_in_protein_from_snp_id("1583051968")
tryCatch(is_hgvs_in_tmh(hgvs), error = function(e) print(e))
To get the protein sequence using the NCBI website, search for 1583051968 (or rs1583051968; the rs prefix denotes that it is a SNP), or use the API call https://www.ncbi.nlm.nih.gov/snp/?term=1583051968.
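Because the same SNP can be written with or without the rs prefix, it can help to convert between the two notations. A small sketch with hypothetical helper names, which are not functions of this package:

```r
# Hypothetical helper: strip a leading 'rs', if present
to_bare_snp_id <- function(snp_id) {
  sub("^rs", "", snp_id)
}

# Hypothetical helper: make sure the ID carries exactly one 'rs' prefix
to_rs_id <- function(snp_id) {
  paste0("rs", to_bare_snp_id(snp_id))
}

print(to_rs_id("1583051968"))
print(to_bare_snp_id("rs1583051968"))
```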
Clicking on the rs number takes us to https://www.ncbi.nlm.nih.gov/snp/rs1583051968. Scrolling down gives the genomic context:
The report shows no green (nor red, nor blue) band, which means that this SNP does not modify a translation product. Zooming out confirms this:
We'll use another SNP ID, 1583051188:
testthat::expect_true("1583051188" %in% snp_ids)
hgvs <- get_snp_variations_in_protein_from_snp_id("1583051188")
tryCatch(is_hgvs_in_tmh(hgvs), error = function(e) print(e))
On the NCBI website, at https://www.ncbi.nlm.nih.gov/snp/rs1583051188, we can see that it is transcribed to mRNA (but not translated to protein):
We'll use another SNP ID, 1583050033:
testthat::expect_true("1583050033" %in% snp_ids)
At https://www.ncbi.nlm.nih.gov/snp/rs1583050033 one can see there is a protein. Hovering over the protein (that is, the red bar), we see that the protein is called NP_000585.2 and that our SNP acts on the 196th amino acid.
Now, to do the same thing from R:
hgvs <- get_snp_variations_in_protein_from_snp_id("1583050033")
tryCatch(is_hgvs_in_tmh(hgvs), error = function(e) print(e))
The error message is clear: the SNP did not cause a mutation in the protein.
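In HGVS protein notation, a variant that leaves the amino acid unchanged is written with a trailing `=`. The sketch below shows how such a case could be recognised. The function name, the accession `NP_000000.0` and both example strings are invented for illustration; they are not output of `get_snp_variations_in_protein_from_snp_id`, and this is not claimed to be how `is_hgvs_in_tmh` detects the situation.

```r
# Hypothetical helper: TRUE if an HGVS protein description denotes a
# synonymous variant, such as 'p.Val196=' or the predicted form 'p.(Val196=)'
is_synonymous_hgvs <- function(hgvs) {
  grepl("^.+:p\\.\\(?[A-Za-z]{3}[0-9]+=\\)?$", hgvs)
}

# Made-up examples: one synonymous variant, one substitution
print(is_synonymous_hgvs("NP_000000.0:p.Val196="))
print(is_synonymous_hgvs("NP_000000.0:p.Ala144Thr"))
```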
We'll use another SNP ID, 1583049783:
Here the protein is actually changed.
From this variation, we now measure where in the protein the mutation occurs:
testthat::expect_true("1583049783" %in% snp_ids)
hgvs <- get_snp_variations_in_protein_from_snp_id("1583049783")
in_in_tmh <- NA
tryCatch(in_in_tmh <- is_hgvs_in_tmh(hgvs), error = function(e) print(e))
in_in_tmh
At https://www.ncbi.nlm.nih.gov/snp/rs1583049783 one can see there is a protein. Hovering over the protein (that is, the red bar), we see that the protein is called NP_000585.2 and that our SNP acts on the 144th amino acid.
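Deciding whether a mutation lies in a transmembrane helix (TMH) comes down to extracting the amino acid position from the HGVS description and comparing it with the helix boundaries. The following is a sketch under stated assumptions: the helper names, the HGVS string and the helix boundaries are all made up, and it is not claimed to be how `is_hgvs_in_tmh` works internally.

```r
# Hypothetical helper: extract the amino acid position from an HGVS
# protein description such as 'NP_000000.0:p.Ala144Thr'
get_hgvs_position <- function(hgvs) {
  as.integer(sub("^.+:p\\.\\(?[A-Za-z]{3}([0-9]+).*$", "\\1", hgvs))
}

# Hypothetical helper: is the position inside any of the helix ranges?
# 'tmh_ranges' is a data frame with inclusive 'from' and 'to' columns.
is_position_in_tmh <- function(position, tmh_ranges) {
  any(position >= tmh_ranges$from & position <= tmh_ranges$to)
}

# Made-up HGVS string and made-up helix boundaries, for illustration only
hgvs_example <- "NP_000000.0:p.Ala144Thr"
tmh_example <- data.frame(from = c(30, 140), to = c(52, 160))

position <- get_hgvs_position(hgvs_example)
print(position)
print(is_position_in_tmh(position, tmh_example))
```

In a real analysis the helix boundaries would come from a topology prediction or annotation for the protein in question, not from hard-coded numbers.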