In richelbilderbeek/ncbi: Does Stuff

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(ncbi)
set.seed(42)

Overview

Find human membrane proteins
Pick a human membrane protein
Find related proteins
Align the protein sequences
Determine the topology of all protein sequences
Tally the mutations that have the same location in all topologies

Find human membrane proteins

Either go to https://www.ncbi.nlm.nih.gov/protein?term=(membrane%20protein)%20AND%20homo%20sapiens%5BPrimary%20Organism%5D or run:

Sys.sleep(16)
ids <- search_for_human_membrane_proteins()
ids

Pick a human membrane protein

Pick a human membrane protein by clicking it on the web interface.

In R, we pick a random one:

id <- sample(x = ids, size = 1)
id

Find related proteins

In the web interface, click on 'Run BLAST' to start blastp, then run blastp and wait.

In R, we

Fetch the protein sequence
Run blastp on it

Fetch the protein sequence from the protein ID:

sequence <- sprentrez::fetch_sequence_from_protein_id(id)
sequence

In the web interface, click on 'Run BLAST', https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins&PROGRAM=blastp&BLAST_PROGRAMS=blastp&QUERY=BAA86974.1&LINK_LOC=protein&PAGE_TYPE=BlastSearch

Within R, run blastp by calling bio3d::blast.pdb on the protein sequence

if (1 == 2) {
  blastp_result <- bio3d::blast.pdb(sequence)
}

Show the results:

if (1 == 2) {
  t <- blastp_result$hit.tbl
  t <- t[order(-t$identity), ] # Best results first
  knitr::kable(head(t))
}

Select the proteins that have 80% or more of the max score

Now we select protein sequences that are similar enough. By 'similar enough', in this context, it means that the related sequences should differ only a couple of amino acids.

For now, we keep all...

if (1 == 2) {

  t <- t[order(-t$identity), ] # Best results first
  knitr::kable(head(t))
}

Get the IDs of the hits:

if (1 == 2) {
  similar_subject_ids <- t$subjectids
  similar_subject_ids
}

Download the related proteins' sequences:

if (1 == 2) {
  sequences <- unname(fetch_sequences_from_subject_ids(similar_subject_ids))
  sequences
}

Get the multiple sequence alignment

In the web interface, click 'Multiple alignment' to get COBALT multiple sequence alignment

Download the COBALT MSA alignment
Remove sequences that have deletions in the middle of the sequence
Determine the topology of all remaining sequences
Count mutations that have the same location in all topologies

richelbilderbeek/ncbi documentation built on July 9, 2023, 3:51 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

richelbilderbeek/ncbi
Does Stuff

In richelbilderbeek/ncbi: Does Stuff

Overview

Find human membrane proteins

Pick a human membrane protein

Find related proteins

Select the proteins that have 80% or more of the max score

Get the multiple sequence alignment

R Package Documentation

Browse R Packages

We want your feedback!

richelbilderbeek/ncbi Does Stuff

In richelbilderbeek/ncbi: Does Stuff

Overview

Find human membrane proteins

Pick a human membrane protein

Find related proteins

Select the proteins that have 80% or more of the max score

Get the multiple sequence alignment

R Package Documentation

Browse R Packages

We want your feedback!

richelbilderbeek/ncbi
Does Stuff