knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ncbi)
set.seed(42)

Overview

Find human membrane proteins

Either go to https://www.ncbi.nlm.nih.gov/protein?term=(membrane%20protein)%20AND%20homo%20sapiens%5BPrimary%20Organism%5D or run:

Sys.sleep(16)
ids <- search_for_human_membrane_proteins()
ids

Pick a human membrane protein

Pick a human membrane protein by clicking it on the web interface.

In R, we pick a random one:

id <- sample(x = ids, size = 1)
id

Find related proteins

In the web interface, click on 'Run BLAST' to start blastp, then run blastp and wait.

In R, we

Fetch the protein sequence from the protein ID:

sequence <- sprentrez::fetch_sequence_from_protein_id(id)
sequence

In the web interface, click on 'Run BLAST', https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins&PROGRAM=blastp&BLAST_PROGRAMS=blastp&QUERY=BAA86974.1&LINK_LOC=protein&PAGE_TYPE=BlastSearch

Within R, run blastp by calling bio3d::blast.pdb on the protein sequence

if (1 == 2) {
  blastp_result <- bio3d::blast.pdb(sequence)
}

Show the results:

if (1 == 2) {
  t <- blastp_result$hit.tbl
  t <- t[order(-t$identity), ] # Best results first
  knitr::kable(head(t))
}

Select the proteins that have 80% or more of the max score

Now we select protein sequences that are similar enough. By 'similar enough', in this context, it means that the related sequences should differ only a couple of amino acids.

For now, we keep all...

if (1 == 2) {

  t <- t[order(-t$identity), ] # Best results first
  knitr::kable(head(t))
}

Get the IDs of the hits:

if (1 == 2) {
  similar_subject_ids <- t$subjectids
  similar_subject_ids
}

Download the related proteins' sequences:

if (1 == 2) {
  sequences <- unname(fetch_sequences_from_subject_ids(similar_subject_ids))
  sequences
}

Get the multiple sequence alignment

In the web interface, click 'Multiple alignment' to get COBALT multiple sequence alignment

#


richelbilderbeek/ncbi documentation built on July 9, 2023, 3:51 a.m.