knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(agvgd)
One of the greatest challenges in the study of cancer risk is the assessment of unclassified missense variants.
Besides using biochemical and cell-based transcriptional assays to assess the structural and functional defects associated with missense variants, one can also use bioinformatics analysis based on multiple sequence alignment data and protein structure prediction to approach this problem.
Align-GVGD is one such method that uses protein multiple sequence alignment (PMSA) data to provide cancer risk estimates.
In this vignette we show you how you can use {agvgd}
to reproduce the results
obtained by Lee et al. (2010) [@Lee.CR.2010] on the study of missense variations in the BRCT
Domain of BRCA1 gene.
In Lee's paper, Align-GVGD prediction scores are part of their cross-validation of structural and functional assays, as indicated in the column labeled AG in Figure 3 of said paper.
knitr::include_graphics('../man/figures/lee2010_fig03.png')
To reproduce the AGVGD prediction scores, we need two data sets:
Both these data sets are already bundled with {agvgd}
.
To read in the alignment of BRCA1 use the function read_alignment()
and the
name of the gene "BRCA1"
:
brca1_alignment <- read_alignment("BRCA1") print(brca1_alignment, line_width = 200)
To read in the 117 BRCA1 missense variants, i.e., the 117 substitutions, we
use the function read_substitutions()
:
path <- system.file("extdata/lee2010_sub.txt", package = 'agvgd', mustWork = TRUE) missense_variants <- read_substitutions(path) missense_variants
The list of missense variants comes with the indication of the residue number in
the protein sequence, i.e. column res
in missense_variants
. As we'll see
next, the function agvgd()
uses the alignment positions (column poi
in
missense_variants
) instead to refer to those positions in the alignment.
The difference between res
and poi
is that res
counts the residues in the
protein primary sequence reference, while poi
refers to the positions in the
alignment, accounting for gaps.
So, before we proceed, we will update the missense_variants
tibble and replace
the NA
values with the corresponding poi
values. For that we use the
function res_to_poi()
:
missense_variants$poi <- res_to_poi(brca1_alignment, missense_variants$res) missense_variants
agvgd()
Run Align-GVGD with the function agvgd()
:
scores <- agvgd(alignment = brca1_alignment, poi = missense_variants$poi, sub = missense_variants$sub) print(scores, n = Inf)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.