Getting started with BaMORC
In BaMORC: Bayesian Model Optimized Reference Correction Method for Assigned and Unassigned Protein NMR Spectra

library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)

BaMORC quickstart guide

The goal of this document is to get you up and running with BaMORC as quickly as possible. BaMORC stands for the Bayesian Model Optimized Reference Correction (BaMORC) Method for Assigned and Unassigned Protein NMR Spectra. For a detailed explanation of the algorithm, please refer to "Automatic 13C chemical shift reference correction for unassigned protein NMR spectra".

There are two important parts to BaMORC: the bamorc() reference correction function for assigned 13C protein NMR spectra, and the unassigned_bamorc() reference correction function for unassigned 13C protein NMR spectra. In the first section, you'll learn about the basics of running both of these functions. In the second section, you will dive into the basics of input data process, and in third section, you'll learn a little more about the functions used behind-the-scenes by the BaMORC algorithm.

BaMORC basics

To make a correction, first load BaMORC, then call bamorc() or unassigned_bamorc() with the correct arguments:

library(BaMORC)

For assigned protein NMR spectra:

Here we will using the built-in data to demonstrate the arguments that will be passed into bamorc().

## Arguments:
sequence = paste(RefDB_data$carbonDat[[1]]$AA, collapse = "")
secondary_structure = paste(RefDB_data$carbonDat[[1]]$SS, collapse = "")
chemical_shifts_input = RefDB_data$carbonDat[[1]][, c(4,5)]
from= -5
to = 5

## Running bamorc() function:
bamorc(sequence = sequence, secondary_structure = secondary_structure, 
       chemical_shifts_input = chemical_shifts_input, from = -5, to = 5)

sequence: the sequence of protein of interest with single-letter convention.
secondary_structure: the secondary structure information of the protein with with single-letter convention.
chemical_shifts_input: the carbon 13 chemical shift of the protein.
from and to: the upper- and lower-bound of optimization range. Value assigned to from must be lower than to.

The length of the sequence and secondary_structure should be the same, if not please assign secondary_structure = NULL, however, the peaklist groups number could be more or less than the length of the sequence. Often peak lists will have fewer CA/CB pairs than the length of the protein sequence.

Printing an argument object gives you some useful information: the actual format, the size, the object type, and if it's a string.

For unassigned protein NMR spectra:

Next, we will use the built-in data to demonstrate the arguments that will be passed in the unassigned_bamorc(). The output will be slightly different each time it runs, due to the randomness of the optimization.

## Arguments:enerate a temperary sample NMR spectra file and later will be 
## removed.
sequence = "RPAFCLEPPYAGPGKARIIRYFYNAAAGAAQAFVYGGVRAKRNNFASAADALAACAAA"
sample_file_path = system.file("extdata", "bpti_HNcoCACB.txt", package = "BaMORC")

## Running unassigned_bamorc() function:
# unassigned_bamorc will take a while to run
unassigned_bamorc(peakList_file_loc = sample_file_path, sequence = sequence,
                 secondary_structure = NULL, from = -5, to = 5)

The data processing:

The data passed in both above functions should be pre-processed. Luckly, we provide a variety of helper functions. See the following on how to process the data.

The sample generating functions:

There are three file parsing functions within the BaMORC package: read_raw_file(), read_nmrstar_file(), and read_db_file(). These functions handle a wide range of the input files.

Delimiter of white space: ```r ## Arguments: sample_file_path = system.file("extdata", "sample_input_ws.txt", package = "BaMORC")

Running reading function

head(read_raw_file(sample_file_path, delim = "ws")) ```
Delimiter of comma: ```r ## Arguments: sample_file_path = system.file("extdata", "sample_input.csv", package = "BaMORC")

Running reading function

head(read_raw_file(sample_file_path, delim = "comma")) ```
Delimiter of semicolon: ```r ## Arguments: sample_file_path = system.file("extdata", "sample_input_sc.txt", package = "BaMORC")

Running reading function

head(read_raw_file(sample_file_path, delim = "semicolon")) ```

The reading functions will return sequence and chemical shifts:

read_raw_file(): parses user-provided file in customed format as show above examples.
read_nmrstar_file(): parses user-provided file in BMRB Star 2/3 format.

## Download a BMRB file
library(BMRBr)
bmrb_download(4020, output_dir = "./")

## Read in BMRB file and procec
file_path = "bmr4020.str"
head(read_nmrstar_file(file_path = file_path))

## Delete downloaded BMRB file
unlink("./bmr4020.str")

read_db_file(): parses file from BMRB by a given entry ID.

id = 4022
output <- read_db_file(id = id)
head(output[[1]])
head(output[[2]])

Estimating econdary structure from sequence:

Secondary structure information predicted based on the protein sequence is required for the optimization. This is predicted through the JPred and jpredapi.

protein_sequence <- "MQVWPIEGIKKFETLSYLPPLTVEDLLKQI"
secondary_structure <- jpred_fetcher(protein_sequence = protein_sequence)

secondary_structure

Behind-the-scenes functions

The BaMORC algorithm minimizes the difference between the actual relative cummulative frequency (RCF) of the protein sequence and the estimated RCF predicted from the chemical shifts information. The statistical functions used by the algorithm are shown below with examples. For detailed function descriptions, please see the reference:

calculate_aa_prob(): returns the probability (density) for a certain type of amino acid based on a chi-squared statistics wtih 2 degrees of freedom.

calculate_aa_prob(chi_squared_stat = c(0.05, 0.1, 0.5), df = 2)

calculate_chi_squared_stat(): given a pair of alpha and beta carbons chemical shifts, this function will return a list of calculated chi-squared statistics based on the combination of amino acid typings and secondary structures. Here we illustrate with a pair of chemical shifts from alpha and beta carbon.

calculate_chi_squared_stat(cacb_pair = c(54,45))

calculate_rcf(): calculates the relative cummulative freqeucny of amino acid and secondary structure combination.

## Arguments:
sequence = paste(RefDB_data$carbonDat[[1]]$AA, collapse = "")
secondary_structure = paste(RefDB_data$carbonDat[[1]]$SS, collapse = "")

## Function:
calculate_rcf(sequence = sequence, secondary_structure = secondary_structure)

calculate_mse(): calculates mean squared error for each correction value (step).

## chemicalShifts and aaFreq are predefined sample variables for demo purpose 
## within the BaMORC Package.
calculate_mse(step_ca = 1, step_cb = 1, dat_cacb = chemicalShifts[, c(3,4)], 
              aa_Freq = aaFreq)

Any scripts or data that you put into this service are public.

BaMORC documentation built on May 1, 2019, 6:35 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

BaMORC
Bayesian Model Optimized Reference Correction Method for Assigned and Unassigned Protein NMR Spectra

Getting started with BaMORC
In BaMORC: Bayesian Model Optimized Reference Correction Method for Assigned and Unassigned Protein NMR Spectra

BaMORC quickstart guide

BaMORC basics

For assigned protein NMR spectra:

For unassigned protein NMR spectra:

The data processing:

The sample generating functions:

Running reading function

Running reading function

Running reading function

The reading functions will return sequence and chemical shifts:

Estimating econdary structure from sequence:

Behind-the-scenes functions

Try the BaMORC package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BaMORC Bayesian Model Optimized Reference Correction Method for Assigned and Unassigned Protein NMR Spectra

Getting started with BaMORC In BaMORC: Bayesian Model Optimized Reference Correction Method for Assigned and Unassigned Protein NMR Spectra

BaMORC quickstart guide

BaMORC basics

For assigned protein NMR spectra:

For unassigned protein NMR spectra:

The data processing:

The sample generating functions:

Running reading function

Running reading function

Running reading function

The reading functions will return sequence and chemical shifts:

Estimating econdary structure from sequence:

Behind-the-scenes functions

Try the BaMORC package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BaMORC
Bayesian Model Optimized Reference Correction Method for Assigned and Unassigned Protein NMR Spectra

Getting started with BaMORC
In BaMORC: Bayesian Model Optimized Reference Correction Method for Assigned and Unassigned Protein NMR Spectra