Timeline Plots

The package contains only a subset of the most important data generated over a period of five years. To give an impression of the whole, the timeline plots show an overview of all annotated samples (S) and workunits (W) in the B-Fabric system @bfabric.

the NGS data p1644

the mass spec data p1875

Make Data (replaces make-data.R)

NL42_100K.fastq.gz

The sample NGS data contain 100K merged MiSeq reads in FASTQ format that demonstrate the linkage between nanobodies (NB) and flycodes (FC).

NL42_100K <- NestLink:::.getReadsFromFastq("inst/extdata/NL42_100K.fastq.gz")
save(NL42_100K, file="inst/extdata/NestLink_NL42_100K.RData")

knownNB.txt

An optional part of the NestLink workflow is the use of known nanobodies in the sequencing experiment to estimate sensitivity and specificity levels. This example file contains nucleotide sequences of nanobodies that should be detectable in this experiment. Later in the workflow, these nanobodies are highlighted and labeled as known NB.

nanobodyFlycodeLinkage.RData

NGS ground truth derived by applying the function runNGSAnalysis to the two previous files.

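library(NestLink)
library(ExperimentHub)

# Fetch the FASTQ file from ExperimentHub and make sure it exists locally.
eh <- ExperimentHub()
expFile <- query(eh, c("NestLink", "NL42_100K.fastq.gz"))[[1]]
stopifnot(file.exists(expFile))
scratchFolder <- tempdir()
setwd(scratchFolder)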

knownNB_File <- query(eh, c("NestLink", "knownNB.txt"))[[1]]
knownNB_data <- read.table(knownNB_File,
                           sep='\t',
                           header = TRUE,
                           row.names = 1,
                           stringsAsFactors = FALSE)

knownNB <- Biostrings::translate(Biostrings::DNAStringSet(knownNB_data$Sequence))
names(knownNB) <- rownames(knownNB_data)
knownNB <- sapply(knownNB, toString)

param <- list()
param[['NB_Linker1']] <- "GGCCggcggGGCC"
param[['NB_Linker2']] <- "GCAGGAGGA"
param[['ProteaseSite']] <- "TTAGTCCCAAGA"
param[['FC_Linker']] <- "GGCCaaggaggcCGG"
param[['knownNB']] <- knownNB
param[['nReads']] <- 100
param[['minRelBestHitFreq']] <- 0.8 
param[['minConsensusScore']] <- 0.9
param[['maxMismatch']] <- 1
param[['minNanobodyLength']] <- 348
param[['minFlycodeLength']] <- 33
param[['FCminFreq']] <- 1

nanobodyFlycodeLinkage.RData <- runNGSAnalysis(file = expFile[1], param)

NB.tryptic and FC.tryptic

Both files are the output of the previous NGS step generating the linkage between NBs and FCs.

The files are used to demonstrate the detectability of the AA sequences.

The wrapper functions extend the tables with the SSRC prediction and the parent ion mass (pim), determined using the CRAN package protViz.

The column ESP_Prediction was generated by using the service from https://genepattern.broadinstitute.org, see also @pmid19169245.
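
The SSRC and pim columns mentioned above can be approximated for individual sequences with protViz. The sketch below is only an illustration and not necessarily how the NestLink wrapper functions call protViz internally; the two peptide sequences are invented examples, and the code runs only if protViz is installed.

```r
# Invented flycode-like peptides, used only for illustration.
peptides <- c("GSAATDEVLWR", "GSQDFYGPAWQEGGR")

if (requireNamespace("protViz", quietly = TRUE)) {
  # Parent ion mass (pim) of each peptide.
  pim <- protViz::parentIonMass(peptides)

  # SSRC hydrophobicity prediction, computed one peptide at a time.
  ssrc <- sapply(peptides, function(p) as.numeric(protViz::ssrc(p)))

  print(data.frame(peptide = peptides, pim = pim, ssrc = ssrc,
                   row.names = NULL))
}
```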

library(NestLink)
NB <- getNB()
FC <- getFC()

The first six rows of each table are shown below:

library(knitr)
kable(head(NB))
kable(head(FC))

F255744.RData and WU160118.RData

Mass spec data

The mass spec files below are available through ProteomeXchange (PXD009301).

Compute the peptide spectrum matches

The mass spectra were assigned to peptide sequences using Matrix Science's Mascot Server version 2.5 @pmid10612281. The most important parameters are listed in the table below.

|Parameter|Value|
|:---|:---|
|COM|170819_MS1708116_NL5idx4to5_Competition2BG_db8_db10_swissprot_d_merge|
|FASTA 1|p1875_db8_20160704.fasta|
|FASTA 2|p1875_db10_20170817.fasta|
|TOL|10|
|TOLU|ppm|
|ITOL|0.6|
|ITOLU|Da|
|USERNAME|egloffp|
|CHARGE|2+|
|IT_MODS|Deamidated (NQ),Oxidation (M)|
|INSTRUMENT|ESI-TRAP|
|release|fgcz_swissprot_d_20140403.fasta|
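
In the Mascot parameters, TOL/TOLU give the precursor (peptide) mass tolerance and ITOL/ITOLU the fragment ion tolerance. As a rough illustration of what a relative tolerance of 10 ppm means in absolute terms, the sketch below converts it into a window around a precursor value. The helper name ppm_window and the example value of 600 are assumptions made for illustration and are not taken from the search.

```r
# Convert a relative tolerance in ppm into an absolute window around mz.
# ppm_window is a hypothetical helper, not part of NestLink or protViz.
ppm_window <- function(mz, tol_ppm) {
  delta <- mz * tol_ppm / 1e6
  c(lower = mz - delta, upper = mz + delta)
}

# Example value of 600 chosen arbitrarily.
ppm_window(600, 10)
```

For a value of 600, 10 ppm corresponds to a half-width of 0.006, which is much tighter than the absolute fragment tolerance of 0.6 Da.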

The results were exported as XML. The XML was parsed and converted to a data.frame using the function protViz:::as.data.frame.mascot from the CRAN package protViz @protViz.

Workflow available through B-Fabric

The above-described results and workflows are available for registered users in B-Fabric. However, it is not necessary to access B-Fabric in order to use this package.

make-data for NestLink

The following code snippet was executed to generate the data set shipped with the NestLink Bioconductor package.

Here only the metadata were extracted (no MS2).

load("~/Downloads/444589.RData")
library(protViz)
library(NestLink)
WU160118 <- do.call('rbind', lapply(list("F255737", "F255744", "F255747", 
  "F255749", "F255751", "F255760", "F255761", "F255762"), 
  function(datfilename){
      df <- as.data.frame.mascot(get(datfilename))
      df$datfilename <- datfilename
      df
    }
  ))
save(WU160118, file = "../inst/extdata/WU160118.RData", 
     compress = TRUE, compression_level = 9)   

The data ship with the NestLink Bioconductor package and can be browsed using the following code snippet:

library(ExperimentHub)
eh <- ExperimentHub()
load(query(eh, c("NestLink", "WU160118.RData"))[[1]])
class(WU160118)
PATTERN <- "^GS[ASTNQDEFVLYWGP]{7}(WR|WLTVR|WQEGGR|WLR|WQSR)$"
idx <- grepl(PATTERN, WU160118$pep_seq)
WU <- WU160118[idx & WU160118$pep_score > 25,]
kable(unique(unlist(lapply(strsplit(x=as.character(WU$pep_scan_title), split=";"), function(x)gsub("File:", "", gsub("\\\\", "/", x[1]))))))
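
To make the filtering steps in the snippet above easier to follow without downloading the data, here is a minimal sketch on a toy table. All peptide sequences, scores, and scan titles are invented, and the scan title layout is an assumption; only PATTERN and the score cut-off of 25 are taken from the code above.

```r
# Toy stand-in for WU160118; every value here is made up for illustration.
toy <- data.frame(
  pep_seq = c("GSAATDEVLWR", "GSPEVTDAAWLTVR", "GSAATDEVLWK",
              "AGSAATDEVLWR", "GSAATDEVWR", "GSQDFYGPAWQEGGR"),
  pep_score = c(40, 20, 50, 60, 30, 26),
  pep_scan_title = c("File:C:\\data\\run_A.raw;scan=1",
                     "File:C:\\data\\run_A.raw;scan=2",
                     "File:C:\\data\\run_A.raw;scan=3",
                     "File:C:\\data\\run_B.raw;scan=4",
                     "File:C:\\data\\run_B.raw;scan=5",
                     "File:C:\\data\\run_B.raw;scan=7"),
  stringsAsFactors = FALSE
)

# Flycode pattern: "GS", exactly seven residues from the allowed set,
# then one of the five permitted C-terminal endings.
PATTERN <- "^GS[ASTNQDEFVLYWGP]{7}(WR|WLTVR|WQEGGR|WLR|WQSR)$"
is_flycode <- grepl(PATTERN, toy$pep_seq)

# Keep flycode peptides whose score exceeds 25.
kept <- toy[is_flycode & toy$pep_score > 25, ]

# Take the first ";"-separated field, turn backslashes into slashes,
# and drop the "File:" prefix to obtain the unique raw file names.
first_field <- vapply(strsplit(kept$pep_scan_title, split = ";"),
                      function(x) x[1], character(1))
raw_files <- unique(gsub("File:", "", gsub("\\\\", "/", first_field)))

kept$pep_seq
raw_files
```

In this toy table, the third to fifth peptides fail the pattern (wrong ending, wrong start, and too few variable residues, respectively), and the second one matches but is removed by the score cut-off.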

PGexport2_normalizedAgainstSBstandards_Peptides.csv

This file contains mass-spectrometry-based label-free quantitative (LFQ) results of nanobodies expressed in SMEG and COLI species.

Two LC-MS/MS runs were aligned in Progenesis QI (Nonlinear Dynamics) with an alignment score of 93.1 %, followed by peak picking with an allowed ion charge of +2 to +5.

Uploading to S3

#!/bin/bash

aws --profile AnnotationContributor s3 cp NestLink/F255744.RData s3://annotation-contributor/NestLink/F255744.RData --acl public-read

aws --profile AnnotationContributor s3 cp NestLink/WU160118.RData s3://annotation-contributor/NestLink/WU160118.RData --acl public-read

aws --profile AnnotationContributor s3 cp NestLink s3://annotation-contributor/NestLink --recursive --acl public-read

Overview/Getting started using Bioconductor ExperimentHub

load metadata

fl <- system.file("extdata", "metadata.csv", package='NestLink')
kable(metadata <- read.csv(fl, stringsAsFactors=FALSE))

Query and load NestLink package data from AWS S3

library(ExperimentHub)

eh <- ExperimentHub()
query(eh, "NestLink")

load(query(eh, c("NestLink", "F255744.RData"))[[1]])
dim(F255744)

load(query(eh, c("NestLink", "WU160118.RData"))[[1]])
dim(WU160118)

Session info

Here is the compiled output of sessionInfo():

sessionInfo()

References



cpanse/NestLink documentation built on May 16, 2022, 2:33 a.m.