In msuprun/bbeaR: an R package and framework for epitope-specific antibody profiling

knitr::opts_chunk$set(echo = TRUE, message = FALSE)

About

The Bead-Based Epitope Assay (BBEA) can be used to quantify the amount of epitope- or peptide-specific antibodies (e.g., IgE) in plasma or serum samples. A detailed assay description is outlined in this publication.

Previous studies used peptide microarray technology and found that immunoglobulin (Ig)E specific to sequential epitopes of ovomucoid protein, a major allergen in egg-white, could be a marker of egg allergy diagnosis and persistence (Martinez-Botas et al (2013) and Jarvinen et al (2007)). In this study and tutorial, we evaluated if BBEA can be used to identify IgE-binding epitopes. We analyzed levels of IgE to 58 peptides (15-mer, 12 amino acid overlap), covering the entire sequence of hen’s egg-white ovomucoid protein, measured using BBEA in 38 allergic children and 6 controls.

We will start by installing bbeaR and reading in the BBEA's raw data (alternatively, one can load an R dataset that comes with this package). We will look at several quality control (QC) measures and normalize the data. Then we will create a topology plot, to highlight immunodominant regions on the ovomucoid protein. At the end, we will demonstrate an approach to identify sequential epitope-specific (ses)IgE antibodies that are different between allergic children and controls, using limma modeling framework.

Installation

bbeaR is an R package and is installed using a standard GitHub code:

library(devtools)
install_github('msuprun/bbeaR')

Loading additional packages that will be used in the analyses.

library(bbeaR)
library(plyr)
library(stringr)
library(ggplot2)
library(gridExtra)
library(pheatmap)
library(RColorBrewer)
library(limma)

Raw Data Import

44 patient samples were assayed in duplicates, ran within the same batch on one 96-well plate. The run additionally included two positive control pools (aka "PP", a mix of plasma from several allergic patients), a negative pool (aka "NP", a mix of plasma from several healthy patients), and 2 wells without any sample (aka "Buffer", for a background quantification).

The original .csv files from the Luminex-200 assay, generated with the xPONENT® software, can be downloaded here.
The .csv sheet consists of several sections, of which "Median", "Count", and "NetMFI" hold the assay's readout values for each epitope-specific antibody (j) in columns and each sample (i) in rows: as shown below

First, we need to create a plate layout using the create.plate.db() function. This layout will be used in the import and some of the plotting functions. The only input to this function is the direction of the plate read (horizontal or vertical) that was used by the Luminex instrument.

l <- create.plate.db(direction = "horizontal")
plate.design.db <- l$plate.design.db
plate.design <- l$plate.design
plate.design

Then read a .csv file of the one plate.
Note: a separate tutorial (Milk Allergy Example) has an example of importing and processing multiple plates.

bbeaEgg <- bbea.read.csv(fname = "BBEA_OVM_58p_IgE_patients.csv")

The bbea list object contains several elements, extracted from the assay's output.

names(bbeaEgg)

The Median, NetMFI, and Count are matrices with rows as Analytes (epitopes) and columns as Samples. Note: this is changed from the original data layout, where samples were in rows and epitopes in columns, to make the dataset usable with common analytical tools in R.

The Median are the Median Fluorescence Intensities (MFIs) and the NetMFI are the Medians normalized to background. In our example, Median and NetMFI have exactly same values, since normalization was not selected during the assay run. This post has more details about the Luminex outputs.

bbeaEgg$Median[1:5, 1:2]

The Count are the numbers of beads counted per analyte, and is important quality control measure.

bbeaEgg$Count[1:5, 1:2]

The AssayInfo saves parameters of the assay.

bbeaEgg$AssayInfo[1:15, 1:2]

Finally, bbea.read.csv() creates a phenotype (p)Data that has some basic information about the samples and the assay run.

colnames(bbeaEgg$pData)
bbeaEgg$pData[1:2, 1:2]

We can now add external information about the samples and analytes.
We will load already imported data that includes phenotype (clinical) data PDegg and an annotation file AnnotEgg. Note: this will also load a bbea list object (generated in the previous steps).

data(Egg)

The annotation AnnotEgg dataset contains the mapping of Luminex beads (Analytes) to the peptides/epitopes.

dim(AnnotEgg)
head(AnnotEgg)

Changing the bbeaEgg object to have peptide names instead of analyte numbers.

bbeaEgg <- bbea.changeAnnotation(bbea.obj = bbeaEgg, 
                              annotation = AnnotEgg,
                              newNameCol = "Peptide", 
                              AnalyteCol = "Analyte")
bbeaEgg$Median[1:5, 1:2]

The PDegg dataset has clinical information about our samples.

dim(PDegg)
head(PDegg)

We will clean up the pData and then merge it with PDegg. This step is not necessary, but will be useful when we do statistical modelling.

bbeaEgg$pData <- mutate(bbeaEgg$pData, 
                     PTID = sapply(strsplit(Sample, "\\-"), head, 1),
                     SampleType = ifelse(grepl("Buff", Sample),"Buffer",
                                         ifelse(grepl("PP|NP", Sample),
                                                sapply(strsplit(Sample, "\\-"),head, 1), "Patient")))
PDm <- merge(bbeaEgg$pData, PDegg, by = "PTID", all.x = T)
rownames(PDm) <- PDm$File
bbeaEgg$pData <- PDm[colnames(bbeaEgg$Median),] # making sure samples are arranged correctly

Quality Control of the Raw Data

Layout of the experimental plate.
Tip: to make sure no position bias is present in the data, samples have to be randomized among wells and plates, as outlined in this publication.

bbea.QC.Image(bbeaEgg$pData,
              filename = "QC.Image.",
              plate.design.db = plate.design.db)

We want to make sure that there are no missing samples or analytes. This would be reflected by the very low counts (<25).
The heatmap shows that all samples and analytes were included in the experimental run.

bbea.QC.heatmap.counts(bbeaEgg,
                       getlog2 = FALSE,
                       filename = "QC.CountsHeatmap.pdf",
                       plateVar = "Plate", ann = NULL)

Now we look at the overall distribution of counts: the minimum count is ~69, which is pretty good.

p<-bbea.QC.Samples(bbeaEgg,
               filename = "QC.",
               plateVar = 'Plate',
               gt = 25)
grid.arrange(p$pmin, p$pavg, nrow=1)

Our counts look good, so we don't need to exclude any samples. However, if this were not the case, samples with low counts can be removed using the bbea.subset() function, only keeping samples with average counts > 25.

bbea.sub <- bbea.subset(bbeaEgg, statement = (bbeaEgg$pData$CountMean > 25))

Data Normalization

We convert the Median to normalized nMFI by taking the log~2~ of values and subtracting the average of the background wells. Note: the Sample column of the pData should include a "Buffer" string in the wells dedicated for the background.

bbeaEgg$pData$Sample # samples 71 & 72 are the background wells

bbeaN <- MFI2nMFI(bbeaEgg, 
                  offset = 0.5, # a constant to add to avoid taking a log of 0
                  rmNeg = TRUE) # if a value of a sample is below background, assign "0" 
names(bbeaN)
bbeaN$nMFI[1:5, 1:2]

R object of class ExpressionSet eset is convenient for a high-throughput data analysis.
bbea object can be converted to eset:

eset <- nMFI2Eset(nMFI.object = bbeaN)
eset

QC of Normalized Data

Distributions

Overall distribution:
Since there are different levels of the antibody to each peptide, the data will be scaled before plotting. Y-axis of the boxplot represents a mean MFI or nMFI of 50 scaled peptides. The plots show that IgE to sequential peptides, as expected, is higher in egg allergic patients and positive pools (PP), but not background wells (buffer) or negative control pool (NP).

adjMFI <- t(apply(as.matrix(bbeaEgg$Median), 1, function(x){x - mean(x, na.rm=T)}))
adjnMFI<-t(apply(as.matrix(exprs(eset)), 1, function(x){x - mean(x, na.rm=T)}))

par(mfrow = c(1, 2)) 
boxplot(colMeans(adjMFI) ~ bbeaEgg$pData$SampleType,
        xlab = " ", ylab = "MFI")
boxplot(colMeans(adjnMFI) ~ eset$SampleType,
        xlab = " ", ylab = "normalized MFI")

Cullen-Frey plots can be used to evaluate the distribution of the data. It shows how the skewness and kurtosis of our data compare to the theoretical distributions.
CullenFreyPlot() function is a wrapper of the fitdistrplus::descdist().

CullenFreyPlot(adjMFI, filename = "QC.CullenFrey.MFI")
CullenFreyPlot(adjnMFI, filename = "QC.CullenFrey.nMFI")

We can see that while both MFI and nMFI data are skewed, nMFI data is closer to log-normal rather than exponential distributions.

Technical Replicates

Intraclass Correlation Coefficient (ICC) is used to evaluate agreement among technical replicates for each peptide, where 0 means no agreement, and 1 is a perfect agreement.
The function returns the ICC and a 95% confidence interval.

icc.db <- getICCbyPeptide(eset, 
                          UR=c("PTID","Plate")) # what is the unit of replication? in this case, it is patient + plate
head(icc.db)

ggplot(icc.db, aes(x = Peptide, y = ICC)) +
  geom_hline(yintercept = 0.7, color = "grey50", linetype = 2) +
  geom_point() +
  geom_errorbar(aes(ymax = UCI, ymin = LCI),width = 0.5) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.1)) +
  labs(x = "", y = ("ICC [95% CI]"), title = "ICC across replicates") +
  theme_bw() + coord_flip()

Coefficient of Variation (CV) is used to estimate variability of the replicates. Generally, for biological assays the desired CV is below 20%.

cv.db <- getCVbyPeptide.MFI(bbeaEgg, 
                            UR = c("PTID","Plate")) # what is the unit of replication? in this case, it is patient + plate

head(cv.db)

ggplot(cv.db, aes(x = Peptide,y = mean.cv)) +
  geom_hline(yintercept = 20, color = "grey50", linetype = 2) +
  geom_point() +
  geom_errorbar(aes(ymax = mean.cv + sd.cv, ymin = mean.cv - sd.cv),width = 0.5) +
  scale_y_continuous(limits = c(-10,100),breaks = seq(-10,100,10)) +
  labs(x = "", y = expression("Average %CV "%+-%"SD"), title = "CV across replicates") +
  theme_bw() + coord_flip()

Averaging Technical Replicates

If there are no samples to exclude based on ICC, CV and QC of counts, technical replicates can be averaged for the downstream analyses.

eset.avg <- getAveragesByReps(eset, UR = 'PTID')
eset.avg
exprs(eset.avg)[1:5,1:3]

pheatmap(exprs(eset.avg), scale = "row", fontsize = 6,
         annotation_col = subset(pData(eset.avg), select = Group))

The heatmap shows that few of the patients have higher IgE levels to several ovomucoid peptides. This is consistent with a previous study (Martinez-Botas et al 2013) that used peptide microarrays to show heterogeneity of ses-IgE responses among egg allergic patients. Higher levels or diversity (heatmap below) of ses-IgE could be associated with allergy persistence (Jarvinen et al 2007) , however, a lack of detailed clinical data prevents further conclusions.

Another useful statistic of IgE levels is a binary value of whether the IgE an epitope is "present", which can also help evaluate ses-IgE diversity. ses-IgE can defined as "present" if the MFI 2-3 standard deviations above the background (aka "Buffer" wells). Sometimes it might be more important to compare the MFI to negative control samples rather than the buffer. This can be easily achieved by specifying a desired string (e.g., "NegativeControl" or "NP") in the buffer.name argument of binarizeMFIbySD() function.

binary.db <- binarizeMFIbySD(bbea.object = bbeaEgg,
                             buffer.name = "NP", # a string in the "Sample" column identifying reference sample or background
                             plateVar = "Plate", # column indicating batch/plate
                             UR = 'PTID', # unique sample ID
                             nSD = 3) # numner of standard deviaitons above the buffer
sampleNames(binary.db) <- sapply(strsplit(sampleNames(binary.db), "_"),tail, 1)
# can remove background (Buffer) wells
binary.db <- binary.db[, !grepl("Buffer", sampleNames(binary.db))]

pheatmap(exprs(binary.db), fontsize = 6, 
         annotation_col = subset(pData(eset.avg)[sampleNames(binary.db),], select = Group),
         color = c("azure2", "darkslateblue"))

The blue color represents epitope-specific IgE antibodies detected in our samples. Several patients (right heatmap cluster) have higher ses-IgE diversity, which could potentially be a marker of egg allergy persistence.

Differential Analysis - Limma Modeling

We are interested in identifying epitope-specific IgE antibodies that are different between egg allergic patients and atopic controls. Atopic controls are patients that are not allergic to egg but have other types of allergies.

For this, we will use a limma framework where a linear model is fit to every peptide.

# removing control samples
eseti <- eset.avg[, which(!grepl("NP|PP", eset.avg$PTID))]

design <- model.matrix(~ Group, data = pData(eseti))
colnames(design) <- make.names(colnames(design))
fit <- lmFit(eseti, design)
# means 
contr_m <- makeContrasts(Controls    = X.Intercept.,
                         EggAllergic = X.Intercept.  + GroupEgg.Allergy, levels = design)
# differences 
contr_d <- mutate(as.data.frame(contr_m), Allergic_vs_Controls = EggAllergic - Controls)
contr_d <- subset(contr_d, select=Allergic_vs_Controls)
fitm <- eBayes(contrasts.fit(fit,contr_m)) # marginal group means
ebfit <- eBayes(contrasts.fit(fit,contr_d), trend=TRUE) # lgFCH allergic vs controls
par(mfrow = c(1, 1)) 
plotSA(ebfit) # this plot shows that we have a variance trend, so we should keep trend=TRUE in the eBayes function
D <- decideTests(ebfit, method = "separate", adjust.method = "none", 
                 p.value = 0.05, lfc = log2(1))
t(summary(D)) # number of significant peptides

# Table of top epitope-specific IgE antibodies
topTable(ebfit, number=10)

We have identified 7 epitope-specific IgE antibodies that are higher in the allergic group. In this example, we have a small control group (n=6), so the p-values were not corrected for multiple comparison.

Significant epitope-specific IgE can be visualized using the "net circle" plot.
Area of the circle is proportional to the absolute value of the difference (Allergic vs Controls). Color of the circle indicates the direction (Higher/Up=red, Lower/Down=blue, Non-Significant (NS)=grey) and significance. User needs to make sure that the Annotation file has a column named "lableName", as it will be used to print names of the epitopes.

Run_doNetCirclePlot(ebfit, D, AnnotEgg, fname = "lmFit.EAvsAC.")

Protein Topology Plot

We can now use our experimental data to overlay it on the amino acid sequence of the protein. Ovomucoid doesn't have a complete 3D crystal structure, so a "topology" plot can be helpful in identifying specific areas on a protein that are recognized by patients' IgE.

First, we need to create a protein layout. We will use drawProteins package to download information about the protein sequence from the UniProt database. The meta data includes amino acid (aa) positions of SIGNAL and CHAIN sequences, as well as information about reactive sites, glycosylation, disulfate bridges.

json <- drawProteins::get_features("P01005")
ovm_p01005_meta <- drawProteins::feature_to_dataframe(json)
# Ovomucoid has a signal peptide sequence, that was not part of the peptides used in the assay.
ovm_signalEnd <- ovm_p01005_meta$end[which(ovm_p01005_meta$type == "SIGNAL")]
ovm_p01005_meta <- mutate(ovm_p01005_meta,
                          beginMinusSignal = begin - ovm_signalEnd,
                          endMinusSignal = end - ovm_signalEnd)
# complete aa sequence
ovm_p01005_aaseq <- json[[1]]$sequence
# removing signal peptide
ovm_p01005_aaseqCHAIN <- substr(ovm_p01005_aaseq, ovm_signalEnd + 1, (nchar(ovm_p01005_aaseq)))

getTopologyPlotDB() function will construct a dataframe that can be used for plotting. The getEnzymeCuts() function is based on the R package cleaver and can add sites of enzymatic cuts (by trypsin, pepsin, and chymotrypsin).

ovm_plotDB <- getTopologyPlotDB(ovm_p01005_aaseq, # a character string with amino acid sequence
                                ymax = 20, # number of amino acids in one column 
                                offset = 3, # the first few amino acids to be below the rest of the plot
                                meta = TRUE, # do we have the metadata
                                metadb = ovm_p01005_meta) # metadata dataframe

ovm_plotDB <- getEnzymeCuts(ovm_p01005_aaseq, # a character string with amino acid sequence
                            topo = TRUE, # do we have a dataframe generated with getTopologyPlotDB()
                            topodb = ovm_plotDB, # name of the getTopologyPlotDB() dataframe
                            rmsignal = TRUE, # remove the signal peptide chain?
                            signalend = 24) # at what residue does the signal chain end

ovm_plotDB<-mutate(ovm_plotDB, 
                   EnzymeRmSignal = ifelse(is.na(EnzymeRmSignal), "", EnzymeRmSignal))

makeTopologyPlotBase(subset(ovm_plotDB, type != "Signal")) +
  geom_point(size = 5.5, color='black', aes(shape = glycosylation)) +
  geom_point(size = 5, aes(color = disulf, shape = glycosylation)) +
  scale_shape_manual(values = c(19, 0, 11), labels = c("No", "Yes")) +
  scale_color_manual("bridge", values = c("grey61", "coral"), labels = c("None","SS")) +
  geom_text(aes(label = AA), size = 3) +
  labs(title = paste("Ovomucoid (P01005)")) +
  geom_text(data = subset(ovm_plotDB, EnzymeRmSignal != ""), aes(label = EnzymeRmSignal),
            size = 2, vjust = -1.7, angle = 90)

The plot shows that several Asparagine (N) residues are glycosylated; the Cysteine residues colored in orange form disulfide bridges; and there are many potential enzymatic cleavage sites (P=Pepsin; T=Trypsin, Ch=Chymotrypsin).

Now we will overlay this topology plot with our experimental data to see if any residues stand out.

# Getting sequences of the overlapping peptides (overlap by 12 aa, offset 3)
peptDB_OVM <- do.call("rbind",sapply(seq(1, nchar(ovm_p01005_aaseqCHAIN), 3), 
                                     function(i){
                                       substr(ovm_p01005_aaseqCHAIN, i, 14 + i)
                                       }, simplify = FALSE))
peptDB_OVM <- subset(mutate(as.data.frame(peptDB_OVM),
                            Peptide = paste0("OVM-", str_pad(1:nrow(peptDB_OVM), 3, pad = "0")),
                            Peptide.Number = 1 : nrow(peptDB_OVM)),
                     nchar(as.character(V1)) == 15)
colnames(peptDB_OVM)[1] <- "Sequence"

# generate topology dataframe by using the means of the egg allergic patients
dbt <- getTopologynMFI(eset = eset.avg[,which(eset.avg$Group == "Egg Allergy")],
                       peptideDB = peptDB_OVM,
                       topoDB = ovm_plotDB,
                       endSignal = ovm_signalEnd)
# assigning percentiles to the average level for each peptide
dbt$avgp <- cut(dbt$avg, 
                breaks = quantile(dbt$avg, probs=seq(0, 1, by = 0.25), na.rm = TRUE),
                include.lowest = TRUE, labels = c("<25%", "25-50%", "50-75%", ">75%"))

makeTopologyPlotBase(subset(dbt, type != "Signal")) +
  geom_point(size = 5.5, color = 'black',aes(shape = glycosylation)) +
  geom_point(size = 5, aes(color = avgp, shape = glycosylation)) +
  scale_color_brewer("Percentile of the mean level", palette = "OrRd") +
  scale_shape_manual(values = c(19, 15), labels = c("No", "Yes")) +
  geom_text(aes(label = AA), size = 2) +
  geom_text(data = subset(dbt,EnzymeRmSignal != ""), aes(label = EnzymeRmSignal),
            size = 2, vjust = -1.7, angle = 90) +
  labs(title = "Mean nMFI's percentile for each amino acid")

The topology plot shows that most IgE antibodies of our egg allergic patients were directed to the several groups of amino acids in the N-terminal of the ovomucoid protein. This way we have identified IgE directed at sequential peptides associated with egg allergy, constituting immunogenic IgE epitopes previously identified by several other groups that used different epitope mapping technologies (Cooke et al (1997), Shreffler et al (2005), Jarvinen et al (2007)).

msuprun/bbeaR documentation built on Jan. 31, 2021, 6:49 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com