knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy=FALSE,
  engine='R'
)

1. Overview

Gain-Scan is an R software package implemented to integrate the protein microarray data acquired under different photomultiplier (PMT) gain settings. The integration of the multi-gain array data significantly reduces the technical variations in the protein microarray data acquisition. It avoid the trouble of selecting one single optimal PMT gain setting for imaging the arrays. It aims to achieving the goal of avoiding saturation of the strong signals while maximizing the detection of the low singles at the same time.

Besides the gainscan modeling and integration, the package currently also include other functions to preprocess/nomalize the protein array data. It can do the following,

  1. Import/read the ".GPR" input file.

  2. Background-orrect the array data.

  3. Normalize the array data using the robust linear model (RLM)

2. Theory and Model

In order to integrate the data acquired with different PMT settings, a power function model with baseline has been developed as
$$ log (Y) = \left{ \begin{array}{ll} log(\beta+\phi v^\delta)+\epsilon & \quad \mbox {for } (\beta+\phi v^\delta) \leq \Lambda \ log (\Lambda) & \quad \mbox {for } (\beta+\phi v^\delta) > \Lambda \ \end{array} \right. $$ $Y$ is the signal intensity of each feature points; $v$ is the gain voltage for acquiring the image; $\delta$ is the number of dynodes in the PMT; $\beta$ is the background signal (including the so-called dark current as well as other sources of signal independent of the light influx); $\phi$ is the light influx determined by the interaction between the feature proteins and analytes to be detected. Finally, $\Lambda$ is the maximum representable signal intensity. This is commonly expected in the analog-to-digital transformation and implemented in the hardware. In a 16-bit digital instrument signals are stored as 16-bit integer and values larger than $\Lambda=2^{16}-1$ are presented as 65535 ($2^{16}-1$)

In this mobel, we assume that the parameters $\beta$, $\phi$ and $\sigma$ are independent of the spot, constant within an array and variable among arrays. We vary the gain voltage $v$ and observe the signal intensity $Y$. Then the model is fitted on the data of $v$ and $Y$ to estimate the parameter $\phi$, the light flux. This light influx is taken to be proportional to the quantity of analyte bound to the give spot. Now, the final statistical model is written as $$ log(Y_{ij})=\left { \begin{array}{ll} log(\beta_{j}+\phi_{i} v^{\delta_{j}})+\epsilon_{ij} & \quad \mbox {for } (\beta_{j}+\phi_{j} v^{\delta_{j}})e^{\epsilon_{ij}} \leq \Lambda \ log (\Lambda) & \quad \mbox {for } (\beta_{j}+\phi_{j} v^{\delta_{j}})e^{\epsilon_{ij}} > \Lambda \ \end{array} \right. $$ $$ i=1,2,...,n; j=1,2,...,m; $$ where $i$ is the $ith$ feature spot on the array; $n$ is the total number of feature on the array; $j$ is the $jth$ array and $m$ is the total number of arrays used in the experiments.

The model is fitted using the Levenberg-Marquardt nonlinear least-squres algorithm (the MINPACK package).

3. A Short Example

1. Installation

The package can be installed from the github repository

    devtools::install_github("BULQI/gainscan")

To build the vignette and install the code from github, please incude the "build_vignettes" option,

    devtools::install_github("BULQI/gainscan", build_vignettes=TRUE)

2. Loading gainscan

After installation, the package can be loaded by

    library(gainscan)

3. Importing data

To import/read array GPR files, we rely on a design/target file. A design file is a metadata file describing the information about the GPR files.It is in a table format (tab-delimitted text file). Each row of the table is about one GPR file. It defines the file name, directory, the array name, group information, etc.The package comes with a set of sample input files. The design file can be viewed by the following r code. r gpr<-system.file("extdata", package="gainscan") targets <- list.files(path=gpr, pattern = "target.txt", full.names = TRUE) tbl.targets<-read.table(targets, header=T,sep="\t") print(tbl.targets) As indicated in the above design file, there are a set of GPR files containing the array. Those GPR files contain the data of three ProtoArrays hybridized with the isotype IgG. Each array has been imaged nine times under different photomultiplier gain voltages (ranging from 250 up to 650). For demonstration purpose, we only include in the file the data of the first two blocks of each array. To read/import those data, we need to feed in the design file as r adata.elist <- importGPR(dir.GPR=gpr, design.file=targets, type="ProtoArray")

4. Background correction (optional)

After importing GPR data, we could do the background correction to clean up the data. It removes from the feature signals the background noises, which are obtained through sampling the area surrounding the protein feature spots. In this package, we provide a wrapper function bc to achieve this. It simply calls the backgroud correction function in the limma package. r adata.elistBC <- bc(adata.elist, method="normexp", normexp.method="saddle") we can see the differences between the raw and background-corrected data by running the below code. ```r #comparing the array expression data before and after background correction cbind("before"=adata.elist$E[1:8,1], "after"=adata.elistBC$E[1:8,1])

#comparing the array control expression data before and after background correction
cbind("before"=adata.elist$C[1:8,1], "after"=adata.elistBC$C[1:8,1])

```

5. Model fitting to estimate the incident light signal with multiple gain data

As indicated above, the array data were acquired with nine different gain voltages. Therefore, a power function model with baseline could be applied to estimate incident light signals (see the section _Theory and model). We first prepare the initial values for the parameters. ```r #nine PMT gain settings pmts<-seq(250, 650, by=50)

#estimating the initial values for parameter based on data

B<-log(min(adata.elist$C)); #the initial value of the background
                    #this has to be as close as possible, otherwise might affect the fitting.
G<-2^16-1; # G is fixed

#calling to do the fitting
#optionally we could call to run fitting using the background-corrected object, adata.elistBC
p_elists<-gainAdjust_fit_Pbp(object=adata.elist, 
                            x=pmts,  
                            B=B, G=G, 
                            block.size=2, fit.mode="control",
                             residual.mode="log",
                            debug=F )

The return object contains information about model parameters and the estimated incident light signals integrating the information acquired under the multiple different PMT gain settings.r names(p_elists)

#doing plot of the density, before and after fitting.
op<-par(
            mfrow=c(2,1),
            pty="m",
            mar=c(4,4,0.1,0.5),
            mgp=c(1.5,0.5,0)
        )

 plot(density(log(adata.elist$E[,17])),main="",
    xlab="Intensity (log)",lty=1
    )
 lines(density(log(adata.elist$E[,8])),lty=1)
 lines(density(log(adata.elist$E[,26])),lty=1)
 legend(x=6.5,y=0.8, legend="signal intensities (raw)",lty=1)

 plot(density(p_elists$elist$E[,1]), main="",
    xlab="incident light Intensity (log)",lty=2
    )
 lines(density(p_elists$elist$E[,2]),lty=2)
 lines(density(p_elists$elist$E[,3]),lty=2)
 legend(x=-5.7,y=3.5, legend="incident light intensities (fitted)",lty=2)
par(op)

```

6. Normalization using the robust linear model.

The processed data now can be normalized across different arrays using the robust linear model [@Sboner2009].

    #start plotting the distribution of signal intensity
    object.pbp<-scaleByDelta(p_elists$elist, mean(p_elists$delta))  #<-scale it, not necessary, but just try
    #now do the RLM normalization
    object.pbp.norm<-normalizeArrays.RLM(data=object.pbp,controls="Anti-HumanIgG",
                    method="RLM", #only implemented RLM for now
                    log=F, log.base=exp(1), coding="Deviation"
        )

    object.raw.600<-adata.elist
    object.raw.600$E<-adata.elist$E[,c(8,17,26)];
    object.raw.600$C<-adata.elist$C[,c(8,17,26)];
    object.raw.norm<-normalizeArrays.RLM(data=object.raw.600,controls="Anti-HumanIgG",
                method="RLM", #only implemented RLM for now
                log=TRUE, log.base=exp(1), coding="Deviation"
            )
    #doing plot of the density
    op<-par(
                mfrow=c(2,1),
                    pty="m",
                    mar=c(4,4,0.1,0.5),
                    mgp=c(1.5,0.5,0)
            )

     plot(density((object.raw.norm$E[,3])),main="",
        xlab="intensity (log)",
        )
     lines(density((object.raw.norm$E[,1])))
     lines(density((object.raw.norm$E[,2])))
     legend(x=5.5,y=1.0, legend="signal intensities (raw+RLM)",lty=1)

     plot(density(object.pbp.norm$E[,1]), main="",
        xlab="incident light signal (log)",lty=2
        )
     lines(density(object.pbp.norm$E[,2]),lty=2)
     lines(density(object.pbp.norm$E[,3]),lty=2)
     legend(x=-38.5,y=0.6, legend="signal intensities (gainScan+RLM)",lty=2)
    par(op)

    #calculate the interarray CVs
    #for raw data with PMT600
    dtf.norm<-array.aggregate(object.raw.norm, log=F, method="Arithmetic")

    cDtf.norm<-dtf.norm$C
    cgDtf.norm<-dtf.norm$cgenes
    interArray_ccv.norm<-interArrayCVs(cDtf.norm, cgDtf.norm)#sqrt(apply(cDtf.norm, 1, var))/apply(cDtf.norm, 1, 

    #now do the PMT fitted and normalizatio 
    cDtf.pbp.norm<-object.pbp.norm$C
    cgDtf.pbp.norm<-object.pbp.norm$cgenes
    interArray_ccv.pbp.norm<-interArrayCVs(cDtf.pbp.norm, cgDtf.pbp.norm)#sqrt(apply(cDtf.5pl.norm, 1,      

    #show the mean CVs
    mean(interArray_ccv.norm$cv)#0.052
    mean(interArray_ccv.pbp.norm$cv)

    #start plotting boxplot                     
    interArray_ccv.norm$type<-"RLM Norm"
    ccvByType<-interArray_ccv.norm

    interArray_ccv.pbp.norm$type<-"Gain-Scan+RLM"
    ccvByType<-rbind(ccvByType,interArray_ccv.pbp.norm)

    ccvByType$type<-factor(ccvByType$type,levels<-c("RLM Norm", "Gain-Scan+RLM"))

    boxplot(ccvByType[,1]~type, data=ccvByType,log="",
        main=("InterArray variance"),
        xlab="",ylab="CV", 
        ylim=c(0,0.065),#max(ccvByType[,1])
        #),
        col=2, outline=FALSE
        );

4. Session Info

The version number of R and packages loaded for generating the vignette were

  sessionInfo()

5. Reference and Citation



BULQI/gainscan documentation built on Dec. 25, 2019, 3:17 a.m.