knitr::opts_chunk$set(
  #collapse = TRUE,
  comment = "#>",
  fig.width = 4,
  fig.height = 4,
  message = FALSE,
  warning = FALSE,
  tidy.opts = list(
    keep.blank.line = TRUE,
    width.cutoff = 150
  ),
  options(width = 150),
  eval = TRUE
)

In this vignette, we explore the effect of using normalized specific amount (NSA), relative specific amount (RSA), and relative amount (Acup) data, and their log transformations on CPA. We also discuss the underlying reasons why the different transformations can yield different results, which may be useful in choosing the appropriate transformation for a particular protein distribution. We illustrate using two mixtures: Cyto with Lyso and Cyto with Nuc. We begin by creating the NSA profiles for the reference compartments, which we will use for further transformations and to create mixtures.

Applying CPA to relative specific amount (RSA), normalized specific amount (NSA), or relative amount (Acup) data derived from simulated mixtures of Cyto and Lyso

For clarity of presentation, we simplify embedded data sets by dropping experiment-level notation:

library(protlocassign)
data(protNSA_AT5tmtMS2)
data(totProtAT5)
protNSA <- protNSA_AT5tmtMS2
totProt <- totProtAT5
data(markerListJadot)
refLocationProfilesNSA <- locationProfileSetup(profile=protNSA, 
                            markerList=markerListJadot, numDataCols=9)
refLocationProfilesAcup <- AcupFromNSA(refLocationProfilesNSA, 
                            NstartMaterialFractions=6, totProt=totProt)

Now create markers using refLocationProfilesAcup compartments Cyto (row 1) and Lyso (row 4) in the standard way, and transform the mixtures to relative specific amounts:

i=1
j=4
mixProt1Prot4Acup <- proteinMix(refLocationProfilesAcup, Loc1=i, Loc2=j)
mixProt1Prot4RSA <- RSAfromAcup(Acup=mixProt1Prot4Acup, 
                            NstartMaterialFractions=6, totProt=totProt)
# increment.prop is not specified so we use the default value of 0.1

As in vignette 3, we display the Acup and RSA profiles for these Cyto-Lyso mixtures:

round(mixProt1Prot4Acup, digits=3)
round(mixProt1Prot4RSA, digits=3)

Next, obtain CPA estimates using the RSA-transformed mixtures and RSA-transformed references profiles. We see that the CPA estimates match the proportions we used to create the mixtures:

refLocationProfilesRSA <- RSAfromNSA(NSA=refLocationProfilesNSA, 
                          NstartMaterialFractions=6,
                          totProt=totProt)
mixProt1Prot4CPAfromRSA <- fitCPA(profile=mixProt1Prot4RSA, 
                      refLocationProfiles=refLocationProfilesRSA, 
                      numDataCols=9)
round(mixProt1Prot4CPAfromRSA, digits=3)

We may obtain the normalized specific amounts transformation of mixProtiProtjRSA by using the NSAfromRSA function:

mixProt1Prot4NSA <- NSAfromRSA(mixProt1Prot4RSA)
round(mixProt1Prot4NSA, digits=3)

We next apply the CPA routine to these normalized specific amount (NSA) profiles:

mixProt1Prot4CPAfromNSA <- fitCPA(profile=mixProt1Prot4NSA,
                             refLocationProfiles=refLocationProfilesNSA,
                             numDataCols=9)
round(mixProt1Prot4CPAfromNSA, digits=3)

Note that the results assign the simulated multi-compartment proteins to the correct Cyto and Lyso compartments, but the estimates of the proportions deviate somewhat from those used to for the simulations.

Now, instead of transforming the Acup mixtures to RSA's or using NSA's, what happens if we just use the Acup mixtures themselves, i.e., the relative amounts? There are significant departures of the CPA estimates, including assignments to compartments not used in the simulations from the proportions used to create the mixtures:

mixProt1Prot4CPAfromAcup <- 
                  fitCPA(profile=mixProt1Prot4Acup,
                          refLocationProfiles=refLocationProfilesAcup, 
                         numDataCols=9)
round(mixProt1Prot4CPAfromAcup, digits=3)

Displaying the Acup values using refLocationProfilesAcup helps explain these discrepancies. Note that the three Nyc columns, which are important for classifying lysosomal proteins, are very small, which effectively down-weights their importance in the CPA procedure:

round(refLocationProfilesAcup, digits=4)

Plots of the CPA estimated vs actual mixture proportions can be generated as described in the previous vignette. As before, the x-coordinate represents the theoretical distribution based on simulation parameters and the y-coordinate represents the predicted values based on CPA. The assignment errors are show in parentheses.

par(mfrow=c(1,3))
# In the following, the argument increment.prop is not specified so
#  we use the default value of 0.1
mixturePlot(mixProtiProtjCPA=mixProt1Prot4CPAfromRSA, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j, 
            errorReturn = TRUE, subTitle="RSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot4CPAfromNSA, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="NSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot4CPAfromAcup, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j, 
            errorReturn = TRUE, subTitle="Acup")

We may obtain only the areas indicating the prediction error by using the mixtureAreaError function.

# As discussed earlier, the argument increment.prop=0.1 does not
#  need to be specified because it is the default
#  and is consistent with the previously generated mixtures.
mixtureAreaError(mixProtiProtjCPA=mixProt1Prot4CPAfromRSA, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j)
mixtureAreaError(mixProtiProtjCPA=mixProt1Prot4CPAfromNSA, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j)
mixtureAreaError(mixProtiProtjCPA=mixProt1Prot4CPAfromAcup, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j)

Applying CPA to log transformations of Cyto and Lyso mixtures

As an alternative, let us apply log2 transformations to the different types of profile data. The log transformation results in a marked improvement for the Acup data and a modest improvement for the NSA data, and poorer results for the RSA data. In all cases, the log transformed data do not provide CPA estimates that are as accurate as were obtained using the RSA transformation alone:

eps <- 0.001


mixProt1Prot4CPAfromRSAlog2 <- fitCPA(profile=log2(mixProt1Prot4RSA + eps), 
                      refLocationProfiles=log2(refLocationProfilesRSA + eps), 
                      numDataCols=9)
round(mixProt1Prot4CPAfromRSAlog2, digits=3)

mixProt1Prot4CPAfromNSAlog2 <- 
            fitCPA(profile=log2(mixProt1Prot4NSA + eps),
                refLocationProfiles=log2(refLocationProfilesNSA + eps),
                numDataCols=9)
round(mixProt1Prot4CPAfromNSAlog2, digits=3)

mixProt1Prot4CPAfromAcupLog2 <- 
            fitCPA(profile=log2(mixProt1Prot4Acup + eps),
              refLocationProfiles=log2(refLocationProfilesAcup + eps),
              numDataCols=9)
round(mixProt1Prot4CPAfromAcupLog2, digits=3)

Following are plots of the CPA estimated vs actual mixture proportions for simulated proteins where the different profiles (RSA, NSA, and Acup) are log2-transformed.

par(mfrow=c(1,3))
mixturePlot(mixProtiProtjCPA=mixProt1Prot4CPAfromRSAlog2, 
            NstartMaterialFractions=6,
            Loc1=i, Loc2=j, errorReturn = TRUE,
            subTitle="Log2 RSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot4CPAfromNSAlog2, 
            NstartMaterialFractions=6, 
            Loc1=i, Loc2=j, errorReturn = TRUE,
            subTitle="Log2 NSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot4CPAfromAcupLog2, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="Log2 Acup")

Applying CPA to RSA, NSA, or Acup data derived from simulated mixtures of Cyto and Nuc

In the previous sections, we showed that one can apply CPA to Cyto-Lyso mixtures using a range of transformations, and we saw that the best results were obtained using RSA values, and the worst results from using relative amounts (Acup transformations). Then we saw that log-transformations of the relative amounts improved the quality of the estimates considerably but decreased accuracy of RSA estimates.

In this section we consider Cyto and Nuc mixtures, which we generate as described above. We shall see that here, Acup-transformed values produce good CPA estimates, just as RSA-transformed values. The reason is that, unlike with Lyso, the Nuc (and Cyto) profiles do not depend on the Nyc portions of the values, and thus the estimated subcellular residence proportions do not suffer from the extremely small Nyc profile values.

Here are the CPA estimates from RSA-transformed profiles:

i=1  # Cyto
j=6  # Nuc
mixProt1Prot6Acup <- proteinMix(refLocationProfilesAcup, Loc1=i, Loc2=j)
mixProt1Prot6RSA <- RSAfromAcup(Acup=mixProt1Prot6Acup, 
                          NstartMaterialFractions=6, totProt=totProtAT5)


mixProt1Prot6CPAfromRSA <- fitCPA(profile=mixProt1Prot6RSA, 
            refLocationProfiles=refLocationProfilesRSA, numDataCols=9)
round(mixProt1Prot6CPAfromRSA, digits=3)

The simulated proportions are estimated very accurately.

Following Vignette 3, we may see what happens if we apply the CPA routine to profiles containing NSA data:

mixProt1Prot6NSA <- NSAfromRSA(mixProt1Prot6RSA)
mixProt1Prot6CPAfromNSA <- 
                    fitCPA(profile=mixProt1Prot6NSA,
                    refLocationProfiles=refLocationProfilesNSA, 
                    numDataCols=9)
round(mixProt1Prot6CPAfromNSA, digits=3)

The estimates of the proportions deviate somewhat from those used to generate the mixtures.

Now do this using the relative amounts for markers (“Acup markers”) and simulated protein mixtures (Acup) instead of RSA-transformed values.

mixProt1Prot6CPAfromAcup <- fitCPA(profile=mixProt1Prot6Acup,
                             refLocationProfiles=refLocationProfilesAcup, 
                             numDataCols=9)

Note that, unlike with the Cyto-Lyso mixture, the estimates using Acup profiles of the Cyto-Nuc mixture are very accurate, and superior to those using NSA profiles:

round(mixProt1Prot6CPAfromAcup, digits=3)

The CPA for these different transformations can be plotted versus the true proportions as before:

par(mfrow=c(1,3))
mixturePlot(mixProtiProtjCPA=mixProt1Prot6CPAfromRSA, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="RSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot6CPAfromNSA, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="NSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot6CPAfromAcup, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="Acup")

Applying CPA to log transformations of Cyto and Nuc mixtures

Now try a log2 transformation on different types of simulated protein and marker profiles (RSA's, NSA's, and Acup). Note that in all cases, the CPA values using log-transformed data are less accurate than when using untransformed data.

eps <- 0.001

mixProt1Prot6CPAfromRSAlog2 <- 
           fitCPA(profile=log2(mixProt1Prot6RSA + eps), 
                refLocationProfiles=log2(refLocationProfilesRSA + eps),
                numDataCols=9)
round(mixProt1Prot6CPAfromRSAlog2, digits=3)

mixProt1Prot6CPAfromNSAlog2 <- 
           fitCPA(profile=log2(mixProt1Prot6NSA + eps),
                refLocationProfiles=log2(refLocationProfilesNSA + eps), 
                numDataCols=9)
round(mixProt1Prot6CPAfromNSAlog2, digits=3)


mixProt1Prot6CPAfromAcupLog2 <- 
           fitCPA(profile=log2(mixProt1Prot6Acup + eps),
               refLocationProfiles=log2(refLocationProfilesAcup + eps), 
               numDataCols=9)
round(mixProt1Prot6CPAfromAcupLog2, digits=3)

Following are plots of the CPA estimated vs actual proportions using the log2-transformed profiles.

par(mfrow=c(1,3))
mixturePlot(mixProtiProtjCPA=mixProt1Prot6CPAfromRSAlog2, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="Log2RSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot6CPAfromNSAlog2, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="Log2NSA")
mixturePlot(mixProtiProtjCPA=mixProt1Prot6CPAfromAcupLog2, 
            NstartMaterialFractions=6, Loc1=i, Loc2=j,
            errorReturn = TRUE, subTitle="Log2Acup")

Transformation fitting error heatmaps

We may visualize the area-based errors for all pairwise mixtures using the mixtureHeatMap function, which requires prior installation of the plot.matrix R library. For each pair of compartments, this function first creates the mixtures as described earlier using the Acup markers. Then it computes the three transformations we have discussed, as well as the log2 transformations of these. These values are presented as a 2 by 3 array, with the three transformations as columns (RSA, left; NSA, center; Acup, right). The top row uses the original values (identity transformation), and the bottom row uses a log2 transformation of these values. The prediction errors are listed in each box with larger errors indicated by darker colors. These 2 by 3 heat maps are arranged in an upper triangular array, with each entry corresponding to a mixture of the indicated row and column compartments.

This function requires additional packages which should be installed prior to running the mixtureHeatMap function:

install.packages(c("plot.matrix", "viridis", "grid", "gridExtra"))

In addition to plotting a heat map, the mixtureHeatMap function returns a two by three matrix of total errors, which comprise a sum across all 28 tables, which we write to a variable named errorMatAll:

par(mfrow=c(1,1))
errorMatAll <- mixtureHeatMap(Acup=refLocationProfilesAcup, totProt=totProt)

We can view these total errors as follows:

errorMatAll 

Alternatively, we can visualize errorMatAll as a heatmap, with darker colors indicating greater errors:

#library(plot.matrix)
#library("viridis")
op <- par(mar=c(4,4,1,1))  # save default parameters in variable op

col <- rev(viridis::magma(12))
plot(errorMatAll, col=col, breaks=seq(0, 12, 1), key=NULL, main="",
     axis.col=NULL, axis.row=NULL, 
     xlab="RSA                        NSA                       Acup", 
     ylab="Log2 Identity", digits=2, cex=2, cex.lab=1.1)
par(op) # restore default parameters

These heatmaps demonstrate that for all combinations of compartments using relative specific amounts without any further log transformation for CPA results in the most accurate estimates of the true mixtures.

Plotting all pairs of mixtures

We may use the function mixturePlotPanel to make these plots for all 8*7/2 = 28 possible pairs of mixtures. The function does this for CPA values using different types of mixture profiles, specifying with the argument fitType (e.g., fitType = “RSA”, “NSA” or “Acup”). An option is also available to transform either of these using y = log2(x + eps), where eps is a small number; by default, eps = 0.001, by specifying log2Transf=TRUE. The function also can output area-based errors for all 28 mixture pairs by specifying the option errorReturn = T. causes the function to return a table of all area-based errors. If specifying this option, we typically write the output to a data frame, which is named errorAll in the example below.

We execute the function as follows: (one may first need to open a 7 by 11 window using the "windows" command):

# If the function produces a figure margins error message, 
#    open a 7 by 11 window:
# windows(width=7, height=11)

errorAllRSAlinear <- mixturePlotPanel(refLocationProfilesAcup=
                                        refLocationProfilesAcup, 
              totProt=totProtAT5, NstartMaterialFractions=6, errorReturn = T, 
              fitType="RSA", log2Transf=FALSE)

We then can display a table of the errors.

errorAllRSAlinear

By summing the third column of errorAll (the errors), we obtain a global measure of error.

sum(errorAllRSAlinear[,3])

Appendix: CPA plots for all combinations of fit type and functional transformation

Here we present code that will produce the remaining combinations of fit type (RSA, NSA, and Acup) and log transformation (True or False). First is RSA with log-transformed values (linear RSA having been plotted previously in the vignette).

fitType <- "RSA"
log2Transf <- TRUE


errorAllRSAlog2 <- mixturePlotPanel(refLocationProfilesAcup=
              refLocationProfilesAcup, totProt=totProtAT5, 
              NstartMaterialFractions=6, errorReturn = TRUE, 
              fitType=fitType, log2Transf=log2Transf, eps=0.001)



fitType <- "NSA"
log2Transf <- FALSE

errorAllNSAlinear <- mixturePlotPanel(refLocationProfilesAcup=
                               refLocationProfilesAcup,
              totProt=totProtAT5, NstartMaterialFractions=6,
              errorReturn = T, 
              fitType=fitType, log2Transf=log2Transf)

fitType <- "NSA"
log2Transf <- TRUE


errorAllNSAlog2 <- mixturePlotPanel(refLocationProfilesAcup=
                   refLocationProfilesAcup, totProt=totProtAT5, 
                   NstartMaterialFractions=6, errorReturn = TRUE, 
                   fitType=fitType, log2Transf=log2Transf, eps=0.001)

fitType <- "Acup"
log2Transf <- FALSE


errorAllAcupLinear <- mixturePlotPanel(refLocationProfilesAcup=
              refLocationProfilesAcup, totProt=totProtAT5, 
              NstartMaterialFractions=6, errorReturn = TRUE, 
              fitType=fitType, log2Transf=log2Transf)

fitType <- "Acup"
log2Transf <- TRUE


errorAllAcupLog2 <- mixturePlotPanel(refLocationProfilesAcup=
              refLocationProfilesAcup, totProt=totProtAT5, 
              NstartMaterialFractions=6, errorReturn = TRUE, 
              fitType=fitType, log2Transf=log2Transf, eps=0.001)

If one needs to prepare the plots as pdf files which are written to an external directory, first set the working directory, e.g.,

setwd("c:\\temp\\myProteinOutput")

To output the plots to a pdf file, set things up as follows:

pdf(file="CPA assignProts.pdf", width=7, height=11)

Then issue the desired calls to mixturePlotsPanel as above. All plots requested will be written to the same file. Finally, close out the pdf file to complete writing it out:

dev.off()

We may show the pairwise and overall errors for all transformations as follows:

errorAllRSAlinear
sum(errorAllRSAlinear[,3])

errorAllRSAlog2
sum(errorAllRSAlog2[,3])

errorAllNSAlinear
sum(errorAllNSAlinear[,3])

errorAllNSAlog2
sum(errorAllNSAlog2[,3])

errorAllAcupLinear
sum(errorAllAcupLinear[,3])

errorAllAcupLog2
sum(errorAllAcupLog2[,3])


mooredf22/protlocassign0p1p1 documentation built on Feb. 7, 2022, 1:55 a.m.