knitr::opts_chunk$set(
  #collapse = TRUE,
  comment = "#>",
  fig.width = 4,
  fig.height = 4,
  message = FALSE,
  warning = FALSE,
  tidy.opts = list(
    keep.blank.line = TRUE,
    width.cutoff = 150
    ),
  eval = TRUE
)
# set the console print width globally, outside the chunk-option list
options(width = 150)

Introduction

In Vignette 1 (“Getting Started”), we illustrated the principles of constrained proportional assignment (CPA) using protein profiles that represent mass spectrometry data from a set of subcellular fractions. These profiles were expressed as normalized specific amounts (NSAs). NSA profiles are obtained by analyzing equal amounts of total protein from each fraction, and each profile is constrained so that its values sum to 1 across fractions. This vignette describes how to use functions in the protlocassign package to transform NSA profiles into a form that is more appropriate for inferring proportional residence in subcellular compartments.
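The following is a minimal base R sketch of the NSA scale. It assumes that an NSA profile is a protein's specific amount in each fraction (its amount per unit of total protein), rescaled so that the values sum to 1. The matrix `sa` holds made-up toy values, not package data.

```r
# Toy specific amounts (made-up values): two proteins across four fractions
sa <- rbind(protA = c(2, 6, 1, 1),
            protB = c(0.5, 0.5, 3, 1))
# Divide each row by its own sum so that each profile sums to 1
nsa <- sa / rowSums(sa)
nsa
```

The row-wise division works because R recycles the vector of row sums down the columns. Each element is therefore divided by the sum of its own row.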

Setting up the data and reference protein files, and transforming to Relative Specific Amounts (RSAs)

As explained in Vignette 1, we will demonstrate using two R data sets that are included in the protlocassign package. The first, protNSA_AT5tmtMS2, has protein identifiers as row names. Each row contains that protein's profile across the nine fractions of a subcellular fractionation experiment, expressed as normalized specific amounts, followed by two further columns (Nspectra and Nseq). The second, markerListJadot, is a list of reference proteins and their known subcellular compartments. As before, the protlocassign package must be installed and loaded to run the code.

library(protlocassign)

In Vignette 1, we conducted CPA using protNSA_AT5tmtMS2 and, for each compartment, an untransformed average of the reference protein profiles in NSA form. However, transforming the data before conducting CPA may yield a more accurate prediction of subcellular location. For this purpose, we express profile data as relative specific amounts (RSAs). As explained in the main text and elaborated in Vignette 3, RSA is a ratio of two ratios. The numerator is the amount of a given protein in a particular fraction divided by the amount of that protein in the starting material. The denominator is the amount of total protein in that fraction divided by the amount of total protein in the starting material. The RSA describes the fold-enrichment (RSA > 1) or depletion (RSA < 1) of a protein during the fractionation process. It is analogous to the relative specific activity used in classical analytical subcellular fractionation.

Performing this transformation requires estimates of all of these quantities, and our experimental design provides them. In our example, the first six fractions (the differential fractions) together can be used to estimate amounts in the starting material. We also measured total protein in each fraction. These values are contained in the 9-element vector totProtAT5, which is preloaded in protlocassign. The order and number of the total-protein measurements (N, M, L1, L2, P, S, Nyc1, Nyc2 and Nyc3 in totProtAT5) must match those of the profile columns in the data set of individual protein profiles (e.g., protNSA_AT5tmtMS2). For clarity of presentation, we rename totProtAT5 and protNSA_AT5tmtMS2 to totProt and protNSA, respectively.
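The definition above can be written out for a protein p. In the notation below, which we introduce only for illustration, A_{p,i} is the amount of the protein in fraction i, T_i is the total protein in fraction i, and fractions 1 to 6 form the starting material. We also assume that an NSA value is proportional to the specific amount A_{p,i}/T_i, with a protein-specific constant k_p, so that A_{p,i} = k_p NSA_{p,i} T_i. Under that assumption the constant cancels:

$$
\mathrm{RSA}_{p,i}
= \frac{A_{p,i} \big/ \sum_{j=1}^{6} A_{p,j}}{T_i \big/ \sum_{j=1}^{6} T_j}
= \frac{\mathrm{NSA}_{p,i} \sum_{j=1}^{6} T_j}{\sum_{j=1}^{6} \mathrm{NSA}_{p,j} \, T_j}
$$

Because k_p drops out, the same RSA values are obtained whether a profile is supplied as NSAs or as unnormalized specific amounts.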

# load the example profiles and total-protein vector, and give them shorter names
data(protNSA_AT5tmtMS2)
data(totProtAT5)
protNSA <- protNSA_AT5tmtMS2
str(protNSA)
totProt <- totProtAT5
round(totProt, digits=4)

The function RSAfromNSA calculates transformed profiles from the individual and total protein measurements. To do this, you must specify which fractions are used to estimate the amount in the starting material (typically the homogenate) and which values make up the profile. In our case, the first six fractions of the nine-fraction profile are summed to estimate the starting material. RSAfromNSA requires that the fractions representing the starting material be contiguous and located at the beginning of the profile. RSAfromNSA accepts protein profiles expressed either as NSAs or as specific amounts. Columns 10 and 11 of protNSA contain spectral and peptide counts rather than profile data, so we pass only its first nine columns:

# transform the nine profile columns; fractions 1-6 estimate the starting material
protRSA <- RSAfromNSA(NSA=protNSA[,1:9],
                      NstartMaterialFractions=6, totProt=totProt)
dim(protRSA)
str(protRSA)
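The calculation can be sketched in base R as follows. This is an illustration of the RSA definition under stated assumptions, not the protlocassign implementation of RSAfromNSA. The helper `rsaSketch` and the toy inputs are hypothetical. As in the package, the starting material is taken to be the first `nStart` contiguous fractions. The last line checks that NSA and specific-amount inputs give the same result.

```r
# Hypothetical helper illustrating the RSA definition; NOT the package's RSAfromNSA.
# profile: rows = proteins, columns = fractions (NSAs or specific amounts)
# totProt: total protein per fraction; nStart: number of leading fractions
#          that together represent the starting material
rsaSketch <- function(profile, totProt, nStart) {
  profile <- as.matrix(profile)
  start <- seq_len(nStart)
  # amount of each protein per fraction, up to a protein-specific constant
  amt <- sweep(profile, 2, totProt, `*`)
  # share of each protein's starting-material amount found in each fraction
  relAmt <- amt / rowSums(amt[, start, drop = FALSE])
  # share of the starting-material total protein found in each fraction
  relTot <- totProt / sum(totProt[start])
  # RSA: ratio of the two ratios, column by column
  sweep(relAmt, 2, relTot, `/`)
}

# Toy inputs (made-up values): five fractions, the first three = starting material
totProtToy <- c(0.50, 0.30, 0.20, 0.10, 0.05)
saToy <- rbind(p1 = c(1, 2, 4, 8, 0.5),
               p2 = c(3, 1, 0.5, 0.2, 2))
nsaToy <- saToy / rowSums(saToy)

rsa1 <- rsaSketch(saToy, totProtToy, nStart = 3)
rsa2 <- rsaSketch(nsaToy, totProtToy, nStart = 3)
all.equal(rsa1, rsa2)  # TRUE: the protein-specific scale cancels
```

A useful sanity check on any RSA profile follows from the definition. Its values over the starting-material fractions, weighted by each fraction's share of starting-material total protein, sum to 1.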

The last two columns of protNSA contain the numbers of spectra and peptides (Nspectra and Nseq) for each protein. To keep this information in the new data frame, we append these columns to the output:

# append the Nspectra and Nseq columns; this overwrites the protRSA object created above
protRSA <- data.frame(protRSA, protNSA[,10:11])
dim(protRSA)
str(protRSA)

We also need to transform the profiles of the markers for each compartment. As in Vignette 1, we use the function locationProfileSetup to average the profiles of the marker proteins assigned to each compartment (here supplied as normalized specific amounts), yielding one reference profile per compartment:

data(markerListJadot)
# average the NSA profiles of the marker proteins for each compartment
refLocationProfilesNSA <- locationProfileSetup(profile=protNSA,
                            markerList=markerListJadot, numDataCols=9)
round(refLocationProfilesNSA, digits=4)
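The averaging step can be sketched in base R, assuming that locationProfileSetup takes a plain arithmetic mean of the marker profiles within each compartment. The profiles, the marker table, and its column names `protName` and `compartment` are all hypothetical toy stand-ins, not the structure of markerListJadot.

```r
# Toy NSA profiles (made-up values) for five marker proteins over four fractions
prof <- rbind(m1 = c(0.70, 0.20, 0.05, 0.05),
              m2 = c(0.60, 0.30, 0.05, 0.05),
              m3 = c(0.05, 0.15, 0.50, 0.30),
              m4 = c(0.10, 0.10, 0.40, 0.40),
              m5 = c(0.05, 0.05, 0.60, 0.30))
# Hypothetical marker table: protein name and its reference compartment
markers <- data.frame(protName = c("m1", "m2", "m3", "m4", "m5"),
                      compartment = c("Nuc", "Nuc", "Lyso", "Lyso", "Lyso"),
                      stringsAsFactors = FALSE)
# For each compartment, average the profiles of its markers that are present
refProfiles <- t(sapply(split(markers$protName, markers$compartment),
                        function(ids) {
                          ids <- intersect(ids, rownames(prof))
                          colMeans(prof[ids, , drop = FALSE])
                        }))
refProfiles  # one row per compartment, one column per fraction
```

Each averaged row of NSAs still sums to 1, because every marker profile does.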

We then use RSAfromNSA to transform these reference profiles.

refLocationProfilesRSA <- RSAfromNSA(NSA=refLocationProfilesNSA,
                                     NstartMaterialFractions=6, totProt=totProt)
round(refLocationProfilesRSA, digits=4)

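We computed the RSA reference profiles above from the NSA reference profiles. Alternatively, one could compute the RSA reference profiles directly from protRSA, by averaging the marker profiles after each has been transformed. The transformation rescales each protein's profile by a factor that depends on that protein's own starting-material amount. As a result, averaging before or after transforming gives similar but non-identical results. We typically use the first procedure to generate RSA-transformed reference location profiles.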

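# alternative: average the marker profiles after each has been RSA-transformed
refLocationProfilesRSA_2 <- locationProfileSetup(profile=protRSA,
                              markerList=markerListJadot, numDataCols=9)
round(refLocationProfilesRSA_2, digits=4)
# compare the two sets of reference profiles; tolerance=0 reports any difference
# (we use the `as.matrix` function for display purposes in the vignette)
as.matrix(all.equal(refLocationProfilesRSA, refLocationProfilesRSA_2,
                    tolerance=0, countEQ=TRUE))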
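A small toy calculation shows why the two routes differ. The compact helper `rsa` restates the RSA definition and is hypothetical, not the package code. The two marker profiles are made-up values for a single compartment, with the first three fractions as starting material.

```r
# Compact restatement of the RSA definition (hypothetical; not the package code)
rsa <- function(p, tot, nStart) {
  s <- seq_len(nStart)
  amt <- sweep(p, 2, tot, `*`)
  sweep(amt / rowSums(amt[, s, drop = FALSE]), 2, tot / sum(tot[s]), `/`)
}

tot <- c(0.50, 0.30, 0.20, 0.10, 0.05)
# Two toy marker NSA profiles (made-up values) for one compartment
markersNSA <- rbind(c(0.10, 0.20, 0.30, 0.30, 0.10),
                    c(0.40, 0.10, 0.10, 0.20, 0.20))

# Route 1: average the NSA profiles, then transform (the procedure used above)
route1 <- rsa(matrix(colMeans(markersNSA), nrow = 1), tot, nStart = 3)
# Route 2: transform each marker profile, then average the RSA profiles
route2 <- matrix(colMeans(rsa(markersNSA, tot, nStart = 3)), nrow = 1)

all.equal(route1, route2)  # reports a nonzero mean relative difference
```

The two markers here have different starting-material weightings, so each gets a different rescaling factor. That is why the average of the transformed profiles does not equal the transform of the averaged profile.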

Plotting RSA-transformed profiles, and finding RSA-based constrained proportional assignments

As in Vignette 1, we can plot the reference profiles, this time using the RSA-transformed data. For example, the following code plots the markers for every compartment:

loc.list <- rownames(refLocationProfilesRSA)
n.loc <- length(loc.list)
# two plots per row; ceiling(n.loc/2) rows gives a 4 x 2 grid for eight locations
par(mfrow=c(ceiling(n.loc/2), 2))
for (i in seq_len(n.loc)) {
  markerProfilePlot(refLoc=loc.list[i], profile=protRSA,
                    markerList=markerListJadot,
                    refLocationProfiles=refLocationProfilesRSA, ylab="RSA")
}

Now we can run the CPA routine on the RSA-transformed profiles; this may take several minutes to complete. The result is a matrix with protein identifiers as row names and columns giving the estimated proportional assignment of each protein to each of the eight subcellular locations.

protCPAfromRSA <- fitCPA(profile=protRSA,
                        refLocationProfiles=refLocationProfilesRSA, 
                        numDataCols=9)
str(protCPAfromRSA)
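The idea behind CPA for a single protein can be sketched in base R. The sketch assumes that CPA finds non-negative proportions summing to 1 that minimize the sum of squared differences between the observed profile and the proportion-weighted mixture of reference profiles. The error message from spg suggests the package uses a spectral projected gradient solver. This sketch swaps in a plain projected gradient with a fixed step, projecting onto the probability simplex after each step. The functions `projectSimplex` and `cpaSketch` and the toy profiles are hypothetical.

```r
# Euclidean projection of v onto the simplex {p : p >= 0, sum(p) = 1}
projectSimplex <- function(v) {
  u <- sort(v, decreasing = TRUE)
  cssv <- cumsum(u) - 1
  rho <- max(which(u - cssv / seq_along(u) > 0))
  pmax(v - cssv[rho] / rho, 0)
}

# Hypothetical least-squares CPA for one profile y (no missing values);
# rows of ref are reference location profiles over the same fractions
cpaSketch <- function(y, ref, nIter = 5000) {
  M <- t(ref)                       # fractions x locations
  MtM <- crossprod(M)
  Mty <- crossprod(M, y)
  # fixed step 1/L, where L = 2 * largest eigenvalue of t(M) %*% M
  step <- 1 / (2 * max(eigen(MtM, symmetric = TRUE, only.values = TRUE)$values))
  p <- rep(1 / nrow(ref), nrow(ref))  # start from equal proportions
  for (k in seq_len(nIter)) {
    grad <- 2 * (MtM %*% p - Mty)     # gradient of sum((y - M %*% p)^2)
    p <- projectSimplex(as.vector(p - step * grad))
  }
  setNames(p, rownames(ref))
}

# Toy reference profiles (made-up values) and an exact 20/50/30 mixture
ref <- rbind(A = c(4.0, 1.0, 0.2, 0.2, 0.1),
             B = c(0.2, 3.0, 2.0, 0.3, 0.1),
             C = c(0.1, 0.2, 0.5, 3.0, 4.0))
trueProps <- c(A = 0.2, B = 0.5, C = 0.3)
y <- as.vector(trueProps %*% ref)
fit <- cpaSketch(y, ref)
round(fit, 4)
```

Unlike this sketch, real profiles can contain missing values, which the package has to handle.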

Note that protein "AIF1" (protein 356) has only missing values, which is why the spg optimization function reports an error for that one protein.
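Proteins like AIF1 can be flagged before fitting by counting non-missing values per row. The sketch below uses a made-up toy matrix. The commented line shows how the same idea might be applied to the vignette's protRSA object, assuming its first nine columns are the profile.

```r
# Toy profile matrix (made-up values) in which one protein has no data
toyProfiles <- rbind(p1 = c(0.9, 1.2, 0.4),
                     p2 = c(NA, NA, NA),
                     p3 = c(1.5, NA, 0.7))
# TRUE for rows in which every data column is missing
allMissing <- rowSums(!is.na(toyProfiles)) == 0
rownames(toyProfiles)[allMissing]  # "p2"
# e.g. rownames(protRSA)[rowSums(!is.na(protRSA[, 1:9])) == 0]
```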

The protPlotfun function is designed to plot profiles for eight subcellular locations. For a data set with more than eight locations, the code must be modified to accommodate the larger number.

Now we plot the results for protein TLN1:

protPlotfun(protName="TLN1", profile=protRSA, numDataCols=9,
                        refLocationProfiles=refLocationProfilesRSA,
                        assignPropsMat=protCPAfromRSA,
                        yAxisLabel="Relative Specific Amount")

The x-axis represents the nine fractions: N, M, L1, L2, P, S, Nyc.1, Nyc.2, and Nyc.3. In each of the eight plots, the red line is the average profile of the protein. The dashed yellow-black line shows the expected profile for a protein entirely resident in the respective subcellular location. In this set of plots, the CPA procedure assigns 35 percent of TLN1 to the plasma membrane and 53 percent to the cytosol. As in Vignette 1, the observed red profile is a weighted mixture of the expected yellow-black lines.

References

Jadot, M.; Boonen, M.; Thirion, J.; Wang, N.; Xing, J.; Zhao, C.; Tannous, A.; Qian, M.; Zheng, H.; Everett, J. K., Accounting for protein subcellular localization: A compartmental map of the rat liver proteome. Molecular & Cellular Proteomics 2017, 16, (2), 194-212.

Tannous, A.; Boonen, M.; Zheng, H.; Zhao, C.; Germain, C. J.; Moore, D. F.; Sleat, D. E.; Jadot, M.; Lobel, P., Comparative Analysis of Quantitative Mass Spectrometric Methods for Subcellular Proteomics. J Proteome Res 2020, 19, (4), 1718-1730.



mooredf22/protlocassign0p1p1 documentation built on Feb. 7, 2022, 1:55 a.m.