Nothing
#' @title Analyze TPP-TR experiment
#'
#' @description Performs analysis of a TPP-TR experiment by invoking routines
#' for data import, data processing, normalization, curve fitting, and
#' production of the result table.
#'
#' @references Savitski, M. M., Reinhard, F. B., Franken, H., Werner, T.,
#' Savitski, M. F., Eberhard, D., ... & Drewes, G. (2014). Tracking cancer
#' drugs in living cells by thermal profiling of the proteome. Science,
#' 346(6205), 1255784.
#'
#' Franken, H, Mathieson, T, Childs, D. Sweetman, G. Werner, T. Huber, W. & Savitski, M. M. (2015),
#' Thermal proteome profiling for unbiased identification of drug targets and detection of downstream effectors.
#' Nature protocols 10(10), 1567-1593.
#'
#' @return A data frame in which the fit results are stored row-wise for each
#' protein.
#'
#' @details Invokes the following steps: \enumerate{ \item Import data using the
#' \code{\link{tpptrImport}} function. \item Perform normalization (optional)
#' using the \code{\link{tpptrNormalize}} function. To perform normalization,
#' set argument \code{normalize=TRUE}. The normalization will be filtered
#' according to the criteria specified in the \code{normReqs} argument (also
#' see the documentation of \code{\link{tpptrNormalize}} and
#' \code{\link{tpptrDefaultNormReqs}} for further information). \item Fit
#' melting curves using the function \code{\link{tpptrCurveFit}}. \item
#' Produce result table using the function \code{\link{tpptrAnalyzeMeltingCurves}}.
#' \item Export results to Excel using the function \code{\link{tppExport}}. }
#'
#' The default settings are tailored towards the output of the python package
#' isobarQuant, but can be customized to your own dataset by the arguments
#' \code{idVar, fcStr, naStrs, qualColName}.
#'
#' If \code{resultPath} is not specified, the location of the first input file
#' specified in \code{configTable} will be used. If the input data are not
#' specified in \code{configTable}, no result path will be set. This means
#' that no output files or melting curve plots are produced and
#' \code{analyzeTPPTR} just returns the results as a data frame.
#'
#' The function \code{analyzeTPPTR} reports intermediate results to the
#' command line. To suppress this, use \code{\link{suppressMessages}}.
#'
#' The \code{configTable} argument is a dataframe, or the path to a
#' spreadsheet (tab-delimited text-file or xlsx format). Information about
#' each experiment is stored row-wise. It contains the following columns:
#' \itemize{ \item{\code{Path}:}{location of each datafile. Alternatively,
#' data can be directly handed over by the \code{data} argument.}
#' \item{\code{Experiment}: }{unique experiment names.}
#' \item{\code{Condition}: }{experimental conditions of each dataset.}
#' \item{Label columns: } each isobaric label names a column that contains the
#' temperatures administered for the label in the individual experiments. }
#'
#' The argument \code{methods} can be one of the following:
#' More than one method can be specified. For example, parametric testing of
#' melting points and nonparametric spline-based goodness-of-fit tests can be
#' performed sequentially in the same analysis. The results are then written
#' to separate columns of the output table.
#'
#' If \code{methods} contains "meltcurvefit", melting curve plots will be
#' stored in a subfolder with name \code{Melting_Curves} at the location
#' specified by \code{resultPath}.
#' If \code{methods} contains "splinefit", plots of the natural spline fits will be
#' stored in a subfolder with name \code{Spline_Fits} at the location
#' specified by \code{resultPath}.
#'
#' The argument \code{nCores} could be either 'max' (use all available cores)
#' or an upper limit of CPUs to be used.
#'
#' If \code{doPlot = TRUE}, melting curve plots are generated separately for
#' each protein and stored in separate pdfs.
#' Each file is named by the unique protein identifier. Filenames are
#' truncated to 255 characters (requirement by most operation systems).
#' Truncated filenames are indicated by the suffix "_truncated[d]", where [d]
#' is a unique number to avoid redundancies.
#' All melting curve plots are stored in a subfolder with name
#' \code{Melting_Curves} at the location specified by \code{resultPath}.
#'
#' If the melting curve fitting procedure does not converge, it will be
#' repeatedly started from perturbed starting parameters (maximum iterations
#' defined by argument \code{maxAttempts}).
#'
#' Argument \code{splineDF} specifies the degrees of freedom for natural
#' spline fitting. As a single numeric value, it is directly passed on to the
#' \code{splineDF} argument of \code{splines::ns}. Experience shows that
#' \code{splineDF = 4} yields good results for TPP data sets with 10
#' temperature points. It is also possible to provide a numeric vector. In
#' this case, splines are fitted for each entry and the optimal value is
#' chosen per protein using Akaike's Information criterion.
#'
#' @examples
#' data(hdacTR_smallExample)
#' tpptrResults <- analyzeTPPTR(configTable = hdacTR_config, data = hdacTR_data,
#' methods = "meltcurvefit", nCores = 1)
#'
#' @param configTable dataframe, or character object with the path to a file,
#' that specifies important details of the TPP-TR experiment. See Section
#' \code{details} for instructions how to create this object.
#' @param data single dataframe, or list of dataframes, containing fold change
#' measurements and additional annotation columns to be imported. Can be used
#' instead of specifying the file path in the \code{configTable} argument.
#' @param resultPath location where to store melting curve plots, intermediate
#' results, and the final results table.
#' @param methods statistical methods for modeling melting behavior and detecting
#' significant differences between experimental conditions. Ich more than one
#' method are specified, results will be computed for each and concatenated in
#' the result table (default: meltcurvefit).
#' @param idVar character string indicating which data column provides the
#' unique identifiers for each protein.
#' @param ciStr character string indicating which columns contain confidence
#' intervals for the fold change measurements. If specified, confidence
#' intervals will be plotted around the melting curves.
#' @param fcStr character string indicating which columns contain the actual
#' fold change values. Those column names containing the suffix \code{fcStr}
#' will be regarded as containing fold change values.
#' @param naStrs character vector indicating missing values in the data table.
#' When reading data from file, this value will be passed on to the argument
#' \code{na.strings} in function \code{read.delim}.
#' @param qualColName character string indicating which column can be used for
#' additional quality criteria when deciding between different non-unique
#' protein identifiers.
#' @param normalize perform normalization (default: TRUE).
#' @param normReqs list of filtering criteria for construction of the
#' normalization set.
#' @param ggplotTheme ggplot theme for melting curve plots.
#' @param nCores either a numerical value given the desired number of CPUs, or
#' 'max' to automatically assign the maximum possible number (default).
#' @param startPars start values for the melting curve parameters. Will be
#' passed to function \code{\link{nls}} for curve fitting.
#' @param maxAttempts maximal number of curve fitting attempts if model does not
#' converge.
#' @param plotCurves boolean value indicating whether melting curves should be
#' plotted. Deactivating plotting decreases runtime.
#' @param fixedReference name of a fixed reference experiment for normalization.
#' If NULL (default), the experiment with the best R2 when fitting a melting
#' curve through the median fold changes is chosen as the reference.
#' @param pValMethod Method for p-value computation. Currently restricted to
#' 'robustZ' (see Cox & Mann (2008)).
#' @param pValParams optional list of parameters for p-value computation.
#' @param pValFilter optional list of filtering criteria to be applied before
#' p-value computation.
#' @param verbose print name of each fitted protein to the command lin as a
#' means of progress report.
#' @param xlsxExport boolean value indicating whether to produce result table in
#' .xlsx format (requires package \code{openxlsx} and a zip application to be
#' installed).
#' @param splineDF degrees of freedom for natural spline fitting.
#'
#' @seealso tppDefaultTheme, tpptrImport, tpptrNormalize, tpptrCurveFit,
#' tpptrAnalyzeMeltingCurves
#' @export
analyzeTPPTR <- function(configTable, data = NULL, resultPath = NULL,
methods = c("meltcurvefit", "splinefit"),
idVar = "gene_name", fcStr = "rel_fc_", ciStr = NULL, # settings for data import
naStrs = c("NA", "n/d", "NaN", "<NA>"),
qualColName = "qupm",
normalize = TRUE, normReqs = tpptrDefaultNormReqs(), # settings for cross-experiment normalization
ggplotTheme = tppDefaultTheme(),
nCores = 'max',
startPars = c("Pl" = 0, "a" = 550, "b" = 10), # settings for melting curve fitting
splineDF = c(3:7), # settings for spline fitting
maxAttempts = 500, plotCurves = TRUE,
fixedReference = NULL,
pValMethod = "robustZ", # to do: remove this argument
pValFilter = list(minR2 = 0.8, maxPlateau = 0.3), # settings for robust-z-test
pValParams = list(binWidth = 300),
verbose = FALSE, xlsxExport = TRUE){
## Trigger Warning: using method = "splineFit" is deprecated
if("splinefit" %in% methods)
message("Warning: using method = 'splineFit' is deprecated")
## Initialize variables to prevent "no visible binding for global
## variable" NOTE by R CMD check:
meltcurve_plot = Protein_ID <- NULL
message("This is TPP version ", packageVersion("TPP"),".")
## Import data:
trData <- tpptrImport(configTable=configTable, data=data, idVar=idVar, fcStr=fcStr,
naStrs=naStrs, qualColName=qualColName)
if(!is.null(ciStr)){
# set option to indiacte use of confidence intervals during analysis
options("TPPTR_CI" = TRUE)
trDataCI <- tpptrImport(configTable=configTable, data=data, idVar=idVar, fcStr=ciStr,
naStrs=naStrs, qualColName=qualColName)
stopifnot(all(sapply(names(trData), function(i){all(dim(trData[[i]]) == dim(trDataCI[[i]]))})))
} else {
options("TPPTR_CI" = FALSE)
trDataCI <- NULL
}
# # set plot parameter for curve plotting
# options("TPPTR_plot" = plotCurves)
expInfo <- sapply(trData, annotation)
expNames <- names(trData)
expNum <- length(expNames)
expConds <- sapply(trData, function(s) s@annotation[["condition"]])
expComps <- createComparisonTable(infoTable=expInfo)
## Extract directory from the filenames in config table, if specified:
confgFields <- suppressMessages(importCheckConfigTable(infoTable=configTable,
type="TR"))
files <- confgFields$files
outDirList <- importFct_makeOutputDirs(outDir=resultPath, fNames=files)
flagDoWrite <- outDirList$doWrite
pathDataObj <- outDirList$pathDataObj
pathExcel=resultPath <- outDirList$outDir
if (!flagDoWrite) plotCurves <- FALSE
## Save imported data before normalization:
if (flagDoWrite){
save(list=c("trData", "trDataCI"), file=file.path(pathDataObj, "importedData.RData"))
}
## Normalize data:
if (normalize){
normResults <- tpptrNormalize(data=trData, normReqs = normReqs,
qcPlotTheme = ggplotTheme, qcPlotPath = NULL,
fixedReference = fixedReference)
trDataNormalized <- normResults[["normData"]]
## Save normalized data:
if (flagDoWrite){
save(list=c("trDataNormalized"), file=file.path(pathDataObj,
"normalizedData.RData"))
}
} else {
trDataNormalized <- trData
}
allIDs <- lapply(trDataNormalized, featureNames) %>% unlist %>% unname %>%
unique %>% sort
resultTable <- data.frame(Protein_ID = allIDs, stringsAsFactors = FALSE)
if (any(methods == "splinefit")){
## Preparation: convert eSets to long tables:
fitData <- trDataNormalized %>% tpptrTidyUpESets(., returnType = "exprs")
if (!any(methods == "meltcurvefit")){
annotData <- trDataNormalized %>% tpptrTidyUpESets(., returnType = "featureData")
} else {
# Only append results that will not be appended anyway:
annotData <- NULL
}
splineFitResultTable <- tpptrSplineFitAndTest(data = fitData,
factorsH1 = "condition",
resultPath = resultPath,
doPlot = plotCurves,
splineDF = splineDF,# settings for spline fits
nCores = nCores,
additionalCols = annotData) %>%
mutate(Protein_ID = as.character(Protein_ID))
if (flagDoWrite){
save(list = c("splineFitResultTable"),
file = file.path(pathDataObj, "trResultsSplineFit.RData"))
}
resultTable <- left_join(resultTable, splineFitResultTable )
}
if (any(methods == "meltcurvefit")){
## Fit melting curves:
trDataFitted <- tpptrCurveFit(data=trDataNormalized, dataCI=trDataCI,
resultPath=resultPath,
ggplotTheme=ggplotTheme, doPlot=plotCurves,
startPars=startPars, maxAttempts=maxAttempts,
nCores=nCores, verbose=verbose)
## Save data including parameters of the model fits:
if (flagDoWrite){
save(list=c("trDataFitted"), file=file.path(pathDataObj, "fittedData.RData"))
}
## Analyze melting curves and create result table:
meltCurveResultTable <- tpptrAnalyzeMeltingCurves(data = trDataFitted,
pValMethod = pValMethod,
pValFilter = pValFilter,
pValParams = pValParams) %>%
rename(meltcurve_plot = plot) %>%
mutate(meltcurve_plot = as.character(meltcurve_plot)) %>%
mutate(Protein_ID = as.character(Protein_ID))
if (flagDoWrite){
save(list = c("meltCurveResultTable"),
file = file.path(pathDataObj, "trResultsMeltCurveFit.RData"))
}
resultTable <- suppressMessages(left_join(resultTable, meltCurveResultTable))
}
# Save result table to file if required
if (flagDoWrite){
save(list = c("resultTable"),
file = file.path(pathDataObj, "results_TPP_TR.RData"))
}
## Save result table as xlsx spreadsheet:
if (xlsxExport ){
if (flagDoWrite){
if (expNum > 1){
## Determine background colors for the columns belonging to the same experiment:
compNums <- assignCompNumber_to_expName(compDF=expComps, expNames=expNames)
expColors <- plotColors(expConditions = expConds, comparisonNums = compNums)
} else {
expColors <- NULL
}
# Create output table:
tppExport(tab=resultTable, file=file.path(pathExcel, "results_TPP_TR.xlsx"),
expColors=expColors, expNames=expNames)
} else {
message("Cannot produce xlsx output because no result path is specified.")
}
}
## --------------------------------------------------------------------------------------------
## Create QC plots:
if (flagDoWrite){
pdf(file=file.path(resultPath, "QCplots.pdf"), width=8, height=9)
## 1. QC plot to illustrate median curve fits:
message("Creating QC plots to visualize median curve fits...")
if (normalize){
qcPlotMedianFit <- normResults[["qcPlotObj"]]
suppressWarnings(print(qcPlotMedianFit))
}
message("done.\n")
## 2. QC plot to correlate fold changes before and after normalization between all experiments:
message("Creating QC plots to visualize normalization effects...")
qcPlotCorrelateGroupsRaw <- tppQCPlotsCorrelateExperiments(tppData=trData,
annotStr="Non-normalized Fold Changes",
ggplotTheme=ggplotTheme)
if (normalize){
qcPlotCorrelateGroupsNorm <- tppQCPlotsCorrelateExperiments(tppData=trDataNormalized,
annotStr="Normalized Fold Changes",
ggplotTheme=ggplotTheme)
}
for (pn in names(qcPlotCorrelateGroupsRaw)){
suppressWarnings(print(qcPlotCorrelateGroupsRaw[[pn]]))
if (normalize) suppressWarnings(print(qcPlotCorrelateGroupsNorm[[pn]]))
}
message("done.\n")
if (any(methods == "meltcurvefit")){
tpptrQCplots(resultTab = resultTable, expNames = expNames,
expConditions = expConds, compDF = expComps,
minR2 = pValFilter$minR2, ggplotTheme = ggplotTheme)
}
dev.off()
## --------------------------------------------------------------------------------------------
}
invisible(resultTable)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.