R/RMAError.R

Defines functions combine_and_deduplicate calculateIncrementalRMAError

Documented in calculateIncrementalRMAError combine_and_deduplicate

#' Calculate IncrementalRMA Error
#'
#' @description Calculate the differences between incrementalRMA and canonical RMA.
#'
#' @details
#' IncrementalRMA applies existing parameters to one or more new samples, without using these
#' samples in the parameter estimation. As a result, the expression values produced will be
#' slightly different than using RMA on all samples together. This is because the parameter
#' estimates, including these new samples, will be slightly different. Normally, we would
#' expect this type of difference to be small if the samples are similar to the input or
#' if a sufficiently large training set was used to normalize samples. For instance, both
#' frma and refRMA use large public repositories for calculating parameters. In these
#' cases, adding one new sample should not really change the quantile normalization or
#' row effects in median polish.
#'
#' In the case of small sample sizes used for estimating parameters, a single sample could
#' have more of an impact. However, the purpose of this package is to support precision of
#' gene expression estimates over potential robustness. Earlier work indicated that tissue
#' specificity could play a role in estimation as can processing technique. Therefore, the
#' goal is to produce estimates that are as precise for the given sample set.
#'
#' Given the above discussion, it would be helpful to estimate how different incrementalRMA
#' is from just using RMA against a sample set. If the samples that were used to produce
#' the original estimates are available (in the params) then we can compare the incrementalRMA
#' estimates for the new samples against calculating RMA against the reference set and the
#' new samples. Small differences suggest that the incremental process did not significantly
#' impact the overall results, while providing the needed stability in expression estimates.
#'
#' This function calculates the probeset-level error that occurred when applying incrementalRMA
#' vs. using RMA against the full dataset. For convenience, we will consider this to be the
#' standard error (se.exprs).
#'
#' @note Since this routine reruns RMA against a potentially large set of data, it can be quite
#' computationally demanding.
#'
#' @param exprs Matrix of expression values from incrementalRMA.
#' @param abatch A \code{\link[affy]{AffyBatch-class}} of new samples that were used for incrementalRMA.
#' @param params An incremental parameter list as generated by
#'     \code{\link{parameterizeRMA}}.
#'
#' @return A matrix of probeset-level errors between incrementalRMA and canonical RMA.
#' @export
#'
#' @examples
#' \dontrun{
#' calculateIncrementalRMAError(exprs,  AffyBatch, params=list(referenceCELFiles=()))
#' }
calculateIncrementalRMAError<-function(exprs, abatch, params) {

  stopifnot("AffyBatch of new samples and incremental expression estimates of new samples are not the same." =
              length(intersect(colnames(exprs), affy::sampleNames(abatch))) == ncol(exprs))
  stopifnot("Params does not contain referenceCELFiles" = utils::hasName(params, "referenceCELFiles"))

  # Calculate canonical RMA on the reference CEL files plus the new CEL files. Note: we have to
  # remove duplicates so that we are getting a fair comparison with RMA. Two copies of the same
  # sample would influence the original rma result.
  canonical_rma <- affy::rma(combine_and_deduplicate(abatch, params$referenceCELFiles))

  # Samples to compare are the new samples
  samples_to_compare <- affy::sampleNames(abatch)

  se <- exprs[,samples_to_compare, drop=F] - Biobase::exprs(canonical_rma)[,samples_to_compare, drop=F]


  se
}



#' Combine and De-duplicate AffyBatch objects.
#'
#' @description
#' Having two AffyBatch objects, combine them into a single AffyBatch object, paying attention to duplicates.
#'
#' @details
#' Two AffyBatch objects (esets) can be combined using \code{combine} function. However, if they have
#' several samples in common we don't want the duplicates. This tries to remove duplicates by examining
#' both the names and the expression data for duplication.
#'
#' The result of this effort is to produce a single AffyBatch which combines the two objects, but with
#' duplicates removed.
#'
#' @param new \code{\link[affy]{AffyBatch-class}} of samples to use
#' @param ref Reference \code{\link[affy]{AffyBatch-class}} to merge in non-duplicates for
#'
#' @return An \code{\link[affy]{AffyBatch-class}} combining the two arguments, with duplicates removed.
#' @export
#'
#' @examples
#' \dontrun{
#' combine_and_deduplicate(Dilution, Dilution)
#' }
combine_and_deduplicate<-function(new, ref) {
  stopifnot("new and ref are not AffyBatch objects." = class(new)=="AffyBatch" & class(ref)=="AffyBatch")

  # For each suspected duplicate, check if it is then remove from reference set.
  duplicates <- sapply(Biobase::sampleNames(ref), function(d) {
    d %in% Biobase::sampleNames(new) &&
      identical(Biobase::exprs(new)[,d],Biobase::exprs(ref)[,d])
  })

  Biobase::combine(new, ref[, !duplicates])

}
steveneschrich/IncrementalRMA documentation built on Dec. 23, 2021, 5:32 a.m.