R/model-Pnear-run.R

Defines functions Pnear

Documented in Pnear

#' Compute the Pnear Quality Metric for a RMSD Funnel
#'
#nolint start
#' @description
#' In protein structure prediction a key measure of accuracy is how well does
#' the predicted energy or score correlate with the distance to a native
#' conformation. A common distance measure is the all-atom root mean squared
#' distance (RMSD). A challenge, however, is that we don't expect that far away
#' from the native conformation, the energy should be discriminating, so we want
#' to bias the assessment to those near the native conformation. We therefore
#' The Pnear metric defined in [(Bhardwaj, et al., Nature,
#' 2016)](https://www.nature.com/articles/nature19791)
#' measures the how "funnel-like" a score-vs-rmsd plot is.
#' [Pnear Rosetta Documentation](https://github.com/RosettaCommons/main/blob/master/tests/benchmark/util/quality_measures.py#L268")
#nolint end
#' @details
#' ```
#' # subtract off the min-score as is done in the Rosetta Code
#' scores = scores - min(scores)
#'
#' # write down the equation in more code-like notation
#' Pnear <- Sum_i[exp(-RMSD[i]^2/lambda^2)*exp(-scores[i]/k_BT)] /
#'          Sum_i[exp(-scores[i]/k_BT)]
#'
#' # combine the terms in the first exponential
#' Pnear = Sum_i[exp(-RMSD[i]^2/lambda^2 - scores[i]/k_BT)] /
#'         Sum_i[exp(-scores[i]/k_BT)]
#'
#' let x_i  = RMSD[i]^2/lambda^2 * k_BT/scores[i]
#'     beta = -scores[i]
#'
#' Pnear = Sum_i[exp(-RMSD[i]^2/lambda^2*k_BT/scores])]
#'
#' # Use the log-sum-exponential trick
#' log(Pnear) =   log_sum_exp(-RMSD[i]^2/lambda^2 - scores[i]/k_BT)
#'              - log_sum_exp(-scores[i]/k_BT)
#' ```
#'
#' @note Unlike the Conway discrimination score, the PNear calculation uses no
#' hard cutoffs. This is advantageous for repeated testing: if the scatter of
#' points on the RMSD plot changes very slightly from run to run, the PNear
#' value will only change by a small amount, whereas any metric dependent on
#' hard cutoffs could change by a large amount if a low-energy point crosses an
#' RMSD threshold.
#'
#' @author Vikram K. Mulligan \email{vmulligan@@flatironinstitute.org} adapted
#'   from Rosetta.
#'
#' @param score a vector of scores e.g. Rosetta energies e.g. in the Ref2015.
#' @param rmsd root mean squared deviation values for e.g. backbone atoms
#' @param lambda Lambda is a value in Angstroms indicating the breadth of the
#'   Gaussian used to define "native-like-ness". The bigger the value, the more
#'   permissive the calculation is to structures that deviate from native.
#'   Typical values for peptides range from 1.5 to 2.0, and for proteins from
#'   2.0 to perhaps 4.0.
#' @param kbt The value of k_B*T, in energy units, determines how large an
#'   energy gap must be in order for a sequence to be said to favor the
#'   native state. The default value, 0.62, should correspond to physiological
#'   temperature for ref2015 or any other scorefunction with units of kcal/mol.
#' @param verbose give verbose output.
#' @returns `numeric` value.
#'
#' @examples
#'\dontrun{
#'  Pnear(score = score_a, rmsd = rmsd_a)
#'}
#'
#' @export
Pnear <- function(
  score,
  rmsd,
  lambda = 1.5,
  kbt = 0.62,
  verbose = FALSE) {

  nscores <- length(score)
  if (nscores == 0) {
    stop("ERROR: length(nscores) == 0")
  }

  if (nscores != length(rmsd)) {
    stop(paste0(
      "ERROR: Length of scores and rsmds do not match:\n",
      "ERROR:    length(scores) == '", length(score), "'\n",
      "ERROR:    length(rmsds)  == '", length(rmsd),  "'\n"))
  }

  if (!all(rmsd >= 0)) {
    stop(paste0("ERROR: All RMSD values must be greater or equal to zero."))
  }

  if (!is.numeric(kbt) || kbt <= 0) {
    stop(paste0("ERROR: kbt must be great than zero"))
  }

  if (!is.numeric(lambda) || lambda <= 0) {
    stop(paste0("ERROR: lambda must be great than zero"))
  }

  score <- score - min(score)
  # log-sum-exponent trick is to make it more numerically stable
  # This may be important if there are lots of high-scoring points
  exp(
    matrixStats::logSumExp(-rmsd^2 / lambda^2 - score / kbt) -
      matrixStats::logSumExp(-score / kbt))
}
maomlab/BayesPharma documentation built on Aug. 24, 2024, 8:45 a.m.