sumR: Approximate Summation of Series

Documented in infiniteSum

#' Approximates the sum of a positive discrete infinite series with a single
#' maximum
#'
#' For series that pass the ratio test, the approximation is analytically
#' guaranteed to have an error that is smaller than epsilon. This can
#' occasionally not happen due to floating point arithmetic. Result is returned
#' in the log scale.
#' @param logFunction The function that returns the series absolute value
#' \ifelse{html}{\out{|a<sub>n</sub>|}}{\eqn{|a_n|}} in
#' the log scale. If it is an alternating series, this is defined in
#' argument `alternate`. Can either be an `R` function or a string
#' indicating one of the precompiled functions. See [precompiled()]
#' for a list of available functions. If defined in `R`, the function's
#' definition must have two arguments. The first argument must be the integer
#' argument equivalent to \eqn{n} in
#' \ifelse{html}{\out{a<sub>n</sub>}}{\eqn{a_n}} and the second must be a vector
#' of numeric parameters.
#' @param parameters A numeric vector with parameters used in `logFunction`.
#' Vectorized summation over various parameter values sets is not implemented.
#' Use [apply()] or their variants to achieve this.
#' @param logL The log of the limit value of
#' \ifelse{html}{\out{a<sub>n+1</sub>/a<sub>n</sub>}}{\eqn{a_{n+1}/a_n}} which
#' must be smaller than 1, or smaller than 0 in the log scale. Ignored if the
#' series is alternating, defined with argument `alternate`. If left as
#' `NULL` and `logFunction` is defined in `R`, the
#' `batches` algorithm with default settings is used. See 'details'.
#' @param alternate Either -1, 0 or 1. If 0 (or `FALSE`), the series is not
#' alternating and positive. Otherwise, the series is alternating where the
#' first element's sign is either 1 or -1, as entered in this parameter. If not
#' 0, arguments `logL` and `forceAlgorithm` are ignored.
#' @param epsilon The desired error margin for the approximation. See 'details'.
#' @param maxIter The maximum number of iterations for the approximation. In
#' most cases, this number will not be reached unless it is very small. A value
#' too high is not recommended as an array of this size is reserved in memory
#' during the algorithm.
#' @param n0 The sum will be approximated for the series starting at this value
#' for the first parameter of `logFunction`.
#' @param forceAlgorithm A value to control which summation algorithm to use.
#' Ignored if the series is alternating, defined with argument `alternate`.
#' See 'details'.
#' @return A [summed-objects()] object. Note that the sum is returned in the
#' \code{log} scale.
#' @details The approximated sum is based on some theoretical results which,
#' analytically, guarantee that the approximation will be within `epsilon`
#' distance to the true value. It is possible that the numerical result fails
#' to fall in this distance due to floating point arithmetic. The `C` code
#' under the hood is being continuously reviewed to minimize this problem. They
#' seem to occur more often when the series decays very fast to zero or when the
#' total is a large number.
#'
#' For these theoretical results to work, the series must pass the ratio test,
#' which means that the ratio
#' \ifelse{html}{\out{a<sub>n+1</sub>/a<sub>n</sub>}}{\eqn{a_{n+1}/a_n}} must
#' converge to a number \eqn{L < 1} when \eqn{n} goes to infinity. The log of
#' \eqn{L} should be provided to the function for a better approximation.
#' This is not necessary in case a precompiled function is used. In this case
#' the value of \eqn{L} is coded into the package.
#'
#' Another requirement in the current installment of this function is that the
#' series must have only a single maximum. This is the case for most discrete
#' probability distributions and marginalization problems. This limitation
#' will be addressed in the future.
#'
#' There are currently two implemented algorithms that perform the calculations.
#' The first, called Sum-To-Threshold, sums the series values until the series
#' values are smaller than `epsilon`. This is the fastest algorithm, but
#' it is only guaranteed to provide an approximation within the desired error
#' margin when \eqn{L < 0.5}.
#'
#' The second algorithm, called Error bounding pairs is based on a more general
#' result which works for any \eqn{0 \le L < 1}. This algorithm sums the series
#' until
#'
#' \ifelse{html}{\out{|a<sub>n+1</sub>/(1-L) - a<sub>n+1</sub> a<sub>n</sub>/(a<sub>n</sub> - a<sub>n+1</sub>)| < 2 ε.}}{\deqn{\left|\frac{a_{n+1}}{1-L} - \frac{a_{n+1}a_n}{a_n-a_{n+1}}\right| < 2 \epsilon.}}
#'
#' Then the approximation is the added values of the sum plus
#'
#' \ifelse{html}{\out{ 0.5 (a<sub>n+1</sub>/(1-L) + a<sub>n+1</sub> a<sub>n</sub>/(a<sub>n</sub> - a<sub>n+1</sub>))}}{\deqn{0.5 (\frac{a_{n+1}}{1-L} + \frac{a_{n+1}a_n}{a_n-a_{n+1}}).}}
#'
#' The Error bounding pairs method usually requires less function evaluations
#' than the Sum-To-Threshold one, however the convergence checking is more
#' demanding, which means that it is typically slower, albeit slightly. If
#' \eqn{L = 0}, the convergence checking can be reduced and the Error bounding
#' pairs becomes almost as fast as the Sum-To-Threshold method.
#' 
#' The third algorithm is called batches method and is used when \eqn{L} is
#' left at `NULL`. Its use requires some fine tuning, so there is a
#' standalone function for it called [infiniteSum_batches()]. Its use
#' and functionality can be seen in its own documentation. When called as a
#' result of this function, default settings are used.
#'
#' The `forceAlgorithm` parameter can be used to
#' control which algorithm to use. When it is 0, the program automatically
#' selects the Sum-to-threshold when \eqn{L < 0.5} and the Error bounding pairs
#' when \eqn{0.5 \le L < 1}. Method batches is selected when \eqn{L} is left
#' `NULL`. If `forceAlgorithm` is set to 1, the Sum-To-Threshold
#' algorithm is forced. If it is 2, then the Error bounding pairs is forced. A
#' small note, the Error bounding pairs algorithm can go up to `maxIter` +
#' 1 function evaluations. This is due to its convergence checking dependence on
#' \ifelse{html}{\out{a<sub>n+1</sub>}}{\eqn{a_{n+1}}}. Finally, if the
#' parameter is set as 3, the batches algorithm is used with default settings.
#' 
#' If the series is alternating, the Sum-To-Threshold convergence condition on
#' the series absolute value guarantees the result, regardless of the ratio
#' limit.
#'
#' The function to be summed can be an R function or a string naming the
#' precompiled function in the package. The list of precompiled functions can
#' be found in [precompiled()], and more functions will be added in
#' time. As is intuitive, using a precompiled function is much faster than using
#' an `R` function. In fact, it has been observed to be dozens times faster.
#'
#' The advanced user can program their own precompiled functions and use the
#' package's summation algorithms by linking the appropriate header file. See
#' the [GitHub](https://github.com/GuidoAMoreira/sumR) readme for the a
#' quick tutorial.
#' @seealso [precompiled()] provides a list with precompiled functions
#' that can be used for the summation. [infiniteSum_batches()] is
#' an alternate method which does not require knowledge of the `logL`
#' argument.
#' @examples
#' ## Define some function that is known to pass the ratio test.
#' param <- 0.1
#' funfun <- function(k, p) return(k * log1p(-p[1]))
#' result <- infiniteSum(funfun, parameters = param, logL = log1p(-param))
#'
#' ## This series is easy to verify analytically
#' TrueSum <- -log(param)
#' TrueSum - result$sum
#' # Since exp(logL) = 0.9, the Error bounding pairs
#' # algorithm is used. Notice that it only required
#' # 2 function evaluations for the approximation, that is
#' result$n
#'
#' ## A common problem is finding the normalizing constant for the
#' ## Conway-Maxwell-Poisson distribution. It has already been included
#' ## in the precompiled list of functions.
#' comp_params <- c(lambda = 5, nu = 3)
#' result <- infiniteSum("COMP", comp_params)
#' result
#' @export
infiniteSum <- function(logFunction, parameters = numeric(), logL = NULL,
                        alternate = FALSE,  epsilon = 1e-15, maxIter = 1e5,
                        n0 = 0, forceAlgorithm = 0){

  # Make these tests at the C level to make function faster.
  stopifnot(is.function(logFunction) || is.character(logFunction),
            length(logFunction) == 1,
            is.numeric(parameters),
            is.numeric(alternate) || isFALSE(alternate),
            alternate %in% -1:1,
            is.numeric(epsilon),
            epsilon > 0,
            length(epsilon) == 1,
            is.numeric(maxIter),
            maxIter > 0,
            length(maxIter) == 1,
            length(logL) %in% 0:1,
            is.numeric(n0),
            n0 >= 0,
            length(n0) == 1,
            forceAlgorithm %in% 0:3)

  if (forceAlgorithm == 1 || alternate){
    if (!is.null(logL) && forceAlgorithm == 1)
      warning("Sum-To-Threshold algorithm doesn't use parameter logL. It will be ignored.")
  }
  maxIter <- as.integer(maxIter); n0 <- as.integer(n0)
  forceAlgorithm <- as.integer(forceAlgorithm)
  alternate <- as.integer(alternate)

  if (is.character(logFunction)){
    if (!is.null(logL)) warning("Summation over precompiled functions uses pre-determined logL. Inputted value ignored.")
    out <- .Call("infinite_sum_callPrecomp", logFunction, parameters, alternate,
                 epsilon, maxIter, n0, forceAlgorithm,
                 PACKAGE = "sumR")
  } else if(is.function(logFunction)) {
    if (!alternate) stopifnot(logL < 0)
    if (!is.null(logL) && alternate)
      warning('Parameter logL is not used in an alternating series. See help("infiniteSum") for details.')
    if (forceAlgorithm && alternate)
      warning('Parameter forceAlgorithm is disabled in an alternating series. See help("infiniteSum") for details.')

    f <- function(k, Theta) logFunction(k, Theta)

    out <- .Call("inf_sum",
                body(f), parameters, logL, alternate, epsilon, maxIter, n0,
                new.env(), forceAlgorithm,
                PACKAGE = "sumR")
  } else {
    warning('Argument lFun must either be the name of a precompiled function or a function. See help("precompiled") to see which functions are available.')
    return(list(sum = -Inf, n = 0, method = "canceled"))
  }

  if (forceAlgorithm == 0)
    if (is.null(logL) && !is.character(logFunction))
      m <- "Batches in C"
    else {
      if (is.character(logFunction)) logL <- determineLogL_(logFunction, parameters)
      if (logL < -log(2))
        m <- "Sum-to-threshold"
      else if (logL < 0)
        m <- "Error bounding pairs"
      else m <- "Unidentified"
    } else if (forceAlgorithm == 1)
      m <- "Sum-to-threshold"
    else if (forceAlgorithm == 2)
      m <- "Error bounding pairs"
    else if (forceAlgorithm == 3)
      m <- "Batches"
  
  out$method <- m
  out$maxReached <- out$n >= maxIter
  class(out) <- "summed"
  out
}