R/dt_basis.R

Defines functions Predict.matrix.dt.smooth getTF smooth.construct.dt.smooth.spec

Documented in getTF Predict.matrix.dt.smooth smooth.construct.dt.smooth.spec

#' Domain Transformation basis constructor
#' 
#' The \code{dt} basis allows for any of the standard \code{mgcv} (or
#'   user-defined) bases to be aplied to a transformed version of the
#'   original terms. Smooths may be of any number of terms. Transformations
#'   are specified by supplying a function of any or all of the original terms.
#'   "\code{by}" variables are not transformed.
#' 
#' @param object a smooth specification object, generated by \code{s()},
#'   \code{te()}, \code{ti()}, or \code{t2()}, with \code{bs="dt"}
#' @param data a list containing just the data (including any by variable)
#'   required by this term, with names corresponding to \code{object$term}
#'   (and \code{object$by}). The \code{by} variable is the last element.
#' @param knots a list containing any knots supplied for basis setup - in same
#'    order and with same names as \code{data}. Can be \code{NULL}.
#' 
#' @details
#' \code{object} should be creaated with an \code{xt} argument. For
#' non-tensor-product smooths, this will be a list with the following elements:
#' \enumerate{
#'   \item \code{tf} (required): a function or character string (or list of functions
#'   and/or character strings) defining the coordinate transormations; see
#'   further details below.
#'   \item \code{bs} (optional): character string indicating the \code{bs} for
#'   the basis applied to the transformed coordinates; if empty, the appropriate
#'   defaults are used.
#'   \item \code{basistype} (optional): character string indicating type of
#'   bivariate basis used. Options include \code{"s"} (the default), \code{"te"},
#'   \code{"ti"}, and \code{"t2"}, which correspond to \code{\link[mgcv]{s}},
#'   \code{\link[mgcv]{te}}, \code{\link[mgcv]{ti}}, and \code{\link[mgcv]{t2}}.
#'   \item \code{...} (optional): for tensor product smooths, additional arguments
#'   to the function specified by \code{basistype} that are not available in
#'   \code{s()} can be included here, e.g. \code{d}, \code{np}, etc.
#' }
#' 
#' For tensor product smooths, we recommend using \code{s()} to set up the basis,
#' and specifying the tensor product using \code{xt$basistype} as described
#' above. If the basis is set up using \code{te()}, then the variables in
#' \code{object$term} will be split up, meaning all transformation functions
#' would have to be univariate.
#' 
#' @section Transformation Functions:
#' 
#' Let \code{nterms = length(object$term)}. The \code{tf} element can take one
#' of the following forms:
#' \enumerate{
#'   \item a function of \code{nargs} arguments, where \code{nargs <= nterms}.
#'     If \code{nterms > 1}, it is assumed that this function will be applied to
#'     the first term of \code{object$term}. If all argument names of the
#'     function are term names, then those arguments will correspond to those
#'     terms; otherwise, they will correspond to the first \code{nargs} terms in
#'     \code{object$term}.
#'   \item a character string corresponding to one of the built-in
#'     transformations (listed below).
#'   \item A list of length \code{ntfuncs}, where \code{ntfuncs<=nterms},
#'     containing either the functions or character strings described above. If
#'     this list is named with term names, then the transformation functions
#'     will be applied to those terms; otherwise, they will be applied to the
#'     first \code{ntfuncs} terms in \code{object$term}.
#' }
#' The following character strings are recognized as built-in transformations:
#' \itemize{
#'   \item \code{"log"}: log transformation (univariate)
#'   \item \code{"ecdf"}: empirical cumulative distribution function (univariate)
#'   \item \code{"linear01"}: linearly rescale from 0 to 1 (univariate)
#'   \item \code{"s-t"}: first term ("s") minus the second term ("t") (bivariate)
#'   \item \code{"s/t"}: first term ("s") divided by the second term ("t") (bivariate)
#' }
#' 
#' 
# @section Pivot points:
#' 
#' Some transformations rely on a fixed "pivot point" based on the data used to
#' fit the model, e.g. quantiles (such as the min or max) of this data.
#' When making predictions based on these transformaitons, the transformation
#' function will need to know what the pivot points are, based on the original
#' (not prediction) data. In order to accomplish this, we allow the user to
#' specify that they want their transformation function to refer to the original
#' data (as opposed to whatever the "current" data is). This is done by appending
#' a zero ("0") to the argument name.
#' 
#' For example, suppose you want to scale
#' the term linearly so that the data used to define the basis ranges from
#' 0 to 1. The wrong way to define this transformation function:
#' \code{function(x) {(x - max(x))/(max(x) - min(x))}}.
#' This function will result in incorrect predictions if the range of data for
#' which preditions are being made is not the same as the range of data that was
#' used to define the basis. The proper way to define this function:
#' \code{function(x) {(x - max(x0))/(max(x0) - min(x0))}}.
#' By refering to \code{x0} instead of \code{x}, you are indicating that you
#' want to use the original data instead of the current data. This may seem
#' strange to refer to a variable that is not one of the arguments, but the
#' \code{"dt"} constructor explicity places these variables in the environment
#' of the transformation function to make them available.
#' 
#' @return An object of class "dt.smooth". This will contain all the elements
#'   associated with the \code{\link[mgcv]{smooth.construct}} object from the
#'   inner smooth (defined by \code{xt$bs}), in addition to:
#'   \enumerate{
#'     \item tf: the named list of transformation functions applied to the
#'     input data
#'     \item class: the class of the smooth object for the inner smooth
#'   }
#' @author Jonathan Gellar \email{JGellar@@mathematica-mpr.com}
#' @seealso \code{\link[mgcv]{smooth.construct}}
#' @export
#' @importFrom stats loess
#' @importFrom stats ecdf
#' @importFrom mgcv Predict.matrix
#' @importFrom mgcv smooth.construct
#' 

smooth.construct.dt.smooth.spec <- function(object, data, knots) {
  # Constructor method for parametric bivariate basis
  
  # Input Checks (TO DO)
  xt <- object$xt
  
  # Need alternative approach:
  #  - can't add x0 like we're doing
  #  - add data0 to the xt object, and have the transformation funcs take data, data0?
  
  
  # Extract tf and make it a list of functions (if it isn't already)
  tf <- xt$tf
  if (is.null(tf)) {
    tf <- identity
    warning("No transformation function supplied... using identity")
  }
  if (!is.list(tf)) tf <- as.list(tf)
  tf <- lapply(tf, function(f) {
    if (is.character(f)) {
      # Create transformation functions for recognized character strings
      getTF(f, length(object$term))
    } else if (is.function(f)) {
      # Already a function - just return it
      f
    } else stop("Unrecognized type for 'dt' transformation function")
  })
    
  # Names of tf are the names of the variables to transform
  if (is.null(names(tf)))
    # Unnamed: use the first length(tf) terms as the names
    names(tf) <- object$term[1:length(tf)]
  
  # Transform data
  tdata <- lapply(1:length(tf), function(i) {
    f <- tf[[i]]
    args <- formals(f)
    argnms <- names(args)
    if (length(args) > length(object$term))
      stop("Transformation function of too many arguments is supplied")
    trmnms <- if (all(argnms %in% object$term)) {
      argnms
    } else if (length(args) == 1 & length(tf) <= length(object$term)) {
      object$term[i]
    } else {
      object$term[1:length(args)]      
    }
    if (!all(trmnms %in% names(data))) {
      miss <- trmnms[!(trmnms %in% names(data))]
      stop(paste0("Variable(s) ", paste(trmnms[miss], collapse=", "),
                  " needed but not supplied to smooth constructor"))
    }
    calldat <- data[trmnms]
    names(calldat) <- argnms
    sapply(1:length(calldat), function(j) {
      # Assign raw data to environemnt of transformation function, as the
      # variable name appended with "0".
      # This allows functions like min() and max() to refer to the raw data
      # as opposed to the current data.
      assign(paste0(argnms[j], "0"), calldat[[j]], envir=environment(tf[[i]]))
      invisible(NULL)
    })
    f <- tf[[i]]
    do.call(f, calldat)
  })
  names(tdata) <- names(tf)
  
  # Include untransformed data (if any)
  untr <- names(data)[!(names(data) %in% names(tdata))]
  tdata[untr] <- data[untr]
    
  # New smooth.spec object
  object <- if (any(is.null(xt$basistype), xt$basistype=="s")) {
    # s smooth: Modify existing object
    bs <- ifelse(is.null(xt$bs), "tp", xt$bs)
    class(object) <- paste0(bs, ".smooth.spec")
    object$xt <- xt$xt
    object
  } else {
    # tensor product smooth: need to create new smooth.spec object
    args <- xt[!(names(xt) %in% c("tf", "basistype"))]
    if (is.null(args$k) & object$bs.dim>0) args$k <- object$bs.dim
    if (is.null(args$m) & !is.na(object$p.order)) args$m <- object$p.order
    if (is.null(args$id) & !is.null(object$id)) args$id <- object$id
    if (is.null(args$sp) & !is.null(object$sp)) args$sp <- object$sp
    if ((xt$basistype %in% c("te", "ti")) & is.null(args$fx))
      args$fx <- object$fixed
    if (!(object$by == "NA"))
      args$by <- as.symbol(object$by)
    eval(as.call(c(list(quote(eval(parse(text=paste0("mgcv::", xt$basistype))))),
                   lapply(object$term, as.symbol),
                   args)))    
  }
  
  # Create smooth and modify return object
  sm <- mgcv::smooth.construct(object, data = tdata, knots = knots)
  sm$tf <- tf
  sm$class <- class(sm)
  class(sm) <- "dt.smooth"
  sm
}

#' Get recognized transformation function
#' @keywords internal
getTF <- function(fname, nterm) {
  # fname: transformation function identifier (character string)
  # nterm: number of terms of smooth
  if (fname=="log") {
    function(x) log(x)
  } else if (fname=="ecdf") {
    function(x) ecdf(x0)(x)
  } else if (fname=="linear01") {
    function(x) (x - min(x0))/(max(x0) - min(x0))
  } else if (fname=="s-t") {
    if (nterm >= 2) function(s,t) s-t
    else stop(paste0("Not enough terms for ", fname, " transformation"))
  } else if (fname=="s/t") {
    if (nterm >= 2) function(s,t) ifelse(t==0, 0.5, s/t)
    else stop(paste0("Not enough terms for ", fname, " transformation"))
  } else stop("Unrecognized character string for 'dt' transformation function")
}

#     if (is.character(f) & length(object$term)==1) {
#       if (f=="log") {
#         function(x) log(x)
#       } else {
#         stop("Unrecognized character string for 'dt' transformation function")
#       }
#     } else if (is.character(f) & length(object$term)==2) {
#       if (f=="s-t") {
#         function(s,t) s-t
#       } else if (f=="s/t") {
#         function(s,t) s/t
#       } else {
#         stop("Unrecognized character string for 'dt' transformation function")
#       }
#     } else if (is.function(f)) {
#       f
#     } else stop("Unrecognized type for 'dt' transformation function")
#   })




#' Predict.matrix method for dt basis
#' 
#' @param object a \code{dt.smooth} object created by
#'   \code{\link{smooth.construct.dt.smooth.spec}}, see
#'   \code{\link[mgcv]{smooth.construct}}
#' @param data  see \code{\link[mgcv]{smooth.construct}}
#' @return design matrix for \code{dt} terms
#' @author Jonathan Gellar
#' @export
#' 

Predict.matrix.dt.smooth <- function(object, data) {
  # Prediction method for parameteric bivariate basis
  
  tf <- object$tf
  tdata <- lapply(tf, function(f) {
    args <- formals(f)
    argnms <- names(args)
    if (!(all(argnms %in% object$term)))
      # If argnms aren't term names, defaults to first length(args) terms
      argnms <- object$term[1:length(args)]
    if (!all(argnms %in% names(data))) {
      miss <- argnms[!(argnms %in% names(data))]
      stop(paste0("Variable(s) ", paste(argnms[miss], collapse=", "),
                  " needed but not supplied to Predict.matrix"))
    } else {
      calldat <- data[argnms]
    }
    names(calldat) <- names(args)
    do.call(f, calldat)
  })
  
  # Include untransformed data (if any)
  untr <- names(data)[!(names(data) %in% names(tdata))]
  tdata[untr] <- data[untr]
  
  # Modify smooth object and call Predict.matrix
  object$tf <- NULL
  class(object) <- object$class
  mgcv::Predict.matrix(object, tdata)
}
jgellar/mgcvTrans documentation built on Aug. 10, 2022, 4:02 p.m.