# R/nnf-vision.R
# torch: Tensors and Neural Networks with 'GPU' Acceleration

#' Affine_grid
#'
#' Generates a 2D or 3D flow field (sampling grid), given a batch of
#'     affine matrices `theta`.
#'
#' @section Note:
#'
#' This function is often used in conjunction with [nnf_grid_sample()]
#' to build Spatial Transformer Networks.
#'
#'
#' @param theta (Tensor) input batch of affine matrices with shape
#'   (\eqn{N \times 2 \times 3}) for 2D or (\eqn{N \times 3 \times 4}) for 3D
#' @param size (integer vector) the target output image size
#'   (\eqn{N \times C \times H \times W} for 2D or
#'   \eqn{N \times C \times D \times H \times W} for 3D).
#'   Example: `c(32, 3, 24, 24)`
#' @param align_corners (logical, optional) if `TRUE`, consider `-1` and `1`
#'   to refer to the centers of the corner pixels rather than the image corners.
#'   Refer to [nnf_grid_sample()] for a more complete description. A grid generated by
#'   [nnf_affine_grid()] should be passed to [nnf_grid_sample()] with the same setting for
#'   this option. Default: `FALSE`
#'
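#' @examples
#' # A minimal sketch, assuming the torch backend is installed: build the
#' # sampling grid for an identity transform. The values of `theta` and `size`
#' # are illustrative choices, not defaults of the function.
#' if (torch_is_installed()) {
#'   # One 2 x 3 identity affine matrix, batched to shape (1, 2, 3).
#'   theta <- torch_tensor(matrix(c(1, 0, 0,
#'                                  0, 1, 0), nrow = 2, byrow = TRUE))$unsqueeze(1)
#'   # Target output of shape (N = 1, C = 1, H = 4, W = 4).
#'   grid <- nnf_affine_grid(theta, size = c(1, 1, 4, 4), align_corners = FALSE)
#'   dim(grid) # (N, H, W, 2): one (x, y) pair per output pixel
#' }
#'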
#' @export
nnf_affine_grid <- function(theta, size, align_corners = FALSE) {
  torch_affine_grid_generator(theta, size, align_corners)
}

#' Grid_sample
#'
#' Given an `input` and a flow-field `grid`, computes the
#' `output` using `input` values and pixel locations from `grid`.
#'
#' Currently, only spatial (4-D) and volumetric (5-D) `input` are
#' supported.
#'
#' In the spatial (4-D) case, for `input` with shape
#' \eqn{(N, C, H_{\mbox{in}}, W_{\mbox{in}})} and `grid` with shape
#' \eqn{(N, H_{\mbox{out}}, W_{\mbox{out}}, 2)}, the output will have shape
#' \eqn{(N, C, H_{\mbox{out}}, W_{\mbox{out}})}.
#'
#' For each output location `output[n, , h, w]`, the size-2 vector
#' `grid[n, h, w, ]` specifies the `input` pixel locations `x` and `y`,
#' which are used to interpolate the output value `output[n, , h, w]`.
#' In the case of 5-D inputs, `grid[n, d, h, w, ]` specifies the
#' `x`, `y`, `z` pixel locations for interpolating
#' `output[n, , d, h, w]`. The `mode` argument specifies the `nearest` or
#' `bilinear` interpolation method used to sample the input pixels.
#'
#' `grid` specifies the sampling pixel locations normalized by the
#' `input` spatial dimensions. Therefore, most of its values should lie in
#' the range `[-1, 1]`. For example, the values `x = -1, y = -1` refer to the
#' top-left pixel of `input`, and the values `x = 1, y = 1` refer to the
#' bottom-right pixel of `input`.
#'
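#' As an illustration, the mapping from a normalized coordinate to a
#' zero-based pixel position along one axis can be sketched in base R.
#' `unnormalize_coordinate()` is a hypothetical helper, not part of torch; it
#' assumes pixel centers sit at the integer positions `0, ..., size - 1` and
#' follows the `align_corners` convention described for that parameter.
#'
#' ```r
#' unnormalize_coordinate <- function(x, size, align_corners = FALSE) {
#'   if (align_corners) {
#'     # -1 and 1 map to the centers of the first and last pixels.
#'     (x + 1) / 2 * (size - 1)
#'   } else {
#'     # -1 and 1 map to the outer edges of the first and last pixels.
#'     ((x + 1) * size - 1) / 2
#'   }
#' }
#'
#' unnormalize_coordinate(c(-1, 1), size = 4, align_corners = TRUE)  # 0 and 3
#' unnormalize_coordinate(c(-1, 1), size = 4, align_corners = FALSE) # -0.5 and 3.5
#' ```
#'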
#' If `grid` has values outside the range of `[-1, 1]`, the corresponding
#' outputs are handled as defined by `padding_mode`. Options are
#'
#' * `padding_mode="zeros"`: use `0` for out-of-bound grid locations,
#' * `padding_mode="border"`: use border values for out-of-bound grid locations,
#' * `padding_mode="reflection"`: use values at locations reflected by
#' the border for out-of-bound grid locations. A location far from
#' the border keeps being reflected until it falls within bounds,
#' e.g., the (normalized) pixel location `x = -3.5` is reflected by the border
#' `-1` and becomes `x' = 1.5`, then is reflected by the border `1` and becomes
#' `x'' = 0.5`.
#'
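#' The reflection rule can be sketched in base R. This is an illustration
#' only, working in normalized coordinates with borders at `-1` and `1`;
#' `reflect_coordinate()` is a hypothetical helper, not part of torch, and the
#' backend may treat details such as `align_corners` differently.
#'
#' ```r
#' reflect_coordinate <- function(x, low = -1, high = 1) {
#'   span <- high - low            # width of the valid range
#'   offset <- abs(x - low)        # distance from the lower border
#'   extra <- offset %% span       # remainder after removing whole spans
#'   flips <- floor(offset / span) # number of whole spans in the offset
#'   # Even count: measure up from the lower border.
#'   # Odd count: measure back down from the upper border.
#'   ifelse(flips %% 2 == 0, low + extra, high - extra)
#' }
#'
#' reflect_coordinate(-3.5) # -3.5 -> 1.5 (about -1) -> 0.5 (about 1)
#' ```
#'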
#' @section Note:
#'
#' This function is often used in conjunction with [nnf_affine_grid()]
#' to build Spatial Transformer Networks.
#'
#' @param input (Tensor) input of shape \eqn{(N, C, H_{\mbox{in}}, W_{\mbox{in}})} (4-D case)
#'   or \eqn{(N, C, D_{\mbox{in}}, H_{\mbox{in}}, W_{\mbox{in}})} (5-D case)
#' @param grid (Tensor) flow-field of shape \eqn{(N, H_{\mbox{out}}, W_{\mbox{out}}, 2)} (4-D case)
#'   or \eqn{(N, D_{\mbox{out}}, H_{\mbox{out}}, W_{\mbox{out}}, 3)} (5-D case)
#' @param mode (character) interpolation mode used to calculate output values:
#'   `'bilinear'` | `'nearest'`. Default: `'bilinear'`
#' @param padding_mode (character) padding mode for outside grid values:
#'   `'zeros'` | `'border'` | `'reflection'`. Default: `'zeros'`
#' @param align_corners (logical, optional) Geometrically, we consider the pixels of the
#'   input as squares rather than points. If set to `TRUE`, the extrema (`-1` and
#'   `1`) are considered as referring to the center points of the input's corner pixels.
#'   If set to `FALSE`, they are instead considered as referring to the corner
#'   points of the input's corner pixels, making the sampling more resolution
#'   agnostic. This option parallels the `align_corners` option in [nnf_interpolate()], and
#'   so whichever option is used here should also be used there to resize the input
#'   image before grid sampling. Default: `FALSE`
#'
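#' @examples
#' # A minimal sketch, assuming the torch backend is installed: resample an
#' # image through an identity grid. Input values and sizes are illustrative.
#' if (torch_is_installed()) {
#'   input <- torch_randn(1, 1, 4, 4)
#'   theta <- torch_tensor(matrix(c(1, 0, 0,
#'                                  0, 1, 0), nrow = 2, byrow = TRUE))$unsqueeze(1)
#'   grid <- nnf_affine_grid(theta, size = c(1, 1, 4, 4), align_corners = FALSE)
#'   # `mode` and `padding_mode` are passed explicitly as single strings, and
#'   # `align_corners` matches the value used to generate the grid.
#'   output <- nnf_grid_sample(input, grid,
#'     mode = "bilinear", padding_mode = "zeros", align_corners = FALSE
#'   )
#'   dim(output) # (N, C, H_out, W_out)
#' }
#'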
#' @export
nnf_grid_sample <- function(input, grid, mode = c("bilinear", "nearest"),
                            padding_mode = c("zeros", "border", "reflection"),
                            align_corners = FALSE) {
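  # The defaults are the full choice vectors; keep only the first element so
  # that each `if` below tests a length-one condition.
  mode <- mode[1]
  padding_mode <- padding_mode[1]

  if (mode == "bilinear") {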
    mode_enum <- 0
  } else if (mode == "nearest") {
    mode_enum <- 1
  } else {
    value_error(
      "Unknown mode name '{mode}'. Supported modes are 'bilinear' and 'nearest'."
    )
  }


  if (padding_mode == "zeros") {
    padding_mode_enum <- 0
  } else if (padding_mode == "border") {
    padding_mode_enum <- 1
  } else if (padding_mode == "reflection") {
    padding_mode_enum <- 2
  } else {
    value_error(
      "Unknown padding mode name '{padding_mode}'. Supported modes are 'zeros', 'border' and 'reflection'."
    )
  }

  torch_grid_sampler(
    input = input, grid = grid, interpolation_mode = mode_enum,
    padding_mode = padding_mode_enum, align_corners = align_corners
  )
}