R/veg_cover_functions.R

Defines functions .prepare_beta_prior .sanity_check_point_counts beta_prior_helper .do_cover_difference_sample .do_cover_difference_all get_cover_difference get_cover_quantiles

Documented in beta_prior_helper get_cover_difference get_cover_quantiles

#' Calculate quantiles of stratum vegetation cover based on LiDAR point counts
#'
#' This function takes a multi-band raster of LiDAR point counts for ground
#' level and vegetation strata, arranged in increasing height order (i.e. ground
#' is band 1), and returns a raster of vegetation cover estimates for strata.
#' Estimates are based on a Bayesian approach, where the detection of stratum
#' vegetation is viewed as a binomial sampling process. LiDAR points (i.e.
#' discrete returns) from a stratum are treated as 'successes' while points from
#' ground level and any lower strata are 'failures'. Under this model, the
#' binomial probability of success is an estimator of vegetation cover, while
#' the degree of certainty about the estimate will depend on the LiDAR point
#' density. Selected quantiles for estimated cover are returned as raster bands
#' for each stratum. The default is to calculate median cover and the central
#' 90\% bounds (5\%, 50\% and 95\% quantiles) for each stratum. Optionally, the
#' maximum likelihood estimate of mean cover can also be returned.
#'
#' If we view the detection of vegetation within a stratum via discrete LiDAR
#' returns as a binomial sampling process, then we can make use of the
#' relationship between the binomial and beta distributions. For a given pixel,
#' if we have \code{n} returns from vegetation within a stratum out of a total
#' of \code{N} returns from that stratum or lower (including ground level), then
#' the distribution of the possible vegetation cover values for the stratum that
#' could result in the observed points counts will follow a beta distribution.
#' For the baseline no-data case (ie. no LiDAR returns from the stratum or
#' below, perhaps because higher strata blocked all LiDAR pulses), it is
#' reasonable to depict the possible cover distribution as uniform between 0 and
#' 1, i.e. with no data we have no reason to prefer any cover value over any
#' other. This baseline can be represented as a \code{beta(1, 1)} distribution
#' which, for quantile calculations, we refer to as the \emph{Bayesian prior
#' distribution} of cover values. Most of the time this is probably what you
#' want to use. We use the LiDAR data within a pixel to \emph{update} our
#' expectation of stratum cover from the flat prior \code{beta(1, 1)}
#' distribution to a \code{beta(1+n, 1+n-N)} distribution. We can then summarize
#' this distribution via selected quantiles which can be output as raster
#' layers.
#'
#' Sometimes, it might be preferable to adopt something other than a uniform
#' prior expectation for vegetation cover. For example, from previous studies
#' and/or expert knowledge we might expect that tree canopy cover will typically
#' be around 30\%, relative to the pixel size being used for mapping, with a
#' certain degree of variation around this value. In this case, a beta
#' distribution can be chosen to represent this expectation (e.g.
#' \code{beta(2,5)}) and specified for the stratum using the \code{beta_prior}
#' argument. See examples below.
#'
#' In addition to quantiles of stratum vegetation cover, this function can be
#' used to calculate the simpler maximum likelihood estimate of mean cover
#' (MLE), equivalent to the value returned by the deprecated function
#' \code{get_stratum_cover}. The MLE is the ratio of the number of LiDAR points
#' returned from within the stratum height range to the total number of points
#' returned between ground level and the top of the stratum. This value
#' corresponds to the mode of the beta distribution used to calculate quantiles,
#' assuming that the default \code{beta(1,1)} prior distribution has been
#' specified.
#'
#'
#' @param rcounts A raster object with a band of point count data for each
#'   vegetation stratum plus a band for the ground layer. Normally, this will be
#'   a \code{terra} package \code{SpatRaster} object generated by function
#'   \code{get_stratum_counts()}, but it may also be a \code{RasterStack} or
#'   \code{RasterBrick} (\code{raster} package) object. \strong{Important:} the
#'   bands must be arranged in increasing height order, i.e. ground layer, then
#'   understorey layer(s), then overstorey layer(s).
#'
#' @param probs A numeric vector of one or more probability values specifying
#'   the quantiles of vegetation cover.  The default \code{c(0.05, 0.5, 0.95)}
#'   returns quantiles for the median and the central 90\% interval. To suppress
#'   quantile calculations and only return the maximum likelihood estimate of
#'   mean cover, set this argument to \code{NULL, NA} or an empty vector, and
#'   set argument \code{mle=TRUE}.
#'
#' @param mle A logical value specifying whether the maximum likelihood estimate
#'   (MLE) of mean cover should be returned in addition to any specified cover
#'   quantiles. The MLE is the same value as that returned by the deprecated
#'   function \code{get_stratum_cover}. Set to \code{FALSE} (default) if the MLE
#'   is not required; or to \code{TRUE} to return the MLE in addition to any
#'   quantiles specified via the \code{probs} argument. See Details for
#'   information.
#'
#' @param backgroundNA Controls what pixel value should be returned for a
#'   stratum when there are no LiDAR points for the stratum, any lower strata or
#'   ground. If set to \code{TRUE} (the default), such cases will have missing
#'   (\code{NA}) pixel values for all quantile layers for the stratum. If
#'   \code{FALSE}, pixel values will be calculated as quantiles of the prior
#'   beta distribution. Note that if maximum likelihood mean cover estimates are
#'   specified (via \code{mle=TRUE}) these will always be \code{NA} for any
#'   pixels without LiDAR data.
#'
#' @param beta_prior (\strong{Expert use only!}) Either a numeric vector of two
#'   values defining the prior beta distribution to use for all strata, or a
#'   named list of two-element vectors where names match raster layer names for
#'   vegetation strata (case-insensitive). The default is \code{c(1,1)} for a
#'   \code{beta(1,1)} distribution which is equivalent to a uniform distribution
#'   on the (0,1) interval for all strata. This is probably what you want unless
#'   (a) you really know what you are doing and (b) there are particular
#'   reasons, i.e. previous studies or expert knowledge, to expect cover values
#'   for one or more strata to follow alternative distributions. For the list
#'   option, the default \code{c(1,1)} prior parameters will be used any strata
#'   not matched to list element names. The function \code{beta_prior_helper}
#'   can be used to explore candidate distributions given an expected cover
#'   value. See the examples and the Details section for further explanation.
#'
#' @return A \code{SpatRast} (\code{terra} package) raster object with a layer
#'   for each requested quantile within each stratum. For example, calculating
#'   the default median plus 90\% bounds (0.05 and 0.95 quantiles) for 5
#'   vegetation strata would return a raster object with 15 layers. Layers are
#'   arranged by stratum, then by quantile. Layer names have the form
#'   \code{stratum_q_prob}, e.g. \code{'TallShrub_q_0.5'} for the median (50\%)
#'   quantile. If the maximum likelihood cover estimate has been requested
#'   (\code{mle=TRUE}), this will appear before any quantile layers for each
#'   stratum with a name of the form \code{stratum_mle}.
#'
#' @examples
#' \dontrun{
#' # Estimate quantiles of vegetation cover for each stratum corresponding to
#' # 50\% and 90\% bounds and the median.
#' #
#' probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
#' rcover <- get_cover_quantiles(rcounts, probs)
#'
#' # Alternatively, calculate the maximum likelihood cover estimate for each
#' # stratum. Note that this is insensitive to LiDAR point density, i.e. it provides
#' # no indication of the uncertainty associated with the estimates.
#' #
#' rcover_mle <- get_cover_quantiles(rcounts, probs = NULL, mle = TRUE)
#'
#' # An example of setting 'informed' beta priors for two strata:
#' # 'LowShrub' and 'MediumTree', corresponding to an expectation of 25\% cover
#' # for shrubs and 40\% cover for trees. Note that case is ignored when matching
#' # strata names. The default \code{beta(1,1)} prior will be used for other strata.
#' #
#' beta_prior <- list(lowshrub=c(2,4), mediumtree=c(2,2.5))
#' }
#'
#' @seealso \code{\link{get_cover_difference}}
#' @seealso \code{\link{beta_prior_helper}}
#'
#' @importFrom stats qbeta

#' @export
#'
get_cover_quantiles <- function(rcounts,
                                probs = c(0.05, 0.5, 0.95),
                                mle = FALSE,
                                backgroundNA = TRUE,
                                beta_prior = c(1, 1)) {

  rcounts <- .as_spat_raster(rcounts)
  .sanity_check_point_counts(rcounts)

  # Check if quantiles are not required
  if (is.null(probs) || is.na(probs[1])) {
    probs <- numeric()
  } else {
    if (!is.numeric(probs)) stop("Argument probs should be a numeric vector or NULL")
    if (!all(probs > 0 & probs < 1)) stop("Requested quantile (probs) must all be >0 and <1")

    probs <- sort(unique(probs))
  }

  # Check that at least one of quantiles or MLE has been requested
  mle <- .as_boolean(mle)
  if (!mle && length(probs) == 0) stop("At least one of quantiles or max likelihood estimate must be specified")

  backgroundNA <- .as_boolean(backgroundNA)

  nbands <- terra::nlyr(rcounts)
  if (nbands < 2) {
    msg <- glue::glue("The point count raster must have at least two layers
                       with the first layer being ground level point counts and
                       subsequent layers being stratum point counts in order of
                       increasing height.")

    stop(msg)
  }

  # Check the beta_prior argument (vector or list) and format it
  # as a matrix of parameters
  beta_prior <- .prepare_beta_prior(beta_prior, rcounts)


  # Worker function for cover calculations
  fn <- function(x, istratum) {
    nout <- mle + length(probs)
    res <- numeric(nout)
    k <- 1

    prior_params <- beta_prior[istratum, ]

    if (mle) {
      res[1] <- ifelse(x[2] == 0, NA, x[1] / x[2])
      k <- 2
    }
    if (length(probs) > 0) {
      if (backgroundNA && x[2] == 0) {
        res[k:nout] <- NA_real_
      } else {
        res[k:nout] <- qbeta(probs,
                             shape1 = prior_params[1] + x[1],
                             shape2 = prior_params[2] + x[2] - x[1])
      }
    }
    res
  }

  # Initialize a raster of total point counts with the ground layer
  rsum <- terra::subset(rcounts, 1)

  # For each veg stratum band:
  #   - increment total point count
  #   - call worker function to calculate quantiles and/or MLE
  #
  res <- lapply(2:nbands, function(iband) {
    rband <- terra::subset(rcounts, iband)
    rsum <<- rsum + rband

    rres <- terra::app(c(rband, rsum), fn, istratum=iband-1)

    # Set names for result layer(s) for this stratum
    nout_layers <- mle + length(probs)
    k <- 1
    if (mle) {
      names(rres)[1] <- sprintf("%s_mle", names(rcounts)[iband])
      k <- 2
    }
    if (length(probs) > 0) {
      names(rres)[k:nout_layers] <- sprintf("%s_q_%g", names(rcounts)[iband], probs)
    }

    rres
  })

  # Merge list of result layers into a single SpatRaster
  # and return
  terra::rast(res)
}


#' Calculate quantiles for change in vegetation cover between two times
#'
#' This function uses the same approach for Bayesian estimation of vegetation
#' cover as described for function \code{\link{get_cover_quantiles}}, but
#' applied to two rasters representing LiDAR point counts for the same area at
#' two different times. Argument \code{rcounts0} is the raster for the earlier
#' time \code{t0}, while argument \code{rcounts1} is the raster for the later
#' time \code{t1}. Each input raster should have two or more bands, with the
#' first band representing ground point counts and subsequent bands representing
#' point counts for vegetation strata arranged in increasing height order (i.e.
#' ground is band 1). For a given pixel and stratum, the distribution of
#' credible cover change values from time \code{t0} to time \code{t1} is found
#' via a simulation approach (see Details). This is a very time consuming
#' process, so rather than computing change quantiles for all pixels there is an
#' option to restrict the calculations to a set of sample locations which can
#' either be specified as either a two-column matrix or data frame of X-Y
#' coordinate values, or an \code{sf} spatial data frame of point features.
#'
#' As described for function \code{\link{get_cover_quantiles}}, the estimation
#' of stratum vegetation cover is based on treating the discrete LiDAR returns
#' as a binomial sampling process, where the probability of a return from
#' vegetation in the stratum being considered corresponds to vegetation cover.
#' Under this model, the distribution of possible cover values that could have
#' generated the observed point counts for a pixel will follow a beta
#' distribution. Change in vegetation cover can therefore be estimated by taking
#' the distribution of differences between the beta distribution for time 2
#' minus that for time 1. This function uses a simulation approach to
#' approximate the distribution of differences.
#'
#' @param rcounts0 Point counts raster for the first (earlier) time. Normally,
#'   this will be a \code{terra} package \code{SpatRaster} object generated by
#'   function \code{get_stratum_counts()}, but it may also be a
#'   \code{RasterStack} or \code{RasterBrick} (\code{raster} package) object.
#'   \strong{Important:} the bands must be arranged in increasing height order,
#'   i.e. ground layer, then understorey layer(s), then overstorey layer(s).
#'
#' @param rcounts1 Point counts raster for the second (later) time. This must
#'   have the same dimensions (rows, columns and number of bands) as
#'   \code{rcounts0}. Generally it will also have the same spatial bounds, but
#'   this is not checked to allow the function to be used for other types of
#'   comparisons (e.g. space for time substitution).
#'
#' @param probs A numeric vector of one or more probability values specifying
#'   the quantiles of vegetation cover change to calculate.  The default
#'   \code{c(0.05, 0.5, 0.95)} returns quantiles for the median and the central
#'   90\% interval of cover change.
#'
#' @param samplelocs Spatial locations of raster pixels to sample. Can be a
#'   two-column matrix or data frame of X-Y coordinates, or an \code{sf} data
#'   frame of point features. Note that the default value (\code{NULL}) will
#'   result in change estimates being calculated for all pixels. Only do this if
#'   you have a lot of time on your hands! If providing sample locations as a
#'   two-column matrix or data frame, the two point count rasters must be in the
#'   same projection otherwise an error is raised. The projection is then
#'   assumed to apply to the sample coordinates.
#'
#' @param backgroundNA Controls what pixel value should be returned for a
#'   stratum when there are no LiDAR points for the stratum , any lower strata
#'   or ground level in one or both point counts rasters. If set to \code{TRUE}
#'   (the default), such cases will have missing (\code{NA}) pixel values for
#'   all quantiles for the stratum. If \code{FALSE}, quantiles for change values
#'   will be based on the prior beta distribution.
#'
#' @param beta_prior (\strong{Expert use only!}) Either a numeric vector of two
#'   values defining the prior beta distribution to use for all strata, or a
#'   named list of two-element vectors where names match raster layer names for
#'   vegetation strata (case-insensitive). The default is \code{c(1,1)} for a
#'   \code{beta(1,1)} distribution which is equivalent to a uniform distribution
#'   on the (0,1) interval for all strata. This is probably what you want unless
#'   (a) you really know what you are doing and (b) there are particular
#'   reasons, i.e. previous studies or expert knowledge, to expect cover values
#'   for one or more strata to follow alternative distributions. For the list
#'   option, the default \code{c(1,1)} prior parameters will be used any strata
#'   not matched to list element names. The function \code{beta_prior_helper}
#'   can be used to explore candidate distributions given an expected cover
#'   value. See the examples and the Details section for further explanation.
#'
#' @param nsim Number of iterations for simulating the difference between the
#'   two beta distributions representing possible cover values at times
#'   \code{t0} and \code{t1}. Default is \code{1e5}.
#'
#' @param seed An optional integer value to use as an initial seed for random
#'   number generation when simulating the differences between beta
#'   distributions representing possible cover values for a pixel and stratum at
#'   the two times. This is useful when exactly reproducible results are
#'   preferred. If \code{NULL} (default), results will vary slightly between
#'   runs with the same input data.
#'
#' @return If estimating change for all pixels (\code{samplelocs=NULL}), the
#'   function returns a \code{SpatRast} (\code{terra} package) raster object
#'   with a layer for each requested quantile of vegetation cover change within
#'   each stratum. For example, calculating the default median plus 90\% bounds
#'   (0.05 and 0.95 quantiles) of cover change for 5 vegetation strata would
#'   return a raster object with 15 layers. Layers are arranged by stratum, then
#'   by quantile. Layer names have the form \code{stratum_dq_prob}, e.g.
#'   \code{'TallShrub_dq_0.5'} for the median (50\%) quantile of cover change.
#'   If estimating change for a subset of pixels specified via the
#'   \code{samplelocs} argument, an \code{sf} spatial data frame is returned
#'   with a row for each sample location and columns for stratum x quantile
#'   change estimates. Column names have the same form as that described for
#'   raster band names.
#'
#' @examples
#' \dontrun{
#' # Given two point count rasters, r0 (earlier time t0) and r1 (later time t1),
#' # calculate quantiles for difference in stratum cover at t1 compared to t0.
#' #
#' # Randomly select 100 pixels and create a set of point features
#' library(sf)
#' library(terra)
#'
#' icells <- sample(1:ncells(r0), size = 100)
#' xy <- as.data.frame( xyFromCell(icells) )
#' xy <- st_as_sf(xy, coords = 1:2)
#'
#' # Set the coordinate reference system for sample points. This is optional
#' # if r0 and r1 are in the same projection.
#' st_crs(xy) <- st_crs(r0)
#'
#' # Quantiles of cover change (time t1 relative to t0) corresponding to
#' # median, 50\% bounds and 90\% bounds. The result is a data frame with a
#' # column for each stratum x quantile.
#' #
#' probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
#' dat_change <- get_cover_difference(r0, r1, probs, sample = xy, seed = 42)
#'
#'
#' # If the point count rasters are not too large, or you are happy to wait,
#' # you can calculate change estimates for all pixels by leaving the
#' # sample argument at its default of NULL. In this case, a terra::SpatRaster
#' # object is returned with a band for each stratum x quantile.
#' #
#' r_change <- get_cover_difference(r0, r1, probs, seed = 42)
#' }
#'
#' @seealso \code{\link{get_cover_quantiles}}
#'
#' @export
#'
get_cover_difference <- function(rcounts0,
                                 rcounts1,
                                 probs = c(0.05, 0.5, 0.95),
                                 samplelocs = NULL,
                                 backgroundNA = TRUE,
                                 beta_prior = c(1, 1),
                                 nsim = 1e5,
                                 seed = NULL) {

  rcounts0 <- .as_spat_raster(rcounts0)
  .sanity_check_point_counts(rcounts0)

  rcounts1 <- .as_spat_raster(rcounts1)
  .sanity_check_point_counts(rcounts1)

  if (!is.numeric(probs)) stop("Argument probs should be a numeric vector or NULL")
  if (!all(probs > 0 & probs < 1)) stop("Requested quantile (probs) must all be >0 and <1")

  probs <- sort(unique(probs))

  backgroundNA <- .as_boolean(backgroundNA)

  nbands <- terra::nlyr(rcounts0)
  if (nbands < 2) {
    msg <- glue::glue("Both point count rasters must have at least two layers
                       with the first layer being ground level point counts and
                       subsequent layers being stratum point counts in order of
                       increasing height.")

    stop(msg)
  }

  if (terra::nlyr(rcounts1) != nbands) {
    stop("Both point count rasters must have the same number of bands")
  }

  if (terra::ncol(rcounts0) != terra::ncol(rcounts1) ||
      terra::nrow(rcounts0) != terra::nrow(rcounts1)) {
    stop("Both point count rasters must have the same number of rows and columns")
  }

  # Check the beta_prior argument and convert to matrix form
  beta_prior <- .prepare_beta_prior(beta_prior, rcounts0)

  if (is.null(samplelocs)) {
    # Calculating change values for all cells.
    # This will return a SpatRaster object.
    #
    .do_cover_difference_all(rcounts0, rcounts1, probs, backgroundNA, beta_prior, nsim, seed)

  } else {
    # Calculating change for sample locations.
    # This will return an 'sf' data frame with a column for each stratum x quantile.
    #
    .do_cover_difference_sample(rcounts0, rcounts1, samplelocs, probs, backgroundNA, beta_prior, nsim, seed)
  }
}


# Helper function for get_cover_difference() that estimates change for all cells x strata
# and returns the results as a raster.
#
.do_cover_difference_all <- function(rcounts0, rcounts1, probs, backgroundNA, beta_prior, nsim, seed) {
  # Worker function for cover calculations.
  # Expects input raster 'x' to have four bands in order:
  #   s0, n0, s1, n1
  # where s_ is stratum points and n_ is total points.
  #
  fn <- function(x, istratum) {
    if (backgroundNA && (x[2] == 0 || x[4] == 0)) {
      # No lidar points in one or both layers for this stratum and below
      rep(NA_real_, length(probs))
    } else {
      qdiffbeta_sim(probs,
                    s0 = x[1], n0 = x[2], s1 = x[3], n1 = x[4],
                    beta_prior = beta_prior[istratum, ], nsim = nsim, seed = seed)
    }
  }

  # Initialize a raster of total point counts with the ground layer
  # for each time
  rsum0 <- terra::subset(rcounts0, 1)
  rsum1 <- terra::subset(rcounts1, 1)

  nbands <- terra::nlyr(rcounts0)

  # For each veg stratum band:
  #   - increment total point count
  #   - call worker function to calculate quantiles and/or MLE
  #
  res <- lapply(2:nbands, function(iband) {
    rband0 <- terra::subset(rcounts0, iband)
    rband1 <- terra::subset(rcounts1, iband)

    rsum0 <<- rsum0 + rband0
    rsum1 <<- rsum1 + rband1

    rres <- terra::app(c(rband0, rsum0, rband1, rsum1), fn, istratum=iband-1)

    # Set names for result layer(s) for this stratum
    if (length(probs) > 0) {
      names(rres) <- sprintf("%s_dq_%g", names(rcounts0)[iband], probs)
    }

    rres
  })

  # Merge list of result layers into a single SpatRaster
  # and return
  terra::rast(res)
}


# Helper function for get_cover_difference() that estimates change for a subset of cells
# identified by a set of point features (sf data frame) and returns the results as an
# sf data frame.
#
.do_cover_difference_sample <- function(rcounts0, rcounts1, samplelocs, probs, backgroundNA, beta_prior, nsim, seed) {
  # Both input rasters must have coordinate reference systems defined, although
  # we allow them to be different (e.g. GDA94 vs GDA2020)
  crs0 <- sf::st_crs(rcounts0)
  if (is.na(crs0)) {
    stop("First point counts raster does not have a coordinate reference system")
  }

  crs1 <- sf::st_crs(rcounts1)
  if (is.na(crs1)) {
    stop("Second point counts raster does not have a coordinate reference system")
  }

  if (inherits(samplelocs, "sf")) {
    # Sample locations must have a coordinate reference system defined
    if (is.na(sf::st_crs(samplelocs))) {
      stop("Sample locations must have a coordinate reference system defined")
    }

  } else {
    # Sample locations should be a two column matrix or data frame
    if (!(is.matrix(samplelocs) || is.data.frame(samplelocs)) || ncol(samplelocs) != 2) {
      stop("Sample locations should be a two-column matrix or data frame of X-Y coordinates")
    }

    if (is.matrix(samplelocs)) {
      samplelocs <- as.data.frame(samplelocs)
    }

    # Before proceeding - make sure that the two rasters are in the same projection
    if (crs0 != crs1) {
      msg <- glue::glue("The two input rasters have different coordinate reference systems.
                        Sample locations must be provided as an 'sf' spatial data frame in this case.")
      stop(msg)
    }

    # Assume (hope) that the coordinates are in the correct projection for the rasters
    samplelocs <- sf::st_as_sf(samplelocs, coords = 1:2, crs = crs1)
  }

  crs_sample <- sf::st_crs(samplelocs)

  # Extract sample values from the first point counts raster
  if (crs_sample == crs0) {
    dat0 <- terra::extract(rcounts0, samplelocs)
  } else {
    dat0 <- terra::extract(rcounts0, sf::st_transform(samplelocs, crs0), ID = TRUE)
  }

  # Extract sample values from the second point counts raster
  if (crs_sample == crs1) {
    dat1 <- terra::extract(rcounts1, samplelocs)
  } else {
    dat1 <- terra::extract(rcounts1, sf::st_transform(samplelocs, crs1), ID = TRUE)
  }

  # Check that there is a common set of sample locations
  if ( !(nrow(dat0) == nrow(dat1) &&
         all(dat0$ID == dat1$ID)) ) {
    stop("One or more sample locations are inside one raster but outside the other")
  }

  # Column names for results
  res_names <- paste0(rep(colnames(dat0)[-(1:2)], each=length(probs)), "_dq_", probs)

  .do_beta_diff <- function(x0, x1) {
    xsum0 <- cumsum(x0)
    xsum1 <- cumsum(x1)

    res <- lapply(2:length(x0), function(i) {
      if (backgroundNA && (xsum0[i] == 0 || xsum1[i] == 0)) {
        # No lidar points in one or both layers for this stratum and below
        rep(NA_real_, length(probs))
      } else {
        qdiffbeta_sim(probs,
                      s0 = x0[i], n0 = xsum0[i],
                      s1 = x1[i], n1 = xsum1[i],
                      beta_prior = beta_prior[i-1, ], # i-1 is stratum index
                      nsim = nsim, seed = seed)
      }
    })

    # reshape into a single vector with elements ordered by stratum then quantile
    res <- do.call(c, res)

    res <- round(res, digits = 4)
    names(res) <- res_names
    res
  }

  # Convert data frame to matrix for each set of samples to make
  # subsetting easier
  dat0 <- as.matrix(dat0)
  dat1 <- as.matrix(dat1)

  # Do beta difference calculations for each sample point x stratum
  res <- lapply(1:nrow(dat0), function(i) .do_beta_diff(dat0[i,-1], dat1[i,-1]))

  # Join results to the sf data frame for the sample locations
  # and return
  res <- do.call(rbind, res)
  res <- cbind(samplelocs, res)

  res
}


#' Generate candidates for beta prior parameters
#'
#' This function can be used to explore alternative beta distributions to use
#' with the \code{beta_prior} argument of functions \code{get_cover_quantiles}
#' and \code{get_cover_difference}. It returns parameter values of distributions
#' corresponding to a prior expected value for stratum vegetation cover and,
#' optionally, draws a graph of the alternative distributions.
#'
#' This function interprets the expected vegetation cover value as the mode of
#' the beta distribution, i.e. the maximum likelihood value. For this reason it
#' will only accept values greater than 1.0 for the first beta parameter
#' (argument \code{'a'}). A beta(1,1) distribution is equivalent to a uniform
#' distribution of cover values, while \code{beta(a, b)} distributions with
#' \code{a < 1} are concave, i.e. the mode is undefined.
#'
#' Given an expected
#' (maximum likelihood) cover value, and candidate values for the first beta
#' parameter, the function calculates corresponding values for the second beta
#' parameter and returns a matrix with columns for the beta parameters (named
#' 'a' and 'b'), and summary statistics.
#'
#' @param cover Expected value (i.e. maximum likelihood value) of vegetation
#'   cover. Must be a single value greater than 1.0.
#'
#' @param a Candidate values for the first parameter of the beta distribution.
#'   Default is \code{c(2, 4, 8, 16)}.
#'
#' @param plot A logical value (default is \code{FALSE}) specifying whether to draw a
#'   graph of the beta distributions.
#'
#' @return A matrix with columns for the beta parameters ('a' and 'b') plus
#'   columns for the median cover and the \code{c(0.05, 0.95)} quantiles
#'   (end-points of central 90\% interval). The input expected cover value is
#'   attached as the attribute \code{"cover"}.
#'
#' @examples
#' # Return parameter values and draw a graph of beta distributions for
#' # an expected stratum cover value of 40\%
#' params <- beta_prior_helper(0.4, plot=TRUE)
#'
#' # Retrieve the input expected cover value used for calculations
#' attr(params, "cover")
#'
#' @importFrom graphics abline curve legend
#' @importFrom stats dbeta qbeta
#' @export
#'
beta_prior_helper <- function(cover, a = c(2, 4, 8, 16), plot = FALSE) {
  if (!all(a > 1)) stop("Value(s) for first beta parameter (a) must be greater than 1.0")

  if (length(cover) > 1) {
    warning("Only using first cover value")
    cover <- cover[1]
  }

  stopifnot(cover > 0 & cover < 1)

  a <- rev(sort(unique(a)))
  b <- (a - 1) / cover - a + 2

  if (plot) {
    clrs <- grDevices::palette.colors(length(a))

    curve(dbeta(x, a[1], b[1]), xlab = "vegetation cover", ylab = "density", lwd=2)
    abline(v = cover, lty = 2)

    if (length(a) > 1) {
      clrs <- grDevices::palette.colors(length(a))

      for (k in 2:length(a)) curve(dbeta(x, a[k], b[k]), add=TRUE, col = clrs[k], lwd=2)

      x <- 0.7
      y <- dbeta(cover, a[1], b[1])
      labels <- sprintf("beta(%.3g, %.3g)", a, b)
      legend(x, y, legend = labels, col = clrs, lwd = 2, bty = "n")
    }
  }


  res <- cbind(a = a, b = b,
               median = qbeta(0.5, a, b),
               lwr90 = qbeta(0.05, a, b),
               upr90 = qbeta(0.95, a, b))

  # Order matrix rows by increasing 'a' value
  res <- res[order(a), ]

  # Record the input cover value as an attribute
  attr(res, "cover") <- cover

  res
}


# Private helper to check that a raster looks like valid point count data.
# Input must be a terra::SpatRaster object.
#
.sanity_check_point_counts <- function(r, label = NULL) {
  if (is.null(label)) label <- deparse(substitute(r))

  if (!inherits(r, "SpatRaster")) stop(label, " must be a terra package SpatRaster object")

  if (!all(terra::hasMinMax(r))) terra::setMinMax(r)
  lims <- t( terra::minmax(r) ) # bands as rows; min,max as cols

  # Check sane minimum values
  if (any(lims[,1] < 0)) stop(label, " has minimum value less than zero in one or more bands")

  # Check values are integers
  x <- as.vector(lims)
  if (any(x - as.integer(x) != 0)) stop(label, " has non-integer values in one or more bands")
}


# Private helper function called by get_cover_quantiles() and get_cover_difference().
# Checks that the beta_prior argument has valid form and values and converts it to
# matrix form.
#
# x - numeric vector or list of number vectors
#
# rcounts - a multi-band terra::SpatRaster with point counts for
#   ground (band 1) and vegetation strata
#
.prepare_beta_prior <- function(x, rcounts) {
  nm <- deparse(substitute(x))
  nlayers <- terra::nlyr(rcounts)

  .check_vector <- function(v, index = NULL) {
    if (!is.null(index)) nm <- paste(nm, "element", index)
    if (length(v) != 2) stop(nm, " should be a numeric vector with two values")
    if (!all(v > 0)) stop("Both values in ", nm, " must be greater than zero")
  }

  if (is.numeric(x)) {
    .check_vector(x)

    # Return matrix with same beta parameters for all stratum bands
    matrix(x, nrow = nlayers-1, ncol=2, byrow = TRUE)


  } else if (is.list(x)) {
    if (length(x) > nlayers - 1) {
      msg <- glue::glue(
        "The list of beta prior parameters '{nm}' has {length(x)} elements.
        Expected at most {nlayers - 1} elements for the non-ground strata in
        the point counts raster")
      stop(msg)
    }

    # Match list element names to raster layer names
    if (is.null(names(x))) {
      msg <- glue::glue(
        "The list of beta prior parameters '{nm}' must have element names
         matching point count layer names for vegetation strata")
      stop(msg)

    } else {
      el.names <- stringr::str_trim(tolower(names(x)))
      lyr.names <- stringr::str_trim(tolower(names(rcounts)))[-1] # ignore first (ground) layer

      if (any(table(el.names) > 1)) {
        stop("One or more beta prior list element names are repeated (case-insensitive)")
      }

      found <- el.names %in% lyr.names
      if (!all(found)) {
        msg <- glue::glue(
          "The list of beta prior parameters '{nm}' contains
           one or more elements that do not match raster layer names:
           {paste(el.names[!found], collapse = ', ')}")
        stop(msg)
      }
    }

    # Finally, check that element values are all valid length-2 vectors
    for (i in seq_along(x)) .check_vector(x[[i]], i)


    # Convert list to a matrix with c(1,1) default parameters for any layers
    # not matched to a list element name.
    m <- matrix(rep(1, 2*(nlayers-1)), ncol = 2)
    index <- match(el.names, lyr.names)
    for (k in seq_along(index)) {
      m[index[k], ] <- x[[k]]
    }

    m

  } else {
    stop(nm, " must be a two-element numeric vector or a named list of vectors")
  }


}
mbedward/CERMBlidar documentation built on April 10, 2024, 2:05 p.m.