R/assign.R

Defines functions assign

Documented in assign

#' Tracks genets through time
#'
#' @description This function tracks individual organisms through time, but only
#' for one species in one quadrat. It is designed for use within the
#'  \code{\link{trackSpp}} function, and is not intended for use on its own.
#'
#' @details see \code{\link{trackSpp}} for details of arguments and usage.
#'
#' @param dat An sf data.frame of the same format as
#' \code{\link{grasslandData}}. 'dat' must contain data for only one species in
#' one quadrat. It must have columns that contain...
#' * a unique identification for each research site in character format
#' with no NAs (the default column name is "Site")
#' * species name in character format with no NAs (the default column
#' name is "Species")
#' * unique quadrat identifier in character format with no NAs (the default
#' column name is "Quad")
#' *  year of data collection in integer format with no NAs (the
#' default column name is "Year")
#' * an s.f 'geometry' column that contains a= polygon or multipolygon data type
#' for each individual observation (the default column name is "geometry")
#'
#' This function will add columns called "basalArea_ramet", "trackID",
#' "age", "size_tplus1", "recruit" and "survives_tplus1", so 'dat' should not
#' contain columns with these names.
#' @param inv An integer vector that contains all of years in which this quadrat
#' (or other unique spatial area) was sampled. Years must be
#' ordered sequentially.
#' @param dorm A numeric vector of length 1, indicating the number of years this
#' species is allowed to go dormant, i.e. be absent from the map but be
#' considered the same individual when it reappears. This must be an integer
#' greater than or equal to 0.
#' @param buff A numeric vector of length 1 that is greater than or equal to
#' zero, indicating how far (in the same units as spatial values in 'dat') a
#' polygon can move from year \code{t} to year \code{t+1} and still be
#' considered the same individual.
#' @param buffGenet A numeric vector of length 1 that is greater than or equal
#' to zero, indicating how close (in the same units as spatial values in 'dat')
#' polygons must be to one another in the same year to be grouped as a genet
#' (if 'clonal' argument = TRUE). This argument is passed to
#' the \code{\link{groupByGenet}} function, which is used inside the
#' \code{\link{assign}} function.
#' @param clonal A logical argument of length 1, indicating whether this
#' species is allowed to be clonal or not (i.e. if multiple polygons (ramets)
#' can be grouped into one individual (genet)). If clonal = TRUE, the species is
#' allowed to be clonal, and if clonal = FALSE, the species is not allowed to
#' be clonal.
#' @param flagSuspects A logical argument of length 1, indicating whether
#' observations that are 'suspect' will be flagged. The default is
#' `flagSuspects = FALSE`. If `flagSuspects = TRUE`, then a column called
#' 'Suspect' is added to the output data.frame. Any suspect observations get a
#' 'TRUE' in the 'Suspect' column, while non-suspect observations receive a
#' 'FALSE'. There are two ways that an observation can be classified as
#' 'suspect'. First, if two consecutive observations have the same trackID, but
#' the basal area of the observation in year \code{t+1} is less that a certain
#' percentage (defined by the `shrink` arg.) of the basal area of the
#' observation in year t, it is possible that the observation in year \code{t+1}
#' is a new recruit and not the same individual. The second way an observation
#' can be classified as 'suspect' is if it is very small before going dormant.
#' It is unlikely that a very small individual will survive dormancy, so it is
#' possible that the function has mistakenly given a survival value of '1' to
#' this individual. A 'very small individual' is any observation with an area
#' below a certain percentile (specified by 'dormSize') of the size distribution
#' for this species, which is generated using all of the size data for this
#' species in 'dat'.
#' @param shrink A single numeric value. This value is only used when
#' `flagSuspects = TRUE`. When two consecutive observations have the same
#' trackID, and the ratio of size_t+1 to size_t is smaller than the value of
#' `shrink`, the observation in year \code{t} gets a 'TRUE' in the 'Suspect'
#' column. For example, `shrink = 0.2`, and an individual that the tracking
#' function has identified as 'BOUGRA_1992_5' has an area of 9 cm^2 in
#' year \code{t} and an area of 1.35 cm^2 in year \code{t+1}. The ratio of
#' size \code{t+1} to size \code{t} is 1.35/9 = 0.15, which is smaller than the
#' cutoff specified by `shrink`, so the observation of BOUGRA_1992_5' in year
#' \code{t} gets a 'TRUE' in the 'Suspect' column. The default value
#' is `shrink = 0.10`.
#' @param dormSize A single numeric value. This value is only used when
#' `flagSuspects = TRUE` and `dorm` is greater than or equal to 1. An individual
#' is flagged as 'suspect'
#' if it 'goes dormant' and has a size that is less than or equal to the
#' percentile of the size distribution for this species that is designated by
#' `dormSize`. For example, `dormSize = 0.05`, and the focal individual has a
#' basal area of 0.5 cm^2. The 5th percentile of the distribution of size for
#' this species, which is made using the mean and standard deviation of all
#' observations in 'dat' for the species in question, is 0.6 cm^2. This
#' individual does not have any overlaps in the next year (year \code{t+1}), but
#' does have an overlap in year \code{t+2}. However, because the basal area of
#' this observation is smaller than the 5th percentile of size for this species,
#' the observation in year \code{t} will get a 'TRUE' in the 'Suspect' column.
#' It is possible that the tracking function has mistakenly assigned a '1' for
#' survival in year t, because it is unlikely that this individual is large
#' enough to survive dormancy. The default value is `dormSize = .05`.
#' @param ... Other arguments passed on to methods. Not currently used.
#' @param  inheritsFromTrackSpp A logical argument that is only applicable when
#' 'assign()' is used internally in 'trackSpp()'.
#' @param nearEdgeBox An sf data frame indicating the bounding box of the
#' quadrat. This argument is only used if 'assign()' is used internally
#' in 'trackSpp()'.
#'
#' @seealso [trackSpp()], which is a wrapper for the [assign()] function that
#' applies it over many species and quadrats. The [assign()] function uses the
#' [groupByGenet()] function to group ramets into genets
#' (if 'clonal' argument = TRUE).
#'
#' @import sf
#' @importFrom stats aggregate reshape qnorm sd
#' @importFrom utils stack

 assign <- function(dat,
                    inv,
                    dorm,
                    buff,
                    buffGenet,
                    clonal,
                    flagSuspects = FALSE,
                    shrink = 0.1,
                    dormSize = .05,
                    inheritsFromTrackSpp = FALSE,
                    nearEdgeBox = NULL,
                    ...){
  ## argument checking --------------------------------------------------------
   ## don't really do arg. checking, since this function is not exported, and
   # expects to recieve an input directly from the 'trackSpp' function, which
   # has already done comprehensive arg. checking.

  ## internal functions ------------------------------------------------------
  ## FUNCTION FOR AGGREGATING BY GENET (if clonal or not clonal)
  ## this fxn is internal to the 'assign' fxn
  ifClonal <- function(cloneDat, clonal, buffGenet, ...) {
    ## arguments
    ## cloneDat = cloneDataset for one species/quad/year (input from 'assign()'
    # fxn), is a  subset of the 'dat' argument from the 'assign()' fxn
    ## clonal = inherits from the 'clonal' argument in the 'assign()' fxn
    ## buffGenet = inherits from the 'buffGenet' argument in the 'assign()' fxn,
    # is input into the plantTracker::groupByGenet() fxn
    if(clonal == TRUE) {
      cloneDat$genetID <- groupByGenet(cloneDat, buffGenet)
      ## aggregate size by genetID (total size for each ramet)
      tempCloneDat <- stats::aggregate(basalArea_ramet ~ genetID, sum,
                                       data = cloneDat)
      names(tempCloneDat) <- c("tempGenetID", "basalArea_genet")
      ## add aggregated size data to the cloneData df
      cloneDat$basalArea_genet <- tempCloneDat[match(cloneDat$genetID,
                                               tempCloneDat$tempGenetID),
                                         "basalArea_genet"]
    }
    ## assign unique genetIDs for every polygon (if clonal = FALSE)
    else { #if(clonal==FALSE) {
      cloneDat$genetID <- 1:nrow(cloneDat)
      cloneDat$basalArea_genet <- cloneDat$basalArea_ramet
    }

    return(cloneDat)
  }

  ## work -------------------------------------------------------------------
  ## remove the out data.frame
  if(exists("assignOut") == TRUE){
    suppressWarnings(rm("assignOut"))
  }
  ## make sure dat is in the correct sf format
  dat <- sf::st_as_sf(x = dat, sf_column_name = "geometry")
  ## add columns to the 'dat' dataset needed for output from assign()
  dat$trackID <- as.character(NA)
  dat$age <- as.integer(NA)
  dat$size_tplus1 <- as.numeric(NA)
  dat$recruit <- as.integer(NA)
  dat$survives_tplus1 <- as.integer(NA)
  dat$ghost <- as.integer(NA)
  dat$basalArea_genet <- as.numeric(NA)
  dat$genetID <- as.character(NA)
  ## if not already present, create a column called 'basalArea_ramet' in 'dat'
  if (sum(dat$basalArea_ramet) == 0) {
    dat$basalArea_ramet <-  sf::st_area(dat)
  }
  ## if 'flagSuspects' is TRUE, add a column for the flags to go into
  if (flagSuspects == TRUE) {
    dat$Suspect <- FALSE ## default value is 0
    }

  ## get the 6-letter species code for each observation
  ## make a column in the d.f with the 6-letter species code for each row
  ## only if the species column contains species name with a space
  if (sum(grepl(pattern = "[[:space:]]",x = dat$Species)) > 0  ## does the
      # species column contain a space? (is the length of the rows that contain
      # a space greater than 0?)
      ) {
    dat$sp_code_6  <- sapply(strsplit(dat$Species, " "), function(x)
      paste0(substr(toupper(x[1]), 1, 3), ## species name
             substr(toupper(x[2]), 1, 3)) ## genus name
    )
  } else if (sum(grepl(pattern = "_",x = dat$Species)) > 0) { ## if the species
    # name doesn't contain a space, does it have an underscore?
    dat$sp_code_6  <- sapply(strsplit(dat$Species, "_"), function(x)
      paste0(substr(toupper(x[1]), 1, 3), ## species name
             substr(toupper(x[2]), 1, 3)) ## genus name
    )
  } else {  ## if the species column contains some sort of code (i.e.
    # uppercase letters), or something else...
    # put the name alone in the sp_code_6 column
    dat$sp_code_6 <- dat$Species
  }


  ## assign an arbitrary, unique index number to each row in the dataset
  dat$index <- c(1:nrow(dat))

  ## find the first year in the dataset that was actually sampled
  firstDatYear <- min(dat$Year)
  ## find the index of the first year in the quadrat inventory
  firstYearIndex <- which(inv == firstDatYear)

  ## get the dataset for the first year of actually sampling
  tempPreviousYear <- dat[dat$Year == firstDatYear,]
  ## assign genetIDs to the first year of sampling dataset
  tempPreviousYear <- ifClonal(cloneDat = tempPreviousYear, clonal = clonal,
                              buffGenet = buffGenet)
  ## assign a unique trackID to every unique genetID in this first-year dataset
  IDs <- data.frame( "genetID" = sort(unique(tempPreviousYear$genetID)), ## get
                     # a vector of all of the unique genetIDs
                     "trackID" = paste0(unique(dat$sp_code_6), ## get the unique
                                        # 6-letter species code
                                        "_",unique(tempPreviousYear$Year), ## get
                                        # the unique year
                                        "_",c(1:length(unique(
                                          tempPreviousYear$genetID))))) ## get a
  # vector of  unique numbers that is the same length as the genetIDs in this
  # quad/year

  ## get the row index numbers of the rows in 'IDs' that match the genetID of
  # the row in 'tempPreviousYear'
  tempPreviousYear$trackID <- IDs[match(tempPreviousYear$genetID, IDs$genetID),
                                 "trackID"]

  ## give all individuals in year #1 a '0' in the ghost column
  tempPreviousYear$ghost <- 0

  ## check that first year in quadrat inventory and first year of
  # actual data match
  if (min(dat$Year) > inv[1]) { ## if the year of 'tempPreviousYear' is NOT
    # the same as the first year of data in the quadrat inventory (if it is
    # larger), then each individual in tempPreviousYear gets a '1' in the
    # 'recruit' column, and a '0' in the 'age' column
    tempPreviousYear$recruit <- 1
    tempPreviousYear$age <- 0
    }
  if (min(dat$Year) == inv[1]) { ## if the year of 'tempPreviousYear' is the SAME
    # as the first year of the quadrat inventory, then make sure that there is
    # an 'NA' in both the 'recruit' and 'age' columns
    tempPreviousYear$recruit <- NA
    tempPreviousYear$age <- NA
  }
  if (min(dat$Year) < inv[1]) { ## if the year of 'tempPreviousYear' is SMALLER
    # (earlier) than the first year of the quadrat inventory, then return an
    # error
    stop("Quadrat inventory dataset does not match years in the sample dataset")
  }

  ## 'tempPreviousYear' data.frame will get redefined for
  # each iteration of the for-loop below
  tempPrevI_empty <- FALSE
  ##  i = year in inventory
  if (inv[firstYearIndex] < max(inv)) { ## is the first year of data also the
    # last year of sampling?
    for (i in (firstYearIndex+1):length(inv)) {
       if (tempPrevI_empty == TRUE) {
         tempPreviousYear <- dat[dat$Year == inv[i-1], ]
       if (nrow(tempPreviousYear) > 0) {
         tempPreviousYear <- ifClonal(cloneDat = tempPreviousYear,
                                      clonal = clonal, buffGenet = buffGenet)
         tempPreviousYear$trackID <- paste0(tempPreviousYear$sp_code_6,
                                            "_", tempPreviousYear$Year, "_",
                                            tempPreviousYear$genetID)
         tempPrevI_empty <- FALSE
         } else {
         tempPrevI_empty <- TRUE
         next
       }
    }
      ## CHECK IF YEARS ARE CONTINUOUS -- check to see if the sampling years of
      # 'tempPreviousYear' and 'tempCurrentYear' are not far enough apart to
      # exceed the 'dormancy' argument. If dormancy is not exceeded, then go
      # ahead with this loop. If it is, then freshly redefine 'tempPreviousYear'
      # and proceed to the next 'i'
      if (inv[i] - inv[i-1] > (dorm+1)) { ## if the gap between years EXCEEDS
        # the dormancy argument
        ## put the 'tempPreviousYear' data into the 'assignOut' d.f, making sure
        # that the columns are in the correct order, and that the
        # 'survives_tplus1' and 'size_tplus1' columns contain 'NA'
        if (exists("assignOut") == TRUE) {
          ## if this is not the first year, then add demographic data to the
          # output d.f
          assignOut <- rbind(assignOut, tempPreviousYear)
        } else{ ## if the assignOut df is empty
          assignOut <- tempPreviousYear
        }
       ## is there any more data? if not, then end the loop
       if(inv[i] > max(dat$Year)) {
        break
       }
        ## get data from year i and put in 'tempPreviousYear'
        tempPreviousYear <- dat[dat$Year==inv[i],]

        ## determine if the tempPreviousYear data.frame has any data
        if (nrow(tempPreviousYear) < 1) { ## if there is NOT data in year i,
          # then go to the next i--but first indicate that the previous i
         # did not have any data
         tempPrevI_empty <- TRUE
          next
        }
        if (nrow(tempPreviousYear) > 0 ) { ## if there IS data in year i, then
          # group by genets and assign trackIDs, but first check if this is the
          # first year (if so, then don't have to make trackIDs b/c it already
          # has them!)

          ## is there data for genetIDs in the 'tempPreviousYear' data?
          if (sum(is.na(tempPreviousYear$genetID) == TRUE) >=
            1) { ## if there is
            # NOT data for genetID
            ## group by genet
            tempPreviousYear <- ifClonal(cloneDat  = tempPreviousYear,
                                        clonal = clonal, buffGenet = buffGenet)

            ## assign a unique trackID to every unique genetID
            IDs <- data.frame( "genetID" = sort(unique(
              tempPreviousYear$genetID)), ## get a vector of all the genetIDs
              "trackID" = paste0(unique(dat$sp_code_6), ## get the sp. code
                                 "_",unique(tempPreviousYear$Year), ## get the
                                 # unique year
                                 "_",c(1:length(unique(
                                   tempPreviousYear$genetID))))) ## get a
            # vector of unique numbers that is the same length as the genetIDs
            # in this quad/year

            ## add trackIDs to the tempPreviousYear data.frame
            tempPreviousYear$trackID <- IDs[match(tempPreviousYear$genetID,
                                                 IDs$genetID),"trackID"]

          } ## end of 'if' that determines if there is genetID data, and if not,
          # assigns genetID and trackID
        } ## end of 'if' that determines if there is data in year i
        ## end of 'if' that determines what to do if the gap between years
        # exceeds the dormancy argument
    } else { ## if the gap between years (inv[i] - inv[i-1] <= (dorm+1))
        # does NOT exceed the dormancy argument
        ## 'tempPreviousYear' is the sf data.frame of the 'current' year
        ## need to get the sf data.frame of the 'next' year (year 'i')
        tempCurrentYear <-  sf::st_as_sf(dat[dat$Year==inv[i],])

        ## MAKE SURE THERE IS DATA IN YEAR i-1 (tempPreviousYear isn't empty) If
        # not, then go to the next i (only after replacing 'tempPreviousYear'
        # with the tempCurrentYear data.frame)
        if (nrow(tempPreviousYear) < 1) { ## what to do if 'tempPreviousYear'
          # does NOT exist, then overwrite the tempPreviousYear w/ data from the
          # 'next' year, and go to the next i
          ## but first, give them trackIDs and recruit/age data (if
          # appropriate)-- because they go directly into 'tempPreviousYear,' so
          # don't get these data later in the 'orphans' section
          if (nrow(tempCurrentYear) > 0) {
            ## get a d.f that contains the obs. w/ no trackIDs
            tempTrackIDs <- tempCurrentYear[
              is.na(tempCurrentYear$trackID)==TRUE,]
            ## assign them genetIDs first
            tempTrackIDs <- ifClonal(tempTrackIDs,
                                     clonal = clonal, buffGenet = buffGenet)
             ## add genet basal areas to the tempCurrentYear data.frame
             tempCurrentYear[is.na(tempCurrentYear$trackID) == TRUE,
                             "basalArea_genet"] <- tempTrackIDs$basalArea_genet
            ## add trackIDs to the tempCurrentYear data.frame
             tempCurrentYear[is.na(
               tempCurrentYear$trackID)==TRUE,"trackID"] <- paste0(
               tempTrackIDs$sp_code_6, "_", tempTrackIDs$Year, "_",
               tempTrackIDs$genetID)
            ## then need to add 'age' and 'recruit' data (but first check that
            # this isn't the first year after a gap in sampling)
            tempCurrentYear[(tempCurrentYear$Year - inv[i-1]) < 2,
                         c("age")] <- 0
            tempCurrentYear[(tempCurrentYear$Year - inv[i-1]) < 2,
                         c("recruit")] <- 1
          }
          ## if else to determine what to do next, but depending on whether this
          # is the last year of sampling or not
          if (inv[i] == max(inv)) { ## if this IS the last year
            ## put the 'tempCurrentYear' data into the assignOut d.f
            if (exists("assignOut") == TRUE) {
              ## if this is not the first year, then add demographic data to the
              # output d.f
              assignOut <- rbind(assignOut, tempCurrentYear)
            } else{ ## if the assignOut df is empty
              assignOut <- tempCurrentYear
            }
          } else { ## if this is NOT the last year
            ## put tempCurrentYear data into tempPreviousYear, then go to the next i
            tempPreviousYear <- sf::st_as_sf(tempCurrentYear)
            next
          }
          ## end of 'if' of what to do if tempPreviousYear is empty
        } else  { ## what to do if the 'tempPreviousYear'
          # (nrow(tempPreviousYear)>=1) DOES have data
          ## add a buffer to the previous year data
          tempPreviousBuff <- sf::st_buffer(tempPreviousYear, buff)

          ## MAKE SURE THERE IS DATA IN YEAR i (tempCurrentYear)
          if (nrow(tempCurrentYear) < 1) { ## if the tempCurrentYear data does NOT
            # exist, then keep the tempPreviousYear data.frame for the next i
            ## if the gap between year of measurement (year inv[i-1]) and the
            # next i (year inv[i+1]) does not exceed the dormancy argument, then
            # roll those individuals over into 'tempPreviousYear' for the next i
            # (with a '1' in the 'ghost' column)

            ## do this workflow for years when that are not the 'last' year
            if (inv[i] != max(inv)) {
              ## get 'ghosts' (if there are any)
              ghosts <- tempPreviousYear[((inv[i+1]-tempPreviousYear$Year) <=
                                           (dorm + 1)),]
              ## put a '1' in the 'ghost' column for ghosts
              if (nrow(ghosts)>0) {
                ghosts$ghost <- '1'
              }

              ## get 'deadGhosts' (if there are any) (if the gap between year
              # i-1 and year i+1 exceeds the dormancy argument) and make sure
              # columns are in correct order
              deadGhosts <- tempPreviousYear[((inv[i+1]-tempPreviousYear$Year) >
                                               (dorm + 1)),
                                            names(dat)]

              ## rewrite 'tempPreviousYear' so it just has 'ghosts'
              # (not 'deadGhosts')
              tempPreviousYear <- ghosts

              if (nrow(deadGhosts)>0) {
                ## deadGhosts get a '0' in the 'survival' column
                deadGhosts$survives_tplus1 <- 0
                ## add the 'deadGhosts' to the 'assignOut' df
                if (exists("assignOut") == TRUE) {
                  ## if this is not the first year, then add demographic data
                  # to the output d.f
                  assignOut <- rbind(assignOut, deadGhosts)
                } else  { ## if the assignOut df is empty
                  assignOut <- deadGhosts
                }
              }
          } else {  ## what to do if this i is the 'last' year!
              ## get 'ghosts' (if there are any)
              ghosts <- tempPreviousYear[(((inv[i]+1)-tempPreviousYear$Year) <=
                                           (dorm + 1)),]

              ## get 'deadGhosts' (if there are any) (if the gap between year
              # i-1 and year i+1 exceeds the dormancy argument) and make sure
              # columns are in correct order
              deadGhosts <- tempPreviousYear[(((inv[i]+1)-tempPreviousYear$Year) >
                                               (dorm + 1)),
                                            c(names(dat))]

              if (nrow(deadGhosts)>0) {
                ## deadGhosts get a '0' in the 'survival' column
                deadGhosts$survives_tplus1 <- 0
                ## add the 'deadGhosts' to the 'assignOut' df
                if (exists("assignOut") == TRUE) {
                  ## if this is not the first year, then add demographic data
                  # to the output d.f
                  assignOut <- rbind(assignOut, deadGhosts)
                } else  { ## if the assignOut df is empty
                  assignOut <- deadGhosts
                }
              }
              ## put a '1' in the 'ghost' column for ghosts
              if (nrow(ghosts)>0) {
                ghosts$ghost <- '1'
                ghosts$survives_tplus1 <- NA
                ghosts$size_tplus1 <- NA
                ## add the 'deadGhosts' to the 'assignOut' df
                if (exists("assignOut") == TRUE) {
                  ## if this is not the first year, then add demographic data
                  # to the output d.f
                  assignOut <- rbind(assignOut, ghosts)
                } else  { ## if the assignOut df is empty
                  assignOut <- ghosts
                }
              }
            }
            ## go to next i
            next
            ## end of 'else' that has steps if tempCurrentYear is empty
          } else { ## if the tempCurrentYear data DOES exist
            ## FIND OVERLAPPING POLYGONS
            ## get the amount of overlap between each polygon
            overlapArea <- suppressWarnings(sf::st_intersection(tempPreviousBuff,
                                                                tempCurrentYear))

            ## determine if there are overlaps between the previous year and
            # next
            ## if there is NOT overlap, then skip the 'while' loop below, and
            # proceed with the code to assign trackIDs to new 'orphans' and to
            # assign 'ghost' status
            if (nrow(overlapArea) > 0) { ## if there IS overlap
              ## get the trackID names merged with the index value (same values
              # used for names of rows in 'overlaps' matrix)
              overlapArea$parentName <- paste0(overlapArea$trackID, "__",
                                               overlapArea$index)
              ## get the genetID names merged with the index value (same values
              # used for names of columns in 'overlaps' matrix)
              overlapArea$childName <- paste0("childPoly__",
                                              overlapArea$index.1)

              ## calculate the overlap between each parent poly and each child
              overlapArea$overlappingArea <- sf::st_area(overlapArea$geometry)
             # remove units from the overlap matrix, if they exist
              base::units(overlapArea$overlappingArea) <- NULL

              overlapArea <- sf::st_drop_geometry(overlapArea)

              ## get only the necessary data from the overlapArea d.f
              overlaps <- overlapArea[,c("parentName", "childName",
                                         "overlappingArea")]

              ## transform the overlaps dataframe into a 'wide' dataframe, so
              # that every row is a unique parent trackID, and the columns are
              # children (not completely aggregated yet). The value is the
              # overlap between each parent/child pair
              overlapsTemp <- stats::reshape(overlaps,
                                             v.names = "overlappingArea",
                                             idvar = "parentName",
                                             timevar = "childName",
                                             direction = "wide")

              ## remove the 'index' ID from the parent name
              overlapsTemp$parentName <- sapply(strsplit(
                overlapsTemp$parentName, "__"), unlist)[1,]

              ## aggregate by parent genet, so that each genet has only one
              # row of data (if clonal == TRUE)
              if (clonal == TRUE) {
                oldNames <- names(overlapsTemp)
                overlapsTemp <- stats::aggregate(x = overlapsTemp[,2:ncol(overlapsTemp)],
                                          by = list(
                                            "parentName" =
                                              overlapsTemp$parentName),
                                          FUN = sum, na.rm = TRUE)
                # change '0's to 'NA's
                overlapsTemp[which(overlapsTemp==0, arr.ind = TRUE)] <- NA
                names(overlapsTemp) <- oldNames
              }

              ## add parentTrackID as rownames
              rownames(overlapsTemp) <- as.character(overlapsTemp$parentName)
              ## add childGenetID as the column names
              names(overlapsTemp)  <- c("parentName", sapply(strsplit(names(overlapsTemp)[2:ncol(
                overlapsTemp)], "overlappingArea."), unlist)[2,])
              ## remove old "parentTrackID" column
              overlaps <- as.data.frame(overlapsTemp[,2:ncol(overlapsTemp)])
              rownames(overlaps) <- rownames(overlapsTemp)
              names(overlaps) <- names(overlapsTemp)[2:ncol(overlapsTemp)]

              ## determine if clonality is allowed...
              if (clonal == TRUE) { ## if yes, then each parent can have
                # multiple children (but each child can only have one parent)

                ## each child can have only one parent, so if there is only ever
                # one value in each column, than the next step is easy... each
                # column gets the trackID of the 'parent' that it overlaps with
                multParents <- apply(X = overlaps,
                                     MARGIN = 2,
                                     FUN = function(x) sum(is.na(x)==FALSE))
                # does each child only have one parent?
                if (sum(multParents > 1) != 0) {  ## if no (at least one child has
                  # more than one parent)...
                  ## get a vector of individuals with 'ties' (>1 parent)
                  ties <- names(multParents[multParents > 1])
                  if (length(ties) > 1) {
                    for (m in 1:ncol(as.data.frame(overlaps[,ties]))) {
                     # get the maximum value of overlaps
                      winnerVal <- max(overlaps[,ties][,m], na.rm = TRUE)
                      ## get the location of the highest value between the two ties
                      winner <- which(overlaps[,ties][,m] == winnerVal)
                      # if there is only one 'winner':
                      if (length(winner) ==1) {
                        ## set all other values that aren't the highest in that
                        # column to 'NA'
                        overlaps[,ties][,m][overlaps[,ties][,m] != winnerVal] <- NA
                      } else if (length(winner) > 1) {
                        ##  if there is more than 1 winner (i.e. if there are two
                        # parents with the exact same overlap with the child), then
                        # we have to break the tie some other way. Use the distance
                        # between the centroids to do this

                        ## get the name of the problem 'child'
                        badChild_name <- names(overlaps[,ties])[m]
                        ## get the spatial data for that child
                        badChild <- suppressWarnings(
                          sf::st_centroid(
                            tempCurrentYear[tempCurrentYear$index == as.numeric(
                              strsplit(badChild_name, "__")[[1]][2]),]))

                        ## get the names of the problem 'parents'
                        badParents_names <- rownames(
                          overlaps[is.na(overlaps[,ties][,m])==FALSE,])
                        badParents <-
                          tempPreviousYear[tempPreviousYear$trackID
                                           %in% badParents_names,]
                        ## aggregate badParents by trackID so that each genet has
                        # only one row of data
                        badParents <- suppressWarnings(
                          sf::st_centroid(
                            stats::aggregate(x = badParents[,"trackID"],
                                              by = list("trackID" = badParents$trackID),
                                              FUN = nrow,
                                              do_union = TRUE)))

                        ## compare the centroid distances between parents and child
                        dists <- sf::st_distance(badChild, badParents,
                                                 which = 'Euclidean')
                        rownames(dists) <- badChild_name
                        colnames(dists) <- badParents_names

                        smallDist <- colnames(dists)[which(dists == min(dists))]
                        ## replace the non-winning cell in the 'overlaps' matrix
                        # with an 'NA'
                        overlaps[rownames(overlaps) != smallDist, badChild_name] <- NA
                      }
                    } # end of loop going through each tie
                  } else if (length(ties) == 1) {
                    ## get the highest value between the two ties
                    winner <- max(overlaps[,ties], na.rm = TRUE)
                    # if there is only one 'winner':
                    if (length(which(overlaps == winner)) ==1) {
                      ## set all other values that aren't the highest in that
                      # column to 'NA'
                      overlaps[,ties][overlaps[,ties] != winner] <- NA
                    } else if (length(which(overlaps == winner)) > 1) {
                      ##  if there is more than 1 winner (i.e. if there are two
                      # parents with the exact same overlap with the child), then
                      # we have to break the tie some other way. Use the distance
                      # between the centroids to do this

                      ## get the name of the problem 'child'
                      badChild_name <- ties
                      ## get the spatial data for that child
                      badChild <- suppressWarnings(
                        sf::st_centroid(
                          tempCurrentYear[tempCurrentYear$index == as.numeric(
                            strsplit(badChild_name, "__")[[1]][2]),]))

                      ## get the names of the problem 'parents'
                      badParents_names <- rownames(overlaps)[!is.na(
                       overlaps[,ties])]
                      # get the spatial data for those parents
                      badParents <-
                        tempPreviousYear[tempPreviousYear$trackID
                                         %in% badParents_names,]
                      ## aggregate badParents by trackID so that each genet has
                      # only one row of data
                      badParents <- suppressWarnings(
                        sf::st_centroid(
                          stats::aggregate(x = badParents[,"trackID"],
                                            by = list("trackID" = badParents$trackID),
                                            FUN = nrow,
                                            do_union = TRUE)))

                      ## compare the centroid distances between parents and child
                      dists <- sf::st_distance(badChild, badParents,
                                               which = 'Euclidean')
                      rownames(dists) <- badChild_name
                      colnames(dists) <- badParents_names

                      smallDist <- colnames(dists)[which(dists == min(dists))]
                      ## replace the non-winning cell in the 'overlaps' matrix
                      # with an 'NA'
                      overlaps[rownames(overlaps) != smallDist, badChild_name] <- NA
                    }
                  }
                }

                ## after the previous loop, there is now only one parent for each
                # child. Proceed w/ assigning appropriate trackIDs to children

                ## get the numeric genetIDs of the children (for each parent)
                nameDF <- utils::stack(apply(X = t(overlaps), MARGIN = 1,
                                      function(x) names(x[which(is.na(x)==FALSE)])))
                # rename the columns so they make sense
                names(nameDF) <- c("parentTrackID", "childIndex")

                ## get just the index for the 'children'
                nameDF$childIndex <- as.numeric(sapply(
                  strsplit(
                    as.character(nameDF$childIndex), split = "__"), unlist)[2,])

                ## put the trackID from the parent into the tempCurrentYear d.f
                # rows that correspond to the genetID of the child
                tempCurrentYear[match(nameDF$childIndex, tempCurrentYear$index),
                                "trackID"] <- nameDF$parentTrackID
              } else if (clonal == FALSE) { ## if no, then each parent can have
                # only one child and each child can have only one parent

                ## set up a 'while' loop
                 whileOverlaps <- overlaps
                 done <- FALSE
                 counter <- 0

                 while (!done) {

                   ## get the row and column indices of the maximum value
                   maxInds <- which(whileOverlaps ==
                                      max(whileOverlaps,na.rm = TRUE),
                                    arr.ind = TRUE)
                   ## what if there is a tie?? --i.e. there are >=2 overlaps
                   # that have the same value
                   if (nrow(maxInds) > 1) {
                     # get the names of the potential parents
                     maybeParents_names <- rownames(whileOverlaps)[unique(
                       maxInds[,"row"])]
                     # get the names of the potential children
                     maybeChildren_names <- names(whileOverlaps)[unique(maxInds[,"col"])]
                     ## get a matrix of centroid distances
                     # the the spatial data for maybeChidlren
                     maybeChildren <- suppressWarnings(
                       sf::st_centroid(
                         tempCurrentYear[tempCurrentYear$index %in%
                                           as.numeric(
                                             as.vector(
                                               data.frame(
                                                 strsplit(
                                                   x = maybeChildren_names,
                                                   split = "__")
                                                 )[2,])),]))

                     # get the spatial data for maybeParents
                     maybeParents <-
                       tempPreviousYear[tempPreviousYear$trackID
                                        %in% maybeParents_names,]

                     ## aggregate badParents by trackID so that each genet has
                     # only one row of data
                     maybeParents <- suppressWarnings(
                       sf::st_centroid(
                         stats::aggregate(x = maybeParents[,"trackID"],
                                          by = list("trackID" = maybeParents$trackID),
                                          FUN = nrow,
                                          do_union = TRUE)))

                     ## compare the centroid distances between parents and child
                     dists <- sf::st_distance(maybeChildren, maybeParents,
                                              which = 'Euclidean')
                     rownames(dists) <- maybeChildren_names
                     colnames(dists) <- maybeParents_names

                     ## get the indices of the smallest distance pair
                     smallDistInds <- which(dists == min(dists), arr.ind = TRUE)

                  # if the smallDistInds d.f contains data for TWO parents (>1 column number!)
                  if (length(unique(smallDistInds[,"col"])) > 1) {
                    for (n in unique(smallDistInds[,"col"])) {
                      # get the data just for the nth parent
                      tinyDistInds <- smallDistInds[smallDistInds[,"col"]==n,]
                      # get the name of the nth child
                      tinyChild <- rownames(dists)[tinyDistInds["row"]]
                      # get the name of the nth parent
                      tinyParent <- colnames(dists)[tinyDistInds["col"]]
                      # give the nth child the trackID of the nth parent (in tempCurrentYear)
                      tempCurrentYear[tempCurrentYear$index ==
                                        strsplit(x = tinyChild, split = "__")[[1]][2],
                                      "trackID"] <- tinyParent
                      # update the "whileOverlaps" matrix w/ appropriate zeros
                      whileOverlaps[tinyParent, tinyChild] <- 0
                      whileOverlaps[tinyParent, ] <- 0
                      whileOverlaps[, tinyChild] <- 0
                    }
                  } else {
                    # if the smallDistInds d.f contains data for only one parent (same column number)

                     ## get the index of the smallest distance child
                     smallChild <- rownames(dists)[smallDistInds[,"row"]]
                     ## get the trackID of the smallest distance parent
                     smallParent <- colnames(dists)[smallDistInds[,"col"]]

                     ## put the trackID from the parent into the tempCurrentYear
                     # d.f rows that correspond to the row index of the child
                     tempCurrentYear[
                       tempCurrentYear$index==strsplit(
                         x = smallChild, split = "__")[[1]][2],
                                     "trackID"] <- smallParent

                     ## overwrite the 'max' value with an NA, so we can find the
                     # next largest value
                     whileOverlaps[smallParent,smallChild] <- 0

                     ## overwrite all of the other values in the parent row and
                     # the child column with NAs also (since each parent can
                     #only have one child, and each child can only have
                     # one parent)
                     whileOverlaps[smallParent,] <- 0
                     whileOverlaps[,smallChild] <- 0
                    }

                   } else { ## if there is only one maximum overlap--no ties

                     ## get the trackID of the parent
                     maxParent <- rownames(whileOverlaps)[maxInds[1,1]]

                     ## get the numeric index of the 'child'
                     maxChild <- as.numeric(strsplit(
                       colnames(whileOverlaps)[maxInds[1,2]],"__")[[1]][2])

                     ## put the trackID from the parent into the tempCurrentYear
                     # d.f rows that correspond to the genetID of the child
                     tempCurrentYear[tempCurrentYear$index==maxChild,
                                     "trackID"] <- maxParent

                     ## overwrite the 'max' value with an NA, so we can find the
                     # next largest value
                     whileOverlaps[maxInds[1,1],maxInds[1,2]] <- 0

                     ## overwrite all of the other values in the parent row and
                     # the child column with NAs also (since each parent can only
                     # have one child, and each child can only have one parent)
                     whileOverlaps[maxInds[1,1],] <- 0
                     whileOverlaps[,maxInds[1,2]] <- 0
                   }

                   ## update the 'counter'
                   counter <- counter + 1

                   if (counter > 3000) {

                     stop("tracking 'while' loop is running out of control!")

                   } ## end of 'if' checking that the counter isn't too large

                   if (sum(whileOverlaps, na.rm = TRUE)==0) {
                     ## if the sum of the  matrix is empty (is all NAs), then
                     # stop the while loop
                     done <- TRUE
                   } ## end of 'if' that's redefining 'done' if matrix is NAs
                 } ## end of 'while' that's finding the max values in the matrix
              } # end of 'if' that determines if clonal is 'TRUE' or 'FALSE'
            } # end of 'if' that determines if there ARE overlaps

            ## make optional checks that will flag obs. as 'suspect'
            if (flagSuspects == TRUE) {
              ## check: individuals w/ the same trackID-- can't have a decrease
              # in size more than 90% and still be the same individual (i.e.
              # current year must be > 10% of the size of previous year)

              ## make sure the areas are aggregated by genet
              smallPrevious <- stats::aggregate(x = sf::st_drop_geometry(
                tempPreviousYear[, "basalArea_ramet"]),
                by = list("Year_prev" = tempPreviousYear$Year,
                          "trackID" = tempPreviousYear$trackID),
                FUN = sum
              )
              names(smallPrevious)[3] <- "Area"

              smallCurrent <- stats::aggregate(x = sf::st_drop_geometry(
                tempCurrentYear[, "basalArea_ramet"]),
                by = list("Year_curr" = tempCurrentYear$Year,
                          "trackID" = tempCurrentYear$trackID),
                FUN = sum
              )
              names(smallCurrent)[3] <- "Area"

              ## get a list of the trackIDs that are present in both the
              # previous and current years
              shrinkage <- merge(smallPrevious, smallCurrent,
                                 by = "trackID")

              ## get ratio of current year size to previous year size. Is the
              # ratio less than or equal to .1?
              if (nrow(shrinkage) > 0) {
                shrinkers <- shrinkage$trackID[(shrinkage$Area.y /
                                                  shrinkage$Area.x) <= shrink]

                if (length(shrinkers) > 0) { ## if there are any shrinkers...
                  ## flag the observation in the PREVIOUS year
                  tempPreviousYear[tempPreviousYear$trackID %in% shrinkers,
                                  "Suspect"] <- TRUE
                }
              }

              ## check: for individuals that are dormant (and only if dorm = 1),
              # then a really tiny organism can't become a really big organism
              # (i.e. an organism that is really tiny probably can't go dormant)
              if (dorm >= 1 & ## if the dorm argument is > 0...
                  length(unique(round(dat$basalArea_ramet, 5))) > 3 ## if the
                  # sizes are not all the same (i.e. if the species is  measured
                  # as polygons, not points)
                  ) {
                ## if there are data that survive from year 1 to year 2
                if (nrow(shrinkage) > 0) {
                  ## is there a difference of greater than 1 year between any of
                  # the individuals with the same trackID?
                  ## get individuals that have a gap > 1
                  # between current and previous year
                  dormants <- shrinkage[shrinkage$trackID %in%
                                          which((shrinkage$Year_curr -
                                                   shrinkage$Year_prev ) > 1),]
                  if (nrow(dormants) > 0) {
                    ## get individuals that have a 'very small size' in the
                    # previous year (as long as the size isn't exactly the same
                    # from year to year--since they are likely then points and
                    # have a fixed radius)
                    dormants <- dormants[round(dormants$Area.x,8) !=
                                           round(dormants$Area.y,8),]
                    ## get the trackID of individuals in the previous year that
                    # were smaller than the 5th percentile of the distribution
                    # of sizes for this species
                    smallest <- exp(qnorm(p = dormSize,
                                          mean = mean(log(dat$Area)),
                                          sd = sd(log(dat$Area))))
                    tooSmallIDs <- dormants[dormants$Area.x < smallest,
                                            "trackID"]

                    ## flag individuals in the PREVIOUS year
                    # that are likely too small to have survived dormancy
                    tempPreviousYear[tempPreviousYear$trackID %in% tooSmallIDs,
                                    "Suspect"] <- TRUE
                  }
                }
              }
            }

            ## ORPHANS: deal with 'child' polygons that don't have parents
            ## give them new trackIDs
            orphans <- tempCurrentYear[is.na(tempCurrentYear$trackID)==TRUE,]
            ## make sure that there are orphans
            if (nrow(orphans)>0) {
              ## orphans are NOT grouped by genet, since it is very unlikely
              # that new recruits consist of multiple ramets

              ## add the orphan trackIDs to the 'orphan's data.frame
              orphans$trackID <- paste0(orphans$sp_code_6, "_",orphans$Year, "_", 1:nrow(orphans))

              ## populate the 'basalArea_genet' column for the orphans
              orphans$basalArea_genet <- orphans$basalArea_ramet

              ## check that the orphans don't come after a gap in sampling (if
              # they do, then leave NA's for all demographic values, if they
              # don't, then proceed with the following code)
              ## check that this is NOT the first year of sampling (i=1)
              if (inv[i] - inv[i-1] <= 1) { ## if this year does NOT come after
                # a gap in sampling
                ## assign a '1' in the recruits column
                orphans$recruit <- 1
                orphans$age <- 0
              }
            } ## end of if that determines if there are any orphans

            ## CHILDREN
            ## (first have to assign the parents)
            ## define the PARENTS data.frame
            parents <- tempPreviousYear[tempPreviousYear$trackID %in%
                                         tempCurrentYear$trackID,]
            ## define the children data.frame
            children <- tempCurrentYear[tempCurrentYear$trackID %in%
                                       tempPreviousYear$trackID,]

            ## ASSIGN DEMOGRAPHIC DATA TO CHILDREN
            if (nrow(children)>0) {
              ## give children a 0 in the recruit column, since they have a
              # parent (as long as the parent doesn't have an NA--was recruited
              # in a year when we couldn't know when it was recruited)
              if (inv[i] - inv[i-1] <= 1) {
                children$recruit <- parents[match(children$trackID,
                                                  parents$trackID),]$recruit
                ## if the parent is an 'NA', the child also gets an 'NA', but if
                # the parent is a '1', the child gets a '0'. If the parent is a
                # '0', then the child gets a '0'
                if (nrow(children[children$recruit==1 &
                                  is.na(children$recruit) == FALSE,]) > 0) {
                  children[children$recruit==1 &
                             is.na(children$recruit) == FALSE,]$recruit <- 0
                }
              } else {
                children$recruit <- NA
              }
              ## give the children the appropriate age column (only if the
              # parents don't have an 'NA' for age) (parent's age + 1)
              ## get the trackID and age+1 of the parents in the 'tempParents'
              # df
              ## get the two relevant columns and get rid of geometry column
              tempParents <- sf::st_drop_geometry(parents[,c("trackID", "age")])
              ## add 1 to the parent age (get an 'NA' if the parent age is NA)
              tempParents$age <- (tempParents$age + 1)
              names(tempParents) <- c("trackIDtemp", "age")

              ## add the'age'column from the appropriate parents (+1), joined by
              # trackID
              children$age <- tempParents[match(children$trackID,
                                                tempParents$trackIDtemp),]$age

              ## calculate the appropriate 'basalArea_genet' data for each child
              childGenet_area <- stats::aggregate(children$basalArea_ramet,
                                           by = list(
                                             "trackID" = children$trackID), sum)
              names(childGenet_area) <- c("trackID", "basalArea_genet")
              children <- merge(x = children[,names(children) !="basalArea_genet"],
                    y = childGenet_area, by = "trackID")
            }

            ## PARENTS
            ## ASSIGN DEMOGRAPHIC DATA FOR PARENTS
            ## make sure that there is actually a parents data.frame
            if (nrow(parents) > 0) {
              ## PARENTS ('parents' data.frame + 'deadGhosts' data.frame)
              ## give the 'parents' a '1' for survival
              parents[,"survives_tplus1"] <- 1
              ## give the 'parents' a 0 in the 'ghosts' column
              parents[,"ghost"] <- 0
              ## assign size in the current year (From 'children' df) to the
              # appropriate rows in the parents data.frame
              childSizeTemp <- unique(sf::st_drop_geometry(
                children[,c("trackID", "basalArea_genet")]))
              names(childSizeTemp) <- c("trackIDtemp", "size_tplus1")
              ## join size_tplus data to 'parents' data.frame
              parents$size_tplus1 <- childSizeTemp[
                match(parents$trackID,childSizeTemp$trackIDtemp),]$size_tplus1
            }

            ## ONLY ALLOW GHOSTS IF DORMANCY > 1
            if (dorm > 0) {
              ## GHOSTS: parent polygons that don't have 'children' in year i.
              # If they were observed in a year that is more than dorm+1 years
              # prior to year i, then they don't get saved to the current year. If
              # they were observed in a year that is >= to dorm+1 years prior to
              # year i, then they get added to the 'tempCurrentYear' data.frame,
              # which both go into the 'tempPreviousYear' data.frame before the
              # next iteration of the loop
              ## get the ghost individuals
              ghostsTemp <- tempPreviousYear[!(tempPreviousYear$trackID %in%
                                                tempCurrentYear$trackID),]

              ## is 'i' the last year of sampling? If not, then do the
              # following:
              if (inv[i] != max(inv)) {
                ## check that these individuals can be 'ghosts' in the current
                # year (if the gap between the year of their observation and
                # year i+1 is greater than the dormancy argument (+1), then)
                # they are not ghosts, and get a 0 for survival
                ghosts <- ghostsTemp[((inv[i+1] - ghostsTemp$Year) <=
                                        (dorm + 1)),]
                ## put the 'ghosts' that exceed the dormancy argument into their
                # own data.frame
                deadGhosts <- ghostsTemp[((inv[i+1] - ghostsTemp$Year) >
                                            (dorm + 1)),]
                ## if the 'i' year is the last year of sampling:
              } else { #if (inv[i] == max(inv)) {
                ## get the 'ghost' data (have to use a different 'i+1' value if
                # this is the current year)
                ghosts <- ghostsTemp[inv[i]+1 - ghostsTemp$Year <= (dorm+1),]
                ## get the 'deadGhost' data
                deadGhosts <- ghostsTemp[inv[i]+1 - ghostsTemp$Year > (dorm+1),]
              }

              ## ASSIGN DEMOGRAPHIC DATA TO GHOSTS
              ##give these survived ghosts a '1' in the 'ghost' column
              if (nrow(ghosts)>0) {
                ghosts$ghost <- 1
                ghosts$survives_tplus1 <- NA
              }
              ## give the deadGhosts a 0 in survival (if there are any
              # deadGhosts)
              if (nrow(deadGhosts)>0) {
                deadGhosts$survives_tplus1 <- 0
                deadGhosts$ghost <- 0
              }
              ## get the data.frames in a consistent format
              ghosts <- ghosts[,names(children)]
              deadGhosts <- deadGhosts[,names(children)]
            } else { ## if (dorm==0); any trackID that doesn't have a child in
              # year 'i' is 'dead'
              deadGhosts <- tempPreviousYear[!(tempPreviousYear$trackID %in%
                                                tempCurrentYear$trackID),]
              if (nrow(deadGhosts) > 0) { ## make sure there are deadGhosts
                ## give the dead ghosts a '0' for survival
                deadGhosts$survives_tplus1 <- 0
              }
              deadGhosts <- deadGhosts[,names(children)]
              ghosts <- NULL
            }

            ## PREPARE FOR NEXT i
            ## arrange columns of children, orphans, and ghosts in the same
            # order
            orphans <- orphans[, names(children)]
            parents <- parents[,names(children)]

            ## bind children, orphans, and ghosts into one data.frame, that will
            # become the data for the previous year in the next i of the loop
            tempCurrentYear <- rbind(children, orphans, ghosts)

            if (exists("assignOut") == TRUE) {
              ## if this is not the first year, then add demographic data to the
              # output d.f
              assignOut <- rbind(assignOut, parents, deadGhosts)
            } else { #if (exists("assignOut") == FALSE) {
              ## if the assignOut df is empty
              assignOut <- suppressWarnings(rbind(parents, deadGhosts))
            }

            ## if this is the last year of sampling, put the 'tempCurrentYear'
            # data into the 'assignOut' d.f, but only after making sure that
            # they have an 'NA' in the 'survives_tplus1' and
            # 'size_tplus1' columns
            if (inv[i] == max(inv)) {
              assignOut <- rbind(assignOut, tempCurrentYear)
            }

            ## assign the data from the current i (tempCurrentYear) to be the
            # past year data (tempPreviousYear) for the next i
            tempPreviousYear <- tempCurrentYear
          } ## end of 'if' statement that determines if the tempCurrentYear data
          # exists
          } ## end of 'if' that determines if the tempPreviousYear data exists
        } ## end of 'if' statement that determines if gap between inv[i-1] and
      # inv[i] is less than or equal to  dorm+1
      } ## end of loop i
    } else if (inv[firstYearIndex] == max(inv)) {
    ## if there are only observations in the last year of 'inv', then there
    # cannot be any demographic data assigned.
    ## make sure there are 'NA's' in the appropriate columns
    tempPreviousYear[, c('size_tplus1', 'survives_tplus1')] <- NA
    assignOut <- tempPreviousYear
    }

  ## make an empty column
  assignOut$nearEdge <- FALSE
  ## populate the 'nearEdge' column (if isn't inheriting data from trackSpp)
  if (inheritsFromTrackSpp == FALSE) {
    ## make a boundary box that is within the 'buff' argument of the actual quad
    buffEdgeOutside <- sf::st_as_sfc(sf::st_bbox(assignOut))
    buffEdgeInside <- sf::st_as_sfc(sf::st_bbox(assignOut) + c(buff, buff,-buff, -buff))
    buffEdge <-  sf::st_difference(buffEdgeOutside, buffEdgeInside)
    } else if (inheritsFromTrackSpp == TRUE) {
    buffEdge <- nearEdgeBox
  }

  ## find out which polys intersect with the buffered quad
  assignOut[sf::st_intersects(assignOut, buffEdge, sparse = FALSE),
            "nearEdge"] <- TRUE

  ## clean up output assignOut data.frame (remove NAs and unneeded columns)
  assignOut <- assignOut[is.na(assignOut$Species)==FALSE,
                         !(names(assignOut) %in% c("ghost","genetID", "index",
                                                   "sp_code_6"))]
  ## output ---------------------------------------------------------------
return(assignOut)
  }

# testing -----------------------------------------------------------------
# example input data ------------------------------------------------------
# #
# ## prepares the dataset to feed into the 'assign' function (the 'trackSpp'
# # function will do this ahead of time when the user calls it)
# sampleDat <- grasslandData[grasslandData$Site == "AZ"
#                            & grasslandData$Quad == "SG4"
#                            & grasslandData$Species == "Bouteloua rothrockii",]
# # this should be a data.frame
# dat <- sampleDat
# #
# # # get the appropriate grasslandInventory data for the "unun_11" quadrat,
# # # to tell the 'assign' function when the quadrat was sampled
# sampleInv<- grasslandInventory[["SG2"]]
# # this should be an integer vector
# inv <- sampleInv
#
# # remove a year to simulate 'dorm' being exceeded
# dat <- sampleDat[sampleDat$Year != 1924,]
# inv <- inv[c(1:2,4:5)]
#
# testOutput <- assign(dat = dat,
#                      inv = inv,
#                      dorm = 1,
#                      buff = .05,
#                      # buffGenet = .001,
#                      clonal =  FALSE,
#                      flagSuspects = TRUE)

#
# ggplot(testOutput) +
#   geom_sf(aes(fill = as.factor(trackID),colour = Year), alpha = 0.5) +
#   scale_fill_discrete(guide = "none") +
#   theme_classic()
# #
# # ggplot() +
# #   geom_sf(data = st_buffer(testOutput[testOutput$Year==2009,],.05), fill = "purple", alpha = 0.5) +
# #   geom_sf(data = st_buffer(dat[dat$Year==2009,],.05), fill = "green", alpha = 0.5) +
# #   theme_linedraw()
# #
# #
# # ggplot() +
# #   geom_sf(data = dat[dat$Year==1923,], fill = "green", alpha = 0.5) +
# #   geom_sf(data = testOutput[testOutput$Year==1923,], fill = "purple", alpha = 0.5) +
# #   theme_linedraw()
# #
# # tempDat <- dat[dat$Year==1923,]
# # tempOut <- testOutput[testOutput$Year==1923,]
# # mapview(tempDat) + mapview(tempOut, col.regions = "green")
#
# ## find duplicated polygons in testOutput d.f
# #testOutput_cent <- st_centroid(testOutput)
#
# dupValues <- testOutput[duplicated(testOutput$geometry),"geometry"]$geometry
#
# dups <- testOutput[testOutput$geometry %in% dupValues,]
# plot(dups[dups$geometry==dupValues[1],"geometry"])
#
# mapview(dups[dups$geometry==dupValues[1],])
# ## all the duplicates are in 2003, what?? that doesn't make sense
# unique(dups$Year)
#
# #see which obs. aren't in the outPut dataset (in comparison to the raw data)
# testDat <- st_drop_geometry(dat)
# testDat$test <- "old"
# testOutputTest <- st_drop_geometry(testOutput)
# testOutputTest$test <- "new"
#
# testTest <- full_join(testDat,testOutputTest, by = c("Species", "Clone", "Seedling", "Stems", "Basal", "Type", "Site", "Quad", "Year", "sp_code_4", "sp_code_6", "Area"))
#
aestears/PlantTracker documentation built on July 20, 2023, 1:52 p.m.