hfv_qc: Assess Quality of hfv Datasets
In lefko3: Historical and Ahistorical Population Projection Matrix Analysis

Description

Function hfv_qc() tests the overall quality of hfv datasets and runs a series of tests to assess which statistical distributions match the variables within these datasets. Its input format is equivalent to that of function modelsearch(), so users can assess vital rate variable distributions under the same internal dataset subsetting used by the latter function simply by copying and pasting the parameter options from one function to the other.

Usage

hfv_qc(
  data,
  stageframe = NULL,
  historical = TRUE,
  suite = "size",
  vitalrates = c("surv", "size", "fec"),
  surv = c("alive3", "alive2", "alive1"),
  obs = c("obsstatus3", "obsstatus2", "obsstatus1"),
  size = c("sizea3", "sizea2", "sizea1"),
  sizeb = c(NA, NA, NA),
  sizec = c(NA, NA, NA),
  repst = c("repstatus3", "repstatus2", "repstatus1"),
  fec = c("feca3", "feca2", "feca1"),
  stage = c("stage3", "stage2", "stage1"),
  matstat = c("matstatus3", "matstatus2", "matstatus1"),
  indiv = "individ",
  patch = NA,
  year = "year2",
  density = NA,
  patch.as.random = TRUE,
  year.as.random = TRUE,
  juvestimate = NA,
  juvsize = FALSE,
  fectime = 2,
  censor = NA,
  age = NA,
  indcova = NA,
  indcovb = NA,
  indcovc = NA,
  random.indcova = FALSE,
  random.indcovb = FALSE,
  random.indcovc = FALSE,
  test.group = FALSE,
  ...
)

Arguments

`data`	The vertical dataset to be used for analysis. This dataset should be of class `hfvdata`, but can also be a data frame formatted similarly to the output format provided by functions `verticalize3()` or `historicalize3()`, as long as all needed variables are properly designated.
`stageframe`	The stageframe characterizing the life history model used. Optional unless `test.group = TRUE`, in which case it is required. Defaults to `NULL`.
`historical`	A logical variable denoting whether to assess the effects of state in occasion t-1, in addition to state in occasion t. Defaults to `TRUE`.
`suite`	Describes the global model for each vital rate estimation, and takes one of the following values: `full`, includes main effects and all two-way interactions of size and reproductive status; `main`, includes main effects only of size and reproductive status; `size`, includes only size (and, in historical models, interactions between sizes in occasions t and t-1); `rep`, includes only reproductive status (and, in historical models, interactions between reproductive statuses in occasions t and t-1); `age`, all vital rates estimated with age and y-intercepts only; `cons`, all vital rates estimated only as y-intercepts. Defaults to `size`.
`vitalrates`	A vector describing which vital rates will be estimated via linear modeling, with the following options: `surv`, survival probability; `obs`, observation probability; `size`, overall size; `repst`, probability of reproducing; and `fec`, amount of reproduction (overall fecundity). May also be set to `vitalrates = "leslie"`, which is equivalent to setting `c("surv", "fec")` for a Leslie MPM. This choice also determines how internal data subsetting for vital rate model estimation will work. Defaults to `c("surv", "size", "fec")`.
`surv`	A vector indicating the variable names coding for status as alive or dead in occasions t+1, t, and t-1, respectively. Defaults to `c("alive3", "alive2", "alive1")`.
`obs`	A vector indicating the variable names coding for observation status in occasions t+1, t, and t-1, respectively. Defaults to `c("obsstatus3", "obsstatus2", "obsstatus1")`.
`size`	A vector indicating the variable names coding for the primary size variable on occasions t+1, t, and t-1, respectively. Defaults to `c("sizea3", "sizea2", "sizea1")`.
`sizeb`	A vector indicating the variable names coding for the secondary size variable on occasions t+1, t, and t-1, respectively. Defaults to `c(NA, NA, NA)`, in which case `sizeb` is not used.
`sizec`	A vector indicating the variable names coding for the tertiary size variable on occasions t+1, t, and t-1, respectively. Defaults to `c(NA, NA, NA)`, in which case `sizec` is not used.
`repst`	A vector indicating the variable names coding for reproductive status in occasions t+1, t, and t-1, respectively. Defaults to `c("repstatus3", "repstatus2", "repstatus1")`.
`fec`	A vector indicating the variable names coding for fecundity in occasions t+1, t, and t-1, respectively. Defaults to `c("feca3", "feca2", "feca1")`.
`stage`	A vector indicating the variable names coding for stage in occasions t+1, t, and t-1, respectively. Defaults to `c("stage3", "stage2", "stage1")`.
`matstat`	A vector indicating the variable names coding for maturity status in occasions t+1, t, and t-1, respectively. Defaults to `c("matstatus3", "matstatus2", "matstatus1")`.
`indiv`	A text value indicating the variable name coding individual identity. Defaults to `"individ"`.
`patch`	A text value indicating the variable name coding for patch, where patches are defined as permanent subgroups within the study population. Defaults to `NA`.
`year`	A text value indicating the variable coding for observation occasion t. Defaults to `"year2"`.
`density`	A text value indicating the name of the variable coding for spatial density, should the user wish to test spatial density as a fixed factor affecting vital rates. Defaults to `NA`.
`patch.as.random`	If set to `TRUE` and `approach = "mixed"`, then `patch` is included as a random factor. If set to `FALSE` and `approach = "glm"`, then `patch` is included as a fixed factor. All other combinations of logical value and `approach` lead to `patch` not being included in modeling. Here `approach` is an argument of `modelsearch()` and does not appear in the usage above. Defaults to `TRUE`.
`year.as.random`	If set to `TRUE` and `approach = "mixed"`, then `year` is included as a random factor. If set to `FALSE`, then `year` is included as a fixed factor. All other combinations of logical value and `approach` lead to `year` not being included in modeling. Defaults to `TRUE`.
`juvestimate`	An optional variable denoting the stage name of the juvenile stage in the vertical dataset. If not `NA`, and `stage` is also given (see above), then vital rates listed in `vitalrates` other than `fec` will also be estimated from the juvenile stage to all adult stages. Defaults to `NA`, in which case juvenile vital rates are not estimated.
`juvsize`	A logical variable denoting whether size should be used as a term in models involving transition from the juvenile stage. Defaults to `FALSE`, and is only used if `juvestimate` does not equal `NA`.
`fectime`	A variable indicating which year of fecundity to use as the response term in fecundity models. Options include `2`, which refers to occasion t, and `3`, which refers to occasion t+1. Defaults to `2`.
`censor`	A vector denoting the names of censoring variables in the dataset, in order from occasion t+1, followed by occasion t, and lastly followed by occasion t-1. Defaults to `NA`.
`age`	Designates the name of the variable corresponding to age in time t in the vertical dataset. Defaults to `NA`, in which case age is not included in linear models. Should only be used if building Leslie or age x stage matrices.
`indcova`	Vector designating the names in occasions t+1, t, and t-1 of an individual covariate. Defaults to `NA`.
`indcovb`	Vector designating the names in occasions t+1, t, and t-1 of a second individual covariate. Defaults to `NA`.
`indcovc`	Vector designating the names in occasions t+1, t, and t-1 of a third individual covariate. Defaults to `NA`.
`random.indcova`	A logical value indicating whether `indcova` should be treated as a random categorical factor, rather than as a fixed factor. Defaults to `FALSE`.
`random.indcovb`	A logical value indicating whether `indcovb` should be treated as a random categorical factor, rather than as a fixed factor. Defaults to `FALSE`.
`random.indcovc`	A logical value indicating whether `indcovc` should be treated as a random categorical factor, rather than as a fixed factor. Defaults to `FALSE`.
`test.group`	A logical value indicating whether to include the `group` variable from the input `stageframe` as a fixed categorical variable in linear models. Defaults to `FALSE`.
`...`	Other parameters.

Value

This function yields text output describing the subsets to be used in linear vital rate modeling. No value or object is returned.

Notes

This function is meant to handle input as would be supplied to function modelsearch(). For the easiest use, copy all input parameters from a call to modelsearch() and paste them directly into a call to this function. The exact subsets used in the modelsearch() run will then also be created here.

Tests of Gaussian normality are conducted as Shapiro-Wilk tests via base R's shapiro.test() function. If datasets with more than 5000 rows are supplied, function hfv_qc() will sample 5000 rows from the dataset and conduct the Shapiro-Wilk test on the data sample.
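
The subsampling rule above can be sketched in base R as follows. This is a minimal illustration rather than the internal code of hfv_qc(): the helper name shapiro_sampled() and its max_n argument are hypothetical, and the sketch samples values from a single numeric vector, whereas hfv_qc() is described as sampling rows of the dataset. The limit of 5000 matters because shapiro.test() only accepts between 3 and 5000 non-missing values.

```r
# Hypothetical helper (not part of lefko3): run a Shapiro-Wilk test,
# drawing a random subsample first when the input exceeds max_n values.
shapiro_sampled <- function(x, max_n = 5000) {
  x <- x[!is.na(x)]                 # the test ignores missing values anyway
  if (length(x) > max_n) {
    x <- sample(x, size = max_n)    # sample without replacement
  }
  shapiro.test(x)
}

set.seed(42)                        # makes the random subsample reproducible
big_sample <- rnorm(12000, mean = 10, sd = 2)   # simulated data, too long to test directly
result <- shapiro_sampled(big_sample)
print(result$p.value)
```

Because the subsample is random, the test statistic and p-value can vary between runs unless a seed is set.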

Random factor variables are also tested for the presence of singleton categories, which are factor values that occur only once in the used data subset. Singleton categories may cause problems with estimation under mixed modeling.
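
A check of this kind can be approximated with table(). The sketch below is an assumption about how such a check might look, not the code used by hfv_qc(); find_singletons() and the toy vector patch_ids are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not part of lefko3): return the categories of a
# grouping variable that occur exactly once.
find_singletons <- function(f) {
  counts <- table(f)                # NA values are dropped by default
  names(counts)[counts == 1]
}

# Toy grouping variable: "B" and "D" each appear in a single row
patch_ids <- c("A", "A", "B", "C", "C", "C", "D")
singletons <- find_singletons(patch_ids)
print(singletons)
```

Any category returned this way contributes only one observation to its random-effect level, which is the situation the note above warns about.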

Examples

# Load the lathyrus dataset
data(lathyrus)

sizevector <- c(0, 4.6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8,
  9)
stagevector <- c("Sd", "Sdl", "Dorm", "Sz1nr", "Sz2nr", "Sz3nr", "Sz4nr",
  "Sz5nr", "Sz6nr", "Sz7nr", "Sz8nr", "Sz9nr", "Sz1r", "Sz2r", "Sz3r", 
  "Sz4r", "Sz5r", "Sz6r", "Sz7r", "Sz8r", "Sz9r")
repvector <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
obsvector <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
matvector <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0)
indataset <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 4.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
  0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)

# Build the stageframe from the vectors defined above
lathframeln <- sf_create(sizes = sizevector, stagenames = stagevector,
  repstatus = repvector, obsstatus = obsvector, matstatus = matvector,
  immstatus = immvector, indataset = indataset, binhalfwidth = binvec,
  propstatus = propvector)

# Convert the dataset to historical vertical format
lathvertln <- verticalize3(lathyrus, noyears = 4, firstyear = 1988,
  patchidcol = "SUBPLOT", individcol = "GENET", blocksize = 9, 
  juvcol = "Seedling1988", sizeacol = "lnVol88", repstracol = "Intactseed88",
  fecacol = "Intactseed88", deadacol = "Dead1988", 
  nonobsacol = "Dormant1988", stageassign = lathframeln, stagesize = "sizea",
  censorcol = "Missing1988", censorkeep = NA, NAas0 = TRUE, censor = TRUE)

# Round fecundity values to whole numbers
lathvertln$feca2 <- round(lathvertln$feca2)
lathvertln$feca1 <- round(lathvertln$feca1)
lathvertln$feca3 <- round(lathvertln$feca3)

hfv_qc(lathvertln, historical = TRUE, suite = "main",
  vitalrates = c("surv", "obs", "size", "repst", "fec"), juvestimate = "Sdl",
  indiv = "individ", patch = "patchid", year = "year2", year.as.random = TRUE,
  patch.as.random = TRUE)

lefko3 documentation built on June 8, 2025, 10:04 a.m.