hfv_qc: Check Quality and Distributions of hfv Datasets

hfv_qc    R Documentation

Description

Function hfv_qc() tests the overall quality of hfv datasets, and also runs a series of tests to assess which statistical distributions match the variables within these datasets. Its input format is equivalent to that of function modelsearch(). Users can therefore assess vital rate variable distributions under the same internal dataset subsetting used by modelsearch(), simply by copying and pasting the parameter options from one function to the other.

Usage

hfv_qc(
  data,
  stageframe = NULL,
  historical = TRUE,
  suite = "size",
  vitalrates = c("surv", "size", "fec"),
  surv = c("alive3", "alive2", "alive1"),
  obs = c("obsstatus3", "obsstatus2", "obsstatus1"),
  size = c("sizea3", "sizea2", "sizea1"),
  sizeb = c(NA, NA, NA),
  sizec = c(NA, NA, NA),
  repst = c("repstatus3", "repstatus2", "repstatus1"),
  fec = c("feca3", "feca2", "feca1"),
  stage = c("stage3", "stage2", "stage1"),
  matstat = c("matstatus3", "matstatus2", "matstatus1"),
  indiv = "individ",
  patch = NA,
  year = "year2",
  density = NA,
  patch.as.random = TRUE,
  year.as.random = TRUE,
  juvestimate = NA,
  juvsize = FALSE,
  fectime = 2,
  censor = NA,
  age = NA,
  indcova = NA,
  indcovb = NA,
  indcovc = NA,
  random.indcova = FALSE,
  random.indcovb = FALSE,
  random.indcovc = FALSE,
  test.group = FALSE,
  ...
)

Arguments

data

The vertical dataset to be used for analysis. This dataset should be of class hfvdata, but can also be a data frame formatted similarly to the output format provided by functions verticalize3() or historicalize3(), as long as all needed variables are properly designated.

stageframe

The stageframe characterizing the life history model used. Optional unless test.group = TRUE, in which case it is required. Defaults to NULL.

historical

A logical variable denoting whether to assess the effects of state in occasion t-1, in addition to state in occasion t. Defaults to TRUE.

suite

Describes the global model for each vital rate estimation, and has the following possible values: full, which includes main effects and all two-way interactions of size and reproductive status; main, which includes main effects only of size and reproductive status; size, which includes only size (plus interactions between size in different occasions in historical models); rep, which includes only reproductive status (plus interactions between status in different occasions in historical models); age, in which all vital rates are estimated with age and y-intercepts only; and cons, in which all vital rates are estimated only as y-intercepts. Defaults to size.

vitalrates

A vector describing which vital rates will be estimated via linear modeling, with the following options: surv, survival probability; obs, observation probability; size, overall size; repst, probability of reproducing; and fec, amount of reproduction (overall fecundity). May also be set to vitalrates = "leslie", which is equivalent to setting c("surv", "fec") for a Leslie MPM. This choice also determines how internal data subsetting for vital rate model estimation will work. Defaults to c("surv", "size", "fec").

surv

A vector indicating the variable names coding for status as alive or dead in occasions t+1, t, and t-1, respectively. Defaults to c("alive3", "alive2", "alive1").

obs

A vector indicating the variable names coding for observation status in occasions t+1, t, and t-1, respectively. Defaults to c("obsstatus3", "obsstatus2", "obsstatus1").

size

A vector indicating the variable names coding for the primary size variable on occasions t+1, t, and t-1, respectively. Defaults to c("sizea3", "sizea2", "sizea1").

sizeb

A vector indicating the variable names coding for the secondary size variable on occasions t+1, t, and t-1, respectively. Defaults to c(NA, NA, NA), in which case sizeb is not used.

sizec

A vector indicating the variable names coding for the tertiary size variable on occasions t+1, t, and t-1, respectively. Defaults to c(NA, NA, NA), in which case sizec is not used.

repst

A vector indicating the variable names coding for reproductive status in occasions t+1, t, and t-1, respectively. Defaults to c("repstatus3", "repstatus2", "repstatus1").

fec

A vector indicating the variable names coding for fecundity in occasions t+1, t, and t-1, respectively. Defaults to c("feca3", "feca2", "feca1").

stage

A vector indicating the variable names coding for stage in occasions t+1, t, and t-1, respectively. Defaults to c("stage3", "stage2", "stage1").

matstat

A vector indicating the variable names coding for maturity status in occasions t+1, t, and t-1, respectively. Defaults to c("matstatus3", "matstatus2", "matstatus1").

indiv

A text value indicating the variable name coding individual identity. Defaults to "individ".

patch

A text value indicating the variable name coding for patch, where patches are defined as permanent subgroups within the study population. Defaults to NA.

year

A text value indicating the name of the variable coding for observation occasion t. Defaults to "year2".

density

A text value indicating the name of the variable coding for spatial density, should the user wish to test spatial density as a fixed factor affecting vital rates. Defaults to NA.

patch.as.random

If set to TRUE and approach = "mixed", then patch is included as a random factor. If set to FALSE and approach = "glm", then patch is included as a fixed factor. All other combinations of logical value and approach lead to patch not being included in modeling. Here, approach refers to the approach argument of modelsearch(), which is not among the arguments listed under Usage for this function. Defaults to TRUE.

year.as.random

If set to TRUE and approach = "mixed", then year is included as a random factor. If set to FALSE, then year is included as a fixed factor. All other combinations of logical value and approach lead to year not being included in modeling. Defaults to TRUE.

juvestimate

An optional variable denoting the stage name of the juvenile stage in the vertical dataset. If not NA, and stage is also given (see above), then vital rates listed in vitalrates other than fec will also be estimated from the juvenile stage to all adult stages. Defaults to NA, in which case juvenile vital rates are not estimated.

juvsize

A logical variable denoting whether size should be used as a term in models involving transition from the juvenile stage. Defaults to FALSE, and is only used if juvestimate does not equal NA.

fectime

A variable indicating which occasion of fecundity to use as the response term in fecundity models. Options include 2, which refers to occasion t, and 3, which refers to occasion t+1. Defaults to 2.

censor

A vector denoting the names of censoring variables in the dataset, in the order occasion t+1, occasion t, and occasion t-1. Defaults to NA.

age

Designates the name of the variable corresponding to age in occasion t in the vertical dataset. Defaults to NA, in which case age is not included in linear models. Should only be used if building Leslie or age x stage matrices.

indcova

A vector designating the variable names of an individual covariate in occasions t+1, t, and t-1, respectively. Defaults to NA.

indcovb

A vector designating the variable names of a second individual covariate in occasions t+1, t, and t-1, respectively. Defaults to NA.

indcovc

A vector designating the variable names of a third individual covariate in occasions t+1, t, and t-1, respectively. Defaults to NA.

random.indcova

A logical value indicating whether indcova should be treated as a random categorical factor, rather than as a fixed factor. Defaults to FALSE.

random.indcovb

A logical value indicating whether indcovb should be treated as a random categorical factor, rather than as a fixed factor. Defaults to FALSE.

random.indcovc

A logical value indicating whether indcovc should be treated as a random categorical factor, rather than as a fixed factor. Defaults to FALSE.

test.group

A logical value indicating whether to include the group variable from the input stageframe as a fixed categorical variable in linear models. Defaults to FALSE.

...

Other parameters.

Value

This function yields text output describing the subsets to be used in linear vital rate modeling. No value or object is returned.

Notes

This function is meant to handle input as would be supplied to function modelsearch(). The easiest way to use it is to copy all input parameters from a call to function modelsearch() and paste them directly into a call to this function. The exact subsets used in the modelsearch() run will then also be created here.
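
One way to keep the two calls in step, rather than pasting by hand, is to store the shared arguments in a list and pass that list to both functions with do.call(). The following is a minimal sketch, not part of the package: it assumes that lefko3 is installed and that lathvertln has been built as in the Examples section, and the extra modelsearch() settings (approach, sizedist, fecdist) are illustrative choices only.

```r
# Arguments common to hfv_qc() and modelsearch(), stored once so that
# both calls are guaranteed to use the same dataset subsetting.
shared_args <- list(
  historical = TRUE, suite = "main",
  vitalrates = c("surv", "obs", "size", "repst", "fec"),
  juvestimate = "Sdl", indiv = "individ", patch = "patchid",
  year = "year2", year.as.random = TRUE, patch.as.random = TRUE
)

# Assumption: lefko3 is installed and lathvertln exists (see Examples).
if (requireNamespace("lefko3", quietly = TRUE) && exists("lathvertln")) {
  # Quality control first: inspect subsets and distribution tests.
  do.call(lefko3::hfv_qc, c(list(data = lathvertln), shared_args))

  # Then model building on the same subsets. The three extra arguments
  # are illustrative and belong to modelsearch() only.
  lathmodels <- do.call(
    lefko3::modelsearch,
    c(list(data = lathvertln), shared_args,
      list(approach = "mixed", sizedist = "gaussian", fecdist = "poisson"))
  )
}
```

In practice, the distributions passed to modelsearch() would be chosen after reading the text output of hfv_qc().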

Tests of Gaussian normality are conducted as Shapiro-Wilk tests via base R's shapiro.test() function, which accepts at most 5000 values. If datasets with more than 5000 rows are supplied, function hfv_qc() will sample 5000 rows from the dataset and conduct the Shapiro-Wilk test on the data sample.
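
The subsampling step can be sketched in base R as follows. This illustrates the general technique only: the helper name shapiro_capped() is hypothetical, and the details of how hfv_qc() draws its sample (for example, whether a seed is set or how missing values are handled) are assumptions here rather than a description of the package internals.

```r
# Hypothetical helper: run a Shapiro-Wilk test on at most max_n values,
# drawing a random subsample without replacement when x is longer.
shapiro_capped <- function(x, max_n = 5000) {
  x <- x[!is.na(x)]                       # drop missing values first
  if (length(x) > max_n) {
    x <- sample(x, max_n)                 # random subsample of max_n values
  }
  list(n_used = length(x), test = shapiro.test(x))
}

set.seed(42)                              # reproducible illustration only
skewed <- rexp(12000)                     # simulated, clearly non-Gaussian data
x_test <- shapiro_capped(skewed)
sw <- x_test$test

x_test$n_used                             # number of values actually tested
sw$p.value                                # small p-value suggests non-normality
```

Because the test runs on a random subsample, repeated calls on the same large dataset can give slightly different p-values unless a seed is set beforehand.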

Examples

library(lefko3)

data(lathyrus)

sizevector <- c(0, 4.6, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8,
  9)
stagevector <- c("Sd", "Sdl", "Dorm", "Sz1nr", "Sz2nr", "Sz3nr", "Sz4nr",
  "Sz5nr", "Sz6nr", "Sz7nr", "Sz8nr", "Sz9nr", "Sz1r", "Sz2r", "Sz3r", 
  "Sz4r", "Sz5r", "Sz6r", "Sz7r", "Sz8r", "Sz9r")
repvector <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
obsvector <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
matvector <- c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0)
indataset <- c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 4.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
  0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5)

lathframeln <- sf_create(sizes = sizevector, stagenames = stagevector, 
  repstatus = repvector, obsstatus = obsvector, matstatus = matvector, 
  immstatus = immvector, indataset = indataset, binhalfwidth = binvec, 
  propstatus = propvector)

lathvertln <- verticalize3(lathyrus, noyears = 4, firstyear = 1988,
  patchidcol = "SUBPLOT", individcol = "GENET", blocksize = 9, 
  juvcol = "Seedling1988", sizeacol = "lnVol88", repstracol = "Intactseed88",
  fecacol = "Intactseed88", deadacol = "Dead1988", 
  nonobsacol = "Dormant1988", stageassign = lathframeln, stagesize = "sizea",
  censorcol = "Missing1988", censorkeep = NA, NAas0 = TRUE, censor = TRUE)

lathvertln$feca2 <- round(lathvertln$feca2)
lathvertln$feca1 <- round(lathvertln$feca1)
lathvertln$feca3 <- round(lathvertln$feca3)

hfv_qc(lathvertln, historical = TRUE, suite = "main", 
  vitalrates = c("surv", "obs", "size", "repst", "fec"), juvestimate = "Sdl",
  indiv = "individ", patch = "patchid", year = "year2",
  year.as.random = TRUE, patch.as.random = TRUE)


lefko3 documentation built on Oct. 14, 2023, 1:07 a.m.