verticalize3: Create Historical Vertical Data Frame from Horizontal Data...

View source: R/datamanag.R

verticalize3 R Documentation

Create Historical Vertical Data Frame from Horizontal Data Frame

Description

Function verticalize3() returns a vertically formatted demographic data frame organized to create historical projection matrices, given a horizontally formatted input data frame. It also handles stage assignments if given an appropriate stageframe.

Usage

verticalize3(
  data,
  noyears,
  firstyear = 1,
  popidcol = 0,
  patchidcol = 0,
  individcol = 0,
  blocksize = NA,
  xcol = 0,
  ycol = 0,
  juvcol = 0,
  sizeacol,
  sizebcol = 0,
  sizeccol = 0,
  repstracol = 0,
  repstrbcol = 0,
  fecacol = 0,
  fecbcol = 0,
  indcovacol = 0,
  indcovbcol = 0,
  indcovccol = 0,
  aliveacol = 0,
  deadacol = 0,
  obsacol = 0,
  nonobsacol = 0,
  censorcol = 0,
  repstrrel = 1,
  fecrel = 1,
  stagecol = 0,
  stageassign = NA,
  stagesize = NA,
  censorkeep = 0,
  censorRepeat = FALSE,
  censor = FALSE,
  coordsRepeat = FALSE,
  spacing = NA,
  NAas0 = FALSE,
  NRasRep = FALSE,
  NOasObs = FALSE,
  prebreeding = TRUE,
  age_offset = 0,
  reduce = TRUE,
  a2check = FALSE,
  quiet = FALSE
)

Arguments

data

The horizontal data file. A valid data frame is required as input.

noyears

The number of years or observation occasions in the dataset. A valid integer is required as input.

firstyear

The first year or occasion of observation. Defaults to 1.

popidcol

A variable name or column number corresponding to the identity of the population for each individual.

patchidcol

A variable name or column number corresponding to the identity of the patch or subpopulation for each individual, if patches have been designated within populations.

individcol

A variable name or column number corresponding to the identity of each individual.

blocksize

The number of variables corresponding to each occasion in the input dataset designated in data, if a set pattern of variables is used for each observation occasion in the data frame used as input. If such a pattern is not used, and all variable names are properly noted as character vectors in the other input variables, then this may be set to NA. Defaults to NA.

xcol

A variable name(s) or column number(s) corresponding to the X coordinate of each individual, or of each individual at each occasion, in Cartesian space. Can refer to the only instance, the first instance, or all instances of X variables. In the last case, the values should be entered as a vector.

ycol

A variable name(s) or column number(s) corresponding to the Y coordinate of each individual, or of each individual at each occasion, in Cartesian space. Can refer to the only instance, the first instance, or all instances of Y variables. In the last case, the values should be entered as a vector.

juvcol

A variable name(s) or column number(s) that marks individuals in immature stages within the dataset. This function assumes that immature individuals are identified in this variable marked with a number equal to or greater than 1, and that mature individuals are marked as 0 or NA. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector.

sizeacol

A variable name(s) or column number(s) corresponding to the size entry associated with the first year or observation occasion in the dataset. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector. This variable should refer to the first size variable in the stageframe, unless stagesize = "sizeadded".

sizebcol

A second variable name(s) or column number(s) corresponding to the size entry associated with the first year or observation occasion in the dataset. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector. This variable should refer to the second size variable in the stageframe, unless stagesize = "sizeadded".

sizeccol

A third variable name(s) or column number(s) corresponding to the size entry associated with the first year or observation occasion in the dataset. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector. This variable should refer to the third size variable in the stageframe, unless stagesize = "sizeadded".

repstracol

A variable name(s) or column number(s) corresponding to the production of reproductive structures, such as flowers, associated with the first year or observation period in the input dataset. This can be binomial or count data, and is used to analyze the probability of reproduction. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector.

repstrbcol

A second variable name(s) or column number(s) corresponding to the production of reproductive structures, such as flowers, associated with the first year or observation period in the input dataset. This can be binomial or count data, and is used to analyze the probability of reproduction. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector.

fecacol

A variable name(s) or column number(s) denoting fecundity associated with the first year or observation occasion in the input dataset. This may represent egg counts, fruit counts, seed production, etc. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector.

fecbcol

A second variable name(s) or column number(s) denoting fecundity associated with the first year or observation occasion in the input dataset. This may represent egg counts, fruit counts, seed production, etc. Can refer to the first instance, or all instances of these variables. In the latter case, the values should be entered as a vector.

indcovacol

A variable name(s) or column number(s) corresponding to an individual covariate to be used in analysis. Can refer to the only instance, the first instance, or all instances of these variables. In the last case, the values should be entered as a vector.

indcovbcol

A second variable name(s) or column number(s) corresponding to an individual covariate to be used in analysis. Can refer to the only instance, the first instance, or all instances of these variables. In the last case, the values should be entered as a vector.

indcovccol

A third variable name(s) or column number(s) corresponding to an individual covariate to be used in analysis. Can refer to the only instance, the first instance, or all instances of these variables. In the last case, the values should be entered as a vector.

aliveacol

Variable name(s) or column number(s) providing information on whether an individual is alive at a given occasion. If used, living status must be designated as binomial (living = 1, dead = 0). Can refer to the first instance of a living status variable in the dataset, or a full vector of all living status variables in temporal order.

deadacol

Variable name(s) or column number(s) providing information on whether an individual is dead at a given occasion. If used, dead status must be designated as binomial (dead = 1, living = 0). Can refer to the first instance of a dead status variable in the dataset, or a full vector of all dead status variables in temporal order.

obsacol

A variable name(s) or column number(s) providing information on whether an individual is in an observable stage at a given occasion. If used, observation status must be designated as binomial (observed = 1, not observed = 0). Can refer to the first instance of an observation status variable in the dataset, or a full vector of all observation status variables in temporal order.

nonobsacol

A variable name(s) or column number(s) providing information on whether an individual is in an unobservable stage at a given occasion. If used, observation status must be designated as binomial (not observed = 1, observed = 0). Can refer to the first instance of a non-observation status variable in the dataset, or a full vector of all non-observation status variables in temporal order.

censorcol

A variable name(s) or column number(s) corresponding to the first entry of a censor variable, used to distinguish between entries to use and entries not to use, or to designate entries with special issues that require further attention. Can refer to the first instance of a censor status variable in the dataset, or a full vector of all censor status variables in temporal order. Can also refer to a single censor status variable used for the entire individual, if censorRepeat = FALSE.

repstrrel

This is a scalar multiplier on variable repstrbcol to make it equivalent to repstracol. This can be useful if two reproductive status variables have related but unequal units, for example if repstracol refers to one-flowered stems while repstrbcol refers to two-flowered stems. Defaults to 1.

fecrel

This is a scalar multiplier on variable fecbcol to make it equivalent to fecacol. This can be useful if two fecundity variables have related but unequal units. Defaults to 1.

stagecol

Optional variable name(s) or column number(s) corresponding to life history stage at a given occasion. Can refer to the first instance of a stage identity variable in the dataset, or a full vector of all stage identity variables in temporal order.

stageassign

The stageframe object identifying the life history model being operationalized. Note that if stagecol is provided, then this stageframe is not used for stage designation.

stagesize

A variable name or column number describing which size variable to use in stage estimation. Defaults to NA, and can also take sizea, sizeb, sizec, sizeab, sizebc, sizeac, sizeabc, or sizeadded, depending on which size variable within the input dataset is chosen. Note that the variable(s) chosen should be presented in the order of the primary, secondary, and tertiary variables in the stageframe input with stageassign. For example, choosing sizeb assumes that this size is the primary variable in the stageframe.

censorkeep

The value of the censor variable identifying data to be included in analysis. Defaults to 0, but may take any value including NA. Note that if NA is the value to keep, then this function will alter all NAs to 0 values, and all other values to 1, treating 0 as the new value to keep.

censorRepeat

A logical value indicating whether the censor variable is a single column, or whether it repeats across occasion blocks. Defaults to FALSE.

censor

A logical variable determining whether the output data should be censored using the variable defined in censorcol. Defaults to FALSE.

coordsRepeat

A logical value indicating whether X and Y coordinates repeat across occasion blocks. If TRUE, then each observation occasion has its own X and Y variables. If FALSE, then single X and Y columns apply to all occasions. Defaults to FALSE.

spacing

The spacing at which density should be estimated, if density estimation is desired and X and Y coordinates are supplied. Given in the same units as those used in the X and Y coordinates given in xcol and ycol. Defaults to NA.

NAas0

If TRUE, then all NA entries for size and fecundity variables will be set to 0. This can help increase the sample size analyzed by modelsearch(), but should only be used when it is clear that this substitution is biologically realistic. Defaults to FALSE.

NRasRep

If TRUE, then the function will treat non-reproductive but mature individuals as reproductive during stage assignment. This can be useful when an MPM is desired without separation of reproductive and non-reproductive but mature stages of the same size. Only used if stageassign is set to a stageframe. Defaults to FALSE.

NOasObs

If TRUE, then the function will treat individuals that are interpreted as not observed in the dataset as though they were observed during stage assignment. This can be useful when an MPM is desired without separation of observable and unobservable stages. Only used if stageassign is set to a stageframe. Defaults to FALSE.

prebreeding

A logical term indicating whether the life history model is pre-breeding. If so, then 1 is added to all ages. Defaults to TRUE.

age_offset

A number to add automatically to all values of age at time t. Defaults to 0.

reduce

A logical variable determining whether unused variables and some invariant state variables should be removed from the output dataset. Defaults to TRUE.

a2check

A logical variable indicating whether to retain all data regardless of living status at occasion t. Defaults to FALSE, in which case data for occasions in which the individual is not alive at occasion t are not retained. This option should be kept FALSE, except to inspect potential errors in the dataset.

quiet

A logical variable indicating whether to silence warnings. Defaults to FALSE.
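To illustrate how blocksize relates to the per-occasion arguments above, the following base R sketch computes the column positions of a variable that repeats once per occasion block. It assumes that the column for occasion k sits (k - 1) * blocksize columns after the first instance. The helper block_columns() and the toy data frame are hypothetical, and this is not the internal code of verticalize3().

```r
# Hypothetical helper: positions of a variable repeated once per occasion
# block, given its first position, the block size, and the occasion count.
block_columns <- function(first_col, blocksize, noyears) {
  first_col + (seq_len(noyears) - 1) * blocksize
}

# Toy horizontal data: an id column, then 3 occasions of (size, fec, dead).
toy <- data.frame(id = 1:2,
  size88 = c(3, 5), fec88 = c(0, 2), dead88 = c(0, 0),
  size89 = c(4, 6), fec89 = c(1, 3), dead89 = c(0, 0),
  size90 = c(NA, 7), fec90 = c(NA, 4), dead90 = c(1, 0))

# Only the first size column is named; the rest follow from the block size.
first_size <- which(names(toy) == "size88")
size_cols <- block_columns(first_size, blocksize = 3, noyears = 3)
names(toy)[size_cols]
```

If the variables in a dataset do not follow such a fixed pattern, the same information can instead be passed as a full vector of names, as described for the individual arguments.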

Value

If all inputs are properly formatted, then this function will output a historical vertical data frame (class hfvdata), meaning that the output data frame will have three consecutive occasions of size and reproductive data per individual per row. This data frame is in standard format for all functions used in lefko3, and so can be used without further modification.
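As a rough illustration of the three-occasion row layout, the base R sketch below reshapes one hypothetical individual's size record into rows holding sizes in occasions t-1, t, and t+1. The toy values are invented, and the sketch uses NA for occasion t-1 in the first row; verticalize3() itself may fill such entries differently and applies many additional steps.

```r
# Toy horizontal record: one individual's size over four occasions.
years <- 1988:1991
sizes <- c(2, 3, 5, NA)
n <- length(sizes)

# One row per occasion t that has an occasion t+1; occasion t-1 is NA when
# t is the first occasion in the toy dataset.
t_index <- seq_len(n - 1)
toy_vertical <- data.frame(
  individ = "A",
  year2 = years[t_index],
  sizea1 = c(NA, sizes)[t_index],   # size in occasion t-1
  sizea2 = sizes[t_index],          # size in occasion t
  sizea3 = sizes[t_index + 1]       # size in occasion t+1
)
toy_vertical
```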

Variables in this data frame include the following:

rowid

Unique identifier for the row of the data frame.

popid

Unique identifier for the population, if given.

patchid

Unique identifier for patch within population, if given.

individ

Unique identifier for the individual.

year2

Year or time at occasion t.

firstseen

Occasion of first observation.

lastseen

Occasion of last observation.

obsage

Observed age in occasion t, assuming first observation corresponds to age = 0.

obslifespan

Observed lifespan, given as lastseen - firstseen + 1.

xpos1,xpos2,xpos3

X position in Cartesian space in occasions t-1, t, and t+1, respectively, if provided.

ypos1,ypos2,ypos3

Y position in Cartesian space in occasions t-1, t, and t+1, respectively, if provided.

sizea1,sizea2,sizea3

Main size measurement in occasions t-1, t, and t+1, respectively.

sizeb1,sizeb2,sizeb3

Secondary size measurement in occasions t-1, t, and t+1, respectively.

sizec1,sizec2,sizec3

Tertiary size measurement in occasions t-1, t, and t+1, respectively.

size1added,size2added,size3added

Sum of primary, secondary, and tertiary size measurements in occasions t-1, t, and t+1, respectively.

repstra1,repstra2,repstra3

Main numbers of reproductive structures in occasions t-1, t, and t+1, respectively.

repstrb1,repstrb2,repstrb3

Secondary numbers of reproductive structures in occasions t-1, t, and t+1, respectively.

repstr1added,repstr2added,repstr3added

Sum of primary and secondary reproductive structures in occasions t-1, t, and t+1, respectively.

feca1,feca2,feca3

Main numbers of offspring in occasions t-1, t, and t+1, respectively.

fecb1,fecb2,fecb3

Secondary numbers of offspring in occasions t-1, t, and t+1, respectively.

fec1added,fec2added,fec3added

Sum of primary and secondary fecundity in occasions t-1, t, and t+1, respectively.

censor1,censor2,censor3

Censor state values in occasions t-1, t, and t+1, respectively.

juvgiven1,juvgiven2,juvgiven3

Binomial variable indicating whether individual is juvenile in occasions t-1, t, and t+1. Only given if juvcol is provided.

obsstatus1,obsstatus2,obsstatus3

Binomial observation state in occasions t-1, t, and t+1, respectively.

repstatus1,repstatus2,repstatus3

Binomial reproductive state in occasions t-1, t, and t+1, respectively.

fecstatus1,fecstatus2,fecstatus3

Binomial offspring production state in occasions t-1, t, and t+1, respectively.

matstatus1,matstatus2,matstatus3

Binomial maturity state in occasions t-1, t, and t+1, respectively.

alive1,alive2,alive3

Binomial state as alive in occasions t-1, t, and t+1, respectively.

density

Radial density of individuals per unit designated in spacing. Only given if spacing is not NA.
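The arithmetic behind two of the derived variables can be sketched in base R as follows. The obslifespan formula is taken from the definition above. The form used for the summed reproductive variable (primary plus repstrrel times secondary) is an assumption based on the description of repstrrel, and all values are invented.

```r
# Toy values for one row of a vertical data frame (occasion t only).
repstra2 <- 3      # e.g. one-flowered stems
repstrb2 <- 2      # e.g. two-flowered stems
repstrrel <- 2     # multiplier making repstrb comparable to repstra

# Assumed form of the summed variable: primary + multiplier * secondary.
repstr2added <- repstra2 + repstrrel * repstrb2

# Observed lifespan as defined above: lastseen - firstseen + 1.
firstseen <- 1989
lastseen <- 1992
obslifespan <- lastseen - firstseen + 1

c(repstr2added = repstr2added, obslifespan = obslifespan)
```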

Notes

In some datasets on species with unobservable stages, observation status (obsstatus) might not be inferred properly if a single size variable is used that does not yield sizes greater than 0 in all cases in which individuals were observed. Such situations may arise, for example, in plants when leaf number is the dominant size variable used, but individuals occasionally occur with inflorescences but no leaves. In these instances, it helps to mark related variables as sizeb and sizec, because observation status will be interpreted in relation to all three size variables. Further analysis can then utilize only a single size variable of the user's choosing. Similar issues can arise in reproductive status (repstatus).
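The difference between inferring observation status from one size variable and from all three can be sketched in base R. The rule used here (observed if any size variable is greater than 0) is an assumption made for illustration, and the toy data are invented.

```r
# Toy plants: leaf number (sizea), inflorescences (sizeb), third measure (sizec).
toy <- data.frame(
  sizea = c(4, 0, 0),
  sizeb = c(1, 2, 0),
  sizec = c(0, 0, NA)
)

# Observation status from leaf number alone misses the second plant, which
# has inflorescences but no leaves.
obs_single <- as.integer(!is.na(toy$sizea) & toy$sizea > 0)

# Assumed rule: observed if any of the three size variables is positive.
obs_all <- as.integer(rowSums(toy > 0, na.rm = TRUE) > 0)

cbind(toy, obs_single, obs_all)
```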

Juvenile designation should only be used when juveniles fall outside of the size classification scheme used in determining stages. If juveniles are to be size classified along the size spectrum that adults also fall on, then it is best to treat juveniles as mature but not reproductive.

Warnings that some individuals occur in state combinations that do not match any stages in the stageframe used to assign stages are common when first working with a dataset. Typically, these situations can be identified as NoMatch entries in stage3, although such entries may also crop up in stage1 and stage2. In rare cases, these warnings will arise with no concurrent NoMatch entries, which indicates that the input dataset contained conflicting state data suggesting that the individual is simultaneously in some stage and dead. Such conflicting entries are removed if the conflict occurs in occasion t or t-1, because only living entries are allowed in occasion t, while occasion t-1 may include living entries as well as non-living entries immediately prior to birth.

Care should be taken to avoid variables with negative values indicating size, fecundity, or reproductive or observation status. Negative values can be interpreted in different ways, typically reflecting estimation through other algorithms rather than actual measured data. Variables holding negative values can conflict with data management algorithms in ways that are difficult to predict.

Unusual errors (e.g. "Error in .pfj...") may occur in cases where the variables are improperly passed, where seemingly numeric variables include text, or where the blocksize is improperly set.

Density estimation is performed as a count of individuals alive and within the radius specified in spacing of the respective individual at some point in time.
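A minimal base R sketch of such a radius-based count at a single occasion follows. The coordinates are invented, and excluding the focal individual from its own count is an assumption of this sketch rather than a documented behavior.

```r
# Toy coordinates and living status for four individuals at one occasion.
toy <- data.frame(
  x = c(0, 1, 5, 0.5),
  y = c(0, 0, 5, 0.5),
  alive = c(1, 1, 1, 0)
)
spacing <- 2  # radius, in the same units as x and y

# Pairwise Euclidean distances between all individuals.
d <- as.matrix(dist(toy[, c("x", "y")]))

# Neighbours within the radius, excluding the focal individual itself
# (an assumption of this sketch).
within <- d <= spacing & row(d) != col(d)

# Count only neighbours that are alive.
toy$density <- sapply(seq_len(nrow(toy)),
  function(i) sum(within[i, ] & toy$alive == 1))
toy
```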

If a censor variable is included for each monitoring occasion, and the blocksize option is set, then the user must set censorRepeat = TRUE in order to censor the correct transitions. Skipping this step will likely lead to the loss of a large portion of the data, as all data for entire individuals will be excluded.
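The recoding described for censorkeep = NA can also be sketched in base R: NA values become 0 (the new value to keep) and all other values become 1. The censor vector below is invented, and the sketch shows only the recoding, not how verticalize3() applies it across occasions.

```r
# Toy censor variable in which NA marks the entries to keep.
censor_raw <- c(NA, 1, NA, 2, NA)

# Recode as described for censorkeep = NA: NA becomes 0 (keep), and every
# other value becomes 1 (exclude).
censor_recoded <- ifelse(is.na(censor_raw), 0, 1)

# Keep only the entries matching the new value to keep.
keep <- censor_recoded == 0
censor_recoded
which(keep)
```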

Examples

# Lathyrus example using blocksize - when repeated patterns exist in variable
# order
data(lathyrus)

sizevector <- c(0, 100, 13, 127, 3730, 3800, 0)
stagevector <- c("Sd", "Sdl", "VSm", "Sm", "VLa", "Flo", "Dorm")
repvector <- c(0, 0, 0, 0, 0, 1, 0)
obsvector <- c(0, 1, 1, 1, 1, 1, 0)
matvector <- c(0, 0, 1, 1, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0)
indataset <- c(0, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 100, 11, 103, 3500, 3800, 0.5)

lathframe <- sf_create(sizes = sizevector, stagenames = stagevector,
  repstatus = repvector, obsstatus = obsvector, matstatus = matvector,
  immstatus = immvector, indataset = indataset, binhalfwidth = binvec,
  propstatus = propvector)

lathvert <- verticalize3(lathyrus, noyears = 4, firstyear = 1988,
  patchidcol = "SUBPLOT", individcol = "GENET", blocksize = 9,
  juvcol = "Seedling1988", sizeacol = "Volume88", repstracol = "FCODE88",
  fecacol = "Intactseed88", deadacol = "Dead1988",
  nonobsacol = "Dormant1988", stageassign = lathframe, stagesize = "sizea",
  censorcol = "Missing1988", censorkeep = NA, censor = TRUE)

# Cypripedium example using partial repeat patterns with blocksize and
# partly explicit variable names
data(cypdata)

sizevector <- c(0, 0, 0, 0, 0, 0, 1, 2.5, 4.5, 8, 17.5)
stagevector <- c("SD", "P1", "P2", "P3", "SL", "D", "XSm", "Sm", "Md", "Lg",
  "XLg")
repvector <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
obsvector <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
matvector <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
immvector <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
indataset <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
binvec <- c(0, 0, 0, 0, 0, 0.5, 0.5, 1, 1, 2.5, 7)

cypframe_raw <- sf_create(sizes = sizevector, stagenames = stagevector,
  repstatus = repvector, obsstatus = obsvector, matstatus = matvector,
  propstatus = propvector, immstatus = immvector, indataset = indataset,
  binhalfwidth = binvec)

cypraw_v1 <- verticalize3(data = cypdata, noyears = 6, firstyear = 2004,
  patchidcol = "patch", individcol = "plantid", blocksize = 4,
  sizeacol = "Inf2.04", sizebcol = "Inf.04", sizeccol = "Veg.04",
  repstracol = c("Inf.04", "Inf.05", "Inf.06", "Inf.07", "Inf.08", "Inf.09"),
  repstrbcol = c("Inf2.04", "Inf2.05", "Inf2.06", "Inf2.07", "Inf2.08",
    "Inf2.09"),
  fecacol = "Pod.04", stageassign = cypframe_raw, stagesize = "sizeadded",
  NAas0 = TRUE, NRasRep = TRUE)


lefko3 documentation built on Oct. 14, 2023, 1:07 a.m.