add_duration: Add duration variables to panel data
In spduration: Split-Population Duration (Cure) Regression

add_duration

R Documentation

Add duration variables to panel data

Description

Builds a duration version of a data frame representing panel data.

Usage

add_duration(
  data,
  y,
  unitID,
  tID,
  freq = "month",
  sort = FALSE,
  ongoing = TRUE,
  slice.last = FALSE
)

Arguments

`data`	Data frame representing panel data.
`y`	A binary indicator of the incidence of some event, e.g. a coup.
`unitID`	Name of the variable in the data frame identifying the cross-sectional units, e.g. `"country"`.
`tID`	Name of the variable in the data frame identifying the time unit, preferably of class `Date`, e.g. `"year"`.
`freq`	Frequency at which units are measured in `tID`. Currently yearly, monthly, and daily data are supported, i.e. `"year"`, `"month"`, or `"day"`.
`sort`	Sort data by unit and time? Default is `FALSE`, i.e. return data in original order.
`ongoing`	If `TRUE`, successive 1's are considered ongoing events and treated as `NA` after the first 1. If `FALSE`, successive 1's are all treated as failures.
`slice.last`	Set to `TRUE` to create a slice of the last time period; used with `forecast.spdur`. For compatibility with CRISP and ICEWS projects.
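
As a rough illustration of the `tID` and `freq` arguments, the sketch below builds a small monthly panel whose time variable is of class `Date`. The data frame `panel`, its column names, and its values are made up for illustration. The call to `add_duration()` is guarded so the snippet still runs where spduration is not installed.

```r
# Hypothetical monthly panel: two units observed over four months
panel <- data.frame(
  y     = c(0, 0, 1, 0,  0, 0, 0, 0),
  unit  = rep(c(1, 2), each = 4),
  month = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 4), 2)
)

# The time variable is stored as class Date, as the tID argument prefers
class(panel$month)

# Guarded call: only runs when the spduration package is available
if (requireNamespace("spduration", quietly = TRUE)) {
  dur.panel <- spduration::add_duration(panel, "y", "unit", "month",
                                        freq = "month")
  head(dur.panel)
}
```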

Details

This function processes a panel data frame by creating a failure variable from y and a corresponding duration counter, as well as risk/immunity indicators. Supported time resolutions are year, month, and day, and input data should be (dis-)aggregated to one of these levels.

The returned data frame should have the same number of rows as the original. If y is an indicator of the incidence of some event, rather than an onset indicator, then ongoing spells of failure beyond the initial event are coded as NA (e.g. 000111 becomes a spell of 0001 NA NA). This is to preserve compatibility with the base dataset. Note, however, that the order of rows may differ from the original.
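
The coding rule for ongoing events can be sketched in base R for a single unit. This only mirrors the rule as described here (the first 1 in a run is kept, later 1s become NA); it is not the package's internal code, and the vector names are illustrative.

```r
# Incidence indicator for one unit: an event begins in period 4 and continues
y <- c(0, 0, 0, 1, 1, 1)

# Previous period's value; the first period has no predecessor, so use 0
y_lag <- c(0, head(y, -1))

# Keep the first 1 of each run as the failure; later 1s in the run become NA
failure <- ifelse(y == 1 & y_lag == 1, NA, y)
failure
```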

There cannot be missing values (NA) in any of the key variables y, unitID, or tID; the function will stop if any are present.
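
A quick check for missing values before calling the function might look like the following. The data frame `panel` is hypothetical, and the check uses only base R.

```r
# Hypothetical panel with a missing event indicator in the third row
panel <- data.frame(
  y      = c(0, 0, NA, 1),
  unitID = c(1, 1, 1, 1),
  tID    = c(2000, 2001, 2002, 2003)
)

key_vars <- c("y", "unitID", "tID")

# Count missing values in each key variable
na_counts <- colSums(is.na(panel[, key_vars]))
na_counts

# Rows that would need to be fixed or dropped before calling add_duration()
bad_rows <- which(!complete.cases(panel[, key_vars]))
bad_rows
```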

Furthermore, series that start with an event, e.g. (100), are treated as experiencing failure in the first time period. If those events are in fact ongoing, e.g. the last year of a war that started before the start time of the dataset, they should be dropped manually before using add_duration().
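
One possible way to drop such leading events by hand is sketched below in base R. It assumes that every unbroken run of 1s at the very start of a unit's series is an ongoing event that should be removed; whether that holds depends on the data. The data frame `panel` is made up for illustration.

```r
# Hypothetical panel: unit 1 enters the data with an event already under way
panel <- data.frame(
  y      = c(1, 1, 0, 0, 1,  0, 0, 1, 0, 0),
  unitID = rep(c(1, 2), each = 5),
  tID    = rep(2000:2004, 2)
)

# Order by unit and time so "leading" refers to each unit's earliest periods
panel <- panel[order(panel$unitID, panel$tID), ]

# Within each unit, flag the unbroken run of 1s at the start of the series
leading <- ave(panel$y, panel$unitID, FUN = function(v) cumprod(v == 1))

# Drop the flagged rows; events that begin later in the series are kept
trimmed <- panel[leading == 0, ]
trimmed
```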

t.0 is the starting time of the period of observation at tID. It is by default set as duration - 1 and currently only serves as a placeholder to allow future expansion for varying observation times.
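
The relationship between the failure variable, the duration counter, and t.0 can be illustrated with a small base R sketch for one unit. It assumes the counter starts at 1 in the first period of each spell and that a new spell begins in the period after a failure; the package's own output should be checked rather than inferred from this.

```r
# Onset indicator for one unit
failure <- c(0, 0, 0, 1, 0)

# A new spell starts in the period after each failure
spell <- cumsum(c(0, head(failure, -1)))

# Count periods within each spell, starting at 1 (an assumption)
duration <- ave(seq_along(failure), spell, FUN = seq_along)

# Placeholder start time, set to duration - 1 as described above
t.0 <- duration - 1

data.frame(failure, spell, duration, t.0)
```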

Value

Returns the original data frame with eight additional duration-specific variables:

`failure`	Binary indicator of an event.
`ongoing`	Binary indicator for ongoing events, not counting the initial failure time.
`end.spell`	Binary indicator for the last observation in a spell, either due to censoring or failure.
`cured`	Binary indicator for spells that are coded as cured, or immune from failure. Equal to 1 - `atrisk`.
`atrisk`	Binary indicator for spells that are coded as at risk for failure. Equal to 1 - `cured`.
`censor`	Binary indicator for right-censored spells.
`duration`	`t`, counter for how long a spell has survived without failure.
`t.0`	Starting time for period observed during `t`, by default equals `duration` - 1.

Examples

# Yearly data
library(spduration)
data <- data.frame(y = c(0, 0, 0, 1, 0),
                   unitID = c(1, 1, 1, 1, 1),
                   tID = c(2000, 2001, 2002, 2003, 2004))
dur.data <- add_duration(data, "y", "unitID", "tID", freq = "year")
dur.data

spduration documentation built on May 29, 2024, 1:30 a.m.