flag | R Documentation |
flag is an S3 generic to compute (sequences of) lags and leads. L and F are wrappers around flag representing the lag- and lead-operators, such that L(x,-1) = F(x,1) = F(x) and L(x,-3:3) = F(x,3:-3). L and F provide more flexibility than flag when applied to data frames (i.e. column subsetting, formula input and id-variable-preservation capabilities...), but are otherwise identical.
Note: Since v1.9.0, F is no longer exported, but can be accessed using collapse:::F, or through setting options(collapse_export_F = TRUE) before loading the package. The syntax is the same as L.
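The relationship between the lag and lead operators can be illustrated with a minimal sketch. The vector x below is made up for illustration, and F is reached through collapse:::F because it is not exported by default.

```r
library(collapse)

x <- 1:5          # illustrative data, not taken from the package

flag(x)           # one lag:  NA 1 2 3 4
flag(x, -1)       # one lead: 2 3 4 5 NA
L(x, -1)          # the same lead, written with the lag operator

# F is not exported since v1.9.0, so it is accessed with ::: here
collapse:::F(x)   # one lead, same values as L(x, -1)
```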
flag(x, n = 1, ...)
L(x, n = 1, ...)
## Default S3 method:
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = TRUE, ...)
## Default S3 method:
L(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = .op[["stub"]], ...)
## S3 method for class 'matrix'
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = length(n) > 1L, ...)
## S3 method for class 'matrix'
L(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = .op[["stub"]], ...)
## S3 method for class 'data.frame'
flag(x, n = 1, g = NULL, t = NULL, fill = NA, stubs = length(n) > 1L, ...)
## S3 method for class 'data.frame'
L(x, n = 1, by = NULL, t = NULL, cols = is.numeric,
fill = NA, stubs = .op[["stub"]], keep.ids = TRUE, ...)
# Methods for indexed data / compatibility with plm:
## S3 method for class 'pseries'
flag(x, n = 1, fill = NA, stubs = length(n) > 1L, shift = "time", ...)
## S3 method for class 'pseries'
L(x, n = 1, fill = NA, stubs = .op[["stub"]], shift = "time", ...)
## S3 method for class 'pdata.frame'
flag(x, n = 1, fill = NA, stubs = length(n) > 1L, shift = "time", ...)
## S3 method for class 'pdata.frame'
L(x, n = 1, cols = is.numeric, fill = NA, stubs = .op[["stub"]],
shift = "time", keep.ids = TRUE, ...)
# Methods for grouped data frame / compatibility with dplyr:
## S3 method for class 'grouped_df'
flag(x, n = 1, t = NULL, fill = NA, stubs = length(n) > 1L, keep.ids = TRUE, ...)
## S3 method for class 'grouped_df'
L(x, n = 1, t = NULL, fill = NA, stubs = .op[["stub"]], keep.ids = TRUE, ...)
x |
a vector / time series, (time series) matrix, data frame, 'indexed_series' ('pseries'), 'indexed_frame' ('pdata.frame') or grouped data frame ('grouped_df'). Data need not be numeric. |
n |
integer. A vector indicating the lags / leads to compute (passing negative integers to flag or L computes leads). |
g |
a factor, |
by |
data.frame method: Same as |
t |
a time vector or list of vectors. Data frame methods also allow a one-sided formula, i.e. |
cols |
data.frame method: Select columns to lag using a function, column names, indices or a logical vector. Default: All numeric variables. Note: |
fill |
value to insert when vectors are shifted. Default is NA. |
stubs |
logical. |
shift |
pseries / pdata.frame methods: character. |
keep.ids |
data.frame / pdata.frame / grouped_df methods: Logical. If FALSE, drop all identifiers from the output (which includes all variables passed to |
... |
arguments to be passed to or from other methods. |
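The following sketch shows how some of these arguments might be combined on a data frame. The toy panel df and its column names are hypothetical and chosen only for illustration.

```r
library(collapse)

# Hypothetical toy panel: two units observed over three years
df <- data.frame(id   = rep(c("a", "b"), each = 3),
                 year = rep(2001:2003, 2),
                 v    = c(1, 2, 3, 10, 20, 30))

# fill: insert 0 instead of NA where no lagged value exists
flag(df$v, 1, g = df$id, fill = 0)          # 0 1 2 0 10 20

# by / t as formulas: lag v within id, ordered by year, keeping identifiers
L(df, 1, v ~ id, ~year)

# cols: the same selection expressed through a column name
L(df, 1, ~id, ~year, cols = "v")

# keep.ids = FALSE: only the lagged column is returned
L(df, 1, v ~ id, ~year, keep.ids = FALSE)
```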
If a single integer is passed to n, and g/by and t are left empty, flag/L/F just returns x with all columns lagged / led by n. If length(n)>1, and x is an atomic vector (time series), flag/L/F returns a (time series) matrix with lags / leads computed in the same order as passed to n. If instead x is a matrix / data frame, a matrix / data frame with ncol(x)*length(n) columns is returned where columns are sorted first by variable and then by lag (so all lags computed on a variable are grouped together). x can be of any standard data type.
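A short sketch of the output shapes described above, using made-up inputs:

```r
library(collapse)

x <- c(1, 2, 3, 4, 5)                 # illustrative vector
m <- flag(x, -1:1)                    # one lead, the series itself, one lag
dim(m)                                # 5 rows and 3 columns, in the order of n

df <- data.frame(a = 1:5, b = 6:10)   # illustrative data frame
d2 <- flag(df, 0:1)                   # columns grouped by variable, then by lag
ncol(d2)                              # ncol(df) * length(n) = 4
```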
With groups/panel-identifiers supplied to g/by, flag/L/F efficiently computes a panel-lag/lead by shifting the entire vector(s) but inserting fill elements in the right places. If t is left empty, the data needs to be ordered such that all values belonging to a group are consecutive and in the right order. It is not necessary that the groups themselves are alphabetically ordered. If a time-variable is supplied to t (or a list of time-variables uniquely identifying the time-dimension), the series / panel is fully identified and lags / leads can be securely computed even if the data is unordered / irregular.
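To make the difference between a group-lag and a fully identified panel-lag concrete, here is a small sketch. All vectors are invented for illustration.

```r
library(collapse)

# Ordered, regular panel: g alone is sufficient
g <- c("a", "a", "a", "b", "b", "b")
x <- c(1, 2, 3, 10, 20, 30)
flag(x, 1, g)                 # NA 1 2 NA 10 20

# Unordered and irregular panel: unit "a" is observed at t = 2, 1, 4
g2 <- c("a", "a", "a", "b", "b")
t2 <- c(2, 1, 4, 5, 6)
x2 <- c(20, 10, 40, 50, 60)
flag(x2, 1, g2, t2)           # 10 NA NA NA 50 (t = 3 is unobserved for "a")
```

Supplying t costs some extra computation, but it removes the requirement that rows are sorted within groups.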
Note that the t argument is processed as follows: If is.factor(t) || (is.numeric(t) && !is.object(t)) (i.e. t is a factor or plain numeric vector), it is assumed to represent unit timesteps (e.g. a 'year' variable in a typical dataset), and thus coerced to integer using as.integer(t) and directly passed to C++ without further checks or transformations at the R-level. Otherwise, if is.object(t) && is.numeric(unclass(t)) (i.e. t is a numeric time object, most likely 'Date' or 'POSIXct'), this object is passed through timeid before going to C++. Else (e.g. t is character), it is passed through qG which performs ordered grouping. If t is a list of multiple variables, it is passed through finteraction. You can customize this behavior by calling any of these functions (including unclass/as.integer) on your time variable beforehand.
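The sketch below illustrates how the type of t can change the result. The data are invented, and the comments assume that timeid derives its unit step from the greatest common divisor of the time differences, as described in its own documentation.

```r
library(collapse)

x <- c(10, 20, 30)

# Plain numeric t: treated as unit time steps via as.integer(t)
flag(x, 1, t = c(2000, 2001, 2003))   # NA 10 NA (2002 is unobserved)

# Date t: passed through timeid(); here the common step is 7 days
d <- as.Date("2024-01-01") + c(0, 7, 21)
as.integer(timeid(d))                 # 1 2 4
flag(x, 1, t = d)                     # NA 10 NA (the third week is unobserved)

# Customizing: unclass() turns the dates into plain day counts,
# so the unit step becomes one day and no previous day is observed
flag(x, 1, t = unclass(d))            # NA NA NA
```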
At the C++ level, if both g/by and t are supplied, flag works as follows: Use two initial passes to create an ordering through which the data are accessed. First-pass: Calculate minimum and maximum time-value for each individual. Second-pass: Generate an internal ordering vector (o) by placing the current element index into the vector slot obtained by adding the cumulative group size to the current time-value minus its individual minimum. This method of computation is faster than any sort-based method and delivers optimal performance if the panel-id supplied to g/by is already a factor variable, and if t is an integer/factor variable. For irregular time/panel series, length(o) > length(x), and o represents the unobserved 'complete series'. If length(o) > 1e7 && length(o) > 3*length(x), a warning is issued to make you aware of potential performance implications of the oversized ordering vector.
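The two-pass idea can be sketched in base R. This is not the package's C++ code: the helpers ordering_vector and panel_lag are hypothetical names introduced here, and the sketch assumes that the "cumulative group size" refers to each group's full time span (maximum minus minimum plus one), which is one reading of the description above.

```r
# Hypothetical illustration in base R, not the collapse implementation
ordering_vector <- function(g, t) {
  gi <- as.integer(factor(g))
  t  <- as.integer(t)
  # First pass: minimum and maximum time value per individual
  tmin <- as.integer(tapply(t, gi, min))
  tmax <- as.integer(tapply(t, gi, max))
  span   <- tmax - tmin + 1L                       # slots per group, gaps included
  offset <- c(0L, cumsum(span))[seq_along(span)]   # cumulative size of earlier groups
  # Second pass: put each row index into its slot
  slot <- offset[gi] + (t - tmin[gi]) + 1L
  o <- rep(NA_integer_, sum(span))
  o[slot] <- seq_along(t)
  list(o = o, slot = slot, offset = offset, span = span, gi = gi)
}

panel_lag <- function(x, g, t, n = 1L) {
  ov  <- ordering_vector(g, t)
  src <- ov$slot - n                               # slot holding the lagged value
  # Only look inside the row's own group block
  ok  <- src > ov$offset[ov$gi] & src <= ov$offset[ov$gi] + ov$span[ov$gi]
  idx <- rep(NA_integer_, length(x))
  idx[ok] <- ov$o[src[ok]]                         # NA where the period is unobserved
  x[idx]
}

g <- c("a", "a", "a", "b", "b")
t <- c(2, 1, 4, 5, 6)
x <- c(20, 10, 40, 50, 60)

ordering_vector(g, t)$o   # 2 1 NA 3 4 5: longer than x because t = 3 is missing
panel_lag(x, g, t, 1)     # 10 NA NA NA 50
panel_lag(x, g, t, -1)    # NA 20 NA 60 NA
```

No sorting is involved: each row is written directly to a computed slot, which is why the ordering vector grows with the time span rather than with the number of observations.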
The 'indexed_series' ('pseries') and 'indexed_frame' ('pdata.frame') methods automatically utilize the identifiers attached to these objects, which are already factors, thus lagging is quite efficient. However, the internal ordering vector still needs to be computed, thus if data are known to be ordered and regularly spaced, using shift = "row" to toggle a simple group-lag (same as utilizing g but not t in other methods) can yield a significant performance gain.
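As a small sketch of the two shift modes, the following compares them on an ordered, regularly spaced toy panel, where both should give the same values. The data frame df is hypothetical.

```r
library(collapse)

# Hypothetical toy panel, already ordered and regularly spaced
df <- data.frame(id   = rep(c("a", "b"), each = 3),
                 year = rep(2001:2003, 2),
                 v    = c(1, 2, 3, 10, 20, 30))
dfi <- findex_by(df, id, year)

by_time <- flag(dfi$v)                  # default shift = "time": uses id and year
by_row  <- flag(dfi$v, shift = "row")   # simple group-lag, skips the time ordering

# Compare the underlying values, ignoring attributes
all.equal(unattrib(by_time), unattrib(by_row))
```

On unordered or irregular data the two modes can differ, so shift = "row" is only appropriate when the ordering is known.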
x lagged / led n times, grouped by g/by, ordered by t. See Details and Examples.
fdiff, fgrowth, Time Series and Panel Series, Collapse Overview
## Simple Time Series: AirPassengers
L(AirPassengers) # 1 lag
flag(AirPassengers) # Same
L(AirPassengers, -1) # 1 lead
head(L(AirPassengers, -1:3)) # 1 lead, the series itself and 3 lags - output as matrix
## Time Series Matrix of 4 EU Stock Market Indicators, 1991-1998
tsp(EuStockMarkets) # Data is recorded on 260 days per year
freq <- frequency(EuStockMarkets)
plot(stl(EuStockMarkets[,"DAX"], freq)) # There is some obvious seasonality
head(L(EuStockMarkets, -1:3 * freq)) # 1 annual lead and 3 annual lags
summary(lm(DAX ~., data = L(EuStockMarkets,-1:3*freq))) # DAX regressed on its own annual lead,
# lags and the lead/lags of the other series
## World Development Panel Data
head(flag(wlddev, 1, wlddev$iso3c, wlddev$year)) # This lags all variables,
head(L(wlddev, 1, ~iso3c, ~year)) # This lags all numeric variables
head(L(wlddev, 1, ~iso3c)) # Without t: Works because data is ordered
head(L(wlddev, 1, PCGDP + LIFEEX ~ iso3c, ~year)) # This lags GDP per Capita & Life Expectancy
head(L(wlddev, 0:2, ~ iso3c, ~year, cols = 9:10)) # Same, also retaining original series
head(L(wlddev, 1:2, PCGDP + LIFEEX ~ iso3c, ~year, # Two lags, dropping id columns
keep.ids = FALSE))
# Regressing GDP on its lags and life expectancy and its lags
summary(lm(PCGDP ~ ., L(wlddev, 0:2, ~iso3c, ~year, 9:10, keep.ids = FALSE)))
## Indexing the data: facilitates time-based computations
wldi <- findex_by(wlddev, iso3c, year)
head(L(wldi, 0:2, cols = 9:10)) # Again 2 lags of GDP and LIFEEX
head(L(wldi$PCGDP)) # Lagging an indexed series
summary(lm(PCGDP ~ L(PCGDP,1:2) + L(LIFEEX,0:2), wldi)) # Running the lm again
summary(lm(PCGDP ~ ., L(wldi, 0:2, 9:10, keep.ids = FALSE))) # Same thing
## Using grouped data:
# The native pipe |> (R >= 4.1) is used below, so no additional package is needed
wlddev |> fgroup_by(iso3c) |> fselect(PCGDP,LIFEEX) |> flag(0:2)
wlddev |> fgroup_by(iso3c) |> fselect(year,PCGDP,LIFEEX) |> flag(0:2,year) # Also using t (safer)