This function looks for a list of values (usually, just NA) in a variable .var and overwrites those values with the most recent (or next-coming) values that are not from that list ("last observation carried forward").
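To make the idea concrete, here is a minimal base R sketch of carrying the last observation forward over a plain vector. `locf_sketch()` and its arguments are hypothetical names invented for this illustration; it is not how `panel_locf()` is implemented and it ignores the panel structure (`.i`, `.t`, `.d`) entirely. It also assumes that values with nothing earlier to pull from are simply left unchanged.

```r
# Hypothetical helper for illustration only; not part of pmdplyr.
locf_sketch <- function(x, fill = NA, backwards = FALSE) {
  if (backwards) {
    # Filling from the next-coming value is forward filling on the reversed vector
    return(rev(locf_sketch(rev(x), fill = fill)))
  }
  # TRUE where the value is on the "overwrite" list (%in% also matches NA)
  bad <- x %in% fill
  # For each position, the index of the most recent good value (0 if none yet)
  idx <- cummax(ifelse(bad, 0L, seq_along(x)))
  out <- x
  # Assumption: positions with no earlier good value are left as they are
  out[idx > 0L] <- x[idx[idx > 0L]]
  out
}

x <- c(NA, 10, NA, 0, 12, NA)
locf_sketch(x)                    # NA 10 10  0 12 12
locf_sketch(x, fill = c(NA, 0))   # NA 10 10 10 12 12
locf_sketch(x, backwards = TRUE)  # 10 10  0  0 12 NA
```

The leading NA survives the forward fill because there is no earlier value to copy, and the trailing NA survives the backward fill for the same reason.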
panel_locf(
  .var,
  .df = get(".", envir = parent.frame()),
  .fill = NA,
  .backwards = FALSE,
  .resolve = "error",
  .group_i = TRUE,
  .i = NULL,
  .t = NULL,
  .d = 1,
  .uniqcheck = FALSE
)
.var |
Vector to be modified. |
.df |
Data frame, pibble, or tibble (usually the one containing |
.fill |
Vector of values to be overwritten. Just |
.backwards |
By default, values of newly-created observations are copied from the most recently available period. Set |
.resolve |
If there is more than one observation per individual/period, and the value of |
.group_i |
By default, if |
.i |
Quoted or unquoted variables that identify the individual cases. Note that setting any one of |
.t |
Quoted or unquoted variable indicating the time. |
.d |
Number indicating the gap in |
.uniqcheck |
Logical parameter. Set to TRUE to always check whether |
panel_locf() is unusual among last-observation-carried-forward functions (like zoo::na.locf()) in that it is usable even if observations are not uniquely identified by .t (and .i, if defined).
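The base R sketch below shows one plausible reading of what filling looks like when several rows share a period: resolve each period to a single value, carry that value forward across periods, then fill only the missing rows. The toy data frame, the `resolve()` helper, and every variable name are assumptions made for this illustration; the sketch does not call pmdplyr and does not claim to reproduce its internals.

```r
# Toy data: two rows per period, so t alone does not identify a row
df <- data.frame(
  t     = c(1, 1, 2, 2, 3, 3),
  price = c(10, 20, NA, NA, 40, NA)
)

# Hypothetical resolver: collapse the non-missing values in a period to their mean
resolve <- function(x) mean(x, na.rm = TRUE)

# One resolved value per period; an all-missing period yields NaN, treated as missing
periods  <- sort(unique(df$t))
resolved <- vapply(periods, function(p) resolve(df$price[df$t == p]), numeric(1))
resolved[is.nan(resolved)] <- NA

# Carry the resolved value forward across periods
last_good <- cummax(ifelse(is.na(resolved), 0L, seq_along(resolved)))
carried <- resolved
carried[last_good > 0L] <- resolved[last_good[last_good > 0L]]

# Fill only the rows that were missing; observed rows are left untouched
missing_rows <- is.na(df$price)
df$price_filled <- df$price
df$price_filled[missing_rows] <- carried[match(df$t[missing_rows], periods)]
df$price_filled  # 10 20 15 15 40 40
```

In this sketch the missing row in period 3 takes the value observed in its own period (40), while both rows in period 2 fall back to the resolved value from period 1 (15).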
# The SPrail data has some missing price values.
# Let's fill them in!
# Note .d=0 tells it to ignore how big the gaps are
# between one period and the next, just look for the most recent insert_date
# .resolve tells it what value to pick if there are multiple
# observed prices for that route/insert_date
# (.resolve is not necessary if .i and .t uniquely identify obs,
# or if .var is either NA or constant within them)
# Also note - this will fill in using CURRENT-period
# data first (if available) before looking for lagged data.
data(SPrail)
sum(is.na(SPrail$price))
SPrail <- SPrail %>%
  dplyr::mutate(price = panel_locf(price,
    .i = c(origin, destination), .t = insert_date, .d = 0,
    .resolve = function(x) mean(x, na.rm = TRUE)
  ))

# The spec is a little easier with data like Scorecard where
# .i and .t uniquely identify observations
# so .resolve isn't needed.
data(Scorecard)
sum(is.na(Scorecard$earnings_med))
Scorecard <- Scorecard %>%
  # Let's speed this up by just doing four-year colleges in Colorado
  dplyr::filter(
    pred_degree_awarded_ipeds == 3,
    state_abbr == "CO"
  ) %>%
  # Now let's fill in NAs and also in case there are any erroneous 0s
  dplyr::mutate(earnings_med = panel_locf(earnings_med,
    .fill = c(NA, 0),
    .i = unitid, .t = year
  ))

# Note that there are still some missings - these are missings that come before the first
# non-missing value in that unitid, so there's nothing to pull from.
sum(is.na(Scorecard$earnings_med))