after_join | R Documentation |
Join two tables based on observations in one table happening after
observations in the other. Each table must have a user_id column,
which must always match for two observations to be joined,
and a time column, which must be greater in y
than in x
for
the two to be joined.
Supports all types of dplyr joins (inner, left, anti, etc.) and requires a
type argument to specify which observations in a funnel get kept
(see details for supported types).
after_join(
x,
y,
by_time,
by_user,
mode = "inner",
type = "first-first",
max_gap = NULL,
min_gap = NULL,
gap_col = FALSE,
suffix = c(".x", ".y")
)
after_inner_join(
x,
y,
by_time,
by_user,
type,
max_gap = NULL,
min_gap = NULL,
gap_col = FALSE,
suffix = c(".x", ".y")
)
after_left_join(
x,
y,
by_time,
by_user,
type,
max_gap = NULL,
min_gap = NULL,
gap_col = FALSE,
suffix = c(".x", ".y")
)
after_right_join(
x,
y,
by_time,
by_user,
type,
max_gap = NULL,
min_gap = NULL,
gap_col = FALSE,
suffix = c(".x", ".y")
)
after_full_join(
x,
y,
by_time,
by_user,
type,
max_gap = NULL,
min_gap = NULL,
gap_col = FALSE,
suffix = c(".x", ".y")
)
after_anti_join(
x,
y,
by_time,
by_user,
type,
max_gap = NULL,
min_gap = NULL,
gap_col = FALSE,
suffix = c(".x", ".y")
)
after_semi_join(
x,
y,
by_time,
by_user,
type,
max_gap = NULL,
min_gap = NULL,
gap_col = FALSE,
suffix = c(".x", ".y")
)
x |
A tbl representing the first event to occur in the funnel. |
y |
A tbl representing an event to occur in the funnel. |
by_time |
A character vector to specify the time columns in x and y. This would typically be a datetime or a date column. These columns are used to filter for time y being after time x. |
by_user |
A character vector to specify the user or identity columns in x and y. |
mode |
The method used to join: "inner", "full", "anti", "semi",
"right", "left". Each also has its own function, such as
|
type |
The type of funnel used to distinguish between event pairs, such as "first-first", "last-first", or "any-firstafter". See details for more. |
max_gap |
Optional: the maximum gap allowed between events. Can be a
integer representing the number of seconds or a difftime object, such as
|
min_gap |
Optional: the maximum gap allowed between events. Can be a
integer representing the number of seconds or a difftime object, such as
|
gap_col |
Whether to include a numeric column, |
suffix |
If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2. |
type
can be any combination of first
, last
, any
, lastbefore
, firstwithin
with first
, last
, any
, firstafter
. Some common ones you may use include:
Take the earliest x and y for each user before joining. For example, you want the first time someone entered an experiment, followed by the first time someone ever registered. If they registered, entered the experiment, and registered again, you do not want to include that person.
Take the first x, then the first y after that. For example, you want when someone first entered an experiment and the first course they started afterwards. You don't care if they started courses before entering the experiment.
First x that's followed by a y before the next x. For example, in last click paid ad attribution, you want the last time someone clicked an ad before the first subscription they did afterward.
Take all Xs followed by the first Y after it. For example, you want all the times someone visited a homepage and their first product page they visited afterwards.
Take all Xs followed by all Ys. For example, you want all the times someone visited a homepage and all the product pages they saw afterward.
library(dplyr)
landed <- tribble(
~user_id, ~timestamp,
1, "2018-07-01",
2, "2018-07-01",
2, "2018-07-01",
3, "2018-07-02",
4, "2018-07-01",
4, "2018-07-04",
5, "2018-07-10",
5, "2018-07-12",
6, "2018-07-07",
6, "2018-07-08"
) %>%
mutate(timestamp = as.Date(timestamp))
registered <- tribble(
~user_id, ~timestamp,
1, "2018-07-02",
3, "2018-07-02",
4, "2018-06-10",
4, "2018-07-02",
5, "2018-07-11",
6, "2018-07-10",
6, "2018-07-11",
7, "2018-07-07"
) %>%
mutate(timestamp = as.Date(timestamp))
after_inner_join(landed, registered, by_user = "user_id",
by_time = "timestamp", type = "first-first")
# You can use different methods of joining:
after_left_join(landed, registered, by_user = "user_id",
by_time = "timestamp", type = "first-first")
after_anti_join(landed, registered, by_user = "user_id",
by_time = "timestamp", type = "any-any")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.