e_match_closest_in_range: Match the closest observations from one dataset to a key...
In erikerhardt/erikmisc: Erik Erhardt's miscellaneous functions for solving complex data analysis workflows

View source: R/e_match_closest_in_range.R

e_match_closest_in_range

R Documentation

Match the closest observations from one dataset to a key dataset.

Description

Similar to survival::neardate, but chooses the closest observation in either direction, optionally restricted to an asymmetric range.

Usage

e_match_closest_in_range(
  dat_to_match,
  id_vars_to_match,
  val_var_to_match,
  dat_key,
  id_vars_key,
  val_var_key,
  diff_lower = -Inf,
  diff_upper = +Inf,
  sw_criteria = c("closest", "minimum", "maximum")[1],
  sw_return_key_vars = FALSE
)

Arguments

`dat_to_match`	data to match to the key dataset
`id_vars_to_match`	associated ID variables in data to match
`val_var_to_match`	associated value variable in data to match
`dat_key`	key dataset
`id_vars_key`	ID variables in key dataset
`val_var_key`	value variable to determine closeness in key dataset
`diff_lower`	lower limit of the allowed range: a value in the data to match can be no lower than the key value by this amount (default `-Inf`, no lower limit)
`diff_upper`	upper limit of the allowed range: a value in the data to match can be no higher than the key value by this amount (default `+Inf`, no upper limit)
`sw_criteria`	criterion for match proximity (useful when the range values `diff_lower` and `diff_upper` are used): `"closest"`, `"minimum"`, or `"maximum"`.
`sw_return_key_vars`	TRUE/FALSE; if TRUE, return the key value for use in matching when there are multiple records per ID

Details

Can also be used to match the closest observation within a range of future dates by setting both diff_lower and diff_upper to positive numbers, e.g., 5 and 7.
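
The range and criteria logic can be illustrated with a minimal base R sketch. This is not the package's implementation: `pick_in_range` is a hypothetical helper, and it assumes the difference is computed as the value to match minus the key value, that both limits are inclusive, and that "closest" means the smallest absolute difference.

```r
# Hypothetical helper (not part of erikmisc): choose one candidate value
# relative to a single key value.
# Assumptions: diff = candidate - key; limits are inclusive.
pick_in_range <- function(candidates, key,
                          diff_lower = -Inf, diff_upper = +Inf,
                          criteria = c("closest", "minimum", "maximum")) {
  criteria <- match.arg(criteria)
  d    <- candidates - key
  keep <- which(d >= diff_lower & d <= diff_upper)  # candidates inside the range
  if (length(keep) == 0) return(NA_real_)           # nothing qualifies
  idx <- switch(criteria,
    closest = keep[which.min(abs(d[keep]))],
    minimum = keep[which.min(d[keep])],
    maximum = keep[which.max(d[keep])]
  )
  candidates[idx]
}

days <- c(-3, 1, 4, 6, 9)  # illustrative observation days; the key day is 0

pick_in_range(days, key = 0)                                  # 1 (closest overall)
pick_in_range(days, key = 0, diff_lower = 5, diff_upper = 7)  # 6 (future window only)
pick_in_range(days, key = 0, diff_lower = -2, diff_upper = 4,
              criteria = "maximum")                           # 4
```

With both limits positive, only observations strictly after the key value are eligible, which is the "future range" use described above.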

Value

dat_to_match restricted to only those unique observations that are closest to the key data
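
To make the shape of the result concrete, here is a rough base R sketch of keeping one closest row per ID. It does not call the package function; the data and column names are made up for illustration, and it assumes that IDs missing from the key dataset are dropped from the result.

```r
# Illustrative data only; column names are hypothetical.
key <- data.frame(id = c("a", "b"), value = c(1, 2))
obs <- data.frame(
  id_m    = c("a", "a", "b", "b", "z"),
  value_m = c(0.2, 3.5, 2.4, -1.0, 1.0)
)

# Pair each observation with its key row; IDs absent from the key ("z") drop out.
both <- merge(obs, key, by.x = "id_m", by.y = "id")
both$abs_diff <- abs(both$value_m - both$value)

# Keep the single closest observation within each ID.
closest <- do.call(
  rbind,
  lapply(split(both, both$id_m), function(g) g[which.min(g$abs_diff), ])
)

closest[, c("id_m", "value_m")]
```

Here ID "a" keeps the observation 0.2 (closest to key value 1) and ID "b" keeps 2.4 (closest to key value 2).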

Examples


set.seed(1)

dat_key <-
  tidyr::expand_grid(
    key1 = c("a", "b", "c")
  , key2 = c("x", "y")
  ) |>
  dplyr::mutate(
    value = 1:dplyr::n()
  )

dat_to_match <-
  tidyr::expand_grid(
    key1_m = c("a", "b")      # no "c"
  , key2_m = c("x", "y", "z") # added "z"
  ) |>
  dplyr::slice(
    sample.int(n = 2*3, size = 4 * 2*3, replace = TRUE) # produce multiple per obs
  ) |>
  dplyr::mutate(
    value_m = runif(n = dplyr::n(), min = -5, max = 10)
  , other1  = rnorm(dplyr::n())
  , other2  = rnorm(dplyr::n())
  ) |>
  dplyr::arrange(
    key1_m, key2_m
  )

dat_to_match_sub <-
  e_match_closest_in_range(
    dat_to_match      = dat_to_match
  , id_vars_to_match  = c("key1_m", "key2_m")
  , val_var_to_match  = "value_m"
  , dat_key           = dat_key
  , id_vars_key       = c("key1"  , "key2"  )
  , val_var_key       = "value"
  , diff_lower        = -Inf
  , diff_upper        = +Inf
  )

dat_key          |> print()
dat_to_match     |> print(n = Inf)
dat_to_match_sub |> print()


# within specified range
e_match_closest_in_range(
  dat_to_match      = dat_to_match
, id_vars_to_match  = c("key1_m", "key2_m")
, val_var_to_match  = "value_m"
, dat_key           = dat_key
, id_vars_key       = c("key1"  , "key2"  )
, val_var_key       = "value"
, diff_lower        = -2
, diff_upper        = +4
, sw_return_key_vars = TRUE
)

# within specified range, maximum value
e_match_closest_in_range(
  dat_to_match      = dat_to_match
, id_vars_to_match  = c("key1_m", "key2_m")
, val_var_to_match  = "value_m"
, dat_key           = dat_key
, id_vars_key       = c("key1"  , "key2"  )
, val_var_key       = "value"
, diff_lower        = -2
, diff_upper        = +4
, sw_criteria       = "maximum"
, sw_return_key_vars = TRUE
)

# within specified range, minimum value
e_match_closest_in_range(
  dat_to_match      = dat_to_match
, id_vars_to_match  = c("key1_m", "key2_m")
, val_var_to_match  = "value_m"
, dat_key           = dat_key
, id_vars_key       = c("key1"  , "key2"  )
, val_var_key       = "value"
, diff_lower        = -2
, diff_upper        = +4
, sw_criteria       = "minimum"
, sw_return_key_vars = TRUE
)

erikerhardt/erikmisc documentation built on April 17, 2025, 10:48 a.m.
