data_match: Return filtered or sliced data frame, or row indices
In datawizard: Easy Data Wrangling and Statistical Transformations

data_match

R Documentation

Description

Return a filtered (or sliced) data frame, or the row indices of a data frame, that match a specific condition. data_filter() works like data_match(), but takes logical expressions or row indices, rather than a data frame, to specify the matching conditions.

Usage

data_match(
  x,
  to,
  match = "and",
  return_indices = FALSE,
  remove_na = TRUE,
  drop_na,
  ...
)

data_filter(x, ...)

Arguments

`x`	A data frame.
`to`	A data frame whose columns and values define the conditions that rows of `x` are matched against. Note that if `match` is a value other than `"and"`, the original row order might be changed. See 'Details'.
`match`	String, indicating with which logical operation matching conditions should be combined. Can be `"and"` (or `"&"`), `"or"` (or `"|"`) or `"not"` (or `"!"`).
`return_indices`	Logical, if `TRUE`, returns the vector of row indices that can be used to filter the original data frame. If `FALSE` (default), returns the filtered data frame directly instead of the row indices.
`remove_na`	Logical, if `TRUE` (default), missing values (`NA`s) are removed before filtering the data. When row indices are requested (i.e. `return_indices = TRUE`), it can be useful to set this to `FALSE` and preserve `NA` values, so that the returned row indices match the row indices of the original data frame.
`drop_na`	Deprecated, please use `remove_na` instead.
`...`	A sequence of logical expressions indicating which rows to keep, or a numeric vector indicating the row indices of rows to keep. Can also be a string representation of a logical expression (e.g. `"x > 4"`), a character vector (e.g. `c("x > 4", "y == 2")`) or a variable that contains the string representation of a logical expression. The string forms can be useful inside packages, where bare variable names in an expression would otherwise be flagged as undefined global variables.
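
The interplay of `return_indices` and `remove_na` can be illustrated with a minimal sketch. It assumes the datawizard package is installed; the object names `to`, `filtered`, `idx` and `by_index` are chosen here for illustration only and are not part of the package.

```r
library(datawizard)

# Conditions to match: rows where vs is 0 AND am is 1
to <- data.frame(vs = 0, am = 1)

# Default (return_indices = FALSE): the filtered data frame is returned
filtered <- data_match(mtcars, to)

# return_indices = TRUE: row positions are returned instead of the data.
# remove_na = FALSE keeps positions aligned with the rows of the original
# data (mtcars has no missing values, so it makes no difference here).
idx <- data_match(mtcars, to, return_indices = TRUE, remove_na = FALSE)

# The indices can then be used to subset the original data frame
by_index <- mtcars[idx, ]

# Both routes should select the same rows
identical(rownames(filtered), rownames(by_index))
```

Requesting indices rather than the filtered data is mainly of interest when the same row selection has to be applied to several objects.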

Details

For data_match(), if match is either "or" or "not", the original row order from x might be changed. If preserving row order is required, use data_filter() instead.

# mimics subset() behaviour, preserving original row order
head(data_filter(mtcars[c("mpg", "vs", "am")], vs == 0 | am == 1))
#>                    mpg vs am
#> Mazda RX4         21.0  0  1
#> Mazda RX4 Wag     21.0  0  1
#> Datsun 710        22.8  1  1
#> Hornet Sportabout 18.7  0  0
#> Duster 360        14.3  0  0
#> Merc 450SE        16.4  0  0

# re-sorting rows
head(data_match(mtcars[c("mpg", "vs", "am")],
                data.frame(vs = 0, am = 1),
                match = "or"))
#>                    mpg vs am
#> Mazda RX4         21.0  0  1
#> Mazda RX4 Wag     21.0  0  1
#> Hornet Sportabout 18.7  0  0
#> Duster 360        14.3  0  0
#> Merc 450SE        16.4  0  0
#> Merc 450SL        17.3  0  0
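
If data_match() is preferred but the original row order is needed, one possible workaround is to request row indices and sort them before subsetting. This is a sketch under the assumption that the returned indices refer to row positions in the input data, as described for `return_indices`; the names `cars3`, `to`, `idx`, `restored` and `reference` are illustrative.

```r
library(datawizard)

cars3 <- mtcars[c("mpg", "vs", "am")]
to <- data.frame(vs = 0, am = 1)

# Row positions where vs == 0 OR am == 1; their order may differ from cars3
idx <- data_match(cars3, to, match = "or", return_indices = TRUE)

# Sorting the positions restores the original row order
restored <- cars3[sort(idx), ]

# Reference result: data_filter() preserves the original row order
reference <- data_filter(cars3, vs == 0 | am == 1)

# Both should list the same rows in the same order
identical(rownames(restored), rownames(reference))
```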

While data_match() works with data frames to match conditions against, data_filter() is basically a wrapper around subset(subset = <filter>). However, unlike subset(), it preserves label attributes and is useful when working with labelled data.
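
The difference in how label attributes are handled can be checked with a small sketch. It assumes datawizard is installed and uses a hypothetical variable label attached to `mpg`; the names `d`, `base_result` and `dw_result` are illustrative.

```r
library(datawizard)

d <- mtcars[c("mpg", "am")]

# Attach a variable label, as labelled data typically carries one
attr(d$mpg, "label") <- "Miles per gallon"

# Same filter, once with base R and once with datawizard
base_result <- subset(d, am == 1)
dw_result <- data_filter(d, am == 1)

# subset() is expected to drop the label, data_filter() to keep it
attr(base_result$mpg, "label")
attr(dw_result$mpg, "label")
```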

Value

A filtered data frame, or the row indices that match the specified conditions.

Examples

data_match(mtcars, data.frame(vs = 0, am = 1))
data_match(mtcars, data.frame(vs = 0, am = c(0, 1)))

# observations where "vs" is NOT 0 AND "am" is NOT 1
data_match(mtcars, data.frame(vs = 0, am = 1), match = "not")
# equivalent to
data_filter(mtcars, vs != 0 & am != 1)

# observations where EITHER "vs" is 0 OR "am" is 1
data_match(mtcars, data.frame(vs = 0, am = 1), match = "or")
# equivalent to
data_filter(mtcars, vs == 0 | am == 1)

# slice data frame by row indices
data_filter(mtcars, 5:10)

# Define a custom function containing data_filter()
my_filter <- function(data, variable) {
  data_filter(data, variable)
}
my_filter(mtcars, "cyl == 6")

# Pass complete filter-condition as string.
my_filter <- function(data, condition) {
  data_filter(data, condition)
}
my_filter(mtcars, "am != 0")

# string can also be used directly as argument
data_filter(mtcars, "am != 0")

# or as variable
fl <- "am != 0"
data_filter(mtcars, fl)

datawizard documentation built on June 8, 2025, 12:47 p.m.