data_match: Return filtered or sliced data frame, or row indices

View source: R/data_match.R

data_matchR Documentation

Return filtered or sliced data frame, or row indices

Description

Return a filtered (or sliced) data frame or row indices of a data frame that match a specific condition. data_filter() works like data_match(), but works with logical expressions or row indices of a data frame to specify matching conditions.

Usage

data_match(x, to, match = "and", return_indices = FALSE, drop_na = TRUE, ...)

data_filter(x, ...)

Arguments

x

A data frame.

to

A data frame matching the specified conditions. Note that if match is a value other than "and", the original row order might be changed. See 'Details'.

match

String, indicating with which logical operation matching conditions should be combined. Can be "and" (or "&"), "or" (or "|") or "not" (or "!").

return_indices

Logical, if FALSE, return the vector of rows that can be used to filter the original data frame. If FALSE (default), returns directly the filtered data frame instead of the row indices.

drop_na

Logical, if TRUE, missing values (NAs) are removed before filtering the data. This is the default behaviour, however, sometimes when row indices are requested (i.e. return_indices=TRUE), it might be useful to preserve NA values, so returned row indices match the row indices of the original data frame.

...

A sequence of logical expressions indicating which rows to keep, or a numeric vector indicating the row indices of rows to keep. Can also be a string representation of a logical expression (e.g. "x > 4"), a character vector (e.g. c("x > 4", "y == 2")) or a variable that contains the string representation of a logical expression. These might be useful when used in packages to avoid defining undefined global variables.

Details

For data_match(), if match is either "or" or "not", the original row order from x might be changed. If preserving row order is required, use data_filter() instead.

# mimics subset() behaviour, preserving original row order
head(data_filter(mtcars[c("mpg", "vs", "am")], vs == 0 | am == 1))
#>                    mpg vs am
#> Mazda RX4         21.0  0  1
#> Mazda RX4 Wag     21.0  0  1
#> Datsun 710        22.8  1  1
#> Hornet Sportabout 18.7  0  0
#> Duster 360        14.3  0  0
#> Merc 450SE        16.4  0  0

# re-sorting rows
head(data_match(mtcars[c("mpg", "vs", "am")],
                data.frame(vs = 0, am = 1),
                match = "or"))
#>                    mpg vs am
#> Mazda RX4         21.0  0  1
#> Mazda RX4 Wag     21.0  0  1
#> Hornet Sportabout 18.7  0  0
#> Duster 360        14.3  0  0
#> Merc 450SE        16.4  0  0
#> Merc 450SL        17.3  0  0

While data_match() works with data frames to match conditions against, data_filter() is basically a wrapper around ⁠subset(subset = <filter>)⁠. However, unlike subset(), it preserves label attributes and is useful when working with labelled data.

Value

A filtered data frame, or the row indices that match the specified configuration.

See Also

  • Functions to rename stuff: data_rename(), data_rename_rows(), data_addprefix(), data_addsuffix()

  • Functions to reorder or remove columns: data_reorder(), data_relocate(), data_remove()

  • Functions to reshape, pivot or rotate data frames: data_to_long(), data_to_wide(), data_rotate()

  • Functions to recode data: rescale(), reverse(), categorize(), recode_values(), slide()

  • Functions to standardize, normalize, rank-transform: center(), standardize(), normalize(), ranktransform(), winsorize()

  • Split and merge data frames: data_partition(), data_merge()

  • Functions to find or select columns: data_select(), extract_column_names()

  • Functions to filter rows: data_match(), data_filter()

Examples

data_match(mtcars, data.frame(vs = 0, am = 1))
data_match(mtcars, data.frame(vs = 0, am = c(0, 1)))

# observations where "vs" is NOT 0 AND "am" is NOT 1
data_match(mtcars, data.frame(vs = 0, am = 1), match = "not")
# equivalent to
data_filter(mtcars, vs != 0 & am != 1)

# observations where EITHER "vs" is 0 OR "am" is 1
data_match(mtcars, data.frame(vs = 0, am = 1), match = "or")
# equivalent to
data_filter(mtcars, vs == 0 | am == 1)

# slice data frame by row indices
data_filter(mtcars, 5:10)

# Define a custom function containing data_filter()
my_filter <- function(data, variable) {
  data_filter(data, variable)
}
my_filter(mtcars, "cyl == 6")

# Pass complete filter-condition as string.
my_filter <- function(data, condition) {
  data_filter(data, condition)
}
my_filter(mtcars, "am != 0")

# string can also be used directly as argument
data_filter(mtcars, "am != 0")

# or as variable
fl <- "am != 0"
data_filter(mtcars, fl)

datawizard documentation built on Oct. 6, 2024, 1:08 a.m.