query_vec    R Documentation

Data validation queries vectorized over multiple query expressions

Description

Runs data validation queries as with query or query_list, but vectorized over a set of query expressions in string format (and optionally a corresponding vector of query names/IDs). Results of the multiple queries are stacked and returned in a single tidy data frame, with the columns referenced in the query expressions pivoted to long form (e.g. "variable1", "value1", "variable2", "value2", ...).
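The idea of running several string expressions against one data frame and stacking the hits in long form can be sketched in base R. This is only an illustration of the concept, not queryr's implementation: the data, the helper run_one, and its simplifications (every expression references the same number of columns, values always coerced to character) are assumptions made for the example.

```r
# Toy data; the column names are hypothetical
dat <- data.frame(
  id  = c("A1", "A2", "A3"),
  age = c(34, -2, 51),
  sex = c("F", "M", "X"),
  stringsAsFactors = FALSE
)

# Query expressions in string form, as they would be passed to `cond`
cond <- c("age < 0", "!sex %in% c('F', 'M')")

# Hypothetical helper: evaluate one expression and return matching rows
# with the referenced columns in long form (variable1/value1, ...)
run_one <- function(expr_chr, data) {
  expr <- parse(text = expr_chr)[[1]]
  hit <- eval(expr, envir = data, enclos = baseenv())
  hit <- !is.na(hit) & hit                        # treat NA as "no match"
  vars <- intersect(all.vars(expr), names(data))  # columns used in the query
  out <- data.frame(query_id = rep(expr_chr, sum(hit)), stringsAsFactors = FALSE)
  for (i in seq_along(vars)) {
    out[[paste0("variable", i)]] <- rep(vars[i], sum(hit))
    out[[paste0("value", i)]] <- as.character(data[[vars[i]]][hit])
  }
  out
}

# Stack the results of all queries into one data frame
res <- do.call(rbind, lapply(cond, run_one, data = dat))
res
```

Each row of res describes one matching record: which query flagged it, which variable the query referenced, and the offending value.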

Usage

query_vec(
  x,
  cond,
  element,
  name,
  cols_base,
  name_col = "query_id",
  join_type = "left",
  join_by = NULL,
  pivot_var = "variable",
  pivot_val = "value",
  as_chr = TRUE
)

Arguments

x

A data frame or a list of data frames to query. If x is a single data frame, the queries are run with query; if x is a list of data frames, they are run with query_list.

cond

Character vector of expressions to evaluate with respect to variables within x.

element

The names or integer indexes of the focal list element of x corresponding to each query expression (i.e. each element of cond). Only used if x is a list of data frames (see query_list).

name

(Optional) Character vector giving query names/IDs for each of the expressions within cond. If missing, the expressions themselves (in string format) are used as names.

cols_base

(Optional) Tidy-selection of other columns within x (or x[[element]]) to retain in the final output. Can be set for an entire session using option "queryr_cols_base", e.g. options(queryr_cols_base = quote(id:site)).
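As a small illustration of the session-wide option, the selection is stored unevaluated with quote(), since id and site are column names rather than objects in the workspace. The sketch below uses only base R's options() and getOption(); how queryr reads the option internally is not shown here.

```r
# Store the tidy-selection as an unevaluated call; keep the old value
old <- options(queryr_cols_base = quote(id:site))

# The option holds a language object, not evaluated column values
sel <- getOption("queryr_cols_base")
is.call(sel)

# Restore the previous setting when done
options(old)
```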

name_col

Column name for the query names/IDs. Defaults to "query_id".

join_type

If x is a list of data frames and cond references variables within elements of x apart from x[[element]], what type of join should be used to join the relevant elements? Options are "left" (the default) and "inner". Based on dplyr join types. Can specify different join types for different query expressions by passing a vector the same length as cond.

join_by

A character vector of variables to join by. If the join key columns have different names in x[[element]] and x[[other]], use a named vector; for example, join_by = c("a" = "b") will match x[[element]]$a to x[[other]]$b. To use different join columns for different query expressions, pass a list of vectors the same length as cond.
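To make the join_type and join_by semantics concrete, here is a minimal sketch using base R's merge() as a stand-in for the dplyr joins the function is described as being based on. The two data frames and their columns are hypothetical; the named vector c("a" = "b") maps the key a in the focal element to the key b in the other element.

```r
# Hypothetical focal element and a second element of the list x
x_element <- data.frame(a = c(1, 2, 3), visit = c("v1", "v2", "v3"))
x_other   <- data.frame(b = c(1, 3), result = c("pos", "neg"))

# Names give the key in x_element, values give the key in x_other
join_by <- c("a" = "b")

# "left": keep every row of the focal element, filling gaps with NA
left <- merge(x_element, x_other,
              by.x = names(join_by), by.y = unname(join_by),
              all.x = TRUE)

# "inner": keep only rows with a match in both elements
inner <- merge(x_element, x_other,
               by.x = names(join_by), by.y = unname(join_by))
```

With a left join, a query can still flag records in the focal element that have no counterpart in the other element; with an inner join, those records are dropped before the query is evaluated.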

pivot_var

Prefix for pivoted variable column(s). Defaults to "variable".

pivot_val

Prefix for pivoted value column(s). Defaults to "value".

as_chr

Logical indicating whether to coerce the columns referenced in the query expression(s) to character prior to returning. This enables row-binding multiple queries with variables of different classes. Defaults to TRUE.
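The reason for the coercion can be seen with two hypothetical query results whose value columns have different classes (a number and a date). Strict row-binders such as dplyr::bind_rows() refuse to combine such columns; converting both to character first gives a single, consistent column. The sketch below uses base R only, and the helper to_chr is an assumption made for the example.

```r
# Results of two hypothetical queries with differently typed values
age_hits  <- data.frame(variable1 = "age", value1 = -2)
date_hits <- data.frame(variable1 = "date_onset",
                        value1 = as.Date("2031-01-01"))

# Hypothetical helper: coerce the value column to character
to_chr <- function(d) {
  d$value1 <- as.character(d$value1)
  d
}

# After coercion the two results stack into one character column
stacked <- rbind(to_chr(age_hits), to_chr(date_hits))
```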

Value

A data frame reflecting the rows of data that match the given queries. Returned columns include:

  • query name/ID column (name taken from argument name_col)

  • (optional) columns matched by argument cols_base

  • columns referenced within the query expressions, pivoted to long form

See Also

query

Examples

data(ll)          # example dataset, an epidemiological linelist
data(ll_queries)  # example data frame defining queries to run on ll

# run all queries defined in ll_queries
query_vec(
  ll,
  cond = ll_queries$query,
  name = ll_queries$query_id,
  cols_base = c(id, site)
)


epicentre-msf/queryr documentation built on July 17, 2025, 12:22 a.m.