query: Query an R data frame with SQL

View source: R/query.R

queryR Documentation

Query an R data frame with SQL

Description

query takes a SQL SELECT statement and uses it to query an R data frame

Usage

query(data, sql)

Arguments

data

a data frame or data frame-like object (optional)

sql

a character string containing a SQL SELECT statement

Details

If the data argument is not specified, then the FROM clause of the SQL statement determines which data frame to query.

The names of data frames and columns are case-sensitive (like in R). Keywords and function names are not case-sensitive (like in SQL).

In addition to R data frames and tibbles (tbl_df objects), this function can query dtplyr_step objects created by dtplyr, a data.table backend for dbplyr. It is also possible to use this function together with dbplyr to query remote database tables (tbl_sql objects), but this depends on which database and which backend package (if any) you are using, so results may vary.

This function is subject to the current limitations of the queryparser package.

Value

An object of the same class as data.

Examples

library(dplyr)

iris %>% query("SELECT Species, AVG(Petal.Length) GROUP BY Species")

query("SELECT Species, AVG(Petal.Length) FROM iris GROUP BY Species")

iris %>%
  filter(Petal.Length > 4) %>%
  query("SELECT Species, MAX(Sepal.Length) AS max_sep_len
           GROUP BY Species") %>%
  arrange(desc(max_sep_len))

library(nycflights13)

query <- "SELECT origin, dest,
    COUNT(flight) AS num_flts,
    round(AVG(distance)) AS dist,
    round(AVG(arr_delay)) AS avg_delay
  FROM flights
  WHERE distance BETWEEN 200 AND 300
    AND air_time IS NOT NULL
  GROUP BY origin, dest
  HAVING num_flts > 5000
  ORDER BY num_flts DESC, avg_delay DESC
  LIMIT 100;"

query(query)

tidyquery documentation built on Jan. 14, 2023, 5:07 p.m.