query: Query an R data frame with SQL

Description Usage Arguments Details Value Examples

View source: R/query.R

Description

query takes a SQL SELECT statement and uses it to query an R data frame

Usage

1
query(data, sql)

Arguments

data

a data frame or data frame-like object (optional)

sql

a character string containing a SQL SELECT statement

Details

If the data argument is not specified, then the FROM clause of the SQL statement determines which data frame to query.

The names of data frames and columns are case-sensitive (like in R). Keywords and function names are not case-sensitive (like in SQL).

In addition to R data frames and tibbles (tbl_df objects), this function can query dtplyr_step objects created by dtplyr, a data.table backend for dbplyr. It is also possible to use this function together with dbplyr to query remote database tables (tbl_sql objects), but this depends on which database and which backend package (if any) you are using, so results may vary.

This function is subject to the current limitations of the queryparser package.

Value

An object of the same class as data.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
library(dplyr)

iris %>% query("SELECT Species, AVG(Petal.Length) GROUP BY Species")

query("SELECT Species, AVG(Petal.Length) FROM iris GROUP BY Species")

iris %>%
  filter(Petal.Length > 4) %>%
  query("SELECT Species, MAX(Sepal.Length) AS max_sep_len
           GROUP BY Species") %>%
  arrange(desc(max_sep_len))

library(nycflights13)

query <- "SELECT origin, dest,
    COUNT(flight) AS num_flts,
    round(AVG(distance)) AS dist,
    round(AVG(arr_delay)) AS avg_delay
  FROM flights
  WHERE distance BETWEEN 200 AND 300
    AND air_time IS NOT NULL
  GROUP BY origin, dest
  HAVING num_flts > 5000
  ORDER BY num_flts DESC, avg_delay DESC
  LIMIT 100;"

query(query)

Example output

Attaching package:dplyrThe following objects are masked frompackage:stats:

    filter, lag

The following objects are masked frompackage:base:

    intersect, setdiff, setequal, union

# A tibble: 3 x 2
  Species    `mean(Petal.Length, na.rm = TRUE)`
  <fct>                                   <dbl>
1 setosa                                   1.46
2 versicolor                               4.26
3 virginica                                5.55
Warning messages:
1: In readChar(rc, nchars) : truncating string with embedded nuls
2: In readChar(rc, nchars) : truncating string with embedded nuls
3: In readChar(rc, nchars) : truncating string with embedded nuls
4: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
5: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
6: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
# A tibble: 3 x 2
  Species    `mean(Petal.Length, na.rm = TRUE)`
  <fct>                                   <dbl>
1 setosa                                   1.46
2 versicolor                               4.26
3 virginica                                5.55
Warning messages:
1: In readChar(rc, nchars) : truncating string with embedded nuls
2: In readChar(rc, nchars) : truncating string with embedded nuls
3: In readChar(rc, nchars) : truncating string with embedded nuls
4: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
5: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
6: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
# A tibble: 2 x 2
  Species    max_sep_len
  <fct>            <dbl>
1 virginica          7.9
2 versicolor         7  
Warning messages:
1: In readChar(rc, nchars) : truncating string with embedded nuls
2: In readChar(rc, nchars) : truncating string with embedded nuls
3: In readChar(rc, nchars) : truncating string with embedded nuls
4: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
5: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
6: In readChar(rc, 1L, useBytes = TRUE) :
  truncating string with embedded nuls
# A tibble: 1 x 5
  origin dest  num_flts  dist avg_delay
  <chr>  <chr>    <int> <dbl>     <dbl>
1 EWR    BOS       5247   200         5
There were 40 warnings (use warnings() to see them)

tidyquery documentation built on Feb. 6, 2021, 9:15 a.m.