sqldf | R Documentation |
sqldf is an R package for running SQL statements on R data frames, optimized for convenience. The user simply specifies an SQL statement in R using data frame names in place of table names and a database with appropriate table layouts/schema is automatically created, the data frames are automatically loaded into the database, the specified SQL statement is performed, the result is read back into R and the database is deleted all automatically behind the scenes making the database's existence transparent to the user who only specifies the SQL statement. Surprisingly this can at times be even faster than the corresponding pure R calculation (although the purpose of the project is convenience and not speed).
sqldf(statement, ..., conn, tx)
statement |
A SQL statement. Use |
... |
parameters to be interpolated into |
conn |
optional, a DBIConnection object. |
tx |
optional, one of |
All DBI
drivers are supported, although features may be limited by driver support. RSQLite and
duckdb are recommended.
DBIsqldf
is free software published under the GNU General Public License that can be downloaded from CRAN.
sqldf.driver
: A DBIDriver (or cloneable DBIConnection) object for a transient database connection when conn
is omitted,
or an expression or function that yields one. The default is quote(RSQLite::SQLite)
How does DBIsqldf handle classes and factors?
It doesn't. The exact behaviour will depend on the database you choose to use. This includes dates.
Why does DBIsqldf seem to mangle certain variable names?
If a data frame contains variables with names that are not valid in SQL, the behavior is left up to the database driver you choose to use. Notably, periods can cause issues - you may need to double quote the affected variables in your SQL statement. Also please be aware that SQL is not case sensitive, while R is.
Why does sqldf("select var(x) from DF") not work?
Functions are provided by the database you choose to use, not by R.
Why am I having problems with update?
Be sure to set tx
to commit to persist any changes you have made to the database.
Why do certain calculations come out as integer rather than double?
Calculations performed within the database will follow the database's rules.
A data frame of results if statement
was a query, otherwise nothing.
if(is.null(getOption("sqldf.driver")) && !requireNamespace("RSQLite", quietly=TRUE)) { message("Please configure `options(sqldf.driver)` or install RSQLite to run below examples.") } else { data(iris, envir=environment()) # head sqldf("select * from iris limit 5") # Filter sqldf("select * from iris where species = 'virginica' limit 5 ") # Parameterized query sqldf("select * from iris where species = ? limit 5", "versicolor") # Aggregate, quoting sqldf('select species, avg("Petal.Width") from iris group by 1 limit 5') # Compare with aggregate(Petal.Width~Species, iris, FUN = var) # CTE, back join sqldf(' with tbl_width as ( select species, avg("Petal.Width") as xbar, count(1) as n from iris group by 1 ) select Species, 1.0/(n-1) * sum(("Petal.Width" - xbar)*("Petal.Width" - xbar)) as var from iris join tbl_width using (species) group by 1 ') } if(!requireNamespace("RSQLite", quietly=TRUE)) { message("Below examples require RSQLite specifically.") } else { # Manually managing the DB connection, and writing to a database conn <- DBI::dbConnect(RSQLite::SQLite(), tempfile()) data(mtcars, envir=environment()) sqldf("create table usacars as select * from mtcars where am = 1", conn=conn, tx='commit') #Now compare DBI::dbListTables(conn) # vs persisted tables sqldf("SELECT type, name FROM sqlite_schema", conn=conn) # hang up and reconnect DBI::dbDisconnect(conn) conn <- DBI::dbConnect(conn) DBI::dbListTables(conn) sqldf("SELECT cyl, count(1) as n from usacars group by 1", conn=conn) DBI::dbDisconnect(conn) }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.