knitr::opts_chunk$set( collapse = TRUE, comment = "#---" ) library(smalltableobject) create_test_data <- function(conn, dbtype = "sqlite", n = 100, seed = 20190427){ #checkmate::assert_choice(dbtype, choices = c("sqlite")) #checkmate::assert_true(DBI::dbIsValid(conn)) if(dbtype == "sqlite"){ # Create connection object to sqlite. con_temp <- conn #DBI::dbListTables(con_temp) # Create a table with all pertinent datatypes # -Date/time specific class is not an option for sqlite. # -SQLIte has 4 "storage classes" and 1 column can mix different types(!). # -More on data types : http://www.sqlitetutorial.net/sqlite-data-types/ DBI::dbExecute(con_temp, "CREATE TABLE test1 (a int , b real , c text , d blob )") # Insert 100 random rows for each column. # -No missing values. # -Ensure both positive and negative numerics. # -Ensure empty strings, but not NULLs/NA. Ensure åäö nordic characters in strings. # -Ensure dates in the past, and the future. (not applicable here) DBI::dbExecute(con_temp, "INSERT INTO test1 VALUES( 9223372036854775807, 9.0, 'abcdefghijklmnopqrstuvxyzåäö 1234567890?!', x'050015a0' ), ( -9223372036854775805, -9.9, '', x'0419ffff' ) ") # Set seed & table size. size_n <- n set.seed(seed = seed, kind = "default") # Generate random numbers to df, and ... randomrows <- data.frame( a = sample.int(n = 999999999999, size = size_n) * sample( x = c(-1, 1), size = size_n, replace = T ), b = runif( n = size_n, min = -99999999, max = 99999999 ), c = replicate(n = size_n, expr = paste("'", paste( sample( x = c(LETTERS, letters), size = 10, replace = T ), collapse = "" ), "'", sep = "") ), d = replicate(n = size_n, expr = paste( "x'", paste(sample( x = c(0:9, letters[1:6]), size = 10, replace = T ), collapse = ""), "'", sep = "" )) ) # ... prep for SQL insert... inserts <- paste(apply( X = randomrows, MARGIN = 1, FUN = function(x) paste("(", paste(x, collapse = ","), ")") ), collapse = ",") # ... and insert into data base table. DBI::dbExecute(con_temp, paste("INSERT INTO test1 VALUES", inserts)) # Disconnect db connection. #DBI::dbDisconnect(con_temp) } return(T) } get_sto <- function(n = 100, seed = 20202020){ tryCatch({ # Create data and an object. tf <- tempfile() create_test_data(conn = DBI::dbConnect(drv = RSQLite::SQLite(), dbname = tf), n = n, seed = seed) sto2 <- NULL sto2 <- SmallTableObject$new(dbtype = "sqlite", host = tf, tablename = "test1") return(sto2) }, error = function(e){ stop("Couldnt create new object.") })#end trycatch } tmpfile <- tempfile(fileext = ".sqlite") ### Temporary db. tempfile_con <- DBI::dbConnect(drv = RSQLite::SQLite(), dbname = tmpfile) ### Connection. create_test_data(conn = tempfile_con) ### Magic function creates test data.
smalltableobject makes (small) SQL-tables essentially into data frames. That means you can access pesky little SQL-tables using any R function expecting a data frame. You're welcome.Package features
sto[x, y] <- m immediately syncs to DBIn short: loads an SQL-table into memory + overwrites underlying table when changed. Not elegant, but brutal.
Side note smalltableobject or STO, or sto, for short was developed in _Sto_ckholm, Sweden.
Package Tagline: Tired of moving small tables between R and a SQL-database? Try smalltableobject!
Intended audience: the RAD (Rock-star Analyst or Developer).
We created a small SQL-table test1 and now it's time to put it into a STO object. One column is a raw byte column, special for sqlite-tables.
require(smalltableobject) ### Load pack. sto1 <- smalltableobject$new(dbtype = "sqlite", host = tmpfile, tablename = "test1") ### Get table.
DBI::dbGetQuery(tempfile_con, "SELECT count(*) FROM test1") nrow(sto1[]) ### Note brackets. _Always_ use brackets.
Now we have sto1 as our object to access that table.
We can definitely use the STO as a regular data frame.
sto1[1, 1]
sto1[1, ]
Any function expecting a data frame can be used.
head(sto1[, 3]) ### Any function taking data frames will work. summary(sto1[, 1])
For all practical purposes, this is just another data frame.
apply(sto1[1:10, ], MARGIN = 1, FUN = function(x)length(x) ) nrow(sto1[sto1[]$b > median(sto1[]$b), ])
Dplyr will not notice the difference between your object and a real data frame.
library(dplyr, warn.conflicts = F) sto1[, ] %>% filter(b < 0) %>% ### Filter on named column slice(1:2) ### This operation cannot be reliably performed, as rows gets shuffled.
The STO checks for type safety, when you assign new data to it.
class(sto1[, "a"]) sto1[1, "a"] <- -999 ### Triggers and immediate write to DB. sto1[1, "a"] ### Only sqlite preserves ordering. Other DBs reshuffle.
sto1[1, "a"] <- "one cell wrong type" sto1[1, "a"] ### R has this behaviour. What can you do.
tryCatch({ sto1[, "a"] <- "wrong type" ### At least with this we get an error. }, error = function(e){print(e)})
Beware, as each write, or assignment, to the data frame triggers an upload. This behaviour could change in the future if need be to allow for caching of writes, or using a unique key and only updating changed rows.
What if some joker changes the table you are working on? And you assign your changed data on top of his/hers? Will the STO overwrite ? Nope.
The STO does a simple hashing of columns. The local data and the remote data are compared using the hashing. If hashes differ, you get ERROR. Guaranteed to signal ERROR if row counts change, or parts of rows are changed.
This example adds some lines to underlying table. Any writes are now rejected. The STO is not forcing any writes.
tmp <- DBI::dbExecute(tempfile_con, "INSERT INTO test1 VALUES(1, 2, 'c', x'010203a4')") tryCatch({ sto1[1, 1] <- 909 }, error = function(e){print(e)})
Great work reading all the way down! Here is the summary :
smalltableobject removes the hassle of updating small tables on a remote DB. Thanks for using this piece of software. Make issues or pull requests or contact me on Github to suggest improvements. Or report bugs.
DBI::dbDisconnect(tempfile_con)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.