knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

date: "last edit: 9/21/2022"

Load packages into R session. It will automatically load the package of dplyr and dbplyr.

library(SQLDataFrame)
library(DBI)

SQLDataFrame slots

dbfile <- system.file("extdata/test.db", package = "SQLDataFrame")
conn <- DBI::dbConnect(dbDriver("SQLite"), dbname = dbfile)
dbListTables(conn)
obj <- SQLDataFrame(
    conn = conn, dbtable = "state", dbkey = "state")
obj

Slot accessors

To make the SQLDataFrame object as light and compact as possible, there are only 5 slots contained in the object: tblData, dbkey, dbnrows, dbconcatKey, indexes. Metadata information could be returned through these 5 slots using slot accessors or other utility functions.

slotNames(obj)
dbtable(obj)
dbkey(obj)
dbcon(obj)

tblData slot

The tblData slot saves the dbplyr::tbl_dbi version of the database table, which is a light-weight representation of the database table in R. Of note is that this lazy tbl only contains unique rows. The rows could also be sorted by the dbkey(obj) if the SQLDataFrame object was generated from union or rbind. So when the saveSQLDataFrame() function was called, a database table will be written into a physical disk space containing only the unique records.

Accessor function is made avaible for this slot:

tblData(obj)

dbnrows and dbconcatKey

The dbnrows slot saves the number of rows corresponding to the tblData, and dbconcatKey saves the realized (concatenated if multiple) key columns corresponding to the tblData. Accessor functions are also available for these 2 slots:

dbnrows(obj)  ## equivalent to nrow(obj)
dbconcatKey(obj)

indexes slot

The indexes slots is an unnamed list saving the row and column indexes respectively corresponding to the tblData slot, so that the SQLDataFrame could possibly have duplicate rows or only a subset of data records from the tblData, while the tblData slot doesn't need to be changed. To be consistent, the slots of dbnrows and dbconcatKey will also remain unchanged.

obj@indexes
obj_sub <- obj[sample(5, 3, replace = TRUE), 2:3]
obj_sub
obj_sub@indexes
identical(tblData(obj), tblData(obj_sub))
identical(dbnrows(obj), dbnrows(obj_sub))
nrow(obj)
nrow(obj_sub)

With a filter or select function (which is similar to [i, ] subsetting), only the indexes slot will be updated for the row or column index pointing to the tblData.

obj_filter <- obj %>% filter(division == "South Atlantic" & size == "medium")
obj_filter@indexes
identical(tblData(obj), tblData(obj_filter))

obj_select <- obj %>% select(division, size)
obj_select@indexes
identical(tblData(obj), tblData(obj_select))

SQLDataFrame methods

ROWNAMES

The ROWNAMES,SQLDataFrame method was defined to return the (concatenated if multiple) key column(s) value, so that the row subsetting with character vector works for the SQLDataFrame objects.

rnms <- ROWNAMES(obj)
obj[sample(rnms, 3), ]

For SQLDataFrame object with composite keys:

obj1 <- SQLDataFrame(conn = conn, dbtable = "state",
                     dbkey = c("region", "population"))
ROWNAMES(obj1[1:10,])
obj1[c("South:3615.0", "West:365.0"), ]

SessionInfo()

sessionInfo()

Compatibility with GenomicRanges and SummarizedExperiment

library(SQLDataFrame)
library(DBI)
dbfile <- system.file("extdata/test.db", package = "SQLDataFrame")
conn <- DBI::dbConnect(DBI::dbDriver("SQLite"), dbname = dbfile)
obj <- SQLDataFrame(conn = conn, dbtable = "state", dbkey = "state")
dim(obj)
is(obj, "DataFrame")

library(GenomicRanges)
gr <- GRanges("chr1", IRanges(1:50, 60))
mcols(gr) <- obj

## assay <- array(1:10, dim=c(50, 2))
## cdata <- obj[1:2, ]
## SE(assays = assya, rowRanges = gr, colData = cdata) ## see inst/extdata/sqldf_udpate.Rmd for FIXME tag!!! 


Bioconductor/SQLDataFrame documentation built on Oct. 25, 2023, 6:32 a.m.