update (4/5/2019)

  1. Added a mutate() method and modified saveSQLDataFrame() accordingly.
  2. Modified the .join_union_prepare() function so that "mutate + rbind/join" works!
  3. Added vignettes (general + internal).
  4. Added documentation with easy examples.

update (3/29/2019)

  1. saveSQLDataFrame() now returns a SQLDataFrame constructed from the new database and table.
  2. Added makeSQLDataFrame() to read text files into an SQLite database and return the SQLDataFrame constructed from that database, table, and dbkey.
  3. Combined the *_join and union functions with a utility function to avoid duplicate code.
  4. Rewrote the @dbconcatKey update in union() to be more efficient.
  5. Used BiocGenerics:::replaceSlots().

update (3/25/2019)

  1. .join_union_prepare() attaches the database to the current connection.
  2. rbind() performs a take-one-at-a-time union.
  3. saveSQLDataFrame() writes a unique index with dbkey(x), and saves ridx(x) if the input SQLDataFrame comes from rbind() (i.e., has non-NULL ridx(x)).
  4. The SQLDataFrame() constructor reads ridx() if it exists.
  5. Rewrote the dbtable() method. It only works for @tblData backed by a real database on disk; it doesn't work for a lazy tbl.
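
The database-attaching step in item 1 can be sketched with plain DBI/RSQLite. The file names and the tiny "state" table below are illustrative, not the package's actual internals:

```r
## Sketch: attach a second SQLite file to the current connection so that
## tables from both databases can be queried in one statement.
library(DBI)
library(RSQLite)

db1 <- tempfile(fileext = ".db")
db2 <- tempfile(fileext = ".db")

con <- dbConnect(SQLite(), db1)
dbWriteTable(con, "state", data.frame(region = "South", population = 100L))

con2 <- dbConnect(SQLite(), db2)
dbWriteTable(con2, "state", data.frame(region = "West", population = 200L))
dbDisconnect(con2)

## Attach the second database file under the alias "aux":
dbExecute(con, sprintf("ATTACH DATABASE '%s' AS aux", db2))

## Both tables are now visible from a single connection:
res <- dbGetQuery(con, "SELECT * FROM main.state UNION SELECT * FROM aux.state")
dbDisconnect(con)
```

After ATTACH, cross-database joins and unions run as ordinary single-connection queries, which is what makes the lazy-tbl preparation possible.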

Questions:

  1. Try a real database in a different SQL dialect (MySQL, BigQuery, ...).
  2. apply()?? (e.g., print ...)
  3. Coerce into a matrix for row/colSums, etc.
  4. tryCatch: "Error in result_create(conn@ptr, statement): parser stack overflow" -- affects show() and .join_union_prepare().
  5. as.data.frame(), show(): an extra layer of query?
  6. *_join has no S4 generics -- keep the S3 methods.
  7. show() of rbind(ss11, ss21, ss31, ss22) is slow, but after saveSQLDataFrame() show() is fast. So the many layers of queries slow down overall performance, including the showing of @tblData. Message for users: when you notice a slowdown, call saveSQLDataFrame()!
  8. Should overwrite = FALSE check dbname or dbtable (in makeSQLDataFrame() and saveSQLDataFrame())? Use 2 arguments. Add "index = TRUE" for saveSQLDataFrame().
  9. Try a big database.
  10. Add unit tests.
  11. [,]<- assignment, which calls mutate().
  12. Update the DESCRIPTION (Description field) and the vignette introduction.

misc:

Add a new table, "table_metadata".

Two additional tables to add to the database: 1) dbkey info -- use serialize(dbkey(x), NULL) to convert it to "raw", stored in a "BLOB" (binary large object) column.
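
A minimal base-R sketch of the serialize-to-BLOB idea; the key names are made up for illustration:

```r
## Sketch: store dbkey() info as a BLOB by serializing it to a raw vector.
key <- c("region", "population")

## connection = NULL makes serialize() return a raw vector, which is
## exactly what a BLOB column stores.
raw_key <- serialize(key, connection = NULL)
stopifnot(is.raw(raw_key))

## Round trip: unserialize() recovers the original character vector.
identical(unserialize(raw_key), key)  # TRUE
```

Any R object (not just character vectors) round-trips this way, which is why BLOB storage is attractive for arbitrary key metadata.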

Update:

  1. Add a .fun() helper for complicated calls like $src$con@dbname, ident(dbtable(sdf)), ridx(sdf), normalizeRowindex(sdf), ...
  2. rbind(): 1) write the database table with unique rows; 2) return an SDF with updated @indexes.
  3. Debug: rbind(ss3, ss2).
  4. ** union, then rbind -- todo! (SQL union?)
  5. ** Lazy union? dbplyr::union.tbl_lazy.
  6. printROWS for the show method; subset again with [1:nhead, ].
  7. Rewrite .printROWS().
  8. .extract_tbl_rows_by_key().
  9. Rewrite as.data.frame() (match ridx to sort(unique(ridx))).
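
The index mapping in item 9 can be sketched with base R; `ridx` below is a hypothetical row index with duplicates and out-of-order entries:

```r
## Sketch: rows are fetched from the database in sorted, de-duplicated
## order, then expanded back to the requested order with match().
ridx <- c(5L, 2L, 5L, 9L)

fetch_order <- sort(unique(ridx))  # rows actually read: 2, 5, 9
pos <- match(ridx, fetch_order)    # 2, 1, 2, 3

## If 'fetched' held the de-duplicated rows in fetch_order, then
## fetched[pos, ] would restore the duplicates and the original order.
```

This keeps the SQL query small (each distinct row is read once) while still honoring arbitrary user-supplied row indexes.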

sprint update for rbind

  1st trick: attach the database as aux.
  2nd trick: INSERT INTO from aux, with queries (lazy tbl from aux).
  3rd trick: write a unique database table; return an sdf with updated indexes.
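
The three tricks can be sketched end to end with DBI/RSQLite; all table and file names below are illustrative, not the package's actual helpers:

```r
## Sketch of the rbind() pipeline: attach, insert-from-aux, de-duplicate.
library(DBI)
library(RSQLite)

db_a <- tempfile(fileext = ".db")
db_b <- tempfile(fileext = ".db")

con <- dbConnect(SQLite(), db_a)
dbWriteTable(con, "state",
             data.frame(region = c("South", "West"), population = c(100L, 200L)))

con_b <- dbConnect(SQLite(), db_b)
dbWriteTable(con_b, "state",
             data.frame(region = c("West", "East"), population = c(200L, 300L)))
dbDisconnect(con_b)

## 1st trick: attach the second database as "aux".
dbExecute(con, sprintf("ATTACH DATABASE '%s' AS aux", db_b))

## 2nd + 3rd tricks: insert from aux via a query, keeping only unique rows.
dbExecute(con, "CREATE TABLE combined AS
                SELECT DISTINCT * FROM
                (SELECT * FROM main.state UNION ALL SELECT * FROM aux.state)")

res <- dbGetQuery(con, "SELECT * FROM combined")  # 3 unique rows
dbDisconnect(con)
```

The returned object would then carry row indexes into this unique table rather than duplicating rows on disk.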

Update:

  1. Should saveSQLDataFrame() create a new path if it doesn't exist (!file.exists())?
  2. sdf[list(), ]
  3. sdf[sdf, ]
  4. filter(sdf, key1 == "" & key2 > ..); sdf %>% filter(key1 ... key2 ...)
  5. rbind(): copy_to() / db_insert_into() / dbWriteTable()??
  6. SQLDataFrame show method: print the source? (refer to the saveSQLDataFrame() print message).

Row subsetting with numeric indexes.

DOING

DONE

3/5/2019
  - Modified the "state" table for unique "region+population".
  - Added a "union" method for SDF, which returns the union with unique rows and automatic sorting.
  - Reimplemented the "rbind" method, which extends "union" and updates the "dbconcatKey" and "indexes" slots.
  - Added the @dbconcatKey slot, which corresponds to @tblData (has '.0' for numeric key columns). Now each SQLDataFrame has @dbconcatKey, which is heavy... but the key columns will be evaluated anyway when [ subsetting. ?? connect the current slot with the
  - ROWNAMES() applies ridx(x) to dbconcatKey(x), which is good for filter(, condition) whether or not 'condition' has '.0' at the end. But '[rnm, ]' has to include the '.0' to match. Do not encourage this.
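
A small base-R sketch of a dbconcatKey-style concatenated key. The ":" separator and the exact number formatting are assumptions for illustration; the notes above only say that numeric key columns pick up a '.0' suffix:

```r
## Sketch: build a concatenated key string per row from the key columns.
region <- c("South", "West")
population <- c(100, 200)  # numeric (double), not integer

## formatC() forces whole-number doubles to print with one decimal,
## reproducing the '.0' suffix seen in @dbconcatKey:
concat <- paste(region, formatC(population, format = "f", digits = 1), sep = ":")
concat  # "South:100.0" "West:200.0"
```

Matching user-supplied row names against such keys is exactly where the '.0' mismatch described above comes from.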

Reimplemented SQLDataFrame to keep the key columns fixed and shown on the left-hand side, with "|" in between to separate the key columns from the other columns. ncol and colnames now only correspond to the non-key columns.

Always keep the key() columns, shown as the first columns.

Initial

slots
  - Added an extra @indexes slot in SQLDataFrame to save the row/col indexes.
  - ?? Remove the @colnames slot and keep the colnames() accessor? (consistent with DataFrame)
  - Renamed the SQLDataFrame@rownames slot to dbrownames.

validity check
  - Added a validity check for the dbtable() name.
  - Added a validity check for the length of the @indexes slot.

constructor
  - Updated the SQLDataFrame() constructor for the @indexes slot.
  - Updated the SQLDataFrame() constructor with specified columns and an error message for the "col.names" argument.

accessors
  - Updated the nrow(), ncol(), dim(), length(), colnames(), and rownames() accessors to reflect the @indexes slot.

show
  - Added the utility function .extract_tbl_from_SQLDataFrame() to return a tbl_dbi object with row/col filtering/selection from @indexes.
  - ??? Added the utility function .extract_tbl_rows_by_key() to extract certain rows by key. Needs a rewrite to remove the call to dbkey(x).
  - Updated show,SQLDataFrame to reflect the row/col indexes.

[, [[
  - Added extractROWS,SQLDataFrame, with both input and output being SQLDataFrame objects.
  - Defined [[,SQLDataFrame to return a SQLDataFrame object; only for single-column extraction, and it realizes automatically (works like "drop=TRUE").
  - Defined the $,SQLDataFrame method, which calls [[,SQLDataFrame.
  - Defined [,SQLDataFrame to return a SQLDataFrame object by adding/updating the @indexes slot.



Liubuntu/SQLDataFrame documentation built on May 17, 2019, 7:43 a.m.