blocks_to_rowrecs_q: Map sets rows to columns (query based, take name of table).
In cdata: Fluid Data Transformations

blocks_to_rowrecs_q

R Documentation

Map sets rows to columns (query based, take name of table).

Description

Transform data facts from rows into additional columns using SQL and controlTable.

Usage

blocks_to_rowrecs_q(
  tallTable,
  keyColumns,
  controlTable,
  my_db,
  ...,
  columnsToCopy = NULL,
  tempNameGenerator = mk_tmp_name_source("mvtcq"),
  strict = FALSE,
  controlTableKeys = colnames(controlTable)[[1]],
  checkNames = TRUE,
  checkKeys = FALSE,
  showQuery = FALSE,
  defaultValue = NULL,
  dropDups = TRUE,
  temporary = FALSE,
  resultName = NULL,
  incoming_qualifiers = NULL,
  outgoing_qualifiers = NULL,
  executeQuery = TRUE
)

Arguments

`tallTable`	name of table containing data to be mapped (db/Spark data)
`keyColumns`	character list of column defining row groups
`controlTable`	table specifying mapping (local data frame)
`my_db`	db handle
`...`	force later arguments to be by name.
`columnsToCopy`	character list of column names to copy
`tempNameGenerator`	a tempNameGenerator from cdata::mk_tmp_name_source()
`strict`	logical, if TRUE check control table name forms
`controlTableKeys`	character, which column names of the control table are considered to be keys.
`checkNames`	logical, if TRUE check names
`checkKeys`	logical, if TRUE check keying of tallTable
`showQuery`	if TRUE print query
`defaultValue`	if not NULL literal to use for non-match values.
`dropDups`	logical if TRUE suppress duplicate columns (duplicate determined by name, not content).
`temporary`	logical, if TRUE make result temporary.
`resultName`	character, name for result table.
`incoming_qualifiers`	optional named ordered vector of strings carrying additional db hierarchy terms, such as schema.
`outgoing_qualifiers`	optional named ordered vector of strings carrying additional db hierarchy terms, such as schema.
`executeQuery`	logical, if TRUE execute the query and return result.

Details

This is using the theory of "fluid data"n (https://github.com/WinVector/cdata), which includes the principle that each data cell has coordinates independent of the storage details and storage detail dependent coordinates (usually row-id, column-id, and group-id) can be re-derived at will (the other principle is that there may not be "one true preferred data shape" and many re-shapings of data may be needed to match data to different algorithms and methods).

The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one builds the control table by running an appropriate query over the data.

Some discussion and examples can be found here: https://winvector.github.io/FluidData/FluidData.html and here https://github.com/WinVector/cdata.

Value

wide table built by mapping key-grouped tallTable rows to one row per group

Examples


if (requireNamespace("DBI", quietly = TRUE) &&
  requireNamespace("RSQLite", quietly = TRUE)) {
  my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  # pivot example
  d <- data.frame(meas = c('AUC', 'R2'), val = c(0.6, 0.2))
  rquery::rq_copy_to(my_db,
                    'd',
                    d,
                    temporary = TRUE)
  cT <- build_pivot_control_q('d',
                              columnToTakeKeysFrom= 'meas',
                              columnToTakeValuesFrom= 'val',
                              my_db = my_db)
  tab <- blocks_to_rowrecs_q('d',
                             keyColumns = NULL,
                             controlTable = cT,
                             my_db = my_db)
  qlook(my_db, tab)
  DBI::dbDisconnect(my_db)
}

cdata documentation built on Aug. 20, 2023, 1:06 a.m.