fold.data.frame: Fold a Data Frame
In fold: A Self-Describing Dataset Format and Interface


Folds a data.frame. Stacks columns, while isolating metadata and capturing keys.


## S3 method for class 'data.frame'
fold(x, ..., meta = obj_attr(x), simplify = TRUE,
  sort = TRUE, tol = 10)

`x`	data.frame
`...`	unquoted names of grouping columns. See also `fold.grouped_df`. Alternatively, pre-specify as a groups attribute (character vector).
`meta`	a list of formulas in the form object ~ attribute. Pass something with length 0 to suppress guessing.
`simplify`	whether to set to NA any grouping values that do not help distinguish values, and to remove the resulting duplicate records
`sort`	whether to sort the result
`tol`	maximum number of categories for guessing whether to encode metadata; encoding will always be attempted if metadata (attr) or its referent (obj) is a factor

See package?fold for micro-vignette.

A folded data.frame is a formalized re-presentation of a conventional data.frame. Items in the conventional form are of three conceptual types: data, metadata, and keys. Data items contain the primary values to be described. Metadata items give additional details about the data items or their values. Keys are grouping items; combinations of grouping values should uniquely identify each conventional record.

In the result, names of data items appear in VARIABLE, while values of data items are stacked in VALUE. Data items are all columns from the input not otherwise identified as metadata or keys.
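The stacking step can be sketched in base R without the package. This is only an illustration of the layout described above, not the package's implementation: the toy data are hypothetical, and coercing VALUE to character and filling META with NA for data rows are assumptions of the sketch.

```r
# Hypothetical toy data; ID and TIME play the role of keys.
x <- data.frame(
  ID   = c(1, 1, 2),
  TIME = c(0, 1, 0),
  DV   = c(0.5, 1.2, 0.7),
  WT   = c(70, 70, 82)
)
keys <- c("ID", "TIME")

# Data items are all columns not identified as keys (no metadata here).
data_items <- setdiff(names(x), keys)

# Stack each data column into VARIABLE/VALUE rows, carrying the keys along.
stacked <- do.call(rbind, lapply(data_items, function(nm) {
  data.frame(
    VARIABLE = nm,
    META     = NA_character_,          # assumption: data rows carry no META
    VALUE    = as.character(x[[nm]]),  # assumption: one shared character column
    x[keys],
    stringsAsFactors = FALSE
  )
}))

stacked
```

Each input column contributes one row per input record, so three records and two data items give six stacked rows. Unlike fold(), the sketch does no sorting and no key simplification.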

Likewise, names of metadata items appear in META, while the name of the described data item appears in VARIABLE. Values of metadata items appear in VALUE. The metadata VALUE will be an encoding (see package encode) if there is exactly one unique metadata value corresponding to each unique data value, and either (a) one of the two is a factor, or (b) neither is a factor, but there are tol or fewer unique values of data and multiple unique values of metadata. Metadata items are identified explicitly using a list of formulas, or implicitly by means of column naming conventions.
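The encoding rule in the preceding paragraph can be paraphrased as a small predicate. The function name looks_like_encoding is hypothetical and is not part of the package; the sketch simply restates the rule as worded above and may differ from what fold() does internally.

```r
# Hypothetical helper: restates the documented rule, not the package's code.
looks_like_encoding <- function(obj, attr, tol = 10) {
  pairs <- unique(data.frame(obj = obj, attr = attr, stringsAsFactors = FALSE))
  # Exactly one metadata value for each unique data value.
  one_to_one <- !anyDuplicated(pairs$obj)
  # Case (a): either the data item or the metadata item is a factor.
  either_factor <- is.factor(obj) || is.factor(attr)
  # Case (b): few unique data values, and metadata that actually varies.
  few_values <- length(unique(obj)) <= tol && length(unique(attr)) > 1
  one_to_one && (either_factor || few_values)
}

dv  <- c(0, 0, 1, 2)   # toy data values
blq <- c(1, 1, 0, 0)   # toy metadata values

looks_like_encoding(dv, blq)                   # three unique dv values, within tol
looks_like_encoding(dv, blq, tol = 2)          # three unique dv values exceed tol
looks_like_encoding(factor(dv), blq, tol = 2)  # a factor bypasses the tol check
```

Lowering tol below the number of unique data values switches case (b) off, which is the effect the tol = 3 call in the examples relies on.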

Grouping items that are present in the input persist in the result and serve as keys. Both data and metadata values may have keys, but neither requires them. Keys are identified explicitly by supplying unnamed, unquoted arguments (non-standard evaluation). Use dplyr::group_by to add groups that will be respected when fold.grouped_df (or the generic) is called. Alternatively, supply a groups attribute to the data.frame, e.g. attr(x,'groups') <- c('USUBJID','TIME').

By default, superfluous keys (those that do not help distinguish data items) are removed on a per-data-item basis. Column order is used to resolve ambiguities: checking proceeds right to left, preferentially discarding keys to the right.
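One possible reading of the right-to-left check is sketched below in base R. The helper drop_superfluous is hypothetical, and its test for "superfluous" (the remaining keys alone determine the value) is an assumption made for illustration; the package's actual criterion may differ.

```r
# Hypothetical helper: walk the keys right to left and drop any key that the
# remaining keys make unnecessary for identifying the value.
drop_superfluous <- function(d, value, keys) {
  keep <- keys
  for (k in rev(keys)) {
    rest <- setdiff(keep, k)
    u <- unique(d[c(rest, value)])
    # Assumption: k is superfluous if each combination of the remaining keys
    # maps to a single value (or, with no keys left, the value is constant).
    determined <- if (length(rest)) !anyDuplicated(u[rest]) else nrow(u) == 1
    if (determined) keep <- rest
  }
  keep
}

# Hypothetical toy data: WT is constant within ID, DV varies with TIME.
d <- data.frame(
  ID   = c(1, 1, 2),
  TIME = c(0, 1, 0),
  DV   = c(0.5, 1.2, 0.7),
  WT   = c(70, 70, 82)
)

drop_superfluous(d, "WT", c("ID", "TIME"))  # TIME does not help distinguish WT
drop_superfluous(d, "DV", c("ID", "TIME"))  # both keys are needed for DV
```

Under this reading, a one-record dataset would lose all of its keys, since a single value is trivially determined without them.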

Note that metadata items may describe other metadata items, recursively. In practice, however, such complexity could be problematic and is best avoided if possible.

There are cases where a real grouping item may appear superfluous, e.g. for a one-record dataset. Enforce the groups by setting simplify to FALSE.

The folded format supports mixed object types, as inferred from differences in relevant grouping items on a per-record basis. Mixed typing works best when object types form a nested hierarchy, i.e. all keys are left-subsets of the full key. Thus the order of grouping values is considered informative, e.g. for sorting.

A folded data.frame with columns VARIABLE, META, VALUE and any supplied grouping items.

obj_attr.data.frame fold print.folded simplify.folded sort.folded unfold.folded

library(fold)      # provides fold(), unfold() and the events dataset
library(magrittr)
library(dplyr)
data(events)
x <- events
x %<>% filter(CMT == 2) %>% select(-EVID,-CMT,-AMT)
x %>% fold(USUBJID,TIME)
x %>% fold(USUBJID,TIME, meta = list(DV ~ BLQ, DV ~ LLOQ))
x <- events %>% 
  filter(CMT == 2) %>% 
  select(ID, TIME, TAD, DV, BLQ, LLOQ, SEX) 
x
attr(x,'groups') <- c('ID','TIME')

# fewer than 10 unique values of DV, so BLQ looks like an encoding
y <- x  %>% fold(meta=list(DV~BLQ,BLQ~LLOQ))
y %>% data.frame

# reducing the tolerance forces BLQ to match by groups (ID, TIME) instead of DV value
z <- x %>% fold(meta=list(DV~BLQ,BLQ~LLOQ),tol=3)
z

# another example
x <- Theoph
x %<>% mutate(
  conc_LABEL = 'theophylline concentration',
  conc_GUIDE = 'mg/L',
  Time_LABEL = 'time since drug administration',
  Time_GUIDE = 'hr',
  Time_HALF = Time / 2 # to demonstrate variant attribute of key column
)
x %<>% fold(Subject, Time)
x %>% unfold %>% head