digest: Diagnose Static Data Relationships


Description

Break a data frame into components static on variants of a proposed key.

Usage

## S3 method for class 'digest'
as.best(x, ...)
## S3 method for class 'data.frame'
as.digest(x, key = character(0), strict = TRUE, ...)
## S3 method for class 'digest'
as.digest(x, ...)
## S3 method for class 'keyed'
as.digest(x, key = match.fun("key")(x), strict = TRUE, ...)
## S3 method for class 'nm'
as.digest(x, key = match.fun("key")(x), ...)
## S3 method for class 'nm'
as.keyed(x, key = match.fun("key")(x), ...)
## S3 method for class 'digest'
head(x, ...)

Arguments

x

object of dispatch

key

a vector of column names in x representing a proposed object hierarchy

strict

passed to lyse

...

passed to or from other functions

Details

Well-constructed data tables typically admit a set of columns (a key), the interaction of which uniquely distinguishes all rows. The columns may be ordered from most general to most specific, in which case they may be thought of as an object hierarchy. The hierarchy accounts for structural redundancy of identifier variables across rows. When exploring data, it may be useful to remove such redundancy to focus on singular relationships within the data (e.g., static relationships).
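As an illustration of what it means for a key to distinguish all rows, the following minimal sketch uses base R only. The helper is_unique_key is hypothetical (it is not part of metrumrg); it simply checks that no combination of key values repeats.

```r
# Theoph ships with base R (package 'datasets').
# as.data.frame() drops the groupedData classes and leaves a plain data frame.
df <- as.data.frame(Theoph)

# Hypothetical helper: TRUE if no combination of the key columns repeats.
is_unique_key <- function(data, key) {
  !anyDuplicated(data[, key, drop = FALSE])
}

is_unique_key(df, "Subject")             # FALSE: each subject has many rows
is_unique_key(df, c("Subject", "Time"))  # tests the two-column hierarchy
```

Ordering the key as Subject, then Time, reflects the general-to-specific hierarchy described above: observations at particular times are nested within subjects.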

digest recursively cleaves a data frame using appropriate subsets of a key. The original data frame and any dynamic residuals are cleaved using progressively longer left subsets (empty; 1; 1,2; 1,2,3; etc.) of the proposed key. Effectively, this is a search for columns that are static on (i.e., are attributes of) various objects and sub-objects. The static results of cleaving, if any, are further explored (if possible) with progressively shorter right subsets (e.g., 1,2,3; 2,3; 3) to detect any columns that are super-keyed: i.e., columns that are still strictly attributes of some sub-object, without appeal to more general hierarchical levels.

digest returns a list of keyed data frames, such that each original non-key column appears in exactly one data frame, together with the smallest necessary set of key columns and all siblings (like-keyed non-key columns). If the proposed key completely distinguishes all rows, the result consists only of static data frames. Otherwise, the last data frame is dynamic. For columns that are constant in the data, irrespective of the proposed key, the key of the sub-result has length zero. The resulting key for a dynamic sub-result is the last key tried (possibly different from the proposed key, as elements may be removed from consideration if they are themselves static on some prior key).

Elements are named for their keys, pasted together with dots. If the key is character(0), the name is a single dot; the last element, if it is dynamic on the proposed key, is named with two dots.
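The left-subset search can be sketched in base R as follows. This is a simplified illustration, not the metrumrg implementation: the helpers is_static and static_on are hypothetical, the sketch returns column names rather than keyed data frames, and it omits the right-subset (super-key) pass and the removal of key elements that are themselves static on a prior key.

```r
# TRUE if column `col` takes a single value within every group defined by `by`.
# With an empty `by`, the whole table is treated as one group.
is_static <- function(data, col, by) {
  if (length(by) == 0) return(length(unique(data[[col]])) == 1)
  n_pairs  <- nrow(unique(data[, c(by, col), drop = FALSE]))
  n_groups <- nrow(unique(data[, by, drop = FALSE]))
  n_pairs == n_groups
}

# Try left subsets of the key in order: empty, then 1, then 1,2, and so on.
# Each non-key column is assigned to the shortest subset on which it is static.
static_on <- function(data, key) {
  remaining <- setdiff(names(data), key)
  out <- list()
  for (i in 0:length(key)) {
    by  <- key[seq_len(i)]
    hit <- remaining[vapply(remaining, is_static, logical(1),
                            data = data, by = by)]
    if (length(hit)) {
      nm <- if (i == 0) "." else paste(by, collapse = ".")
      out[[nm]] <- hit
    }
    remaining <- setdiff(remaining, hit)
  }
  # Anything left over is dynamic on the full proposed key.
  if (length(remaining)) out[[".."]] <- remaining
  out
}

static_on(as.data.frame(Theoph), c("Subject", "Time"))
```

The element names follow the convention described above: keys pasted with dots, a single dot for the empty key, and two dots for any dynamic remainder. In Theoph, Wt and Dose are attributes of Subject, so they are grouped under the element named Subject.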

Value

as.digest and as.best.digest return an object of class digest: a list of keyed data frames, with names suggesting their keys ('.' for character(0), '..' for a dynamic data frame).

Note

digest is an alias for the generic as.digest.

Author(s)

Tim Bergsma

References

http://metrumrg.googlecode.com

See Also

Examples

digest(Theoph,c('Subject','Time'))
head(digest(Theoph,c('Subject','Time')))

metrumrg documentation built on May 2, 2019, 5:55 p.m.