findGrain_: Find Data Grain


View source: R/findGrain.R

Description

This function attempts to answer the question: at what grain (combination of dimensions) are the records in a dataframe unique? It searches all combinations of the candidate dimensions (specified via ... or .dots) containing up to max_comb dimensions, and returns the combinations ranked by how well they de-duplicate the table.
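The search described above can be sketched in base R. This is a rough illustration only, not the package's implementation: the helper name grain_sketch, its candidates argument, and the output columns dims, n_dims and n_duplicates are all hypothetical, and ranking by fewest duplicates (ties broken by fewer dimensions) is an assumption about what "de-duplication" means here.

```r
# Hypothetical helper, not part of xplor: rank combinations of candidate
# columns by how many rows remain duplicated under each combination.
grain_sketch <- function(data, candidates, max_comb = 3, topn = 25) {
  max_comb <- min(max_comb, length(candidates))
  results <- list()
  for (k in seq_len(max_comb)) {
    # combn() with simplify = FALSE returns a list of character vectors
    for (combo in combn(candidates, k, simplify = FALSE)) {
      keys <- data[, combo, drop = FALSE]
      # A row counts as a duplicate if any other row shares its key values
      is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
      results[[length(results) + 1]] <- data.frame(
        dims = paste(combo, collapse = ", "),
        n_dims = k,
        n_duplicates = sum(is_dup),
        stringsAsFactors = FALSE
      )
    }
  }
  out <- do.call(rbind, results)
  # Assumed ranking: fewest duplicates first, then fewest dimensions
  out <- out[order(out$n_duplicates, out$n_dims), ]
  rownames(out) <- NULL
  head(out, topn)
}

# Three candidates with max_comb = 3 give 3 + 3 + 1 = 7 combinations
res <- grain_sketch(mtcars, c("gear", "cyl", "carb"))
```

A combination with n_duplicates equal to 0 would be a candidate grain. The number of combinations grows quickly with the number of candidates, which is presumably why the real function caps the size with max_comb.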

Usage

findGrain_(data, ..., .dots, max_comb = 3, topn = 25)

findGrain(data, ..., max_comb = 3, topn = 25)

Arguments

data

A dataframe.

...

Bare variable names of candidate dimensions to test for defining the grain of data.

.dots

Quoted variable names of candidate dimensions.

max_comb

Maximum number of candidate dimensions to combine for testing grain.

topn

Number of top combinations to report in results.

Value

A dataframe with, for each combination of dimensions, the number of duplicate cases (i.e., cases that share the same values of the key variables with one or more other cases). Ideally, if the true grain of the table were found, there would be 0 duplicates.
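The definition of a duplicate case given above differs from what base R's duplicated() counts on its own, so a small worked example may help. The toy data and variable names are made up for illustration, and whether the package computes its count exactly this way is an assumption based on the wording of this section.

```r
# Toy data: rows 1 and 2 share the key (id = 1, day = "mon")
toy <- data.frame(
  id  = c(1, 1, 2, 3),
  day = c("mon", "mon", "mon", "tue"),
  stringsAsFactors = FALSE
)

keys <- toy[, c("id", "day")]

# duplicated() alone flags only the second and later occurrences of a key
n_extra <- sum(duplicated(keys))    # 1

# Scanning from both ends flags every row that shares its key with another row
n_sharing <- sum(duplicated(keys) | duplicated(keys, fromLast = TRUE))    # 2
```

Under the definition in this section, the toy table has 2 duplicate cases at the (id, day) grain, so that combination is not the true grain.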

Examples

data(mtcars)
findGrain(mtcars, gear, cyl, carb)

rebelrebel04/xplor documentation built on May 27, 2019, 4:01 a.m.