olapConstruct: Create Data for Discovery-driven Exploration of OLAP Data...


Description

This function is an R implementation of the methodology described in the paper "Discovery-driven Exploration of OLAP Data Cubes" by Sunita Sarawagi, Rakesh Agrawal, and Nimrod Megiddo. The methodology identifies anomalies in multi-dimensional OLAP data cubes. The package author does not own the methodology proposed in the paper. Please contact the author at g.t.tongchuan@gmail.com regarding any potential violation or to request removal of the package.
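To give a rough sense of the kind of anomaly such a methodology targets, the sketch below fits a simple additive model on the log scale to a toy two-dimensional cube and flags cells whose standardized residual exceeds a threshold. It is a simplified illustration only, not the package's implementation: the data, the names `cube`, `expected` and `std_residual`, and the use of a single global standard deviation are all assumptions made for the example.

```r
# Toy 5 x 5 cube with a purely multiplicative structure (hypothetical data).
row_effect <- c(10, 20, 30, 40, 50)
col_effect <- c(1, 2, 3, 4, 5)
cube <- outer(row_effect, col_effect)
dimnames(cube) <- list(product = paste0("P", 1:5), region = paste0("R", 1:5))

# Inject one anomalous cell.
cube["P1", "R1"] <- cube["P1", "R1"] * 10

# Fit an additive model on the log scale:
# grand mean + row effect + column effect.
log_cube <- log(cube)
grand <- mean(log_cube)
row_fit <- rowMeans(log_cube) - grand
col_fit <- colMeans(log_cube) - grand
expected <- grand + outer(row_fit, col_fit, "+")

# Standardize the residuals with one global standard deviation
# (a simplification assumed for this sketch) and flag large ones.
residual <- log_cube - expected
std_residual <- residual / sd(as.vector(residual))
tau <- 2.5
exceptions <- which(abs(std_residual) > tau, arr.ind = TRUE)
print(exceptions)
```

Only the inflated cell should stand out here, because every other cell follows the row and column pattern closely.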

Usage

olapConstruct(data, measure, dimensions, user_agg_function, exception_tau=2.5, one_side_trim, output_dir, output_csv=FALSE)

Arguments

data

a data frame holding the OLAP data cube, with multiple columns representing dimensions and one column representing the measure.

measure

a character string giving the name of the column in data which should be treated as the measure.

dimensions

a character vector representing the names of the columns in data which should be treated as dimensions.

user_agg_function

the aggregation function to use, chosen as either "mean" or "sum".

exception_tau

a numeric value giving the threshold on the standardized residual above which a cell is treated as an exception. The default is 2.5, corresponding to a probability of roughly 99% under the normal distribution.

one_side_trim

a numeric value representing the fraction (0 to 0.5) of observations to be trimmed from each end in taking averages.

output_dir

a string representing the output directory.

output_csv

a logical value indicating whether to also output the result as a csv file, with default value FALSE. Note that the result in .rds format is always written to the output directory.
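The two numeric tuning arguments can be explored with base R alone. The sketch below computes the two-sided normal coverage for a threshold of 2.5 and shows the effect of trimming on an average using `mean(x, trim = ...)`. Whether the package passes one_side_trim through to `mean()` internally is an assumption; the toy vector `x` is made up for illustration.

```r
# Two-sided normal coverage for the default threshold of 2.5.
tau <- 2.5
coverage <- 2 * pnorm(tau) - 1
print(round(coverage, 4))

# Trimmed mean with base R: trim = 0.2 drops 20% of the
# observations from each end before averaging.
x <- c(1, 2, 3, 4, 100)      # hypothetical values with one outlier
plain_mean <- mean(x)
trimmed_mean <- mean(x, trim = 0.2)
print(c(plain = plain_mean, trimmed = trimmed_mean))
```

With trimming, the single outlier no longer dominates the average, which is why a non-zero trim can make the expected values more robust.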

Value

The result is a data frame, saved in .rds format, with one row for each cell in each group-by. It contains the columns 'SelfExp', 'InExp' and 'PathExp' to aid the user in identifying anomalies. Values calculated in the intermediate steps to derive the final results are also saved in extra columns.
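A saved .rds result can be loaded back with `readRDS()` and filtered like any other data frame. The sketch below uses a made-up stand-in for the output, since the actual file name and the exact contents of the columns are not stated here; `toy_result`, the file name, the `region` column, and the assumption that 'SelfExp' is numeric are all hypothetical.

```r
# Hypothetical stand-in for the saved result; real contents may differ.
toy_result <- data.frame(
  region  = c("East", "West", "North"),
  SelfExp = c(0.4, 3.1, 1.2),
  InExp   = c(0.0, 2.7, 0.3),
  PathExp = c(0.1, 2.9, 0.2)
)

# Round-trip through .rds, the format used for the function's output.
rds_path <- file.path(tempdir(), "toy_result.rds")  # hypothetical file name
saveRDS(toy_result, rds_path)
result <- readRDS(rds_path)

# Keep cells whose SelfExp exceeds the threshold (assumes numeric values).
tau <- 2.5
flagged <- result[result$SelfExp > tau, ]
print(flagged)
```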

Author(s)

Tongchuan Yu

References

Sunita Sarawagi, Rakesh Agrawal, and Nimrod Megiddo. Discovery-driven exploration of OLAP data cubes. Research Report RJ 10102 (91918), IBM Almaden Research Center, San Jose, CA 95120, January 1998. Available from http://www.almaden.ibm.com/cs/quest.

Examples

library(olapConstruct)

# Use every column except the last one as a dimension.
recoveryData_dimensions <- names(recoveryData)[-ncol(recoveryData)]
current_wd <- getwd()

# Build the exploration data; the .rds result is written to output_dir.
olapConstruct(data = recoveryData,
              measure = "AvgOfDEF_PRICE",
              dimensions = recoveryData_dimensions,
              user_agg_function = "mean",
              exception_tau = 2.5, one_side_trim = 0,
              output_dir = current_wd, output_csv = FALSE)
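The example above picks the dimension columns by position. When the measure is not the last column, selecting by name may be safer. The following sketch prepares the data, measure and dimensions arguments for a made-up data frame; `sales`, `measure_col` and `dimension_cols` are hypothetical names, and the call to olapConstruct itself is left out so that the block runs without the package installed.

```r
# Hypothetical toy cube: two dimension columns and one measure column.
sales <- data.frame(
  product = c("A", "A", "B", "B"),
  region  = c("East", "West", "East", "West"),
  revenue = c(100, 120, 90, 400)
)

measure_col <- "revenue"

# Select dimensions by name rather than by position.
dimension_cols <- setdiff(names(sales), measure_col)

# Basic checks before handing the pieces to olapConstruct().
stopifnot(is.data.frame(sales),
          measure_col %in% names(sales),
          is.numeric(sales[[measure_col]]))
print(dimension_cols)
```

These three objects would then be supplied as data, measure and dimensions respectively.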

ASound18/olapConstruct documentation built on May 8, 2019, 5:40 p.m.