create_schema: Create Mondrian schema

Description Usage Arguments Details Examples

Description

This function returns XML definition of Mondrian schema. Optionally returns file with data source definition for Saiku.

Usage

1
2
3
4
create_schema(engine = NA, table = NA, primary_key = NA, con = NA,
  dimension = NA, aggregator = NA, schema_dest = NA,
  data_source_dest = NA, time_table = NA, calculated_member = NA,
  security_type = NA, security_roles = NA, debug = FALSE)

Arguments

engine

Data engine. Valid options are: R, PostgreSQL. If PostgreSQL is used then con is required. Required.

table

Table for which schema should be created. In Mondrian terminology, this is fact table

primary_key

Primary key of table. Has to be unique. Required

con

Connection to PostgreSQL database. Character vector of exactly five (5) elements: 'user', 'password', 'database', 'host', 'port'. Required for PostgreSQL.

dimension

Rule to modify inclusion/exclusion of columns in table as dimenions. Has to be defined as valid SQL CASE statement. Optional.

aggregator

Rule to modify inclusion/exclusion of aggregators for columns in table. Has to be defined as valid SQL CASE statement and always return string with exactly six (6) digits. If same aggregetors should be used for all variables then string with exactly 6 digits can be used instead of case statement. If digit is 1 than aggregator is enabled. If 0 then aggregator is disabled. Aggregators are defined in the following order: average, count, distinct-count, maximum, minimum, sum. Argument is evaluated after argument dimension. Optional

schema_dest

Path to file where Mondrian schema will be stored. Has to include file name.

data_source_dest

Path to file where data source definition for Saiku will be stored. Has to include file name. Optional.

calculated_member

List defining additional calculated members (measures calculated on results of other measures). Unlimited number of calculated members can be created. Every component in list has to have exactly three elements:

  • Name of calculated member. This will be displayed as user-friendly name in Mondrian compatible analytical tools

  • Formula. Valid MDX expression for calculated member.

  • Format. Format to apply. If default should be used then use NA.

Optional.

time_table

Name of table with time dimension. If PostgreSQL than this should be name of existing table with columns time_date, year_number, quarter_number, month_number. If R than any name can be used as time dimension is created dynamically. Optional. If not used than date columns will be treated as generic dimension.

security_type

Type of security for Saiku. Currently only one2one is supported. Using this makes sense only when security_roles_xml is used. Optional.

security_roles

Vector defining what roles will have access to cube. Meaningful only when security_typ is set to one2one. When only one role is given than this will be used to GRANT all rights to cube to this role. When two roles are given than the first will be used to GRANT all rights to cube and the second will be used to DENY access to everything. Optional.

debug

Print additional information useful for debugging.

Details

Arguments dimension and aggregator are required to be valid SQL CASE statement. Case statement is similar to if statement available in many languages. Basic structure is: CASE WHEN <CONDITION> THEN <VALUE> [[WHEN <CONDITION> THEN <VALUE>] ELSE <VALUE>] END. Both dimension and aggregator arguments are used to modify content of data frame with default design (see get_default_design for details). This means that you can use columns name, class, type, dimension, aggregator in case statement. Examples for dimension:

Examples for aggregator:

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# Get some valid destinations
schema_dest <- paste0(getwd(),'/','test.xml')
data_source_dest <- paste0(getwd(),'/','test')

# Simple schema, no time dimension, no data source definition
create_schema(engine='R',table='big_portfolio',primary_key='id', schema_dest=schema_dest)

# Do this again, but print additional debug messages.
create_schema(engine='R',table='big_portfolio',primary_key='id',
              schema_dest=schema_dest, debug=TRUE)

# Create schema with time dimension
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest, data_source_dest=data_source_dest)

# Modify default dimensions (create dimension only for factors)
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest, dimension="case when type='factor' then 1 else 0 end")

# Modify default dimensions (create dimension only for specified columns)
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest,
              dimension="case when name in('product','region') then 1 else 0 end")

# Modify aggregators - enable all aggregators for numeric columns, keep default for others
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest,
              aggregator="case when type='numeric' then '111111' else aggregator end")

# Create schema with calculated members
create_schema(engine='R',table='big_portfolio',primary_key='id',time_table = 'any_name',
             schema_dest=schema_dest, data_source_dest=data_source_dest,
             calculated_member=list(
                c('Not repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]',NA),
                c('Repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]','##.00%')
            ))

## Not run: 
# Complex example for PostgreSQL. tmp_time has to exist
create_schema(engine='PostgreSQL',table='big_portfolio',primary_key='id',time_table = 'tmp_time',
             schema_dest=schema_dest, data_source_dest=data_source_dest,
             con=c('usr','pwd','db','host','port'),
             calculated_member=list(
                c('Not repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]',NA),
                c('Repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]','##.00%')
            ))

## End(Not run)

tomasgreif/mondrianr documentation built on May 31, 2019, 5:15 p.m.