create_schema: Create Mondrian schema
In tomasgreif/mondrianr: Create Mondrian schema in R

Description Usage Arguments Details Examples

This function returns XML definition of Mondrian schema. Optionally returns file with data source definition for Saiku.

create_schema(engine = NA, table = NA, primary_key = NA, con = NA,
  dimension = NA, aggregator = NA, schema_dest = NA,
  data_source_dest = NA, time_table = NA, calculated_member = NA,
  security_type = NA, security_roles = NA, debug = FALSE)

`engine`	Data engine. Valid options are: `R`, `PostgreSQL`. If `PostgreSQL` is used then `con` is required. Required.
`table`	Table for which schema should be created. In Mondrian terminology, this is fact table
`primary_key`	Primary key of `table`. Has to be unique. Required
`con`	Connection to PostgreSQL database. Character vector of exactly five (5) elements: `'user', 'password', 'database', 'host', 'port'`. Required for PostgreSQL.
`dimension`	Rule to modify inclusion/exclusion of columns in `table` as dimenions. Has to be defined as valid SQL `CASE` statement. Optional.
`aggregator`	Rule to modify inclusion/exclusion of aggregators for columns in `table`. Has to be defined as valid SQL `CASE` statement and always return string with exactly six (6) digits. If same aggregetors should be used for all variables then string with exactly 6 digits can be used instead of case statement. If digit is `1` than aggregator is enabled. If `0` then aggregator is disabled. Aggregators are defined in the following order: `average, count, distinct-count, maximum, minimum, sum`. Argument is evaluated after argument `dimension`. Optional
`schema_dest`	Path to file where Mondrian schema will be stored. Has to include file name.
`data_source_dest`	Path to file where data source definition for Saiku will be stored. Has to include file name. Optional.
`calculated_member`	List defining additional calculated members (measures calculated on results of other measures). Unlimited number of calculated members can be created. Every component in list has to have exactly three elements: Name of calculated member. This will be displayed as user-friendly name in Mondrian compatible analytical tools Formula. Valid MDX expression for calculated member. Format. Format to apply. If default should be used then use `NA`. Optional.
`time_table`	Name of table with time dimension. If PostgreSQL than this should be name of existing table with columns `time_date, year_number, quarter_number, month_number`. If R than any name can be used as time dimension is created dynamically. Optional. If not used than date columns will be treated as generic dimension.
`security_type`	Type of security for Saiku. Currently only one2one is supported. Using this makes sense only when `security_roles_xml` is used. Optional.
`security_roles`	Vector defining what roles will have access to cube. Meaningful only when `security_typ` is set to `one2one`. When only one role is given than this will be used to GRANT all rights to cube to this role. When two roles are given than the first will be used to GRANT all rights to cube and the second will be used to DENY access to everything. Optional.
`debug`	Print additional information useful for debugging.

Arguments dimension and aggregator are required to be valid SQL CASE statement. Case statement is similar to if statement available in many languages. Basic structure is: CASE WHEN <CONDITION> THEN <VALUE> [[WHEN <CONDITION> THEN <VALUE>] ELSE <VALUE>] END. Both dimension and aggregator arguments are used to modify content of data frame with default design (see get_default_design for details). This means that you can use columns name, class, type, dimension, aggregator in case statement. Examples for dimension:

For numeric data types do not create dimenions, otherwise do create dimensions: case when class='numeric' then 0 else 1 end
For columns a, b create dimenions and for other columns keep default value: case when name in('a','b') then 1 else dimension end

Examples for aggregator:

Only count aggregator for non-numeric columns and default for other columns: case when class <> 'numeric' then '010000' else aggregator end
No aggregator for date and text columns and count for all others: case when class in('text','numeric') then '000000' else '010000' end
No aggregator for dimensions, default for others: case when dimension=1 then '000000' else aggregator end
No aggregators at all (only count for PK will be displayed): 000000

# Get some valid destinations
schema_dest <- paste0(getwd(),'/','test.xml')
data_source_dest <- paste0(getwd(),'/','test')

# Simple schema, no time dimension, no data source definition
create_schema(engine='R',table='big_portfolio',primary_key='id', schema_dest=schema_dest)

# Do this again, but print additional debug messages.
create_schema(engine='R',table='big_portfolio',primary_key='id',
              schema_dest=schema_dest, debug=TRUE)

# Create schema with time dimension
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest, data_source_dest=data_source_dest)

# Modify default dimensions (create dimension only for factors)
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest, dimension="case when type='factor' then 1 else 0 end")

# Modify default dimensions (create dimension only for specified columns)
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest,
              dimension="case when name in('product','region') then 1 else 0 end")

# Modify aggregators - enable all aggregators for numeric columns, keep default for others
create_schema(engine='R',table='big_portfolio',primary_key='id', time_table = 'any',
              schema_dest=schema_dest,
              aggregator="case when type='numeric' then '111111' else aggregator end")

# Create schema with calculated members
create_schema(engine='R',table='big_portfolio',primary_key='id',time_table = 'any_name',
             schema_dest=schema_dest, data_source_dest=data_source_dest,
             calculated_member=list(
                c('Not repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]',NA),
                c('Repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]','##.00%')
            ))

## Not run: 
# Complex example for PostgreSQL. tmp_time has to exist
create_schema(engine='PostgreSQL',table='big_portfolio',primary_key='id',time_table = 'tmp_time',
             schema_dest=schema_dest, data_source_dest=data_source_dest,
             con=c('usr','pwd','db','host','port'),
             calculated_member=list(
                c('Not repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]',NA),
                c('Repaid pct',
                  '[Measures].[Current balance-Sum]/[Measures].[Original balance-Sum]','##.00%')
            ))

## End(Not run)