starschemar: Obtaining Star Schemas from Flat Tables

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(starschemar)

Introduction

The multidimensional data model was defined in the 1990s with the aim of supporting data analysis. Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure.

Transformations can be carried out using professional ETL (extract, transform and load) tools. Recently, tools aimed at end users have emerged, which are also aimed at performing transformation operations. All these tools are very useful to carry out the transformation process, they provide a development environment to define the transformation operations in a general way.

Frequently, the operations to be performed aim to transform a flat table (with data that comes from operational systems) into a star schema (which implements a multidimensional system). With the tools mentioned above, this transformation can be carried out, but it requires a lot of work.

This is the main objective of starschemar package: Define transformations that allow obtaining star schemas from flat tables easily. In addition, it includes basic data cleaning operations, incremental data refresh operations and query operations, adapted to this context.

The rest of this document is structured as follows: First, basic concepts of dimensional modelling and star schemas are summarized. The following is an illustrative example of how the package works. Then, the operations available in the package are briefly presented. Finally, the document ends with conclusions and bibliography.

Dimensional modelling and star schemas

The content of this section is a summary based mainly on @adamson2010star and @kimball2013data. It is intended to present the fundamental concepts of the area that are relevant to understand the functionality, naming and use of the starschemar package.

Dimensional modelling

Dimensional modelling aims to obtain simple data models. Simplicity is sought for two reasons: so that decision-makers can easily understand the data, and also so that they can be easily queried.

In dimensional modelling, the analysis of a business process is performed modelling how it is measured. The measures are called facts, and the descriptors of the context of the facts are dimensions. Facts are numerical data, and decision makers want to see them at various levels of detail, defined by dimensions.

Not all numerical data is a fact (some tools consider it that way). In dimensional modelling the designer has to differentiate between facts and dimensions. Some criteria are considered to distinguish between them, for example:

Sometimes there are no measures associated with the business process, it is simply recorded that the combination of dimensions has occurred. This situation is often called factless facts, @jensen2010multi prefer to call it measureless facts. In any case, including when no other measures are available, a measure can be considered that represents the number of times the combination of dimension values occurs.

Dimensions and dimension attributes

Attributes considered by the designer as dimensions can be grouped taking into account the natural affinities between them. In particular, they can be grouped as they describe the "who, what, where, when, how and why" associated with the modelled business process. Two attributes share a natural affinity when they are only related in one context. When their relationships are determined by transactions or activities, they can occur in multiple contexts, if this occurs, they must be located in different dimensions.

In this way, a dimension is made up of a set of naturally related dimension attributes that describe the context of facts. Dimensions are used for two purposes: fact selection and fact grouping with the desired level of detail.

Additionally, in the dimensions hierarchies with levels and descriptors can be defined. More details can be found at @jensen2010multi. These concepts are not used in the current version of the package.

Facts and measures

A fact has a granularity, which is determined by the attributes of the dimensions that are considered at each moment. Thus, a measure in a fact has two components, the numerical property of the fact and an formula, frequently the SUM aggregation function, that allows combining several values of this measure to obtain a new value of the same measure with a coarser granularity [@jensen2010multi].

According to their behaviour to obtain a coarser granularity, three types of measures are distinguished: additive, semi-additive and non-additive. For additive measures, SUM is always a valid formula that maintains the meaning of the measure when the granularity changes. For semi-additive measures, there is no point in using SUM when changing the level of detail in any of the dimensions because the meaning of the measure changes, this frequently occurs in dimensions that represents time and measures representing inventory level. For non-additive measures, values cannot be combined across any dimension using SUM because the result obtained has a different meaning from the original measure (generally occurs with ratios, percentages or unit amounts such as unit cost or unit price).

The most useful measures are additive. If we have non-additive measures, they can generally be redefined from other additive measures.

Star schemas

Dimensional models implemented in RDBMS (Relational Database Management Systems) using a table for each dimension are called star schemas because of their resemblance to a star-like structure: A fact table in the centre and dimension tables around it. Thus, dimension attributes are columns of the respective dimension tables, and measures are columns of the fact table.

Other possible implementations on RDBMS normalize dimensions and are known as snowflake schema. More details can be found at @jensen2010multi. This is not considered in this package.

Dimension tables

Dimension tables contain the context associated with business process measures. Although they can contain any type of data, numerical data is generally not used for dimension attributes because some query tools consider any numeric data as a measure.

Dimension attributes with NULL value are a source of problems when querying since DBMS and query tools sometimes handle them inconsistently, the result depends on the product. It is recommended to avoid the use of NULL and replace them with a descriptive text. In the case of dates, it is recommended to replace the NULL values with an arbitrary date in the very far future.

Surrogate keys

A dimension table contains dimension attributes and also a surrogate key column. This column is a unique identifier that has no intrinsic meaning: It is generally an integer and is the primary key for the dimension table. In @adamson2010star surrogate keys are easily identifiable by the suffix "_key" in the column name (and this criterion has also been applied in starschemar package).

Dimension tables also contain key columns that uniquely identify associated entities in an operational system. The separation of surrogate keys and natural keys allows the star schema to store changes in dimensions. Therefore, the use of surrogate keys in dimensions is a solution to the SCD (slowly changing dimensions) problem. This problem is not specifically addressed in this version of this package.

Special dimensions

In some cases, for the sake of simplicity, it is helpful to create a table that contains dimension attributes that have no natural affinities to each other, generally these are low-cardinality flags and indicators. The result is what is known as a junk dimension. They do not require any special support, only the designer's will to define them.

Sometimes some dimension attributes are left in the fact table, usually transaction identifiers. It is considered as the primary key of a dimension that does not have an associated table, for this reason it is known as a degenerate dimension. Degenerate dimensions are not allowed in this package.

A single dimension can be referenced multiple times in a fact table, with each reference linked to a different logical role for each dimension. These separate dimension views, with unique attribute column names, are called role dimensions and the common dimension is called a role-playing dimension.

Associated with multiple star schemas we have the conformed dimensions that are presented in section [Conformed dimensions].

Fact table

At the centre of a star schema is the fact table. In addition to containing measures, the fact table includes foreign keys that refer to each of the surrogate keys in the dimension tables.

Primary key

A subset of foreign keys, along with possibly degenerate dimensions, is considered to form the primary key of the fact table.

In starschemar package, since degenerate dimensions are not allowed, the primary key is made up of a subset of foreign keys.

Grain

The subset of dimensions that forms the primary key defines the level of detail stored in the fact table, which is known as the fact table's grain. In the design process, it is very important for the designer to clearly define the grain of the fact table (it is usually defined by listing the dimensions whose surrogate keys form its primary key): it is a way to ensure that all the facts are stored at the same level of detail.

At the finest grain, a row in the fact table corresponds to the measures of an event and vice versa, it is not influenced by the possible reports that may be obtained. When two facts have different grains, they should be set on different fact tables.

Multiple fact tables

It is frequent the need to have several fact tables for various reasons:

In reality it is about different business processes, each one has to have its own fact table but they have dimensions in common. This is known as a fact constellation which corresponds to the Kimball enterprise data warehouse bus architecture.

Conformed dimensions

When star schemas share a set of common dimensions, these dimensions are called conformed dimensions.

There are several possibilities to have conformed dimensions, the most obvious form is that the dimension tables share structure and content, that is, they are identical dimensions. This is the one considered in this version of the starschemar package.

Additional operations

Cleaning and conforming data

When data is loaded into a star schema, errors or inconsistencies can be discovered in some of them. In some cases, it is best to make corrections at the source of the data, in operational systems. Sometimes this is not possible and there is no other option but to modify the data before loading it into the star schema or even when it is already loaded.

Inconsistencies are often found in dimensions, when dimensions are integrated into one, for example to generate a role-playing dimension or conformed dimensions. Support for modifying dimension data is provided in the package.

Dimension enrichment

The more attributes the dimensions have, the more possibilities they offer to filter or aggregate data. For this reason, it is convenient to enrich the dimensions with new attributes, whenever possible. New attributes are often defined based on existing ones. There must be a way to associate the new attributes with the dimensions.

The more attributes the dimensions have, the more possibilities they offer to filter or aggregate data. For this reason, it is convenient to enrich the dimensions with new attributes, whenever possible. New attributes are often defined based on existing ones. There must be a way to associate the new attributes with the dimensions.

Operations have been defined in the package to export dimension attributes so that their values are not repeated, and also to import them, once the new attributes have been added.

Incremental refresh

When a star schema is built, an initial load is performed with all available data from a moment in time onwards.

Operational systems continue to operate and produce data. If we want to incorporate these data into the star schema, we have two possibilities:

In order to carry out this second option, the CDC (change data capture) system allows to exclusively obtain the new data produced.

Sometimes it is also convenient to delete data that is considered to be no longer necessary, generally data from the more distant past.

In this package, it has been considered that we can obtain the new data, possibly mixed with updates to data already incorporated into the star schema, in order to carry out an incremental refresh of star schemas with them. Operations are also offered to select data and delete it if it is not considered necessary.

An illustrative example

Starting data sets

To illustrate how the package works we will use a small part of the Deaths in 122 U.S. cities - 1962-2016. 122 Cities Mortality Reporting System data set[^1]. Specifically, only the data for the first 11 weeks of 1962 are considered.

[^1]: I know this data set thanks to Alberto J. Duran, student of Multidimensional Systems during the 2019-2020 academic year, a subject I teach at the University of Granada (Spain), for a work he developed based on it, tutored by me.

pander::pandoc.table(head(mrs, 12), split.table = Inf)

In the table above, the first rows of the original data are shown. For each week and city, mortality figures by age group and cause, considered separately, are included (i.e., the combination of age group and cause is not included). In the cause, only a distinction is made between pneumonia or influenza and others.

It can be seen that there is only one measure, deaths, defined at two different granularities: week-city-cause (data in columns "Pneumonia and Influenza Deaths" and "All Deaths") and week-city-age bracket (data from the 5 columns on the right). This means that we will need two fact tables to include all the data in star schemas.

As the data is partially in the form of a pivot table, from these data two tables have been generated[^2], one for each granularity. Additionally, the following operations have been carried out:

[^2]: The transformation has been carried out with tidyverse and flattabler packages.

pander::pandoc.table(head(mrs_age[,-c(1:6)]), split.table = Inf)

In the table above, the first rows of the flat table that contains the data according to the age bracket are shown.

The following table shows the first 18 rows of the flat table containing data based on cause of death. The calculated column Other Deaths has been added. Here you can see some missing data (columns Year and WEEK), errors (value "Bridgepor" in column City) and how there are only data from the first 9 weeks.

pander::pandoc.table(head(mrs_cause[,-c(1:6)], 18), split.table = Inf)

To have more layout possibilities to display, new columns with dates have been generated. Next, in both tables, the new columns related to the date that have been added can be seen.

pander::pandoc.table(head(mrs_age), split.table = Inf)
pander::pandoc.table(head(mrs_cause), split.table = Inf)

These are the flat tables that will be considered as a starting point to obtain star schemas from them in this example. They are available in the package: mrs_age and mrs_cause respectively.

Dimensional modelling

For each flat table, the goal is to define the attributes that correspond to facts and those that are dimensions. For facts, measures and their aggregation functions have to be defined. For dimensions, attributes with natural affinity must be grouped. Each attribute can only appear once in the definition.

Dimensional modelling data according to age range

To avoid having to write the name of the attributes of the table, with the following function we can have them in the form of a string. Thus, we can copy and paste each name as needed.

dput(colnames(mrs_age))

The definition of the dimensional model for the data considered is shown below.

library(tidyr)
library(starschemar)

dm_mrs_age <- dimensional_model() %>%
  define_fact(
    name = "mrs_age",
    measures = c(
      "Deaths"
    ),
    agg_functions = c(
      "SUM"
    ),
    nrow_agg = "nrow_agg"
  ) %>%
  define_dimension(
    name = "when",
    attributes = c(
      "Week Ending Date",
      "WEEK",
      "Year"
    )
  ) %>%
  define_dimension(
    name = "when_available",
    attributes = c(
      "Data Availability Date",
      "Data Availability Week",
      "Data Availability Year"
    )
  ) %>%
  define_dimension(
    name = "where",
    attributes = c(
      "REGION",
      "State",
      "City"
    )
  ) %>%
  define_dimension(
    name = "who",
    attributes = c(
      "Age Range"
    )
  )

In this case, all the elements have been explicitly defined, including aggregation functions and the name of an additional measure representing the number of rows aggregated, which is always included. Only data from two of the three possible time-related dimensions have been considered.

Dimensional modelling data according to cause

In the case of data according to cause of death, the definition of the model is shown below.

dm_mrs_cause <- dimensional_model() %>%
  define_fact(
    name = "mrs_cause",
    measures = c(
      "Pneumonia and Influenza Deaths",
      "Other Deaths"
    ),
  ) %>%
  define_dimension(
    name = "when",
    attributes = c(
      "Week Ending Date",
      "WEEK",
      "Year"
    )
  ) %>%
  define_dimension(
    name = "when_received",
    attributes = c(
      "Reception Date",
      "Reception Week",
      "Reception Year"
    )
  ) %>%
  define_dimension(
    name = "when_available",
    attributes = c(
      "Data Availability Date",
      "Data Availability Week",
      "Data Availability Year"
    )
  ) %>%
  define_dimension(
    name = "where",
    attributes = c(
      "REGION",
      "State",
      "City"
    )
  )

If no aggregation function is indicated, by default, SUM is considered. Although not explicitly stated, it also includes by default the measure relative to the number of rows aggregated. In this case, the three dimensions related to the date have been defined.

Star schema definition and transformation: Constellations

To define a star schema, we need a flat table and a dimensional model defined from it. Once defined, we can apply format modification operations to it.

Star schema definition for data according to age range

The basic definition operation of a star schema is shown below[^3].

[^3]: SaveRDS and readRDS functions can be used to save and retrieve star schemas or any other defined data structure.

st_mrs_age <- star_schema(mrs_age, dm_mrs_age)

The first rows of the obtained dimension and fact tables are shown below.

pander::pandoc.table(head(st_mrs_age$dimension$when), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$when_available), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$where), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$who), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$fact$mrs_age), split.table = Inf)

The data from the original flat table has been structured in the form of dimension tables and fact tables. Data in the columns of the original table included in the dimensions is not repeated. A surrogate key has been added to each of the dimension tables that are foreign keys in the fact table.

Next, we will apply format modification operations to the original structure obtained.

st_mrs_age <- st_mrs_age %>%
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("date", "week", "year")
  ) %>%
  snake_case() %>%
  character_dimensions(NA_replacement_value = "Unknown",
                       length_integers = list(week = 2))

First, a role playing dimension has been defined based on the dimensions related to dates. Then, to work with databases, the names have been adapted to the snake case criterion. Finally, the data type of the attributes of the dimensions has been transformed so that all columns except the date columns are of the character data type, in the case of numerical data, it is allowed to indicate the length of the field to fill with leading zeros, and undefined values have been replaced by the indicated value.

The first rows of the new dimension and fact tables are shown below.

pander::pandoc.table(head(st_mrs_age$dimension$when), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$when_available), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$where), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$who), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$when_common), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$fact$mrs_age), split.table = Inf)

In the result, it can be seen that the dimensions related to date are now role dimensions and do not have their own data, a role playing dimension has been generated with the integrated data. The fact table continues to refer to role dimensions, the value of foreign keys has been adapted to the possible new values of the surrogate keys. Additionally it can be seen that the week field now has length 2 and has 0 on the left (this is useful to sort numbers in text format).

Star schema definition for data according to cause

We are going to define the star schema and apply similar transformations to the other flat table. The transformations can be applied in any order.

st_mrs_cause <- star_schema(mrs_cause, dm_mrs_cause) %>%
  snake_case() %>%
  character_dimensions(
    NA_replacement_value = "Unknown",
    length_integers = list(
      week = 2,
      data_availability_week = 2,
      reception_week = 2
    )
  ) %>%
  role_playing_dimension(
    dim_names = c("when", "when_received", "when_available"),
    name = "when_common",
    attributes = c("date", "week", "year")
  )

In this case, since the role playing dimension definition is the last transformation defined, the format of the week column had to be defined in the three date dimensions to obtain an equivalent result. The result obtained is shown below.

pander::pandoc.table(head(st_mrs_cause$dimension$when), split.table = Inf)
pander::pandoc.table(head(st_mrs_cause$dimension$when_received), split.table = Inf)
pander::pandoc.table(head(st_mrs_cause$dimension$when_available), split.table = Inf)
pander::pandoc.table(head(st_mrs_cause$dimension$where), split.table = Inf)
pander::pandoc.table(head(st_mrs_cause$dimension$when_common), split.table = Inf)
pander::pandoc.table(head(st_mrs_cause$fact$mrs_cause), split.table = Inf)

In this case we have three role dimensions defined on a role playing dimension.

Star schema transformation, cleaning and conforming data

Star schemas are defined from flat table fields. In some cases it may be interesting to rename elements of the schema, especially attributes of dimensions and measures. On the other hand, dimensions can be enriched by adding additional attributes, generally derived from the rest of the attributes. These are the transformations considered in this section.

We can perform data cleaning and conforming operations on star schema dimensions. Updates defined in a star schema can be applied on another with common dimensions.

Star schema rename

If necessary, the elements of a star schema can be renamed. These functions are especially useful for renaming attributes of dimensions and measures of facts because the definition operations only allowed us to select columns of a flat table, not modifying their names.

We are going to modify the attribute names of the when dimension for the two design cases. The values of the attribute called region actually correspond to the concept of division in the organization of the US territory, we are going to rename it this way for the two design cases.

Data according to age:

st_mrs_age <-
  st_mrs_age %>% rename_dimension_attributes(
    name = "when",
    attributes = c("week_ending_date", "week", "year"),
    new_names = c(
      "when_happened_date",
      "when_happened_week",
      "when_happened_year"
    )
  ) %>%
  rename_dimension_attributes(
    name = "where",
    attributes = c("region"),
    new_names = c("division")
  )

And data according to cause:

st_mrs_cause <-
  st_mrs_cause %>% rename_dimension_attributes(
    name = "when",
    attributes = c("week_ending_date", "week", "year"),
    new_names = c(
      "when_happened_date",
      "when_happened_week",
      "when_happened_year"
    )
  ) %>%
  rename_dimension_attributes(
    name = "where",
    attributes = c("region"),
    new_names = c("division")
  )

We are also going to modify the name of the measure for data according to age:

st_mrs_age <-
  st_mrs_age %>% rename_measures(measures = c("deaths"),
                                 new_names = c("n_deaths"))

Additionally, functions are available to modify the names of dimensions and facts. In this case it is not necessary to modify them because they do not depend on the names of the starting base data and we have defined them as we wanted.

Definition of updates

Using the following code, first of all, we get the names of the dimensions of a star schema, then, we get them by their name (we can see the rows using the utils::View function). If there are several role dimensions, it is enough to consult one of them, updates defined on it will be propagated to the rest.

dim_names <- st_mrs_age %>%
    get_dimension_names()

where <- st_mrs_age %>%
  get_dimension("where")

# View(where)
# where[where$where_key %in% c(1, 2, 62), ]

when <- st_mrs_age %>%
  get_dimension("when")

# View(when)
# when[when$when_key %in% c(36, 37, 73), ]

who <- st_mrs_age %>%
  get_dimension("who")

# View(who)

Reviewing the dimensions we can detect the need for updates. Errors are also sometimes detected by comparing the values with equivalent values obtained from other data sources.

Updates in the where dimension

In the where dimension we find the wrong value of "Bridgepor", generated by ourselves, the correct value is "Bridgeport", also included in the dimension. Additionally, trying to find additional data associated with cities, we found another error for the city of "Wilimington", whose correct name is "Wilmington". Below are the mentioned instances.

pander::pandoc.table(where[where$where_key %in% c(1, 2, 62), ], split.table = Inf)

Updates can be defined in several ways. One of them is referencing the surrogate key. Although in some of the functions the dimension surrogate key is referred to, updates only consider the values of the rest of the columns of the corresponding dimension. In this way, it is achieved that updates can be applied to equivalent dimensions of other star schemas, where the values of the surrogate key do not necessarily coincide with those of the dimension where updates were originally defined.

Next, we define the update that indicates that the values of record "1" ("Bridgepor") of the where dimension must match those of record "2" ("Bridgeport").

updates_st_mrs_age <- record_update_set() %>%
  match_records(dimension = where,
                old = 1,
                new = 2) 

For the case of the other city, we define an update that replaces the value in the indicated fields for the records that meet the defined condition, as shown below.

updates_st_mrs_age <- updates_st_mrs_age %>%
  update_selection_general(
    dimension = where,
    columns_old = c("state", "city"),
    old_values = c("DE", "Wilimington"),
    columns_new = c("city"),
    new_values = c("Wilmington")
  ) 
Updates in the when dimension

Below is a selection of rows involved in update operations that are referred by their surrogate key in the when dimension.

pander::pandoc.table(when[when$when_key %in% c(36, 37, 73), ], split.table = Inf)

The corresponding updates are shown below.

updates_st_mrs_age <- updates_st_mrs_age %>%
  match_records(dimension = when,
                old = 37,
                new = 36) %>%
  update_record(
    dimension = when,
    old = 73,
    values = c("1962-02-17", "07", "1962")
  )

The values of record "37" must match those of record "36" (both registers will be unified). The values in register "73" will be replaced by those provided.

Updates in the who dimension

In the case of the who dimension, we want to include a code in each value of age_range attribute so that they are sorted increasingly based on their value.

updates_st_mrs_age <- updates_st_mrs_age %>%
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("<1 year"),
    new_values = c("1: <1 year")
  ) %>%
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("1-24 years"),
    new_values = c("2: 1-24 years")
  ) %>%
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("25-44 years"),
    new_values = c("3: 25-44 years")
  ) %>%
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("45-64 years"),
    new_values = c("4: 45-64 years")
  ) %>%
  update_selection(
    dimension = who,
    columns = c("age_range"),
    old_values = c("65+ years"),
    new_values = c("5: 65+ years")
  )

In each case, we substitute the old value for the new one in the indicated attribute.

Updates application

Once updates are defined, they can be applied on the star schema from which they have been defined, as shown below.

st_mrs_age <- st_mrs_age %>%
  modify_dimension_records(updates_st_mrs_age)

The result obtained for the first star schema is shown below.

pander::pandoc.table(head(st_mrs_age$dimension$when), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$when_available), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$where), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$who), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$dimension$when_common), split.table = Inf)
pander::pandoc.table(head(st_mrs_age$fact$mrs_age), split.table = Inf)

It can be seen that the row with the value "Bridgepor" in the city column has disappeared: it has been merged with the row with the correct value. This update has also been transmitted to the fact table. Although it is not seen in the tables, the same has happened with the dates that have been unified by the update.

The same updates can also be applied to other star schemas with dimensions in common with the original star schema, as shown below.

st_mrs_cause <- st_mrs_cause %>%
  modify_dimension_records(updates_st_mrs_age)

Updates defined on dimensions not included in the star schema are ignored. In this case those referring to the who dimension. Likewise, if there are no records that meet the conditions to get the old value, they are also ignored.

Dimension enrichment

From the original attributes included in the dimensions, we can obtain new attributes that facilitate queries or offer new query possibilities.

Enrich the who dimension

Suppose that we are interested in defining some broader age ranges than the existing ones. This operation can be done by enriching the corresponding dimension.

First, we export the attributes to consider in table form, in this case only the age range.

tb_who <-
  enrich_dimension_export(st_mrs_age,
                          name = "who",
                          attributes = c("age_range"))

Next, we can see the result of the export operation. It is a table with the selected attributes where duplicate values have been eliminated if there are any (in this case there are no repeated values).

pander::pandoc.table(tb_who, split.table = Inf)

In the table we add the columns that we want. In this case a new column to define the new broader age range.

v <-
  c("0-24 years", "0-24 years", "25+ years", "25+ years", "25+ years")
tb_who <-
  tibble::add_column(tb_who,
                     wide_age_range = v)

The new table can be seen below.

pander::pandoc.table(tb_who, split.table = Inf)

We enrich the dimension considering the new data in the table.

st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "who", tb_who)

We can see the result below, where the dimension has the new defined attribute.

pander::pandoc.table(st_mrs_age$dimension$who, split.table = Inf)
Enrich the where dimension

For the where dimension we can proceed in the same way as we have done for the who dimension: Export the data, complete it manually and import it again, as shown below.

tb_where <-
  enrich_dimension_export(st_mrs_age,
                          name = "where",
                          attributes = c("division"))

The new table for division data can be seen below.

pander::pandoc.table(tb_where, split.table = Inf)

We look for the names of the divisions and add the data of the regions to which they belong.

tb_where <-
  tibble::add_column(
    tb_where,
    division_name = c(
      "New England",
      "Middle Atlantic",
      "East North Central",
      "West North Central",
      "South Atlantic",
      "East South Central",
      "West South Central",
      "Mountain",
      "Pacific"
    ),
    region = c('1',
               '1',
               '2',
               '2',
               '3',
               '3',
               '3',
               '4',
               '4'),
    region_name = c(
      "Northeast",
      "Northeast",
      "Midwest",
      "Midwest",
      "South",
      "South",
      "South",
      "West",
      "West"
    )
  )

st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "where", tb_where)

st_mrs_cause <-
  st_mrs_cause %>%
  enrich_dimension_import(name = "where", tb_where)

To add the name of the states and the county to which each city belongs, we could proceed in the same way. However, it is easier if we try to locate this data and use it directly. These data are available in the ft_usa_states and ft_usa_city_county data sets, respectively.

However, if we operate in the same way, when importing the data an error occurs. The reason is that not all the data in the dimension matches the data in the imported table. We can determine the missing data using the following function.

tb_missing <-
  st_mrs_age %>%
  enrich_dimension_import_test(name = "where", ft_usa_states)

The result obtained is shown below.

pander::pandoc.table(tb_missing, split.table = Inf)

In all cases, the problem occurs for the value "Unknown" in the state attribute. We must add a row to the data before importing it.

tb_where_state <- ft_usa_states %>%
  tibble::add_row(state = "Unknown", state_name = "Unknown")

st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "where", tb_where_state)

st_mrs_cause <-
  st_mrs_cause %>%
  enrich_dimension_import(name = "where", tb_where_state)

The same problem occurs and we apply the same solution to add the county data[^4].

[^4]: This is how I have located the problem with the value "Wilimington" in the city attribute, which we have already corrected, so it no longer appears.

tb_where_county <- ft_usa_city_county %>%
  tibble::add_row(city = "Unknown",
                  state = "Unknown",
                  county = "Unknown")

st_mrs_age <-
  st_mrs_age %>%
  enrich_dimension_import(name = "where", tb_where_county)

st_mrs_cause <-
  st_mrs_cause %>%
  enrich_dimension_import(name = "where", tb_where_county)

We can see the first rows of the final result below.

pander::pandoc.table(head(st_mrs_age$dimension$where, 10), split.table = Inf)

Constellation definition

A constellation is defined from a list of star schemas, as shown below.

ct_mrs <- constellation(list(st_mrs_age, st_mrs_cause), name = "mrs")

All dimensions of the same name in star schemas must be compatible in structure and type of columns, and are defined as conformed dimensions. The conformed dimensions share all the instances of the original dimensions, this implies possible modifications in the surrogate keys that are transmitted to foreign keys of the component fact tables.

The tables of the obtained conformed dimensions are shown below.

pander::pandoc.table(head(ct_mrs$dimension$when), split.table = Inf)
pander::pandoc.table(head(ct_mrs$dimension$when_available), split.table = Inf)
pander::pandoc.table(head(ct_mrs$dimension$where), split.table = Inf)

It can be seen that the conformed dimensions are considered regardless of the type of dimension in the star schema. Definition of conformed dimensions does not imply any change in the definition of dimensions in star schemas: Role and role playing dimensions remain the same, only their rows may have changed.

In this particular case there are no discrepancies in the values of the star schemas dimension instances, since the data source is the same for both.

If discrepancies are detected once the integration has been carried out, new modification operations can be defined on conformed dimensions (as it has been done for the dimensions of star schemas) that are applied at the constellation level and are automatically transmitted to the component star schemas (modify_conformed_dimension_records).

Incremental refresh

Once we have star schemas built with the data available at the moment, we may obtain additional data, with the same structure as the initial data but from a later time. Sometimes the new data also includes data from previous periods to operate on them.

Under these conditions, suppose we get the data sets mrs_age_w10, mrs_age_w11, mrs_cause_w10, and mrs_cause_w11, for weeks 10 and 11 (data in star schemas runs through week 9). In all cases, some data from previous periods are included.

To perform an incremental refresh of a star schema, we must have the new data in the same star schema format. Additionally, if we have done data cleaning, it is likely that we will have to correct part of the corrected errors again over the new data. For this reason, it is best to package all the transformations carried out on the original data in function form so that they can be easily applied to new data.

Refresh operations for data according to age range

Below you can see the function that groups the transformations defined for the data according to the age range.

mrs_age_definition <-
  function(ft,
           dm,
           updates,
           tb_who,
           tb_where,
           tb_where_state,
           tb_where_county) {
    star_schema(ft, dm) %>%
      role_playing_dimension(
        dim_names = c("when", "when_available"),
        name = "When Common",
        attributes = c("date", "week", "year")
      ) %>%
      snake_case() %>%
      character_dimensions(NA_replacement_value = "Unknown",
                           length_integers = list(week = 2)) %>%
      rename_dimension_attributes(
        name = "when",
        attributes = c("week_ending_date", "week", "year"),
        new_names = c(
          "when_happened_date",
          "when_happened_week",
          "when_happened_year"
        )
      ) %>%
      rename_dimension_attributes(
        name = "where",
        attributes = c("region"),
        new_names = c("division")
      ) %>%
      rename_measures(measures = c("deaths"),
                      new_names = c("n_deaths")) %>%
      modify_dimension_records(updates) %>%
      enrich_dimension_import(name = "who", tb_who) %>%
      enrich_dimension_import(name = "where", tb_where) %>%
      enrich_dimension_import(name = "where", tb_where_state) %>%
      enrich_dimension_import(name = "where", tb_where_county)
  }

We apply this function to new data sets, as shown below.

st_mrs_age_w10 <-
  mrs_age_definition(
    mrs_age_w10,
    dm_mrs_age,
    updates_st_mrs_age,
    tb_who,
    tb_where,
    tb_where_state,
    tb_where_county
  )

st_mrs_age_w11 <-
  mrs_age_definition(
    mrs_age_w11,
    dm_mrs_age,
    updates_st_mrs_age,
    tb_who,
    tb_where,
    tb_where_state,
    tb_where_county
  )

Errors may occur because the data is different from the original, especially in the dimension enrichment part, if data is missing. In this case, we must eliminate from the function the lines corresponding to enrichment and later try to enrich the dimension by checking the errors with enrich_dimension_import_test.

Once we have the data in the same format, we can apply the incremental refresh to the original star schema, as follows.

st_mrs_age <- st_mrs_age %>%
  incremental_refresh_star_schema(st_mrs_age_w10, existing = "replace") %>%
  incremental_refresh_star_schema(st_mrs_age_w11, existing = "replace")

In this case, it has been assumed that if data from previous periods appears among the new data, the new data has to replace the previous data (value "replace" in existing parameter).

If the star schema has been integrated into a constellation, the incremental refresh can be performed on it, as follows.

ct_mrs <- ct_mrs %>%
  incremental_refresh_constellation(st_mrs_age_w10, existing = "replace") %>%
  incremental_refresh_constellation(st_mrs_age_w11, existing = "replace")

In this case, the corresponding star schema, the conformed dimensions and all the star schemas that share them are updated.

Refresh operations for data according to cause

Similar to how it has been done for age data, it can be done for cause data, as shown below.

mrs_cause_definition <-
  function(ft,
           dm,
           updates,
           tb_where,
           tb_where_state,
           tb_where_county) {
    star_schema(ft, dm) %>%
      snake_case() %>%
      character_dimensions(
        NA_replacement_value = "Unknown",
        length_integers = list(
          week = 2,
          data_availability_week = 2,
          reception_week = 2
        )
      ) %>%
      role_playing_dimension(
        dim_names = c("when", "when_received", "when_available"),
        name = "when_common",
        attributes = c("date", "week", "year")
      ) %>%
      rename_dimension_attributes(
        name = "when",
        attributes = c("week_ending_date", "week", "year"),
        new_names = c(
          "when_happened_date",
          "when_happened_week",
          "when_happened_year"
        )
      ) %>%
      rename_dimension_attributes(
        name = "where",
        attributes = c("region"),
        new_names = c("division")
      ) %>%
      modify_dimension_records(updates) %>%
      enrich_dimension_import(name = "where", tb_where) %>%
      enrich_dimension_import(name = "where", tb_where_state) %>%
      enrich_dimension_import(name = "where", tb_where_county)
  }

st_mrs_cause_w10 <-
  mrs_cause_definition(
    mrs_cause_w10,
    dm_mrs_cause,
    updates_st_mrs_age,
    tb_where,
    tb_where_state,
    tb_where_county
  )

st_mrs_cause_w11 <-
  mrs_cause_definition(
    mrs_cause_w11,
    dm_mrs_cause,
    updates_st_mrs_age,
    tb_where,
    tb_where_state,
    tb_where_county
  )

st_mrs_cause <- st_mrs_cause %>%
  incremental_refresh_star_schema(st_mrs_cause_w10, existing = "group") %>%
  incremental_refresh_star_schema(st_mrs_cause_w11, existing = "group")

ct_mrs <- ct_mrs %>%
  incremental_refresh_constellation(st_mrs_cause_w10, existing = "group") %>%
  incremental_refresh_constellation(st_mrs_cause_w11, existing = "group")

In this case, the previously existing data is treated differently than it was in the previous case, now what is done is grouping it using the defined aggregation functions, assuming it is additional data that has not been entered before (value "group" in existing parameter).

Filter and purge operations

Sometimes the data refresh consists of eliminating data that is no longer necessary, generally because it corresponds to a period that has stopped being analysed but it can also be for other reasons. In general, they can be selected considering any combination of dimensions, not just the time dimension.

Suppose we want to delete the Boston data working at the constellation level. First, we select them considering the stars that make up the constellation.

st1 <- ct_mrs %>%
  get_star_schema("mrs_age") %>%
  filter_fact_rows(name = "where", city == "Boston")

st2 <- ct_mrs %>%
  get_star_schema("mrs_cause") %>%
  filter_fact_rows(name = "where", city == "Boston")

We can work both at the star or constellation level. In this example, we are going to do it at the constellation level (working on a temporary variable so as not to lose data), as we have been doing.

ct_tmp <- ct_mrs %>%
  incremental_refresh_constellation(st1, existing = "delete") %>%
  incremental_refresh_constellation(st2, existing = "delete")

These operations have only removed the fact data. The first records of the where dimension are shown below.

pander::pandoc.table(head(ct_tmp$dimension$where), split.table = Inf)

Once the fact data is removed, we can remove the data for the dimensions that are no longer needed using the following function.

ct_tmp <- ct_tmp %>%
  purge_dimensions_constellation()

The result in the where dimension is shown below.

pander::pandoc.table(head(ct_tmp$dimension$where), split.table = Inf)

The Boston data has disappeared and the surrogate keys have been reassigned (on the temporary variable).

Exporting results

Once we have made the necessary definitions and transformations, we can export the data to work in a database or with a query tool.

Instead of exporting data in the specific format of a particular tool, it is exported as tibble-based structures that can be easily handled.

Star schema

Various export possibilities are offered. Specifically, for a star schema one of them is to export the data as a flat table. The main difference from the initial data is that we have cleaned and conformed it. This operation is offered for completeness. To work only with flat tables, this package is not suitable.

To work with databases, it is useful to be able to export a star schema as a list of tibble with dimension and fact tables, as shown below.

tl <- st_mrs_age %>%
  star_schema_as_tibble_list()

Optionally, the export function allows the role playing dimensions to be included.

Constellation

To export constellation data, as well as a tibble list, the multistar format may be interesting, where you have a list of tibble for fact tables and another for dimension tables.

ms_mrs <- ct_mrs %>%
  constellation_as_multistar()

multistar

We can obtain a flat table, implemented using a tibble, from a multistar (which can be the result of a query). If it only has one fact table, it is not necessary to provide its name.

ft <- ms_mrs %>%
  multistar_as_flat_table(fact = "mrs_age")

The first rows of the flat table obtained as a result are shown below.

pander::pandoc.table(head(ft), split.table = Inf)

Query functions

The main motivation of the query functions of this package is to have the possibility to select subsets of data to be exported or presented through other packages.

We can define queries on data in multistar format. Data in this format can be obtained from a star or a constellation.

ms_mrs <- ct_mrs %>%
  constellation_as_multistar()

The query is created using dimensional_query and executed using run_query. We can refine it using select_dimension, select_fact and filter_dimension as much as we deem appropriate.

ms <- dimensional_query(ms_mrs) %>%
  select_dimension(name = "where",
                   attributes = c("city", "state")) %>%
  select_dimension(name = "when",
                   attributes = c("when_happened_year")) %>%
  select_fact(name = "mrs_age",
              measures = c("n_deaths")) %>%
  select_fact(
    name = "mrs_cause",
    measures = c("pneumonia_and_influenza_deaths", "other_deaths")
  ) %>%
  filter_dimension(name = "when", when_happened_week <= "03") %>%
  filter_dimension(name = "where", city == "Bridgeport") %>%
  run_query()

The result of a query is an multistar structure. In this way, we can define queries on the result of others. The result can also be transformed into a flat table.

ft <- ms %>%
  multistar_as_flat_table()

The content of the flat table is shown below.

pander::pandoc.table(head(ft), split.table = Inf)

In the result it can be seen that, as the query has been defined at the year and city level, for a single city, with data available for a single year, there is only one row. Columns nrow_agg and mrs_cause_nrow_agg show the number of original rows that make up the result row.

From the result in the form of a flat table, pivottabler package can be used to present it in the form of a pivot table.

Available transformation operations

Package starschemar offers operations to transform flat tables into star schemas and also to export or exploit them through queries.

  1. From a flat table, we define a dimensional model classifying its attributes as facts or dimensions (dimensional modelling).

  2. From a flat table and a dimensional model we obtain a star schema that we can transform; from various star schemas we can define a constellation (star schema and constellation definition).

  3. Dimensions contain rows without duplicates, we can apply operations to perform data cleaning and to conform them (cleaning and conforming data).

  4. When new data is obtained, it is necessary to refresh the existing data with them by means of incremental refresh operations (incremental refresh).

  5. The results obtained can be exported to be consulted with other tools (exporting results).

  6. Finally, basic queries can be performed from R (query functions), especially to select the data to export.

Dimensional modelling

Starting from a flat table, a dimensional model is defined specifying the attributes that make up each of the dimensions and the measurements in the facts. The result is a dimensional_model object. It is carried out through the following functions:

dm <- dimensional_model()
dm <- dimensional_model() %>%
  define_dimension(name = "When",
                   attributes = c("Week Ending Date",
                                  "WEEK",
                                  "Year"))
dm <- dimensional_model() %>%
  define_fact(
    name = "mrs_age",
    measures = c("Deaths"),
    agg_functions = c("SUM"),
    nrow_agg = "nrow_agg"
  )

dm <- dimensional_model() %>%
  define_fact(name = "Factless fact")

Star schema and constellation definition

A dimensional model is implemented using a star schema. We can have several related star schemas through common dimensions that together form a fact constellation.

Star schema definition

A star schema is defined from a flat table and a dimensional model definition. Once defined, a star schema can be transformed by defining role playing dimensions, changing the writing style of element names or the type of dimension attributes. These operations are carried out through the following functions:

st <- star_schema(mrs_age, dm_mrs_age)
st <- star_schema(mrs_age, dm_mrs_age) %>%
  role_playing_dimension(
    dim_names = c("when", "when_available"),
    name = "When Common",
    attributes = c("Date", "Week", "Year")
  )
st <- star_schema(mrs_age, dm_mrs_age) %>%
  snake_case()
st <- star_schema(mrs_age, dm_mrs_age) %>%
  character_dimensions()

Star schema rename

Once a star schema is defined, we can rename its elements. It is necessary to be able to rename attributes of dimensions and measures of facts because the definition operations only allowed us to select columns of a flat table. For completeness also dimensions and facts can be renamed. To carry out these operations, the following functions are available:

st <- st_mrs_age %>%
  rename_dimension(name = "when", new_name = "when_happened")
attribute_names <- 
  st_mrs_age %>% get_dimension_attribute_names("when")
st <-
  st_mrs_age %>% rename_dimension_attributes(
    name = "when",
    attributes = c("when_happened_week", "when_happened_year"),
    new_names = c("week", "year")
  )
st <- st_mrs_age %>% rename_fact("age") 
measure_names <- 
  st_mrs_age %>% get_measure_names()
st <-
  st_mrs_age %>% rename_measures(measures = c("n_deaths"),
                                 new_names = c("num_deaths"))

Constellation definition

Based on various star schemas, a constellation can be defined in which star schemas share common dimensions. Dimensions with the same name must be shared. It is defined by the following function:

ct <- constellation(list(st_mrs_age, st_mrs_cause), name = "mrs")

Cleaning and conforming data

Once star schemas and fact constellations are defined, data cleaning operations can be carried out on dimensions. There are three groups of functions:

  1. One to obtain components of star schemas and constellations.

  2. Another to define data cleaning operations over dimensions.

  3. One more to apply operations to star schemas or constellations.

Obtaining components

We can obtain dimensions from a star schema or conformed dimensions from a fact constellation. Available functions in both cases are similar.

Star schema

dn <- st_mrs_age %>%
  get_dimension_names()
where <- st_mrs_age %>%
  get_dimension("where")

Constellation

dn <- ct_mrs %>%
  get_conformed_dimension_names()
when <- ct_mrs %>%
  get_conformed_dimension("when")
stn <- ct_mrs %>%
  get_star_schema_names()
age <- ct_mrs %>%
  get_star_schema("mrs_age")

Definition of updates

Modifications are defined on dimension rows in various ways based exclusively on the values of the dimension fields. Although the surrogate key intervenes in the definition, the result, internally, does not depend on it so that it can be applied more generally in other star schemas.

updates <- record_update_set()
updates <- record_update_set() %>%
  match_records(dimension = where,
                old = 1,
                new = 2)
updates <- record_update_set() %>%
  update_record(
    dimension = who,
    old = 1,
    values = c("1: <1 year")
  )
updates <- record_update_set() %>%
  update_selection(
    dimension = where,
    columns = c("city"),
    old_values = c("Bridgepor"),
    new_values = c("Bridgeport")
  )
updates <- record_update_set() %>%
  update_selection_general(
    dimension = where,
    columns_old = c("state", "city"),
    old_values = c("CT", "Bridgepor"),
    columns_new = c("city"),
    new_values = c("Bridgeport")
  )

Updates application

Defined updates can be applied on a star schema or on the conformed dimension of a fact constellation.

Star schema

st <- st_mrs_age %>%
  modify_dimension_records(updates_st_mrs_age)

Constellation

ct <- ct_mrs %>%
  modify_conformed_dimension_records(updates_st_mrs_age)

Dimension enrichment

To enrich a dimension with new attributes related to others already included in it, first, we export the attributes on which the new ones depend, then we define the new attributes, and import the table with all the attributes to be added to the dimension.

tb <-
  enrich_dimension_export(st_mrs_age,
                          name = "when_common",
                          attributes = c("week", "year"))
tb <-
  enrich_dimension_export(st_mrs_age,
                          name = "when_common",
                          attributes = c("week", "year"))

# Add new columns with meaningful data (these are not), possibly exporting
# data to a file, populating it and importing it.
tb <- tibble::add_column(tb, x = "x", y = "y", z = "z")

st <- enrich_dimension_import(st_mrs_age, name = "when_common", tb)
tb <-
  enrich_dimension_export(st_mrs_age,
                          name = "when_common",
                          attributes = c("week", "year"))

# Add new columns with meaningful data (these are not), possibly exporting
# data to a file, populating it and importing it.
tb <- tibble::add_column(tb, x = "x", y = "y", z = "z")[-1, ]

tb2 <- enrich_dimension_import_test(st_mrs_age, name = "when_common", tb)

Incremental refresh

When new data is obtained, an incremental refresh of the data can be carried out, both of the dimensions and of the facts. Incremental refresh can be applied to both star schema and fact constellation, using the following functions.

Star schema

st <- st_mrs_age %>%
  incremental_refresh_star_schema(st_mrs_age_w10, existing = "replace")

Sometimes the data refresh consists of eliminating data that is no longer necessary, generally because it corresponds to a period that has stopped being analysed but it can also be for other reasons. This data can be selected using the following function:

st <- st_mrs_age %>%
  filter_fact_rows(name = "when", when_happened_week <= "03") %>%
  filter_fact_rows(name = "where", city == "Bridgeport")

st2 <- st_mrs_age %>%
  incremental_refresh_star_schema(st, existing = "delete")

Once the fact data is removed (using the other incremental refresh functions), we can remove the data for the dimensions that are no longer needed using the following function:

st3 <- st2 %>%
  purge_dimensions_star_schema()

Constellation

ct <- ct_mrs %>%
  incremental_refresh_constellation(st_mrs_age_w10, existing = "replace")
ct <- ct_mrs %>%
  purge_dimensions_constellation()

Exporting results

Once the data has been properly structured and transformed, it can be exported to be consulted with other tools or with R. Various export formats have been defined, both for star schemas and for constellations, using the following functions.

Star schema

ft <- st_mrs_age %>%
  star_schema_as_flat_table()
ms <- st_mrs_age %>%
  star_schema_as_multistar()
tl <- st_mrs_age %>%
  star_schema_as_tibble_list(include_role_playing = TRUE)

Constellation

ms <- ct_mrs %>%
  constellation_as_multistar()
tl <- ct_mrs %>%
  constellation_as_tibble_list(include_role_playing = TRUE)

multistar

ft <- ms_mrs %>%
  multistar_as_flat_table(fact = "mrs_age")

Query functions

There are many multidimensional query tools available. The exported data, once stored in files, can be used directly from them. You can also perform basic queries from R on data in the multistar format, mainly for selecting the data to export, using the following functions:

ms_mrs <- ct_mrs %>%
  constellation_as_multistar()

dq <- dimensional_query(ms_mrs)
dq <- dimensional_query(ms_mrs) %>%
  select_fact(
    name = "mrs_age",
    measures = c("n_deaths"),
    agg_functions = c("MAX")
  )

dq <- dimensional_query(ms_mrs) %>%
  select_fact(name = "mrs_age",
              measures = c("n_deaths"))

dq <- dimensional_query(ms_mrs) %>%
  select_fact(name = "mrs_age")
dq <- dimensional_query(ms_mrs) %>%
  select_dimension(name = "where",
                   attributes = c("city", "state")) %>%
  select_dimension(name = "when")
dq <- dimensional_query(ms_mrs) %>%
  filter_dimension(name = "when", when_happened_week <= "03") %>%
  filter_dimension(name = "where", city == "Boston")
ms <- dimensional_query(ms_mrs) %>%
  select_dimension(name = "where",
                   attributes = c("city", "state")) %>%
  select_dimension(name = "when",
                   attributes = c("when_happened_year")) %>%
  select_fact(
    name = "mrs_age",
    measures = c("n_deaths")
  ) %>%
  select_fact(
    name = "mrs_cause",
    measures = c("pneumonia_and_influenza_deaths", "other_deaths")
  ) %>%
  filter_dimension(name = "when", when_happened_week <= "03") %>%
  filter_dimension(name = "where", city == "Boston") %>%
  run_query()

Conclusions

starschemar package offers a set of operations that allow us to transform flat tables into star schemas. Star schemas support the definition of role playing and role dimensions. Additional transformation operations can be applied to each star schema to adapt the format of the data. From several star schemas you can define fact constellation with conformed dimensions.

Cleaning and conforming data operations can be defined on the star schemas and fact constellation. To update the data, incremental refresh operations are offered, also applicable on said structures. In addition, there are several possibilities to export the results obtained in the form of easily treatable tibble-based structures. Operations are also provided to query the multidimensional structure, mainly for selecting the data to export.

Operations have been designed to be intuitive and easy to use. The result greatly facilitates the data transformation process for the exposed situation.

References



Try the starschemar package in your browser

Any scripts or data that you put into this service are public.

starschemar documentation built on Jan. 13, 2021, 7:33 p.m.