# HierarchyCompute: Hierarchical Computations In statisticsnorway/SSBtools: Statistics Norway's Miscellaneous Tools

## Description

This function computes aggregates by crossing several hierarchical specifications and factorial variables.

## Usage

```r
HierarchyCompute(data, hierarchies, valueVar, colVar = NULL,
  rowSelect = NULL, colSelect = NULL, select = NULL,
  inputInOutput = FALSE, output = "data.frame", autoLevel = TRUE,
  unionComplement = FALSE, constantsInOutput = NULL,
  hierarchyVarNames = c(mapsFrom = "mapsFrom", mapsTo = "mapsTo",
    sign = "sign", level = "level"),
  selectionByMultiplicationLimit = 10^7, colNotInDataWarning = TRUE,
  useMatrixToDataFrame = TRUE, handleDuplicated = "sum", asInput = FALSE,
  verbose = FALSE, reOrder = FALSE, reduceData = TRUE, makeRownames = NULL)
```

## Arguments

| Argument | Description |
| --- | --- |
| `data` | The input data frame. |
| `hierarchies` | A named list of hierarchies, where the names are variable names in `data`. Variables can also be coded by `"rowFactor"` and `"colFactor"`. |
| `valueVar` | Name of the variable(s) to be aggregated. |
| `colVar` | When non-NULL, the function `HierarchyCompute2` is called. See its documentation for more information. |
| `rowSelect` | Data frame specifying variable combinations for output. The colFactor variable is not included. In addition, `rowSelect = "removeEmpty"` removes combinations corresponding to empty rows (only zeros) of `dataDummyHierarchy`. |
| `colSelect` | Vector specifying categories of the colFactor variable for output. |
| `select` | Data frame specifying variable combinations for output. The colFactor variable is included. |
| `inputInOutput` | Logical vector (possibly recycled) with one element for each element of `hierarchies`. TRUE means that codes from input are included in output. Values corresponding to `"rowFactor"` and `"colFactor"` are ignored. |
| `output` | One of `"data.frame"` (default), `"dummyHierarchies"`, `"outputMatrix"`, `"dataDummyHierarchy"`, `"valueMatrix"`, `"fromCrossCode"`, `"toCrossCode"`, `"crossCode"` (same as `"toCrossCode"`), `"outputMatrixWithCrossCode"`, `"matrixComponents"`, `"dataDummyHierarchyWithCodeFrame"`, `"dataDummyHierarchyQuick"`. The latter two do not require `valueVar` (`reduceData` set to `FALSE`). |
| `autoLevel` | Logical vector (possibly recycled) with one element for each element of `hierarchies`. When TRUE, level is computed by the automatic method as in `HierarchyFix`. Values corresponding to `"rowFactor"` and `"colFactor"` are ignored. |
| `unionComplement` | Logical vector (possibly recycled) with one element for each element of `hierarchies`. When TRUE, sign means union and complement instead of addition or subtraction, as in `DummyHierarchy`. Values corresponding to `"rowFactor"` and `"colFactor"` are ignored. |
| `constantsInOutput` | A single-row data frame to be combined with the other output. |
| `hierarchyVarNames` | Variable names in the hierarchy tables, as in `HierarchyFix`. |
| `selectionByMultiplicationLimit` | With non-NULL `rowSelect`, and when the number of elements in `dataDummyHierarchy` exceeds this limit, the computation is performed by a slower but more memory-efficient algorithm. |
| `colNotInDataWarning` | When TRUE, a warning is produced when elements of `colSelect` are not in data. |
| `useMatrixToDataFrame` | When TRUE (default), special functionality for saving time and memory is used. |
| `handleDuplicated` | Handling of duplicated code rows in data. One of `"sum"` (default), `"sumByAggregate"`, `"sumWithWarning"`, `"stop"` (error), `"single"` or `"singleWithWarning"`. With no colFactor, `"sum"` differs from `"sumByAggregate"`/`"sumWithWarning"`: `valueMatrix` holds original values or aggregates, respectively. With `"single"`, only one of the values is used (by matrix subsetting). |
| `asInput` | When TRUE (FALSE is default), output matrices match input data. Thus `valueMatrix = Matrix(data[, valueVar], ncol = 1)`. Only possible when there is no colFactor. |
| `verbose` | Whether to print information during calculations. FALSE is default. |
| `reOrder` | When TRUE (FALSE is default), output codes are ordered differently, more similar to a usual model matrix ordering. |
| `reduceData` | When TRUE (default), rows of `valueMatrix` that are unnecessary for the aggregated result are allowed to be removed. |
| `makeRownames` | When TRUE, `dataDummyHierarchy` contains rownames. By default, this is decided based on the parameter `output`. |
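
The distinction made by `unionComplement` can be illustrated with a small base R sketch. This is only a set-based reading of "union and complement" on made-up codes (`A`, `B`, `C`) and hypothetical groups; it is not taken from the SSBtools source and may differ from what `DummyHierarchy` does internally.

```r
# Toy membership vectors (1 = member). Codes and groups are hypothetical.
inputCodes <- c("A", "B", "C")
g1 <- c(A = 1, B = 1, C = 0)  # group G1 = {A, B}
g2 <- c(A = 0, B = 1, C = 1)  # group G2 = {B, C}

# Sign +1 read as addition: B belongs to both groups and gets weight 2
additiveRow <- g1 + g2

# Sign +1 read as union: membership is capped at 1, so B counts once
unionRow <- setNames(as.numeric(g1 | g2), inputCodes)

# Sign -1 read as subtraction: C ends up with weight -1
subtractedRow <- g1 - g2

# Sign -1 read as complement (set difference G1 \ G2): only A remains
complementRow <- setNames(as.numeric(g1 & !g2), inputCodes)

# Effect on an aggregate of made-up values
values <- c(A = 1, B = 2, C = 3)
sum(additiveRow * values)  # B is counted twice
sum(unionRow * values)     # B is counted once
```

Under this reading, the rows of the dummy matrix stay within 0 and 1 when signs are treated as set operations, which is why an overlapping code is not double counted.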

## Details

A key element of this function is the matrix multiplication `outputMatrix = dataDummyHierarchy %*% valueMatrix`. The matrix `valueMatrix` is a re-organized version of the `valueVar` vector from input. In particular, if a variable is selected as `colFactor`, there is one column for each level of that variable. The matrix `dataDummyHierarchy` is constructed by crossing dummy coding of hierarchies (`DummyHierarchy`) and factorial variables in a way that matches `valueMatrix`. The code combinations corresponding to the rows and columns of `dataDummyHierarchy` can be obtained as `toCrossCode` and `fromCrossCode`, respectively. In the default data frame output, `outputMatrix` is stacked into one column and combined with the code combinations of all variables.
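
The matrix product described above can be sketched in base R on toy data. The codes (`A`, `B`, `C`, `AB`, `Total`), the values and the hand-written dummy matrix are all hypothetical; the sketch shows the idea only and does not reproduce how SSBtools builds its (sparse) matrices.

```r
# Toy input: one value per (geo, year) combination. All numbers are made up.
d <- data.frame(
  geo   = c("A", "B", "C", "A", "B", "C"),
  year  = c("2014", "2014", "2014", "2015", "2015", "2015"),
  value = c(1, 2, 3, 10, 20, 30)
)

# Dummy coding of a hand-written hierarchy: AB = A + B, Total = A + B + C.
# Rows are output codes, columns are input codes.
dataDummyHierarchy <- matrix(
  c(1, 1, 0,
    1, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("AB", "Total"), c("A", "B", "C"))
)

# valueMatrix with year in the role of colFactor:
# one row per input geo code, one column per year
valueMatrix <- tapply(d$value, list(d$geo, d$year), sum)
valueMatrix <- valueMatrix[colnames(dataDummyHierarchy), , drop = FALSE]

# The key step: aggregate by matrix multiplication
outputMatrix <- dataDummyHierarchy %*% valueMatrix

# Stack the output matrix into one column and attach the code combinations
out <- data.frame(
  geo   = rep(rownames(outputMatrix), times = ncol(outputMatrix)),
  year  = rep(colnames(outputMatrix), each = nrow(outputMatrix)),
  value = as.vector(outputMatrix)
)
out
```

With year treated as a row factor instead, `valueMatrix` would have a single column and the dummy matrix would need one row and one column per (geo, year) combination, which is why the two settings give the same result through different computations.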

## Value

As specified by the parameter `output`.

## Author(s)

Øyvind Langsrud

## See Also

`Hierarchies2ModelMatrix`, `AutoHierarchies`.

## Examples

```r
library(SSBtools)

# Data and hierarchies used in the examples
x <- SSBtoolsData("sprt_emp")  # Employment in sport in thousand persons from Eurostat database
geoHier <- SSBtoolsData("sprt_emp_geoHier")
ageHier <- SSBtoolsData("sprt_emp_ageHier")

# Two hierarchies and year as rowFactor
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per")

# Same result with year as colFactor (but columns ordered differently)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per")

# Internally the computations are different as seen when output='matrixComponents'
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per",
                 output = "matrixComponents")
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 output = "matrixComponents")

# Include input age groups by setting inputInOutput = TRUE for this variable
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 inputInOutput = c(TRUE, FALSE))

# Only input age groups by switching to rowFactor
HierarchyCompute(x, list(age = "rowFactor", geo = geoHier, year = "colFactor"), "ths_per")

# Select some years (colFactor) including a year not in input data (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 colSelect = c("2014", "2016", "2018"))

# Select combinations of geo and age including a code not in data or hierarchy (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 rowSelect = data.frame(geo = "EU", age = c("Y0-100", "Y15-64", "Y15-29")))

# Select combinations of geo, age and year
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 select = data.frame(geo = c("EU", "Spain"), age = c("Y15-64", "Y15-29"),
                                     year = 2015))

# Extend the hierarchy table to illustrate the effect of unionComplement
# Omit level since this is handled by autoLevel
geoHier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"), mapsTo = "EUandSpain", sign = 1),
                  geoHier[, -4])

# Spain is counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per")

# Can be seen in the dataDummyHierarchy matrix
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
                 output = "matrixComponents")

# With unionComplement=TRUE Spain is not counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per",
                 unionComplement = TRUE)

# With constantsInOutput
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 constantsInOutput = data.frame(c1 = "AB", c2 = "CD"))

# More than one valueVar
x$y <- 10 * x$ths_per
HierarchyCompute(x, list(age = ageHier, geo = geoHier), c("y", "ths_per"))
```