HierarchyCompute: Hierarchical Computations

View source: R/HierarchyCompute.R

Description

This function computes aggregates by crossing several hierarchical specifications and factorial variables.
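To make the idea of "crossing" concrete, here is a small base-R sketch. It is not SSBtools code, and HierarchyCompute does not necessarily build its matrices this way. It assumes that every combination of input codes is present, in which case crossing a hierarchy's dummy matrix with a factorial variable (whose dummy matrix is an identity matrix) amounts to a Kronecker product. The geo and age codes are made up for illustration.

```r
# Hypothetical dummy matrix for a geo hierarchy:
# rows are output codes, columns are input codes.
geoDummy <- matrix(c(1, 0,
                     0, 1,
                     1, 1), nrow = 3, byrow = TRUE,
                   dimnames = list(c("Spain", "Iceland", "Total"),
                                   c("Spain", "Iceland")))

# A factorial variable maps each input code only to itself (identity matrix).
ageDummy <- diag(2)
dimnames(ageDummy) <- list(c("Y15-29", "Y30-64"), c("Y15-29", "Y30-64"))

# Crossing: every output geo code is combined with every output age code.
# Row and column names become "geo:age" (age varies fastest).
crossed <- kronecker(geoDummy, ageDummy, make.dimnames = TRUE)
print(crossed)
```

Row "Total:Y15-29" then picks up the Spain and Iceland cells for Y15-29 only, which is the kind of structure the Details section refers to as dataDummyHierarchy.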

Usage

HierarchyCompute(data, hierarchies, valueVar, colVar = NULL,
  rowSelect = NULL, colSelect = NULL, select = NULL,
  inputInOutput = FALSE, output = "data.frame", autoLevel = TRUE,
  unionComplement = FALSE, constantsInOutput = NULL,
  hierarchyVarNames = c(mapsFrom = "mapsFrom", mapsTo = "mapsTo", sign =
  "sign", level = "level"), selectionByMultiplicationLimit = 10^7,
  colNotInDataWarning = TRUE, useMatrixToDataFrame = TRUE,
  handleDuplicated = "sum", asInput = FALSE, verbose = FALSE,
  reOrder = FALSE, reduceData = TRUE, makeRownames = NULL)

Arguments

data

The input data frame

hierarchies

A named list of hierarchies, where the names are variable names in data. Instead of a hierarchy, a variable can also be coded as "rowFactor" or "colFactor".

valueVar

Name of the variable(s) to be aggregated.

colVar

When non-NULL, the function HierarchyCompute2 is called. See its documentation for more information.

rowSelect

Data frame specifying variable combinations for output. The colFactor variable is not included. In addition, rowSelect = "removeEmpty" removes combinations corresponding to empty rows (only zeros) of dataDummyHierarchy.

colSelect

Vector specifying categories of the colFactor variable for output.

select

Data frame specifying variable combinations for output. The colFactor variable is included.

inputInOutput

Logical vector (recycled if necessary) with one element per element of hierarchies. TRUE means that codes from the input are included in the output. Values corresponding to "rowFactor" and "colFactor" are ignored.

output

One of "data.frame" (default), "dummyHierarchies", "outputMatrix", "dataDummyHierarchy", "valueMatrix", "fromCrossCode", "toCrossCode", "crossCode" (as toCrossCode), "outputMatrixWithCrossCode", "matrixComponents", "dataDummyHierarchyWithCodeFrame", "dataDummyHierarchyQuick". The last two do not require valueVar (reduceData is then set to FALSE).

autoLevel

Logical vector (recycled if necessary) with one element per element of hierarchies. When TRUE, levels are computed automatically, as in HierarchyFix. Values corresponding to "rowFactor" and "colFactor" are ignored.

unionComplement

Logical vector (recycled if necessary) with one element per element of hierarchies. When TRUE, sign means union and complement instead of addition and subtraction, as in DummyHierarchy. Values corresponding to "rowFactor" and "colFactor" are ignored.

constantsInOutput

A single-row data frame to be combined with the rest of the output.

hierarchyVarNames

Variable names in the hierarchy tables as in HierarchyFix.

selectionByMultiplicationLimit

When rowSelect is non-NULL and the number of elements in dataDummyHierarchy exceeds this limit, the computation is performed by a slower but more memory-efficient algorithm.

colNotInDataWarning

When TRUE, a warning is produced when elements of colSelect are not in data.

useMatrixToDataFrame

When TRUE (default), special functionality that saves time and memory is used.

handleDuplicated

Handling of duplicated code rows in data. One of "sum" (default), "sumByAggregate", "sumWithWarning", "stop" (error), "single" or "singleWithWarning". Without a colFactor, "sum" and "sumByAggregate"/"sumWithWarning" differ in that valueMatrix contains the original values or the aggregates, respectively. With "single", only one of the duplicated values is used (by matrix subsetting).

asInput

When TRUE (default is FALSE), output matrices match the input data, so that valueMatrix = Matrix(data[, valueVar], ncol = 1). Only possible when there is no colFactor.

verbose

Whether to print information during calculations. FALSE is default.

reOrder

When TRUE (default is FALSE), output codes are ordered differently, more similar to the ordering of a usual model matrix.

reduceData

When TRUE (default), rows of valueMatrix that are unnecessary for the aggregated result may be removed.

makeRownames

When TRUE, dataDummyHierarchy contains row names. By default (NULL), this is decided based on the output parameter.
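The difference made by unionComplement can be illustrated with plain vectors. In this hypothetical base-R sketch (not the package's internal code), EU consists of the made-up input codes Spain and Portugal, and an extra output code is defined as EU plus Spain, similar to EUandSpain in the examples below. Each vector holds the dummy coefficients of one output code over the three input codes.

```r
# Hypothetical input codes and values (illustration only)
values <- c(Spain = 10, Portugal = 5, Iceland = 3)
eu     <- c(1, 1, 0)  # EU = Spain + Portugal
spain  <- c(1, 0, 0)  # Spain itself

# unionComplement = FALSE: sign = 1 means addition, so Spain gets weight 2
additiveRow <- eu + spain
# unionComplement = TRUE: sign = 1 means union, so each input code counts at most once
unionRow <- as.numeric(eu | spain)

sum(additiveRow * values)  # Spain counted twice
sum(unionRow * values)     # Spain counted once

# With sign = -1, complement (set difference) never produces negative weights,
# whereas subtraction can: compare "Spain minus EU" under both interpretations.
subtractRow   <- spain - eu                  # c(0, -1, 0)
complementRow <- as.numeric(spain & !eu)     # c(0, 0, 0)
```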

Details

A key element of this function is the matrix multiplication outputMatrix = dataDummyHierarchy %*% valueMatrix. The matrix valueMatrix is a reorganized version of the valueVar vector from the input. In particular, if a variable is selected as colFactor, there is one column for each level of that variable. The matrix dataDummyHierarchy is constructed by crossing the dummy coding of hierarchies (DummyHierarchy) and factorial variables in a way that matches valueMatrix. The code combinations corresponding to the rows and columns of dataDummyHierarchy can be obtained as toCrossCode and fromCrossCode, respectively. In the default data frame output, outputMatrix is stacked into one column and combined with the code combinations of all variables.
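The multiplication described above can be mimicked in base R on a toy example. This is only a sketch under simplifying assumptions: the data, the single geo hierarchy (Spain, Iceland and a hypothetical Total) and the use of tapply to build valueMatrix are made up for illustration, and HierarchyCompute itself works with sparse Matrix objects and its own internal construction.

```r
# Hypothetical long-format input data: geo x year
toy <- data.frame(geo   = c("Spain", "Iceland", "Spain", "Iceland"),
                  year  = c("2014", "2014", "2015", "2015"),
                  value = c(10, 3, 12, 4))

# valueMatrix: one row per input geo code, one column per level of the
# colFactor variable (year)
geo <- factor(toy$geo, levels = c("Spain", "Iceland"))
valueMatrix <- tapply(toy$value, list(geo, toy$year), sum)

# dataDummyHierarchy: rows are output codes, columns match rows of valueMatrix
dataDummyHierarchy <- rbind(Spain = c(1, 0), Iceland = c(0, 1), Total = c(1, 1))
colnames(dataDummyHierarchy) <- rownames(valueMatrix)

outputMatrix <- dataDummyHierarchy %*% valueMatrix

# Stack outputMatrix into one column and attach the code combinations
out <- data.frame(geo   = rep(rownames(outputMatrix), times = ncol(outputMatrix)),
                  year  = rep(colnames(outputMatrix), each = nrow(outputMatrix)),
                  value = as.vector(outputMatrix))
print(out)
```

Stacking column by column (as.vector on a matrix is column-major) is why the year codes are repeated with each = nrow(outputMatrix) while the geo codes are repeated with times = ncol(outputMatrix).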

Value

As specified by the output parameter.

Author(s)

Øyvind Langsrud

See Also

Hierarchies2ModelMatrix, AutoHierarchies.

Examples

library(SSBtools)

# Data and hierarchies used in the examples
x <- SSBtoolsData("sprt_emp")  # Employment in sport in thousand persons from Eurostat database
geoHier <- SSBtoolsData("sprt_emp_geoHier")
ageHier <- SSBtoolsData("sprt_emp_ageHier")

# Two hierarchies and year as rowFactor
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per")

# Same result with year as colFactor (but columns ordered differently)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per")

# Internally the computations are different as seen when output='matrixComponents'
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "rowFactor"), "ths_per", 
                 output = "matrixComponents")
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 output = "matrixComponents")


# Include input age groups by setting inputInOutput = TRUE for this variable
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 inputInOutput = c(TRUE, FALSE))

# Only input age groups by switching to rowFactor
HierarchyCompute(x, list(age = "rowFactor", geo = geoHier, year = "colFactor"), "ths_per")

# Select some years (colFactor) including a year not in input data (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 colSelect = c("2014", "2016", "2018"))

# Select combinations of geo and age including a code not in data or hierarchy (zeros produced)
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 rowSelect = data.frame(geo = "EU", age = c("Y0-100", "Y15-64", "Y15-29")))

# Select combinations of geo, age and year
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per", 
                 select = data.frame(geo = c("EU", "Spain"), age = c("Y15-64", "Y15-29"), year = 2015))

# Extend the hierarchy table to illustrate the effect of unionComplement 
# Omit level since this is handled by autoLevel
geoHier2 <- rbind(data.frame(mapsFrom = c("EU", "Spain"), mapsTo = "EUandSpain", sign = 1), 
                  geoHier[, -4])

# Spain is counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per")

# Can be seen in the dataDummyHierarchy matrix
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per", 
                 output = "matrixComponents")

# With unionComplement=TRUE Spain is not counted twice
HierarchyCompute(x, list(age = ageHier, geo = geoHier2, year = "colFactor"), "ths_per", 
                 unionComplement = TRUE)

# With constantsInOutput
HierarchyCompute(x, list(age = ageHier, geo = geoHier, year = "colFactor"), "ths_per",
                 constantsInOutput = data.frame(c1 = "AB", c2 = "CD"))

# More than one valueVar
x$y <- 10*x$ths_per
HierarchyCompute(x, list(age = ageHier, geo = geoHier), c("y", "ths_per"))

statisticsnorway/SSBtools documentation built on March 3, 2020, 1:34 a.m.