consolidate: Consolidate datacube into a single dataset

View source: R/consolidate.R

consolidateR Documentation

Consolidate datacube into a single dataset

Description

This function consolidates a set of datasets in a 'many* package' datacube into a single dataset with some combination of the rows, columns, and observations of the datasets in the datacube. The function includes separate arguments for the rows and columns, as well as for how to resolve conflicts for observations across datasets. This provides users with considerable flexibility in how they combine data. For example, users may wish to stick to units that appear in every dataset but include variables coded in any dataset, or units that appear in any dataset but only those variables that appear in every dataset. Even then there may be conflicts, as the actual unit-variable observations may differ from dataset to dataset. We offer a number of resolve methods that enable users to choose how conflicts between observations are resolved.

Usage

consolidate(
  datacube,
  rows = "any",
  cols = "any",
  resolve = "coalesce",
  key = "manyID"
)

Arguments

datacube

A datacube from one of the many packages

rows

Which rows or units to retain. By default "any" (or all) units are retained, but another option is "every", which retains only those units that appear in all parent datasets.

cols

Which columns or variables to retain. By default "any" (or all) variables are retained, but another option is "every", which retains only those variables that appear in all parent datasets.

resolve

How should conflicts between observations be resolved? By default "coalesce", but other options include: "min", "max", "mean", "median", and "random". "coalesce" takes the first non-NA value. "max" takes the largest value. "min" takes the smallest value. "mean" takes the average value. "median" takes the median value. "random" takes a random value. For different variables to be resolved differently, you can specify the variables' names alongside how each is to be resolved in a list (e.g. resolve = c(var1 = "min", var2 = "max")). In this case, only the variables named will be resolved and returned.

key

An ID column to collapse by. By default "manyID". Users can also specify multiple key variables in a list. For multiple key variables, the key variables must be present in all the datasets in the datacube (e.g. key = c("key1", "key2")). For equivalent key columns with different names across datasets, matching is possible if keys are declared (e.g. key = c("key1" = "key2")). Missing observations in the key variable are removed.

Details

Text variables are dropped for more efficient consolidation.

Value

A single tibble/data frame.

Examples


consolidate(datacube = emperors, key = "ID")
consolidate(datacube = favour(emperors, "UNRV"), rows = "every",
cols = "every", resolve = "coalesce", key = "ID")
consolidate(datacube = emperors, rows = "any", cols = "every",
resolve = "min", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "any",
resolve = "max", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
resolve = "median", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
resolve = "mean", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
resolve = "random", key = "ID")
consolidate(datacube = emperors, rows = "every", cols = "every",
resolve = c(Begin = "min", End = "max"), key = "ID")
consolidate(datacube = emperors, rows = "any", cols = "any",
resolve = c(Death = "max", Cause = "coalesce"),
key = c("ID", "Begin"))


manydata documentation built on May 29, 2024, 8:16 a.m.