cdataMake: Create master data table from a collection of datasets
In avucoh/DIVE: Data Integration and Visual Exploration

cdataMake


Create master data table from a collection of datasets

Description

Builds one master dataset from a collection of datasets, given either the directory where the collection lives or a vector of file paths.

Usage

cdataMake(
  datadir = NULL,
  files = NULL,
  keyname = "ID",
  filterfun = NULL,
  namespacefun = defaultIndex
)

Arguments

`datadir`	Directory hosting the collection of datasets. If given, all files in this directory are used. To use only selected files, use the `files` parameter instead.
`files`	A vector of paths to the dataset files to read. This allows specifying a subset of files, possibly spread across different directories. Must be given if `datadir` is not given; ignored if `datadir` is given.
`keyname`	Name of the key column used to merge the tables. Defaults to "ID".
`filterfun`	Optional. A filter function that returns the columns of a file to include in the final master dataset, e.g. to include only numeric columns.
`namespacefun`	Optional. A function that creates unique namespaces. If not given, defaults to `defaultIndex`, which namespaces columns by file name. See Details.
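The expected signatures of `filterfun` and `namespacefun` are not spelled out above. As a minimal sketch, assuming `filterfun` receives one file's table and returns the names of the columns to keep, and `namespacefun` receives a file path and returns a namespace string, two hypothetical callbacks (`numericOnly` and `authorYear`, neither part of DIVE) might look like this:

```r
# Hypothetical filterfun (not part of DIVE).
# Assumption: receives one file's table and returns the column names to keep.
# Keeps numeric columns plus the key column, so the table can still be merged.
numericOnly <- function(dt, keyname = "ID") {
  isNum <- vapply(dt, is.numeric, logical(1))
  names(dt)[isNum | names(dt) == keyname]
}

# Hypothetical namespacefun (not part of DIVE).
# Assumption: receives a file path and returns a short namespace string.
# Maps e.g. "PMID123456_Doe-2000.txt" to "Doe00" (surname + 2-digit year).
authorYear <- function(path) {
  stem <- tools::file_path_sans_ext(basename(path))
  m <- regmatches(stem, regexec("_([A-Za-z]+)-[0-9]{2}([0-9]{2})$", stem))[[1]]
  if (length(m) == 0) stop("unexpected file name: ", path)
  paste0(m[2], m[3])
}
```

Whether `cdataMake` itself restores the key column when `filterfun` omits it is not documented here, which is why this sketch keeps it explicitly.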

Details

This compiles the cdata data object from a collection of datasets. Each dataset is a uniquely named .csv, .tsv, or .txt file within the specified directory (or listed in `files`). The files are read and merged on the key column into one master data.table.

Because column IDs must be unique in the master table, namespaced IDs are created using the parent file name. A function can be passed to `namespacefun` to control this namespacing. For instance, instead of using the full file name, one might map it to a shorter key, a pre-existing UUID, or another external key, as long as IDs remain unique. With such a mapping, a data feature "Var1" from the file "PMID123456_Doe-2000.txt" could become the column "Doe00_Var1" in the master data table.
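The compile step described above can be sketched in base R. This is a minimal illustration, not DIVE's implementation: the function name `buildMaster`, the `<namespace>_<column>` naming scheme, the default namespace (the file name without its extension), the assumption that `filterfun` returns column names, and the treatment of .tsv and .txt files as tab-delimited are all assumptions. Tables are combined with a full outer join on the key column via `merge(..., all = TRUE)`; the sketch returns a plain data.frame, whereas `cdataMake` produces a data.table.

```r
# Minimal base-R sketch of the compile step (illustrative, not DIVE's code).
# Assumptions: every file has a header row and the key column; .csv files are
# comma-delimited, .tsv/.txt files are tab-delimited.
buildMaster <- function(datadir = NULL, files = NULL, keyname = "ID",
                        filterfun = NULL,
                        namespacefun = function(path) {
                          # Assumed default: file name without its extension
                          tools::file_path_sans_ext(basename(path))
                        }) {
  # As documented, 'datadir' takes precedence and 'files' is then ignored
  if (!is.null(datadir)) {
    files <- list.files(datadir, pattern = "\\.(csv|tsv|txt)$",
                        full.names = TRUE)
  }
  if (length(files) == 0) stop("no dataset files to read")

  tables <- lapply(files, function(f) {
    sep <- if (grepl("\\.csv$", f)) "," else "\t"
    dt <- read.table(f, header = TRUE, sep = sep, stringsAsFactors = FALSE,
                     check.names = FALSE)
    # Optional column selection (assumed to return column names)
    if (!is.null(filterfun)) dt <- dt[, filterfun(dt), drop = FALSE]
    # Namespace every non-key column so IDs stay unique after merging
    ns <- namespacefun(f)
    names(dt) <- ifelse(names(dt) == keyname, keyname,
                        paste(ns, names(dt), sep = "_"))
    dt
  })

  # Full outer join of all tables on the key column
  Reduce(function(a, b) merge(a, b, by = keyname, all = TRUE), tables)
}
```

For example, passing `namespacefun = function(p) sub("^PMID[0-9]+_([A-Za-z]+)-[0-9]{2}([0-9]{2}).*$", "\\1\\2", basename(p))` maps "PMID123456_Doe-2000.txt" to "Doe00", so that file's "Var1" column becomes "Doe00_Var1".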