cdataMake: Create master data table from a collection of datasets

View source: R/make.R

cdataMake    R Documentation

Create master data table from a collection of datasets

Description

Builds one large master dataset from a collection of datasets, given either the directory where the collection lives or an explicit list of files.

Usage

cdataMake(
  datadir = NULL,
  files = NULL,
  keyname = "ID",
  filterfun = NULL,
  namespacefun = defaultIndex
)

Arguments

datadir

Directory hosting the collection of datasets. If given, cdataMake attempts to read every file in it. To read only selected files, leave 'datadir' as NULL and use the 'files' parameter.

files

A vector of dataset file paths to read. This allows specifying a subset of files that are possibly spread throughout different directories. Must be given if 'datadir' is not given, and ignored if 'datadir' is given.

keyname

Name of the key column used to merge the tables. Defaults to "ID".

filterfun

Optional. A filter function that selects which columns of each file are included in the final master dataset, for example only the numeric columns.

namespacefun

Optional. A function that creates the unique namespaces used for column names. If not given, defaults to defaultIndex, which namespaces using file names. See Details.
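As an illustration of the 'filterfun' argument, here is a minimal sketch of a filter that selects numeric columns. The name keepNumeric is hypothetical, and the calling convention is an assumption: this page does not state what 'filterfun' receives or must return, so the sketch assumes it is given the table read from one file and returns the names of the columns to keep. Check R/make.R for the actual contract, including whether the key column has to be kept by the filter.

```r
# Hypothetical filter: report the numeric columns of one dataset.
# Assumption: the filter receives the table read from a single file and
# returns the names of the columns to keep (not confirmed by this page).
keepNumeric <- function(dt) {
  isNum <- vapply(dt, is.numeric, logical(1))  # one TRUE/FALSE per column
  names(dt)[isNum]
}

# Made-up data standing in for one file of the collection.
example <- data.frame(
  ID   = c("a", "b"),
  Var1 = c(1.5, 2.5),
  Note = c("x", "y"),
  stringsAsFactors = FALSE
)
keepNumeric(example)
```

Because it only uses names() and vapply(), the sketch behaves the same on a data.frame and on a data.table.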

Details

This compiles the cdata data object from a collection of datasets. Each dataset is a uniquely named .csv, .tsv, or .txt file within the specified directory. The files are read and merged together into one master data.table.

Because column IDs must be unique in the table, namespaced IDs are created using the parent file name. A function can be passed to namespacefun for some control over this namespacing approach. For instance, instead of using the full file name, one might need to map it to a shorter key, a pre-existing UUID, or another external key, as long as unique IDs can still be ensured. For example, the data feature "Var1" from the file "PMID123456_Doe-2000.txt" could become the column "Doe00_Var1" in the master data table.
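The "Doe00_Var1" mapping can be sketched as a small function. The name shortIndex is hypothetical, and the sketch assumes that 'namespacefun' is called with a file name or path and returns the namespace prefix. This page does not document that signature, so treat it as an illustration of the idea rather than a drop-in argument.

```r
# Hypothetical namespace function: "PMID123456_Doe-2000.txt" -> "Doe00".
# Assumption: it is called with one file name (or path) and returns a prefix.
shortIndex <- function(file) {
  # Drop the directory and the .csv/.tsv/.txt extension.
  stem <- sub("\\.(csv|tsv|txt)$", "", basename(file))
  # Look for a trailing "_Author-YYYY" pattern.
  m <- regmatches(stem, regexec("_([A-Za-z]+)-([0-9]{4})$", stem))[[1]]
  # Fall back to the full file stem if the pattern is absent.
  if (length(m) != 3) return(stem)
  # Author name plus two-digit year.
  paste0(m[2], substr(m[3], 3, 4))
}

ns <- shortIndex("PMID123456_Doe-2000.txt")
paste(ns, "Var1", sep = "_")
```

A shortened key like this is only safe if it stays unique across the collection. Two files by the same author from the same year would collide, which is why the fallback keeps the full file stem.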

Value

A "master" data.table
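To make the shape of the returned table concrete, the sketch below reproduces the idea in base R on two made-up datasets: non-key columns are prefixed with a namespace and the tables are merged on the key column. This is not the cdataMake implementation. The namespaces "Doe00" and "Smith05", the toy values, and the choice of a full outer join are all assumptions made for illustration.

```r
# Two toy datasets standing in for files in the collection (made-up data).
doe   <- data.frame(ID = c("s1", "s2"), Var1 = c(1, 2))
smith <- data.frame(ID = c("s2", "s3"), Var1 = c(10, 20))
tables <- list(Doe00 = doe, Smith05 = smith)

# Prefix every non-key column with its dataset namespace so names stay unique.
prefixed <- Map(function(tbl, ns) {
  isKey <- names(tbl) == "ID"
  names(tbl)[!isKey] <- paste(ns, names(tbl)[!isKey], sep = "_")
  tbl
}, tables, names(tables))

# Merge all tables on the key column.
# Assumption: a full outer join (all = TRUE); the join used by cdataMake
# is not stated on this page.
master <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), prefixed)
names(master)
```

With an outer join, an ID that is missing from a dataset gets NA in that dataset's columns.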


avucoh/DIVE documentation built on Aug. 29, 2023, 6:02 p.m.