Loads data from a Causata CSV file.

Description

Loads data exported from Causata as a CSV file into a data frame. Metadata from Causata is used to set variable names and classes. The function arguments allow selective filtering of rows and/or columns.

Usage

ReadCausataCsv(causataR, include=c(), exclude=c(), maxMb=1000, 
  colFilterFunc=NA, rowIndex=NA, nrows=NA, metadata=FALSE, 
  debug=FALSE, ...)

Arguments

causataR

An output list from the ReadCausataR function.

include

A list of variable names or patterns to match against the variables in the CSV data. Matches are kept. See 'Details' for more information.

exclude

A list of variable names or patterns to match against the variables in the CSV data. Matches are excluded. See 'Details' for more information.

maxMb

Specifies the maximum megabytes of data to load in one pass, which is computed before rows and columns are filtered out. This constraint is applied only if nrows is specified. See 'Details' for more information.

colFilterFunc

An optional function that is applied to each column of data. The function must take the column (the independent variable) as its first argument, and it must return either a logical value (TRUE/FALSE) or a list that includes an element named keep. If the value is TRUE the variable is kept; if FALSE it is discarded.

rowIndex

An optional vector of logical values where TRUE indicates which rows should be kept.

nrows

The maximum number of rows to read from the CSV file. This limit is applied before rows are filtered.

metadata

If FALSE then a data frame is returned. If TRUE then a list of outputs is returned.

debug

If TRUE, the column filter is applied with a for loop instead of doMC; the loop is easier to debug.

...

Extra arguments are sent to the colFilterFunc.

Details

CSV data from Causata is read into a data frame. The arguments allow filtering by column names, row index, or filtering by column calculations when a function is provided.

The include and exclude arguments select which columns to load from the CSV file. If both are left at their default values, all columns are loaded. If both are set, exclude is applied first, followed by include.
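The ordering described above can be illustrated with a minimal sketch. This is not the package's implementation: the helper names matchesAny and selectColumns and the column names are hypothetical, and it assumes that patterns are treated as regular expressions (via grepl), which this page does not confirm.

```r
# Hypothetical column names standing in for the variables in a CSV file
cols <- c("customer.id", "total.spend", "total.visits",
          "page.views", "total.spend.last30")

# TRUE for each element of x that matches at least one pattern
# (assumption: patterns are regular expressions)
matchesAny <- function(patterns, x) {
  if (length(patterns) == 0) return(rep(FALSE, length(x)))
  Reduce(`|`, lapply(patterns, grepl, x = x))
}

# Apply exclude first, then include, as described in the text
selectColumns <- function(cols, include = c(), exclude = c()) {
  kept <- cols[!matchesAny(exclude, cols)]
  if (length(include) > 0) {
    kept <- kept[matchesAny(include, kept)]
  }
  kept
}

selected <- selectColumns(cols, include = c("^total"), exclude = c("last30$"))
allCols  <- selectColumns(cols)  # defaults keep every column
```

In this sketch a column survives only if it matches no exclude pattern and, when include is non-empty, matches at least one include pattern.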

The maxMb parameter can be used to load and filter data in several passes, which reduces the total memory required when row or column filters are specified in rowIndex or colFilterFunc. If the estimated required memory exceeds maxMb, the load is broken into multiple passes, each no larger than maxMb. The estimate assumes 12 bytes per data frame cell by default, so maxMb=1000 (about a gigabyte) corresponds to a data frame with 100k rows and 833 columns.
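The arithmetic behind that estimate can be reproduced as follows. The sketch assumes that one megabyte is counted as 1e6 bytes and that the number of passes is the estimated size divided by the limit, rounded up. The variable names are illustrative, and how the function actually splits the load between passes is not specified here.

```r
bytesPerCell <- 12       # default estimate quoted in the text
maxMb        <- 1000     # default value of the maxMb argument
nRows        <- 100000

# Largest number of columns that fits in a single pass at nRows rows
# (assumption: 1 MB = 1e6 bytes)
maxCols <- floor(maxMb * 1e6 / (bytesPerCell * nRows))

# Estimated size of a hypothetical wider file, and a lower bound on the
# number of passes needed to stay within maxMb per pass
nCols       <- 2000
estimatedMb <- nRows * nCols * bytesPerCell / 1e6
nPasses     <- ceiling(estimatedMb / maxMb)
```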

Value

A data frame of CSV data, or a list containing the data frame and metadata as follows:

df

A data frame of CSV data.

metadata

A list of outputs returned from the colFilterFunc.
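As an illustration of the kind of function colFilterFunc expects, the sketch below defines a hypothetical filter that keeps a column only when its fraction of missing values is small enough, and returns a list with an element named keep. It is applied to a toy data frame with lapply to mimic per-column filtering; the function name, the maxMissing argument, and the toy data are assumptions, and this does not call ReadCausataCsv itself.

```r
# Hypothetical column filter: first argument is the column,
# extra arguments (here maxMissing) would arrive through '...'
keepIfMostlyPresent <- function(x, maxMissing = 0.5) {
  fracMissing <- mean(is.na(x))
  list(keep = fracMissing <= maxMissing, fracMissing = fracMissing)
}

# Toy data standing in for columns read from a CSV file
toy <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(NA, NA, NA, 1),
  c = c("x", "y", "z", "w"),
  stringsAsFactors = FALSE
)

# Apply the filter to every column and collect the per-column outputs
results  <- lapply(toy, keepIfMostlyPresent, maxMissing = 0.5)
keep     <- vapply(results, function(r) r$keep, logical(1))
filtered <- toy[, keep, drop = FALSE]
```

Here results plays the role of the per-column outputs that the metadata element is described as holding, and filtered contains only the columns for which keep is TRUE.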

Author(s)

Justin Hemann <support@causata.com>
