ReadCausataCsv: Loads data from a Causata CSV file.
In Causata: Analysis utilities for binary classification and Causata users.


Loads data from a Causata CSV file into a data frame. Metadata from Causata is used to set variable names and classes. The function arguments allow selective filtering of rows and/or columns.

ReadCausataCsv(causataR, include=c(), exclude=c(), maxMb=1000, 
  colFilterFunc=NA, rowIndex=NA, nrows=NA, metadata=FALSE, 
  debug=FALSE, ...)

`causataR`	An output list from the `ReadCausataR` function.
`include`	A list of variable names or patterns to match against the variables in the CSV data. Matches are kept. See 'Details' for more information.
`exclude`	A list of variable names or patterns to match against the variables in the CSV data. Matches are excluded. See 'Details' for more information.
`maxMb`	Specifies the maximum megabytes of data to load in one pass, which is computed before rows and columns are filtered out. This constraint is applied only if `nrows` is specified. See 'Details' for more information.
`colFilterFunc`	An optional function that is applied to each column of data. The function must take the independent variable as its first argument, and it must return either a logical (TRUE/FALSE) value or a list that includes an element named `keep`. If the value is TRUE the variable is kept; if FALSE the variable is discarded.
`rowIndex`	An optional vector of logical values where TRUE indicates which rows should be kept.
`nrows`	The maximum number of rows to read from the CSV file. This is applied before rows are filtered.
`metadata`	If FALSE then a data frame is returned. If TRUE then a list of outputs is returned.
`debug`	If TRUE the column filter is applied with a for loop instead of `doMC`, which is easier to debug.
`...`	Extra arguments are sent to the `colFilterFunc`.
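
As an illustration of the `colFilterFunc` contract described above, here is a minimal sketch of a filter that returns a single logical value per column. The function name `keepIfMostlyPresent` and its `maxMissing` parameter are hypothetical and not part of the Causata package; the commented call at the end is an assumed usage based on the argument descriptions, not a tested invocation.

```r
# Hypothetical filter: keep a column only when its fraction of missing
# values is at most `maxMissing`. Both names are illustrative and are not
# part of the Causata package.
keepIfMostlyPresent <- function(x, maxMissing = 0.5) {
  mean(is.na(x)) <= maxMissing
}

# Stand-alone check on toy columns (no Causata data needed).
toy <- data.frame(a = c(1, 2, NA, 4), b = c(NA, NA, NA, 1))
keepFlags <- vapply(toy, keepIfMostlyPresent, logical(1))
filtered <- toy[, keepFlags, drop = FALSE]

# Assumed call, given a `causataR` list from ReadCausataR; `maxMissing`
# would reach the filter through `...`:
# df <- ReadCausataCsv(causataR, colFilterFunc = keepIfMostlyPresent,
#                      maxMissing = 0.25)
```

In the toy data, column `a` is 25% missing and is kept, while column `b` is 75% missing and is dropped.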

CSV data from Causata is read into a data frame. The arguments allow filtering by column names, row index, or filtering by column calculations when a function is provided.

The include and exclude arguments are used to select which columns to load from the CSV file. If these arguments are left at their default values then all columns are loaded. If both include and exclude are set then exclude is applied first, followed by include.
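
The ordering of exclude and include can be sketched as follows. This is one possible reading only: it assumes the entries are regular expressions matched with `grepl`, and that include narrows whatever remains after exclude. The helper `selectColumns` is hypothetical and does not reproduce the package's actual matching rules, which this page does not specify.

```r
# One reading of "exclude first, then include", assuming entries are
# regular expressions. `selectColumns` is a hypothetical helper.
selectColumns <- function(columnNames, include = c(), exclude = c()) {
  matchesAny <- function(patterns, x) {
    if (length(patterns) == 0) return(rep(FALSE, length(x)))
    Reduce(`|`, lapply(patterns, function(p) grepl(p, x)))
  }
  # Step 1: drop every column that matches an exclude pattern.
  kept <- columnNames[!matchesAny(exclude, columnNames)]
  # Step 2: if include is set, keep only the remaining columns that match it.
  if (length(include) > 0) {
    kept <- kept[matchesAny(include, kept)]
  }
  kept
}

cols <- c("customer.id", "total.spend", "total.visits", "last.visit.date")
selected <- selectColumns(cols, include = c("^total", "visit"),
                          exclude = c("date"))
```

With both arguments left at their defaults, the helper returns every column name unchanged, which matches the documented default behaviour.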

The maxMb parameter can be used to load and filter data in several passes, which reduces the total memory required if row / column filters are specified in colFilterFunc or rowIndex. If the estimated required memory exceeds maxMb, then the load will be broken into multiple passes, each no larger than maxMb. The default estimate is 12 bytes per cell of a data frame, so maxMb=1000 (about a gigabyte) corresponds to a data frame with 100k rows and 833 columns.
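
The memory estimate above can be checked with a few lines of arithmetic. The sketch assumes 1 MB = 1e6 bytes, which is the convention that reproduces the 833-column figure. The 2000-column export and the rule that the number of passes equals the estimated size divided by maxMb, rounded up, are illustrative assumptions rather than documented behaviour.

```r
bytesPerCell <- 12      # default estimate quoted above
maxMb <- 1000
nRows <- 100000

# Assumes 1 MB = 1e6 bytes, which reproduces the 833-column figure.
maxCells <- maxMb * 1e6 / bytesPerCell
maxCols <- floor(maxCells / nRows)

# Hypothetical export with 2000 columns: estimated size and number of
# passes, assuming passes = ceiling(estimated size / maxMb).
estimatedMb <- nRows * 2000 * bytesPerCell / 1e6
nPasses <- ceiling(estimatedMb / maxMb)
```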

A data frame of CSV data, or a list containing the data frame and metadata as follows: