tdmReadAndSplit: Read and split the task data.


View source: R/tdmReadAndSplit.r


Read the task data using tdmReadDataset, split them into a test part and a training/validation part, and return a TDMdata object.


tdmReadAndSplit(opts, tdm, nExp = 0, dset = NULL)



opts: a list from which the following elements are needed

  • READ.INI: [TRUE] if TRUE, read and split the data; if FALSE, return NULL

  • READ.*: other settings for tdmReadDataset

  • filename: needed for tdmReadDataset

  • filetest: needed for tdmReadDataset

  • TST.testFrac: [0.1] set this fraction of the data aside for testing

  • TST.COL: string with name for the partitioning column, if tdm$umode is not "SP_T". (If tdm$umode=="SP_T", then TST.COL="tdmSplit" is used.)
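The role of TST.testFrac and TST.COL can be illustrated with a minimal base-R sketch. This is not the TDMR implementation: the helper mark_test_rows and its arguments are hypothetical, and only the idea (a 0/1 partitioning column marking a random test fraction) is taken from the description above.

```r
# Hypothetical helper, not part of TDMR: add a 0/1 partitioning column
# that marks a random fraction of the rows as test records.
mark_test_rows <- function(dset, testFrac = 0.1, tstCol = "TST.COL") {
  n <- nrow(dset)
  nTest <- round(testFrac * n)        # number of records set aside for testing
  isTest <- rep(0, n)
  isTest[sample.int(n, nTest)] <- 1   # 1 = test record, 0 = train/vali record
  dset[[tstCol]] <- isTest
  dset
}

set.seed(42)
d <- mark_test_rows(iris, testFrac = 0.1, tstCol = "tdmSplit")
table(d$tdmSplit)   # 135 train/vali records, 15 test records
```

The column name "tdmSplit" is used here only because the text names it as the column used when tdm$umode=="SP_T".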


tdm: a list from which the following elements are needed

  • mainFile: if not NULL, set working dir to dir(mainFile) before executing tdmReadDataset

  • umode: [ "RSUB" | "CV" | "TST" | "SP_T" ], how to divide the data into training/validation data for tuning and test data for the unbiased runs

  • SPLIT.SEED: if NULL, set the random number generator (RNG) to tdmRandomSeed when constructing dataObj. If not NULL, set the RNG to SPLIT.SEED + nExp, which gives a deterministic test set split

  • stratified: [NULL] string specifying the column with the response variable for classification. If not NULL, do the split by stratified sampling (at least one record of each class level found in dset[,tdm$stratified] shall appear in the train-vali-set). Recommended for classification
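The interplay of SPLIT.SEED, nExp and stratified sampling can be sketched in base R as follows. The helper stratified_test_flag is hypothetical and the sampling scheme is an assumption; TDMR's own stratified split may differ in detail. The sketch only shows that seeding with SPLIT.SEED + nExp makes the split reproducible and that each class keeps at least one record in the train/vali part.

```r
# Hypothetical helper, not part of TDMR: stratified 0/1 test indicator.
stratified_test_flag <- function(response, testFrac, seed) {
  set.seed(seed)                       # fixed seed gives a reproducible split
  isTest <- integer(length(response))
  for (idx in split(seq_along(response), response)) {
    # leave at least one record of each class in the train/vali part
    nTest <- min(round(testFrac * length(idx)), length(idx) - 1)
    if (nTest > 0) isTest[idx[sample.int(length(idx), nTest)]] <- 1L
  }
  isTest
}

SPLIT.SEED <- 123   # example value
nExp <- 2           # experiment counter
flagA <- stratified_test_flag(iris$Species, 0.2, SPLIT.SEED + nExp)
flagB <- stratified_test_flag(iris$Species, 0.2, SPLIT.SEED + nExp)
identical(flagA, flagB)            # TRUE: same seed, same split
tapply(flagA, iris$Species, sum)   # 10 test records per class
```

A different nExp changes the seed and therefore, in general, the selected test records, while repeated calls with the same nExp stay identical.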


nExp: [0] experiment counter, used to select a different but reproducible seed if tdm$SPLIT.SEED is not NULL


dset: [NULL] if non-NULL, reading of dset is skipped and the given data frame dset is used.


If dset is NULL, the files specified in opts are read into dset, see tdmReadDataset for details. Then, depending on the value of tdm$umode


dataObj, either NULL (if opts$READ.INI==FALSE) or an object of class TDMdata containing


dset: a data frame with the complete data set


TST.COL: string, the name of the column in dset which has a 1 for records belonging to the test set and a 0 for train/vali records. If tdm$umode=="SP_T", then TST.COL="tdmSplit", else TST.COL=opts$TST.COL.


filename: opts$filename, from where the data were read

Use the accessor functions dsetTrnVa.TDMdata and dsetTest.TDMdata to extract the train/vali and the test data, resp., from dataObj.
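What the two accessors are described to return can be mimicked with a plain list in base R. This is only an illustration: the list below is a stand-in and not a real TDMdata object, the element names dset, TST.COL and filename are assumptions, and get_trnva and get_test are hypothetical helpers, not the signatures of dsetTrnVa.TDMdata or dsetTest.TDMdata.

```r
# Stand-in for a TDMdata-like object; a plain list, not the real class.
dataObj <- list(
  dset = data.frame(x = 1:6, tdmSplit = c(0, 0, 1, 0, 1, 0)),
  TST.COL = "tdmSplit",
  filename = "example.csv"   # hypothetical file name
)

# Hypothetical helpers: select rows by the 0/1 value in the partitioning column.
get_trnva <- function(obj) obj$dset[obj$dset[[obj$TST.COL]] == 0, , drop = FALSE]
get_test  <- function(obj) obj$dset[obj$dset[[obj$TST.COL]] == 1, , drop = FALSE]

nrow(get_trnva(dataObj))   # 4 train/vali records
nrow(get_test(dataObj))    # 2 test records
```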

Known caller: tdmBigLoop


Wolfgang Konen, THK

See Also

dsetTrnVa.TDMdata, dsetTest.TDMdata, tdmReadDataset, tdmBigLoop

TDMR documentation built on March 3, 2020, 1:06 a.m.