tdmReadAndSplit: Read and split the task data.

View source: R/tdmReadAndSplit.r

Description

Read the task data using tdmReadDataset, split them into a test part and a training/validation part, and return a TDMdata object.

Usage

tdmReadAndSplit(opts, tdm, nExp = 0, dset = NULL)

Arguments

opts

a list from which we need here the elements

  • READ.INI: [T] if TRUE, read and split the data; if FALSE, return NULL

  • READ.*: other settings for tdmReadDataset

  • filename: needed for tdmReadDataset

  • filetest: needed for tdmReadDataset

  • TST.testFrac: [0.1] set this fraction of the data aside for testing

  • TST.COL: string with name for the partitioning column, if tdm$umode is not "SP_T". (If tdm$umode=="SP_T", then TST.COL="tdmSplit" is used.)

tdm

a list from which we need here the elements

  • mainFile: if not NULL, set working dir to dir(mainFile) before executing tdmReadDataset

  • umode: [ "RSUB" | "CV" | "TST" | "SP_T" ], how to divide the data into training/validation data for tuning and test data for the unbiased runs

  • SPLIT.SEED: if NULL, set the random number generator (RNG) to tdmRandomSeed when constructing dataObj. If not NULL, set the RNG to SPLIT.SEED + nExp, which gives a deterministic test set split

  • stratified: [NULL] string specifying the column with the response variable for classification. If not NULL, do the split by stratified sampling (at least one record of each class level found in dset[,tdm$stratified] shall appear in the train-vali-set). Recommended for classification
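
The stratified option can be illustrated with a small base-R sketch. It is not TDMR's implementation: stratified_test_flag and its arguments are hypothetical names, and the rule used here (draw the test records class by class, always leaving at least one record of each class in the train/vali part) is only one way to meet the guarantee described above.

```r
# Hypothetical helper, not part of TDMR: returns 1 for test records and
# 0 for train/vali records, sampling within each class of `response`.
stratified_test_flag <- function(response, testFrac = 0.1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  flag <- rep(0L, length(response))
  # split() groups the row indices by class level
  for (idx in split(seq_along(response), response)) {
    # cap the test count so that at least one record per class stays train/vali
    nTest <- min(floor(testFrac * length(idx)), length(idx) - 1L)
    if (nTest > 0) flag[idx[sample.int(length(idx), nTest)]] <- 1L
  }
  flag
}

flag <- stratified_test_flag(iris$Species, testFrac = 0.2, seed = 1)
table(iris$Species, flag)   # test records are drawn from every class
```

Sampling positions with sample.int and indexing into idx avoids the well-known pitfall of sample() when a class contains a single record.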

nExp

[0] experiment counter, used to select a different but reproducible seed for each experiment when tdm$SPLIT.SEED is not NULL
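
How SPLIT.SEED and nExp interact can be sketched in base R. This is an illustration under assumptions, not TDMR's code: split_test_flag is a hypothetical helper, and when SPLIT.SEED is NULL the sketch simply leaves the RNG untouched instead of calling tdmRandomSeed.

```r
# Hypothetical helper, not part of TDMR: marks a random fraction of n
# records as test records (1); all others are train/vali records (0).
split_test_flag <- function(n, testFrac = 0.1, SPLIT.SEED = NULL, nExp = 0) {
  # A non-NULL SPLIT.SEED makes the split reproducible for a given nExp.
  if (!is.null(SPLIT.SEED)) set.seed(SPLIT.SEED + nExp)
  flag <- rep(0L, n)
  nTest <- round(testFrac * n)
  flag[sample.int(n, nTest)] <- 1L
  flag
}

a <- split_test_flag(150, 0.1, SPLIT.SEED = 42, nExp = 1)
b <- split_test_flag(150, 0.1, SPLIT.SEED = 42, nExp = 1)
identical(a, b)   # TRUE: same seed and same nExp give the same split
sum(a)            # 15 test records out of 150
```

Calling the helper with a different nExp uses a different seed, so each experiment gets its own test set while every individual experiment remains repeatable.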

dset

[NULL] if non-NULL, reading of dset is skipped and the given data frame dset is used.

Details

If dset is NULL, the files specified in opts are read into dset, see tdmReadDataset for details. Then, depending on the value of tdm$umode, the data are divided into a train/vali part and a test part.

Value

dataObj, either NULL (if opts$READ.INI==FALSE) or an object of class TDMdata containing

dset

a data frame with the complete data set

TST.COL

string, the name of the column in dset which has a 1 for records belonging to the test set and a 0 for train/vali records. If tdm$umode=="SP_T", then TST.COL="tdmSplit", else TST.COL=opts$TST.COL.

filename

opts$filename, from where the data were read

Use the accessor functions dsetTrnVa.TDMdata and dsetTest.TDMdata to extract the train/vali and the test data, resp., from dataObj.
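
The role of the TST.COL column can be shown with a toy data frame in base R. This sketch only mimics the 1/0 convention described above; it does not call the TDMR accessors, and the data frame and its columns x and y are made up for illustration.

```r
# Toy stand-in for dataObj$dset with a partition column named by TST.COL.
TST.COL <- "tdmSplit"
dset <- data.frame(x = 1:6, y = c("a", "b", "a", "b", "a", "b"))
dset[[TST.COL]] <- c(0, 0, 1, 0, 1, 0)   # 1 = test record, 0 = train/vali

trnVa <- dset[dset[[TST.COL]] == 0, ]    # train/vali records
test  <- dset[dset[[TST.COL]] == 1, ]    # test records
nrow(trnVa)   # 4
nrow(test)    # 2
```

With a real TDMdata object, prefer dsetTrnVa.TDMdata and dsetTest.TDMdata over manual subsetting, since they are the documented way to extract the two parts.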

Known caller: tdmBigLoop

Author(s)

Wolfgang Konen (wolfgang.konen@th-koeln.de), THK

See Also

dsetTrnVa.TDMdata, dsetTest.TDMdata, tdmReadDataset, tdmBigLoop


TDMR documentation built on March 3, 2020, 1:06 a.m.