create.data.split: Split a dataset into training and test sets
In SIAMCAT: Statistical Inference of Associations between Microbial Communities And host phenoTypes

Description

This function prepares the cross-validation by splitting the data into num.folds training and test folds, repeated num.resample times.

Usage

create.data.split(siamcat, num.folds = 2, num.resample = 1, stratify = TRUE, inseparable = NULL, verbose = 1)

Arguments

`siamcat`	object of class siamcat-class
`num.folds`	integer, number of cross-validation folds (must be `>= 2`), defaults to `2`
`num.resample`	integer, number of resampling rounds (values `<= 1` deactivate resampling), defaults to `1`
`stratify`	boolean, whether the splits should be stratified so that each fold contains an equal proportion of the classes, defaults to `TRUE`
`inseparable`	string, name of a metadata variable; samples sharing the same value of this variable are kept together in the same fold, defaults to `NULL`, see Details below
`verbose`	integer, controls output: `0` for no output at all, `1` for only information about progress and success, `2` for normal level of information and `3` for full debug information, defaults to `1`

Details

This function splits the labels within a siamcat-class object and prepares the internal cross-validation for model training (see train.model).

The training and test instances for the different cross-validation folds are saved in the data_split slot of the siamcat-class object. This slot holds a list with four entries:

num.folds - the number of cross-validation folds
num.resample - the number of repetitions for the cross-validation
training.folds - a list containing the indices for the training instances
test.folds - a list containing the indices for the test instances
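
To make the shape of this list concrete, the following base R sketch builds a list with the same four entries from a plain label vector. It is an illustration only, not SIAMCAT's implementation: the helper `sketch_data_split` is hypothetical, the stratification scheme (dealing the shuffled samples of each class across the folds) is one simple choice among several, and the nesting of the fold lists (resampling round first, then fold) is an assumption.

```r
# Hypothetical helper (not part of SIAMCAT): builds a list shaped like the
# four entries described above, using a simple stratified random split.
sketch_data_split <- function(label, num.folds = 2, num.resample = 1) {
    stopifnot(num.folds >= 2)
    num.resample <- max(1, num.resample)
    training.folds <- vector("list", num.resample)
    test.folds <- vector("list", num.resample)
    for (r in seq_len(num.resample)) {
        fold.id <- integer(length(label))
        # Stratify: deal the shuffled samples of each class across the folds
        for (cl in unique(label)) {
            idx <- which(label == cl)
            idx <- idx[sample.int(length(idx))]
            fold.id[idx] <- rep_len(seq_len(num.folds), length(idx))
        }
        # Assumed nesting: one list of folds per resampling round
        test.folds[[r]] <- lapply(seq_len(num.folds),
            function(f) which(fold.id == f))
        training.folds[[r]] <- lapply(seq_len(num.folds),
            function(f) which(fold.id != f))
    }
    list(num.folds = num.folds, num.resample = num.resample,
        training.folds = training.folds, test.folds = test.folds)
}

set.seed(1)
# Toy labels: 30 samples of one class, 20 of the other
label <- rep(c(-1, 1), times = c(30, 20))
split <- sketch_data_split(label, num.folds = 5, num.resample = 2)

# Class composition of each test fold in the first resampling round
sapply(split$test.folds[[1]], function(i) table(label[i]))
```

Within one resampling round every sample appears in exactly one test fold, and the training indices of a fold are simply all remaining samples.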

If the inseparable argument is provided, the data split takes the named metadata variable into account. For example, if the data contains several samples from the same individual, it makes sense to keep all data from that individual within the same fold. If inseparable is given, the stratify argument is ignored.
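
The idea of an inseparable variable can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not SIAMCAT's code: the helper `sketch_grouped_folds` and the toy `individual` vector are hypothetical. Folds are assigned to whole groups rather than to single samples, which also suggests why class stratification may not be applied at the same time.

```r
# Hypothetical helper (not part of SIAMCAT): assigns whole groups to folds
# so that samples sharing a group value never end up in different folds.
sketch_grouped_folds <- function(group, num.folds = 2) {
    groups <- unique(group)
    stopifnot(length(groups) >= num.folds)
    # Shuffle the groups, then deal them across the folds
    shuffled <- groups[sample.int(length(groups))]
    group.fold <- rep_len(seq_len(num.folds), length(shuffled))
    names(group.fold) <- as.character(shuffled)
    # Each sample inherits the fold of its group
    unname(group.fold[as.character(group)])
}

set.seed(1)
# Toy metadata: six individuals with three samples each
individual <- rep(paste0("subject_", 1:6), each = 3)
fold.id <- sketch_grouped_folds(individual, num.folds = 3)

# Each row (individual) has all of its samples in a single fold column
table(individual, fold.id)
```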

Value

An object of class siamcat-class with the data_split slot filled.

Examples

library(SIAMCAT)
data(siamcat_example)

# simple working example: 10-fold cross-validation, repeated 5 times
siamcat_split <- create.data.split(siamcat_example,
    num.folds=10,
    num.resample=5,
    stratify=TRUE)

SIAMCAT documentation built on Nov. 8, 2020, 5:14 p.m.

