tof_split_data: Split high-dimensional cytometry data into a training and...
In keyes-timothy/tidytof: Analyze High-dimensional Cytometry Data Using Tidy Data Principles

tof_split_data

R Documentation

Split high-dimensional cytometry data into a training and test set

Description

Split high-dimensional cytometry data into a training and test set

Usage

tof_split_data(
  feature_tibble,
  split_method = c("k-fold", "bootstrap", "simple"),
  split_col,
  simple_prop = 3/4,
  num_cv_folds = 10,
  num_cv_repeats = 1L,
  num_bootstraps = 10,
  strata = NULL,
  ...
)

Arguments

`feature_tibble`	A tibble in which each row represents a sample- or patient-level observation, such as those produced by `tof_extract_features`.
`split_method`	Either a string or a logical vector specifying how to perform the split. If a string, valid options include k-fold cross validation ("k-fold"; the default), bootstrapping ("bootstrap"), or a single binary split ("simple"). If a logical vector, it should contain one entry for each row in 'feature_tibble' indicating if that row should be included in the training set (TRUE) or excluded for the validation/test set (FALSE). Ignored entirely if 'split_col' is specified.
`split_col`	The unquoted column name of the logical column in 'feature_tibble' indicating if each row should be included in the training set (TRUE) or excluded for the validation/test set (FALSE).
`simple_prop`	A numeric value between 0 and 1 indicating what proportion of the data should be used for training. Defaults to 3/4. Ignored if split_method is not "simple".
`num_cv_folds`	An integer indicating how many cross-validation folds should be used. Defaults to 10. Ignored if split_method is not "k-fold".
`num_cv_repeats`	An integer indicating how many independent cross-validation replicates should be used (i.e. how many times the num_cv_folds split should be repeated). Defaults to 1. Ignored if split_method is not "k-fold".
`num_bootstraps`	An integer indicating how many independent bootstrap replicates should be used. Defaults to 10. Ignored if split_method is not "bootstrap".
`strata`	An unquoted column name representing the column in `feature_tibble` that should be used to stratify the data splitting. Defaults to NULL (no stratification).
`...`	Optional additional arguments to pass to `vfold_cv` for k-fold cross validation, `bootstraps` for bootstrapping, or `initial_split` for simple splitting.
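
The `split_col` argument described above can be sketched as follows. This is a minimal illustration that assumes tidytof and dplyr are installed; the toy data and the `is_training` column name are hypothetical and not part of the package.

```r
library(dplyr)
library(tidytof)

set.seed(2024)

# toy feature table; all column names here are illustrative only
toy_features <-
    tibble(
        sample = as.character(1:20),
        cd45 = runif(n = 20),
        # hypothetical logical column: TRUE = training, FALSE = validation/test
        is_training = rep(c(TRUE, FALSE), times = c(15, 5))
    )

# split_method is ignored here because split_col is supplied
manual_split <-
    tof_split_data(
        feature_tibble = toy_features,
        split_col = is_training
    )
```

A predefined split like this is useful when the training and validation samples are fixed in advance, for example by cohort or acquisition batch.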

Value

For k-fold cross validation and bootstrapping, an "rset" object; for simple splitting, an "rsplit" object. For details, see rsample.
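
As a hedged sketch of how the returned objects might be inspected, the following uses the standard rsample accessors (`training()`, `testing()`, `analysis()`, `assessment()`). It assumes tidytof, rsample, and dplyr are installed; the toy data are illustrative only.

```r
library(dplyr)
library(rsample)
library(tidytof)

set.seed(2024)

# toy feature table with 100 sample-level rows (illustrative only)
toy_features <-
    tibble(
        sample = as.character(1:100),
        cd45 = runif(n = 100),
        pstat5 = runif(n = 100)
    )

# simple splitting returns an "rsplit" object
simple_split <-
    tof_split_data(
        feature_tibble = toy_features,
        split_method = "simple",
        simple_prop = 3/4
    )
training_data <- rsample::training(simple_split)
testing_data <- rsample::testing(simple_split)

# k-fold splitting returns an "rset": a tibble with one row per resample
# and a `splits` list-column holding one rsplit per fold
cv_splits <-
    tof_split_data(
        feature_tibble = toy_features,
        split_method = "k-fold",
        num_cv_folds = 5
    )
first_fold <- cv_splits$splits[[1]]
fold_training <- rsample::analysis(first_fold)
fold_holdout <- rsample::assessment(first_fold)
```

Within each resample, the rows returned by `analysis()` and `assessment()` together account for every row of the input tibble.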

Examples

feature_tibble <-
    dplyr::tibble(
        sample = as.character(1:100),
        cd45 = runif(n = 100),
        pstat5 = runif(n = 100),
        cd34 = runif(n = 100),
        outcome = (3 * cd45) + (4 * pstat5) + rnorm(100),
        class =
            as.factor(
                dplyr::if_else(outcome > median(outcome), "class1", "class2")
            ),
        multiclass =
            as.factor(
                c(rep("class1", 30), rep("class2", 30), rep("class3", 40))
            ),
        event = c(rep(0, times = 50), rep(1, times = 50)),
        time_to_event = rnorm(n = 100, mean = 10, sd = 2)
    )

# split the dataset into 10 CV folds
tof_split_data(
    feature_tibble = feature_tibble,
    split_method = "k-fold"
)

# split the dataset into 10 bootstrap resamplings
tof_split_data(
    feature_tibble = feature_tibble,
    split_method = "bootstrap"
)

# split the dataset into a single training/test set
# stratified by the "class" column
tof_split_data(
    feature_tibble = feature_tibble,
    split_method = "simple",
    strata = class
)
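
The examples above use the default fold and repeat counts. As a further sketch, the following requests repeated, stratified cross validation through `num_cv_folds`, `num_cv_repeats`, and `strata`. It assumes tidytof and dplyr are installed, and it rebuilds a smaller toy tibble so that it can run on its own.

```r
library(dplyr)
library(tidytof)

set.seed(2024)

# toy feature table with a two-level class label (illustrative only)
toy_features <-
    tibble(
        sample = as.character(1:100),
        cd45 = runif(n = 100),
        class = as.factor(rep(c("class1", "class2"), each = 50))
    )

# request 5 folds repeated 2 times, stratified by the "class" column
repeated_cv <-
    tof_split_data(
        feature_tibble = toy_features,
        split_method = "k-fold",
        num_cv_folds = 5,
        num_cv_repeats = 2,
        strata = class
    )
```

Stratifying by `class` keeps the class proportions roughly the same in every fold, which matters most when one class is rare.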

keyes-timothy/tidytof documentation built on Aug. 28, 2024, 8:37 a.m.
