split_data: Split the data frame to create training and test data
In promor: Proteomics Data Analysis and Modeling Tools

split_data

R Documentation


Description

This function creates balanced splits of the protein intensity data in a `model_df` object to produce training and test data sets.

Usage

split_data(model_df, train_size = 0.8, seed = NULL)

Arguments

`model_df`	A `model_df` object produced by `pre_process`.
`train_size`	The size of the training data set as a proportion of the complete data set. Default is 0.8.
`seed`	Numerical. Random number seed. Default is `NULL`.

Details

This function splits the `model_df` object into training and test data sets using random sampling while preserving the original class distribution of the data. Fix the random number seed with `seed` to make the split reproducible.
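A split that preserves the class distribution is usually called a stratified split: rows are sampled separately within each class, so every class contributes roughly `train_size` of its rows to the training data. The sketch below illustrates the idea in base R. It is not promor's implementation. The function name `stratified_split`, its `class_col` argument, and the toy data are hypothetical and exist only for illustration.

```r
# Minimal sketch of a stratified (class-preserving) train/test split in base R.
# NOTE: hypothetical helper for illustration; NOT promor's split_data() code.
stratified_split <- function(df, class_col, train_size = 0.8, seed = NULL) {
  # Fix the random number seed when one is supplied, for reproducibility
  if (!is.null(seed)) set.seed(seed)
  # Group row indices by class label
  idx_by_class <- split(seq_len(nrow(df)), df[[class_col]])
  # Within each class, draw round(train_size * n_class) rows for training
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- round(train_size * length(idx))
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  # Logical mask avoids the df[-integer(0), ] pitfall when nothing is sampled
  is_train <- seq_len(nrow(df)) %in% train_idx
  list(training = df[is_train, , drop = FALSE],
       test = df[!is_train, , drop = FALSE])
}

# Toy data: 10 "case" and 20 "control" samples (hypothetical)
toy <- data.frame(
  condition = rep(c("case", "control"), times = c(10, 20)),
  protein_a = seq_len(30)
)
res <- stratified_split(toy, "condition", train_size = 0.8, seed = 8314)
table(res$training$condition)  # case: 8, control: 16
table(res$test$condition)      # case: 2, control: 4
```

Because sampling happens within each class, the 1:2 case-to-control ratio of the toy data is kept in both the training and test sets, and rerunning with the same `seed` returns the same split.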

Value

A list of two data frames: `training`, containing the training data set, and `test`, containing the test data set.

Author(s)

Chathurani Ranathunge

Examples


## Create a model_df object
covid_model_df <- pre_process(covid_fit_df, covid_norm_df)

## Split the data frame into training and test data sets using default settings
covid_split_df1 <- split_data(covid_model_df, seed = 8314)

## Split the data frame into training and test data sets with 70% of the
## data in training and 30% in test data sets
covid_split_df2 <- split_data(covid_model_df, train_size = 0.7, seed = 8314)

## Access training data set
covid_split_df1$training

## Access test data set
covid_split_df1$test

promor documentation built on Nov. 12, 2025, 1:06 a.m.