train_test_filesystem: Organise files into a train-test filesystem

View source: R/data_converters.R

train_test_filesystem (R Documentation)

Description

Splits the files in a directory that match a given file extension into train and test subdirectories, according to a training proportion.

Usage

train_test_filesystem(
  path_to_files,
  file_ext,
  split = 0.8,
  train_folder = "train",
  test_folder = "test",
  shuffle = TRUE,
  overwrite = FALSE
)

Arguments

path_to_files

path to the directory containing the files to split

file_ext

file extension used to select files (e.g. "fna")

split

proportion of files assigned to the training set (default 0.8); the remaining files go to the test set

train_folder

name of the training folder (subdirectory); created if it does not exist

test_folder

name of the testing folder (subdirectory); created if it does not exist

shuffle

randomise the order of files before splitting (if FALSE, files are sorted by filename before splitting)

overwrite

if TRUE, overwrite files that already exist
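
To make the interaction of split and shuffle concrete, here is a minimal base R sketch of how a vector of filenames could be partitioned. The helper partition_files is hypothetical and is not the package's implementation; in particular, rounding the training count down with floor() is an assumption, since the rounding rule is not documented here.

```r
# Hypothetical helper, not the package's implementation.
# Assumption: the training count is floor(n * split).
partition_files <- function(files, split = 0.8, shuffle = TRUE) {
  stopifnot(split >= 0, split <= 1)
  # shuffle = TRUE randomises the order; otherwise sort by filename
  files <- if (shuffle) sample(files) else sort(files)
  n_train <- floor(length(files) * split)
  list(
    train = files[seq_len(n_train)],
    test  = files[seq_len(length(files) - n_train) + n_train]
  )
}

files <- paste0(letters[1:10], ".fna")
set.seed(123)  # makes the shuffled order reproducible
parts <- partition_files(files, split = 0.8, shuffle = TRUE)
lengths(parts)
```

Under these assumptions, ten files with split = 0.8 give eight training files and two test files. With shuffle = FALSE the first eight files in filename order would form the training set.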

Value

named vector giving the paths of the train and test directories (names "train" and "test")
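
As an illustration of the shape of such a return value, the sketch below builds two subdirectory paths, creates them if needed, and returns them as a named character vector. The helper make_split_dirs and the split_demo directory name are hypothetical; this is a sketch of the pattern, not the package's code.

```r
# Hypothetical helper, not part of the package: create the train and test
# subdirectories if they are missing and return their paths by name.
make_split_dirs <- function(path_to_files,
                            train_folder = "train",
                            test_folder = "test") {
  dirs <- c(
    train = file.path(path_to_files, train_folder),
    test  = file.path(path_to_files, test_folder)
  )
  for (d in dirs) {
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
  }
  dirs
}

base_dir <- file.path(tempdir(), "split_demo")  # illustrative location
paths <- make_split_dirs(base_dir)
names(paths)
paths[["train"]]
```

Indexing by name with [[ ]] returns a plain path string that can be passed directly to functions such as list.files().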

Examples

set.seed(123)
# create 10 random DNA files
tmp_dir <- tempdir()
# remove any existing .fna files
# (pattern is a regular expression, not a glob)
file.remove(
  list.files(tmp_dir, pattern = "\\.fna$", full.names = TRUE)
)

for (i in seq_len(10)) {
  # one FASTA record per file: header line, then 100 random bases
  bases <- sample(c("A", "T", "C", "G"), 100, replace = TRUE)
  writeLines(
    c(paste0(">", i), paste0(bases, collapse = "")),
    file.path(tmp_dir, paste0(i, ".fna"))
  )
}

# split files into train and test directories
paths <- train_test_filesystem(tmp_dir,
                               file_ext = "fna",
                               split = 0.8,
                               shuffle = TRUE,
                               overwrite = TRUE)

list.files(paths[["train"]])
list.files(paths[["test"]])

MIC documentation built on April 12, 2025, 2:26 a.m.