splitinput: Split input data into multiple files

View source: R/utils.R

splitinput    R Documentation

Split input data into multiple files

Description

splitinput splits input based on the keepcol specified, writing separate csv files that each contain at least the minimum number of rows (except for the last split file written, which may be smaller). This allows splitting input data while ensuring that all records for each individual subject stay together in one file. Split filenames are padded with zeros out to five digits for consistency, assuming the split produces fewer than 100,000 files.
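The grouping behaviour described above can be illustrated with a minimal base R sketch. This is not the growthcleanr implementation: the function name split_by_group_sketch, the zero-based counter, and the "name.00000.csv" filename pattern are all assumptions made for illustration.

```r
# Hypothetical illustration only; not the growthcleanr source.
split_by_group_sketch <- function(df, fname, fdir,
                                  min_nrow = 10000, keepcol = "subjid") {
  if (is.na(fdir)) stop("fdir must be specified")

  # Group row indices by key so each subject's records stay together.
  # Note: split() drops rows whose key is NA.
  rows_by_key <- split(seq_len(nrow(df)), df[[keepcol]])

  file_idx <- -1        # assumed zero-based file counter
  pending <- integer(0) # row indices waiting to be written

  write_chunk <- function(rows) {
    file_idx <<- file_idx + 1
    # Zero-pad the counter to five digits (filename pattern is an assumption)
    out <- file.path(fdir, sprintf("%s.%05d.csv", fname, file_idx))
    write.csv(df[rows, , drop = FALSE], out, row.names = FALSE)
  }

  for (rows in rows_by_key) {
    pending <- c(pending, rows)
    # Only close a file at a group boundary, once the minimum is reached
    if (length(pending) >= min_nrow) {
      write_chunk(pending)
      pending <- integer(0)
    }
  }
  # The final file may hold fewer than min_nrow rows
  if (length(pending) > 0) write_chunk(pending)

  file_idx
}

# Toy usage: three subjects with 3, 2 and 4 rows, minimum of 4 rows per file
out_dir <- tempfile("split_sketch_")
dir.create(out_dir)
toy <- data.frame(subjid = c(1, 1, 1, 2, 2, 3, 3, 3, 3), value = 1:9)
last_idx <- split_by_group_sketch(toy, "toy", out_dir, min_nrow = 4)
```

Because a file is only closed after a whole group has been added, a file can exceed min_nrow, but a subject's rows are never divided between files.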

Usage

splitinput(
  df,
  fname = deparse(substitute(df)),
  fdir = NA,
  min_nrow = 10000,
  keepcol = "subjid"
)

Arguments

df

data frame to split

fname

prefix with which the name of each split file starts (defaults to the name of the df object)

fdir

directory in which to write the split files (use "." for the working directory). Must be changed from the default (NA), which triggers an error.

min_nrow

minimum number of rows for each split file (default 10000)

keepcol

name of the column (default "subjid") whose values keep records together: all rows sharing a value are written to the same split file

Value

the count (file number) of the last split file written

Examples


# Run on given data
df <- as.data.frame(syngrowth)

# Run with all defaults (specifying directory)
splitinput(df, fdir = tempdir())

# Specifying the name, directory and minimum row size
splitinput(df, fname = "syngrowth", fdir = tempdir(), min_nrow = 5000)

# Specifying a different subject ID column
colnames(df)[colnames(df) == "subjid"] <- "sub_id"
splitinput(df, fdir = tempdir(), keepcol = "sub_id")
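
After a split, it can be useful to confirm that no subject's records ended up in more than one file. The following is a minimal sketch of such a check in base R. The helper groups_intact is hypothetical and not part of growthcleanr, and it assumes the output directory contains only the split csv files.

```r
# Hypothetical helper; not part of growthcleanr.
# Returns TRUE when no keepcol value appears in more than one csv file in dir.
groups_intact <- function(dir, keepcol = "subjid", pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # Collect the unique key values found in each file
  keys_per_file <- lapply(files, function(f) unique(read.csv(f)[[keepcol]]))
  # A key that occurs twice in the combined vector spans multiple files
  anyDuplicated(unlist(keys_per_file)) == 0
}

# Toy usage with two hand-written files holding disjoint subjects
check_dir <- tempfile("split_check_")
dir.create(check_dir)
write.csv(data.frame(subjid = c(1, 1, 2), value = 1:3),
          file.path(check_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(subjid = c(3, 3), value = 4:5),
          file.path(check_dir, "b.csv"), row.names = FALSE)
intact <- groups_intact(check_dir)
```

Writing the split files to a dedicated directory rather than directly to tempdir() keeps unrelated csv files out of the check.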


growthcleanr documentation built on June 24, 2024, 5:16 p.m.