split_file: Split a File by Unique Entries in a Column
In Kmisc: Kevin Miscellaneous


Source: R/split_file.R, R/RcppExports.R

This function splits a delimited file into several files, one per unique entry in a selected column. The name of each entry is appended to the output file name (before the file extension).
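The core idea can be illustrated with a minimal in-memory sketch in base R. It is not Kmisc's implementation: the function name `split_by_column` and the `_` joining the file name and the entry are hypothetical choices, and because it reads the whole file at once it does not by itself solve the out-of-RAM case this function targets.

```r
# Minimal in-memory sketch (hypothetical; not Kmisc's code). Assumes no quoted
# or escaped delimiters, and uses "_" to join file name and entry (an assumption).
split_by_column <- function(file, column, sep = ",",
                            outDir = file.path(dirname(file), "split")) {
  lines <- readLines(file)
  # Pull out field `column` of every line to use as the grouping key.
  keys <- vapply(strsplit(lines, sep, fixed = TRUE), `[`, character(1), column)
  dir.create(outDir, showWarnings = FALSE, recursive = TRUE)
  stem <- tools::file_path_sans_ext(basename(file))
  ext  <- tools::file_ext(file)
  ext  <- if (nzchar(ext)) paste0(".", ext) else ""
  groups <- split(lines, keys)   # one character vector of lines per unique key
  for (key in names(groups)) {
    writeLines(groups[[key]], file.path(outDir, paste0(stem, "_", key, ext)))
  }
  invisible(names(groups))
}

tmp <- file.path(tempdir(), "data.csv")
writeLines(c("a,x,1", "b,y,2", "c,x,3"), tmp)
split_by_column(tmp, column = 2)
readLines(file.path(tempdir(), "split", "data_x.csv"))
# [1] "a,x,1" "c,x,3"
```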

split_file(file, column, sep = NULL, outDir = file.path(dirname(file), "split"),
  prepend = "", dots = 1, skip = 0, verbose = TRUE)

`file`	Path to the file to split.
`column`	Index of the column whose unique entries determine the split.
`sep`	The field separator; must be a single character. If `NULL` (the default), the delimiter is guessed from the first line.
`outDir`	Directory in which to write the output files.
`prepend`	A string to prepend to the output file names; typically an identifier for the column being split over.
`dots`	The number of dots that make up the file extension (e.g. `2` for `.txt.gz`). Ignored if the file name contains no dots.
`skip`	Integer; the number of rows to skip (e.g. to skip a header).
`verbose`	Logical; if `TRUE`, print messages while splitting.
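How `prepend` and `dots` might shape an output name can be illustrated with a hypothetical helper, `split_name`. The page does not say exactly where `prepend` goes or which separator joins the pieces, so this sketch assumes `<stem>_<prepend><entry><extension>`, with the last `dots` dot-separated pieces of the file name treated as the extension.

```r
# Hypothetical helper (not part of Kmisc): builds an output path as
# <outDir>/<stem>_<prepend><entry><extension>. The "_" separator and the
# position of `prepend` are assumptions.
split_name <- function(file, entry, prepend = "", dots = 1,
                       outDir = file.path(dirname(file), "split")) {
  base  <- basename(file)
  parts <- strsplit(base, ".", fixed = TRUE)[[1]]
  if (length(parts) < 2L || dots < 1) {
    # No dots in the name (or dots = 0): treat the whole name as the stem.
    stem <- base
    ext  <- ""
  } else {
    n_ext <- min(dots, length(parts) - 1L)   # last `dots` pieces form the extension
    stem  <- paste(head(parts, -n_ext), collapse = ".")
    ext   <- paste0(".", paste(tail(parts, n_ext), collapse = "."))
  }
  file.path(outDir, paste0(stem, "_", prepend, entry, ext))
}

split_name("/data/reads.txt.gz", "1", prepend = "chr", dots = 2)
# [1] "/data/split/reads_chr1.txt.gz"
split_name("/data/reads.txt.gz", "1", prepend = "chr", dots = 1)
# [1] "/data/split/reads.txt_chr1.gz"
```

The second call shows why `dots` matters for compound extensions: with `dots = 1`, only `.gz` is kept as the extension and the entry lands in the middle of the name.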

This function is meant to help in the unfortunate case where the data you are trying to read is too large to fit into RAM. Splitting the file into multiple smaller files should, ideally, leave each resulting file small enough to fit into RAM on its own.
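Since the input may not fit in RAM, the split has to stream through the file rather than read it whole. Below is a sketch of that idea in base R: it reads a fixed number of lines per chunk and appends each line to the output file for its key. The name `split_file_streaming`, the `chunk` argument, and the naming scheme are assumptions for illustration, not Kmisc's API.

```r
# Streaming sketch (hypothetical; not Kmisc's code): only `chunk` lines are
# held in memory at once. Assumes no quoted delimiters and an "_" naming scheme.
split_file_streaming <- function(file, column, sep = ",",
                                 outDir = file.path(dirname(file), "split"),
                                 skip = 0, chunk = 10000L) {
  dir.create(outDir, showWarnings = FALSE, recursive = TRUE)
  stem <- tools::file_path_sans_ext(basename(file))
  ext  <- tools::file_ext(file)
  ext  <- if (nzchar(ext)) paste0(".", ext) else ""
  con <- file(file, open = "r")
  on.exit(close(con))
  if (skip > 0) readLines(con, n = skip)   # discard e.g. a header row
  seen <- character(0)                     # keys whose output file already exists
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break         # end of input
    keys <- vapply(strsplit(lines, sep, fixed = TRUE), `[`, character(1), column)
    for (key in unique(keys[!is.na(keys)])) {
      out <- file.path(outDir, paste0(stem, "_", key, ext))
      # The first write for a key truncates any stale file; later chunks append.
      cat(paste0(lines[which(keys == key)], "\n"), file = out, sep = "",
          append = key %in% seen)
      seen <- union(seen, key)
    }
  }
  invisible(seen)
}

tmp <- file.path(tempdir(), "big.csv")
writeLines(c("id,grp,val", "1,a,10", "2,b,20", "3,a,30", "4,b,40", "5,a,50"), tmp)
split_file_streaming(tmp, column = 2, skip = 1, chunk = 2)
readLines(file.path(tempdir(), "split", "big_a.csv"))
# [1] "1,a,10" "3,a,30" "5,a,50"
```

Reopening each output file once per chunk keeps the sketch short; a faster implementation would likely keep one connection per key open, subject to the operating system's limit on open files.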

The focus is on efficiently splitting 'well-mannered' files: if your file has comments, quoted delimiters, cells containing paragraphs of Unicode text, or other wacky things, this is probably not the function for you.
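To see why quoted delimiters cause trouble for this kind of splitting, compare a naive split on the delimiter with a quote-aware parser (`utils::read.csv`, used here only for contrast):

```r
# A comma inside quotes is treated as a separator by a naive split,
# shifting every later column by one.
line  <- 'id1,"Smith, John",NY'
naive <- strsplit(line, ",", fixed = TRUE)[[1]]
length(naive)   # 4 fields instead of 3
naive[3]        # ' John"' rather than "NY"

# read.csv honours the quotes and recovers the intended three fields.
proper <- read.csv(text = line, header = FALSE)
ncol(proper)    # 3
```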

See also: `extract_rows_from_file`

Kmisc documentation built on May 29, 2017, 1:43 p.m.