disksort: Sort File On Disk
In matloff/partools: Tools for the 'Parallel' Package

disksort

R Documentation

Sort File On Disk

Description

Sorts a file on disk. This function is designed to handle files larger than memory: at most `nrows` rows are held in memory at once. The sort is not parallel. For it to work efficiently, the data falling between each pair of consecutive `breaks` must fit into memory.
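The bin sort idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `bin_sort_sketch` is a hypothetical helper, the input is assumed to be a headerless comma-separated file with a numeric sort column, and `breaks` is assumed to be sorted in increasing order.

```r
# Hypothetical helper, NOT part of partools: a simplified on-disk bin sort.
# Assumes a headerless CSV, a numeric sort column, and increasing `breaks`.
bin_sort_sketch <- function(infile, outfile, sortcolumn = 1L, breaks,
                            nrows = 1000L) {
  nb <- length(breaks) + 1L
  # One temporary file per bin; tempfile() is vectorized over `pattern`.
  binfiles <- tempfile(pattern = paste0("bin", seq_len(nb), "_"),
                       fileext = ".csv")
  on.exit(unlink(binfiles), add = TRUE)

  # Pass 1: stream the input in chunks of at most `nrows` rows and append
  # each row to the temporary file for its bin.
  con <- file(infile, open = "r")
  on.exit(close(con), add = TRUE)
  repeat {
    chunk <- tryCatch(
      read.table(con, nrows = nrows, sep = ",", header = FALSE),
      error = function(e) NULL  # read.table errors once no lines remain
    )
    if (is.null(chunk) || nrow(chunk) == 0L) break
    # Bin 1 holds values below breaks[1]; bin k + 1 holds values >= breaks[k].
    bin <- findInterval(chunk[[sortcolumn]], breaks) + 1L
    for (b in unique(bin)) {
      write.table(chunk[bin == b, , drop = FALSE], binfiles[b],
                  append = TRUE, sep = ",",
                  row.names = FALSE, col.names = FALSE)
    }
    if (nrow(chunk) < nrows) break  # a short chunk means end of input
  }

  # Pass 2: sort each bin in memory and append the bins in order.
  if (file.exists(outfile)) file.remove(outfile)
  for (f in binfiles) {
    if (!file.exists(f)) next  # skip bins that received no rows
    d <- read.table(f, sep = ",", header = FALSE)
    d <- d[order(d[[sortcolumn]]), , drop = FALSE]
    write.table(d, outfile, append = TRUE, sep = ",",
                row.names = FALSE, col.names = FALSE)
  }
  invisible(outfile)
}
```

In this sketch the first pass never holds more than `nrows` rows, while the second pass loads one whole bin at a time, which is why each bin needs to fit into memory.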

Usage

disksort(infile, outfile = NULL, sortcolumn = 1L, breaks = NULL,
  nrows = 1000L, nbins = 10L, read.table.args = NULL,
  write.table.args = NULL, cleanup = TRUE)

streambin(infile, firstchunk, sortcolumn = 1L, breaks = NULL,
  nrows = 1000L, read.table.args = NULL)

Arguments

`infile`	unsorted file-like object to read from. See `read.table`.
`outfile`	where to write the sorted file. See `write.table`. If `infile` is the name of a file then the default prepends "sorted_" to this name.
`sortcolumn`	which column of the data frame to sort on
`breaks`	vector of points at which to split the data into bins
`nrows`	maximum number of rows of the data frame held in memory at once
`nbins`	number of bins for bin sort. Ignored if `breaks` is specified.
`read.table.args`	named list of extra arguments to `read.table`
`write.table.args`	named list of extra arguments to `write.table`. Defaults to `read.table.args`, to preserve the original formatting.
`cleanup`	remove intermediate files?
`firstchunk`	first rows from `infile`
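
A call using the arguments above might look like the following sketch. The file contents, the break points, and the choice of `sep = ","` are illustrative assumptions. The call is guarded so that it runs only if an installed partools exports `disksort`, which this page alone does not guarantee.

```r
# Illustrative data: an unsorted, headerless CSV (an assumption for this sketch).
infile <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
write.table(data.frame(a = c(42, 7, 88, 15, 63), b = 1:5), infile,
            sep = ",", row.names = FALSE, col.names = FALSE)

# Run only if partools is installed and exports disksort.
if (requireNamespace("partools", quietly = TRUE) &&
    "disksort" %in% getNamespaceExports("partools")) {
  partools::disksort(
    infile,
    outfile = outfile,
    sortcolumn = 1L,                    # sort on the first column
    breaks = c(25, 50, 75),             # nbins is ignored when breaks is given
    nrows = 2L,                         # at most two rows in memory at once
    read.table.args = list(sep = ",")   # also used for writing by default
  )
}
```

When `infile` is a file name and `outfile` is left as `NULL`, the documented default writes to that name with "sorted_" prepended.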