disksort: Sort File On Disk

View source: R/disksort.R

disksort    R Documentation

Sort File On Disk

Description

This function sorts files that are too large to fit in memory. At most nrows rows are held in memory at once. It does not run in parallel. For it to work efficiently, the data falling between each pair of consecutive breaks must fit in memory.
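The memory condition follows from how a bin sort finishes: once rows have been partitioned into bins by key range, each bin can be sorted in memory by itself and appended to the output, and the concatenation is then globally sorted. The following is a minimal base R sketch of that final step, not the package's implementation. It assumes header-less, whitespace-delimited bin files that already exist, and `sort_bins_to_file` is a hypothetical helper name.

```r
# Hypothetical helper (not part of partools): sort each bin file in
# memory and append it to a single output file.
sort_bins_to_file <- function(binfiles, outfile, sortcolumn = 1L) {
  if (file.exists(outfile)) file.remove(outfile)
  # binfiles must be listed in increasing order of key range
  for (f in binfiles) {
    # The whole bin is read at once, so it has to fit in memory
    bin <- read.table(f, header = FALSE)
    bin <- bin[order(bin[[sortcolumn]]), , drop = FALSE]
    write.table(bin, outfile, append = TRUE,
                row.names = FALSE, col.names = FALSE, quote = FALSE)
  }
  invisible(outfile)
}

# Demo: two bins, split at key 50
d <- data.frame(key = c(73L, 12L, 50L, 99L, 3L, 61L), val = letters[1:6])
binfiles <- c(tempfile(), tempfile())
write.table(d[d$key < 50, ], binfiles[1],
            row.names = FALSE, col.names = FALSE, quote = FALSE)
write.table(d[d$key >= 50, ], binfiles[2],
            row.names = FALSE, col.names = FALSE, quote = FALSE)

outfile <- sort_bins_to_file(binfiles, tempfile())
sorted <- read.table(outfile)
```

If a single bin is larger than available memory, the `read.table` call on that bin is where this approach would break down, which is why the choice of breaks matters.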

Usage

disksort(infile, outfile = NULL, sortcolumn = 1L, breaks = NULL,
  nrows = 1000L, nbins = 10L, read.table.args = NULL,
  write.table.args = NULL, cleanup = TRUE)

streambin(infile, firstchunk, sortcolumn = 1L, breaks = NULL,
  nrows = 1000L, read.table.args = NULL)

Arguments

infile

unsorted file-like object to read from. See read.table.

outfile

where to write the sorted file. See write.table. If infile is the name of a file then the default prepends "sorted_" to this name.

sortcolumn

which column of the data frame to sort on

breaks

vector of cut points at which to split the data into bins

nrows

number of rows in the data.frame held in memory

nbins

number of bins for bin sort. Ignored if breaks is specified.

read.table.args

named list of extra arguments to read.table

write.table.args

named list of extra arguments to write.table. Defaults to using read.table.args to preserve the original formatting.

cleanup

remove intermediate files?

firstchunk

first rows from infile
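To illustrate how firstchunk, nbins, breaks, and read.table.args could relate to one another, here is a short base R sketch. It assumes one reasonable strategy, taking interior quantiles of the sort column in the first chunk as break points; the documentation above does not say that disksort chooses breaks this way. The file, column layout, and variable values are made up for the example.

```r
# Build a small comma-separated example file (illustrative data only)
infile <- tempfile(fileext = ".csv")
set.seed(42)
write.table(data.frame(key = round(rnorm(500), 3), id = seq_len(500)),
            infile, sep = ",", row.names = FALSE, col.names = FALSE)

# Extra reader arguments supplied as a named list
read.table.args <- list(sep = ",", header = FALSE)

# Read only the first nrows rows, forwarding the extra arguments
nrows <- 100L
firstchunk <- do.call(read.table,
                      c(list(file = infile, nrows = nrows), read.table.args))

# Assumed strategy: nbins - 1 interior quantiles of the sort column
nbins <- 10L
sortcolumn <- 1L
probs <- seq(0, 1, length.out = nbins + 1L)[-c(1L, nbins + 1L)]
breaks <- unique(quantile(firstchunk[[sortcolumn]], probs = probs,
                          names = FALSE))
```

Quantile-based breaks aim for bins of roughly equal size, provided the first chunk is representative of the whole file. If the file is already partly ordered, the first rows may be a poor sample, and supplying breaks explicitly is likely the safer choice.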

Functions

  • streambin: Stream File Into Bins

    Read a data frame, split it into bins, and write to those bins on disk.
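The binning step described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the streambin source: it assumes a header-less, whitespace-delimited input file, and `stream_into_bins`, the bin file naming scheme, and the `bindir` argument are all hypothetical.

```r
# Hypothetical helper (not part of partools): read infile in chunks of
# nrows lines and append each row to the file for its bin.
stream_into_bins <- function(infile, bindir, breaks,
                             sortcolumn = 1L, nrows = 1000L) {
  dir.create(bindir, showWarnings = FALSE)
  con <- file(infile, open = "r")
  on.exit(close(con))
  repeat {
    # On an open connection, readLines resumes where the last call stopped
    lines <- readLines(con, n = nrows)
    if (length(lines) == 0L) break
    chunk <- read.table(text = lines, header = FALSE)
    # Bin 0 holds keys below breaks[1]; bin i holds keys in
    # [breaks[i], breaks[i + 1]); the last bin holds keys >= the last break
    bin <- findInterval(chunk[[sortcolumn]], breaks)
    for (b in unique(bin)) {
      write.table(chunk[bin == b, , drop = FALSE],
                  file = file.path(bindir, sprintf("bin_%03d.txt", b)),
                  append = TRUE, row.names = FALSE, col.names = FALSE,
                  quote = FALSE)
    }
  }
  invisible(list.files(bindir, full.names = TRUE))
}

# Demo: 100 rows with keys 1..100 in random order, three break points
set.seed(1)
infile <- tempfile(fileext = ".txt")
write.table(data.frame(key = sample(100L), val = runif(100)),
            infile, row.names = FALSE, col.names = FALSE)
bins <- stream_into_bins(infile, tempfile("bins"),
                         breaks = c(26, 51, 76), nrows = 30L)
binsizes <- vapply(bins, function(f) nrow(read.table(f)), integer(1))
```

Only one chunk of at most nrows rows is held in memory at a time; the rows within each bin file remain unsorted at this stage.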


matloff/partools documentation built on Oct. 20, 2022, 2:52 p.m.