disksort: Sort File On Disk

View source: R/disksort.R

Description

Sorts a file on disk using a bin sort, so it can handle files larger than memory. At most nrows rows are read into memory at once. The sort is not parallel. For it to work efficiently, the data falling between each pair of consecutive breaks must fit into memory.
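
The idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `bin_sort_file`, its temporary bin files, and the restriction to a headerless, whitespace-separated numeric file sorted on its first column are all assumptions made for illustration.

```r
# Hypothetical helper (not part of partools): out-of-core bin sort of a
# headerless, whitespace-separated numeric file on its first column.
bin_sort_file <- function(infile, outfile, breaks, nrows = 1000L) {
  nbins <- length(breaks) + 1L
  binfiles <- replicate(nbins, tempfile(fileext = ".txt"))
  file.create(binfiles)

  con <- file(infile, open = "r")
  on.exit(close(con))

  # Pass 1: stream the input in chunks of at most nrows rows and append
  # each row to the file of the bin that its sort key falls into.
  repeat {
    chunk <- tryCatch(
      read.table(con, nrows = nrows, header = FALSE),
      error = function(e) NULL  # read.table errors once no lines remain
    )
    if (is.null(chunk) || nrow(chunk) == 0L) break
    bin <- findInterval(chunk[[1L]], breaks) + 1L
    for (b in unique(bin)) {
      write.table(chunk[bin == b, , drop = FALSE], binfiles[b],
                  append = TRUE, row.names = FALSE, col.names = FALSE)
    }
    if (nrow(chunk) < nrows) break
  }

  # Pass 2: each bin is assumed to fit in memory; sort it and append it
  # to the output. Bins are visited in increasing order of their keys.
  file.create(outfile)
  for (f in binfiles) {
    if (file.size(f) == 0) next
    d <- read.table(f, header = FALSE)
    d <- d[order(d[[1L]]), , drop = FALSE]
    write.table(d, outfile, append = TRUE,
                row.names = FALSE, col.names = FALSE)
  }
  unlink(binfiles)
  invisible(outfile)
}

# Usage on a small generated file
set.seed(1)
infile <- tempfile()
outfile <- tempfile()
write.table(data.frame(key = runif(250), val = seq_len(250)), infile,
            row.names = FALSE, col.names = FALSE)
bin_sort_file(infile, outfile, breaks = c(0.25, 0.5, 0.75), nrows = 100L)
sorted <- read.table(outfile)
```

The sketch shows why both limits matter: nrows bounds memory use during the streaming pass, while the spacing of breaks bounds memory use when each bin is sorted.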

Usage

disksort(infile, outfile = NULL, sortcolumn = 1L, breaks = NULL,
  nrows = 1000L, nbins = 10L, read.table.args = NULL,
  write.table.args = NULL, cleanup = TRUE)

streambin(infile, firstchunk, sortcolumn = 1L, breaks = NULL,
  nrows = 1000L, read.table.args = NULL)

Arguments

infile

unsorted file-like object to read from. See read.table.

outfile

where to write the sorted file. See write.table. If infile is a file name, the default prepends "sorted_" to that name.

sortcolumn

which column of the data frame to sort on

breaks

vector giving points to split data for binning

nrows

maximum number of rows of the data.frame held in memory at once

nbins

number of bins for bin sort. Ignored if breaks is specified.

read.table.args

named list of extra arguments to read.table

write.table.args

named list of extra arguments to write.table. Defaults to using read.table.args to preserve the original formatting.

cleanup

remove intermediate files?

firstchunk

first rows from infile
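
To make the relationship between breaks, nbins and firstchunk more concrete, the following base R sketch assigns sort-column values to bins. How disksort chooses break points when only nbins is given is not stated on this page; taking quantiles of firstchunk is an assumption, and `make_breaks` is a hypothetical helper.

```r
# Hypothetical helper (not part of partools): choose nbins - 1 interior
# break points from the quantiles of a sample, so that the bins hold
# roughly equal numbers of rows.
make_breaks <- function(x, nbins = 10L) {
  probs <- seq_len(nbins - 1L) / nbins
  unique(unname(quantile(x, probs = probs)))
}

# Stand-in for the first rows read from infile
firstchunk <- data.frame(key = c(5, 1, 9, 3, 7, 2, 8, 4, 6, 10))
breaks <- make_breaks(firstchunk$key, nbins = 2L)

# findInterval() returns 0 for values below the first break, so adding 1
# gives a bin index between 1 and length(breaks) + 1.
bin <- findInterval(firstchunk$key, breaks) + 1L
counts <- table(bin)
```

Passing breaks explicitly makes nbins irrelevant, which matches the note that nbins is ignored when breaks is specified.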

Functions


matloff/partools documentation built on May 21, 2019, 12:56 p.m.