LS_sample_exact: Exact File Sampler


View source: R/readers.r

Description

Randomly sample exactly nlines lines from an input text file.

Usage

LS_sample_exact(verbose, header, nskip, nlines, infile, outfile = tempfile())

Arguments

verbose

Logical; indicates whether or not the line count of the input file and the number of lines sampled should be printed.

header

Logical; indicates whether or not the input (csv) file has a header line.

nskip

Number of lines to skip. If header=TRUE, then this only applies to lines after the header.

nlines

The (exact) number of lines to sample from the input file.

infile

Path (as a string) to the file to be subsampled.

outfile

Output file. Default is a temporary file.

Details

The sampling is done in two passes over the input file. First, the number of lines in the input file is determined by scanning through the file as quickly as possible (i.e., this step should be entirely I/O bound). Next, an index of the lines to keep is produced by reservoir sampling. Finally, the input file is scanned again line by line, and the chosen lines are written to the output file (a temporary file by default).
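To make the strategy concrete, here is a minimal pure-R sketch of the same two-pass idea. It is an illustration only, not the package's own implementation: the helper names count_lines, reservoir_indices, and sample_exact_sketch are hypothetical; the verbose, header, and nskip arguments are left out; and the sampling step assumes the classic reservoir "Algorithm R" applied to the line indices 1..N once N is known.

```r
# Hypothetical sketch of the two-pass exact sampler (not lineSampler's code).

# Pass 1: count lines by streaming the file in chunks.
count_lines <- function(path, chunk = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break
    n <- n + length(lines)
  }
  n
}

# Reservoir sampling (Algorithm R) over indices 1..N:
# returns exactly k distinct indices, sorted ascending.
reservoir_indices <- function(N, k) {
  stopifnot(k <= N)
  res <- seq_len(k)                  # fill the reservoir with the first k indices
  if (N > k) {
    for (i in (k + 1L):N) {
      j <- sample.int(i, 1L)         # uniform draw from 1..i
      if (j <= k) res[j] <- i        # replace with probability k / i
    }
  }
  sort(res)
}

# Pass 2: stream the file again and write only the chosen lines.
sample_exact_sketch <- function(infile, k, outfile = tempfile()) {
  keep <- reservoir_indices(count_lines(infile), k)
  incon <- file(infile, open = "r")
  on.exit(close(incon), add = TRUE)
  outcon <- file(outfile, open = "w")
  on.exit(close(outcon), add = TRUE)
  lineno <- 0L
  pos <- 1L                          # next entry of 'keep' to look for
  while (pos <= length(keep)) {
    line <- readLines(incon, n = 1L)
    if (length(line) == 0L) break
    lineno <- lineno + 1L
    if (lineno == keep[pos]) {
      writeLines(line, outcon)
      pos <- pos + 1L
    }
  }
  invisible(outfile)
}
```

Because the kept indices are sorted, the second pass can stop as soon as the last chosen line has been written, and only the k indices (never the file contents) are held in memory.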

If the output file (outfile) is "large" and is meant to be read into memory (which isn't really appropriate for text files in the first place!), then this strategy is probably not appropriate.

Value

NULL

See Also

LS_sample_prob


wrathematics/lineSampler documentation built on May 13, 2018, 11:19 a.m.