file_sample_prop: Proportional File Sampler


View source: R/file_sample_prob.r

Description

Randomly sample lines from an input text file.

Usage

file_sample_prop(
  p,
  infile,
  outfile = tempfile(),
  header = TRUE,
  nskip = 0,
  nmax = 0,
  verbose = FALSE
)

Arguments

p

Proportion to retain; should be a numeric value between 0 and 1.

infile

Location of the file (as a string) to be subsampled.

outfile

Output file location (as a string).

header

Is the first line of the csv file a header (a line of column names)?

nskip

Number of lines to skip. If header=TRUE, the skip applies only to lines after the header.

nmax

Maximum number of lines to read. If nmax==0, there is no read cap.

verbose

Should the line count of the input file and the number of lines sampled be printed?
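
As an illustration of how these arguments fit together, the following is a minimal sketch of a call. It assumes the lineSampler package is installed; the input data, the file names, and the value p = 0.1 are made up for the example, and the call is skipped if the package is not available.

```r
# Build a small csv file to sample from (illustrative data only).
infile <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)), infile,
          row.names = FALSE)

# Where the sampled lines should be written.
outfile <- tempfile(fileext = ".csv")

# Assumes lineSampler is installed; otherwise nothing is sampled.
if (requireNamespace("lineSampler", quietly = TRUE)) {
  # Ask for roughly 10% of the data lines, treating line 1 as a header.
  lineSampler::file_sample_prop(
    p = 0.1,
    infile = infile,
    outfile = outfile,
    header = TRUE,
    verbose = TRUE
  )

  # The sampled lines are in outfile and can be read back as usual.
  sampled <- read.csv(outfile)
  str(sampled)
}
```

Passing outfile explicitly, rather than relying on the tempfile() default, keeps the path of the sampled file available after the call.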

Details

The sampling is done in a single pass over the input file: sampled lines are written to outfile (a temporary file by default) as the input is read.

If the output file (outfile) is "large" and is to be read into memory (which isn't really appropriate for text files in the first place!), then this strategy is probably not appropriate.
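
The one-pass strategy described above can be sketched in base R. This is not the package's implementation: sample_lines_sketch is a hypothetical helper, and it assumes that each data line is kept independently with probability p and that the header line, when present, is always copied.

```r
# Hypothetical helper, NOT part of lineSampler: a base-R sketch of
# one-pass proportional sampling under the assumptions stated above.
sample_lines_sketch <- function(p, infile, outfile = tempfile(),
                                header = TRUE) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)

  con_in <- file(infile, open = "r")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, open = "w")
  on.exit(close(con_out), add = TRUE)

  # Assumption: the header is always copied to the output.
  if (header) {
    first <- readLines(con_in, n = 1L)
    if (length(first) == 1L) writeLines(first, con_out)
  }

  kept <- 0L
  repeat {
    # Read one line at a time so the input is never held in memory.
    line <- readLines(con_in, n = 1L)
    if (length(line) == 0L) break

    # Keep the line with probability p and write it out immediately.
    if (runif(1) < p) {
      writeLines(line, con_out)
      kept <- kept + 1L
    }
  }

  # Return the number of data lines kept (a choice made for this sketch).
  invisible(kept)
}

# Usage: sample about 10% of 1000 data lines below a header.
infile <- tempfile(fileext = ".csv")
writeLines(c("x", as.character(1:1000)), infile)
outfile <- tempfile(fileext = ".csv")
set.seed(1)
sample_lines_sketch(0.1, infile, outfile)
sampled <- readLines(outfile)
```

Because each line is decided on as it is read, the number of lines retained is random and only approximately p times the number of data lines.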

Value

NULL


wrathematics/lineSampler documentation built on Feb. 27, 2020, 8:01 p.m.