sample_lines: Read Sample Lines of Text File

View source: R/sample_lines.r

Description

The function reads approximately p*nlines lines of a flat text file, where nlines is the number of lines in the file. So if p=.1, the result holds roughly (but probably not exactly) 10% of the file's lines.
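To see why the count is only approximate, here is a small base R sketch. It assumes each line is kept independently with probability p, which is how the Details section reads but is not confirmed against the package source. The values of nlines and p are made up for illustration.

```r
# Assumption: every line is kept independently with probability p,
# so the number of retained lines follows Binomial(nlines, p).
nlines <- 1000L
p <- 0.1

expected_kept <- nlines * p                   # mean of the binomial
spread_kept   <- sqrt(nlines * p * (1 - p))   # its standard deviation

# Simulate the retained-line count for five independent runs.
set.seed(42)
kept_counts <- rbinom(5, size = nlines, prob = p)
print(kept_counts)
```

Under that assumption, repeated calls on the same file return a different number of lines each time, scattered around nlines*p.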

Usage

sample_lines(file, n = -1L, p = 0.1, nskip = 0, nmax = 0, verbose = FALSE, ...)

Arguments

file

Location of the file (as a string) to be subsampled.

n

As in readLines().

p

Proportion to retain; should be a numeric value between 0 and 1.

nskip

Number of lines to skip.

nmax

Max number of lines to read. If nmax==0, then there is no read cap.

verbose

Logical; indicates whether to print the line count of the input file and the number of lines sampled.

...

Additional arguments passed to readLines().

Details

This function scans over the text of the input file and, at each step, randomly chooses whether or not to include the current line in a downsampled file. Each selected line is written to a temporary file, which is then read into R via readLines(). Additional arguments to this function (those other than file, p, nskip, nmax, and verbose) are passed to readLines(), so if their behavior is unclear, you should consult the readLines() help file.
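The procedure described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: sketch_sample_lines() is a hypothetical helper, it ignores n, nskip, nmax, and verbose, and it assumes each line is kept independently with probability p.

```r
# Hypothetical base R helper illustrating the scan-and-keep idea.
sketch_sample_lines <- function(path, p = 0.1) {
  stopifnot(is.numeric(p), length(p) == 1L, p >= 0, p <= 1)

  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)   # remove the temporary file when done

  input  <- file(path, open = "r")
  output <- file(tmp, open = "w")
  repeat {
    line <- readLines(input, n = 1L, warn = FALSE)
    if (length(line) == 0L) break                # end of the input file
    if (runif(1) < p) writeLines(line, output)   # keep with probability p
  }
  close(input)
  close(output)

  # Read the downsampled file back into R as a character vector.
  readLines(tmp)
}

# Usage on a throwaway 1000-line file.
demo <- tempfile()
writeLines(sprintf("row %d", 1:1000), demo)
set.seed(1)
sampled <- sketch_sample_lines(demo, p = 0.1)
length(sampled)   # close to, but usually not exactly, 100
```

Reading one line at a time keeps memory use proportional to the sample rather than to the whole file, which is the point of sampling during the scan instead of reading everything and subsetting afterwards.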

If verbose=TRUE, then something like:

Read 12207 lines (0.001%) of 12174948 line file.

will be printed to the terminal. The header line, if there is one, is counted both in the number of lines read and in the file's total line count.

Value

A character vector, as with readLines().

Examples

library(filesampler)
file = system.file("rawdata/small.csv", package="filesampler")
sample_lines(file, p=.05)

wrathematics/lineSampler documentation built on Feb. 27, 2020, 8:01 p.m.