csvSample: Sample observations from a CSV file


csvSample R Documentation

Sample observations from a CSV file

Description

This function provides a fast mechanism for sampling observations (lines) from a local CSV file. It uses compiled code to read individual lines and returns only the ones of interest. It is fast and uses little memory because it reads one line at a time. Compare this with the alternative of using readLines with a connection.

Because the contents of the lines are not interpreted, the function can also be used on non-CSV text files.
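For comparison, the readLines-and-connection approach mentioned above can be sketched in base R. This is only an illustration of the general technique, not the package's compiled implementation; the helper name sampleLinesBaseR and its arguments are hypothetical.

```r
# Hypothetical helper, not part of FastCSVSample: pick specific lines from a
# text file in base R, reading one line at a time through a connection.
sampleLinesBaseR <- function(path, rows) {
  wanted <- sort(unique(rows))           # visit line numbers in file order
  out <- character(length(wanted))
  con <- file(path, open = "r")
  on.exit(close(con))
  lineNo <- 0L
  found <- 0L
  while (found < length(wanted)) {
    line <- readLines(con, n = 1L, warn = FALSE)
    if (length(line) == 0L) break        # reached end of file
    lineNo <- lineNo + 1L
    if (lineNo == wanted[found + 1L]) {
      found <- found + 1L
      out[found] <- line
    }
  }
  out[seq_len(found)]                    # lines are returned in file order
}

# Small demonstration on a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b", "3,c", "4,d"), tmp)
picked <- sampleLinesBaseR(tmp, rows = c(4, 2))
print(picked)
```

As in the function documented here, the lines are treated as opaque strings, so the same sketch works on any line-oriented text file.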

Usage

csvSample(file, n, rows = sample(1:numRows, n), numRows = getNumLines(file), randomize = FALSE, header = TRUE)
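The default for rows in the signature above is built from n and numRows. A minimal sketch of that computation in base R follows; length(readLines(...)) is used here only as a stand-in for getNumLines, and whether the package's line count includes the header line is not stated in this page, so treat that detail as an assumption.

```r
# Build a small temporary file to count lines in.
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b", "3,c", "4,d"), tmp)

# Stand-in for getNumLines(tmp): count lines by reading the whole file.
# (This is exactly the memory cost the compiled code is meant to avoid.)
numRows <- length(readLines(tmp))

# Same expression as the default value of `rows` in the Usage line:
# n distinct line numbers drawn without replacement from 1..numRows.
n <- 3
set.seed(1)                              # only to make the sketch repeatable
rows <- sample(1:numRows, n)
print(rows)
```

Supplying rows explicitly bypasses this step, which is why numRows is not needed in that case.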

Arguments

file

the name of the local file containing the lines/observations of interest

n

the number of lines to sample from the file

rows

a vector of numbers identifying the lines/observations to sample

numRows

the total number of observations (lines) in the file. This is used only to generate the line numbers to sample, so it is ignored if rows is specified.

randomize

a logical value controlling whether to randomly permute the result or leave the vector of lines in the order in which they were sampled.

header

a logical value that indicates whether the file has an initial line that contains the variable names, i.e. the header for the CSV file.

Value

A character vector containing the sampled lines.
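Since the result is a character vector of raw lines, it usually has to be parsed before analysis. One possible way to do this with base R's read.csv is sketched below; the vector sampled is a made-up stand-in for the function's return value, and reading the header line separately is an assumption of this sketch rather than documented behavior.

```r
# Build a small temporary CSV file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b", "3,c", "4,d"), tmp)

# Stand-in for the function's return value; with the package loaded this
# would come from a call such as csvSample(tmp, n = 2).
sampled <- c("3,c", "1,a")

# Read the header line on its own, then parse header + sampled lines.
header <- readLines(tmp, n = 1L)
df <- read.csv(text = c(header, sampled), stringsAsFactors = FALSE)
print(df)
```

Passing the lines through the text argument avoids writing the sample to an intermediate file.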

Author(s)

Duncan Temple Lang

See Also

read.csv, file


duncantl/FastCSVSample documentation built on Nov. 23, 2023, 4:21 p.m.