sample_csv: Read Sample of CSV
In wrathematics/lineSampler: File Sampler


The function reads (as CSV) approximately p*nlines lines, where p is the proportion given by `param` and nlines is the number of lines in the file. So if p=.1, we get roughly (probably not exactly) 10% of the data. This is the subsampling analogue of the base R function read.csv().

sample_csv(
  file,
  param,
  method = "proportional",
  reader = utils::read.csv,
  header = TRUE,
  nskip = 0,
  nmax = 0,
  verbose = FALSE,
  ...
)

`file`	Location of the file (as a string) to be subsampled.
`param`	The downsampling parameter. For the "proportional" method, this is the proportion to retain and should be a numeric value between 0 and 1. For the exact method, this is the total number of lines to read in.
`method`	A string indicating the type of read method to use. Options are "proportional" and "exact".
`reader`	A function specifying the reader to use. The default is `utils::read.csv`. Other options include `data.table::fread()` and `readr::read_csv()`. Note that the first argument of the reader should be the file to read in and the second should be the `header`/`col_names` argument. Using `fread()` would therefore require writing a small wrapper.
`header`	Is a header (line of column names) on the first line of the csv file?
`nskip`	Number of lines to skip. If `header=TRUE`, then this only applies to lines after the header.
`nmax`	Max number of lines to read. If nmax==0, then there is no read cap. Ignored if `method="exact"`.
`verbose`	Should linecounts of the input file and the number of lines sampled be printed?
`...`	Additional arguments passed to the csv reader.
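
The `reader` contract described above (file first, `header` second) can be met for `data.table::fread()` with a thin wrapper. The following is a minimal sketch, not part of the package; the name `fread_reader` is hypothetical, and it assumes data.table is installed when the wrapper is actually called.

```r
# Hypothetical wrapper (not provided by the package): reorders the arguments
# so that the file comes first and header second, as sample_csv() expects.
fread_reader <- function(file, header, ...) {
  # Pass both by name, since header is not the second positional
  # argument of data.table::fread().
  data.table::fread(file = file, header = header, ...)
}

# Possible usage, assuming data.table and the sampler package are installed:
# data = sample_csv(file, param=.05, reader=fread_reader)
```

Because `readr::read_csv()` already takes the file first and `col_names` second, it should not need a wrapper of this kind.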

This function scans over the text of the input file and, at each line, randomly chooses whether or not to include the current line in a downsampled file. Each selected line is written to a temporary file, which is then read into R via the reader (read.csv() by default). Additional arguments to this function (those passed through `...`) are handed to the reader, so if their behavior is unclear, you should examine the help file for read.csv() or for whichever reader you supplied.
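
To make the procedure concrete, here is a rough pure-R sketch of the proportional method. It is illustrative only and is not the package's implementation: the name `sample_csv_sketch` is hypothetical, it loads the whole file into memory with readLines() rather than scanning it line by line, and it assumes the header is always kept.

```r
# Illustrative sketch only; not the package's actual implementation.
sample_csv_sketch <- function(file, p, header = TRUE, ...) {
  lines <- readLines(file)

  # Independent random draw per line: keep each line with probability p.
  keep <- stats::runif(length(lines)) < p

  # Assumption: the header line is always retained.
  if (header && length(lines) > 0) keep[1] <- TRUE

  # Write the selected lines to a temporary file, then read that file.
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp))
  writeLines(lines[keep], tmp)

  utils::read.csv(tmp, header = header, ...)
}
```

Since each line is kept or dropped independently, the number of rows returned varies from run to run, which is why the result is only approximately p*nlines lines.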

If verbose=TRUE, then something like:

Read 12207 lines (0.001%) of 12174948 line file.

will be printed to the terminal. This counts the header (if there is one) as one of the lines read and as one of the lines possible.

A data frame, as with read.csv().

library(filesampler)
file = system.file("rawdata/small.csv", package="filesampler")

# Read in a 5% random subsample of the rows.
data = sample_csv(file, param=.05)

# Read in 10 randomly sampled rows.
data = sample_csv(file, param=10, method="exact")