View source: R/file_sample_exact.r
Randomly sample lines from an input text file.
file_sample_exact(
  nlines,
  infile,
  outfile = tempfile(),
  header = TRUE,
  nskip = 0,
  verbose = FALSE
)
nlines | The (exact) number of lines to sample from the input file.
infile | Location of the file (as a string) to be subsampled.
outfile | Output file location (as a string).
header | Is there a header (a line of column names) on the first line of the csv file?
nskip | Number of lines to skip. If
verbose | Should the line count of the input file and the number of lines sampled be printed?
The sampling is done in two passes over the input file. First, the number of lines in the input file is determined by scanning through the file as quickly as possible (i.e., this pass should be completely I/O bound). Next, an index of lines to keep is produced by reservoir sampling. Finally, the input file is scanned again line by line, and the chosen lines are written to the output file (a temporary file by default).
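The two-pass strategy described above can be sketched in plain R. This is only an illustration, not the package's implementation: `reservoir_indices()` and `sample_lines_two_pass()` are hypothetical helper names, the chunk size of 10000 lines is arbitrary, and `nskip` is not handled.

```r
# Hypothetical helper: pick k of the indices 1..n by reservoir sampling
# (Algorithm R), returned in increasing order.
reservoir_indices <- function(n, k) {
  stopifnot(k >= 0, k <= n)
  res <- seq_len(k)                      # start with the first k indices
  if (n > k) {
    for (i in (k + 1):n) {
      j <- sample.int(i, 1)              # uniform draw from 1..i
      if (j <= k) res[j] <- i            # replace a slot with probability k/i
    }
  }
  sort(res)
}

# Hypothetical helper: sample exactly `nlines` data lines from `infile`.
sample_lines_two_pass <- function(infile, outfile, nlines, header = TRUE) {
  # Pass 1: count lines in chunks so memory use stays bounded.
  con <- file(infile, open = "r")
  total <- 0L
  repeat {
    chunk <- readLines(con, n = 10000L)
    if (length(chunk) == 0L) break
    total <- total + length(chunk)
  }
  close(con)

  ndata <- total - as.integer(header)    # lines eligible for sampling
  keep <- reservoir_indices(ndata, nlines)

  # Pass 2: rescan the file and write only the chosen lines.
  con_in <- file(infile, open = "r")
  on.exit(close(con_in))
  con_out <- file(outfile, open = "w")
  on.exit(close(con_out), add = TRUE)

  if (header) writeLines(readLines(con_in, n = 1L), con_out)
  pos <- 0L                              # data lines consumed so far
  repeat {
    chunk <- readLines(con_in, n = 10000L)
    if (length(chunk) == 0L) break
    idx <- pos + seq_along(chunk)
    writeLines(chunk[idx %in% keep], con_out)
    pos <- pos + length(chunk)
  }
  invisible(outfile)
}
```

Because the line count is known before any line is chosen, the output always contains exactly `nlines` sampled lines (plus the header, if present), in their original order.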
If the output file (the one pointed to by the return of this function) is "large" and is to be read into memory (which isn't really appropriate for text files in the first place!), then this strategy is probably not the right one.
NULL