A lazy.frame is a data frame promise. It presents a delimited text file
as a kind of simple read-only data frame, without initially loading the file
into memory. Lazy frames load data from their backing files on demand. Lazy
frames are useful for quickly and efficiently extracting subsets from large csv
and other text files. They support normal and gzip-compressed files.
file | A gzip-compressed or uncompressed text file, or another lazy.frame object.
sep | The column delimiter character.
gz | TRUE indicates gzip-compressed file, FALSE an uncompressed file.
skip | Number of lines to skip at the top of the file (see
stringsAsFactors | Strings cannot automatically be converted to factor variables by lazy.frame since data is loaded on demand. If you specify TRUE, you must also explicitly set the column factor levels manually.
header | TRUE if the first line of the file should be read as column names, FALSE otherwise (see
... | Other arguments are passed directly to
Lazy frames express raw text files as data frames, invoking read.table
as required. Because the file contents are not loaded until accessed,
lazy.frames are a fast and memory efficient way to extract subsets from
medium to large text files (for example with tens of millions of rows).
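The on-demand idea can be illustrated with base R alone. The sketch below is not how lazy.frame is implemented; it only assumes that a row range can be fetched from a delimited file by passing skip and nrows to read.table. The helper name read_rows is hypothetical.

```r
# Sketch: fetch only rows first..last of a delimited file on demand.
# read_rows is a hypothetical helper, not part of lazy.frame.
read_rows <- function(file, first, last, sep = ",", header = TRUE) {
  # Read the header plus one row just to recover the column names.
  col_names <- if (header) {
    names(read.table(file, sep = sep, header = TRUE, nrows = 1))
  } else NULL
  # Skip the header line (if any) and the data rows before `first`.
  skip <- (first - 1) + as.integer(header)
  out <- read.table(file, sep = sep, header = FALSE, skip = skip,
                    nrows = last - first + 1, stringsAsFactors = FALSE)
  if (!is.null(col_names)) names(out) <- col_names
  out
}

f <- tempfile()
write.table(iris, file = f, sep = ",", col.names = TRUE, row.names = FALSE)
rows <- read_rows(f, 5, 7)   # only three data rows are parsed into memory
print(rows)
unlink(f)
```

Note that read.table still has to scan past the skipped lines, so a sketch like this gets slower for rows deep in the file.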
Lazy frames are read only. They support gzip-compressed and uncompressed text files.
Indexing operations generally follow standard array indexing with a few exceptions:
Only positive indices are allowed.
A missing row index requires specification of a single column for use in some basic comparison operations discussed below.
Lazy frames don't yet support the dollar sign column selector.
Otherwise, specify row and column indices like those for normal data.frames.
Because lazy.frames load data on demand, the default setting of
stringsAsFactors is FALSE (see help for data.frame for more information).
See the column_attr function for an example of working with factors.
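To see why factor levels have to be supplied by hand, consider a subset that contains only some of a column's values. The base R sketch below does not use lazy.frame or column_attr; it assumes the full set of levels is known in advance, and the variable names are illustrative.

```r
# A subset loaded on demand sees only part of a column, so levels
# inferred from it would be incomplete.
subset_rows <- iris[c(5, 15, 25), ]
subset_rows$Species <- as.character(subset_rows$Species)
inferred <- levels(factor(subset_rows$Species))   # only what the subset saw

# Supply the complete level set explicitly (assumed known in advance).
all_levels <- c("setosa", "versicolor", "virginica")
subset_rows$Species <- factor(subset_rows$Species, levels = all_levels)
print(inferred)
print(levels(subset_rows$Species))
```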
Lazy frames provide a few very basic comparison operations that work quickly on
single columns and act like the which function. Presently supported
operations are ==, !=, <=, >=, <, and >, and may only compare a single column
with a scalar numeric or integer value, or with a character string. See below
for an example.
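For readers unfamiliar with which, the following base R sketch shows the semantics being described: a comparison of one column against a scalar yields integer row positions, which are then used as a row index. It runs on an ordinary in-memory data.frame and is only an analogy for the lazy.frame behavior.

```r
# Compare a single column with a scalar and get row positions back.
df <- iris
idx <- which(df[, 1] < 4.5)    # integer positions of matching rows
hits <- df[idx, ]              # use the positions as a row index

# String comparison against a single column works the same way.
idx_s <- which(as.character(df[, 5]) == "versicolor")
print(dim(hits))
print(length(idx_s))
```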
A lazy.frame object is returned.
I often just need to quickly filter row subsets or sample out of large files. Lazy frames are intended to do that quickly and efficiently.
Lazy frames can in principle index data files with more than 2^31 rows (returned subsets must conform to R's indexing limits, of course). However, the internal indexing scheme needs to become more efficient before handling such large text files is practical. A future version may improve this. The present version is well-suited to text files with millions to hundreds of millions of rows.
This package was inspired by the mmap and bigmemory packages.
B. W. Lewis <blewis@illposed.net>
data(iris)
f = tempfile()
write.table(iris, file=f, sep=",",col.names=TRUE,row.names=FALSE)
x = lazy.frame(f, header=TRUE)
# Subsetting
print(x[c(5,15,25),])
# Quickly apply basic numeric comparisons to a column
print(x[x[,1]<4.5,])
# Basic string and integer comparisons work too. Note that they are faster
# than numeric or integer comparisons.
v = x[x[,5]=="versicolor",]
print(dim(v))
unlink(f)