read_parquet_page (R Documentation)
Read a page from a Parquet file
read_parquet_page(file, offset)
file: Path to a Parquet file.
offset: Integer offset of the start of the page in the file.
See |
Named list. Many entries correspond to the columns of the result of read_parquet_pages(). Additional entries are:
codec: compression codec. Possible values:
has_repetition_levels: whether the page has repetition levels.
has_definition_levels: whether the page has definition levels.
schema_column: which schema column the page corresponds to. Note that only leaf columns have pages.
data_type: low level Parquet data type. Possible values:
repetition_type: whether the column the page belongs to is REQUIRED, OPTIONAL or REPEATED.
page_header: the bytes of the page header in a raw vector.
num_null: number of missing (NA) values. Only set in V2 data pages.
num_rows: this is the same as num_values for flat tables, i.e. files without repetition levels.
compressed_data: the data of the page in a raw vector. It includes repetition and definition levels, if any.
data: the uncompressed data, if nanoparquet supports the compression codec of the file (GZIP and SNAPPY at the time of writing), or if the file is not compressed. In the latter case it is the same as compressed_data.
read_parquet_pages() for a summary of all pages.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
nanoparquet:::read_parquet_pages(file_name)
options(max.print = 100) # limit output; the raw vectors would otherwise print in full
nanoparquet:::read_parquet_page(file_name, 4L)
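The returned list can also be inspected entry by entry rather than printed whole. The following is a minimal sketch, not part of the package's own examples: it assumes nanoparquet is installed, reuses the bundled example file and the offset 4L from the example above, and assumes that the data entry is NULL when the codec is not supported.

```r
# Sketch only: assumes nanoparquet is installed and that a page header
# starts at byte offset 4 of the bundled file, as in the example above.
file_name <- system.file("extdata/userdata1.parquet", package = "nanoparquet")
page <- nanoparquet:::read_parquet_page(file_name, 4L)

# Scalar metadata entries, printed without dumping the raw vectors.
print(page$codec)
print(page$data_type)
print(page$repetition_type)

# Sizes in bytes of the raw-vector entries.
print(length(page$page_header))
print(length(page$compressed_data))

# Assumption: `data` is NULL when the codec is not supported.
if (!is.null(page$data)) {
  print(length(page$data))
  # Per the description of `data`, this should be TRUE only when the
  # file is not compressed.
  print(identical(page$data, page$compressed_data))
}
```

Printing lengths instead of the vectors themselves avoids the need to lower max.print.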