View source: R/write-parquet.R
append_parquet (R Documentation)
The schema of the data frame must be compatible with the schema of the file.
append_parquet(
x,
file,
compression = c("snappy", "gzip", "zstd", "uncompressed"),
encoding = NULL,
row_groups = NULL,
options = parquet_options()
)
Arguments:

x: Data frame to append.

file: Path to the output file.

compression: Compression algorithm to use for the newly written data. See write_parquet().

encoding: Encoding to use for the newly written data. It does not have to be the same as the encoding of the data in file.

row_groups: Row groups of the new, extended Parquet file.

options: Nanoparquet options, for the new data, see parquet_options().
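A minimal sketch of a typical call, assuming the nanoparquet package is installed; the temporary file path and the use of mtcars are illustrative:

```r
library(nanoparquet)

# Write the first half of a data frame to a new Parquet file.
tmp <- tempfile(fileext = ".parquet")
write_parquet(mtcars[1:16, ], tmp)

# Append the second half; its schema is compatible because it comes
# from the same data frame.
append_parquet(mtcars[17:32, ], tmp)

# The file now contains all 32 rows.
nrow(read_parquet(tmp))
```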
This function is not atomic! If it is interrupted, it may leave the file in a corrupt state. To work around this, create a copy of the original file, append the new data to the copy, and then rename the extended copy to the original file name.
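The copy-then-rename workaround described above can be sketched as a small wrapper. This is an illustrative helper (the function name and temporary-file naming are assumptions, not part of nanoparquet):

```r
library(nanoparquet)

safe_append_parquet <- function(x, file) {
  # Append to a copy, so an interrupted append cannot corrupt `file`.
  tmp <- paste0(file, ".tmp")
  file.copy(file, tmp, overwrite = TRUE)
  append_parquet(x, tmp)
  # Renaming is atomic on most file systems: `file` ends up as either
  # the old version or the fully appended one, never a partial write.
  file.rename(tmp, file)
}
```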
A Parquet file may be partitioned into multiple row groups, and indeed most large Parquet files are. append_parquet() is only able to update the existing file along the row group boundaries. There are two possibilities:

1. append_parquet() keeps all existing row groups in file, and creates new row groups for the new data. This mode can be forced by the keep_row_groups option in options, see parquet_options().

2. Alternatively, append_parquet() will overwrite the last row group in file, with its existing contents plus (the beginning of) the new data. This mode makes more sense if the last row group is small, because many small row groups are inefficient.

By default append_parquet() chooses between the two modes automatically, aiming to create row groups with at least num_rows_per_row_group rows (see parquet_options()). You can customize this behavior with the keep_row_groups option and the row_groups argument.
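For example, the first mode can be forced through the options argument. A hedged sketch, assuming an existing "data.parquet" file with a schema compatible with the appended rows:

```r
library(nanoparquet)

# Keep all existing row groups intact and write the new data into
# fresh row groups, instead of rewriting the last row group.
append_parquet(
  mtcars[1:5, ],   # illustrative data frame to append
  "data.parquet",
  options = parquet_options(keep_row_groups = TRUE)
)
```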
See also: write_parquet().