A Scanner iterates over a Dataset's fragments and returns data
according to given row filtering and column projection. A
ScannerBuilder can help create one.
Scanner$create() wraps the
ScannerBuilder interface to make a Scanner.
It takes the following arguments:
dataset: A Dataset or arrow_dplyr_query object, as returned by the
dplyr methods on Dataset.
projection: A character vector of column names to select columns, or a
named list of expressions.
filter: An Expression to filter the scanned rows by, or TRUE (default)
to keep all rows.
use_threads: logical: should scanning use multithreading? Default TRUE.
...: Additional arguments, currently ignored
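As a minimal sketch of the arguments above, the following builds a Scanner in one call. It assumes the arrow package is installed with dataset support; the temporary directory, the choice of the mtcars columns, and the hp > 100 filter are illustrative only.

```r
library(arrow)

# Illustrative setup: write mtcars as a partitioned dataset in a temp directory
tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl")
ds <- open_dataset(tf)

# One-step construction: select two columns and keep rows with hp > 100
scanner <- Scanner$create(
  ds,
  projection = c("mpg", "hp"),
  filter = Expression$field_ref("hp") > 100,
  use_threads = TRUE
)

# Evaluate the scan and convert the resulting Table to a data.frame
result <- as.data.frame(scanner$ToTable())

# Clean up the temporary dataset directory
unlink(tf, recursive = TRUE)
```

Passing projection and filter here is intended to be equivalent to calling $Project() and $Filter() on a ScannerBuilder before $Finish().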
ScannerBuilder has the following methods:
$Project(cols): Indicate that the scan should only return columns given
cols, a character vector of column names or a named list of Expression.
$Filter(expr): Filter rows by an Expression.
$UseThreads(threads): logical: should the scan use multithreading?
The method's default input is
TRUE, but you must call the method to enable
multithreading because the scanner default is FALSE.
$BatchSize(batch_size): integer: Maximum row count of scanned record
batches, default is 32K. If scanned record batches are overflowing memory
then this method can be called to reduce their size.
$schema: Active binding, returns the Schema of the Dataset
$Finish(): Returns a Scanner
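The builder methods listed above can be combined before calling $Finish(). The sketch below is illustrative rather than definitive: it assumes the arrow package is installed with dataset support, and the batch size of 10 is an arbitrary value chosen only to show the call.

```r
library(arrow)

# Illustrative setup: a small partitioned dataset in a temp directory
tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl")
ds <- open_dataset(tf)

builder <- ds$NewScan()

# Multithreading is only enabled if the method is called
builder$UseThreads()

# Cap the number of rows per scanned record batch (arbitrary example value)
builder$BatchSize(10L)

# Inspect the Schema of the underlying Dataset via the active binding
dataset_schema <- builder$schema

# Finalize the configuration and run the scan
scanner <- builder$Finish()
tab <- scanner$ToTable()

# Clean up the temporary dataset directory
unlink(tf, recursive = TRUE)
```

Reducing the batch size trades some throughput for a smaller memory footprint per batch, which is the situation $BatchSize() is described as addressing.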
Scanner's main method is $ToTable(), which evaluates the
query and returns an Arrow Table. The example below also uses
$ToRecordBatchReader(), which returns the results as a RecordBatchReader.
# Set up directory for examples
tf <- tempfile()
dir.create(tf)
# recursive = TRUE is needed for unlink() to delete a directory
on.exit(unlink(tf, recursive = TRUE))

write_dataset(mtcars, tf, partitioning = "cyl")

ds <- open_dataset(tf)

scan_builder <- ds$NewScan()
scan_builder$Filter(Expression$field_ref("hp") > 100)
scan_builder$Project(list(hp_times_ten = 10 * Expression$field_ref("hp")))

# Once configured, call $Finish()
scanner <- scan_builder$Finish()

# Can get results as a table
as.data.frame(scanner$ToTable())

# Or as a RecordBatchReader
scanner$ToRecordBatchReader()