mvl_write_extent_index: Compute and write extent index

Source: R/RMVL.R

mvl_write_extent_index    R Documentation

Compute and write extent index

Description

This function computes a hash-based index that allows finding the indices of rows whose hashes match query values. While it can be applied to arbitrary data, it is optimized for the common case where vectors contain stretches of repeated values describing row groups to be processed. This is particularly relevant for R, because vectorized processing of row batches is the only practical way to scan very large tables with pure R code.
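To make the idea of an extent index concrete, here is a minimal base-R sketch for a single key vector. It is not RMVL's implementation. The helpers build_extent_index() and lookup_extents() are hypothetical, and R's hashed environments plus as.character() stand in for the package's own hash function. Each run of repeated values becomes one (start, stop) row extent filed under its key, so a query returns whole row batches rather than individual row numbers.

```r
# Hypothetical sketch of an extent index; NOT the RMVL implementation.
# Each stretch of repeated values becomes one (start, stop) row extent,
# filed in a hash table under a key derived from the value.
build_extent_index <- function(x) {
  r <- rle(x)                           # runs of repeated values
  stops <- cumsum(r$lengths)            # last row of each run
  starts <- stops - r$lengths + 1L      # first row of each run
  index <- new.env(hash = TRUE)         # environment used as a hash table
  for (k in seq_along(r$values)) {
    key <- as.character(r$values[k])    # stand-in for a real hash function
    prev <- NULL
    if (exists(key, envir = index, inherits = FALSE)) prev <- get(key, envir = index)
    # Append this run as a new row of the key's two-column extent matrix
    assign(key, rbind(prev, c(starts[k], stops[k])), envir = index)
  }
  index
}

# Expand the extents stored under each query value into sorted row numbers.
lookup_extents <- function(index, values) {
  rows <- integer(0)
  for (v in values) {
    key <- as.character(v)
    if (exists(key, envir = index, inherits = FALSE)) {
      ext <- get(key, envir = index)
      for (j in seq_len(nrow(ext))) {
        rows <- c(rows, ext[j, 1]:ext[j, 2])
      }
    }
  }
  sort(rows)
}

# Usage: value 5 occupies two stretches, rows 1-3 and rows 6-7
y <- c(5, 5, 5, 7, 7, 5, 5)
idx <- build_extent_index(y)
lookup_extents(idx, 5)   # 1 2 3 6 7
```

Storing extents instead of individual row numbers is what lets the index size track the number of stretches rather than the number of rows.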

Usage

mvl_write_extent_index(MVLHANDLE, L, name = NULL)

Arguments

MVLHANDLE

a handle to an MVL file produced by mvl_open()

L

a list of vector-like MVL_OBJECTs

name

if specified, add a named entry to the MVL file directory

Details

mvl_write_extent_index() creates the index in memory and then writes it out. Memory usage is proportional to the number of repeat stretches. Sorting the table beforehand improves performance, because equal values then form fewer, longer stretches, but it is not a requirement.
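Because memory use scales with the number of repeat stretches, it can be worth counting them before building an index. The base-R sketch below uses a hypothetical helper, count_stretches() (not part of RMVL), built on rle(), to show why sorting helps: after sorting, each distinct value forms a single stretch.

```r
# Hypothetical helper: number of stretches of repeated values in x.
count_stretches <- function(x) length(rle(x)$lengths)

set.seed(1)
y <- sample(0:9, 1000, replace = TRUE)
count_stretches(y)        # typically about 900: neighbours rarely repeat
count_stretches(sort(y))  # 10: one stretch per distinct value
```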

Value

an object of class MVL_OFFSET that describes an offset into this MVL file. MVL offsets are vectors and can be concatenated. They can be written to an MVL file directly, or as part of another object such as a list.

See Also

mvl_order_vectors, mvl_index_lapply, mvl_find_matches, mvl_group, mvl_indexed_copy, mvl_merge, mvl_hash_vectors, mvl_get_groups

Examples

## Not run: 
# Create (or append to) a scratch MVL file and store a small data frame
Mtmp <- mvl_open("tmp_a.mvl", append=TRUE, create=TRUE)
mvl_write_object(Mtmp, data.frame(x=runif(100), y=(1:100) %% 10), "df1")
Mtmp <- mvl_remap(Mtmp)
# Index column "y" of df1 and store the index under "df1_extent_index_y"
mvl_write_extent_index(Mtmp, list(Mtmp$df1[,"y",ref=TRUE]), "df1_extent_index_y")
Mtmp <- mvl_remap(Mtmp)
# Query the index for rows with y equal to 2 or 3
mvl_index_lapply(Mtmp["df1_extent_index_y", ref=TRUE], list(c(2, 3)),
                 function(i, idx) { return(list(i, idx)) })
# Example of full scan: the query argument is left empty
mvl_index_lapply(Mtmp["df1_extent_index_y", ref=TRUE], ,
                 function(i, idx) { return(list(i, idx)) })

## End(Not run)
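Column y in the example above is deterministic, so the rows that the query on c(2, 3) should cover can be worked out in plain R, assuming the query selects rows whose y value is 2 or 3. Since lookups go through hashes, comparing against an exact base-R filter like this is a cheap sanity check when experimenting. The snippet uses no RMVL calls.

```r
# Plain-R cross-check: the same y column as df1 in the example above.
y <- (1:100) %% 10
expected_rows <- which(y %in% c(2, 3))
expected_rows           # 2 3 12 13 22 23 ... 92 93
length(expected_rows)   # 20
```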

RMVL documentation built on Nov. 2, 2023, 6:09 p.m.