How a5R stores cell IDs without strings

The problem

An A5 cell ID is a 64-bit unsigned integer (u64). R has no native u64 type: its integers are 32-bit signed (-(2^31 - 1) to 2^31 - 1, with the bit pattern for -2^31 reserved for NA_integer_), and its doubles are 64-bit floating point. A double can represent every integer exactly only up to 2^53, while a u64 can go up to 2^64 - 1.
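The precision limit is easy to see in base R. The short sketch below needs no packages. It uses the example cell ID 0x0800000000000006 that appears later in this document, which equals 2^59 + 6.

```r
# Doubles carry a 53-bit significand, so above 2^53 neighbouring integers collide.
2^53 == 2^53 + 1                   # TRUE: the +1 is rounded away

# 0x0800000000000006 is 2^59 + 6; as a double the low bits are lost.
2^59 + 6 == 2^59                   # TRUE

# R integers are 32-bit, far too small for a u64.
.Machine$integer.max == 2^31 - 1   # TRUE
```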

The obvious workaround is to store cell IDs as hex strings ("0800000000000006"). This works, but every trip across the R-Rust boundary requires hex parsing and formatting: O(n) string allocation that dominates the cost of lightweight operations like a5_get_resolution() or a5_cell_to_parent().

The solution: eight raw-byte fields

A u64 is exactly 8 bytes. We store each byte of the little-endian representation as a separate raw vector field in a vctrs record type:

cell_id (u64):  0x0800000000000006

little-endian bytes:
  b1 = 0x06, b2 = 0x00, b3 = 0x00, b4 = 0x00,
  b5 = 0x00, b6 = 0x00, b7 = 0x00, b8 = 0x08

This is lossless: the eight bytes are the exact same bits as the original u64, just spread across eight raw vectors, each of them contiguous in memory. No precision loss, no special-case handling. On the Rust side, reconstructing each u64 from its eight bytes is a single u64::from_le_bytes() call. Because the data lives in ordinary R vectors rather than behind a pointer, saving an a5_cell object to disk needs no special serialization.
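The byte layout above can be reproduced in base R. The helpers hex_to_le_bytes() and le_bytes_to_hex() below are hypothetical names used only for illustration; they are not part of a5R, which does this work on the Rust side.

```r
# Hypothetical helper: split one 16-digit hex ID into eight little-endian bytes.
hex_to_le_bytes <- function(hex) {
  stopifnot(is.character(hex), length(hex) == 1L, nchar(hex) == 16L)
  # Eight two-character pairs, most significant byte first.
  starts <- seq(1L, 15L, by = 2L)
  pairs <- substring(hex, starts, starts + 1L)
  # Each pair is in 0..255, so strtoi() is safe; rev() gives little-endian order.
  bytes <- rev(as.raw(strtoi(pairs, base = 16L)))
  names(bytes) <- paste0("b", 1:8)
  bytes
}

# Hypothetical inverse: join the bytes back into a hex string.
le_bytes_to_hex <- function(bytes) {
  paste(as.character(rev(bytes)), collapse = "")
}

bytes <- hex_to_le_bytes("0800000000000006")
bytes[["b1"]]            # 06
bytes[["b8"]]            # 08
le_bytes_to_hex(bytes)   # "0800000000000006"
```

The round trip returns the original string unchanged, which is the lossless property described above.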

R-side: a vctrs record type

On the R side, a5_cell is a vctrs record (vctrs::new_rcrd()) with eight fields (b1 through b8):

library(a5R)
cell <- a5_lonlat_to_cell(-3.19, 55.95, resolution = 10)
vctrs::field(cell, "b1")
vctrs::field(cell, "b8")

Each field is a plain raw vector — a contiguous block of memory with no per-element overhead. Subsetting, combining, and NA propagation are all handled automatically by vctrs.
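For readers unfamiliar with vctrs records, here is a minimal sketch of the same shape built by hand. The class name "demo_cell" and the byte values are made up for illustration; a5R's real constructor and validation are not shown here.

```r
library(vctrs)

# Eight raw fields of length 2: two cells, one byte column per field.
byte_fields <- rep(list(raw(2)), 8)
names(byte_fields) <- paste0("b", 1:8)
byte_fields$b1 <- as.raw(c(0x06, 0x07))   # least significant bytes
byte_fields$b8 <- as.raw(c(0x08, 0x08))   # most significant bytes

# "demo_cell" is a hypothetical class, not the a5_cell class itself.
demo <- new_rcrd(byte_fields, class = "demo_cell")

length(demo)           # 2: one element per cell, not per byte
field(demo[2], "b1")   # 07: subsetting slices all eight fields together
```

A record's length is the number of cells, and subsetting applies to every field at once, which is why the eight vectors behave like a single vector of IDs.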

Hex strings are only produced on demand:

# Display calls format(), which converts to hex for readability
cell

# Explicit conversion
a5_u64_to_hex(cell)

# Round-trip from hex
a5_cell("0800000000000006")

Why this matters

Compare memory for one million cells:

set.seed(42)
cells <- a5_lonlat_to_cell(
  runif(1e6, -180, 180),
  runif(1e6, -80, 80),
  resolution = 10
)

# rcrd: eight contiguous raw vectors (8 × 1 byte × 1M ≈ 7.6 MB)
format(object.size(cells), units = "MB")

# equivalent hex strings would be ~81 MB
# (16 chars + 56-byte SEXP header per string)
hex <- a5_u64_to_hex(cells)
format(object.size(hex), units = "MB")
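The comparison can also be approximated without a5R installed. The sketch below uses random bytes rather than real cell IDs and a smaller n than the one million cells above, so the absolute numbers will differ; only the relative sizes are the point.

```r
n <- 1e5
set.seed(1)

# Eight raw vectors of length n, standing in for fields b1..b8.
byte_fields <- replicate(
  8, as.raw(sample.int(256L, n, replace = TRUE) - 1L),
  simplify = FALSE
)
names(byte_fields) <- paste0("b", 1:8)

# The same values as 16-character hex strings (b8 is the most significant byte).
hex <- do.call(paste0, unname(lapply(rev(byte_fields), as.character)))

raw_bytes <- as.numeric(object.size(byte_fields))
chr_bytes <- as.numeric(object.size(hex))
chr_bytes / raw_bytes   # how many times larger the strings are
```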

NA handling

A5 cell IDs use 60 "quintants" (values 0–59) in their top 6 bits. Quintant 63 (binary 111111) is invalid in the A5 system, so we use 0xFC00000000000000 as a sentinel value for NA. In little-endian, the last byte (b8) is 0xFC, making NA detection a fast single-byte check.

On the Rust side, the sentinel is detected and mapped to None. Standard R idioms work as expected:

cells_with_na <- a5_cell(c("0800000000000006", NA))
is.na(cells_with_na)
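The single-byte check described above can be sketched in base R. This assumes the test looks only at the b8 field, as the text suggests; the actual a5R implementation may differ, and the b8 values below are made up.

```r
na_b8 <- as.raw(0xFC)

# The top 6 bits of 0xFC are 111111 = 63, outside the valid quintant range 0..59.
bitwShiftR(0xFCL, 2L)   # 63

# b8 column for three cells: valid, NA sentinel, valid.
b8 <- as.raw(c(0x08, 0xFC, 0x08))
b8 == na_b8             # FALSE TRUE FALSE
```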

Summary

| Aspect | Hex strings | Raw bytes |
|--------|-------------|-----------|
| R type | character vector | vctrs_rcrd (eight raw fields) |
| Memory (1M cells) | ~81 MB | ~7.6 MB |
| R-Rust crossing | O(n) hex parse/format | Zero-copy byte access |
| Human-readable | Always | On format() / print() |
| Lossless | Yes | Yes (exact byte representation) |



a5R documentation built on March 26, 2026, 5:10 p.m.