knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(bit)
.ff.is.available = requireNamespace("ff", quietly=TRUE) && packageVersion("ff") >= "4.0.0"
if (.ff.is.available) library(ff)
#tools::buildVignette("vignettes/bit-demo.Rmd")
#devtools::build_vignettes()

bit type

Create a huge boolean vector (no NAs allowed)

n <- 1e8
b1 <- bit(n)
b1

It costs only one bit per element, i.e. about 1/8 byte as reported by object.size()

object.size(b1) / n
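To put that figure in context, the following sketch compares a bit vector with a plain logical vector of the same length. The shorter length `n_small` and the variable names are assumptions made only to keep the comparison cheap; base R stores a logical vector in 4 bytes per element.

```r
library(bit)

n_small <- 1e6  # hypothetical smaller length, not the n used in this demo

l <- logical(n_small)  # base R logical: 4 bytes per element
b <- bit(n_small)      # bit vector: 1 bit per element plus a small fixed overhead

bytes_logical <- as.numeric(object.size(l)) / n_small
bytes_bit <- as.numeric(object.size(b)) / n_small

bytes_logical  # close to 4
bytes_bit      # close to 1/8
```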

A couple of standard methods work

b1[10:30] <- TRUE
summary(b1)
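As a rough illustration of which standard methods are meant, the sketch below applies a few of them to a small bit vector. The length of 100 and the name `b` are assumptions chosen so that the results are easy to inspect.

```r
library(bit)

b <- bit(100)        # hypothetical small vector, all FALSE initially
b[10:30] <- TRUE

length(b)            # number of elements
sum(b)               # number of TRUE values
any(b)               # is at least one element TRUE?
all(b)               # are all elements TRUE?
pos <- as.which(b)   # positions of the TRUE values
head(as.logical(b), 12)
```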

Create another boolean vector with TRUE in some different positions

b2 <- bit(n)
b2[20:40] <- TRUE
b2

Fast boolean operations

b1 & b2

and fast summaries of the result

summary(b1 & b2)
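The boolean operators can be checked against base R on a tiny example. This is only a sketch: the four-element vectors `la` and `lb` are made-up data, and the results are converted back with `as.logical()` purely for display.

```r
library(bit)

# Two short logical vectors (hypothetical data) and their bit counterparts
la <- c(TRUE, TRUE, FALSE, FALSE)
lb <- c(TRUE, FALSE, TRUE, FALSE)
a <- as.bit(la)
b <- as.bit(lb)

as.logical(a & b)      # AND
as.logical(a | b)      # OR
as.logical(xor(a, b))  # exclusive OR
as.logical(!a)         # NOT
```

Each result should match the same operation applied to `la` and `lb` directly.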

bitwhich type

Since the distribution is very skewed (only a few TRUE values), we can coerce to an even sparser representation

w1 <- as.bitwhich(b1)
w2 <- as.bitwhich(b2)
object.size(w1) / n

and everything

w1 & w2

works as expected

summary(w1 & w2)

even mixing

summary(b1 & w2)
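The bitwhich steps above can be replayed at a smaller scale to see both the size difference and the agreement between same-type and mixed operations. The length `n_small` and the `*s` variable names are assumptions introduced for this sketch only.

```r
library(bit)

n_small <- 1e6           # hypothetical smaller length
b1s <- bit(n_small)
b1s[10:30] <- TRUE
b2s <- bit(n_small)
b2s[20:40] <- TRUE

w1s <- as.bitwhich(b1s)  # sparse form: keeps the few TRUE positions
w2s <- as.bitwhich(b2s)

size_bit <- as.numeric(object.size(b1s))       # grows with n_small
size_bitwhich <- as.numeric(object.size(w1s))  # grows with the number of TRUE values

n_ww <- sum(w1s & w2s)   # bitwhich with bitwhich
n_bw <- sum(b1s & w2s)   # bit with bitwhich
n_bb <- sum(b1s & b2s)   # bit with bit
c(n_ww, n_bw, n_bb)      # positions 20 to 30 overlap, so 11 each
```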

processing chunks

Many bit functions support a range restriction,

summary(b1, range=c(1, 1000))

which is useful

as.which(b1, range=c(1, 1000))

for filtered chunked looping

lapply(chunk(from=1, to=n, length.out=10), function(i) as.which(b1, range=i))

over large ff vectors

options(ffbatchbytes=1024^3)
x <- ff(vmode="single", length=n)
x[1:1000] <- runif(1000)
lapply(chunk(x, length.out = 10), function(i) sum(x[as.hi(b1, range=i)]))

and wrap-up

delete(x)
rm(x, b1, b2, w1, w2, n)

For more information, see the usage vignette
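The range-restricted chunked loop from the "processing chunks" section can be tried without ff on a small vector. This is a minimal sketch: the vector length, the TRUE positions, and the names `chunks` and `per_chunk` are all assumptions for illustration.

```r
library(bit)

b <- bit(100)                      # hypothetical small vector
b[c(5L, 40L, 41L, 99L)] <- TRUE

# Split 1..100 into range indices, as in the chunked loop of this demo
chunks <- chunk(from = 1, to = length(b), length.out = 4)

# Count the TRUE values chunk by chunk; unclass() drops the class of the
# returned positions so that base length() is used
per_chunk <- sapply(chunks, function(i) length(unclass(as.which(b, range = i))))

per_chunk
sum(per_chunk)                     # equals sum(b)
```

Because each chunk only touches its own range, the per-chunk counts add up to the count over the whole vector.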



truecluster/bit documentation built on April 12, 2025, 7:39 p.m.