bit-package: A class for vectors of 1-bit booleans


Description

Package 'bit' provides bitmapped vectors of booleans (no NAs); coercion from and to logicals, integers and integer subscripts; fast boolean operators; and fast summary statistics.

With bit vectors you can store true binary booleans {FALSE, TRUE} at the expense of only 1 bit each. On a 32-bit architecture this means a factor of 32 less RAM and up to a factor of 32 more speed on boolean operations. The speed gain is large enough that converting to bit pays off even to avoid a single boolean operation on logicals or a single set operation on (longer) integer subscripts; the pay-off is dramatic when such components are used more than once.
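To make the storage argument concrete, the following base-R sketch packs logicals into raw bytes with packBits() and runs &, | and ! on whole bytes. It is a toy model, not the package's C implementation: it assumes 8 booleans per byte, whereas bit uses 32-bit words, and the helper names pack_bits(), unpack_bits(), bits_and(), bits_or() and bits_not() are hypothetical.

```r
# Toy model of a bitmapped boolean vector in base R (NOT the bit package).
# Assumption: 8 booleans per raw byte via packBits(); the bit package itself
# packs 32 booleans per integer word in C.

pack_bits <- function(l) {
  stopifnot(is.logical(l), !anyNA(l))    # bit vectors hold no NAs
  n <- length(l)
  pad <- (8L - n %% 8L) %% 8L            # packBits() needs a multiple of 8
  bytes <- packBits(c(l, logical(pad)), type = "raw")
  structure(bytes, n = n)                # remember the logical length
}

unpack_bits <- function(b) {
  as.logical(rawToBits(b))[seq_len(attr(b, "n"))]  # drop padding bits
}

# On raw vectors &, | and ! work bitwise, i.e. on 8 booleans per byte
bits_and <- function(x, y) structure(x & y, n = attr(x, "n"))
bits_or  <- function(x, y) structure(x | y, n = attr(x, "n"))
bits_not <- function(x)    structure(!x,    n = attr(x, "n"))

set.seed(1)
l1 <- sample(c(FALSE, TRUE), 20, replace = TRUE)
l2 <- sample(c(FALSE, TRUE), 20, replace = TRUE)
b1 <- pack_bits(l1)
b2 <- pack_bits(l2)

length(b1)   # 3 bytes hold 20 booleans (a logical uses 4 bytes per element)
identical(unpack_bits(bits_and(b1, b2)), l1 & l2)
identical(unpack_bits(bits_or(b1, b2)),  l1 | l2)
identical(unpack_bits(bits_not(b1)),     !l1)
```

Each byte-level operation handles 8 booleans at once; bit applies the same idea with a larger word size.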

Reading from and writing to bit is approximately as fast as accessing standard logicals, mostly because of the time R spends on memory allocation. The package allows working with pre-allocated memory for return values by calling .Call() directly: when C-level access with pre-allocated vector memory is timed, copying from bit to logical requires only 70% of the time needed for copying from logical to logical, while copying from logical to bit comes at a performance penalty of 150%.

Since bit objects cannot be used as subscripts in R, a second class 'bitwhich' stores selections as efficiently as possible with standard R types. This is useful either to represent parts of bit objects or to represent very asymmetric selections.
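The idea of storing very asymmetric selections compactly can be sketched in base R by keeping whichever of the positive subscripts (TRUE positions) or negative subscripts (FALSE positions) is shorter. This is an illustrative assumption about the representation, not a description of bitwhich internals; as_toy_which() and toy_which_to_logical() are hypothetical helpers.

```r
# Toy model of an asymmetric selection (hypothetical helpers, NOT the
# bitwhich internals): keep whichever of the TRUE positions (positive
# subscripts) or the FALSE positions (negative subscripts) is shorter.
# All-TRUE and all-FALSE selections are stored as plain TRUE / FALSE,
# because an empty negative subscript would select nothing.
as_toy_which <- function(l) {
  stopifnot(is.logical(l), !anyNA(l))
  n <- length(l)
  n_true <- sum(l)
  sel <- if (n_true == 0L) {
    FALSE
  } else if (n_true == n) {
    TRUE
  } else if (n_true <= n - n_true) {
    which(l)
  } else {
    -which(!l)
  }
  structure(sel, maxindex = n)
}

toy_which_to_logical <- function(w) {
  l <- logical(attr(w, "maxindex"))
  l[as.vector(w)] <- TRUE   # positive, negative or TRUE/FALSE subscripts
  l
}

x <- rep(c(FALSE, TRUE), c(998, 2))   # 2 TRUE out of 1000
w1 <- as_toy_which(x)
as.vector(w1)                         # 999 1000
w2 <- as_toy_which(!x)
as.vector(w2)                         # -999 -1000
identical(toy_which_to_logical(w2), !x)
```

Either way only two integers are stored, which is the point of representing asymmetric selections this way.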

Class 'ri' (range index) selects ranges of positions for chunked processing. All three classes 'bit', 'bitwhich' and 'ri' can be used for subsetting 'ff' objects (ff-2.1.0 and higher).
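Chunked processing with range indices can be sketched with a hypothetical helper chunk_ranges() that splits positions 1..n into consecutive [from, to] ranges, roughly the information an ri object describes. The sketch does not use the package's ri() constructor, and the chunk size is an arbitrary choice.

```r
# Hypothetical helper (NOT bit::ri): split positions 1..n into consecutive
# [from, to] ranges of at most `size` positions.
chunk_ranges <- function(n, size) {
  stopifnot(n >= 1, size >= 1)
  from <- seq(1, n, by = size)
  to <- pmin(from + size - 1, n)
  cbind(from = from, to = to)
}

chunk_ranges(10, 4)   # three ranges: 1-4, 5-8, 9-10

# Count TRUE values in a long logical vector one chunk at a time
set.seed(42)
l <- sample(c(FALSE, TRUE), 1e5, replace = TRUE)
chunks <- chunk_ranges(length(l), 1e4)
total <- 0
for (i in seq_len(nrow(chunks))) {
  rng <- chunks[i, "from"]:chunks[i, "to"]
  total <- total + sum(l[rng])
}
total == sum(l)
```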

Usage

bit(length)
## S3 method for class 'bit'
print(x, ...)

Arguments

length

length of vector in bits

x

a bit vector

...

further arguments to print

Details

Package: bit
Type: Package
Version: 1.1.0
Date: 2012-06-05
License: GPL-2
LazyLoad: yes
Encoding: latin1

Index:

bit function     bitwhich function     ri function    see also    description
.BITS                                                 globalenv   variable holding the number of bits on this system
bit_init                                              .First.lib  initially allocate bit-masks (done in .First.lib)
bit_done                                              .Last.lib   finally de-allocate bit-masks (done in .Last.lib)
bit              bitwhich              ri             logical     create bit object
print.bit        print.bitwhich        print.ri       print       print bit vector
length.bit       length.bitwhich       length.ri      length      get length of bit vector
length<-.bit     length<-.bitwhich                    length<-    change length of bit vector
c.bit            c.bitwhich                           c           concatenate bit vectors
is.bit           is.bitwhich           is.ri          is.logical  test for bit class
as.bit           as.bitwhich                          as.logical  generically coerce to bit or bitwhich
as.bit.logical   as.bitwhich.logical                  logical     coerce logical to bit vector (FALSE => FALSE, c(NA, TRUE) => TRUE)
as.bit.integer   as.bitwhich.integer                  integer     coerce integer to bit vector (0 => FALSE, ELSE => TRUE)
as.bit.double    as.bitwhich.double                   double      coerce double to bit vector (0 => FALSE, ELSE => TRUE)
as.double.bit    as.double.bitwhich    as.double.ri   as.double   coerce bit vector to double (0/1)
as.integer.bit   as.integer.bitwhich   as.integer.ri  as.integer  coerce bit vector to integer (0L/1L)
as.logical.bit   as.logical.bitwhich   as.logical.ri  as.logical  coerce bit vector to logical (FALSE/TRUE)
as.which.bit     as.which.bitwhich     as.which.ri    as.which    coerce bit vector to positive integer subscripts
as.bit.which     as.bitwhich.which                    bitwhich    coerce integer subscripts to bit vector
as.bit.bitwhich  as.bitwhich.bitwhich                             coerce from bitwhich
as.bit.bit       as.bitwhich.bit                      UseMethod   coerce from bit
as.bit.ri        as.bitwhich.ri                                   coerce from range index
as.bit.ff                                             ff          coerce ff boolean to bit vector
as.ff.bit                                             as.ff       coerce bit vector to ff boolean
as.hi.bit        as.hi.bitwhich        as.hi.ri       as.hi       coerce to hybrid index (requires package ff)
as.bit.hi        as.bitwhich.hi                                   coerce from hybrid index (requires package ff)
[[.bit                                                [[          get single bit (index checked)
[[<-.bit                                              [[<-        set single bit (index checked)
[.bit                                                 [           get vector of bits (unchecked)
[<-.bit                                               [<-         set vector of bits (unchecked)
!.bit            !.bitwhich                           !           boolean NOT on bit
&.bit            &.bitwhich                           &           boolean AND on bit
|.bit            |.bitwhich                           |           boolean OR on bit
xor.bit          xor.bitwhich                         xor         boolean XOR on bit
!=.bit           !=.bitwhich                          !=          boolean inequality (same as XOR)
==.bit           ==.bitwhich                          ==          boolean equality
all.bit          all.bitwhich          all.ri         all         aggregate AND
any.bit          any.bitwhich          any.ri         any         aggregate OR
min.bit          min.bitwhich          min.ri         min         aggregate MIN (first TRUE position)
max.bit          max.bitwhich          max.ri         max         aggregate MAX (last TRUE position)
range.bit        range.bitwhich        range.ri       range       aggregate [MIN,MAX]
sum.bit          sum.bitwhich          sum.ri         sum         aggregate SUM (count of TRUE)
summary.bit      summary.bitwhich      summary.ri     tabulate    aggregate c(nFALSE, nTRUE, minRange, maxRange)
regtest.bit                                                       regression tests for the package

Objects of class ri work as the second argument in the bit and bitwhich boolean operations.
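Several rules in the index can be restated on plain logical and numeric vectors. The base-R sketch below uses hypothetical helper names and does not call the bit package; the NA handling for numbers and the meaning of minRange/maxRange are assumptions, flagged in the comments.

```r
# Base-R restatement of rules from the index (hypothetical helper names;
# these do NOT call the bit package).

# Coercion to booleans
toy_from_logical <- function(x) is.na(x) | x        # FALSE => FALSE, c(NA, TRUE) => TRUE
toy_from_number  <- function(x) is.na(x) | x != 0   # 0 => FALSE, ELSE => TRUE
                                                    # (assumption: NA counts as ELSE)

# Aggregates over a logical vector l
toy_min   <- function(l) which(l)[1L]               # first TRUE position (NA if none)
toy_max   <- function(l) rev(which(l))[1L]          # last TRUE position (NA if none)
toy_range <- function(l) c(toy_min(l), toy_max(l))  # [MIN, MAX]
toy_sum   <- function(l) sum(l)                     # count of TRUE
toy_summary <- function(l) {
  # assumption: minRange/maxRange are the first and last TRUE positions
  c(nFALSE = sum(!l), nTRUE = sum(l),
    minRange = toy_min(l), maxRange = toy_max(l))
}

toy_from_logical(c(FALSE, NA, TRUE))   # FALSE TRUE TRUE
toy_from_number(c(0L, 3L, -1L))        # FALSE TRUE TRUE

l <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
toy_range(l)     # 2 5
toy_sum(l)       # 3
toy_summary(l)   # nFALSE 3, nTRUE 3, minRange 2, maxRange 5
```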

Value

bit returns an integer vector just long enough to store 'length' bits, with an attribute 'n' and class 'bit'.
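The layout described above can be mirrored by a hypothetical toy constructor to show how many integers are needed for a given number of bits. It assumes 32 bits per integer and is not bit::bit() itself.

```r
# Hypothetical toy constructor mirroring the layout described under Value
# (NOT bit::bit). Assumption: one R integer holds 32 bits.
toy_bit <- function(length) {
  bits_per_word <- 32L
  structure(integer(ceiling(length / bits_per_word)),
            n = length, class = "toy_bit")
}

words_needed <- function(n) length(unclass(toy_bit(n)))
sapply(c(1, 32, 33, 100, 1e6), words_needed)   # 1 1 2 4 31250
attr(toy_bit(100), "n")                         # 100
# 100 logicals use 400 bytes of data; 4 integers use 16 bytes
```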

Note

Operations on bit objects currently carry some overhead from R calls, so expect speed gains mainly for vectors of length ~10000 or longer.
Since this package was created for high performance, only positive integer subscripts are allowed. All R functions behave as expected, i.e. they do not change their arguments and they create new return values. To save the time spent on memory allocation for return values, you must use .Call directly (see the dontrun example in sum.bit).
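Because only positive integer subscripts are allowed, other R subscript forms (negative or logical) need converting first. A minimal base-R sketch, using a hypothetical helper, lets R's own indexing of seq_len(n) do the conversion.

```r
# Hypothetical helper: convert an ordinary R subscript (negative, logical,
# or positive numeric) for a vector of length n into the equivalent
# positive integer subscripts.
to_positive_subscripts <- function(i, n) {
  pos <- seq_len(n)[i]            # R's own indexing rules do the work
  if (anyNA(pos)) stop("subscript out of range or NA")
  pos
}

to_positive_subscripts(-(1:3), 6)           # 4 5 6
to_positive_subscripts(c(TRUE, FALSE), 6)   # 1 3 5 (logical is recycled)
to_positive_subscripts(c(5, 0, 2), 6)       # 5 2 (zeros are dropped)
```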

Author(s)

Jens Oehlschlägel <[email protected]>

Maintainer: Jens Oehlschlägel <[email protected]>

See Also

logical in base R and vmode in package 'ff'

Examples

  x <- bit(12)                                 # create bit vector
  x                                            # autoprint bit vector
  length(x) <- 16                              # change length
  length(x)                                    # get length
  x[[2]]                                       # extract single element
  x[[2]] <- TRUE                               # replace single element
  x[1:2]                                       # extract parts of bit vector
  x[1:2] <- TRUE                               # replace parts of bit vector
  as.which(x)                                  # coerce bit to subscripts
  x <- as.bit.which(3:4, 4)                    # coerce subscripts to bit
  as.logical(x)                                # coerce bit to logical
  y <- as.bit(c(FALSE, TRUE, FALSE, TRUE))     # coerce logical to bit
  is.bit(y)                                    # test for bit
  !x                                           # boolean NOT
  x & y                                        # boolean AND
  x | y                                        # boolean OR
  xor(x, y)                                    # boolean Exclusive OR
  x != y                                       # boolean inequality (same as xor)
  x == y                                       # boolean equality
  all(x)                                       # aggregate AND
  any(x)                                       # aggregate OR
  min(x)                                       # aggregate MIN (first TRUE position)
  max(x)                                       # aggregate MAX (last TRUE position)
  range(x)                                     # aggregate [MIN,MAX]
  sum(x)                                       # aggregate SUM (count of TRUE)
  summary(x)                                   # aggregate count of FALSE and TRUE

  ## Not run: 
    message("\nEven for a single boolean operation transforming logical to bit pays off")
    n <- 10000000
    x <- sample(c(FALSE, TRUE), n, TRUE)
    y <- sample(c(FALSE, TRUE), n, TRUE)
    system.time(x|y)
    system.time({
       x <- as.bit(x)
       y <- as.bit(y)
    })
    system.time( z <- x | y )
    system.time( as.logical(z) )
    message("Even more so if multiple operations are needed :-)")

    message("\nEven for a single set operation transforming subscripts to bit pays off\n")
    n <- 10000000
    x <- sample(n, n/2)
    y <- sample(n, n/2)
    system.time( union(x,y) )
    system.time({
     x <- as.bit.which(x, n)
     y <- as.bit.which(y, n)
    })
    system.time( as.which.bit( x | y ) )
    message("Even more so if multiple operations are needed :-)")

    message("\nSome timings WITH memory allocation")
    n <- 2000000
    l <- sample(c(FALSE, TRUE), n, TRUE)
    # copy logical to logical
    system.time(for(i in 1:100){  # 0.0112
       l2 <- l
       l2[1] <- TRUE   # force new memory allocation (copy on modify)
       rm(l2)
    })/100
    # copy logical to bit
    system.time(for(i in 1:100){  # 0.0123
       b <- as.bit(l)
       rm(b)
    })/100
    # copy bit to logical
    b <- as.bit(l)
    system.time(for(i in 1:100){  # 0.009
       l2 <- as.logical(b)
       rm(l2)
    })/100
    # copy bit to bit
    b <- as.bit(l)
    system.time(for(i in 1:100){  # 0.009
       b2 <- b
       b2[1] <- TRUE   # force new memory allocation (copy on modify)
       rm(b2)
    })/100


    l2 <- l
    # replace logical by TRUE
    system.time(for(i in 1:100){
       l[] <- TRUE
    })/100
    # replace bit by TRUE (NOTE that the assignment value is recycled on the
    # R side first, i.e. memory allocation plus assignment)
    system.time(for(i in 1:100){
       b[] <- TRUE
    })/100
    # THUS the following is faster
    system.time(for(i in 1:100){
       b <- !bit(n)
    })/100

    # replace logical by logical
    system.time(for(i in 1:100){
       l[] <- l2
    })/100
    # replace bit by logical
    system.time(for(i in 1:100){
       b[] <- l2
    })/100
    # extract logical
    system.time(for(i in 1:100){
       l2[]
    })/100
    # extract bit
    system.time(for(i in 1:100){
       b[]
    })/100

    message("\nSome timings WITHOUT memory allocation (Serge, that's for you)")
    n <- 2000000L
    l <- sample(c(FALSE, TRUE), n, TRUE)
    b <- as.bit(l)
    # read from logical, write to logical
    l2 <- logical(n)
    system.time(for(i in 1:100).Call("R_filter_getset", l, l2, PACKAGE="bit")) / 100
    # read from bit, write to logical
    l2 <- logical(n)
    system.time(for(i in 1:100).Call("R_bit_get", b, l2, c(1L, n), PACKAGE="bit")) / 100
    # read from logical, write to bit
    system.time(for(i in 1:100).Call("R_bit_set", b, l2, c(1L, n), PACKAGE="bit")) / 100

  
## End(Not run)

OHDSI/bit documentation built on May 9, 2017, 3:30 p.m.