setNumericRounding: Change or turn off numeric rounding


Description

Change rounding to 0, 1 or 2 bytes when joining or grouping numeric (i.e. double) columns.

Usage

setNumericRounding(x)
getNumericRounding()

Arguments

x

integer or numeric vector: 2 (default), 1 or 0 byte rounding

Details

Computers cannot represent some floating point numbers (such as 0.6) exactly in base 2. This leads to unexpected behaviour when joining or grouping columns of type 'numeric' (i.e. 'double'); see the example below. To deal with this automatically, data.table rounds such data to approximately 11 significant figures when joining or grouping, which is plenty of digits for many cases. This is achieved by rounding the last 2 bytes of the significand. Where this is not enough, setNumericRounding can be used to reduce this to 1 byte rounding, or to no rounding (0 bytes rounded) for the full available precision.
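The representation issue can be seen in base R alone, without data.table. The following is a minimal sketch; the variable name x is illustrative only.

```r
# Build the same sequence used in the Examples section: 0, 0.2, ..., 1
x <- seq(0, 1, by = 0.2)

# The printed value looks like 0.6 ...
print(x[4])

# ... but the stored double differs from the literal 0.6 in its last bit
x[4] == 0.6                    # FALSE
isTRUE(all.equal(x[4], 0.6))   # TRUE: equal within a small tolerance
```

An exact comparison fails even though both values print as 0.6, which is why an unrounded join on such a column can miss a row that appears to be present.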

Rounding is specified in bytes rather than bits because it is tied to the radix sort algorithm for numerics, which sorts byte by byte. With the default rounding of 2 bytes, at most 6 passes are needed. With no rounding, at most 8 passes are needed, so sorting may be slower. The default was not chosen for speed, however, but to avoid surprising results such as those in the example below.
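To make the byte-level idea concrete, here is a rough base R sketch. It is not data.table's implementation: drop_low_bytes is a hypothetical helper, and it truncates (zeroes) the lowest bytes of each double rather than rounding them, which is assumed here only as a simplified stand-in.

```r
# Hypothetical helper, not part of data.table: zero the last `n` bytes
# of each double's big-endian representation (truncation, not rounding).
drop_low_bytes <- function(v, n = 2L) {
  vapply(v, function(d) {
    b <- writeBin(d, raw(), endian = "big")    # 8 bytes, most significant first
    if (n > 0L) b[(9L - n):8L] <- as.raw(0L)   # clear the n least significant bytes
    readBin(b, what = "double", n = 1L, endian = "big")
  }, numeric(1))
}

a <- seq(0, 1, by = 0.2)[4]   # computed 0.6
b <- 0.6                      # literal 0.6
a == b                                     # FALSE
drop_low_bytes(a) == drop_low_bytes(b)     # TRUE

# Large ids stored as doubles collapse once the low 2 bytes are dropped
ids <- c(1234567890123, 1234567890124, 1234567890125)
length(unique(drop_low_bytes(ids)))        # 1
length(unique(drop_low_bytes(ids, 0L)))    # 3
```

With the low 2 bytes cleared, only 6 of the 8 bytes of each value can still differ, which matches the 6 versus 8 radix passes described above. The same clearing is what merges the three distinct ids into one group.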

Value

setNumericRounding returns no value; the new value is applied. getNumericRounding returns the current value: 0, 1 or 2.

See Also

http://en.wikipedia.org/wiki/Double-precision_floating-point_format
http://en.wikipedia.org/wiki/Floating_point
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html


Examples

    library(data.table)
    DT = data.table(a=seq(0,1,by=0.2),b=1:2, key="a")
    DT
    setNumericRounding(0)   # turn off rounding
    DT[.(0.4)]   # works
    DT[.(0.6)]   # no match, confusing since 0.6 is clearly there in DT
    
    setNumericRounding(2)   # restore default
    DT[.(0.6)]   # works as expected
    
    # using type 'numeric' for integers > 2^31 (typically ids)
    DT = data.table(id = c(1234567890123, 1234567890124, 1234567890125), val=1:3)
    print(DT, digits=15)
    DT[,.N,by=id]   # 1 row
    setNumericRounding(0)
    DT[,.N,by=id]   # 3 rows
    # better to use bit64::integer64 for such ids
    setNumericRounding(2)

Example output

     a b
1: 0.0 1
2: 0.2 2
3: 0.4 1
4: 0.6 2
5: 0.8 1
6: 1.0 2
     a b
1: 0.4 1
     a  b
1: 0.6 NA
     a b
1: 0.6 2
              id val
1: 1234567890123   1
2: 1234567890124   2
3: 1234567890125   3
             id N
1: 1.234568e+12 3
             id N
1: 1.234568e+12 1
2: 1.234568e+12 1
3: 1.234568e+12 1

data.table documentation built on May 2, 2019, 4:57 p.m.