README.md

RLibs

RLibs is a library of various production tools I use in data processing. It is under construction: many functions are or will be deprecated, and some will be moved to other packages (for example, anything useful and related to plotting with ggplot2 will go to sciplotr).

Features

Equality & Comparison

R's default == operator performs exact comparison, which works poorly for floating-point numbers. Values that R considers unequal can in fact be equal to within machine precision. The standard example is:

0.1 + 0.2 == 0.3
## [1] FALSE

RLibs::are_equal_f performs a more or less correct floating-point comparison, treating two values as equal when they agree within a given tolerance.

library(RLibs, quietly = TRUE, warn.conflicts = FALSE)
are_equal_f(0.1 + 0.2, 0.3)
## [1] TRUE
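
To illustrate the idea of a tolerance-based comparison, here is a minimal base-R sketch. It is not the RLibs implementation: the name approx_equal, the default tolerance, and the scaling rule are all assumptions made for this example.

```r
# Hypothetical helper, not part of RLibs: compares doubles within a tolerance
# that is relative for large values and absolute for values near zero.
approx_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  # Scale the tolerance by the larger magnitude, with a floor of 1.
  scale <- pmax(abs(x), abs(y), 1)
  abs(x - y) <= tol * scale
}

r1 <- approx_equal(0.1 + 0.2, 0.3)                # rounding noise is ignored
r2 <- approx_equal(1, 1 + 1e-3)                   # a real difference is detected
r3 <- approx_equal(c(1e10, 1), c(1e10 + 1, 2))    # vectorised, element-wise

# Base R offers a similar check through all.equal():
r4 <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

The floor of 1 in the scale keeps the comparison from becoming infinitely strict when both values are close to zero; other choices are equally defensible.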

Several convenience operators are built on top of are_equal_f:

(0.1 + 0.2) %==% 0.3
## [1] TRUE
(0.1 + 0.2) %!=% 0.3
## [1] FALSE

The operators invoke the floating-point method only if one of the operands is of floating-point type. Type and size stability is enforced by the vctrs package.
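
The dispatch rule described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the operator name %eq% and the helper approx_equal are hypothetical, and the sketch does not reproduce the type and size checks that vctrs provides.

```r
# Hypothetical tolerance-based comparison (not the RLibs implementation).
approx_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) <= tol * pmax(abs(x), abs(y), 1)
}

# Hypothetical operator name, chosen so it does not clash with RLibs' %==%.
`%eq%` <- function(x, y) {
  if (is.double(x) || is.double(y)) {
    approx_equal(x, y)   # floating-point path: compare within a tolerance
  } else {
    x == y               # exact path for integers, characters, logicals
  }
}

d <- (0.1 + 0.2) %eq% 0.3   # double operands, tolerance-based comparison
i <- 2L %eq% 2L             # integer operands, exact comparison
s <- "a" %eq% "b"           # character operands, exact comparison
```

The parentheses around 0.1 + 0.2 are required because user-defined %op% operators bind more tightly than + in R.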

Cluster planning

A set of tools for creating clusters that work with the future and furrr packages.

# Checks cluster status
get_topology()
## [1] 1
# Create 2 workers, each spawning 2 workers (so 4 + 2 in total, max 4 working simultaneously)
plan_cluster(2, 2)
## Cluster: [2, 2]
unlist(furrr::future_map(1:2, ~list(Sys.getpid(), furrr::future_map(1:2, ~Sys.getpid()))))
## [1] 22324 16552 13264 20988 26340 21840
# Switch back to sequential execution
plan_cluster(1)
## Cluster: single process
unlist(furrr::future_map(1:2, ~list(Sys.getpid(), furrr::future_map(1:2, ~Sys.getpid()))))
## [1] 10160 10160 10160 10160 10160 10160

Tricky joins

dplyr provides various joins, such as inner_join and left_join. Here is a way to do conditional joins (not really optimized):

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
tbl <- data.frame(Type = c("10-20", "20-30"), L = c(10, 20), U = c(20, 30))
# Subsetting mtcars to reduce output
left_join_cnd(mtcars[c(1:7, 18:20, 28:32),], tbl, .x$mpg >= .y$L, .x$mpg < .y$U) %>% select(Type, L, mpg, U, everything())
##     Type  L  mpg  U cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  20-30 20 21.0 30   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2  20-30 20 21.0 30   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3  20-30 20 22.8 30   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4  20-30 20 21.4 30   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5  10-20 10 18.7 20   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6  10-20 10 18.1 20   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7  10-20 10 14.3 20   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8   <NA> NA 32.4 NA   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 9   <NA> NA 30.4 NA   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 10  <NA> NA 33.9 NA   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 11  <NA> NA 30.4 NA   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 12 10-20 10 15.8 20   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 13 10-20 10 19.7 20   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 14 10-20 10 15.0 20   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 15 20-30 20 21.4 30   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Here, ... accepts comma-separated conditions, similar to dplyr::filter, where .x refers to the left-hand table and .y refers to the right-hand table.
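
For readers curious how a conditional left join can work in principle, here is a base-R sketch that builds the Cartesian product and filters it. It is not how left_join_cnd is implemented: the function left_join_where, its cond argument, and the helper column .row_id are hypothetical, and the sketch assumes the two tables share no column names.

```r
# Hypothetical helper (not RLibs' left_join_cnd): conditional left join in base R.
# `cond` is a function of the cross-joined table that returns a logical vector.
left_join_where <- function(x, y, cond) {
  x$.row_id <- seq_len(nrow(x))            # remember the original left rows
  crossed <- merge(x, y, by = NULL)        # Cartesian product of x and y
  keep <- cond(crossed)
  matched <- crossed[!is.na(keep) & keep, , drop = FALSE]
  # Left rows without any match are kept, with NA in the right-hand columns.
  unmatched <- x[!(x$.row_id %in% matched$.row_id), , drop = FALSE]
  na_rows <- y[rep(NA_integer_, nrow(unmatched)), , drop = FALSE]
  out <- rbind(matched, cbind(unmatched, na_rows))
  out <- out[order(out$.row_id), , drop = FALSE]   # restore left-table order
  out$.row_id <- NULL
  rownames(out) <- NULL
  out
}

cars <- data.frame(model = c("A", "B", "C"), mpg = c(21.0, 18.7, 32.4))
bins <- data.frame(Type = c("10-20", "20-30"), L = c(10, 20), U = c(20, 30))
res <- left_join_where(cars, bins, function(d) d$mpg >= d$L & d$mpg < d$U)
```

In this small example, car C has an mpg outside both bins, so its Type, L and U are NA. The cross product grows as nrow(x) * nrow(y), which is why this approach is only reasonable for small tables.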

There are also the *_join_safe functions, which behave exactly like the corresponding dplyr::*_join functions, except that key columns are first converted to a common type and a meaningful error message is displayed if the conversion fails. No more casting factors to strings when the levels differ. vctrs to the rescue!
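
The idea of aligning key types before joining can be sketched in base R. This is a simplified illustration, not the RLibs code: left_join_common_key is a hypothetical name, the sketch handles a single key column, and it converts factors to character rather than computing a common type with vctrs.

```r
# Hypothetical helper (not RLibs' *_join_safe): align key types, then left-join.
left_join_common_key <- function(x, y, key) {
  kx <- x[[key]]
  ky <- y[[key]]
  # Factors are compared by label, so differing level sets do not matter.
  if (is.factor(kx)) kx <- as.character(kx)
  if (is.factor(ky)) ky <- as.character(ky)
  if (!identical(class(kx), class(ky))) {
    if (is.numeric(kx) && is.numeric(ky)) {
      # Integer and double keys are promoted to double.
      kx <- as.double(kx)
      ky <- as.double(ky)
    } else {
      stop(sprintf("Cannot join on '%s': incompatible key types <%s> and <%s>",
                   key, class(kx)[1], class(ky)[1]), call. = FALSE)
    }
  }
  x[[key]] <- kx
  y[[key]] <- ky
  merge(x, y, by = key, all.x = TRUE)   # note: merge() sorts rows by the key
}

a <- data.frame(id = factor(c("x", "y")), v = 1:2)
b <- data.frame(id = factor(c("y", "z")), w = c(10, 20))
res <- left_join_common_key(a, b, "id")

# Incompatible keys (character labels vs. integers) produce a readable error.
err <- tryCatch(
  left_join_common_key(a, data.frame(id = 1:2, w = 3:4), "id"),
  error = function(e) conditionMessage(e)
)
```

Here the two factor keys have different level sets, yet the join succeeds because both are compared as character labels.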

Utility methods

IO



Ilia-Kosenkov/RLibs documentation built on Jan. 26, 2020, 2:21 p.m.