RLibs

RLibs is a collection of production tools I use for data processing. It is under construction: many functions are or will be deprecated, and some will move to other packages (for example, anything useful related to plotting with ggplot2 will go to sciplotr).

Features

Equality & Comparison

R's default == operator performs an exact comparison, which works poorly for floating-point values: two numbers that R considers unequal may in fact be equal to within machine precision. The standard example is

0.1 + 0.2 == 0.3

RLibs::are_equal_f performs a more or less correct floating-point comparison within a given tolerance.

library(RLibs, quietly = TRUE, warn.conflicts = FALSE)
are_equal_f(0.1 + 0.2, 0.3)
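
For illustration, a tolerance-based comparison can be sketched in base R. The helper approx_equal and its default tolerance are assumptions made for this example, not the RLibs implementation; base R's all.equal offers a similar check.

```r
# Hypothetical helper (not part of RLibs): compares doubles up to a
# tolerance scaled by the magnitude of the operands.
approx_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  scale <- pmax(abs(x), abs(y), 1)  # never scale the tolerance below 1
  abs(x - y) <= tol * scale
}

strict <- (0.1 + 0.2) == 0.3                     # exact comparison
approx <- approx_equal(0.1 + 0.2, 0.3)           # tolerance-based comparison
base_check <- isTRUE(all.equal(0.1 + 0.2, 0.3))  # base R equivalent

print(c(strict = strict, approx = approx, base_check = base_check))
```

The exact comparison fails because 0.1 + 0.2 is not representable exactly in binary floating point, while both tolerance-based checks succeed.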

Several convenience operators are built on top of this function:

(0.1 + 0.2) %==% 0.3
(0.1 + 0.2) %!=% 0.3

The operators invoke the floating-point method only if one of the operands is of a floating-point type. Type and size stability is enforced by the vctrs package.
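
The type-based dispatch described above could look roughly like the sketch below. The operator name %eq% and the tolerance are hypothetical, and the sketch uses plain base R without the type and size checks that RLibs delegates to vctrs.

```r
# Hypothetical operator (not the RLibs one): uses a tolerance only when
# at least one operand is a double, and exact == otherwise.
`%eq%` <- function(x, y) {
  if (is.double(x) || is.double(y)) {
    tol <- sqrt(.Machine$double.eps)
    abs(x - y) <= tol * pmax(abs(x), abs(y), 1)
  } else {
    x == y
  }
}

float_result <- (0.1 + 0.2) %eq% 0.3  # doubles: tolerance-based path
int_result   <- 1L %eq% 2L            # integers: exact path
chr_result   <- "a" %eq% "a"          # characters: exact path
```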

Cluster planning

library(purrr, quietly = TRUE, warn.conflicts = FALSE)
suppressMessages(plan_cluster(1))

A set of tools for creating clusters that work with the future and furrr packages.

# Checks cluster status
get_topology()

# Create 2 workers, each spawning 2 workers (so 4 + 2 in total, max 4 working simultaneously)
plan_cluster(2, 2)
unlist(furrr::future_map(1:2, ~list(Sys.getpid(), furrr::future_map(1:2, ~Sys.getpid()))))

# Switch back to sequential execution
plan_cluster(1)
unlist(furrr::future_map(1:2, ~list(Sys.getpid(), furrr::future_map(1:2, ~Sys.getpid()))))
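
The source does not show how plan_cluster is implemented. Assuming it configures the future package, a comparable two-level topology can be set up with future::plan directly, as in this sketch. The worker counts are illustrative, and I() is used on the inner level to request the workers explicitly.

```r
library(future)

# Outer level: 2 background R sessions; inner level: 2 sessions per outer worker.
plan(list(
  tweak(multisession, workers = 2),
  tweak(multisession, workers = I(2))
))

# Each outer future launches two inner futures and collects their process IDs.
outer <- lapply(1:2, function(i) {
  future({
    inner <- lapply(1:2, function(j) future(Sys.getpid()))
    list(outer_pid = Sys.getpid(),
         inner_pids = vapply(inner, value, integer(1)))
  })
})
res <- lapply(outer, value)

# Switch back to sequential execution; futures now run in the current process.
plan(sequential)
seq_pid <- value(future(Sys.getpid()))
```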

Tricky joins

dplyr can do various joins, such as inner_join and left_join. Here is a way to do conditional joins (not really optimized):

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
tbl <- data.frame(Type = c("10-20", "20-30"), L = c(10, 20), U = c(20, 30))
# Subsetting mtcars to reduce output
left_join_cnd(mtcars[c(1:7, 18:20, 28:32),], tbl, .x$mpg >= .y$L, .x$mpg < .y$U) %>% select(Type, L, mpg, U, everything())

Here ... accepts comma-separated conditions, similar to dplyr::filter, where .x refers to the left-hand table and .y refers to the right-hand table.
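
One naive way to build such a join is to form the Cartesian product of both tables, keep the pairs that satisfy the condition, and merge them back onto the left table. The base R sketch below shows this technique; it is not necessarily how left_join_cnd works. The function name cross_filter_left_join is hypothetical, and the sketch assumes the two tables share no column names.

```r
# Hypothetical helper: conditional left join via cross join + filter.
# cond is a function taking the table of all (x, y) row pairs and
# returning a logical vector.
cross_filter_left_join <- function(x, y, cond) {
  x$.row <- seq_len(nrow(x))       # remember left-table row identity
  pairs <- merge(x, y, by = NULL)  # by = NULL gives the Cartesian product
  keep <- cond(pairs)
  matched <- pairs[!is.na(keep) & keep, c(".row", names(y)), drop = FALSE]
  out <- merge(x, matched, by = ".row", all.x = TRUE)  # unmatched rows get NA
  out$.row <- NULL
  out
}

tbl <- data.frame(Type = c("10-20", "20-30"), L = c(10, 20), U = c(20, 30))
cars <- mtcars[c(1:7, 18:20, 28:32), ]
joined <- cross_filter_left_join(cars, tbl, function(p) p$mpg >= p$L & p$mpg < p$U)
print(joined[c("mpg", "Type", "L", "U")])
```

The cost grows with the product of the two row counts, so this only suits small lookup tables. Recent dplyr versions (1.1.0 and later) also support inequality conditions natively through join_by.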

There are also *_join_safe functions, which behave exactly like the dplyr::*_join functions, except that key columns are first converted to a common type and a meaningful error message is displayed if the conversion fails. No more casting factors to strings when the levels differ. vctrs to the rescue!
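
The key-conversion step can be sketched with vctrs::vec_cast_common, which casts its arguments to a common type or fails with a typed error. The data below is made up for illustration, and the join itself uses base merge to keep the example small; the source does not confirm that *_join_safe is written this way.

```r
library(vctrs)

# Two tables whose key columns are factors with different levels.
x <- data.frame(key = factor(c("a", "b")), vx = 1:2)
y <- data.frame(key = factor(c("b", "c")), vy = 3:4)

# Cast both keys to their common type (a factor with the union of levels).
common <- vec_cast_common(x$key, y$key)
x$key <- common[[1]]
y$key <- common[[2]]

joined <- merge(x, y, by = "key", all.x = TRUE)

# Incompatible key types (double vs. character) raise an error instead of
# being coerced silently.
failed <- tryCatch(
  {
    vec_cast_common(1, "a")
    FALSE
  },
  error = function(e) TRUE
)
```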

Utility methods

IO



Ilia-Kosenkov/RLibs documentation built on Jan. 26, 2020, 2:21 p.m.