- antiMode for the least common element
- Cminmax() function with OpenMP on Clang19.
- Updated array reduction logic for RAW inputs to improve compatibility and reliability.
- is_seq (internal function in finp): fixed UB when length(table) == 0.
- STRING_PTR changed to STRING_PTR_RO as required by new CRAN policies
- is_constant does not inherit data.table multithreading
- and3s(rr == 0L) works for raw rr
- abs_diff contains a which.max option = 3.
- f() now f(void))
- logical3. When passed an expression with non-numeric components, no longer skips as if empty
- abs_diff for non-allocating versions of abs(x - y).
- character2integer for a faster version of as.integer(gsub("[^0-9]", "", x))
- Comma, relatedly, prettyNum(x, big.mark = ",")
- coalesce0 as a convenience function, equivalent to coalesce(x, 0) for correct type of 0.
- diam and thinner for direct versions of diff(minmax(x)).
- every_int32 returns a vector of every integer
- ModeC most common element of integer vectors.
- unique_fmatch and uniqueN_fmatch for distinct elements.
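The base R expressions quoted in the entries above can be run directly to see what the faster functions are meant to reproduce. This is a minimal sketch using base R only; the sample inputs are made up for illustration, and `range()` stands in for `minmax()` on the assumption that both return `c(min(x), max(x))`.

```r
# Base R reference expressions named in the changelog entries.
# Sample inputs are illustrative only.
x <- c(5L, -2L, 9L)
y <- c(1L, 4L, 9L)

# abs_diff is described as a non-allocating version of this:
ref_abs_diff <- abs(x - y)

# character2integer is described as a faster version of this:
s <- c("1,234", "$56", "7 890")
ref_chr2int <- as.integer(gsub("[^0-9]", "", s))

# Comma is described as related to this:
ref_comma <- prettyNum(1234567L, big.mark = ",")

# diam is described as a direct version of diff(minmax(x));
# range() is used here in place of minmax() (an assumption).
ref_diam <- diff(range(x))

print(ref_abs_diff)
print(ref_chr2int)
print(ref_comma)
print(ref_diam)
```

Comparing a package function against its base R reference with `identical()` is a simple way to check behaviour after an upgrade.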
Internal changes
- and3s and friends) is now done using a different logic, and performs internal logical operations on raw (char) vectors.
- sum_and/or is now more consistent at not summing NA values.
New functions:
- allNA equivalent to all(is.na(x))
- Implies returns correct results for NA results.
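As a rough illustration of the two entries above, the sketch below shows the base R expression that allNA is said to match, and a reference for three-valued logical implication. `implies_ref` is a hypothetical helper written for this example; it is not the package's `Implies` and makes no claim about how that function is implemented.

```r
# allNA is described as equivalent to all(is.na(x)).
all_missing  <- all(is.na(c(NA, NA, NA)))   # every element is NA
some_missing <- all(is.na(c(1L, NA, 3L)))   # only one element is NA

# Hypothetical base R reference for "p implies q" under R's
# three-valued logic: the result is NA only when it cannot be
# decided from the non-missing operand.
implies_ref <- function(p, q) !p | q

p <- c(TRUE, TRUE,  FALSE, FALSE, NA,   NA)
q <- c(TRUE, FALSE, TRUE,  NA,    TRUE, FALSE)
implied <- implies_ref(p, q)
print(implied)
```

The NA cases are the ones worth testing: a false premise or a true conclusion settles the result regardless of the missing operand.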
Internal
- minmax accepts raw input, treating as unsigned characters
Internal:
- LOGICAL C API has been absorbed.
- Functions are now in C to improve install time and size.
- Implies for logical implies
- divisible2 test evenness of numbers
- fmatchp, finp experimental parallel hashing functions
- is_sorted and isntSorted for assertions about sorted atomic vectors
- minmax multithreaded function of c(min(x), max(x))
- which_first, introduced in version 0.5.0, caused by an overreliance on compiler optimization. (#20)
- pminV no longer accepts non-numeric input
- do_ functions have been removed entirely
- pmax0(x, in_place = TRUE) now returns early, rather than checking the vector twice.
- sum_isna now reflects sum(is.na(x)) when x contains NaN.
- sum_isna diverts ALTREP vectors to anyNA for performance and to avoid problems when passed to C++.
- which_last for the first index from the last index.
- divisible and divisible16 for returning divisibility
- count_logical fast tabulation of logical vectors
- and3s, or3s, parallelized and separated versions of &
- sum_and3s and sum_or3s, the sums of the above logical vectors.
- whichs for an alternative implementation of which, which separates the input
- which_firstNA and which_lastNA for first/last position of missing values
- which_first accepts argument use.which.max for better performance on known short inputs
- is_constant now accepts nThread for multithreaded checking of constant vectors and is much faster in general even in single-thread mode.
- sum_isna now accepts nThread for multithreaded accumulation of missing value counts
- are_even can be slightly faster on integers if ignoring NA, handles large doubles (like 1e10), and accepts nThread.
- is_safe2int(x) now tolerates NaN input. Thanks to CRAN clang-UBSAN.
- which_first(x == y) now works properly when length(y) == length(x).
- xor2 a faster version of xor.

set.seed(1)
library(hutils)
library(hutilscpp)
bench__mark <- function(...) {
  dplyr::select(bench::mark(..., min_iterations = 12),
                expression, median, `itr/sec`, mem_alloc, n_gc)
}
x <- y <- logical(1e9)
bench__mark(xor(x, y), xor2(x, y))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 xor(x, y) 7.956s 0.126 14.901GB 16
#> 2 xor2(x, y) 1.652s 0.530 3.725GB 3
x <- !y
bench__mark(xor(x, y), xor2(x, y))
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 xor(x, y) 8.227s 0.121 14.901GB 13
#> 2 xor2(x, y) 1.983s 0.460 3.725GB 3
x <- samp(c(TRUE, FALSE), 1e9)
y <- samp(c(TRUE, FALSE), 1e9)
bench__mark(xor(x, y), xor2(x, y))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 xor(x, y) 20.276s 0.0493 14.901GB 11
#> 2 xor2(x, y) 1.971s 0.506 3.725GB 3
x <- samp(c(TRUE, FALSE, NA), 1e9)
y <- samp(c(TRUE, FALSE), 1e9)
bench__mark(xor(x, y), xor2(x, y))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 xor(x, y) 25.063s 0.0399 14.901GB 2
#> 2 xor2(x, y) 4.524s 0.221 3.725GB 3
Created on 2019-08-25 by the reprex package (v0.3.0)
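The mem_alloc column above is consistent with base `xor()` building several full-length intermediates: a logical vector of length 1e9 takes 4e9 bytes (about 3.725GB as reported), and 14.901GB is four times that. The sketch below is a small base R check that a single-pass comparison gives the same answers as `xor()` for logical inputs, including NA. `xor_ref` is a hypothetical helper for illustration and is not claimed to be how `xor2` is implemented.

```r
# Hypothetical single-allocation reference for logical inputs.
# Not the package's xor2; shown only to illustrate why fewer
# intermediate vectors are needed.
xor_ref <- function(x, y) x != y

# Every combination of TRUE, FALSE and NA.
vals <- c(TRUE, FALSE, NA)
grid <- expand.grid(x = vals, y = vals)

base_result <- xor(grid$x, grid$y)
ref_result  <- xor_ref(grid$x, grid$y)
same <- identical(base_result, ref_result)
print(same)
```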
- NEWS.md file to track changes to the package.
- which_first(x == y) now supports logical x without returning arcane error messages.
- is_constant, for testing atomic vectors and isntConstant for the first different value
- is_sorted and isntSorted (currently private), similarly.
- and3, or3 for ternary and/or enabling vectorized short-circuiting
- sum_isna for counting NA values.
- pminC now handles integer inputs without coercing to double.
- pmaxC(x, a) accepts integer a when x is type double.
- pmax0 and pmin0 perform much better, especially when x is known and marked as sorted, but also due to a better algorithm using absolute value.

set.seed(1)
attach(asNamespace("hutilscpp"))
#> The following object is masked from package:base:
#>
#> isFALSE
bench__mark <- function(...) {
  dplyr::select(bench::mark(..., min_iterations = 12),
                expression, median, `itr/sec`, mem_alloc, n_gc)
}
x <- rep_len(rlnorm(1e6, 7, 2), 1e9)
bench__mark(do_pmaxC_dbl(x, 0), do_pmax0_abs_dbl(x))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_dbl(x, 0) 2428.139ms 0.405 3618205.211KB 4
#> 2 do_pmax0_abs_dbl(x) 777.362ms 1.28 6.539KB 0
x <- x - 1
bench__mark(do_pmaxC_dbl(x, 0), do_pmax0_abs_dbl(x))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_dbl(x, 0) 2.394s 0.410 3.451GB 4
#> 2 do_pmax0_abs_dbl(x) 2.590s 0.386 3.451GB 4
x <- sort(x)
bench__mark(do_pmaxC_dbl(x, 0), do_pmax0_radix_sorted_dbl(x))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_dbl(x, 0) 3.593s 0.313 6.901GB 5
#> 2 do_pmax0_radix_sorted_dbl(x) 2.306s 0.437 3.451GB 4
x <- rep_len(as.integer(rlnorm(1e6, 7, 2)), 1e9)
bench__mark(do_pmaxC_int(x, 0L), do_pmax0_abs_int(x))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_int(x, 0L) 2041.515ms 0.490 3906256.727KB 3
#> 2 do_pmax0_abs_int(x) 405.266ms 2.45 6.539KB 0
x <- x - 1L
bench__mark(do_pmaxC_int(x, 0L), do_pmax0_abs_int(x))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_int(x, 0L) 1.449s 0.686 3.725GB 2
#> 2 do_pmax0_abs_int(x) 1.766s 0.577 3.725GB 1
x <- sort(x)
bench__mark(do_pmaxC_int(x, 0L), do_pmax0_radix_sorted_int(x))
#> # A tibble: 2 x 5
#> expression median `itr/sec` mem_alloc n_gc
#> <chr> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 do_pmaxC_int(x, 0L) 1.751s 0.568 7.451GB 2
#> 2 do_pmax0_radix_sorted_int(x) 1.404s 0.827 3.725GB 1
Created on 2019-08-10 by the reprex package (v0.3.0)
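The changelog credits part of the pmax0 speed-up to "a better algorithm using absolute value" without spelling it out. One well-known identity of that kind is max(x, 0) = (x + |x|) / 2, sketched below in base R. Whether the package uses exactly this identity is an assumption; `pmax0_abs_ref` is a hypothetical name introduced for this example.

```r
# Hypothetical reference: max(x, 0) via the absolute-value identity.
# Assumes double input; for integers close to .Machine$integer.max
# the sum x + abs(x) would overflow, so that case needs separate care.
pmax0_abs_ref <- function(x) (x + abs(x)) / 2

x <- c(-3.5, 0, 2.25, -1e-8, 7)
via_abs  <- pmax0_abs_ref(x)
via_pmax <- pmax(x, 0)
agree <- identical(via_abs, via_pmax)
print(via_abs)
print(agree)
```

The identity avoids a per-element branch, which is one plausible reason an implementation along these lines could be faster than a comparison-based one.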