# psum: Parallel (Statistical) Functions in kit: Data Manipulation Functions Implemented in C

 parallel-funs R Documentation

## Parallel (Statistical) Functions

### Description

Vector-valued (statistical) functions operating in parallel over vectors passed as arguments, or a single list of vectors (such as a data frame). Similar to `pmin` and `pmax`, except that these functions do not recycle vectors.

### Usage

``````
psum(..., na.rm = FALSE)
pprod(..., na.rm = FALSE)
pmean(..., na.rm = FALSE)
pfirst(...)  # (na.rm = TRUE)
plast(...)   # (na.rm = TRUE)
pall(..., na.rm = FALSE)
pallNA(...)
pallv(..., value)
pany(..., na.rm = FALSE)
panyNA(...)
panyv(..., value)
pcount(..., value)
pcountNA(...)
``````

### Arguments

| Argument | Description |
|----------|-------------|
| `...` | suitable (atomic) vectors of the same length, or a single list of vectors (such as a `data.frame`). See Details on the allowed data types for each function, and Examples. |
| `na.rm` | A logical indicating whether missing values should be removed. Default value is `FALSE`, except for `pfirst` and `plast`. |
| `value` | A non-`NULL` value of length 1. |

### Details

Functions `psum` and `pprod` work for integer, logical, double and complex types. `pmean` only supports integer, logical and double types. All three functions raise an error if used with factors.

`pfirst`/`plast` select the first/last non-missing value (for list-vectors, the first/last element that is neither empty nor `NULL`). They accept all vector types with defined missing values, plus lists, but the only types they can mix in a single call are integer and double (not numeric with complex, nor character with factor). If factors are passed, they must all have identical levels.
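The selection rule can be illustrated with the numeric vectors used in the Examples section. This is a minimal sketch: the names `first_vals` and `last_vals` are only illustrative, and the commented results are what the rule described above implies, not output captured from a session.

```r
library(kit)

x = c(1, 3, NA, 5)
y = c(2, NA, 4, 1)
z = c(3, 4, 4, 1)

# First non-missing value at each position, looking at x, then y, then z
first_vals = pfirst(x, y, z)   # 1 3 4 5

# Last non-missing value at each position, looking at z, then y, then x
last_vals = plast(x, y, z)     # 3 4 4 1
```

Used this way, `pfirst` behaves like a vectorised "coalesce" over its arguments.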

`pall` and `pany` are derived from the base functions `all` and `any`, respectively, and only allow logical inputs.

`pcount` counts the occurrence of `value`, and expects arguments of the same data type (except for `value = NA`). `pcountNA` is equivalent to `pcount` with `value = NA`, and they both allow `NA` counting in mixed-type data. `pcountNA` additionally supports list vectors and counts empty or `NULL` elements as `NA`.

Functions `panyv/pallv` are wrappers around `pcount`, and `panyNA/pallNA` are wrappers around `pcountNA`. They return a logical vector instead of the integer count.
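The relationship between the counting functions and their logical wrappers can be sketched as follows, assuming three logical inputs. The variable names are illustrative, and the comparisons in the comments are how one would expect the wrappers to relate to the counts given the description above.

```r
library(kit)

x = c(TRUE, FALSE, NA, FALSE)
y = c(TRUE, NA, TRUE, TRUE)
z = c(TRUE, TRUE, FALSE, NA)

# Integer counts per position
n_true = pcount(x, y, z, value = TRUE)   # 3 1 1 1
n_na   = pcountNA(x, y, z)               # 0 1 1 1

# Logical wrappers: "any" means count > 0, "all" means count == number of inputs
any_true = panyv(x, y, z, value = TRUE)  # expected to match n_true > 0
all_true = pallv(x, y, z, value = TRUE)  # expected to match n_true == 3
any_na   = panyNA(x, y, z)               # expected to match n_na > 0
all_na   = pallNA(x, y, z)               # expected to match n_na == 3
```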

None of these functions recycle vectors, i.e. all input vectors must have the same length. All functions support long vectors with up to `2^64-1` elements.
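To make the contrast with base R concrete, the sketch below compares `pmax`, which recycles a shorter argument, with `psum`, which is documented to require equal lengths. The `tryCatch` wrapper and the name `res` are illustrative additions; the exact error message is not shown because it is not given here.

```r
library(kit)

a = 1:4
b = 1:2

# Base pmax recycles b to length 4 (as if it were 1, 2, 1, 2)
pmax(a, b)                    # 1 2 3 4

# psum does not recycle, so unequal lengths should signal an error
res = tryCatch(psum(a, b), error = function(e) "error")

# Spelling out the repetition with rep() makes the intent explicit
explicit = psum(a, rep(b, 2)) # 2 4 4 6
```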

### Value

`psum/pprod/pmean` return the sum, product or mean of all arguments. The value returned will be of the highest argument type (integer < double < complex). `pprod` only returns double or complex. `pall[v/NA]` and `pany[v/NA]` return a logical vector. `pcount[NA]` returns an integer vector. `pfirst/plast` return a vector of the same type as the inputs.
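The type promotion rule can be checked with `typeof`. This is a small sketch under the rule stated above (integer < double < complex, with `pprod` never returning integer); the input vectors `i`, `d` and `cplx` are made up for illustration.

```r
library(kit)

i    = 1:3
d    = c(0.5, 1.5, 2.5)
cplx = complex(real = 1:3, imaginary = 1)

# The result takes the highest type among the arguments
t_int  = typeof(psum(i, i))      # expected "integer"
t_dbl  = typeof(psum(i, d))      # expected "double"
t_cplx = typeof(psum(i, cplx))   # expected "complex"

# pprod promotes integer input to double
t_prod = typeof(pprod(i, i))     # expected "double"
```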

### Author(s)

Morgan Jacob and Sebastian Krantz

### See Also

Package 'collapse' provides column-wise and scalar-valued analogues to many of these functions.

### Examples

``````
x = c(1, 3, NA, 5)
y = c(2, NA, 4, 1)
z = c(3, 4, 4, 1)

# Example 1: psum
psum(x, y, z, na.rm = FALSE)
psum(x, y, z, na.rm = TRUE)

# Example 2: pprod
pprod(x, y, z, na.rm = FALSE)
pprod(x, y, z, na.rm = TRUE)

# Example 3: pmean
pmean(x, y, z, na.rm = FALSE)
pmean(x, y, z, na.rm = TRUE)

# Example 4: pfirst and plast
pfirst(x, y, z)
plast(x, y, z)

# Adjust x, y, and z to use in pall and pany
x = c(TRUE, FALSE, NA, FALSE)
y = c(TRUE, NA, TRUE, TRUE)
z = c(TRUE, TRUE, FALSE, NA)

# Example 5: pall
pall(x, y, z, na.rm = FALSE)
pall(x, y, z, na.rm = TRUE)

# Example 6: pany
pany(x, y, z, na.rm = FALSE)
pany(x, y, z, na.rm = TRUE)

# Example 7: pcount
pcount(x, y, z, value = TRUE)
pcountNA(x, y, z)

# Example 8: list/data.frame as an input
pprod(iris[,1:2])
psum(iris[,1:2])
pmean(iris[,1:2])

# Benchmarks
# ----------
# n = 1e8L
# x = rnorm(n) # 763 Mb
# y = rnorm(n)
# z = rnorm(n)
#
# microbenchmark::microbenchmark(
#   kit=psum(x, y, z, na.rm = TRUE),
#   base=rowSums(do.call(cbind,list(x, y, z)), na.rm=TRUE),
#   times = 5L, unit = "s"
# )
# Unit: Second
# expr  min   lq mean median   uq  max neval
# kit  0.52 0.52 0.65   0.55 0.83 0.84     5
# base 2.16 2.27 2.34   2.35 2.43 2.49     5
#
# x = sample(c(TRUE, FALSE, NA), n, TRUE) # 382 Mb
# y = sample(c(TRUE, FALSE, NA), n, TRUE)
# z = sample(c(TRUE, FALSE, NA), n, TRUE)
#
# microbenchmark::microbenchmark(
#   kit=pany(x, y, z, na.rm = TRUE),
#   base=sapply(1:n, function(i) any(x[i],y[i],z[i],na.rm=TRUE)),
#   times = 5L
# )
# Unit: Second
# expr    min     lq   mean   median     uq    max neval
# kit    1.07   1.09   1.15     1.10   1.23   1.23     5
# base 111.31 112.02 112.78   112.97 113.55 114.03     5
``````

kit documentation built on Oct. 1, 2023, 5:07 p.m.