# combinatoricsGeneral: Generate Combinations and Permutations of a Vector... In RcppAlgos: High Performance Tools for Combinatorics and Computational Mathematics

## Description

• Generate combinations or permutations of a vector with or without constraints.

• Produce results in parallel using the `Parallel` or `nThreads` arguments. You can also apply each of the five compiled functions given by the argument `constraintFun` in parallel.

• Alternatively, the arguments `lower` and `upper` make it possible to generate combinations/permutations in chunks allowing for parallelization via the parallel package. This is convenient when you want to apply a custom function to the output in parallel.

• Attack integer partition and general subset sum problems.

• GMP support allows for exploration of combinations/permutations of vectors with many elements.

• The output is in lexicographical order.

## Usage

 ```1 2 3 4 5 6 7 8 9``` ```comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL) permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL) ```

## Arguments

 `v` Source vector. If `v` is an integer (including nonpositive integers), it will be converted to the sequence `1:v`. All atomic types are supported (See `?is.atomic`). `m` Number of elements to choose. If `repetition = TRUE` or `freqs` is utilized, `m` can exceed the length of `v`. If `m = NULL`, the length will default to `length(v)` or `sum(freqs)`. `repetition` Logical value indicating whether combinations/permutations should be with or without repetition. The default is `FALSE`. `freqs` A vector of frequencies used for producing all combinations/permutations of a multiset of `v`. Each element of `freqs` represents how many times each element of the source vector, `v`, is repeated. It is analogous to the `times` argument in `rep`. The default value is `NULL`. `lower` The lower bound. Combinations/permutations are generated lexicographically, thus utilizing this argument will determine which specific combination/permutation to start generating from (e.g. `comboGeneral(5, 3, lower = 6)` is equivalent to `comboGeneral(5, 3)[6:choose(5, 3), ]`). This argument along with `upper` is very useful for generating combinations/permutations in chunks allowing for easy parallelization. `upper` The upper bound. Similar to `lower`, however this parameter allows the user to stop generation at a specific combination/permutation (e.g. `comboGeneral(5, 3, upper = 5)` is equivalent to `comboGeneral(5, 3)[1:5, ]`) If the output is constrained and `lower` isn't supplied, `upper` serves as a cap for how many results will be returned that meet the criteria (e.g. setting `upper = 100` alone will return the first 100 results that meet the criteria, while setting `lower = 1` and `upper = 100` will test the first 100 results against the criteria). In addition to the benefits listed for `lower`, this parameter is useful when the total number of combinations/permutations without constraint is large and you expect/need a small number of combinations/permutations that meet a certain criteria. Using `upper` can improve run time if used judiciously as we call the member function reserve of std::vector. See examples below. `constraintFun` Function to be applied to the elements of `v` that should be passed as a string (e.g. `constraintFun = "sum"`). The possible constraint functions are: `"sum"`, `"prod"`, `"mean"`, `"max"`, & `"min"`. The default is `NULL`, meaning no function is applied. `comparisonFun` Comparison operator that will be used to compare `limitConstraints` with the result of `constraintFun` applied to `v`. It should be passed as a string or a vector of two strings (e.g. `comparisonFun = "<="` or `comparisonFun = c(">","<")`). The possible comparison operators are: `"<"`, `">"`, `"<="`, `">="`, `"=="`. The default is `NULL`. When `comparisonFun` is a vector of two comparison strings, e.g `comparisonFun = c(comp1, comp2)`, and `limitConstraints` is a vector of two numerical values, e.g `limitConstraints = c(x1, x2)`, the combinations/permutations will be filtered in one of the following two ways: When `comp1` is one of the 'greater-than' operators (i.e. ">=" or ">"), `comp2` is one of the 'less-than' operators (i.e. "<=" or "<"), and `x1 < x2`, the combinations/permutations that are returned will have a value (after `constraintFun` has been applied) between `x1` and `x2`. When `comp1` and `comp2` are defined as in #1 and `x1 > x2`, the combinations/permutations that are returned will have a value outside the range of `x1` and `x2`. See the examples below. In other words, the first comparison operator is applied to the first limit and the second operator is applied to the second limit. `limitConstraints` This is the value(s) that will be used for comparison. Can be passed as a single value or a vector of two numerical values. The default is `NULL`. See the definition of `comparisonFun` as well as the examples below for more information. `keepResults` A logical flag indicating if the result of `constraintFun` applied to `v` should be displayed; if `TRUE`, an additional column of results will be added to the resulting matrix. The default is `FALSE`. If user is only applying `constraintFun`, `keepResults` will default to `TRUE`. E.g. The following are equivalent and will produce a 4th column of row sums: `comboGeneral(5, 3 constraintFun = "sum", keepResults = TRUE)` `comboGeneral(5, 3 constraintFun = "sum")` `FUN` Function to be applied to each combination/permutation. The default is `NULL`. `Parallel` Logical value indicating whether combinations/permutations should be generated in parallel using n - 1 threads, where n is the maximum number of threads. The default is `FALSE`. If `nThreads` is not `NULL`, it will be given preference (e.g. if user has 8 threads with `Parallel = TRUE` and `nThreads = 4`, only 4 threads will be spawned). If your system is single-threaded, the arguments `Parallel` and `nThreads` are ignored. `nThreads` Specific number of threads to be used. The default is `NULL`. See `Parallel`. `tolerance` A numeric value greater than or equal to zero. This parameter is utilized when a constraint is applied on a numeric vector. The default value is 0 when it can be determined that whole values are being utilized, otherwise it is `sqrt(.Machine\$double.eps)` which is approximately 1.5e-8. N.B. If the input vector is of type integer, this parameter will be ignored and strict equality will be enforced.

## Details

Finding all combinations/permutations with constraints is optimized by organizing them in such a way that when `constraintFun` is applied, a partially monotonic sequence is produced. Combinations/permutations are added successively, until a particular combination exceeds the given constraint value for a given constraint/comparison function combo. After this point, we can safely skip several combinations knowing that they will exceed the given constraint value.

When there are any negative values in `v` and `constraintFun = "prod"`, producing a monotonic set is non-trivial for the general case. As a result, performance will suffer as all combinations/permutations must be tested against the constraint criteria. Additionally, `upper` will not have its normal effectiveness (i.e. it will only limit the number of rows after producing all combinations/permutations).

## Value

In general, a matrix is returned with each row containing a vector of length m or m + 1 depending on the value of `keepResults`. If `FUN` is utilized, a list is returned.

## Note

• `Parallel` and `nThreads` will be ignored in the following cases:

• When the output is constrained.

• If the class of the vector passed is `character`, `raw`, and `complex` (N.B. `Rcpp::CharacterMatrix` is not thread safe). Alternatively, you can generate an indexing matrix in parallel.

• If `FUN` is utilized.

• If either `constraintFun`, `comparisonFun` or `limitConstraints` is `NULL` –or– if the class of the vector passed is `logical`, `character`, `raw`, and `complex`, the constraint check will not be carried out. This is equivalent to simply finding all combinations/permutations of v choose m.

• The maximum number of combinations/permutations that can be generated at one time is 2^31 - 1. Utilizing `lower` and `upper` makes it possible to generate additional combinations/permutations.

• Factor vectors are accepted. Class and level attributes are preserved.

• Lexicographical ordering isn't guaranteed for permutations if `lower` isn't supplied and the output is constrained.

• If `lower` is supplied and the output is constrained, the combinations/permutations that will be tested will be in the lexicographical range `lower` to `upper` or up to the total possible number of results if `upper` is not given. See the second paragraph for the definition of `upper`.

• `FUN` will be ignored if the constraint check is satisfied.

Joseph Wood

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150``` ```system.time(comboGeneral(17, 8)) system.time(permuteGeneral(13, 6)) system.time(comboGeneral(13,10,repetition = TRUE)) system.time(permuteGeneral(factor(letters[1:9]),6,TRUE)) ## permutations of the multiset (with or w/o setting m) : ## c(1,1,1,1,2,2,2,3,3,4) system.time(permuteGeneral(4, freqs = c(4,3,2,1))) #### Examples using "upper" and "lower": ## Generate some random data set.seed(1009) mySamp = sort(rnorm(75, 997, 23)) permuteCount(75, 10, freqs = rep(1:3, 25)) # >Big Integer ('bigz') : # >[1] 4606842576291495952 ## See specific range of permutations permuteGeneral(75, 10, freqs = rep(1:3, 25), lower = 1e12, upper = 1e12 + 10) ## Researcher only needs 1000 7-tuples of mySamp ## such that the sum is greater than 7200. system.time(comboGeneral(mySamp, 7, FALSE, constraintFun = "sum", comparisonFun = ">", limitConstraints = 7200, upper = 1000)) ## Similarly, you can use "lower" to obtain the last rows. ## Generate the last 10 rows system.time(comboGeneral(mySamp, 7, lower = choose(75, 7) - 9)) ## Or if you would like to generate a specific chunk, ## use both "lower" and "upper". E.g. Generate one ## million combinations starting with the 900,000,001 ## lexicographic combination. t1 = comboGeneral(mySamp, 7, lower = 9*10^8 + 1, upper = 9*10^8 + 10^6) ## class of the source vector is preserved class(comboGeneral(5,3)[1,]) == class(1:5) class(comboGeneral(c(1,2:5),3)[1,]) == class(c(1,2:5)) class(comboGeneral(factor(month.name),3)[1,]) == class(factor(month.name)) ## Using keepResults will add a column of results t2 = permuteGeneral(-3,6,TRUE,constraintFun = "prod", keepResults = TRUE) t3 = comboGeneral(-3,6,TRUE,constraintFun = "sum", comparisonFun = "==", limitConstraints = -8, keepResults = TRUE) ## Using multiple constraints: ## Get combinations such that the product ## is between 3000 and 4000 inclusive comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c(">=","<="), limitConstraints = c(3000, 4000), keepResults = TRUE) ## Or, get the combinations such that the ## product is less than or equal to 10 or ## greater than or equal to 40000 comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c("<=",">="), limitConstraints = c(10, 40000), keepResults = TRUE) #### General subset sum problem set.seed(516781810) comboGeneral(runif(100, 0, 42), 5, constraintFun = "mean", comparisonFun = "==", limitConstraints = 30, tolerance = 0.0000002) #### Integer Partitions comboGeneral(0:5, 5, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 5) ## Using FUN comboGeneral(10000, 5, lower = 20, upper = 22, FUN = function(x) { which(cummax(x) %% 2 == 1) }) ## Not run: ## Parallel example generating more than 2^31 - 1 combinations. library(parallel) numCores = detectCores() - 1 system.time(mclapply(seq(1, comboCount(35, 15), 10086780), function(x) { a = comboGeneral(35, 15, lower = x, upper = x + 10086779) ## do something x }, mc.cores = numCores)) ## Find 13-tuple combinations of 1:25 such ## that the mean is less than 10 system.time(myComb <- comboGeneral(25, 13, FALSE, constraintFun = "mean", comparisonFun = "<", limitConstraints = 10)) ## Alternatively, you must generate all combinations and subsequently ## subset to obtain the combinations that meet the criteria system.time(myComb2 <- combn(25, 13)) system.time(myCols <- which(colMeans(myComb2) < 10)) system.time(myComb2 <- myComb2[, myCols]) ## Any variation is much slower system.time(myComb2 <- combn(25, 13)[,combn(25, 13, mean) < 10]) ## Test equality with myComb above all.equal(myComb, t(myComb2)) ## much faster using Parallel = TRUE system.time(permuteGeneral(15, 8, lower = 250000000, Parallel = TRUE)) system.time(permuteGeneral(15, 8, lower = 250000000)) system.time(comboGeneral(30, 10, Parallel = TRUE)) system.time(comboGeneral(30, 10)) ## Parallel also works when applying constraintFun solely system.time(comboGeneral(30, 10, Parallel = TRUE, constraintFun = "sum")) system.time(comboGeneral(30, 10, constraintFun = "sum")) ## Using Parallel argument only uses 1 minus the maximal # of threads. ## The below should run a bit faster as we are utilizing all threads. maxThreads = RcppAlgos::stdThreadMax() system.time(comboGeneral(30, 10, nThreads = maxThreads, constraintFun = "sum")) ## Depending on # of cores available, using Parallel with ## constraintFun is faster than rowSums or rowMeans alone combs = comboGeneral(30, 10) system.time(rowSums(combs)) ## Fun example... see stackoverflow: ## https://stackoverflow.com/q/22218640/4408538 system.time(permuteGeneral(seq(0L,100L,10L),8,TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 100)) ## End(Not run) ```

RcppAlgos documentation built on May 31, 2021, 1:06 a.m.