combinatoricsGeneral: Generate Combinations and Permutations of a Vector...

Description Usage Arguments Details Value Note Author(s) References Examples

Description

Usage

1
2
3
4
5
6
7
8
9
comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, 
              upper = NULL, constraintFun = NULL, comparisonFun = NULL,
               limitConstraints = NULL, keepResults = NULL, FUN = NULL,
                Parallel = FALSE, nThreads = NULL, tolerance = NULL)

permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, 
                upper = NULL, constraintFun = NULL, comparisonFun = NULL,
                 limitConstraints = NULL, keepResults = NULL, FUN = NULL,
                  Parallel = FALSE, nThreads = NULL, tolerance = NULL)

Arguments

v

Source vector. If v is an integer (including nonpositive integers), it will be converted to the sequence 1:v. All atomic types are supported (See ?is.atomic).

m

Number of elements to choose. If repetition = TRUE or freqs is utilized, m can exceed the length of v. If m = NULL, the length will default to length(v) or sum(freqs).

repetition

Logical value indicating whether combinations/permutations should be with or without repetition. The default is FALSE.

freqs

A vector of frequencies used for producing all combinations/permutations of a multiset of v. Each element of freqs represents how many times each element of the source vector, v, is repeated. It is analogous to the times argument in rep. The default value is NULL.

lower

The lower bound. Combinations/permutations are generated lexicographically, thus utilizing this argument will determine which specific combination/permutation to start generating from (e.g. comboGeneral(5, 3, lower = 6) is equivalent to comboGeneral(5, 3)[6:choose(5, 3), ]). This argument along with upper is very useful for generating combinations/permutations in chunks allowing for easy parallelization.

upper

The upper bound. Similar to lower, however this parameter allows the user to stop generation at a specific combination/permutation (e.g. comboGeneral(5, 3, upper = 5) is equivalent to comboGeneral(5, 3)[1:5, ])

If the output is constrained and lower isn't supplied, upper serves as a cap for how many results will be returned that meet the criteria (e.g. setting upper = 100 alone will return the first 100 results that meet the criteria, while setting lower = 1 and upper = 100 will test the first 100 results against the criteria).

In addition to the benefits listed for lower, this parameter is useful when the total number of combinations/permutations without constraint is large and you expect/need a small number of combinations/permutations that meet a certain criteria. Using upper can improve run time if used judiciously as we call the member function reserve of std::vector. See examples below.

constraintFun

Function to be applied to the elements of v that should be passed as a string (e.g. constraintFun = "sum"). The possible constraint functions are: "sum", "prod", "mean", "max", & "min". The default is NULL, meaning no function is applied.

comparisonFun

Comparison operator that will be used to compare limitConstraints with the result of constraintFun applied to v. It should be passed as a string or a vector of two strings (e.g. comparisonFun = "<=" or comparisonFun = c(">","<")). The possible comparison operators are: "<", ">", "<=", ">=", "==". The default is NULL.

When comparisonFun is a vector of two comparison strings, e.g comparisonFun = c(comp1, comp2), and limitConstraints is a vector of two numerical values, e.g limitConstraints = c(x1, x2), the combinations/permutations will be filtered in one of the following two ways:

  1. When comp1 is one of the 'greater-than' operators (i.e. ">=" or ">"), comp2 is one of the 'less-than' operators (i.e. "<=" or "<"), and x1 < x2, the combinations/permutations that are returned will have a value (after constraintFun has been applied) between x1 and x2.

  2. When comp1 and comp2 are defined as in #1 and x1 > x2, the combinations/permutations that are returned will have a value outside the range of x1 and x2. See the examples below.

In other words, the first comparison operator is applied to the first limit and the second operator is applied to the second limit.

limitConstraints

This is the value(s) that will be used for comparison. Can be passed as a single value or a vector of two numerical values. The default is NULL. See the definition of comparisonFun as well as the examples below for more information.

keepResults

A logical flag indicating if the result of constraintFun applied to v should be displayed; if TRUE, an additional column of results will be added to the resulting matrix. The default is FALSE. If user is only applying constraintFun, keepResults will default to TRUE.

E.g. The following are equivalent and will produce a 4th column of row sums:

  • comboGeneral(5, 3 constraintFun = "sum", keepResults = TRUE)

  • comboGeneral(5, 3 constraintFun = "sum")

FUN

Function to be applied to each combination/permutation. The default is NULL.

Parallel

Logical value indicating whether combinations/permutations should be generated in parallel using n - 1 threads, where n is the maximum number of threads. The default is FALSE. If nThreads is not NULL, it will be given preference (e.g. if user has 8 threads with Parallel = TRUE and nThreads = 4, only 4 threads will be spawned). If your system is single-threaded, the arguments Parallel and nThreads are ignored.

nThreads

Specific number of threads to be used. The default is NULL. See Parallel.

tolerance

A numeric value greater than or equal to zero. This parameter is utilized when a constraint is applied on a numeric vector. The default value is 0 when it can be determined that whole values are being utilized, otherwise it is sqrt(.Machine$double.eps) which is approximately 1.5e-8. N.B. If the input vector is of type integer, this parameter will be ignored and strict equality will be enforced.

Details

Finding all combinations/permutations with constraints is optimized by organizing them in such a way that when constraintFun is applied, a partially monotonic sequence is produced. Combinations/permutations are added successively, until a particular combination exceeds the given constraint value for a given constraint/comparison function combo. After this point, we can safely skip several combinations knowing that they will exceed the given constraint value.

When there are any negative values in v and constraintFun = "prod", producing a monotonic set is non-trivial for the general case. As a result, performance will suffer as all combinations/permutations must be tested against the constraint criteria. Additionally, upper will not have its normal effectiveness (i.e. it will only limit the number of rows after producing all combinations/permutations).

Value

In general, a matrix is returned with each row containing a vector of length m or m + 1 depending on the value of keepResults. If FUN is utilized, a list is returned.

Note

Author(s)

Joseph Wood

References

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
system.time(comboGeneral(17, 8))
system.time(permuteGeneral(13, 6))

system.time(comboGeneral(13,10,repetition = TRUE))
system.time(permuteGeneral(factor(letters[1:9]),6,TRUE))

## permutations of the multiset (with or w/o setting m) :
## c(1,1,1,1,2,2,2,3,3,4)
system.time(permuteGeneral(4, freqs = c(4,3,2,1)))

#### Examples using "upper" and "lower":
## Generate some random data
set.seed(1009)
mySamp = sort(rnorm(75, 997, 23))

permuteCount(75, 10, freqs = rep(1:3, 25))
# >Big Integer ('bigz') :
# >[1] 4606842576291495952

## See specific range of permutations
permuteGeneral(75, 10, freqs = rep(1:3, 25), 
               lower = 1e12, upper = 1e12 + 10)

## Researcher only needs 1000 7-tuples of mySamp
## such that the sum is greater than 7200.
system.time(comboGeneral(mySamp, 7, FALSE, constraintFun = "sum",
             comparisonFun = ">", limitConstraints = 7200, upper = 1000))

## Similarly, you can use "lower" to obtain the last rows.
## Generate the last 10 rows
system.time(comboGeneral(mySamp, 7, lower = choose(75, 7) - 9))

## Or if you would like to generate a specific chunk,
## use both "lower" and "upper". E.g. Generate one
## million combinations starting with the 900,000,001
## lexicographic combination.
t1 = comboGeneral(mySamp, 7, lower = 9*10^8 + 1, 
                        upper = 9*10^8 + 10^6)

## class of the source vector is preserved
class(comboGeneral(5,3)[1,]) == class(1:5)
class(comboGeneral(c(1,2:5),3)[1,]) == class(c(1,2:5))
class(comboGeneral(factor(month.name),3)[1,]) == class(factor(month.name))

## Using keepResults will add a column of results
t2 = permuteGeneral(-3,6,TRUE,constraintFun = "prod", 
                         keepResults = TRUE)

t3 = comboGeneral(-3,6,TRUE,constraintFun = "sum", 
                        comparisonFun = "==",
                        limitConstraints = -8, 
                        keepResults = TRUE)
                        
## Using multiple constraints:

## Get combinations such that the product
## is between 3000 and 4000 inclusive
comboGeneral(5, 7, TRUE, constraintFun = "prod",
             comparisonFun = c(">=","<="),
             limitConstraints = c(3000, 4000),
             keepResults = TRUE)
             
## Or, get the combinations such that the
## product is less than or equal to 10 or
## greater than or equal to 40000
comboGeneral(5, 7, TRUE, constraintFun = "prod",
             comparisonFun = c("<=",">="),
             limitConstraints = c(10, 40000),
             keepResults = TRUE)
             
#### General subset sum problem
set.seed(516781810)
comboGeneral(runif(100, 0, 42), 5, constraintFun = "mean",
             comparisonFun = "==", limitConstraints = 30,
             tolerance = 0.0000002)

             
#### Integer Partitions
comboGeneral(0:5, 5, TRUE, constraintFun = "sum",
             comparisonFun = "==", limitConstraints = 5)
             
             
## Using FUN
comboGeneral(10000, 5, lower = 20, upper = 22, 
             FUN = function(x) {
                 which(cummax(x) %% 2 == 1)
             })

## Not run: 
## Parallel example generating more than 2^31 - 1 combinations.
library(parallel)
numCores = detectCores() - 1

system.time(mclapply(seq(1, comboCount(35, 15), 10086780), function(x) {
    a = comboGeneral(35, 15, lower = x, upper = x + 10086779)
    ## do something
    x
}, mc.cores = numCores))


## Find 13-tuple combinations of 1:25 such
## that the mean is less than 10
system.time(myComb <- comboGeneral(25, 13, FALSE,
                                   constraintFun = "mean",
                                   comparisonFun = "<",
                                   limitConstraints = 10))

## Alternatively, you must generate all combinations and subsequently
## subset to obtain the combinations that meet the criteria
system.time(myComb2 <- combn(25, 13))
system.time(myCols <- which(colMeans(myComb2) < 10))
system.time(myComb2 <- myComb2[, myCols])

## Any variation is much slower
system.time(myComb2 <- combn(25, 13)[,combn(25, 13, mean) < 10])

## Test equality with myComb above
all.equal(myComb, t(myComb2))


## much faster using Parallel = TRUE
system.time(permuteGeneral(15, 8, lower = 250000000, Parallel = TRUE))
system.time(permuteGeneral(15, 8, lower = 250000000))

system.time(comboGeneral(30, 10, Parallel = TRUE))
system.time(comboGeneral(30, 10))

## Parallel also works when applying constraintFun solely
system.time(comboGeneral(30, 10, Parallel = TRUE, constraintFun = "sum"))
system.time(comboGeneral(30, 10, constraintFun = "sum"))

## Using Parallel argument only uses 1 minus the maximal # of threads.
## The below should run a bit faster as we are utilizing all threads.
maxThreads = RcppAlgos::stdThreadMax()
system.time(comboGeneral(30, 10, nThreads = maxThreads, constraintFun = "sum"))

## Depending on # of cores available, using Parallel with
## constraintFun is faster than rowSums or rowMeans alone
combs = comboGeneral(30, 10)
system.time(rowSums(combs))


## Fun example... see stackoverflow:
## https://stackoverflow.com/q/22218640/4408538
system.time(permuteGeneral(seq(0L,100L,10L),8,TRUE,
                                constraintFun = "sum", 
                                comparisonFun = "==",
                                limitConstraints = 100))

## End(Not run)

RcppAlgos documentation built on May 31, 2021, 1:06 a.m.