mFLSSSparImposeBounds: Multithreaded multidimensional Subset Sum in bounded solution...

View source: R/rinterface.r

mFLSSSparImposeBoundsR Documentation

Multithreaded multidimensional Subset Sum in bounded solution space given error thresholds

Description

For comparison, function mFLSSSpar() puts no bounds on the solution space so it sorts mV internally in a special order for mining accerlation.

Usage

mFLSSSparImposeBounds(
  maxCore = 7L,
  len,
  mV,
  mTarget,
  mME,
  LB = 1L : len,
  UB = (nrow(mV) - len + 1L) : nrow(mV),
  solutionNeed = 1L,
  tlimit = 60,
  dl = ncol(mV),
  du = ncol(mV),
  targetsOrder = NULL,
  useBiSrchInFB = FALSE,
  avgThreadLoad = 8L
  )

Arguments

maxCore

See maxCore in mFLSSSpar().

len

See len in mFLSSSpar().

mV

See mV in mFLSSSpar().

mTarget

See mTarget in mFLSSSpar().

mME

See mME in mFLSSSpar().

LB

See LB in FLSSS().

UB

See UB in FLSSS().

solutionNeed

See solutionNeed in mFLSSSpar().

tlimit

See tlimit in mFLSSSpar().

dl

See dl in mFLSSSpar().

du

See du in mFLSSSpar().

targetsOrder

This argument is mainly for research and unrecommended for use. Depending on the structure of mV, mFLSSSpar() or
mFLSSSparImposeBounds() would break the mining task into a collection of no more than len * (nrow(mV) - len) + 1 independent subtasks. Threads work on these subtasks, sequentially coordinated by an atomic counter [1]. Different subtasks have different probabilities of yielding a qualified subset, thus the order of subtasks matters to the mining speed. targetsOrder is an index vector of size len * (nrow(mV) - len) + 1 for ordering the subtasks. targetsOrder <- NULL makes a special order, and is implicitly the choice in mFLSSSpar(). This order is empirically optimal based on simulations. See the package documentation for details.

useBiSrchInFB

See useBiSrchInFB in mFLSSSpar().

avgThreadLoad

See avgThreadLoad in mFLSSSpar().

Value

A list of index vectors.

References

[1] Atomic template class in Intel TBB. An atomic counter is used to coordinate heterogeneous subtasks to avoid idle threads. The atomic operation overhead is negligible compared to the time cost of the lightest subtask.

Examples

# rm(list = ls()); gc()
subsetSize = 7L
supersetSize = 60L
dimension = 5L # dimensionality


# Create a supertset at random:
N = supersetSize * dimension
superset = matrix(1000 * (rnorm(N) ^ 3 + 2 * runif(N) ^ 2 +
                  3 * rgamma(N, 5, 1) + 4), ncol = dimension)
rm(N)


# Make up the lower and upper bounds for the solution space:
tmp = sort(sample(1L : supersetSize, subsetSize))
tmp2 = sort(sample(1L : supersetSize, subsetSize))
lowerBounds = pmin(tmp, tmp2)
upperBounds = pmax(tmp, tmp2)
rm(tmp, tmp2)


# Exclude elements not covered by 'lowerBounds' and 'upperBounds':
remainIndex = unique(unlist(apply(cbind(lowerBounds, upperBounds), 1,
                                  function(x) x[1] : x[2])))
lowerBounds = match(lowerBounds, remainIndex)
upperBounds = match(upperBounds, remainIndex)
superset = superset[remainIndex, ]


# Plant a subset sum:
solution = apply(rbind(lowerBounds, upperBounds), 2, function(x)
  sample(x[1] : x[2], 1))
subsetSum = colSums(superset[solution, ])
subsetSumError = abs(subsetSum) * 0.01 # relative error within 1%
rm(solution)


rst = FLSSS::mFLSSSparImposeBounds(
  maxCore = 2L, len = subsetSize, mV = superset, mTarget = subsetSum,
  mME = subsetSumError, LB = lowerBounds, UB = upperBounds,
  solutionNeed = 1, tlimit = 2, dl = ncol(superset), du = ncol(superset),
  targetsOrder = NULL, useBiSrchInFB = FALSE, avgThreadLoad = 8L)


# Verify:
cat("Number of solutions = ", length(rst), "\n")
if(length(rst) > 0)
{
  cat("Solutions unique: ")
  cat(length(unique(lapply(rst, function(x) sort(x)))) == length(rst), "\n")
  cat("Solution in bounded space: ")
  cat(all(unlist(lapply(rst, function(x)
    sort(x) <= upperBounds & sort(x) >= lowerBounds))), "\n")
  cat("Solutions correct: ")
  cat(all(unlist(lapply(rst, function(x)
    abs(colSums(superset[x, ]) - subsetSum) <= subsetSumError))), "\n")
} else
{
  cat("No solutions exist or timer ended too soon.\n")
}


FLSSS documentation built on May 17, 2022, 5:09 p.m.