FLSSS: One-dimensional Subset Sum given error threshold

View source: R/rinterface.r

FLSSSR Documentation

One-dimensional Subset Sum given error threshold

Description

Given subset size len, sorted superset v, subset sum target and error ME, find at least solutionNeed index (integer) vector(s) x, such that target - ME <= sum(v[x]) <= target + ME. To mine subsets that sum in a given range, set target to the midpoint and ME to half of the range width.

Usage

FLSSS(
  len,
  v,
  target,
  ME,
  solutionNeed = 1L,
  LB = 1L : len,
  UB = (length(v) - len + 1L) : length(v),
  viaConjugate = FALSE,
  tlimit = 60,
  useBiSrchInFB = FALSE,
  NfractionDigits = Inf
  )

Arguments

len

An integer as the subset size: 0 <= len < length(v). If len == 0, FLSSS() mines subets without size restriction. len <- 0 would be most likely slower than looping len over 1 : (length(v) - 1). See Details.

v

A sorted numeric vector, the superset. v can be negative and nonunique.

target

A numeric value, the subset sum target.

ME

A positive numeric value, the error threshold.

solutionNeed

An integer, the least number of solutions wanted. If the function returns fewer solutions, either tlimit is up or less than solutionNeed solutions exist. The function may also return more than solutionNeed solutions.

LB

An integer vector of size len as the lower bounds of the solution space: for any solution x, LB[i] <= x[i]. Custom LB should be no less than 1L : len element-wisely. Every element in v should be within the range enclosed by LB and UB.

UB

An integer vector of size len as the upper bounds of the solution space: for any solution x, x[i] <= UB[i]. Custom UB should be no greater than (length(v) - len + 1L ) : length(v) element-wisely. Every element in v should be within the range enclosed by LB and UB.

viaConjugate

A boolean value. If TRUE, FLSSS() mines susbets of size length(v) - len that sum to sum(v) - target with the same ME. Let x be the integer vector indexing a qualified subset. FLSSS() returns (1L : length(v))[-x]. Simulations show that FLSSS() often finds the first qualified conjugate subset faster if len is much less than length(v) / 2.

tlimit

A numeric value. Enforce function to return in tlimit seconds.

useBiSrchInFB

A boolean value. If TRUE, the function performs binary search for index bounds in the auxiliary triangle matrix of continuous sequence sums. This argument is mainly for research. Simulations show binary search has no major advantage over linear search due to caching mechanisms. The advantage may be pronounced if length(v) is substantial ( > 10000) while len is small ( < 5).

NfractionDigits

An integer, the maximum number of fractional digits of all elements in v. Internally, v, target and ME are multiplied by 10 ^ NfractionDigits, and then converted as integer values before mining. The default Inf prevents such conversion. The goal is eliminate

Details

If len == 0, FLSSS() would (1) reset len to length(v), (2) pad len zeros at the beginning of v and sort v, (3) search for size-len subsets, and (4) for an index vector that represents a subset, erases elements pointing to zeros in v. See the package documentation for more details.

Value

A list of index vectors.

Examples

# =====================================================================================
# Example I: play random numbers.
# =====================================================================================
# rm(list = ls()); gc()
subsetSize = 200L
supersetSize = 1000L
superset = 10000 * sort(rnorm(supersetSize) ^ 3 + 2 * runif(supersetSize) ^ 2 +
           3 * rgamma(supersetSize, 5, 1) + 4)
subsetSum = runif(1, sum(superset[1L : subsetSize]), sum(superset[(supersetSize -
            subsetSize + 1L) : supersetSize]))
subsetSumError = 1e-3


# Mine 3 subsets
rst1 = FLSSS::FLSSS(len = subsetSize, v = superset, target = subsetSum,
                    ME = subsetSumError, solutionNeed = 3, tlimit = 4)


# Mine 3 subsets via solving the conjugate problem
rst2 = FLSSS::FLSSS(len = subsetSize, v = superset, target = subsetSum,
                    ME = subsetSumError, solutionNeed = 3, tlimit = 4,
                    viaConjugate = TRUE)


# Verify uniqueness
cat("rst1 number of solutions =",
    length(unique(lapply(rst1, function(x) sort(x)))), "\n")
cat("rst2 number of solutions =",
    length(unique(lapply(rst2, function(x) sort(x)))), "\n")


# Verify solutions
if(length(rst1) > 0)
  all(unlist(lapply(rst1, function(x)
    abs(sum(superset[x]) - subsetSum) <= subsetSumError)))
if(length(rst2) > 0)
  all(unlist(lapply(rst2, function(x)
    abs(sum(superset[x]) - subsetSum) <= subsetSumError)))


# Mine 3 subsets in bounded solution space.
# Make up the lower and upper bounds for the solution space:
tmp = sort(sample(1L : supersetSize, subsetSize))
tmp2 = sort(sample(1L : supersetSize, subsetSize))
lowerBounds = pmin(tmp, tmp2)
upperBounds = pmax(tmp, tmp2)
rm(tmp, tmp2)


# 'FLSSS()' does not work if there are elements not under the hood of
# lowerBounds + upperBounds. Exclude those elements:
remainIndex = unique(unlist(apply(cbind(lowerBounds, upperBounds), 1,
  function(x) x[1] : x[2])))
lowerBounds = match(lowerBounds, remainIndex)
upperBounds = match(upperBounds, remainIndex)
superset = superset[remainIndex]


# Plant a subset sum:
solution = integer(subsetSize)
solution[1] = sample(lowerBounds[1] : upperBounds[1], 1)
for(i in 2L : subsetSize)
{
  l = max(lowerBounds[i], solution[i - 1] + 1L)
  u = upperBounds[i]
  if(l == u) solution[i] = u
  else solution[i] = sample(l : u, 1)
}
subsetSum = sum(superset[solution])
subsetSumError = abs(subsetSum) * 0.01 # relative error within 1%
rm(solution)


rst3 = FLSSS::FLSSS(len = subsetSize, v = superset, target = subsetSum,
                    ME = subsetSumError, solutionNeed = 2, tlimit = 4,
                    LB = lowerBounds, UB = upperBounds, viaConjugate = TRUE)


print(length(rst3))


# Verify solutions
if(length(rst3) > 0)
  cat(all(unlist(lapply(rst3, function(x)
    abs(sum(superset[x]) - subsetSum) <= subsetSumError))), "\n")




# =====================================================================================
# Example II: mine a real-world dataset.
# =====================================================================================
# rm(list = ls()); gc()
superset = c(
  -1119924501, -793412295, -496234747,  -213654767,   16818148,   26267601,   26557292,
     27340260,   28343800,   32036573,    32847411,   34570996,   34574989,   43633028,
     44003100,   47724096,   51905122,    52691025,   53600924,   56874435,   58207678,
     60225777,   60639161,   60888288,    60890325,   61742932,   63780621,   63786876,
     65167464,   66224357,   67198760,    69366452,   71163068,   72338751,   72960793,
     73197629,   76148392,   77779087,    78308432,   81196763,   82741805,   85315243,
     86446883,   87820032,   89819002,    90604146,   93761290,   97920291,   98315039,
    310120088, -441403864, -548143111,  -645883459, -149110919,  305170449, -248934805,
  -1108320430, -527806318, -192539936, -1005074405, -101557770, -156782742, -285384687,
   -418917176,   80346546, -273215446,  -552291568,   86824498,  -95392618, -707778486)
superset = sort(superset)
subsetSum = 139254953
subsetSumError = 0.1


# Find a subset of size 10.
subsetSize = 10L
rst = FLSSS::FLSSS(len = subsetSize, v = superset, target = subsetSum,
                   ME = subsetSumError, solutionNeed = 1, tlimit = 4)
# Verify:
all(unlist(lapply(rst, function(x)
  abs(sum(superset[x]) - subsetSum) <= subsetSumError)))


# Find a subset without size specification.
rst = FLSSS::FLSSS(len = 0, v = superset, target = subsetSum,
                   ME = subsetSumError, solutionNeed = 1, tlimit = 4)
# Verify:
all(unlist(lapply(rst, function(x)
  abs(sum(superset[x]) - subsetSum) <= subsetSumError)))


# Find a subset via looping subset size over 2L : (length(v)).
for(len in 2L : length(superset))
{
  rst = FLSSS::FLSSS(len = subsetSize, v = superset, target = subsetSum,
                     ME = subsetSumError, solutionNeed = 1, tlimit = 4)
  if(length(rst) > 0) break
}
# Verify:
all(unlist(lapply(rst, function(x)
  abs(sum(superset[x]) - subsetSum) <= subsetSumError)))


# Find as many qualified susbets as possible in 2 seconds
rst = FLSSS::FLSSS(len = subsetSize, v = superset, target = subsetSum,
                   ME = subsetSumError, solutionNeed = 999999L, tlimit = 2)
cat("Number of solutions =", length(rst), "\n")


# Verify:
all(unlist(lapply(rst, function(x)
  abs(sum(superset[x]) - subsetSum) <= subsetSumError)))




# =====================================================================================
# Example III: solve a special knapsack problem.
# Given the knapsack's capacity, the number of catagories, the number of items in each
# catagory, select the least number of items to fulfill at least 95% of the knapsack's
# capacity.
# =====================================================================================
# rm(list = ls()); gc()
capacity = 361
catagories = LETTERS[1L : 10L] # A, B, ..., J, 10 catagories
catagoryMasses = round(runif(length(catagories)) * 20 + 1)
catagoryItems = sample(1L : 20L, length(catagories))


itemLabel = unlist(mapply(function(x, i) rep(i, x), catagoryItems, catagories))
itemMasses = unlist(mapply(function(x, i) rep(x, i), catagoryMasses, catagoryItems))
vorder = order(itemMasses)
itemLabel = itemLabel[vorder]


superset = itemMasses[vorder]
rate = 0.95
subsetSum = (capacity * rate + capacity) / 2
subsetSumError = capacity - subsetSum
for(subsetSize in 1L : length(itemMasses))
{
  rst = FLSSS::FLSSS(len = subsetSize, v = superset, target = subsetSum,
                     ME = subsetSumError, solutionNeed = 1, tlimit = 4)
  if(length(rst) > 0) break
}


# There may exist no qualified subsets. One can lower 'rate' until a solution
# shows up.
if(length(rst) == 0L)
{
  cat("No solutions. Please lower rate and rerun.\n")
} else
{
  cat("A solution:\n")
  print(table(itemLabel[rst[[1]]]))
}


# rm(list = ls()); gc()

FLSSS documentation built on May 17, 2022, 5:09 p.m.

Related to FLSSS in FLSSS...