Subbag Enumeration

A bag or multiset is just a map from some base set X to the natural numbers, so we can represent such a bag by a sequence of (non-negative) integers. Subbags are defined in the obvious way (and if s is subbag of t, t is a superbag of s). Given a subbag s of m, we can find the next subbag (in reverse lexicographic order) with this function:

bool nextbag(vector<int> &s, 
             const vector<int> &m,
             size_t start = 0)
{
  for (size_t i = start; i < s.size(); i++) {
    if (s[i] == m[i]) {
      s[i] = 0;
    } else {
      s[i]++;
      return true;
    }
  }
  return false;
}

This sets the subsequence [start..] of s to the (reverse) lexicographically next sequence. m gives maximum values for each field of s and we return false if we have cycled back round to the zero sequence. Note that if m[i] < 0 then effectively there is no maximum value for each element.

We might also want to skip from s to the next subbag that is not a superbag of s. This is accomplished by:

bool nextnonsuper(vector<int> &s, const vector<int> &m)
{
  for (size_t i = 0; i < s.size(); i++) {
    if (s[i] != 0) {
      s[i] = 0;
      return nextbag(s,m,i+1);
    }
  }
  return false;
}

Here we see why we wanted the start parameter in nextbag. For example, we go from:

[0,0,0,3,4,...] => [0,0,0,0,5,...]

Given f: bag -> A where A is an ordered set and where f is monotonic (if s is a subbag of t, then f(s) <= f(t)), if we want to find all (sub)bags where f(s) <= x, we can use nextnonsuper to skip some of the enumeration:

while (true) {
  if (f(s) >= t) s = nextnonsuper(s)
  else s = nextbag(s)
  ...
}

For an example, we can use a weighted sum over the bag elements (we will assume that the weights are positive).

int weights(const vector<int> &s, const vector<int> &w)
{
  int total = 0;
  for (size_t i = 0; i < s.size(); i++) {
    total += s[i]*w[i];
  }
  return total;
}

and we can add some boiler plate to make the whole thing into a nice standalone application:

// Print out our vector, include weights if required
// Optionally print in reverse order
void print(const vector<int> &s, 
           const vector<int> *w = NULL, 
           bool printreverse = false)
{
  int sz = s.size();
  for (int i = 0; i < sz; i++) {
    int index = printreverse?sz-i-1:i;
    cout << s[index];
    if (w != NULL) cout << ":" << (*w)[index];
    cout << (i < sz-1?" ":"\n");
  }
}

int main(int argc, char *argv[])
{
  int target;    // The sum we are aiming for
  vector<int> s; // Where we build the sequence
  vector<int> m; // The number of each items available
  vector<int> w; // The weight of each item
  bool printreverse = false; // Print sequences in reverse
  bool printweight  = false; // Print the weights as well
  bool printless    = false; // Print sequences less than target
  bool printgreater = false; // Print those greater than target

  // Crude but effective argument processing
  while (argv[1][0] == '-') {
    if (strcmp(argv[1],"-r") == 0) {
      printreverse = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-l") == 0) {
      printless = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-g") == 0) {
      printgreater = true;
      argv++; argc--;
    } else if (strcmp(argv[1],"-w") == 0) {
      printweight = true;
      argv++; argc--;
    } else {
      cerr << "Invalid flag: " << argv[1] << endl;
      exit(0);
    }
  }
  // Read target
  // target = 0 means no target
  if (sscanf(argv[1],"%d",&target) != 1) {
    cerr << "Invalid target: " << argv[1] << endl;
    exit(0);
  }
  // Read count/weight pairs
  for (int i = 2; i < argc; i++) {
    int a,b;
    if (sscanf(argv[i],"%d:%d",&a,&b) != 2) {
      cerr << "Invalid arg: " << argv[i] << endl;
      exit(0);
    }
    m.push_back(a);
    w.push_back(b);
  }
  // Initialize result sequence.
  s.resize(m.size());
  while(true) {
    int total = weights(s,w);
    // Print result if desired
    if (target == 0 || 
        total == target || 
        (total < target && printless) ||
        (total > target && printgreater)) {
      print(s, printweight?&w:NULL, printreverse);
    }
    if (target > 0 && total >= target) {
      // Have hit or exceeded target, so skip superbags
      if (!nextnonsuper(s,m)) break;
    } else {
      if (!nextbag(s,m)) break;
    }
  }
}

Compiling all this as bagenum.cpp, we can now do some enumerations. All the ways to make 5 from the bag {1,1,1,2,2,3}:

$ ./bagenum 5 3:1 2:2 1:3
3 1 0
1 2 0
2 0 1
0 1 1

We can specify a target of 0 to get all the subbags of the input, and the -w flag prints the weight out as well:

$ ./bagenum -w 0 3:1 2:2
0:1 0:2
1:1 0:2
2:1 0:2
3:1 0:2
0:1 1:2
1:1 1:2
2:1 1:2
3:1 1:2
0:1 2:2
1:1 2:2
2:1 2:2
3:1 2:2

We can enumerate bitstrings with a given number of bits set

$ ./bagenum 2 1:1 1:1 1:1 1:1
1 1 0 0
1 0 1 0
0 1 1 0
1 0 0 1
0 1 0 1
0 0 1 1

Or with at most a given number set:

$ ./bagenum -l 2 1:1 1:1 1:1 1:1
0 0 0 0
1 0 0 0
0 1 0 0
1 1 0 0
0 0 1 0
1 0 1 0
0 1 1 0
0 0 0 1
1 0 0 1
0 1 0 1
0 0 1 1

Or the ways of rolling 4 with 3 dice numbered 0-5 – much the same as rolling 7 with 3 normal dice (the -r parameter prints the results in reverse order, ie. normal lexicographic order):

$ ./bagenum -r 4 5:1 5:1 5:1
0 0 4
0 1 3
0 2 2
0 3 1
0 4 0
1 0 3
1 1 2
1 2 1
1 3 0
2 0 2
2 1 1
2 2 0
3 0 1
3 1 0
4 0 0

All this works quite nicely with unlimited multiplicities, provided we also set an upper bound, so we can enumerate the solutions to the classic Polya problem of the ways to make 50c with usual US coins (50 as it happens):

$ ./bagenum -w 50 -1:1 -1:5 -1:10 -1:25 -1:50
50:1 0:5 0:10 0:25 0:50
45:1 1:5 0:10 0:25 0:50
40:1 2:5 0:10 0:25 0:50
35:1 3:5 0:10 0:25 0:50
30:1 4:5 0:10 0:25 0:50
25:1 5:5 0:10 0:25 0:50
20:1 6:5 0:10 0:25 0:50
15:1 7:5 0:10 0:25 0:50
10:1 8:5 0:10 0:25 0:50
5:1 9:5 0:10 0:25 0:50
0:1 10:5 0:10 0:25 0:50
40:1 0:5 1:10 0:25 0:50
35:1 1:5 1:10 0:25 0:50
...

Finally, the -g flag prints the subbags found that are greater than the target – but only the ones all of whose subbags are less than the target (if you see what I mean):

./bagenum -g -w 8 1:2 1:3 1:4 1:5
1:2 1:3 1:4 0:5
0:2 1:3 0:4 1:5
0:2 0:3 1:4 1:5

(So the complement of subbags are the ones whose sum is less than or equal to a target but every superbag is greater, as in the classic knapsack problem).

Once again, this works fine with unlimited multiplicities:

$ ./bagenum -g -w 8 -1:2 -1:3 -1:4 -1:5
4:2 0:3 0:4 0:5
3:2 1:3 0:4 0:5
1:2 2:3 0:4 0:5
0:2 3:3 0:4 0:5
2:2 0:3 1:4 0:5
1:2 1:3 1:4 0:5
0:2 2:3 1:4 0:5
0:2 0:3 2:4 0:5
2:2 0:3 0:4 1:5
0:2 1:3 0:4 1:5
0:2 0:3 1:4 1:5
0:2 0:3 0:4 2:5

All in all, a nice little combinatorial Swiss Army knife and useful when trying to understand the stuff in:

http://www.amazon.co.uk/The-Computer-Programming-Volume-Fascicle/dp/0201853949/

for example (I like the TAOCP fascicles, they are much more convenient to read in the bar, in the bath or on the bus). This one has lots more about this sort of problem: special cases with more efficient solutions, as well as more general algorithms for more general combinatorial problems.

Advertisements


Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s