# Why it’s hard to eliminate variables

Let’s examine why it’s hard to eliminate variables. I remember the code I looked at in SatElite that did it: it was crazy clean code and looked like it was pretty easy to perform. In this post I’ll examine how that simple code became more than a 1’000 lines of code today.

### What needs to be done, at first sight

At first sight, variable elimination is easy. We just:

1. Build occurrence lists
2. Pick a variable to eliminate
3. Resolve every clause having the positive literal of the variable with negative ones.
4. Add newly resolved clauses into the system
5. Remove original clauses.
6. Goto 2.

These are all pretty simple steps at first sight, and one can imagine that implementing them is maybe 50-100 lines of code, no more. So, let’s examine them one-by-one to see how they get complicated.

### Building occurrence lists

The idea is that we simply take every single clause, and for every literal they have, we insert a pointer to the clause into an array for that literal’s occurrences. This sounds easy, but what happens if we are given 1M clauses, each with 1000 literals on average? If you think this is crazy, it isn’t, and does in fact happen.

One option is we estimate the amount of memory we would use and abort early because we don’t want to run out of memory. So, first we check the potential size, then we link them in. Unfortunately, this means we can’t do variable elimination at all. Another possibility is that we link in clauses only partially. For example, we don’t link in clauses that are redundant but too long. Redundant clauses are ignored during resolution when eliminating, so this is OK, but then we will have to clean these clauses up later, when finishing up. However, if a redundant clause that hasn’t been linked in backward-subsumes an irredudant clause (and thus becomes irredundant itself), we have to link it in asap. Optimisation leads to complexity.

We don’t just want to link these clauses in to some random datastructure. I believe it was Armin Biere who put this idea into my head, or maybe someone else, but re-using watchlists for occurrence lists means we use our memory resources better: there won’t be so much fragmentation. Furthermore, an advanced SAT solver uses implicit binary & tertiary clauses, so those are linked in already into the watchlists. That saves memory.

### Picking a variable to eliminate

The order in which you eliminate clauses is a defining part of the speed we get with the final solver. It is crucially important that this is done well. So, what can we do? We can either use some heuristic or precisely calculate the gain for each variable, and eliminate the best guess/calculated one first. These are both greedy algorithms but I think given the complexity of the task, they are the best at hand.

Using precise calculation is easy, we just resolve all the relevant clauses but don’t add the resolvents. It’s very expensive though. A better approach is to use a heuristic. Logically, clauses that have few literals in them are likely not to resolve such that they become tautologies. It’s unlikely that two binary clauses’ resolvent becomes a tautology. It’s however likely that large clauses become tautological once resolved. I take this into account when calculating elimination cost for variable. Since redundant clauses are linked in the occurrence lists so that I can subsume them, I have to skip them.

It’s not enough to calculate the heuristic once, of course. We have to re-calculate after every elimination — the playing field has changed. Thus, for every clause you removed, you have to keep in mind which variables were affected, and re-calculate the cost for each after every variable elimination.

### Resolving clauses

The base is easy. We add literals to a new array of literls and mark the literals that have been added in a quick-lookup array. If the opposite of a literal is added, the markings tell us and we can skip the rest — the resolvent is tautological. Things get hairy if the clause is not tautological.

What if the new clause is subsumed by already-existing clauses? Should we check for this? This is called forward-subsumption, and it’s really expensive. Backward subsumption (which asks the question ‘Does this clause subsume others?’ instead of ‘Is this clause subsumed by others?’) would be cheaper, but that’s not the case here. We can thus try to subsume the clause only by e.g. binary&tertiary clauses and hope for the best.

What if the new clause can be subsumed by stamps? That’s easy to check for, but if the new clause was used to create the stamp, that would be a self-dependency loop and not adding the resolvent would lead to an incorrect result. We can use the stamps as long as the resolving clauses were not needed for the stamp: i.e. they are not binary clauses and on-the-fly hyper-binary resolution was used during every step of stamp generation. A similar logic goes for using the implication cache.

We could also virtually extend the clause with literals using watchlists/stamps/impl. cache and then try to subsume that virtual clause. I forgot what 3-letter acronym Biere et al. gave to this method (it’s one of the 12 on slide 25 here), but, except for the acronym, this idea is pretty simple. You take a binary clause, e.g. xV~y, and if x is in the newly created clause, but y and ~y is not, you add y to the clause. The clause is now bigger, so has a larger chance to be subsumed. You now perform forward subsumption as above, but with the extended clause. Also, take care not to subsume clauses with themselves, which, as you might imagine, can get hairy.

If all of this sounds a bit intricate, this is not even the difficult part. The difficult part is keeping track of time. Where of course by time I don’t actually mean seconds — I mean computation steps that you have to define one way or another and increment counters and set limits. Remember: all this has to be deterministic.

Doing all of the above with a small but complicated instance is super-fast, under 0.001s. With a weird instance where one single literal may occur in more than a million clauses, it can be very-very expensive even for one single try — over 100s. That’s about 5 orders of magnitude of difference. So, you have to be careful. The resolution we cannot skip, but we can abort it (and indicate it up in the call tree). Some of the others we can abort, but then the whole resolution has to be re-started. Some of the above is not critical at all, so you have to use a different time-limit for some, and mark them as too expensive, so at least the basic things get done. This gets complicated, because e.g. forward-subsumption you might want to re-use at other parts of the solver so you have to use a time-limit that isn’t global.

### Adding the newly resolved clauses

Adding clauses is simple: we create and link them in. However, we can do more. Since backward-subsumption is fast, we can do that with the newly created clauses. Note that this means the newly created clause could subsume some of the original clauses it was created from — which means the resolvents should be pre-generated and kept in memory.

Another thing: since we know the new clause needs to be added, we might as well shorten it before in any way we can. At this point, we can make use of all the watchlists, stamps and implication cache we have to shorten the new clause: there are no problems with self-dependencies. It will pay off. However, note that shortening the clause before adding it means that we will have to reverse-shorten it later, when this clause might be part of a group of clauses that is touched by a new variable elimination round. So, we are working against ourselves in a way — especially because reverse shortening is pretty expensive and hairy as explained above.

Although this is obvious, but we still have to take care of time-outs. For example, if resolution took so much time that we are already out of time, we must exit asap and not worry about the resolvents. Don’t link, don’t remove, just exit. Time is of essence.

### Removing the original clauses

Easy, just unlink them from the occurrence lists. I mean, easy if you don’t care about time, of course. Because unlinking is an O(N^2) operation if you have N clauses and all of them contain the same literal X — the N-long occurrence list of literal X has to be read and updated N times. So, we don’t do this.

First of all, a special case: the two occurrence lists of the variable we are removing can simply be .clear()-ed. It’s no longer needed. Secondly, we shouldn’t unlink clauses one-by-one. Instead, we should mark the clause as removed, and then not care about the clause later. Once variable elimination is finished, we do a sweep of all the occurrence lists and clauses and remove the clauses that have been marked. This means that e.g. forward and backward subsumption gets more hairy (we shouldn’t subsume with a clause that’s been marked as removed but is still in the occurrence list) but that O(N^2) becomes O(N) which for problems where N is large makes quite a bit of difference. Like, the difference of 100s vs. 10s for a the same exact thing.

### The untold horrors

On top of what’s above, you might like to generate some statistics about what worked and what didn’t. You might like to dump these statistics to a database. You might like to not create resolutions that are not needed as the irreduntant clauses form an AND/ITE gate. Or multiple gates. You might like to eliminate only a subset of variables at each call so that you don’t make your system too sparse and thus reduce arc consistency. You might want to vary this limit based on the problem at hand. You might want to do many other things that are not detailed above.

### Conclusions

Once I read through the above, I realized I kind of missed the essence: time-outs. It’s mentioned here and there, but it’s much more critical than it seems and makes things a hell of a lot harder. How do you cleanly exit from the middle of reverse-shortening while resolving because you ran out of time? I could just bury my head in sand of course and say: I don’t care. Or, I could make some messy algorithm that checks return values of each call and return a special value in case of time-outs. This needs to be done for every level of the call, which can be pretty deep, unless you like writing 1’500 line functions. I wanted to say writing&reading, but, really, nobody reads 1’500 line functions. They are throw-away,write-only code.

# On benchmark randomization

As many of you have heard, the SAT Competition for this year has been announced. You can send in your benchmarks between the 12th and the 22nd of April, so get started. I have a bunch of benchmarks I have already submitted about 2 years ago, still waiting for any reply from those organizers — but the organizers are different this year, so fingers crossed.

What I want to talk about today is benchmark randomization. This is a very-very touchy topic. However, I fear that it’s touchy for the wrong reasons, and so I think it’s important to talk about it in detail.

## What is benchmark randomization?

Benchmark randomization is when a benchmark that is submitted is shuffled around a bit. There are many ways to shuffle a problem, and I will discuss this in a bit, but the point is that the problem at hand that is described by the benchmark CNF should not be changed, or changed only in a very-very minor way, such that everyone agrees that it doesn’t affect the core problem itself as described by the CNF.

## Why do we need shuffling?

We need shuffling because simply put, there aren’t enough good benchmarks and so the benchmarks of yesteryear (and the year before, and before, and…) re-appear often. This would be OK if SAT solvers couldn’t be tuned to solving specific problems faster. Note that I am not suggesting that SAT solvers are intentionally manipulated to solve specific problems faster by unscrupulous researchers. Instead, the following happens.

## Unintentional random seed improvements

Researchers test the performance of their SAT solvers on specific instances and then tune their solvers, testing the performance again and again on the same instances to check if they have improved performance. Logically this is the best way to test and improve performance: use the same well-defined test-set all the time for meaningful comparison. Since the researcher wants to use the instances that he/she thinks is the current use-case of SAT solvers, he naturally uses the instances of SAT competitions, since those are representative. I did and still do the same.

So, researchers add their idea to a SAT solver, and test. If the idea is not improving things then some change is made and tested again. Since modern CDCL SAT solvers behave quite randomly, and since any change in the source code changes the behaviour quite significantly, a small change in the source code (tuning of a parameter, for example) will change the behaviour. And since the set of problems tested on is fixed, there is a chance that more problems will be solved. If more are solved, the researcher might correctly interpret this as a general improvement, not specific to the problem set. However, it may very well be generic, it is also specific.

The above suggests that the randomness of the SAT solver is completely unintentionally tuned to specific problems — a subset of which will appear next year in the competition.

## Easy fixes

Since there aren’t enough benchmark problems, and in particular some benchmark types are rare, I suggest to fix the unintentional tuning of solvers to specific problems by changing the benchmarks in minor ways. Here is a list, with an explanation why I think it’s OK to perform the manipulation:

1. Propagate variables. Unitary clauses are often part of benchmarks. Propagating some of these, some recursively, gives quite a bit of problem space variation. Propagation is performed by every CDCL SAT solver, and I think many would be  surprised if it didn’t help SAT solvers that worked differently than  current SAT solvers. Agreeing on performing partial propagation is something that shouldn’t be too difficult.
2. Renumber variables. For some variable X that is not used (or is fixed to a value that has been propagated), every variable that is higher than X is decremented by one, and the CNF header is fixed to reflect this change.  Such a minor renumbering may be approved by every researcher as something that doesn’t change the problem or its structure. Note that if  partial propagation is performed there should be quite a number of variables that can be removed. Renumbering some, but not others is a way to shuffle the problem. A more radical way of renumbering variables would be to completely shuffle them, however that would change the way the problem is described in quite a radical way, so some would correctly object and it’s not necessary anyway.
3. Replace equivalent literals. Perform strongly connected component analysis and replace equivalent literals. This has been shown to significantly improve performance and I have never seen a case where it doesn’t. Since equivalent literal replacement can be performed with a lot of freedom, this is quite a bit of shuffling space. For example, if v1=v2=v3, then any of the v1, v2, v3 can be the one that replaces the rest in the CNF. Picking one randomly is a way to shuffle the instance

There are other ways of shuffling, but either they change the instance too much (e.g. blocked clause removal), or can be undone quite easily (e.g. shuffling the order of the clauses). In fact, (3) is already quite a touchy issue I think, but with (1) and (2) all could agree on. Neither requires the order of the literals or the order of the clauses to change — some clauses (e.g. unitary ones) and literals (some of those that are set) would be removed, but that’s all. The problem remains essentially unchanged such that most probably even the original problem author would easily recognize it. However, it would be different from a SAT solver point of view: these changes would change the random seed of the solver, forcing the solver to behave in a way that is less tuned to this specific problem instance.

## Conclusion

SAT solvers are currently tuned too much to specific instances. This is not intentional by the researchers, however it still affects the results. To obtain better, less biased results we should shuffle the problem instances we have. Above, I suggested three ways to shuffle the instances in such a way that most would agree they don’t disturb or change the complexity of the underlying problem described by the instance. I hope that some of these suggestions will be employed, if not this year then for next year’s SAT competition such that we could reach better, more meaningful results.

# Note to self: higher level autarkies

While reading this thesis, I have had a thought about autarkies. Let me first define what an autarky in SAT solving is:

A partial assignment `phi` is called a weak autarky for `F` if “`phi*F` is a subset of `F`” holds, while `phi` is an autarky for `F` if `phi` is a weak autarky for all sub-clause-sets `F'` of `F`. (Handbook of Satisfiability, p. 355)

What does this mean? Well, it means that in the clause-set `{x V y, x}` the assignment `y=False` is a weak autarky, because assigning `y` to `False` will lead to a clause that is already in the clause set (the clause `x`), while the assignment of `x=True` is a (normal) autarky, because it will satisfy all clauses that `x` is in. The point of autarkies is that we can remove clauses from the clause set by assigning values to a set of variables, while not changing the (un)satisfiability property of the original problem. In other words, autarkies are a cheap way of reducing the problem size. The only problem is that it seems to be relatively expensive to search for them, so conflict-driven SAT solvers don’t normally search for them (but, some lookahead solvers such as march by Marijn Heule do).

So, what is the idea? Well, I think we can have autarkies for equivalent literals, too. Here is an example CNF:

``` a V d V -f
-b V d V -f
-a V e
b V e
```

setting `a = -b` will not change the satisfiability of the problem.

We can add any number of clause pairs of the form `X V Y` where `Y` is any set of literals not in `{a, -a, b, -b}`, and `X` is `(-)a` in clause 1 and `(-)b` in clause 2. Further, one of the two variables, say, `a` can be in any clauses at will (in any literal form), though variable `b` can then only be in clauses defined by the clause pairs. Example:

``` a V  d V -f
-b V  d V -f
-a V  e
b V  e
a V  c V -h
a V -f
```

An extension that is pretty simple from here, is that we can even have clauses whose `Y` part is somewhat different: some literals can be missing, though again in a controlled way. For example:

``` a V d
-b V d V -f
-a V e
b V e
```

is possible. It would restrict `b` a bit more than necessary, but if `a` is the only one of the pairs missing literals, we are still OK I believe.

Finally, let me give an example of what these higher-level autarkies are capable of. Let’s assume we have the clause set:

``` a V  d
-b V  d V -f
-a V  e
b V  e
a V  c V -h
a V -f
a V  b V  f V -g
```

setting `a=-b` now simplifies this to:

``` a V  d
-a V  e
a V  c V -h
a V -f
```

which is equisatisfiable to the first one. Further, if the latter is satisfiable, computing the satisfying solution to the former given a solution to the latter is trivial.

# On-the-fly self-subsuming resolution

I have recently been trying a new method of shortening learnt clauses. There is a learnt clause minimisation paper by Sörensson and Biere, and I have recently been trying to do more. The trick I use, is that many literals can be removed from learnt clauses, if self-subsuming resolution (see my older post) is applied to them.

During self-subsuming resolution, under normal circumstances, the clauses used to remove literals with are the short clauses: binary and tertiary clauses. The trick I have discovered, is that since CryptoMiniSat keeps binary and tertiary clauses natively inside the watchlists (see my previous post), so doing self-subsuming resolution can be done extremely fast. We simply need to put the literals of the newly learnt clause into a fast-addressable memory, then go through the watchlists of every literal in the clause, and remove the literals that match.

The code used to achieve this is the following:

```for (uint32_t i = 0; i < cl.size(); i++) seen[cl[i].toInt()] = 1;
for (Lit *l = cl.getData(), *end = cl.getDataEnd(); l != end; l++) {
if (seen[l->toInt()] == 0) continue;

Lit lit = *l;
//watched is messed: lit is in watched[~lit]
vec& ws = watches[(~lit).toInt()];
for (Watched* i = ws.getData(), *end = ws.getDataEnd(); i != end; i++) {
if (i->isBinary()) {
if (seen[(~i->getOtherLit()).toInt()]) {
seen[(~i->getOtherLit()).toInt()] = 0;
}
}
}
}

Lit *i = cl.getData();
Lit *j= i;
for (Lit* end = cl.getDataEnd(); i != end; i++) {
if (seen[i->toInt()]) *j++ = *i;
seen[i->toInt()] = 0;
}
cl.shrink(i-j);```

Essentially, we set the `seen` vector (which is initially all `0`), to `1` where the original clause contained a literal. Then, we go through the watchlists, and check if any binary clause could strengthen the clause. If so, we unset the corresponding bit in `seen`. Finally, we clean the clause from the literals where `seen` is `0` (and then set all parts of `seen` back to `0`).

The actual implementation only differs from the one above by also using tertiary clauses (which are also natively inside the watchlists) to do self-subsuming resolution. The results are very promising: CryptoMiniSat can now solve 2 more problems from the 2009 SAT Competition instances within the original time limit, and seems to scale better in the longer run, too. An example statistics output of CryptoMiniSat (from the UTI-20-10p1 problem):

```c OTF cl watch-shrink   : 961191      (0.64      clauses/conflict)
c OTF cl watch-sh-lit   : 6096451     (6.34       lits/clause)
```

which means that in 64% of the cases, on-the-fly self-subsuming resolution was useful, and on average, it removed 6.3 literals from each clause where it was useful.

# Self-subsuming resolution

Self-subsuming resolution in SAT uses the resolution operator to carry out its magic. Resolution can be used on two clauses if they share a variable, but the sign of the variable is inverted, e.g.
`v1 V v2`
`-v1 V v3 V v4`
(where `v1..v4` are binary variables and “`V`” means binary OR) can be resolved on `v1`, producing the clause:
`v2 V v3 V v4`

In SAT this is used to simplify problems as follows. Let’s assume we have many clauses, among which there are these three:
`a V c (1)`
`a V -c V d V f (2)`
`a V -c V g (3)`
In this case, if we use the resolution operator on (1) and (2), we get:
`a V d V f (4)`
The interesting thing about (4) is that it is a strict subset of (2). In other words, we could simply replace (2) with (4), thus shortening clause (2). If we use the resolution operator on (1) and (3) we get:
`a V g (5)`
whose literals form a strict subset of those of (3), so we can replace (3) with (5), again shortening a clause. We shortened 2 clauses, each with one literal. For completeness, this technique can be applied in a recursive manner on all possible clause-pairs.

Until now, CryptoMiniSat was doing self-subsuming resolution in such a way that the clauses being manipulated were kept inside the propagation queue. The problem was, that the propagation queue enforces a very strict position of literals in the clause. So, when e.g. removing literal “`a`” from clause (3), CryptoMiniSat had to completely remove the clause, and then to completely re-add it, since the position of “`a`” changed (it got removed). The additional overhead for this was simply too much: in certain cases, self-subsuming resolution took 130 seconds to complete, 115 of which was taken by this detach-reattach overhead.

The solution to this problem was to completely remove all clauses from the propagation queue, then do self-subsuming resolution, and finally re-add the clauses. Interestingly, complete removal of all clauses is very fast (essentially, a constant-time operation, even though removing clauses one-by-one is very costly), and completely re-adding them is also very fast (linear in the number of clauses). The first impressions from this technique are very positive, and I decided to release CryptoMiniSat 2.6.0 with this technique included.

NOTE: Thanks to N. Sörensson for pointing me out that self-subsuming resolution could be done better. I am not sure this is what he meant, but fingers are crossed.