# The past half year of SAT and SMT

It’s been a while I have blogged. Lots of things happened on the way, I met people, changed countries, jobs, etc. In the meanwhile, I have been trying to bite the bullet of what has become of CryptoMiniSat: last year, it didn’t perform too well in the SAT competition. In the past half year I have tried to fix the underlying issues. Let me try to explain what and how exactly.

### Validation and cleanup

First, in the past years I ‘innovated’ a lot in directions that were simply not validated. For example, I have systems to cleanly exit the solver after N seconds have passed, but the checks were done by calling a Linux kernel function, which is actually not so cheap. It turned out that calling it took 3-5% of the time, and it was essentially doing nothing. Similar code was all over the place. I was (and still am, in certain builds) collecting massive amounts of solving data. It turns out, collecting them means not having enough registers left to do the real thing so the ASM code was horrific and so was performance. The list could go on. In the end, I had to weed out the majority of the stuff that was added as an experiment without proper validation.

### Other solvers

Second, I started looking at other solvers. In particular, the swdia5by solver by Chanseok Oh. It’s a very-very weird solver and it performed exceedingly well. I’m sure it’s on the mind of many solver implementers, and for a good reason. As a personal note, I like the webpage of the author a lot. I think what s/he forgets is that MiniSat is not so well-used because it’s so well-performing. Glucose is better. It’s used because it’s relatively good and extremely well-written. However, the patches on the author’s website are anything but well-written.

### The cloud

Finally, I worked a lot on making the system work on AWS, the cloud computing framework by Amazon. Since I don’t have access to clusters like my competitors do, I need to resort to AWS. It’s not _that_ expensive, a full SATComp’14 run sets me back about \$5-6. I used spot instances and a somewhat simple, 1000-line python framework for farming out computation to client nodes.<

### Conclusions

All in all, I’m happy to say, CryptoMiniSat is no longer as bad as it was last year. There are still some problems around and a lot of fine-tuning needed, but things are looking brighter. I have been thinking lately of trying to release some of the tools I developed for CryptoMiniSat for general use. For example, I have a pretty elaborate fuzz framework that could fuzz any solver using the new SAT library header system. Also, the AWS system could be of use to people. I’ll see.

# Faster cleaning of the learnt clause database

In SAT solvers, removing unneeded learnt clauses from the clause database sounds like a trivial task: we somehow determine which clauses are not needed and we call remove() on them. However, in case performance is an issue, it can get a bit more more complicated.

### The problem

The issue at hand is that clauses are stored in two places: as a list of pointers to the clauses and in a list of lists called the watchlist. Removing clauses from either list can be an $O(n^2)$ operation if we e.g. remove every element from the list, one by one. In fact, an old version of the most popular SAT solver, MiniSat2, used to do exactly this:

Here, removeClause() is called on each clause individually, where removeClause() eventually calls remove() twice, where remove() is a linear operation:

It is clear that if the number of learnt clauses removed is a significant percent of all clauses (which it is after some runtime), this is an $O(n^2)$ operation.

### My original solution

My original solution to this problem was the following. First, I did a sweep on the watchlist and detached all learnt clauses. This is an $O(n)$ operation. Then, I ran the algorithm above, without the removeClause(). Finally, I attached the remaining learnt clauses: again an $O(n)$ operation. This solution is significantly faster than the MiniSat one as its worst-case runtime is only $O(n)$. The improvement is measurable — worst-case cleaning times dropped from seconds to tenths of seconds. However, it can be further improved.

### The improved solution

The improvement that came to my mind just yesterday was the following. I can keep a one bit marker in each learnt clause that indicates whether the clause needs to be detached or not. Then, I can run the algorithm as above but replace removeClause() with markclause() and run through the watchlists once to remove (and free) the marked clauses. This works really well and it only necessitates one sweep of the watchlists, without any useless detach+reattach cycles.

The newer GitHub version of MiniSat also marks the clauses instead of detaching them immediately and then removes them in one sweep, later. Interestingly, it keeps a list of ‘dirty’ occurrence lists and only goes through the ones that need removal. I find that a bit strange for this specific purpose: usually almost all watchlists are affected. In other cases, though, keeping dirty lists in mind can be a good idea, e.g. if only few clauses are removed for some optimization step.

# TreeLook and transitive reduction

The paper by Heule et al. about hyper-binary resolution using intree-based lookahead is pretty funky. The idea is actually quite simple (and as usual, not exactly trivial to come up with): we re-use past propagations by reversing the order in which literals are normally enqueued.

### A simple example

First, a queue is built that starts with a leaf literal and then follows it up through binary clauses until it can. Then it backtracks (adds to the queue a special, * element) and continues. The point of the queue is to have an example order that we can use to dequeue literals from in reverse propagation order. Obviously, there are many different orders in which we can build this queue and I wouldn’t be surprised if there are some nice heuristics one can use. Let’s just assume we have such a queue.

For example if y leads to x, then an example queue will have first element x and then y. So we first enqueue x, propagate, and then we enqueue y. If x already fails, there is no point in enqueuing y (and y is failed along with x). If both y and z lead to x but only z fails, then we don’t have to perform the propagations done by x twice: We enqueue x, propagate, create new decision level, enqueue y, propagate (nothing fails), backtrack 1 decision level, enqueue z, and now we fail. Notice that we didn’t have to propagate x twice even though we probed two literals (y and z) that both entailed x.

### Failing mid-way

The paper mentions failed literals that fail mid-way while dequeueing elements. We obviously cannot simply enqueue these literals, as they would be unset next time we backtrack. So these have to be kept in an array and set later, when we are at decision level 0. Further, once we are in a failed state, anything dequeued that is at the same or lower level also fails, so we need to keep an indicator of failure for these literals.

### Keeping reasons updated

Let’s suppose we enqueued x and propagated it. Next is y. We enqueue y… but we need to know what is the reason why x got set. The reason is of course the binary clause that we examined when we built the queue: (x, ~y). The reason is needed to be set because we will be jumping backwards through the implication graph to the deepest common ancestor to attach the new hyper-binary clause there. When jumping back, we might need to go back all the way to y, through x. In order to perform transitive reduction (as explained later), we need to know if the binary clause (x, ~y) was redundant or irredundant. This information needs to be stored in the queue and every time we dequeue a new literal y the reason of the previously enqueued literal needs to be set to the inverse of the currently enqueued literal i.e. ~y.

### Transitive reduction

Updating reasons becomes a real problem in case we wish to perform transitive reduction. Transitive reduction removes binary clauses that are useless from a binary implication graph reachability perspective. However, if it removes a binary clause that is later used by the queue to update a reason, we encounter a problem. We may update a literal with a reason that is no longer valid as the corresponding binary clause has been replaced by a chain of binary clauses. Later transitive reductions will take into account that this binary clause exists (it doesn’t) and will make further reductions that may be incorrect. In particular, further transitive reductions might remove an element of the chain itself — kind of like biting our own tail.

There seems to be a couple of options to fix the problem:

1. Not to perform transitive reduction at all. This may have been the intention of the designers, as the BCP_NHBR function does not perform transitive reduction.
2. Update the queue to reflect the changed set of binary clauses. Unfortunately this would be very expensive and thus basically not doable in reasonable about of time as far as I can tell.
3. Never remove binary clauses that are used for the queue. This means we need to mark such clauses and then check for markings when removing binary clauses. This is the implementation that I chose. We can immediately unmark a clause once the corresponding element has been dequeued, making it possible to remove it later. In CryptoMiniSat I simply unmark all binary clauses at the end — it’s faster.

### Conclusions

I remember some people always asking me why I haven’t yet implemented intree-based probing. It is much faster than normal probing. However, it’s not perfect. For example, it cannot be used to perform a fast depth-first walk of the tree and as such stamping is not really possible while doing it — always updating closing times for already dequeued elements seems to defeat the purpose of the whole idea (i.e. reversing the propagation order). Secondly, I haven’t yet found a way to efficiently perform Stalmarck while doing intree probing. Thirdly, it’s not exactly trivial to implement — as explained above.

[paypal-donation]

# SMT Competition’14 and STP

The 2014 SMT competition‘s QF_BV divison is over (benchmark tarball here), and the results are (copied from here):

 Solver Errors Solved Not Solved CPU time (solved instances) Boolector 0 2361 127 138077.59 STP-CryptoMiniSat4 0 2283 205 190660.82 [MathSAT] 0 2199 289 262349.39 [Z3] 0 2180 308 214087.66 CVC4 0 2166 322 87954.62 4Simp 0 2121 367 187966.86 SONOLAR 0 2026 462 174134.49 Yices2 0 1770 718 159991.55 abziz_min_features 9 2155 324 134385.22 abziz_all_features 9 2093 386 122540.04

First, let me congratulate Boolector by Armin Biere. STP came a not-so-close second. STP was meant to be submitted twice, once with MiniSat and once with CyrptoMiniSat4 — this is the only reason why the STP submission is called STP-CryptoMiniSat4. Unfortunately, the other submission could not be made because of some compilation problems.

There are a couple of things I would like to draw your attention to. First is the cumulated solving time of solved instances. Note how it decreases compared to MathSat and Z3. Note also the differences in the DIFF of the number of solved instances. It’s relatively small between 4Simp…MathSat(<40) and increases dramatically with STP and Boolector (with around 80 each). Another thing of interest is that STP is surrounded by commercial products: Boolector, Z3 and MathSAT are all products you have to pay for to use in a commercial setting. In contrast, the biggest issue with STP is that if the optional(!) CryptoMiniSat4 is linked in, it's LGPLv2 instead of MIT licensed. In other words, a lot of effort is in there, and it's all free. Sometimes free as in beer, sometimes free as in freedom.

Lately, a lot of work has been done on STP. If you look a the github repository, it has about 80 issues resolved, about 40 open, lots of pull requests, and a lot of commits. A group of people, namely (in nickname order): Dan Liew, Vijay Ganesh, Khoo Yit Phang, Ryan Govostes, Trevor Hansen, a lot of external contributors (e.g. Stephen McCamant) and myself have been working on it pretty intensively. It now has a an automated test facility and a robust build system besides all the code cleanup and other improvements.

It’s a great team and I’m happy to be a minor part of it. If you feel like you could contribute, don’t hesitate to fork the repo, make some changes, and ask for a pull request. The discussions on the pull requests can be pretty elaborate which makes for a nice learning experience for all involved. Come and join — next year hopefully we’ll win :)

# Speeding up MiniSat with a one-liner

All SAT solvers must have a tri-state class that can hold the values `true`, `false`, and `undefined`. Although it’s not hard to write code to represent such a class, it’s hard to write such it so that all typical operations are fast. In this blog post I will compare three different ways of doing it.

### The original MiniSat 2.0 code

The original MiniSat 2.0 used the following code for such a class:

All the operators are pretty straightforward and fast, except `operator^` which flips `true` to `false` and `false` to `true` in case `b` is `true`. As you might have noticed it’s using a branch instruction to achieve this result which is weird and slow. Branching is known to be a headache as a wrong branch prediction can slow down the CPU quite a bit. Since `b` is supposed to be random (though it’s not that random), this might slow down the flip operation. Let’s call this version of the class solution (1).

Current MiniSat has changed the code to the following:

It doesn’t use a branch operator to flip values: `operator^` uses a simple XOR. However, other operators, notably the comparison operator became much more involved, and thus expensive. Let’s call this solution to the problem solution (2).

### Numerical trickery on the old code

Instead of using really tricky comparison and binary AND/OR operators, let’s try to use the original class, but replace the branch instruction with a bit of numerical trickery:

This makes ‘extensive’ use of the ALU of the chip for the flip operator to avoid using the branch instruction. However, it keeps other operators cheap as per the first version. Let’s call this solution to the problem solution (3).

### Comparison

I have tried to solve two different problems from the SAT Competition of 2009 with all three solutions. First is a typical electronic engineering problem, 9dlx_vliw_at_b_iq2.cnf. The second is a typical cryptographic problem, mizh-md5-47-3.cnf. Here are the timing results for my i7-3612QM (Ivy Bridge):

 Problem Solution(1) Solution(2) Solution(3) 9dlx_vliw_at_b_iq2.cnf 135.9s 138.8s 132.5s mizh-md5-47-3.cnf 60.7s 60.5s 56.3s

So, in this very small test, solution(3) wins.

And now for a bit of history. I have tried the trick in solution(3) about 3 years ago. Back then it seemed slower to perform the numerical trick than to use the branch. It turns out that on modern CPUs either the ALU is faster, branch misprediction is more costly, or both (or, the compilers have evolved to compile my numerical trickery into some weird thing). Anyway, it’s kind of a cool speedup for a one-liner.