Category Archives: Development

TreeLook and transitive reduction

The paper by Heule et al. about hyper-binary resolution using intree-based lookahead is pretty funky. The idea is actually quite simple (and as usual, not exactly trivial to come up with): we re-use past propagations by reversing the order in which literals are normally enqueued.

A simple example

First, a queue is built that starts with a leaf literal and then follows it up through binary clauses until it can. Then it backtracks (adds to the queue a special, * element) and continues. The point of the queue is to have an example order that we can use to dequeue literals from in reverse propagation order. Obviously, there are many different orders in which we can build this queue and I wouldn’t be surprised if there are some nice heuristics one can use. Let’s just assume we have such a queue.

For example if y leads to x, then an example queue will have first element x and then y. So we first enqueue x, propagate, and then we enqueue y. If x already fails, there is no point in enqueuing y (and y is failed along with x). If both y and z lead to x but only z fails, then we don’t have to perform the propagations done by x twice: We enqueue x, propagate, create new decision level, enqueue y, propagate (nothing fails), backtrack 1 decision level, enqueue z, and now we fail. Notice that we didn’t have to propagate x twice even though we probed two literals (y and z) that both entailed x.

Failing mid-way

The paper mentions failed literals that fail mid-way while dequeueing elements. We obviously cannot simply enqueue these literals, as they would be unset next time we backtrack. So these have to be kept in an array and set later, when we are at decision level 0. Further, once we are in a failed state, anything dequeued that is at the same or lower level also fails, so we need to keep an indicator of failure for these literals.

Keeping reasons updated

Let’s suppose we enqueued x and propagated it. Next is y. We enqueue y… but we need to know what is the reason why x got set. The reason is of course the binary clause that we examined when we built the queue: (x, ~y). The reason is needed to be set because we will be jumping backwards through the implication graph to the deepest common ancestor to attach the new hyper-binary clause there. When jumping back, we might need to go back all the way to y, through x. In order to perform transitive reduction (as explained later), we need to know if the binary clause (x, ~y) was redundant or irredundant. This information needs to be stored in the queue and every time we dequeue a new literal y the reason of the previously enqueued literal needs to be set to the inverse of the currently enqueued literal i.e. ~y.

Transitive reduction

Updating reasons becomes a real problem in case we wish to perform transitive reduction. Transitive reduction removes binary clauses that are useless from a binary implication graph reachability perspective. However, if it removes a binary clause that is later used by the queue to update a reason, we encounter a problem. We may update a literal with a reason that is no longer valid as the corresponding binary clause has been replaced by a chain of binary clauses. Later transitive reductions will take into account that this binary clause exists (it doesn’t) and will make further reductions that may be incorrect. In particular, further transitive reductions might remove an element of the chain itself — kind of like biting our own tail.

There seems to be a couple of options to fix the problem:

  1. Not to perform transitive reduction at all. This may have been the intention of the designers, as the BCP_NHBR function does not perform transitive reduction.
  2. Update the queue to reflect the changed set of binary clauses. Unfortunately this would be very expensive and thus basically not doable in reasonable about of time as far as I can tell.
  3. Never remove binary clauses that are used for the queue. This means we need to mark such clauses and then check for markings when removing binary clauses. This is the implementation that I chose. We can immediately unmark a clause once the corresponding element has been dequeued, making it possible to remove it later. In CryptoMiniSat I simply unmark all binary clauses at the end — it’s faster.

Conclusions

I remember some people always asking me why I haven’t yet implemented intree-based probing. It is much faster than normal probing. However, it’s not perfect. For example, it cannot be used to perform a fast depth-first walk of the tree and as such stamping is not really possible while doing it — always updating closing times for already dequeued elements seems to defeat the purpose of the whole idea (i.e. reversing the propagation order). Secondly, I haven’t yet found a way to efficiently perform Stalmarck while doing intree probing. Thirdly, it’s not exactly trivial to implement — as explained above.

[paypal-donation]

SMT Competition’14 and STP

The 2014 SMT competition‘s QF_BV divison is over (benchmark tarball here), and the results are (copied from here):

Solver Errors Solved Not Solved CPU time (solved instances)
Boolector 0 2361 127 138077.59
STP-CryptoMiniSat4 0 2283 205 190660.82
[MathSAT] 0 2199 289 262349.39
[Z3] 0 2180 308 214087.66
CVC4 0 2166 322 87954.62
4Simp 0 2121 367 187966.86
SONOLAR 0 2026 462 174134.49
Yices2 0 1770 718 159991.55
abziz_min_features 9 2155 324 134385.22
abziz_all_features 9 2093 386 122540.04

First, let me congratulate Boolector by Armin Biere. STP came a not-so-close second. STP was meant to be submitted twice, once with MiniSat and once with CyrptoMiniSat4 — this is the only reason why the STP submission is called STP-CryptoMiniSat4. Unfortunately, the other submission could not be made because of some compilation problems.

About the results

There are a couple of things I would like to draw your attention to. First is the cumulated solving time of solved instances. Note how it decreases compared to MathSat and Z3. Note also the differences in the DIFF of the number of solved instances. It’s relatively small between 4Simp…MathSat(<40) and increases dramatically with STP and Boolector (with around 80 each). Another thing of interest is that STP is surrounded by commercial products: Boolector, Z3 and MathSAT are all products you have to pay for to use in a commercial setting. In contrast, the biggest issue with STP is that if the optional(!) CryptoMiniSat4 is linked in, it's LGPLv2 instead of MIT licensed. In other words, a lot of effort is in there, and it's all free. Sometimes free as in beer, sometimes free as in freedom.

About STP

Lately, a lot of work has been done on STP. If you look a the github repository, it has about 80 issues resolved, about 40 open, lots of pull requests, and a lot of commits. A group of people, namely (in nickname order): Dan Liew, Vijay Ganesh, Khoo Yit Phang, Ryan Govostes, Trevor Hansen, a lot of external contributors (e.g. Stephen McCamant) and myself have been working on it pretty intensively. It now has a an automated test facility and a robust build system besides all the code cleanup and other improvements.

It’s a great team and I’m happy to be a minor part of it. If you feel like you could contribute, don’t hesitate to fork the repo, make some changes, and ask for a pull request. The discussions on the pull requests can be pretty elaborate which makes for a nice learning experience for all involved. Come and join — next year hopefully we’ll win :)

Speeding up MiniSat with a one-liner

All SAT solvers must have a tri-state class that can hold the values `true`, `false`, and `undefined`. Although it’s not hard to write code to represent such a class, it’s hard to write such it so that all typical operations are fast. In this blog post I will compare three different ways of doing it.

The original MiniSat 2.0 code

The original MiniSat 2.0 used the following code for such a class:

changed the code to the following:

lbool operator^(const bool b) const
{
  //return b ? lbool(-value) : lbool(value);
  return lbool(value * (-2*(char)b + 1));
}

This makes ‘extensive’ use of the ALU of the chip for the flip operator to avoid using the branch instruction. However, it keeps other operators cheap as per the first version. Let’s call this solution to the problem solution (3).

Comparison

I have tried to solve two different problems from the SAT Competition of 2009 with all three solutions. First is a typical electronic engineering problem, 9dlx_vliw_at_b_iq2.cnf. The second is a typical cryptographic problem, mizh-md5-47-3.cnf. Here are the timing results for my i7-3612QM (Ivy Bridge):

Problem Solution(1) Solution(2) Solution(3)
9dlx_vliw_at_b_iq2.cnf 135.9s 138.8s 132.5s
mizh-md5-47-3.cnf 60.7s 60.5s 56.3s

So, in this very small test, solution(3) wins.

And now for a bit of history. I have tried the trick in solution(3) about 3 years ago. Back then it seemed slower to perform the numerical trick than to use the branch. It turns out that on modern CPUs either the ALU is faster, branch misprediction is more costly, or both (or, the compilers have evolved to compile my numerical trickery into some weird thing). Anyway, it’s kind of a cool speedup for a one-liner.

On variable renumbering

Variable renumbering in SAT solvers keeps a mapping between the external variable numbers that is visible to the users and the internal variable numbers that is visible the to the system. The trivial mapping that most SAT solvers use is the one-to-one mapping where there is no difference between outer and internal variables. A smart mapping doesn’t keep track of all data related to variables that have been set or eliminated internally, so the internal datastructures can be smaller.

Advantages

Having smaller internal data structures help in achieving a lower memory footprint and better cache usage.

The memory savings are useful because some CNFs have tens of millions of variables. If the solver uses the typical watched literal scheme, it needs 2 arrays for each variable. If we are using 64b pointers and 32b array sizes, it’s 32B for each variable, so 32MB for every million variable only to keep the watching literal array(!). I have seen people complaining that their 100M variable problem runs out of memory — if we count that right that’s 3.2GB of memory only to hold the watching literal array pointers and sizes, not any data at all.

As for the CPU cache benefits: modern CPUs work using cache lines which are e.g. 64B long on Intel Sandy Bridge. If half of the variables are set already, the array holding the variable values — which will be accessed non-stop during propagation — will contain 50% useless data. In practice the speedup achieved can be upwards of 10%.

The simple problems

One problem with having a renumbering scheme is that you need to keep track of which datastructure is numbered in which way. The easy solution is to renumber absolutely everything. This is costly, however, as the mapping has to change every once in a while when new variables have been set. In this case, if everything is renumbered, then the eliminated variables‘ data needs to be updated according to the new mapping every time. This might be quite significant. So, it’s best not to renumber that. Similarly, if disconnected component analysis is used, then the disconnected components’ saved solutions need to be renumbered as well, along with the clauses that have been moved to the components.

An approach I have found to be satisfying is to keep every dynamic datastructure such as variable states (eliminated/decomposed/etc.), variable values (True/False/Unknown), clauses’ literals, etc. renumbered, while keeping mostly static datastructures such as eliminated clauses or equivalent literal maps non-renumbered. This works very well in practice as it allows the main system to shuffle the mapping around while not caring about all the other systems’ data.

The hard problem

The above is all fine and dandy until bounded variable addition (BVA) comes to the scene. This technique adds new variables to the problem to simplify it. These new variables will look like new outer variables, which seems good at first sight: the system could simply print the solution to all variables except the last N that were added by BVA and are not part of the original problem. However, if the caller adds new variables after the call to solve(), what can we do? The actual variables by the caller and the BVA variables will be mixed up: start with a bunch of original variables, sprinkle the end with some BVA, then some original variables, then some BVA…

The trivial solution to this is to have another mapping, one that translates variable numbers between the BVA and non-BVA system. As you might imagine, this complicates everything. Another solution is to forcibly eliminate all BVA variables after the call to solve(), let the user add the new clauses, and perform BVA again. Another even more complicated solution is to keep track of the variables being added, then re-number all variables inside all datastructures to move all BVA variables to the end of the variable array. This is expensive but only needs to be done once after the call to solve(), which may be acceptable. Currently, CryptoMiniSat uses the trivial scheme. Maybe I’ll move to the last (and most complicated) system later on.

Conclusion

Variable renumbering is not for the faint of heart. Bugs become significantly harder to track, as all debug messages need to be translated to a common variable numbering or they make no sense at all. It’s also very easy to introduce bugs through variable renumbering. A truly difficult bug I had was when the disconnected component finder’s sub-solver renumbered its internal variables and when I tried to import some values from the sub-solver back to the main solver, I used the wrong variable numbers.

MiniSat in your browser

CLICK HERE to run CyrptoMiniSat in your browser

 

Lately, I have become an LLVM nut. LLVM is amazing, it can do a lot, and, weirdly enough, a system developed from it is probably one of the biggest users of SAT solvers. I’m talking about KLEE, the symbolic virtual execution machine. It takes a source, compiles it with LLVM, then translates the LLVM IR into SMT, runs STP on it, which in turn runs CryptoMiniSat, and voila… the user gets back a free bug report. Or 1’200 bug reports.

Emscripten

Another great use of LLVM is emscripten, which takes the LLVM IR (again) and turns it into javascript. This of course means we can now run SAT solvers in the browser. This is so much fun. Here you can play with it, this is MiniSat “core”, from the golden old days of 2009. Just type in any CNF in DIMACS format and enjoy :) You can now solve the most basic NP-complete problem, right from your browser!

MiniSat in javascript

Please click here to run CyrptoMiniSat in your browser — much better!!

Epiloge

Yes, it’s actually running in your browser. The default CNF describes the HiTag2 cipher, with 10 output bits and 2 help bits given. The instance was generated using the Grain of Salt system. Note that the web page hangs until MiniSat finishes. Try typing in your own CNFs, play as much as you like. If you are patient, it can solve many problems, including reversing modern ciphers, finding SHA-1 collisions (PDF), verifying hardware and software… you just have to be patient enough.

Probably unbeknownst to you, you are using products of SAT solvers for your daily life: CPUs are verified using SAT solver-based techniques, airplane software is formally verified using SAT solvers, FPGA and CPU layouts are optimized using them, and if you are lucky, your car’s safety-critical systems are also verified using formal techniques, which basically means SAT solvers. Actually, even train schedules and public transport schedules are created using SAT solvers — I know for a fact that many trains in Europe are scheduled using these techniques. All these can be done using the very simple problem description, the CNF, you see above — and many of these systems use MiniSat.

How I did it and what could be improved

It’s relatively easy to compile any C or C++-based SAT solver to javascript using emscripten. Here is my github repo, with included HOWTO, for MiniSat. I would love to compile current lingeling, but its license doesn’t seem to allow emscripten to even think about compiling it. I’ll compile CryptoMiniSat later, it’s a bit hairy right now, I’m trying to refactor the code since it’s a mess. Until then, I hope you’ll enjoy playing with MiniSat!

Acknowledgements

A big thanks to my colleagues who pushed me to became an LLVM nut. Also, big thanks to Niklas Eén and Niklas Sörensson for MinSat!