UNSAT | Wonderings of a SAT geek

Marijn Heule kindly sent me an email on the 10th of April about DRUP, the new system used this year in the SAT Competition’13 for the UNSAT track. He kindly encouraged me to implement the DRUP system. He personally implemented it into Minisat which was a very helpful lead for me. In this post I will talk about my experiences in implementing DRUP into CryptoMiniSat within a span of 3 days.

Implementation complexity

It took only 3 days, about 1400 lines of code to implement DRUP:

git diff f27c74bbd  c0b6ccc10 --shortstat
 25 files changed, 1372 insertions(+), 307 deletions(-)

It turns out that the biggest problem is that whenever I shorten a clause, I first have to add the shortened version, and then delete the old one. Since I always do in-place literal deletion, this means I have to save the old clause into a temporary place, add the new one and finally delete the old one that has been saved. I will eventually write a C++ wrapper that does this for me, but currently, it’s a lot of

vector origCl(cl.size());
std::copy(cl.begin(), cl.end(), origCl.begin());
[blah...]
drup << cl << " 0" << endl;
drup << "d " << origCl << " 0" << endl;

So, it's a bit messy code. Other than this, the implementation went very smoothly. The biggest pain was not to forget to add to the DRUP output all changed clauses. Since I have implicit binary and tertiary clauses and I manipulate them in-place, they are changed in quite complicated code paths.

If you don't have such complicated code paths, you should be able to implement DRUP within a day or less. I encourage you to do so, it's quite fun!

Remaining uncertainties

I am a bit confused about whether some of the optimisations in CryptoMiniSat work with DRUP. I have been fuzzing the DRUP implementation for about ~1000 CPU hours, but not with all optimisations turned on. Some are a bit shaky. In particular, XOR and stamping&caching come to mind.

I cannot turn DRUP on for the top-level XOR manipulation because otherwise I would need to tell DRUP every Gaussian elimination step. Not funny, and not fast. Well, XOR is not such a big thing, and it is no longer natively implemented in CryptoMiniSat, so not a big deal, really.

The other, more troubling one is stamping and implied literal caching. Luckily I have on-the-fly hyper-binary resolution (this is needed for DRUP with Stalmarck if you think about it), so the binary clauses stored by caching and stamping are there... but they may get deleted by variable elimination, blocked clause elimination and... well, maybe nothing else. Hopefully not. Anyway, I never block binary clauses (does clause blocking ever help? I am confused) and I can of course not delete binary eliminated clauses from DRUP. However... that may make the verification very slow. So, I am at crossroads here. I think I will submit a version with stamping&caching and one without.

In the end, every optimisation can be turned on except for XOR. I find that exceptionally good given the number of tweaks/hacks used by CryptoMiniSat.

Long-term advatages of having DRUP

I think DRUP allows for a lot of possibilities. Naturally I first want to draw resolution graphs. There are plenty of libraries for 3D drawing, and I have already ordered the LEAP controller (a 3D controller), which will come handy to play with the resolution graphs (zoom&out, rotate, etc.).

From there, I want to get stats out of the graph, and I want to present it next to/with the stats that I already generate. For example, how many of the deleted clauses get re-learnt later? How many clauses get used in the resolution graph with the empty node? How often when cleaning with glues? How often when cleaning with activities? For which types of instances?

Linking this with real-world instances by coloring the graph points according to e.g. filter functions in stream ciphers is not very hard and should be quite a lot of fun.

Acknowledgements

I think Marijn Heule deserves a lot of thanks for the work he has put into DRUP (webpage, example, DIFF for MiniSat) and all the help he has given me. I had some initial doubts about whether it's possible to implement at all and I had some minor problems with the checker --- he always replied kindly and promptly. Thanks!

Timings from the article "Attacking Bivium Using SAT Solvers". The authors didn't seem to have randomised the problems enough: the time to solve should increase exponentially, but instead it goes up and down like a roller coaster

Vegard Nossum asked me about the varying time it took for CryptoMiniSat to solve a certain instance that was satisfiable. This inspired me to write an overly long reply, which I think might interest others. So, why does the solving time vary for a specific instance if we permutate the clauses? Intuitively, just by permutating the clauses, the problem itself doesn’t get any easier or harder, so why should that make any difference at all? I will now try to go through the reasons, though they can almost all be summed up as follows: the SAT solver has absolutely no clue what it is solving, and so to solve a problem, it relies on heuristics and randomisation, both of which are influenced by its starting seed, which in turn is influenced by the order of the clauses in the CNF file. And now on to the long explanation.

Let’s divide problems into two categories: satisfiable and unsatisfiable instances. First, let me talk about satisfiable instances. Let’s suppose that we have been solving the problem for some time, and we have connected the dots (variables) well enough through learnt clauses. All that the SAT solver is waiting for is a good guess to set 3-4 variables right, and then it will propagate all variables to the right setting to satisfy all clauses. The problem is therefore to get to the point of only needing to guess 3-4 variables, i.e. to connect the dots, which we do through resolution. We can sacrifice some or all of dot-connecting (i.e. resolution) with some more guessing, and this can in fact be coded down by simply foregoing some of the conflict analysis we do. In the extreme case, all conflict analysis can be disabled, and then the solver would not restart its search. The solver would in essence be brute-forcing the instance through BCP (boolean constraint propagation). It is easy to see why this latter is prone to variation: depending on the ordering of the variables in the search tree, the tree could be orders of magnitude smaller or larger, and the first solution can be at any point in the search tree, essentially at a random place.

If we decide to carry out resolution for satisfiable problems to counter the problem of the variation of the search-tree, it is interesting to realise the following: in most cases, we can not forgo guessing. The reason is simple yet leads to quite interesting properties. Essentially, a SAT instance can have multiple solutions. If there are two solutions, e.g. 01100... and 10011... i.e. the solutions are the complete inverse of one another, then the SAT solver will not be able to prove any variable to any value. The best it could do is to create binary clauses, in the example case for instance

var1=0 -> var2=1, var3=1, var4=0...
and
var1=1 -> var2=0, var3=0, var4=1...

If we do enough resolutions, the “guessing” part will eventually become a solution-selector, i.e. it will select a solution from the set of available solutions. If there are 2^20, evenly distributed in the search space, we might need to set 20 variables before a solution is found through a simple application of BCP. Naturally, SAT solvers don’t normally do this, as they are content in finding one satisfying assignment, reporting that to the user, and exiting. It would be interesting to know the average remaining number of variables that needed to be guessed to solve the solution at the point the SAT solver actually found the solution, but this has not been done yet as far as I know. Then, we would know the trade-off the SAT solver employs between resolution and searching when solving satisfiable instances.

All right, so much for satisfiable instances. What happens with UNSAT instances? Well, the solver must either go through the whole tree and realise there is no solution, or do resolution until it reaches an empty clause, essentially building a resolution tree with an empty clause at its root. Since both of these can be done at the same time, there is a similar trade-off as above, but this time it’s somewhat upside-down. First of all, the search tree can be smaller or larger depending on the ordering of the variables (as before), and secondly, the resolution tree can be smaller or larger depending on the order of resolutions. The minimal resolution tree is (I believe) NP-hard to find, which doesn’t help, but there is at least a minimum resolution tree that limits us, and there is a minimum search tree which we must go through completely, that limits us. So, in contrast to finding satisfying solutions, both of these are complete in some sense, which should make searching for them robust in terms of speed. Finding a satisfying solution is not complete, because, as I noted above, SAT solvers don’t find all satisfying solutions — if they did, they would actually have the same properties as solving an unsatisfiable problem, as they would have to prove that there are no more solutions remaining, essentially proving unsatisfiability.

The above reasoning is probably the reason why SAT solvers tend to behave more robustly when they encounter an unsatisfiable problem. In particular, CryptoMiniSat’s running time seems more robust when solving unsatisfiable problems. Also, for problems that contain a lot of randomly placed solutions in the search space, such as the (in)famous vmpc_33 and vmpc_34 problems, the solving times seem wholly unpredictable for at least two interesting SAT solvers: lingeling and CryptoMiniSat.

Wonderings of a SAT geek

A blog about SAT solving and cryptography

Tag Archives: UNSAT

Certified UNSAT and CryptoMiniSat

Implementation complexity

Remaining uncertainties

Long-term advatages of having DRUP

Acknowledgements

The variable speed of SAT solving