I must be dumb

Apparently, hyper-binary resolution is not easy. However, I have been reading the paper by Gershman and Strichman, and they make it sound like the easiest thing of all, especially if you don't do it the way they do it, of course.

Hyper-binary resolution is essentially very easy. Let’s say you have three clauses:

-a V b
-a V c
d V -b V -c

(where V is logical OR, - is logical negation, and each line is a clause that must be satisfied)

If we set a=true and propagate, we reach the conclusion that d=true. This fact can be encoded as the clause:

-a V d

More interestingly, this new clause actually helps: if you only use the original 3 clauses, setting d=false and propagating does nothing. However, if you add the new clause to the clause list, then d=false automatically propagates a=false.

These kinds of new binary clauses (like -a V d) have traditionally been found as follows (a rough sketch of this approach is given after the list):

  1. Build a graph from the binary clauses: literals are vertices, and there is an edge from vertex a to vertex b if there is a binary clause -a V b
  2. Set a=true and propagate
  3. Check whether every propagated fact can be reached in the graph starting from the vertex corresponding to a. If a fact cannot be reached, add the binary clause that represents the missing edge.
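To make the graph-based approach concrete, here is a minimal C++ sketch of steps 1 and 3. It is an illustration only, not CryptoMiniSat code: the literal encoding, neg(), and reachableFrom() are all names made up for this example.

#include <queue>
#include <utility>
#include <vector>

typedef int Lit;  // hypothetical literal encoding: 2*var + (negated ? 1 : 0)
inline Lit neg(Lit l) { return l ^ 1; }

// Each binary clause (x V y) gives the implication edges -x -> y and -y -> x.
// Returns a mask of all literals reachable from 'start' in this graph.
std::vector<bool> reachableFrom(
    Lit start, size_t numLits,
    const std::vector<std::pair<Lit, Lit> >& binClauses)
{
    std::vector<std::vector<Lit> > edges(numLits);
    for (size_t i = 0; i < binClauses.size(); i++) {
        edges[neg(binClauses[i].first)].push_back(binClauses[i].second);
        edges[neg(binClauses[i].second)].push_back(binClauses[i].first);
    }

    std::vector<bool> seen(numLits, false);
    std::queue<Lit> toVisit;
    seen[start] = true;
    toVisit.push(start);
    while (!toVisit.empty()) {
        Lit l = toVisit.front();
        toVisit.pop();
        for (size_t i = 0; i < edges[l].size(); i++) {
            if (!seen[edges[l][i]]) {
                seen[edges[l][i]] = true;
                toVisit.push(edges[l][i]);
            }
        }
    }
    return seen;
}

// Any fact set by full propagation of 'start' that is NOT marked in the
// returned mask is a candidate hyper-binary clause (-start V fact).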

One could say that this algorithm works perfectly. But I am a lazy coder, and I don't want to implement a graph library in CryptoMiniSat. What is there to do, then? Well, we can achieve the same thing with data structures already present in all modern solvers!

The trick is that binary clauses are propagated very differently from normal clauses: they have their own watchlists, and propagating them is extremely fast. So, instead of fiddling with some over-optimised graph library, we can simply write a routine, propagateBin(), that propagates only the binary clauses already present in the solver. We then compare the results of propagate() and propagateBin(), and if something is missing, we add the corresponding binary clause.

In fact, implementing this algorithm using the above trick is trivial: no graph library, nothing special really, just a 20-line propagateBin(), which itself is just a copy-paste of the first couple of lines of propagate(). A very short side note: it is also very fast. We can find 100'000 binary clauses in less than 0.1 seconds. Here is the commit that implements this in CryptoMiniSat. I think it makes sense even to those unfamiliar with the code.
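For the curious, the core of the trick looks roughly like the sketch below. To be clear, this is not the committed code: the whole Solver interface shown here (newDecisionLevel(), enqueue(), propagate(), propagateBin(), trailFromCurrentLevel(), isAssigned(), cancelUntil(), addBinaryClause()) is a made-up stand-in for the real data structures.

// Rough sketch of hyper-binary resolution via binary-only propagation.
// All Solver methods here are hypothetical stand-ins.
void hyperBinResOnLit(Solver& solver, Lit probe)
{
    // 1) Full propagation: collect every fact implied by 'probe'.
    solver.newDecisionLevel();
    solver.enqueue(probe);
    solver.propagate();
    std::vector<Lit> fullFacts = solver.trailFromCurrentLevel();
    solver.cancelUntil(0);

    // 2) Binary-only propagation: facts reachable through binary clauses.
    solver.newDecisionLevel();
    solver.enqueue(probe);
    solver.propagateBin();

    // 3) Whatever full propagation found but binary-only propagation missed
    //    becomes a new binary clause (-probe V fact). A real implementation
    //    would record these and add them after backtracking.
    std::vector<std::pair<Lit, Lit> > toAdd;
    for (size_t i = 0; i < fullFacts.size(); i++) {
        if (!solver.isAssigned(fullFacts[i]))
            toAdd.push_back(std::make_pair(~probe, fullFacts[i]));
    }
    solver.cancelUntil(0);
    for (size_t i = 0; i < toAdd.size(); i++)
        solver.addBinaryClause(toAdd[i].first, toAdd[i].second);
}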

Finally, why am I dumb? Because there must be something amiss here. I can't believe that people much more knowledgeable in the subject didn't come up with this idea before. So they must have, and probably already published it; or else the idea is really bad, is missing elements, or is incomplete. If it is complete, however, then hyper-binary resolution is fast and efficient once and for all. I can't wait for someone to prove me wrong.

On failed literal probing

Apparently, quite some work has been done on failed literal probing, although I don't think it has all been put into a common package yet.

The idea is quite simple in its purest form: try to set variable v1 to true and see if that fails. If it does, variable v1 must be false. If it doesn’t fail, try the other way around: set it to false and if that fails, it must be true.
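In code, the pure form is only a few lines. The sketch below assumes a hypothetical Solver interface (newDecisionLevel(), enqueue(), propagate(), cancelUntil(), enqueueAtTopLevel()); it is not any particular solver's actual API.

// Failed literal probing in its purest form: if one polarity of v leads to
// a conflict under propagation, the opposite polarity must hold.
void probeVar(Solver& solver, Var v)
{
    Lit pos = mkLit(v, false);  // the literal "v = true"

    solver.newDecisionLevel();
    solver.enqueue(pos);
    bool posFails = (solver.propagate() != NO_CONFLICT);
    solver.cancelUntil(0);
    if (posFails) {
        solver.enqueueAtTopLevel(~pos);  // v must be false
        return;
    }

    solver.newDecisionLevel();
    solver.enqueue(~pos);
    bool negFails = (solver.propagate() != NO_CONFLICT);
    solver.cancelUntil(0);
    if (negFails)
        solver.enqueueAtTopLevel(pos);   // v must be true
}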

There are a couple of tricks we can add, however. For example, if both v1 and !v1 set variable v10 to value X (where X can be either true or false), then we can set v10 to X. The reasoning is simple: v1 must be set one way or the other, so whatever both v1 and !v1 imply must hold. We can therefore safely set v10 to X.

A more interesting idea is the following. If v1 sets v20 to true, but !v1 sets v20 to false, then v1 = v20. So we can replace v20 with v1: one less variable to worry about!
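Both of these tricks fall out of comparing the two implication sets. Here is a rough sketch, again over an assumed interface; impliedBy() is a hypothetical helper that enqueues a literal at a new decision level, propagates, copies the implied literals off the trail, backtracks, and returns the copy.

#include <set>
#include <vector>

void probeBothPolarities(Solver& solver, Var v)
{
    Lit pos = mkLit(v, false);
    std::vector<Lit> byPos = impliedBy(solver, pos);   // facts implied by v1
    std::vector<Lit> byNeg = impliedBy(solver, ~pos);  // facts implied by !v1

    std::set<Lit> negSet(byNeg.begin(), byNeg.end());
    for (size_t i = 0; i < byPos.size(); i++) {
        Lit l = byPos[i];
        if (negSet.count(l)) {
            // implied by both v1 and !v1: safe to set outright
            solver.enqueueAtTopLevel(l);
        } else if (negSet.count(~l)) {
            // v1 -> l and !v1 -> !l: var(l) is equivalent to v1,
            // so it can be replaced (hypothetical replaceVar())
            solver.replaceVar(l, pos);
        }
    }
}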

There are even more tricks. If setting v1 to true and setting it to false both shorten a longer XOR to the same 2-long XOR "v40 + v50 = false", this 2-long XOR can be learnt: since v40 + v50 = false means the two variables are equal, v40 can be replaced with v50.

And more tricks still. If there is a 2-long clause v1 V v2, then we can do all of the above, but with v1 and v2 this time. Since either v1 or v2 must be true, all the above ideas still work: in all the previous ideas, all we used was the fact that either v1 or !v1 must be true, and that is still the case here, since either v1 or v2 must be true. This, by the way, is called 1-recursive learning. (Note: the XOR trick above changes a bit, but we still learn a 2-long XOR.)
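A sketch of this generalisation, with the same assumed interface as before: instead of probing v1 and !v1, we probe the two literals of a binary clause and keep whatever both imply.

// 1-recursive learning over a binary clause (lit1 V lit2): at least one of
// the two literals is true, so anything implied by both can be set outright.
// impliedBy() is the same hypothetical helper as above.
void probeBinClause(Solver& solver, Lit lit1, Lit lit2)
{
    std::vector<Lit> byFirst  = impliedBy(solver, lit1);
    std::vector<Lit> bySecond = impliedBy(solver, lit2);

    std::set<Lit> secondSet(bySecond.begin(), bySecond.end());
    for (size_t i = 0; i < byFirst.size(); i++) {
        if (secondSet.count(byFirst[i]))
            solver.enqueueAtTopLevel(byFirst[i]);
    }
}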

And now something new. I have been thinking about trying to re-use old learnt clauses. They sometimes encode quite important facts about the problem. What if we tried to re-use them? After all, keeping them in memory is really cheap: the only problem with them is that doing regular propagation with them takes a lot of time, thus slowing down the solver. But what if we only used them to do failed literal probing? I have just launched a test on the Grid'5000 cluster to find out. Fingers crossed…

Gaussian elimination is released

The new CryptoMiniSat, version 2.4.2, now has on-the-fly Gaussian elimination compiled in by default. You can use it simply by issuing e.g.

./cryptominisat --gaussuntil=100 trivium-cipher.cnf

and enjoy outputs such as:

c gauss unit truths : 0
c gauss called : 31323
c gauss conflicts : 5893 (18.81 %)
c gauss propagations : 7823 (24.98 %)
c gauss useful : 43.79 %
c conflicts : 34186 (4073.39 / sec)
c decisions : 39715 (6.86 % random)

This basically tells us that Gaussian elimination was called 31 thousand times over the course of 39 thousand decisions, so it was running essentially all the time. Out of the 31'323 times it was called, it found either a conflict or a propagation in 44% of the cases ((5893 + 7823)/31323 ≈ 43.79%). This is a very good result, and is typical of the Trivium cipher: Trivium can be sped up by up to 2x with Gaussian elimination. I will put up lots of CNFs, so you will be able to play around with them and (optionally) verify these results.

The magic parameter "--gaussuntil=100" tells the program to execute Gaussian elimination down to decision level 100, and no deeper. I haven't implemented any automation for finding the best depth, so I use this (very) crude fixed number, 100. Better results could probably be achieved by fine-tuning the depth cut-off, but I don't have the time at the moment to play around with it. However, if you are interested, you will be able to try out different depths, different ciphers, etc. I will shortly be releasing a tool called "Grain of Salt" to generate CNFs for any shift-register-based stream cipher, so you can test them against CryptoMiniSat or any other SAT solver.

I hope you will enjoy using on-the-fly Gaussian elimination in CryptoMiniSat 2.4.2!

Parallelisation

I have lately been reading the book Using OpenMP and thinking about how CryptoMiniSat could be parallelised. Apparently, there are multiple ways to achieve this. I haven't yet had time to read through all the previous approaches, but it seems there are at least two main ways to distribute the solving: through guiding paths, and using the idea of ManySat.

The first approach, guiding paths, decomposes the problem into multiple sub-problems by setting some variables to true or false. The sub-problems are then solved completely independently, with a master coordinating the effort. I don't like this approach for two reasons. Firstly, even for cryptographic problems, where the decomposition can be made very well since the problem in its most abstract form (i.e. key, plaintext, ciphertext) is known and so a very good guiding path can be calculated, the expected total time to solve is far more than if the problem were given to one solver instance: if we decompose the problem into 4 sub-problems, and each sub-problem takes on average T time to solve, then the total average time is 4T, whereas a single solver instance might have found the solution on average in, e.g., 2T time. In other words, the gain is not as much as one would hope. Secondly, I would guess that this ratio (2T/4T = 1/2 in this case) is even worse when the problem's abstract form is totally unknown (which is most of the time), and the solver has to guess a guiding path.

The second approach, that of ManySat, is to start the same solver on the same problem multiple times, but with different restart heuristics. Since modern DPLL-based SAT solvers use randomised algorithms, this helps reduce the expected time to solve an instance. The solvers also share short clauses with one another, to work in a more collaborative manner. However, I believe that given the distribution of the solving time of a problem with a given solver, even if many instances of it are launched (each with a random seed and a different restart strategy), the minimum solving time over all launched solvers will not be as small as one would hope. It is extremely rare that a difficult instance can be solved in under, e.g., 1 minute if the average time to solve it is 1000 minutes. So, if we were to launch 1000 solver instances, probably none would finish before 200-300 minutes. Of course, I am not counting the effect of shared learnt clauses here, but I am not very convinced about them: apparently, activity-based learnt clause sharing is more effective than size-based sharing (see this), and ManySat shares according to size.

So, if all else fails, what is there left to do? Well, I bought the OpenMP book to find out. I wish to implement the idea of distributing clauses to different threads, thus distributing the BCP (Boolean Constraint Propagation) load. However, multi-threading can only go so far, so MPI will eventually need to be used, along with multiple instances of the solver on the same physical CPU, the way ManySat does it, though I will try to implement some form of clause sharing. The good thing is that implementing MPI will also bring the possibility of running a problem instance on a huge cluster, something I am really looking forward to.
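To give a first feel of what distributing BCP could look like, here is a toy OpenMP sketch. It is an assumption-laden illustration, nothing more: a real solver parallelises watchlist traversal rather than naively scanning all clauses, and has to handle the shared trail and conflicts far more carefully.

#include <omp.h>
#include <cstdlib>
#include <vector>

// Toy data model (not CryptoMiniSat's): clauses hold DIMACS-style literals
// (+v / -v); assigns[v] is 0 = unset, 1 = true, -1 = false.
static int litValue(int lit, const std::vector<int>& assigns)
{
    int v = assigns[std::abs(lit)];
    return lit > 0 ? v : -v;
}

// Scan the clause database in parallel, each thread taking a static slice,
// and collect clauses that became unit or falsified under 'assigns'.
void parallelScan(const std::vector<std::vector<int> >& clauses,
                  const std::vector<int>& assigns,
                  std::vector<size_t>& unitClauses,
                  bool& conflict)
{
    conflict = false;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)clauses.size(); i++) {
        int unassigned = 0;
        bool satisfied = false;
        for (size_t j = 0; j < clauses[i].size(); j++) {
            int val = litValue(clauses[i][j], assigns);
            if (val == 1) { satisfied = true; break; }
            if (val == 0) unassigned++;
        }
        if (satisfied) continue;
        if (unassigned == 0) {
            #pragma omp atomic write
            conflict = true;       // clause falsified: conflict
        } else if (unassigned == 1) {
            #pragma omp critical   // serialise access to the shared vector
            unitClauses.push_back((size_t)i);
        }
    }
}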

EDITED TO ADD (16/07/2010): I just realised that MiraXT does quite a number of things that I wish to implement into CryptoMiniSat.

Social coding

I saw a talk by Ed Catmull on the problems Pixar had in its early days. The talk is very interesting, and I recommend it. What caught my attention is that Pixar had a "brain trust": a certain group of people who could express their new ideas to one another without fear of backlash. This is actually very difficult to achieve, since early implementations of ideas are always "childish" and lack maturity, and are therefore not considered good enough to be shared with others. However, if you don't share an idea with others early, then you get feedback too late, when you have already invested a lot of time into it. Criticism at this late stage hurts more, and what is worse, it might make you so uncomfortable that you abandon the idea altogether.

The trick is to get a set of people to trust each other well enough to share not-yet-mature ideas in the early stages, so that criticism can help everyone get their ideas right from the start. Establishing this trust is very difficult. I started to understand what Ed Catmull was talking about while working with the STP team: the earlier I shared my ideas, the better they got. For instance, CryptoMiniSat was originally never meant to be used as a library. However, once they (mostly Vijay Ganesh) helped me debug the code, it was much easier to get things "right". I also met Laurent Simon last year for half a day, and he gave me the idea of using Grid'5000 to test the performance of CryptoMiniSat. Without these early steering ideas, I would never have been able to get CryptoMiniSat to where it is today.