CryptoMiniSat: 8000 commits later

The GitHub repository for CryptoMiniSat just hit 8000 commits. To celebrate this rather weird and crazy fact, let me put together a bit of a history.

The Beginnings

CryptoMiniSat began as a way of trying to prove that a probabilistic cryptographic scheme was not possible to break using SAT solvers. This was the year 2009, and I was working in Grenoble at INRIA. It was a fun time and I was working really hard to prove what I wanted to prove, to the point that I created a probabilistic SAT solver where one could add probability weights to clauses. The propagations and conflict engine would only work if the conflict or propagation was supported by multiple clauses. Thus began a long range of my unpublished work in SAT. This SAT solver, curiously, still works and solves problems quite well. It’s a lot of fun, actually — just add some random clause into the database that bans the only correct solution and the solver will find the correct solution. A bit of a hackery, but hey, it works. Also, it’s so much faster than doing that solving with the “right” systems, it’s not even worth comparing.
Continue reading CryptoMiniSat: 8000 commits later

Clause glues are a mystery to me

Note: the work below has been done in collaboration with Vegard Nossum, but the wording and some of the (mis-)conculsions are mine. If you are interested, check out his master thesis, it’s quite incredible

Anyone who has ever tried to really understand clause glues in SAT solvers have probably wondered what they really mean. Their definition is simple: the number of variables in the final conflict clause that come from different decision levels. An explanation of these terms can be found here. On the surface, this sounds very simple and clean: the number of different decision levels should somehow be connected to the number of variables that need to be set before the learnt clause activates itself, i.e. it causes a propagation or a conflict. Unfortunately, this is just the surface, because if you try to draw 2-3 implication graphs, you will see that in fact the gap between the number of variables needed to be set (let’s call this ‘activation number’) and the glue can be virtually anything, making the glue a potentially bad indicator of the activation number.

The original reasoning

The original reasoning behind glues is the following: variables in the same decision level, called ‘blocks of variables’ have a chance to be linked together through direct dependencies, and these dependencies should be expressed somehow in order to reduce the number of decisions needed to reach a conflict (and thus ultimately reduce the search space). To me, this reasoning is less clear than the one above. In fact, there are about as many intuitions about glues as the number of people I have met.

Glucosetrack

With Vegard Nossum we have developed (in exactly one day) something quite fun. On the face of it, it’s just glucose 1.0, plain and simple. However, it has an option, “-track”, which does the following: whenever a learnt clause is about to cause a conflict, it jumps over this learnt clause, saves the state of the solver, and works on until the next conflict in order to measure the amount of work the SAT solver would have had to do if that particular learnt clause had not been there. Then, when the next clause wishes to cause a conflict, it records the amount of propagation and decisions between the original conflict and this new conflict, resets the state to the place saved, and continues on its journey as if nothing had happened. The fun part here is that the state is completely reset, meaning that the solver behaves exactly as glucose, but at the same time it records how much search that particular cause has saved. This is very advantageous because it doesn’t give some magical number like glue, but actually measures the usefulness of the given clause. Here is a typical output:

```c This is glucose 1.0  with usefulness tracking by
c Vegard Nossum and Mate Soos. Based on glucose, which
c is in run based on MiniSat, Many thanks to all teams
c
c ============================[ Problem Statistics ]=============================
c |                                                                             |
c |  Number of variables:  138309                                               |
c |  Number of clauses:    942285                                               |
c |  Parsing time:         0.28         s                                       |
============================[ Search Statistics ]==============================
| Conflicts |          ORIGINAL         |          LEARNT          | Progress |
|           |    Vars  Clauses Literals |    Limit  Clauses Lit/Cl |          |
===============================================================================
|         0 |  138309   942285  2636352 |   314095        0   -nan |  0.000 % |
|       620 |  138074   942285  2636352 |   314095      615     82 |  1.559 % |
|       919 |  135307   925938  2596895 |   314095      906     75 |  3.908 % |
|      2714 |  130594   894562  2518954 |   314095     2664     67 |  6.799 % |
|      2814 |  130593   894562  2518954 |   314095     2763     69 |  6.808 % |
|      2930 |  130592   894562  2518954 |   314095     2879     70 |  6.808 % |
|      4042 |  127772   874934  2471045 |   314095     3974     69 |  9.292 % |
|      4142 |  127772   874934  2471045 |   314095     4074     70 |  9.292 % |
...
c Cleaning clauses (clean number 0). Current Clause usefulness stats:
INSERT INTO data(cleanno, idx, size, glue, conflicts, props, bogoprops, decisions) VALUES(0 , 961074 , 107 , 5 , 42 , 185509 , 1301341 , 0);
INSERT INTO data(cleanno, idx, size, glue, conflicts, props, bogoprops, decisions) VALUES(0 , 944729 , 14 , 1 , 7 , 36865 , 268229 , 0);
INSERT INTO data(cleanno, idx, size, glue, conflicts, props, bogoprops, decisions) VALUES(0 , 948275 , 7 , 1 , 15 , 27909 , 220837 , 0);
INSERT INTO data(cleanno, idx, size, glue, conflicts, props, bogoprops, decisions) VALUES(0 , 953762 , 102 , 2 , 2 , 29365 , 197410 , 0);
...
c End of this round of database cleaning
...
|     38778 |  110896   758105  2182915 |   314095    28270     93 | 20.167 % |
|     39488 |  110894   758105  2182915 |   314095    28978     93 | 20.185 % |
c Cleaning clauses (clean number 1). Current Clause usefulness stats:
INSERT INTO data(cleanno, idx, size, glue, conflicts, props, bogoprops, decisions) VALUES(1 , 979236 , 71 , 1 , 8 , 45531 , 280156 , 0);
INSERT INTO data(cleanno, idx, size, glue, conflicts, props, bogoprops, decisions) VALUES(1 , 954908 , 2 , 2 , 7 , 0 , 232760 , 0);
...```

The output is in SQL format for easy SQL import. The “size” is the clause size, “glue” is the glue number, “conflicts” is the number of times the clause caused a conflict, “props” is the number of propagations gained by having that clause (i.e. by doing the conflict early), “bogoprops” is an approximation of the amount of time gained based on the number of watchlists and the type of clauses visited during propagation, and “decisions” is the number of decisions gained. The list is sorted according to “bogoprops”, though once the output is imported to MySQL, many sortings are possible. You might notice that the ‘glue’ is 1 for some clauses (e.g. on the second output line) — these are clauses that have caused a propagation at decision level 0, so they will eventually be removed through clause-cleaning, since they are satisfied. Notice that high up, there are some relatively large clauses (of size 102 for example) with glue 2, that gain quite a lot in terms of time of search. The gained conflicts/propagations/etc. are all cleared after every clause-cleaning, though since clauses are uniquely indexed (‘idx’), they can be accumulated in all sorts of ways.

The program is 2-5x slower than normal glucose, but considering that it has to save an extreme amount of state due to the watchlists being so difficult to handle and clauses changing all the time, I think it does the job quite well — as a research tool it’s certainly quite usable. In case you wish to download it, it’s up in GIT here, and you can download a source tarball here. To build, issue “cmake .” and “make”. Note that the tool only measures the amount of search saved by having the clause around when it tries to conflict. It does not measure the usefulness of the propagations that a learnt clause makes. Also, it doesn’t measure the other side of the coin: the (potentially better) conflict generated by using this clause instead of the other one. In other words, it gives a one-sided view (no measure of help through propagation) of a one-sided view (doesn’t measure the quality of difference between the conflict clauses generated). Oh well, it was a one-day hack.

Experiments

I have made very few experiments with glucosetrack, but you might be interested in the following result. I have taken UTI-20-10p0, ran it until completion, imported the output into MySQL, and executed the following two queries. The first one:

```SELECT glue, AVG(props), FROM data
WHERE glue >= 2 AND size >= 2
GROUP BY glue
ORDER BY glue```

calculates the average number of saved propagations between each round of cleaning for clauses of glue >= 2 (i.e. clauses that didn’t eventually cause a propagation at decision level 0), and of size >= 2, because unitary clauses are of no interest. The second is very similar:

```SELECT size, AVG(props), FROM data
WHERE glue >= 2 AND size >= 2
GROUP BY size
ORDER BY size```

which calculates the same as above, but for size.

Some explanation is in order here regarding why I didn’t count SUM(), and instead opted for AVG(). In fact I personally did make graphs for SUM(), but Vegard corrected me: there is in fact no point in doing that. If I came up with a new glue calculation function that gave an output of ‘1’ for every clause, then the SUM for that function would look perfect: every clause would be in the same bracket, saving a lot of propagations, but that would not help me make a choice of which clauses to throw out. But the point of glues is exactly that: to help me decide which clauses to throw out. So what we really want is a usefulness metric that tells me that if I keep clauses in that bracket, how much do I gain per clause. The AVG() gives me that.

Here goes the AVG() graph for the last clause cleaning (clause cleaning iteration 33):

Notice that the y axis is in logscale. In case you are interested in a larger graph, here it is. The graph for clause cleaning iteration 22 is:

(Iteration 11 has high fluctuations due to less data, but for the interested, here it is). I think it’s visible that glues are good distinguishers. The graph for glues drops down early and stays low. For sizes, the graph is harder to read. Strangely, short clauses are not that good, and longer clauses are better on average. If I had to make a choice about which clauses to keep based on the size graph, it would be a hard choice to make: I would be having trouble selecting a group that is clearly better than the rest. There are no odd-one-out groups. On the other hand, it’s easier to characterise which clauses are good to have in terms of glues: take the low glue ones, preferably below 10, though we can skip the very low ones if we are really picky. An interesting side-effect of the inverse inclination of the size and glue graphs and the fact that “glue<=size” is that maybe we could choose better clauses to keep if we go for larger clauses that have a low glue.

Conclusions

Unfortunately, there are no real conclusions to this post. I guess running glucosetrack for far more than just one example, and somehow also making it measure the difference between the final conflict clauses’ effectiveness would help to write some partially useful conclusion. Vegard and me have tried to put some time and effort into this, but to not much avail I am afraid.

PS: Notice that glucosetrack allows you to generate many of the graphs here using the right SQL query.

CryptoMinisat 3.1 released

CryptoMinisat 3.1 has been released. The short changelog is:

```\$ git diff cryptoms-3.0 cryptoms-3.1 --shortstat
84 files changed, 3079 insertions(+), 2751 deletions(-)
```

The changes made were threefold. First, memory usage has been greatly reduced. This is crucial, because memory usage was over 7GB on certain instances. Secondly, the implication cache wasn’t very well-used and an idea that came to my mind greatly improved performance on most problems. Finally, time limiting of some inprocessing techniques on certain types of problems has been improved.

Memory usage reduction

On instances that produced a lot of long learnt clauses the memory usage was very high. These learnt clauses were all automatically linked in to the occurrence list and consequently took large amounts of memory, sometimes up to 10GB. On other instances, the original clauses were too numerous and too large, so putting even them into the occurrence list was too much. On these instances, variable elimination is not carried out (or carried out only later, when enough original clauses have been removed/shortened). To debug some of these problems, I wrote a fuzzer that generates extremely large problems with many binary and many long clauses, it’s available here as “largefuzzer”. It’s actually quite nice with many-many binary clauses so it also can fuzz the problems encountered with probing of extremely weird and large instances.

Implied literal usage improvement

CryptoMiniSat uses implied literals, i.e. caches what literals were propagated by each literal during probing. It then re-uses this information to subsume and/or strengthen clauses. This is kind of similar to stamping though uses more memory. It is actually useful to have alongside stamping, and I now do both — propagating DFS that stamping requires is expensive though updating cache during DFS is just as easy as during quasi-BFS.

The trick I discovered while playing around with cached implied literals is that if literal L1 propagates L2 and also !L2 then that means there are conceptually two binary clauses in the solver (!L1, L2), (!L1, !L2), so !L1 is TRUE. This is of course trivial, but I never checked for this. The question most would raise is: why would L1 propagate both L2 and !L2 and not fail? The answer is kind of tricky, but very interesting. Let’s say at one point, L1 propagates L2 due to a learnt clause, but that learnt clause is then removed. A new learnt clause is then later learnt, and with that learnt clause in place, L1 propagates !L2. Now, without caching, this would be ignored. Caching memorizes past conceptual binary clauses and re-uses this information.

This is not an optimization that only looks good on paper, it is very good to have. With this one optimization, I gained 5 instances from the SAT Comp’09 instances with a 1000s timeout (196 solved -> 201 solved). I can’t right now imagine how this could be done with stamping effectively, but that doesn’t mean it’s not possible. Though, according to my experience, stamping doesn’t preserve that much information over time as it’s being updated (renumbered) frequently while the cache is only improved over time, never shrunk. A possibility would be to have more than one stamp system and round-robin selecting them. However that would mean that sorting of clauses (for shrinking) would need to be done more than once, and sorting them is already relatively expensive. I sometimes feel that what stamping gains in memory it looses on sorting (i.e. processing time) and lower coverage (re-numbering).

More precise time-limiting

Martin Maurer has been kind enough to file a lot of bug reports about probing and variable elimination taking too much time, sometimes upwards of 150s when they should take around 20-30s maximum. While investigating, it tuned out that the problem was very weird indeed. While trying to eliminate or probe one variable the time for that one variable took upwards of 100s. This was completely unexpected as the code only checked for timeouts on a per-variable basis. In the end, the code had to be improved to track time on an intra-variable basis in both systems. While at it, I also added intra-variable time-tracking to implicit clause subsumption and strengthening too. So, over-times should less prevalent from now on. As an interesting side-note, time-limiting on probing is now so fine-grained that a 32-bit unsigned integer would overflow within 15s if used as the time-tracker.

CCC Camp’11

In case you’ve missed it, the CCC Camp was a great opportunity to meet people both working in security and otherwise. I have even met a very kind Taiwanese researcher who worked on SAT and Gröbner basis: in fact, if you haven’t had the chance to read this paper, I highly recommend it. A set of kind Taiwanese researchers recommended this paper to me, and I think it’s the most interesting SAT paper I have read in the past year.

We at SRLabs have made two releases during this camp, one that breaks GPRS encryption, and one that breaks smart card ROM encryption. I was involved with the first release, essentially working on the crypto part. In case you are interested in the videos, the one on GPRS is uploaded here, and the one on smart card ROM encryption is here. This reminds me of something: the videos from the MIT SAT/SMT Summer School are missing :( Well, given my fail there, maybe that’s a good thing :)

Note to self: higher level autarkies

While reading this thesis, I have had a thought about autarkies. Let me first define what an autarky in SAT solving is:

A partial assignment `phi` is called a weak autarky for `F` if “`phi*F` is a subset of `F`” holds, while `phi` is an autarky for `F` if `phi` is a weak autarky for all sub-clause-sets `F'` of `F`. (Handbook of Satisfiability, p. 355)

What does this mean? Well, it means that in the clause-set `{x V y, x}` the assignment `y=False` is a weak autarky, because assigning `y` to `False` will lead to a clause that is already in the clause set (the clause `x`), while the assignment of `x=True` is a (normal) autarky, because it will satisfy all clauses that `x` is in. The point of autarkies is that we can remove clauses from the clause set by assigning values to a set of variables, while not changing the (un)satisfiability property of the original problem. In other words, autarkies are a cheap way of reducing the problem size. The only problem is that it seems to be relatively expensive to search for them, so conflict-driven SAT solvers don’t normally search for them (but, some lookahead solvers such as march by Marijn Heule do).

So, what is the idea? Well, I think we can have autarkies for equivalent literals, too. Here is an example CNF:

``` a V d V -f
-b V d V -f
-a V e
b V e
```

setting `a = -b` will not change the satisfiability of the problem.

We can add any number of clause pairs of the form `X V Y` where `Y` is any set of literals not in `{a, -a, b, -b}`, and `X` is `(-)a` in clause 1 and `(-)b` in clause 2. Further, one of the two variables, say, `a` can be in any clauses at will (in any literal form), though variable `b` can then only be in clauses defined by the clause pairs. Example:

``` a V  d V -f
-b V  d V -f
-a V  e
b V  e
a V  c V -h
a V -f
```

An extension that is pretty simple from here, is that we can even have clauses whose `Y` part is somewhat different: some literals can be missing, though again in a controlled way. For example:

``` a V d
-b V d V -f
-a V e
b V e
```

is possible. It would restrict `b` a bit more than necessary, but if `a` is the only one of the pairs missing literals, we are still OK I believe.

Finally, let me give an example of what these higher-level autarkies are capable of. Let’s assume we have the clause set:

``` a V  d
-b V  d V -f
-a V  e
b V  e
a V  c V -h
a V -f
a V  b V  f V -g
```

setting `a=-b` now simplifies this to:

``` a V  d
-a V  e
a V  c V -h
a V -f
```

which is equisatisfiable to the first one. Further, if the latter is satisfiable, computing the satisfying solution to the former given a solution to the latter is trivial.