Propagating binary clauses

Binary clauses are somewhat special. They are clauses that look like:

-v1 or v20 = true
-v1 or v21 = true

which basically mean: if v1 is TRUE, then both v20 and v21 must be TRUE. We put these clauses into a special data structure that looks like this:

watchlist[-v1] = v20,v21, ...

This makes us all very happy, since this data structure is very simple: just literals (negated or non-negated variables), one after the other, in a very trivial list. This makes the whole thing very compact and very efficient.
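To make the layout concrete, here is a minimal sketch of such a binary watchlist (the type and function names are illustrative, not CryptoMiniSat's actual code):

#include <cstdint>
#include <vector>

typedef uint32_t Lit;  // variable index and sign packed into one 32-bit word

// One flat list of literals per watching literal: binWatches[index of -v1]
// holds every literal that must become TRUE once v1 becomes TRUE.
typedef std::vector<std::vector<Lit> > BinWatches;

// Propagation over binary clauses is then a linear scan of contiguous
// literals, with no pointer chasing at all.
void propagateBinaries(const BinWatches& binWatches, Lit falsifiedLit,
                       std::vector<Lit>& toEnqueue)
{
    const std::vector<Lit>& implied = binWatches[falsifiedLit];
    for (size_t i = 0; i < implied.size(); i++)
        toEnqueue.push_back(implied[i]);  // a real solver would check values and detect conflicts here
}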

The problem with such a list is that when v1 is indeed set to TRUE and we set the implied variables (e.g. v20, v21), we must record what set these variables to their new values. What is usually done is to simply record a pointer to the clause that did the setting. In the data structure above, however, there are no clause pointers. The data structure in GLUCOSE contains these pointers; I have lately been experimenting with removing them. The tricky part is updating the conflict generation routine, which examines the clauses and decisions that lead to a conflict. This routine must now automatically recognise that these binary clauses are special: they are only 2 literals long, one literal being the variable that got set and the other being the variable that set it. Both pieces of information are available: watchlist[-v1] immediately tells us that literal -v1 did the setting, and the variable that got set is always known.
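As a rough sketch of what this means for conflict analysis (hypothetical names, assuming a MiniSat-style 32-bit literal type; this is not CryptoMiniSat's actual code):

#include <cstdint>
#include <utility>

typedef uint32_t Lit;  // illustrative, as above

// Rebuild the implicit binary clause for a propagated literal without touching
// any clause pointer: 'propagated' is the literal that got set, 'reason' is the
// literal whose watchlist fired (e.g. -v1 above). The reason clause is simply
// (reason OR propagated), so both literals are already at hand.
std::pair<Lit, Lit> binaryReasonClause(Lit propagated, Lit reason)
{
    return std::make_pair(reason, propagated);
}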

Armed with this logic, one would think that moving towards this all-new no-clause-pointers heaven is the way to go. However, apparently, this might not be so. This is quite astonishing, given that doing things without pointers should mean fewer cache misses (there are fewer pointers to dereference during conflict analysis). I consistently get worse speeds, though. My guess is that the problem lies with the data structure I use to record what led to the setting of each variable. I use a very simple struct:

struct WhatLeadHere {
    bool wasPointer;
    union{Clause* pointer; Lit lit;};
};

where wasPointer is TRUE if a non-binary clause led here (and we store the pointer to it), and FALSE if it was a binary clause (and we store the other literal). The union takes care of the rest: pointer is valid when wasPointer is TRUE, and lit is valid when wasPointer is FALSE.

What is apparent in this data structure is the following: we store a 32/64-bit pointer plus a boolean. Right, so 40/72 bits of data? No. Data alignment in C++ means that this structure will be aligned according to its largest member, so it will be aligned to 32/64-bit boundaries: it takes 64/128 bits to store on 32-bit and 64-bit architectures, respectively. Oops, we just doubled our demand for data writes!
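A quick standalone check makes the padding visible (stand-in types only, so that the snippet compiles on its own):

#include <cstdint>
#include <cstdio>

struct Clause;          // only ever used through a pointer here
typedef uint32_t Lit;   // stand-in for the solver's literal type

struct WhatLeadHere {
    bool wasPointer;
    union { Clause* pointer; Lit lit; };
};

int main() {
    // On a typical 64-bit machine this prints 16 bytes (128 bits), not 9:
    // the union is pointer-aligned, so the bool drags along 7 bytes of padding.
    std::printf("sizeof(WhatLeadHere) = %zu bytes\n", sizeof(WhatLeadHere));
    return 0;
}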

There are some remedies, of course. We can simply pack:

#pragma pack(push)
#pragma pack(1)
struct WhatLeadHere {
    bool wasPointer;
    union{Clause* pointer; Lit lit;};
};
#pragma pack(pop)

which removes the memory burden of aligning to the largest member, and the actual size will really be 40/72 bits. But this raises the overhead of accessing the data inside the structure. Furthermore, I have been using a 32-bit mini-pointer on 64-bit architectures, which is a big hack, so it crumbles as soon as something as complicated as this comes up: I am left with writing 72 bits instead of 32. This probably explains the slowdown. Oh well, I have got to work harder…
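For reference, one direction worth trying is to fold the tag into a single 32-bit word: if clauses live in an arena and are referred to by index rather than by raw pointer, the lowest bit can distinguish the two cases. This is only a sketch of that idea, not what CryptoMiniSat currently does:

#include <cstdint>

// A 32-bit "reason" word: the lowest bit says whether the upper 31 bits
// are a clause index (into an arena) or the other literal of a binary clause.
struct PackedReason {
    uint32_t data;

    static PackedReason fromClauseIdx(uint32_t idx) { PackedReason r; r.data = (idx << 1) | 1u; return r; }
    static PackedReason fromLit(uint32_t lit)       { PackedReason r; r.data = (lit << 1);      return r; }

    bool     isClause()  const { return (data & 1u) != 0; }
    uint32_t clauseIdx() const { return data >> 1; }  // valid only when isClause()
    uint32_t lit()       const { return data >> 1; }  // valid only when !isClause()
};

This keeps the reason at 32 bits with no padding, at the price of limiting the number of addressable clauses (and literals) to 2^31.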

EDITED TO ADD: Next up, I will talk about some new ideas that let me solve 217 problems from the 2009 SAT Race within the original time limit. Remember that last year the best solver only solved 204.

Optimisations, take two

I have talked about optimisations before and I mentioned that there are a few left to describe. In this post I will talk about two of them.

The first optimisation I would like to talk about concerns compilation flags. Somehow, everyone seems obsessed with them, as if they could somehow make a slow program fast. Alas, this is not the case. Firstly, and this is quite funny: the gcc flag “-O3” usually already turns on all the extra flags that people tend to add by hand. One should really ponder this when looking at desperate attempts to speed up code. However, there is exactly one flag combination that is very useful besides “-O3”: “-fprofile-generate” followed by “-fprofile-use”. To understand why these are useful, we must first understand that one of the challenges faced by an optimising compiler is to guess how many times a loop will be executed and how often a branch will be taken. Given a good set of guesses, a loop can be unrolled (or not) and a branch can be predicted taken by default (or not). If compiled with “-fprofile-generate”, the program records this information on the fly, and the recording can later be fed back to the compiler with “-fprofile-use”. The speedup achieved with this approach in the realm of DPLL-based SAT solvers is relatively high, on the order of 5-10%. I believe many SAT solvers’ binaries don’t use this trick, even though it’s cheap. I personally compile with “-fprofile-generate”, run an example problem such as UTI-20-10p1, and then recompile with “-fprofile-use”. An important aspect, by the way, is to profile on a very typical problem instance: strange or unusual instances will produce spurious branching and loop data that will usually slow the program down rather than speed it up.

The second optimisation I would like to talk about concerns cache usage. Modern processors are extremely fast, to the point that what really limits them is often no longer the ability to execute code on the data, but the ability to actually get the data that the code needs. This problem is usually alleviated through spatial and temporal data locality: the data most likely needed next is usually physically close to the data we just accessed. However, in DPLL-based SAT solvers, the main function, propagate(), goes through a list of pointers, accessing the data that each pointer points to. In other words, the CPU will fetch the next pointer in the background, but will not fetch what that pointer points to, simply because it doesn’t know it’s a pointer. So how can we alleviate this problem? By telling the CPU that the next piece of information it will need is what the next pointer points to. Like this:

for (size_t i = 0; i < pointers.size(); i++) {
    // fetch the data behind the *next* pointer while we work on the current one
    if (i + 1 < pointers.size())
        __builtin_prefetch(pointers[i + 1]);
    treat_data(*pointers[i]);
}

where pointers is a vector of pointers, each of which must be treated. The function __builtin_prefetch() emits an assembly instruction that non-blockingly fetches the data pointed to into the Level 1 cache of the processor. These instructions are not implemented in all CPUs, but many x86 and all AMD64 architectures have them. Essentially, while we are dealing with the data that pointers[i] points to, we fetch the data that pointers[i+1] points to, in parallel. Interleaving the fetching of the next piece of data with the treatment of the current one lets us do more in less time, speeding up the program. Interestingly, since prefetching does not alter the flow of the program in any way, omitting the prefetch call does not alter the results at all: it just makes the program run slower. In CryptoMiniSat I use this optimisation, and it seems to speed up solving by a few percentage points, maybe 5-8%. How much time you save depends on the locality and size of the data your pointers point to. If the data is very small, e.g. 2-3MB in total, and close together, then the CPU will have fetched it all after the first couple of pointers have been dereferenced, so there is no need for explicit prefetching. But if the data is much larger, which is often the case with SAT solvers, then such explicit prefetching can save quite some time.

The Obvious Child

The new CryptoMiniSat 2.5.1, codenamed “The Obvious Child”, has been released. It’s very close to the system that was sent to SAT Race’10. This version of CryptoMS is much better than any version before: I finally corrected a lot of bugs that went into the 2.4.x series, and added new features to treat binary clauses.

The release name is a song title by Paul Simon. I am a fan of the 60s for many reasons: it was the time when Laing wrote his first books, empathy was first scientifically demonstrated in animals, the days of “I have a dream”, the landing on the moon, and the take-off of Simon & Garfunkel. Some interesting times, shall we say.

I will probably be working less on CryptoMiniSat from now on, concentrating instead on other things I must do, such as publishing research papers on subjects that interest the scientific community. I will also be trying to play out in my mind how CryptoMiniSat will fare in the SAT Race. Of course I am hoping for the best, but I have worked on the program for a mere 8 months, alone. Compared to experts who have worked on their solvers for multiple years in groups of 2-3, I really have no chance of winning. However, I am optimistic, as always: maybe I will get a good position in the rankings :)

Why programs fail

I just bought the book that gives this blog post its title, by Andreas Zeller. Essentially, it’s about debugging: the author analyses the chain from program defect to infected program state to, finally, program failure. I bought the book because while working on CryptoMiniSat I have encountered so many bugs that they could fill a book.

The problem with chasing bugs in CryptoMiniSat is that SAT solvers employ a very optimised algorithm that makes a lot of decisions per second, so recording all interesting events makes for a very big dump file. I have per-module verbosity settings, so if I can narrow down what module(s) are causing the failure, there is less debug output to go through, but sometimes I still have to wade through 10-20GB dumps to see how the program state got infected.

While going through such large dumps, the following occurred to me: what if we loaded all this data into a MySQL database? Some 8 years ago I had a job where I processed a large chunk of the data on all of Hungary’s phone conversations on a daily basis. Essentially, data on your phone calls is recorded in many different databases, and these must be merged to calculate your bill and to provide statistical feedback to the company. I wrote a program that processed 3 million records in about 1 hour using MySQL and a set of SQL statements spanning approximately 2,000-3,000 lines. In other words, I am not afraid of large databases, and I think such multi-gigabyte data could easily be loaded into a MySQL database.

The goal of loading a SAT run into a MySQL database is the following: once the MySQL data is ready, a bit of PHP, graphviz and gnuplot can plot any relevant information about any point in time during the solving. In other words, we could dig through this data visually, to very interesting effect.

For example, I have lately been having problems with clause strengthening interfering with variable elimination. The idea is in theory quite simple. Variable elimination eliminates a variable by resolving every clause that contains the variable with every clause that contains its negation, and adds the resulting clauses in place of the originals. These resolvents are sometimes trivial, such as:
a or b or c or !c = true
which is never added, since it is satisfied whether c is true or false. So far, so good. Clause strengthening is a method whereby clauses are shortened. For example, if there was a clause:
e or b or c = true
the solver might discover that in fact it is also true that:
e or b = true
and replace the original clause with it, since this clause is “stronger”: it poses a more stringent requirement. The way variable elimination and clause strengthening interact is as follows. Let’s suppose we eliminate variable e, and that the clause above gets resolved with another clause, say !e or a or !c = true, during the elimination. If the above clause is strengthened before the elimination, then instead of:
a or b or c or !c = true
we might have got:
a or b or !c = true
which must be added. We actually end up with more clauses because of clause strengthening! I believe nobody has found this anomaly before. To properly quantify how this affects solving, we need to know all the states a clause goes through, and a web interface with a MySQL backbone could help a lot with that.
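To make the interaction concrete, here is a minimal sketch of resolution with a tautology check (simplified throw-away code, not the solver’s: literals are signed ints, e.g. 3 for c and -3 for !c, and a clause is a set of literals):

#include <set>

typedef std::set<int> Clause;

// Resolve c1 (which contains 'pivot') with c2 (which contains '-pivot').
// Returns false if the resolvent is a tautology (contains both x and !x),
// in which case it is never added to the clause database.
bool resolve(const Clause& c1, const Clause& c2, int pivot, Clause& out)
{
    out.clear();
    for (int lit : c1) if (lit !=  pivot) out.insert(lit);
    for (int lit : c2) if (lit != -pivot) out.insert(lit);
    for (int lit : out)
        if (out.count(-lit)) return false;  // tautology: drop the resolvent
    return true;
}

Resolving the unstrengthened e or b or c against !e or a or !c on e gives the tautology a or b or c or !c, which is dropped; resolving the strengthened e or b against the same clause gives a or b or !c, which has to be kept.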

Being dumb is part of the deal

In my last post, I presented a fast and easy way to add binary clauses to the clause database of SAT solvers, and I conjectured that since the algorithm was so simple, there must be something wrong with it. I was, in fact, right. I ran a cluster experiment, and it turned out that adding all those binary clauses was slowing down the solving.

My first reaction was of course: how on Earth? Binary clauses are “good”: both MiniSat and Glucose try to amass as many of them as possible. So what can possibly be wrong? Well, redundant information, for a start. Let’s assume that we know that

A leads to B
B leads to C

If I now tell you that

A leads to C

it wouldn’t really help you. However, the algorithm I presented last time didn’t address this issue. It simply added a good number of binary clauses (though, luckily, not all possible ones) that were totally redundant.

The trick with adding binary clauses, then, is to add as few as possible while achieving maximal graph connectivity. And, while we are at it, we can of course remove redundant information like the example above. Neither of these is easy. I use a Monte-Carlo method to approximate the degrees of the vertices for the first algorithm, and then sort the vertices according to their degrees when adding binary clauses. The second algorithm just needs a bit of imagination and a willingness to draw graphs on a piece of paper.
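To illustrate the redundancy removal, here is a naive sketch of the kind of reachability test involved (just the concept, not my actual implementation, which has to be far more careful about cost):

#include <vector>
#include <stack>

// The implication graph as adjacency lists, one vertex per literal.
// The binary clause (!A or C), i.e. the edge A -> C, is redundant if C is
// already reachable from A through the remaining edges.
bool reachableWithoutEdge(const std::vector<std::vector<int> >& graph, int from, int to)
{
    std::vector<bool> seen(graph.size(), false);
    std::stack<int> todo;
    todo.push(from);
    seen[from] = true;

    while (!todo.empty()) {
        int cur = todo.top(); todo.pop();
        for (int next : graph[cur]) {
            if (cur == from && next == to) continue;  // skip the edge under test
            if (next == to) return true;              // A reaches C some other way
            if (!seen[next]) { seen[next] = true; todo.push(next); }
        }
    }
    return false;  // the edge A -> C carries information we cannot drop
}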

So there is a point in being dumb: without the hands-on knowledge of how binary clauses can negatively affect solving, I would never have thought of developing such algorithms. The final algorithms take ~3-4 seconds to complete, add (fairly close to) the minimum number of clauses, remove all redundant clauses, and work like a charm.

There is only one thing left to understand: why does Armin Biere add hundreds of thousands of binary clauses? I seem to be able to achieve the same effect with a couple of thousand… Again, I must be dumb, and he must be doing something right. I am lost, again :(

EDITED TO ADD: You might enjoy this plot of PrecoSat236 vs. CryptoMiniSat 2.5.0beta1 on the SAT Competition instances of 2009.