# Machine Learning and SAT

I have lately been digging myself into a deep hole with machine learning. While doing that it occurred to me that the SAT community has essentially been trying to imitate some of ML in a somewhat poor way. Let me explain.

### CryptoMiniSat and clause cleaning strategy selection

When CryptoMiniSat won the SAT Race of 2010, it was in large part because I realized that glucose at the time was essentially unable to solve cryptographic problems. I devised a system where I could detect which problems were cryptographic. It checked the activity stability of variables and if they were more stable than a threshold, it was decided that the problem was cryptographic. Cryptographic problems were then solved using a geometric restart strategy with clause activities for learnt database cleaning. Without this hack, it would have been impossible to win the competition.
Continue reading Machine Learning and SAT

# On using less memory for binary clauses in lingeling’s watchlists

Armin Biere gave a lecture at the Pragmatics of SAT workshop (proceeedings here) in Vienna about all the things inside lingeling which won a lot of awards[PDF] this year. If you weren’t there, you missed an amazing presentation. In this blog post I’ll reflect on a particular part of the presentation dealing with a memory trickery that has been intriguing me for a long while but I did not implement. Before I begin let me say: the presentation was awesome, and it’s not by chance that lingeling won so many awards.

### The idea

The idea used by lingeling I want to talk about is easy to explain (though not easy to invent, as is usually the case). If you look at typical CNF problems, the majority of the clauses will be binary, i.e. only contain 2 literals. These clauses used to be stored exactly the same way as normal clauses: in the heap we allocate 2 literals and we put a pointer into the watchlist to these literals.

An improvement over this idea are the so-called implicit clauses. For the binary clause “x V y” we put into the watchlist of x the literal y, and in the watchlist of y the literal x. There is no other place we store these binary clauses, hence the word “implicit”. For other clauses, we still put pointers into the watchlist and allocate space, as usual. The problem with this approach is that the pointer to a clause is 32b (we use an 32b offset on 64b machines) but for each clause we also store a so-called blocking literal in the watchlist, which is also 32b. That makes the entries in the watchlist 64b long for normal clauses, and 32b long for binary clauses.

The idea is to have differing sizes of elements in a watchlist. If e.g. the first bit of the element is a 1, the next 63b relate to a long clause, and if it’s a 0, then the next 31b relate to a binary clause. In case 80% of the clauses are binary, this saves 50% of the space in 80% of the cases. Not bad at all.

The advantages of using this idea are twofold. First, as already mentioned, memory use is lowered. This is non-trivial as memory usage of the watchlists can be enormous and although many other improvements can be switched off (such as e.g. stamping), storage of the clauses can never be switched off. Secondly, not having holes in memory leads to much better cache usage which in turn can bring real speedup. In case you think this is not important, you might enjoy knowing that the HHVM module of Facebook was made over 2x faster by making sure that important cache lines are not knocked out[PDF].

In case you have an array that has varying size elements in it, some non-trivial complications arise. Let me list just a few that come to my mind.

Sorting the list is no longer trivial You cannot just swap elements: they might not fit. One way to do sorting efficiently is to move all the data into another, equally-spaced container and sort there, then move it back in. However, keep in mind that the reason why quicksort is so fast is that it can do in-place sort. Merge-sorting would be another option, but it copies elements and it’s not by chance it’s not the default sort in most cases. Also, you would have to re-write merge-sort of course, to deal with the varying-sized elements.

In case you think that sorting is not needed, maybe you forgot to consider the lightning-quick subsumption you can do between implicit binary&tertiary clauses using sort to give just one example.

Removing an element is no longer an O(1) operation In case you need to remove element X in a watchlist, you can simply swap the last element to the position of X and make the array one smaller. This trick is used quite extensively since the order of the watchlist is usually irrelevant.

Loops need to be re-written All your loops that go through the watchlist need to be re-written and have to have a switch() in them with some pointer arithmetic (do we advance by 32b or 64b?). In case you think you don’t need to go through the watchlist too often anyway, think again. Any time you need to do anything with clauses you will have to go through the watchlists, since they are the only place where binary (and tertiary, if your system is optimised enough) clauses are stored. This means your watchlist-gothrough function will be absolutely everywhere in the code unless you don’t want to implement any pre- or in-processing.

You might think you could write a function and just pass a pointer to another function to it that does the ‘real’ job, essentially hiding the complexity in a function that you only need to write once. There are three problems with this. First, this will be a tight loop and so performance is important, which you will loose as you will need to dereference the passed-in function pointer every time. This can be overcome with the use of templates but it won’t make the code pretty. Secondly, your original, hiding function will need to be written more than a single time. For example, some such executions will need to count time (operations) and some won’t. You will need to count pointer dereferences (normal clause is fetched) and binary clauses (no pointer dereferenced) in a significantly different way as a cache-miss is very expensive and a clause-access will cause such a cache-miss most of the time. For performance reasons you will need variations that don’t dereference long clauses, variations that allow for manipulation of the array, variations that don’t, etc.

Maintaining datastructure consistency becomes harder Unless you use hiding functions, which is non-trivial as explained above (and maybe impossible in e.g. plain C), the complexity to maintain consistency of the datastructure will be all over the code. Even if done very carefully, the constraints on the datastructure may end up being implicitly, rather than explicitly, represented in the code. It will make it easier to create bugs and harder to find them.

### Conclusions

This idea of using less memory for binary clauses in the watchlists is very interesting and has intrigued me for a long while — Armin was kind enough to tell me about this a long time ago. It has the potential to save a lot of memory and to keep things more packed in the datastructure that is arguably the most accessed during solving and inprocessing. However, I was always daunted by the obstacles I saw in front of me — though I might simply need to understand C++11 and templates better to make it work.

Currently, I feel like there are plenty of other optimisations that I could implement from the talk of Armin, e.g. that all watchlists are stored in the same array, using offsets and a hand-rolled memory manager. That seems to have a potential of also improving the memory usage and speed while being easier to implement and easier to hide in a class.

# A note on learnt clauses

Learnt clauses are clauses derived while searching for a solution with a SAT solver in a CNF. They are at the heart of every modern so-called “CDCL” or “Conflict-Driven Clause-Learning” SAT solver. SAT solver writers make a very important difference between learnt and original clauses. In this blog post I’ll talk a little bit about this distinction, why it is important to make it, and why we might want to relax that distinction in the future.

### A bit of terminology

First, let me call “learnt” clauses “reducible” and original clauses “irreducible”. This terminology was invented by Armin Biere I believe, and it is conceptually very important.

If a clause is irreducible it means that if I remove that clause from the clause database and solve the remaining system of constraints, I might end up with a solution that is not a solution to the original problem. However, these clauses might not be the “original” clauses — they might have been shortened, changed, or otherwise manipulated such as through equivalent literal replacement, strengthening, etc.

Reducible clauses on the other hand are clauses that I can freely remove from the clause database without the risk of finding a solution that doesn’t satisfy the original set of constraints. These clauses could be called “learnt” but strictly speaking they might not have been learnt through the 1st UIP learning process. They could have been added through hyper-binary resolution, they could have been 1UIP clauses that have been shortened/changed, or clauses obtained through other means such as Gaussian Elimination or other high-level methods.

### The distinction

Reducible clauses are typically handled “without care” in a SAT solver. For example, during bounded variable elimination (BVE) resolutions are not carried out with reducible clauses. Only irreducible clauses are resolved with each other and are added back to the clause database. This means that during variable elimination information is lost. For this reason, when bounded variable addition (BVA) is carried out, one would not count the simplification obtained through the removal of reducible clauses, as BVE could then completely undo BVA. Naturally, the heuristics in both of these systems only count irreducible clauses.

Reducible clauses are also regularly removed or ‘cleaned’ from the clause database. The heuristics to perform this has been a hot topic in the past years and continue to be a very interesting research problem. In particular, the solver Glucose has won multiple competitions by mostly tuning this heuristic. Reducible clauses need to be cleaned from the clause database so that they won’t slow the solver down too much. Although they represent information, if too many of them are present, propagation speed grinds to a near-halt. A balance must be achieved, and the balance lately has shifted much towards the “clean as much as possible” side — we only need to observe the percentage of clauses cleaned between MiniSat and recent Glucose to confirm this.

Glues (used first by Glucose) are an interesting heuristic in that they are static in a certain way: they never degrade. Once a clause achieves glue status 2 (the lowest, and best), it can never loose this status. This is not true of dynamic heuristics such as clause activities (MiniSat) or other usability metrics (CryptoMiniSat 3). They are highly dynamic and will delete a clause eventually if it fails to perform well after a while. This makes a big difference: with glues, some reducible clauses will never be deleted from the clause database, as they have achieved a high enough status that most new clauses will have a lower status (a higher glue) and will be deleted instead in the next cleaning run.

Since Glucose doesn’t perform variable elimination (or basically any other optimization that could forcibly remove reducible clauses), some reducible clauses are essentially “locked” into the clause database, and are never removed. These reducible clauses act as if they were irreducible.

It’s also interesting to note that glues are not static: they are in fact updated. The way they are updated, however, is very particular: they can obtain a lower glue number (a higher chance of not being knocked out) through some chance encounters while propagating. So, if they are propagated often enough, they have a higher chance of obtaining a lower glue number — essentially having a higher chance to be locked into the database.

What if these reducible clauses that are locked into the clause database are an important ingredient in giving glues the edge? In other words, what if it’s not only the actual glue number that is so wildly good at guessing the usefulness of a reducible clause, instead the fact that their calculation method doesn’t allow some reducible clauses ever to be removed also significantly helps?

To me, this sounds like a possibility. While searching and performing conflict analysis SAT solvers are essentially building a chain of lemmas, a proof. In a sense, constantly removing reducible clauses is like building a house and then knocking a good number of bricks out every once in a while. If those bricks are at the foundation of the system, what’s above might collapse. If there are however reducible clauses that are never “knocked out”, they can act as a strong foundation. Of course, it’s a good idea to be able to predict what is a good foundation, and I believe glues are good at that (though I think there could be other, maybe better measures invented). However, the fact that some of them are never removed may also play a significant role in their success.

### Locking clauses

Bounded variable addition is potentially a very strong system that could help in shortening proofs. However, due to the original heuristics of BVE it cannot be applied if the clauses it removes are only reducible. So, it can only shorten the description of the original problem (and maybe incidentally some of the reducible clauses) but not only the reducible clauses themselves. This is clearly not optimal for shortening the proof. I don’t know how lingeling performs BVA and BVE, but I wouldn’t be surprised if it has some heuristic where it treats some reducible clauses as irreducible (thereby locking them) so that it could leverage the compression function of BVA over the field of reducible clauses.

Unfortunately, lingeling code is hard to read, and it’s proprietary code so I’d rather not read it unless some licensing problems turn up. No other SAT solver performs BVA as an in-processing method (riss performs it only as pre-processing, though it is capable to perform BVA as in-processing), so I’m left on my own to guess this and code it accordingly.

UPDATE: According to Norbert Manthey lingeling doesn’t perform BVA at all. This is more than a little surprising.

### End notes

I believe it was first Vegard Nossum who put into my head the idea of locking some reducible clauses into the database. It only occurred to me later that glues automatically achieve that, and furthermore, they seem to automatically lock oft-propagated reducible clauses.

There are some problems with the above logic, though. I believe lingeling increments the glue counter of some (all?) reducible clauses on a regular basis, and lingeling is a good solver. That would defeat the above logic, though the precise way glues are incremented (and the way they are cleaned) in lingeling is not entirely clear to me. So some of the above could still hold. Furthermore, lingeling could be so well-performing for other reasons — there are more to SAT solvers than just search and resolution. Lately, up to 50% or more of the time spent in modern SAT solvers could be used to perform actions other than search.