
CryptoMiniSat 3.0 released

CryptoMiniSat 3.0 has been released. I could talk about how it has a dynamic, web-based statistics interface, how it has more than 80 options, how it uses no glues for clause cleaning, and all the other goodies, but unfortunately none of that matters much if the speed is not up to par. So, here are the results on the 2009 SAT Competition problems with a 1000s timeout, compared against two competing solvers, lingeling and glucose:

[Figure: cryptoms_speed — CryptoMiniSat 3.0 vs. lingeling and glucose, 2009 SAT Competition problems, 1000s timeout]

This of course does not mean that CryptoMiniSat is faster than the other solvers in general. In fact, it is slower on a number of instances. What it means is that in general it’s OK, and that’s good enough for the moment. It would be awesome to run the above experiment (or a similar one) with a longer timeout. Unfortunately, I don’t have a cluster to do that. However, if you have access to one and would be willing to help run the 3 solvers with a longer timeout, please do, and I will post the updated graph here.

Update: Norbert Manthey kindly ran all the above solvers on the TU Dresden cluster, thanks! He also kindly included one more solver, Riss 3g. The cluster is based on the AMD Bulldozer architecture, with 2 cores per solver and an extreme 7200s timeout. The resulting graph is here:

[Figure: cryptoms_speed — results with Riss 3g included, 7200s timeout]

Riss 3g wins this race, with CryptoMiniSat second, glucose third, and, very intriguingly, lingeling fourth. Note that CryptoMiniSat leads the pack most of the time. Also note that this is the first time CryptoMiniSat 3.0 has been run for such a long time, while the authors of all the competing solvers (lingeling, glucose, riss) have clusters available for research purposes.

Licensing

For those wondering whether they could use this as a base for SAT Competition 2013, the good news is that the licence is LGPL, so you can do whatever you want with it, provided you publish the changes you make to the code. However, I would prefer that you compete under a name such as “cms-MYNAME” unless you change at least 10% of the code, i.e. ~2000 lines. For the competitions after 2013, though, it’s all up for grabs. As for companies: it’s LGPL, so you can safely link it with your code; you only have to publish what you change in the library, not your own code that uses it.

Features

CryptoMiniSat has been almost completely rewritten from scratch. It features, among other things:

  • 4 different ways to propagate
  • Implicit binary&ternary clauses
  • Cached implied literals
  • Stamping
  • Blocking of long clauses
  • Extended XOR detection and top-level manipulation
  • Gate detection and manipulation
  • Subsumption, variable elimination, strengthening
  • 4 different ways to clean clauses
  • 4 different ways to restart
  • Large amounts of statistics data, both to the console and optionally to MySQL
  • Web-based dynamic display of gathered statistics
  • 3 different ways to calculate optimal variable elimination order
  • On-the-fly variable elimination order update
  • Super-fast binary&ternary subsumption&strengthening thanks to implicit bin&tri
  • On-the-fly hyper-binary resolution with precise time-control
  • On-the-fly transitive reduction with precise time-control
  • Randomised literal dominator branching
  • Internal variable renumbering
  • Vivification
  • On-the-fly clause strengthening
  • Cache&stamp-based learnt clause minimisation
  • Dynamic strongly connected component check and equivalent literal replacement

Code layout

As for those wondering how large the code base is, it’s about 20 KLOC of code, organised as follows:

[Figure: cryptoms_overview — layout of the code base]

Visualizing SAT solving

Visualizing the solving of mizh-md5-47-3.cnf


Visualizing what happens during SAT solving has been a long-term goal of mine, and finally, I have managed to pull together something that I feel confident about. The system is fully explained in the linked image on the right, including how to read the graphs and why I made them. Here, I would like to talk about the challenges I had to overcome to create the system.

Gathering information

Gathering information during solving is challenging for two reasons. First, it’s hard to know what to gather. Second, gathering the information should not affect the overall speed of the solver (or only minimally), so the code that gathers it has to be well-written. To top it all off, if a lot of information is gathered, it has to be structured in a sane way, so it’s easy to access later.

It took me about 1-1.5 months to write the code to gather all the information I wanted. It took a lot of time to structure it correctly and to decide how to store and summarize what was gathered. Much more is gathered than is shown on the webpage, but more about that below.

Selecting what to display, and how

This may sound trivial. Some would simply say: just display all information! But what we really want is not just plain information: what good is it to print 100’000 numbers on a screen? The data has to be displayed in a meaningful and visually understandable way.

Getting to the current layout took a lot of time and many, many discussions with all my friends and colleagues. I am eternally grateful for their input — it’s hard to know how good a layout is until someone sees it for the first time and completely misunderstands it. Then you know you have to change it: until then, it was obvious to you what the graph meant, after all, you made it!

What to display is a bit more complex. There is a lot of data gathered, but what is interesting? Naturally, I couldn’t display everything, so I had to select. But selection may become a form of misrepresentation: if some important data isn’t displayed, the system is effectively lying. So, I tried to add as much as possible that still made sense. This led to a very large table of graphs, but I think it’s still understandable. Further, the graphs can be moved around (just drag their labels), so comparative analysis is not hampered much by the large set of graphs.

The final layout is much influenced by Edward Tufte’s books. Most graphing libraries for javascript, including the one I used, Dygraphs, contain a lot of chartjunk by default. For example, the professional library HighCharts is full of chartjunk (just look at their webpage), and is apparently used by many Fortune 500 companies. I was appalled at this — so many graph libraries, and not one that offers a clean look? Luckily, I could do away with all that colorful beautifying mess — the data is interesting, and demands no embellishments.

Creating the webpage

Creating the webpage to display what I wanted was quite difficult. I am no expert at PHP or HTML, and this was the first time I had touched javascript. Although the final page doesn’t show it much, I struggled quite a bit with all these different tools. If I had to do this again, I would choose to use a page generation framework. This time, I wrote everything by hand.

I am most proud of two things on the webpage. First is the histogram at the bottom of the graphs. I know it may not seem like much, but that is all done with javascript I wrote, pulling data from an array that can be changed dynamically. I think it does what it’s supposed to do, and does it well. The second is that I had to tweak the graph library used (Dygraphs, the best library out there, hands down), because it was too slow at rendering these ~30 graphs. The graphs can be zoomed (just click and drag on the X axis), and when zooming in and out the speed was really terrible. It now works relatively fast, though I had to trade a bit of visual beauty for speed.

Final thoughts

Making the visualization webpage was a long marathon. I feel like it’s OK now, even though there were quite a number of ideas that weren’t implemented in the end. I hope you will enjoy playing with it as much as I have enjoyed making it.

The variable speed of SAT solving

Timings from the article "Attacking Bivium Using SAT Solvers". The authors didn't seem to have randomised the problems enough: the time to solve should increase exponentially, but instead it goes up and down like a roller coaster

Vegard Nossum asked me about the varying time it took for CryptoMiniSat to solve a certain satisfiable instance. This inspired me to write an overly long reply, which I think might interest others. So, why does the solving time vary for a specific instance if we permute the clauses? Intuitively, just by permuting the clauses, the problem itself doesn’t get any easier or harder, so why should that make any difference at all? I will now try to go through the reasons, though they can almost all be summed up as follows: the SAT solver has absolutely no clue what it is solving, so to solve a problem it relies on heuristics and randomisation, both of which are influenced by its starting seed, which in turn is influenced by the order of the clauses in the CNF file. And now on to the long explanation.

Let’s divide problems into two categories: satisfiable and unsatisfiable instances. First, let me talk about satisfiable instances. Let’s suppose that we have been solving the problem for some time, and we have connected the dots (variables) well enough through learnt clauses. All that the SAT solver is waiting for is a good guess to set 3-4 variables right, and then it will propagate all variables to the right setting to satisfy all clauses. The problem is therefore to get to the point of only needing to guess 3-4 variables, i.e. to connect the dots, which we do through resolution. We can sacrifice some or all of the dot-connecting (i.e. resolution) for some more guessing, and this can in fact be implemented by simply forgoing some of the conflict analysis we do. In the extreme case, all conflict analysis can be disabled, and then the solver would never restart its search. The solver would in essence be brute-forcing the instance through BCP (boolean constraint propagation). It is easy to see why the latter is prone to variation: depending on the ordering of the variables in the search tree, the tree can be orders of magnitude smaller or larger, and the first solution can be at any point in the search tree, essentially at a random place.

If we decide to carry out resolution for satisfiable problems to counter the variation of the search tree, it is interesting to realise the following: in most cases, we cannot forgo guessing entirely. The reason is simple, yet leads to quite interesting properties. Essentially, a SAT instance can have multiple solutions. If there are two solutions, e.g. 01100... and 10011..., i.e. the solutions are the complete inverse of one another, then the SAT solver will not be able to prove any variable to have a fixed value. The best it can do is create binary clauses; in the example case, for instance:

var1=0 -> var2=1, var3=1, var4=0...
and
var1=1 -> var2=0, var3=0, var4=1...

If we do enough resolutions, the “guessing” part will eventually become a solution-selector, i.e. it will select a solution from the set of available solutions. If there are 2^20 solutions, evenly distributed in the search space, we might need to set 20 variables before a solution is found through a simple application of BCP. Naturally, SAT solvers don’t normally do this, as they are content with finding one satisfying assignment, reporting that to the user, and exiting. It would be interesting to know the average number of variables that still need to be guessed at the point the SAT solver actually finds the solution, but as far as I know this has not been measured yet. Then we would know the trade-off the SAT solver employs between resolution and search when solving satisfiable instances.

All right, so much for satisfiable instances. What happens with UNSAT instances? Well, the solver must either go through the whole search tree and realise there is no solution, or do resolution until it reaches an empty clause, essentially building a resolution tree with an empty clause at its root. Since both of these can be done at the same time, there is a similar trade-off as above, but this time it’s somewhat upside-down. First of all, the search tree can be smaller or larger depending on the ordering of the variables (as before), and secondly, the resolution tree can be smaller or larger depending on the order of resolutions. The minimal resolution tree is (I believe) NP-hard to find, which doesn’t help, but there is at least a minimal resolution tree that bounds us, just as there is a minimal search tree that must be traversed completely. So, in contrast to finding satisfying solutions, both of these are complete in some sense, which should make searching for them robust in terms of speed. Finding a satisfying solution is not complete, because, as I noted above, SAT solvers don’t find all satisfying solutions — if they did, they would have the same properties as solving an unsatisfiable problem, as they would have to prove that there are no more solutions remaining, essentially proving unsatisfiability.

The above reasoning is probably the reason why SAT solvers tend to behave more robustly when they encounter an unsatisfiable problem. In particular, CryptoMiniSat’s running time seems more robust when solving unsatisfiable problems. Also, for problems that contain a lot of randomly placed solutions in the search space, such as the (in)famous vmpc_33 and vmpc_34 problems, the solving times seem wholly unpredictable for at least two interesting SAT solvers: lingeling and CryptoMiniSat.

Optimisations, take two

I have talked about optimisations before and I mentioned that there are a few left to describe. In this post I will talk about two of them.

The first optimisation I would like to talk about concerns compilation flags. Somehow, everyone seems obsessed with them, as if they could somehow make a slow program fast. Alas, this is not the case. Firstly, and this is quite funny: the gcc flag “-O3” usually already turns on all the extra flags that people tend to give. One should really ponder this when looking at desperate attempts to speed up code. However, there is exactly one flag combo that is very, very useful besides “-O3”: “-fprofile-generate” and “-fprofile-use”. To understand why these are useful, we must first understand that one of the challenges faced by an optimising compiler is to guess how many times a loop will be executed, and how many times a branch will be taken. Given a good set of guesses, the loop can be unwound (or not) and the branches can be taken by default (or not). If compiled with “-fprofile-generate”, the program records this information on-the-fly, and the recorded data can later be used by “-fprofile-use”. The speedup achieved with such an approach in the realm of DPLL-based SAT solvers is relatively high, on the order of ~5-10%. I believe many SAT solvers’ binaries don’t use this trick, even though it’s cheap to use. I personally compile with “-fprofile-generate”, then run an example problem like UTI-20-10p1, and then recompile with “-fprofile-use”. An important aspect, by the way, is to execute a very typical scenario: strange or abnormal scenarios will produce spurious branching and loop data, which will usually slow down the program instead of speeding it up.

The second optimisation I would like to talk about concerns cache usage. Modern processors are extremely fast, to the point that what really limits the processor is often no longer its ability to execute the code on the data, but its ability to actually get the data that the code needs. This problem is usually alleviated through spatial and temporal data locality: the data most likely needed next is usually physically close to the data we just accessed. However, in DPLL-based SAT solvers, the main function, propagate(), goes through a list of pointers, accessing the data that each pointer points to. In other words, the CPU will fetch the next pointer in the background, but will not fetch what that pointer points to — simply because it doesn’t know it’s a pointer. So, how could we alleviate this problem? Well, by telling the CPU that the next piece of information it will need is where the next pointer points to. Like this:
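A rough sketch of such a loop, where treat() is a hypothetical stand-in for the per-element work:

    for (size_t i = 0; i < pointers.size(); i++) {
        if (i + 1 < pointers.size()) {
            // non-blockingly fetch the data behind the *next* pointer into cache
            __builtin_prefetch(pointers[i + 1]);
        }
        treat(*pointers[i]); // work on the data the current pointer points to
    }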

where pointers is a vector of pointers that must each be treated. The function __builtin_prefetch() maps to an assembly instruction that non-blockingly fetches the data pointed to into the Level 1 cache of the processor. This instruction is not implemented in all CPUs, but many x86 and all AMD64 architectures have it. Essentially, while we are dealing with the data that pointers[i] points to, we fetch the data that pointers[i+1] points to, in parallel. Interleaving the fetching of the next piece of data with the treatment of the current one lets us do more in less time, speeding up the program. Interestingly, since prefetching the data does not alter the flow of the program in any way, omitting the prefetch call does not alter the results of the program at all: it just makes it run slower. In CryptoMiniSat I use this optimisation, and it seems to speed up solving by a few percentage points, maybe 5-8%. How much time you save with this depends on the locality and size of the data your pointers point to. If it is very small, e.g. 2-3MB in total, and the items are close to each other, then the CPU will have fetched it all after the first couple of pointers have been dereferenced, so there is no need for explicit prefetching. But if the data you are dealing with is much larger, which is often the case with SAT solvers, then such explicit prefetching can save quite some time.

On failed literal probing

Apparently, quite some work has been done on failed literal probing, although I don’t think it has all been put into a common package yet.

The idea is quite simple in its purest form: try to set variable v1 to true and see if that fails. If it does, variable v1 must be false. If it doesn’t fail, try the other way around: set it to false and if that fails, it must be true.

There are a couple of tricks we can add, however. For example, if both v1 and !v1 set variable v10 to value X (where X can be either true or false), then set v10 to X. The reasoning here is simple: v1 must be set, so whatever both v1 and !v1 imply, it must be set, too. So we can safely set v10 to X.

A more interesting idea is the following. If v1 sets v20 to true, but !v1 sets v20 to false, then v1 = v20. So, we can replace v20 with v1. One less variable to worry about!
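To make these tricks concrete, here is a rough sketch of what one probing round for a single variable could look like. All names in it are illustrative: Var, Lit, propagate_assumption(), enqueue_fact(), found_in() and merge_vars() are hypothetical placeholders, not CryptoMiniSat’s actual API.

    // Illustrative sketch only: the helper names below are made up, not real solver API.
    // Lit(v, false) is the positive literal of v, Lit(v, true) the negated one.
    bool probe_variable(Var v) {
        std::vector<Lit> posImplied, negImplied;
        const bool posFailed = propagate_assumption(Lit(v, false), posImplied); // try v = true
        const bool negFailed = propagate_assumption(Lit(v, true),  negImplied); // try v = false

        if (posFailed && negFailed) return false;                    // both polarities fail: UNSAT
        if (posFailed) { enqueue_fact(Lit(v, true));  return true; } // v = true failed, so v must be false
        if (negFailed) { enqueue_fact(Lit(v, false)); return true; } // v = false failed, so v must be true

        // Neither polarity failed: compare what the two propagations implied.
        for (const Lit lit : posImplied) {
            if (found_in(negImplied, lit))        // both v and !v imply lit, so lit must hold
                enqueue_fact(lit);
            else if (found_in(negImplied, ~lit))  // v implies lit, !v implies !lit: lit == v
                merge_vars(lit, Lit(v, false));   // keep only one of the two equivalent variables
        }
        return true;
    }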

There are even more tricks, however. If setting v1 to true and setting it to false both shorten a longer XOR to the 2-long XOR “v40 + v50 = false”, this 2-long XOR can be learnt: v40 can be replaced with v50.

And more tricks. If there is a 2-long clause “v1 or v2 = true”, then we can do all of the above, but with v1 and v2 this time. Since either v1 or v2 must be true, all the above ideas still work: in all the previous ideas, all we used was the fact that either v1 or !v1 must be true, and that is still the case here, since either v1 or v2 must be true. This, by the way, is called 1-recursive learning. (Note: the case of the fourth paragraph changes a bit, but we still learn a 2-long XOR.)

And now something new. I have been thinking about trying to re-use old learnt clauses. They sometimes encode quite important facts about the problem. What if we tried to re-use them? After all, keeping them in memory is really cheap: the only problem with them is that doing regular propagation with them takes a lot of time, thus slowing down the solver. But what if we only used them for failed literal probing? I just launched a test on the Grid’5000 cluster to find out. Fingers crossed…