Tag Archives: Collaborative effort

Creating a SAT solver from scratch

I have lately been having a small chat with Francois Grieu and it inspired me to think about how difficult it would be to create a new SAT solver from scratch. Let’s take it for granted that the person who embarks on such a difficult task understands the basics well, and wishes to create a (nowadays very popular) Conflict-Driven Clause-Learning (or simply, CDCL) type SAT solver. The main parts of  these solvers are:

  1. Clause parser&database to store the clauses
  2. Propagation engine using watch-lists to carry out Boolean Constraint Propagation (BCP)
  3. Conflict generation code, using the first-UIP scheme
  4. Learnt clause activity calculator&deleter
  5. Variable state database, storing activity and polarity
  6. Search restart manager
  7. Pre-processor is recommended — SatELite does the job reasonably well.

None of these are very difficult to do. However, by implementing these in a minimalistic fashion, one will only end up with a reasonably fast, but not a very fast solver. Enhancements, for example, storing binary clauses implicitly in the occurrence lists in the database bring small, but nevertheless important, benefits. Similarly, having specialised watch-lists to handle binary and tertiary clauses further speed up the solving. Conflict clause minimisation, on-the-fly subsumption, and on-the-fly self-subsuming resolution improve conflict generation. Polarity-calculation and -caching comes handy for variable polarities, and dynamic restarts are also useful. Learnt clause activities might need to support both MiniSat-style and Glucose-style activities.

The real challenge, however, comes with bullet number 7: a fast and complete pre- and in-processor is very complex to write. The pre-processing done by SatELite, i.e. self-subsuming resolution, variable elimination and subsumption are just the icing on the cake for a good pre- and in-processor. Doing these in the middle of the solving (i.e. in-processing), using learnt clauses, doing asymmetric branching, binary clause reasoning, variable replacement, blocked clause elimination, failed literal probing, tautology elimination, clause strenghtening with tautological binary clauses, clause cleaning, and a long list of others — that’s where the hard part lies.

With all of these challenges facing SAT solver implementers, is it reasonable to think that anyone can relatively easily participate in SAT Races and SAT Competitions? If one wishes to respond with yes to this question, then there are two paths ahead: either immense amounts of time need to be invested into writing such a solver, or an already written solver has to be taken as a base to build ideas upon.  I think the only reasonable choices for a base are: lingeling, MiniSat, PrecoSat and (I hope to think) CryptoMiniSat. None of these solvers are perfect of course, so one has to put up with all the inherent problems in these solvers.

In the past months, I have been writing many blog entries trying to detail the inner workings of SAT solvers — mostly to help others (and in the process, myself) struggling with SAT solvers. Maybe I should be spending more time trying to document the code of CryptoMiniSat, so that the barrier for entry to creating winning SAT solvers could be lowered.

The 2010 SAT Conference

I am currently at the SAT 2010 conference, in Edinburgh. It is my second SAT conference, but the first one where I finally have a chance to know who is who. I had a chance to talk to some great researchers. For instance, I talked to (in no particular order) Armin Biere (author of PrecoSat and Lingeling), Niklas Een, Niklas Sörensson (authors of MiniSat — new version is out, btw), Marijn Heule (author of the march_* set of solvers), Matti Järvisalo (one of the authors of Blocked Clause Elimination), the author of MiraXT, and many others. It’s really a great place to meet up people and get a grasp on what SAT is really about. It’s also a great way to understand the motivation behind certain ideas, and to hear the problems encountered by others.

For instance, I now know that bugs, about which I have already written a blog post about is a problem that comes up for nearly everyone, and tackling them is one of the most difficult things for the developers. I have also finally understood the reason behind the MiniSat Template Library (MTL), which I’ve always found odd: it re-implements many things in the STL (the Standard Template Library), which I have always found unnecessary. Although the reasons are now clear (better control, easier debugging, less memory overhead), I think I will still replace it with STL. I have also had a chance to talk to the kind reviewer who did such an extensive review of my paper, which was quite an enlightening discussion. If you are interested in SAT, I encourage you to attend next year: there are usually some workshops associated with the conference, where it is easier to have a paper accepted: e.g. this year, a Masters student got his quite interesting paper (long version) accepted, and I am sure he has benefited a lot from it.

Social coding

I saw a talk by Ed Catmull on the problems Pixar had in their early days. The talk is very interesting, and I recommend it. What caught me, is that at Pixar they had a “brain trust”, a certain group of people who could express their new ideas to one another without fear of backlash. This is actually very difficult to achieve, since early implementations of ideas are always “childish”, lack maturity, and are therefore not considered good enough to be shared with others. However, if you don’t share the idea with the others early, then you will get feedback too late, when you have already invested a lot of time into it. Criticism at this late stage hurts more and what is worse, it might make you so uncomfortable, that you might abandon the idea altogether.

The trick is to get a set of people to trust each other well enough to share not-yet mature ideas with one another in the early stage, so that criticism can help everyone get their ideas right from the start. Establishing this trust is very difficult. I started to understand what Ed Catmull was talking about while working with the STP team. The earlier I seemed to share my ideas, the better they got. For instance, CryptoMiniSat was originally never meant to be used as a library. However, once they (mostly Vijay Ganesh) helped me debug the code, it was much easier to get things “right”. I also met Laurent Simon last year for half a day, and he gave me the idea of using Grid’5000 to test the performance of CryptoMiniSat. Without these early steering ideas, I would have never been able to get CryptoMiniSat to where it is today.

On PrecoSat

Lately, I have been having a poll of who will win the SAT Race’10. Apparently, 5 people voted until now, me included. I voted PrecoSat. Not because I didn’t want to vote CryptoMiniSat, but because I honestly believe it really will win. The reason is awfully simple: Armin Biere, the (main) author of PrecoSat has had far more experience and time than me to perfect his solver. This is normal, though I only hope he might notice the amount of effort that was put into CryptoMiniSat.

CryptoMiniSat now contains ~10’000 lines of code, compared to the original so-called ‘core’ MiniSat it has been developed from, which contained ~1’000. A ten-fold increase in code, which represents more than 1300 commits in the main tree (and about 1.5 years of my life). This effort has not been carried out alone. Though I haven’t much talked about people that helped me on the way, CryptoMiniSat has been tested and patches have been committed by a number of people, putting a lot of their time and effort into making a working SAT solver. They all worked just for the fun of being part of the project.

The collaborative nature of the CryptoMiniSat project is partially the reason why, I hope, it might succeed in the end in becoming a viable alternative to PrecoSat. End of next week, I will post the whole revision history of CryptoMiniSat to Gitorious, so everyone can see how certain options evolved over time — my commit messages are usually (overly) long ;) Then, bugzilla, mailing lists, forums and the rest will hopefully garner some steam, and we can start rolling. The road is wide ahead…