# CryptoMiniSat: 8000 commits later

The GitHub repository for CryptoMiniSat just hit 8000 commits. To celebrate this rather weird and crazy fact, let me put together a bit of a history.

## The Beginnings

CryptoMiniSat began as a way of trying to prove that a probabilistic cryptographic scheme was not possible to break using SAT solvers. This was the year 2009, and I was working in Grenoble at INRIA. It was a fun time and I was working really hard to prove what I wanted to prove, to the point that I created a probabilistic SAT solver where one could add probability weights to clauses. The propagations and conflict engine would only work if the conflict or propagation was supported by multiple clauses. Thus began a long range of my unpublished work in SAT. This SAT solver, curiously, still works and solves problems quite well. It’s a lot of fun, actually — just add some random clause into the database that bans the only correct solution and the solver will find the correct solution. A bit of a hackery, but hey, it works. Also, it’s so much faster than doing that solving with the “right” systems, it’s not even worth comparing.
Continue reading CryptoMiniSat: 8000 commits later

# The Cathedral

Last week I visited Micosoft Reasearch in Cambridge, where I gave a talk on SAT solvers, Grobner basis algorithms, and Cryptography. It was nice to visit Microsoft, and in particular I enjoyed seeing how “the cathedral” works. The kind of software development carried out at Microsoft and almost all large software companies is called the cathedral model due to the book “The Mythical Man-Month“. It’s a very interesting book that talks about how software (and in particular, a “Programming Systems Product”) is developed. All Windows variants, and e.g. the IBM System/360 fall into the category of programming systems products: they are all tested, documented, extensible, have predefined inputs and outputs, come with predefined interfaces, etc. The development of these software is called the cathedral model simply because that’s the example the book uses, and it got stuck in the literature.

I have been developing CryptoMiniSat for quite some while now, and I must say it is exceedingly difficult to create a programming systems product — CryptoMiniSat doesn’t even scratch the surface of the requirements, but then almost no SAT solver does — maybe SAT4J is the sole exception. Documentation is usually scarce, and even though I have added a lot, it’s still not enough. The input is rarely checked for sanity, and resource needs are usually just ignored: if memory is not enough, then the program will probably just crash; don’t expect graceful degradation.

As for Microsoft, I must say I was pretty impressed. It’s a nice building, with large offices, free tea/coffee/orange juice, filled with some very kind and very bright people. The audience was astonishingly good too, asking questions when they felt like they haven’t understood something, instead of just getting disinterested and e.g. browsing email. All in all, it was very interesting to visit the place, though I still feel more inclined to release my code as Free Software ;)

# Configuration management at IRILL days

Today I attended the IRILL days here in Paris. IRILL’s goal is to bring together high level researchers and scientists, FOSS (Free and Open Source) developers, and FOSS industry players to tackle the three fundamental challenges that FOSS poses:

• scientific: study problems raised by the development and maintenance of FOSS code
• educational: adapt the teaching curriculum to FOSS
• economic: help create a sustainable ecosystem for FOSS innovations

The most interesting part of the conference for me was Debian package dependency management, which currently is done by a number of ad-hoc algorithms, a problem that the Mancoosi Project is tacking. Essentially, there are a number of program packages that need different libraries and services, and we need to decide which ones we need to install in order to satisfy those dependencies when installing/upgrading a package. This dependency graph and the package to be installed/upgraded is capture in a language called CUDF (complete PDF description here). When deciding which packages to retain,upgrade or install, there are some goals which we must achieve, such as minimising the number of new packages as much a possible: the different optimisation criterias are described here. There is a competition associated with this computationally difficult problem, the MISC Competition, which was first conducted this year, during the FLoC conference.

Interestingly, the problem of package dependencies maps very well to problems faced by industry: configuration management. For instance, many high-end car models have thousands of options. Some of these options are incompatible, and some of them depend on other options being taken. For example, it may be necessary to take a certain color scheme to apply certain figurative patterns on the car. Although this sounds trivial, once the number of options go up, the complexity increases at a typically exponential rate, after which finding suitable solutions for customers might become a real challenge. For example, I wouldn’t be surprised if a modern airplane such as the A380 had customer options approaching the tens of thousands. Managing this automatically is a really interesting and a very real challenge, in the solution of which I hope CryptoMiniSat will be able to participate in.

# Open source software? Free software?

Today I attended the Open World Forum conference here in Paris. Basically, it’s a business-oriented conference to do networking for folks in the free/open source industry. Some of the panelists were sometimes really boring, such as the “French Secretary of State responsible for the Digital Economy” who seemed to have deeply confused “free as in speech” and “open source” software — a grave mistake by my book. The panelist, who knew the difference of course, regularly overemphasised the use of “open source” software, but the notion of “free as in speech” was lost, and mentioned rarely, with the notable exception of the Red Hat folks. With the release of CryptoMiniSat, which I explicitly released under a free software licence, GPLv3, I of course disagreed.

The highlight of the conference for me was meeting the current Debian Leader, Stefano Zacchiroli, and researcher Roberto Di Cosmo. I have been using Debian for a very long time, and I always wanted to contribute. However, the best way to contribute is always with your expertise, which for me is SAT solvers. So, I approached Stefano with the idea of configuration management in Debian (dpkg), for which CryptoMiniSat would be a good fit, I think: complex package dependencies could be resolved with ease using CryptoMiniSat. If included in dpkg, CryptoMiniSat could take the prize of the most deployed SAT solver away from SAT4J, which currently holds this title due to its inclusion in the Eclipse development package. Fingers crossed… and lots of work is ahead.

# Creating a SAT solver from scratch

I have lately been having a small chat with Francois Grieu and it inspired me to think about how difficult it would be to create a new SAT solver from scratch. Let’s take it for granted that the person who embarks on such a difficult task understands the basics well, and wishes to create a (nowadays very popular) Conflict-Driven Clause-Learning (or simply, CDCL) type SAT solver. The main parts of  these solvers are:

1. Clause parser&database to store the clauses
2. Propagation engine using watch-lists to carry out Boolean Constraint Propagation (BCP)
3. Conflict generation code, using the first-UIP scheme
4. Learnt clause activity calculator&deleter
5. Variable state database, storing activity and polarity
6. Search restart manager
7. Pre-processor is recommended — SatELite does the job reasonably well.

None of these are very difficult to do. However, by implementing these in a minimalistic fashion, one will only end up with a reasonably fast, but not a very fast solver. Enhancements, for example, storing binary clauses implicitly in the occurrence lists in the database bring small, but nevertheless important, benefits. Similarly, having specialised watch-lists to handle binary and tertiary clauses further speed up the solving. Conflict clause minimisation, on-the-fly subsumption, and on-the-fly self-subsuming resolution improve conflict generation. Polarity-calculation and -caching comes handy for variable polarities, and dynamic restarts are also useful. Learnt clause activities might need to support both MiniSat-style and Glucose-style activities.

The real challenge, however, comes with bullet number 7: a fast and complete pre- and in-processor is very complex to write. The pre-processing done by SatELite, i.e. self-subsuming resolution, variable elimination and subsumption are just the icing on the cake for a good pre- and in-processor. Doing these in the middle of the solving (i.e. in-processing), using learnt clauses, doing asymmetric branching, binary clause reasoning, variable replacement, blocked clause elimination, failed literal probing, tautology elimination, clause strenghtening with tautological binary clauses, clause cleaning, and a long list of others — that’s where the hard part lies.

With all of these challenges facing SAT solver implementers, is it reasonable to think that anyone can relatively easily participate in SAT Races and SAT Competitions? If one wishes to respond with yes to this question, then there are two paths ahead: either immense amounts of time need to be invested into writing such a solver, or an already written solver has to be taken as a base to build ideas upon.  I think the only reasonable choices for a base are: lingeling, MiniSat, PrecoSat and (I hope to think) CryptoMiniSat. None of these solvers are perfect of course, so one has to put up with all the inherent problems in these solvers.

In the past months, I have been writing many blog entries trying to detail the inner workings of SAT solvers — mostly to help others (and in the process, myself) struggling with SAT solvers. Maybe I should be spending more time trying to document the code of CryptoMiniSat, so that the barrier for entry to creating winning SAT solvers could be lowered.