Configuration management at IRILL days

Today I attended the IRILL days here in Paris. IRILL’s goal is to bring together high level researchers and scientists, FOSS (Free and Open Source) developers, and FOSS industry players to tackle the three fundamental challenges that FOSS poses:

  • scientific: study problems raised by the development and maintenance of FOSS code
  • educational: adapt the teaching curriculum to FOSS
  • economic: help create a sustainable ecosystem for FOSS innovations

The most interesting part of the conference for me was Debian package dependency management, which currently is done by a number of ad-hoc algorithms, a problem that the Mancoosi Project is tacking. Essentially, there are a number of program packages that need different libraries and services, and we need to decide which ones we need to install in order to satisfy those dependencies when installing/upgrading a package. This dependency graph and the package to be installed/upgraded is capture in a language called CUDF (complete PDF description here). When deciding which packages to retain,upgrade or install, there are some goals which we must achieve, such as minimising the number of new packages as much a possible: the different optimisation criterias are described here. There is a competition associated with this computationally difficult problem, the MISC Competition, which was first conducted this year, during the FLoC conference.

Interestingly, the problem of package dependencies maps very well to problems faced by industry: configuration management. For instance, many high-end car models have thousands of options. Some of these options are incompatible, and some of them depend on other options being taken. For example, it may be necessary to take a certain color scheme to apply certain figurative patterns on the car. Although this sounds trivial, once the number of options go up, the complexity increases at a typically exponential rate, after which finding suitable solutions for customers might become a real challenge. For example, I wouldn’t be surprised if a modern airplane such as the A380 had customer options approaching the tens of thousands. Managing this automatically is a really interesting and a very real challenge, in the solution of which I hope CryptoMiniSat will be able to participate in.

Open source software? Free software?

Today I attended the Open World Forum conference here in Paris. Basically, it’s a business-oriented conference to do networking for folks in the free/open source industry. Some of the panelists were sometimes really boring, such as the “French Secretary of State responsible for the Digital Economy” who seemed to have deeply confused “free as in speech” and “open source” software — a grave mistake by my book. The panelist, who knew the difference of course, regularly overemphasised the use of “open source” software, but the notion of “free as in speech” was lost, and mentioned rarely, with the notable exception of the Red Hat folks. With the release of CryptoMiniSat, which I explicitly released under a free software licence, GPLv3, I of course disagreed.

The highlight of the conference for me was meeting the current Debian Leader, Stefano Zacchiroli, and researcher Roberto Di Cosmo. I have been using Debian for a very long time, and I always wanted to contribute. However, the best way to contribute is always with your expertise, which for me is SAT solvers. So, I approached Stefano with the idea of configuration management in Debian (dpkg), for which CryptoMiniSat would be a good fit, I think: complex package dependencies could be resolved with ease using CryptoMiniSat. If included in dpkg, CryptoMiniSat could take the prize of the most deployed SAT solver away from SAT4J, which currently holds this title due to its inclusion in the Eclipse development package. Fingers crossed… and lots of work is ahead.

Documenting CryptoMiniSat

Documenting code is not always so much fun. However, in order for the code to be extended by others, documentation needs to exist. Since I conceived CryptoMiniSat as a program to be extended by others, documentation was a must. So now, after investing about 2 weeks into documentation, it is finally ready. All major classes have been documented, along with all major functions and internal data structures. The number of comment lines I added is around a thousand, all in the Doxygen format. A preliminary HTML version is available here. I hope the quality of the documentation will improve with time, and that others might correct and add more documentation as they update the program.

While documenting the code, it occurred to me that certain variable and function names were really awkward, or reflected the state of the class from an earlier version of the code. These variables and functions have been renamed, and some even removed, since they served no purpose other than making the code bigger for no reason. I have also found a number of TODOs while going through the code: sometimes I implemented things the fast way instead of the correct way, so some data structures are really strange, slow, or both. The sourcecode with all the documentation is available here, from the gitorious code repository. I will soon make a 2.6.1 release that contains not only this newly added documentation, but other additions as well.

Transitive OTF self-subsuming resolution

The title may be a bit long, but its essence is very simple: we try to shorten learnt clauses. The basic idea was described in this post: there is a clause we just derived, e.g.

d V -e V f V g (1)

where d,e,f,g are binary variables, - is binary negation, and V is the binary OR operator. We can remove a literal from this using self-subsuming resolution with e.g. the 2-long clause:

f V -g (2)

removing g from clause (1). This has been achieved before using on-the-fly self-subsuming resolution. The trick we add now is the following. Let’s assume that clause (2) was not in the clause database. With the above technique, g would not be removed. However, if clauses:

f V a
-a V -g

are inside the clause database, we could, in fact, remove g, since the above two clauses, when we resolve them on a become:

f V -g

i.e. clause (2), what we have been searching for! So, how could we do this kind of reasoning efficiently? It turns out that this is not so difficult. We simply need to try to propagate -f using only the 2-long clauses. Then, we will reach -g through the intermediary, a.

Naturally, we can do the above recursive-propagation process not only for f but for all literals in the original clause (1), and then try to perform on-the-fly self-subsuming resolution, as before. There is only one catch: doing this kind of recursive propagation on all 2-long clauses for all literals in a clause is too time-consuming. So we only do it for clauses that are short: 5 literals or shorter. The results are in, and seem to indicate that transitive on-the-fly self-subsuming resolution with a limit of 5-long clauses is indeed viable:

The set of problems used were those of the SAT Competition 2009, and the time limit was 1500 seconds on some powerful machines — they are approx 2x as fast as those used in the competition. As you can see, transitive OTF self-subsuming resolution seems to pay off in terms of number of problem instances solved within a certain time limit. I have decided to add this feature to the upcoming CryptoMiniSat 2.6.1, which should be ready soon.

anf2cnf script released

I have finally managed to fix the script that converts ANF problems to CNF format in the Sage math system. The original script was having some problems that I blogged about. The new script has corrected most of the shortcomings of the original script, as well as added some textual help for the user.

For instance, the equations

sage: print two_polynoms
[x0*x1 + 1, x0*x1 + x1]

that last time required 13 clauses and 4 variables in CNF, now look like this:

sage: print anf2cnf.cnf(two_polynoms)
p cnf 3 6
c ------------------------------
c Next definition: x0*x1 + 1
3 0
c ------------------------------
c Next definition: x0*x1 + x1
3 -2 0
-3 2 0
c ------------------------------
c Next definition: monomial x0*x1
1 -3 0
2 -3 0
3 -1 -2 0

which is 1 variable and 7 clauses shorter than the original, not to mention the visually cleaner look and human-parseable output. The new script is available here. Hopefully, some of my enhancements included in the Grain-of-Salt package will be included in this script. The problem is mainly that Grain-of-Salt uses radically different data structures, and is written in a different programming language, so porting is not trivial.