Tag Archives: visualisation

Stats on CryptoMiniSat’s development

I have just discovered the gitstats tool, and quickly put together a set of statistics on the development of CryptoMiniSat, the SAT solver I have been working on lately. I have been using the GIT repository for most of my work on CryptoMiniSat, doing ~2000 commits since its inception, so it gives a good picture of how things evolved, and how I usually work. Firstly, the hours I usually work:

I basically work almost as much at midnight as I do at midday. The afternoon is clearly the most active and practically useful part of my day. Next, the graph to show which days I work the most:

Essentially, I worked as much on CryptoMiniSat during Sunday as I did during Monday or Friday. Saturday was the day I must have been the most fed up with development. Finally, the number of commits by months:

The 2nd and 3rd months of 2010 was a struggle because of the SAT and PoS deadlines. I tried to put together a credible research paper, which finally got accepted at PoS. During the 6th and 7th months of 2010 I was mostly looking forward to the results of the SAT Race, so I wasn’t developing so much.

Why programs fail

I just bought the book with the title of the blogpost by Andreas Zeller. Essentially, it’s about debugging, where the author analyses the chain of program defect leading to infected program state, finally leading to program failure. I bought the book because while working with CryptoMiniSat, I have encountered so many bugs that they could fill a book.

The problem with chasing bugs in CryptoMiniSat is that SAT solvers employ a very optimised algorithm that makes a lot of decisions per second, so recording all interesting events makes for a very big dump file. I have per-module verbosity settings, so if I can narrow down what module(s) are causing the failure, there is less debug output to go through, but sometimes I still have to wade through 10-20GB dumps to see how the program state got infected.

While going through such large dumps, the following occured to me: what if we loaded all this data into a MySQL database? Some 8 years ago I had a job where I processed a large chunk of the data on all of Hungary’s phone conversations on a daily basis. Essentially, data on your phone calls are recorded in many different databases, and these must be merged to calculate your bill and to provide statistical feedback to the company. I wrote a program that processed 3million records in about 1 hour using MySQL and a number of SQL statements that spanned approx. 2-3000 lines. In other words, I am not afraid of large databases, and I think such multi-gigabyte data could easily be loaded into a MySQL database.

The goal of loading a SAT run into a MySQL database is the following: once the MySQL data is ready, a bit of PHP, graphviz and gnuplot will plot any relevant information at any point in time about the solving. In other words, we could dig this data visually, with some very interesting effects.

For example, I have been lately having problems with clause strengthening interfering with variable elimination. The idea is in theory quite simple. Variable elimination eliminates a variable and adds clauses that represent the original clauses the variable was present in. These clauses are sometimes trivial, such as:
a or b or c or !c = true
which is never added since it is satisfied whether c is true or false. So far, so good. Clause strenghtening is a method whereby clauses are shortened. For example, if there was a clause:
e or b or c = true
the solver might discover that in fact it is also true that:
e or b = true
and replace the original clause with this, since this clause is “stronger”: it poses a more stringent requirement. The way variable elimination and clause strenghtening interact is as follows. Let’s suppose we eliminate on varible e. If the above clause is strenghtened before elimination, then instead of:
a or b or c or !c = true
we might have got:
a or b or !c = true
which must be added. We actually have more clauses because of clause strenghtening! I believe nobody ever found this anomaly. To properly quantitise how this affects solving, we need to know all states of a clause, and a web interface with a MySQL backbone could help a lot with that.