Today marked the 100th cluster experiment I have conducted on the Grid’5000 cluster in Grenoble. The computers in this cluster are very powerful beasts, equipped with two Xeon Core i7-based processors, each with 4 cores for 8 cores in total with some impressive memory size and connectivity. The sad part is that since CryptoMiniSat is sequential software, I only use 1 core from the 8 total, since I cannot allow to have fluctuating test results due to memory-bandwidth problems when running multiple instances of the solver on the same physical computer. I have just counted, and using a very conservative estimate, I have wasted 41.8 years of CPU core time to get CryptoMiniSat up to speed.
Was it worth wasting such immense amount of CPU core time to speed up a SAT solver? I think yes. First of all, I only used the cluster during night and during the weekends, when load was negligible. So, in a way I wasn’t wasting much apart from my own time launching the cluster experiments, collecting and analysing the results. Secondly, the feedback from the cluster immensely influenced my research and engineering decisions, and taught me many things regarding SAT. The best thing I have learnt is: never trust your intuitions when it comes to SAT. Secondly, good ideas on paper rarely turn out to be good ideas when first implemented, but with heavy tuning, they usually turn out to work well. Let me detail two examples.
Clause Vivification (I called it asymmetric branching) is a relatively simple example of why tuning is always necessary. This optimisation shortens clauses it is fed to, however, it is slow. The idea I used when tuning this optimisation is that the longest clauses are probably responsible for the slow solving and the long generated conflict clauses. So, I sort clauses according to size first, and then do clause vivification on the topmost (i.e. longest) clauses. Furthermore, I realised that enqueueing literals 2-by-2 on the stack (instead of enqueueing 1-by-1) saves significant time, and is almost as good as enqueueing them 1-by-1 — the difference is at most 1 literal in the final simplified clause.
A more complex example is that of speeding up self-subsuming resolution. The problem was that keeping all clauses attached in the watchlists slowed down self-subsuming resolution. However, the obvious solution of detaching all clauses and reattaching them after self-subs. resolution was incomplete, strangely enough. With multiple cluster experiments it became apparent that keeping clauses attached is important, because some propagations are made during the running of the algorithm, and propagating these (very few!) variables is crucial. The solution was to keep natively stored (2- and 3-long clauses) in the watchlists. These natively kept clauses don’t have pointers in the watchlists, and therefore their representation at the pointer’s location can be updated. The watchlists are therefore partially cleaned, then self-subsuming resolution is carried out, then the watchlists are fully cleaned and finally the watchlists are fully populated — not exactly a trivial algorithm, but it achieves both goals of fast and complete self-subsuming resolution.
Off-topic: For those who like contemporary art, here is a gem