Category Archives: Development

CryptoMiniSat 3.2.0 released

CyptoMinSat 3.2.0 has been released. This code should be extremely stable and should contain no bugs. In case it does, CryptoMiniSat will fail quite bady at the competition. I have fuzzed the solver for about 2-3000 CPU hours, with some sophisticated fuzzers (all available here — most of them not mine) so all should be fine, but fingers are crossed.

Additions and fixes

The main addition this time is certification proofs for UNSAT through the use of DRUP. This allows for use of the solver where certainty of the UNSAT result is a necessity — e.g. where lives could potentially depend on it. Unfortunately, proof checking is relatively slow through any system, though DRUP seems to be the best and fastest method. Other than the implementation of DRUP, I have fixed some issues with variable replacement.

SAT Competition’13

The description of the solver sent in to the SAT Competition’13 is available from the subfolder “desc” of the tarball. The code of 3.2.0 is actually the same that will run during the competition, the only changes made were:

  • the DRUP output had to be put into the standard output
  • the line “o proof DRUP” had to be printed
  • certified UNSAT binary uses the “–unsat 1” option by default
  • compilation was changed to use the m4ri library that was included with the tarball
  • linking is static so m4ri and other requirements don’t cause trouble
  • boost minimum version had to be lowered

You can download my submissions to the competition, forl, from here and here.

GCC 4.5.2 at the SAT Competition’13

(Note update below)

Just a short note. All MiniSat-derived SAT solvers might give wrong results during the competition. No, MiniSat doesn’t have a bug. GCC has a giant bug. Vegard reported this years ago — I personally put about 12h tracking it down, posted it to the CryptoMiniSat mailing list, and Vegard took it up, stripped it down and reported it.

This is a well-known bug, there is a thread about it on the MiniSat mailing list and I personally asked the competition organizers back in 2011 not to use the 4.5.2(or .3?) they wanted to use. I used to have code to check for this bug (there is a handy check attached to the bug report), but I removed it, though maybe I should add it back. It’s fixed in 4.5.4+, still present in 4.5.3 where Vegard&me found it. I hope I will be able to convince the organisers this time around, too :) By the way, the option you need to pass to gcc to avoid the bug is “-fno-tree-pre” — though of course this turns off an optimisation and so slows down the final executable.

Update: The organisers decided to move to gcc 4.8.0. This is very good news. They also note that the speed of the solvers is better, which is also very good news. I am happy I contributed to this welcome improvement and I hope it doesn’t cause any delays or inconveniences to anyone.

CryptoMiniSat 3.0 released

CryptoMiniSat 3.0 has been released. I could talk about how it’s got a dynamic, web-based statistics interface, how it has more than 80 options, how it uses no glues for clause-cleaning and all the other goodies, but unfortunately these don’t much matter if the speed is not up to par. So, here is the result for the 2009 SAT Competition problems on a 1000s timeout with two competing solvers, lingeling and glucose:

cryptoms_speed

This of course does not mean that CryptoMiniSat is faster than the other solvers in general. In fact it is slower on a number of instances. What it means is that in general it’s OK and that’s good enough for the moment. It would be awesome to run the above experiment (or a similar one) with a longer timeout. Unfortunately, I don’t have a cluster to do that. However, if you have access to one, and would be willing to help with running the 3 solvers on a larger timeout, please do, I will post the updated graph here.

Update Norbert Manthey kindly ran all the above solvers on the TU Dresden cluster, thanks! He also kindly included one more solver, Riss 3g. The cluster was an AMD Bulldozer architecture with 2cores/solver with an extreme, 7200s timeout. The resulting graph is here:

cryptoms_speed

Riss 3g is winning this race, with CryptoMiniSat being second, third is glucose, and very intriguingly lingeling the 4th. Note that CryptoMiniSat leads the pack most of the time. Also note, this is the first time CryptoMiniSat 3.0 has been run for such a long time, while all the other competing solvers’ authors (lingeling, glucose, riss) have clusters available for research purposes.

Licensing

For those wondering if they could use this as a base for SAT Competition 2013, the good news is that the licence is LGPL so you can do whatever you want with it, provided you publish the changes you made to the code. However, I would prefer that you compete with a name such as “cms-MYNAME” unless you change at least 10% of the code, i.e. ~2000 lines. For the competitions after 2013, though, it’s all up for grabs. As for companies, it’s LGPL, so you can link it with your code, it’s safe, you only have to publish what you change in the library, you don’t have to publish your own code that uses the library.

Features

CryptoMiniSat has been almost completely rewritten from scratch. It features among other things:

  • 4 different ways to propagate
  • Implicit binary&tertiary clauses
  • Cached implied literals
  • Stamping
  • Blocking of long clauses
  • Extended XOR detection and top-level manipulation
  • Gate detection and manipulation
  • Subsumption, variable elimination, strengthening
  • 4 different ways to clear clauses
  • 4 different ways to restart
  • Large amounts of statistics data, both into console and optionally to MySQL
  • Web-based dynamic display of gathered statistics
  • 3 different ways to calculate optimal variable elimination order
  • On-the-fly variable elimination order update
  • Super-fast binary&tertiary subsumption&strengthening thanks to implicit bin&tri
  • On-the-fly hyper-binary resolution with precise time-control
  • On-the-fly transitive reduction with precise time-control
  • Randomised literal dominator braching
  • Internal variable renumbering
  • Vivification
  • On-the-fly clause strengthening
  • Cache&stamp-based learnt clause minimisation
  • Dynamic strongly connected component check and equivalent literal replacement

Code layout

As for those wondering how large the code base is, it’s about 20KLOC of code, organised as:

cryptoms_overview

How to use CryptoMiniSat 2.9 as a library

There have been some questions asked about how to use CryptoMiniSat as a library. Here, I give a simple example usage to help any future potential users. If there are any question that seems unanswered, please feel free to add them as a comment to this post.

First, we initialize the solver with high verbosity:

#include "cmsat/Solver.h"
using namespace CMSat;

SolverConf conf;
conf.verbosity = 2;
Solver solver(conf);

Then we add the variables 0..9:

for(int i = 0; i < 10; i++) {
  solver.newVar();
}

And add a clause that would look in the DIMACS input format as "2 -5 10 0":

vec clause;
clause.push(Lit(1, false));
clause.push(Lit(4, true));
clause.push(Lit(9, false));
solver.addClause(clause);

We now create the assumptions that would look in DIMACS input format as "1 0" and "-5 0":

vec assumps;
assumps.push(Lit(0, false));
assumps.push(Lit(4, true));

We now solve with these assumptions:

lbool ret = solver.solve(assumps);

Let's see if the state of the solver after solving is UNSAT regardless of the assumptions:

if (!solver.okay()) {
  printf("UNSAT!\n");
  exit(0);
}

Note that okay() can only be FALSE if "ret" was "l_False". However, it is important to distinguish between two kinds of l_False: one where only the assumptions couldn't be satisfied, and one where no matter what the assumptions are, the problem is unsatisfiable. See below for more.

Now let's check that the solution is UNSAT relative to the assumptions we have given. In this case, let's print the conflicting clause:

if (ret == l_False) {
  printf("Conflict clause expressed in terms of the assumptions:\n");
  for(unsigned i = 0; i != solver.conflict.size(); i++) {
    printf("%s%d ", (solver.conflict[i].sign()) ? "" : "-", solver.conflict[i].var()+1);
  }
  printf("0\n");
}

The above will print an empty clause if solver.okay() is FALSE.

Now, let's see if the solution is SAT, and if so, print the solution to console, using DIMACS notation:

if (ret == l_True) {
  printf("We have the solution, it's here: \n");
  for (unsigned var = 0; var != solve.nVars(); var++)
    if (S.model[var] != l_Undef) {
      printf("%s%d ", (solver.model[var] == l_True)? "" : "-", var+1);
      printf("0\n");
    }
}

If solver.okay() is still TRUE (which it must be, if the ret was l_True), we can now add new clauses and new variables without restriction, and solve again (potentially with different assumptions). Note that most of the above applies to MiniSat as well.

When compiling the code, you have to pass the parameters such as:

$ g++ -fopenmp -lcryptominisat myprogram.cpp -o myprogram

PS: Thanks go to Martin Albrecht for pointing out some ambiguities in my explanation of the library usage, which have now been fixed.

A nifty way of saving time with OpenCL

In OpenCL, the platform for using the graphics card as a form of highly parallelised machine, reading and writing to the memory of the card is very expensive. For this reason, the OpenCL standard offers a set of highly flexible ways of writing and reading data to- and from the OpenCL device. Using these flexible systems takes time to get used to, but seem to be necessary to get the most out of the graphics card.

The Problem

If data needs to be processed, a standard way of pushing it through the compute device is to use two instances of an OpenCL kernel: while one is executing, the system is reading out the finished data and uploading the new data to compute for the other kernel. Then the kernels switch places, kernel 2 is processing, while data is read-written for kernel 1. This requires the programmer to index every data piece (since there are now two of them), and make sure to get the last iteration right. This latter is tricky, because even if there is no longer data to process, kernel 2 may still be executing even though kernel 1 can now be stopped.

These difficulties make the system tricky to get right, though not impossible. In this blogpost I would like to share with you a nifty trick I have found that completely eliminates this complexity, significantly cleaning up code, while keeping the same speed.

The Solution

Since OpenCL 1.1, OpenCL supports multithreading. While managing threads is one of the most dreaded parts of programming, OpenMP makes things extremely simple. The architecture we will need is the following:

  1. A single-kernel system that churns through data. Let us call this Worker
  2. A simple class that keeps in mind which data pieces have been finished with. Let us call this Dealer

There will be two Workers, and one Dealer. When the Worker requests data from the Dealer, it does it inside a critical OpenMP directive. The Dealer is simply initialised with the data to read through, and deals this data out to the Worker upon request, and marks the data piece as ‘done’. If there are no more data to work on, the Worker exits. The only tricky part is the startup, which should look like:

//Don't allow dynamic number of threads
omp_set_dynamic(0);

//Set the number of threads we need
omp_set_num_threads(2);

//Initialise Dealer
Dealer dealer(my_options);

#pragma omp parallel
{
  int thread = omp_get_thread_num();
  int num_threads = omp_get_num_threads();

  Worker worker(&dealer, my_options);
  worker.compute();
}

It’s really simple, and works on all platforms where OpenMP is supported, which is pretty much everything, including Windows, Linux, embedded devices, etc. The really nice part about this architecture though is that it allows for way more than just getting around the problems with multiple kernels in the same source code.

The advantages

First of all, this architecture allows for the Dealer to do much more than just deal out data. You can set the number of threads to 3, set the workers to do their stuff, and wake up the Dealer every 5 seconds to dump the data, for instance. This only needs a “while(!finished) { wait(5sec); dump_my_data();}” loop in Dealer. Since the other threads can simply continue working while Dealer is writing to disk, this may be quite an important advantage in case the data you are generating is large. Similarly, Dealer could pre-load the data from disk that is to be dealt out to Workers.

Another very nice advantage of the above is that there can be any number of Workers, which sounds crazy until you get your hands on a multi-GPU computer. To take advantage of all GPUs in the system, the Workers simply need to be paremetrised with the device number that we wish to use. For example, if we have two OpenCL devices on a machine, we need to launch 4 threads, with device numbers 0,0 and 1,1. If the data chunks to compute are small relative to the overall data, e.g. data chunks need approx. 1 minute to process, but the overall data takes more than an hour, the dealer will distribute the load pretty evenly among the GPUs, so the GPUs can be even of a mixed speed, and it won’t affect the overall finish time much.

Conclusions

Although the proposed architecture uses multi-threading, something that most programmers fear (sometimes rightly so), it leads to a significantly cleaner and more flexible code that can do more with less. Using this architecture, the code should be easier to write, maintain, and extend — something all good programmers strive for.