# Thoughts on SAT@home

If you have ever wondered how your multi-month SAT problem could be solved, the answer is here: SAT@home by some kind Russian researchers! This idea has been brewing in my head, we even started a mini-project with a  friendly American researcher, but we never got it into a working condition :(. It’s great that someone took the time to do it, though. Creating such a setup has some easier and harder parts. Setting up a BOINC server is not that hard, but it needs to run 24/7, which is difficult, unless you are department head at a university or you have a company behind you. Then the scripts need to be written, which is relatively easy to do, but testing and debugging is time-consuming. The really hard part, though, is how to distribute the workload. Treatises could be written on that, and the one above uses the simplest method, that of cutting up the problem along a couple of variables. It’s a good start, I think: starting easy is always the best way to start — the first automobile didn’t try to achieve 300km/h speeds, and it didn’t have to.

In the long run, I guess the setups will get more and more complicated, combining many techniques from cutting on variables, to clause sharing with a multitude of clause usefulness heuristics, to sharing parameter usefulness statistics. Workshares themselves could eventually be diversified, some doing ‘simple’ search without resolution, some doing only resolution (and clause cleaning?), some doing only (specific) simplification algorithms such as subsumption or strengthening, some doing data analysis on the current state of the problem such as reachability analysis, number of gates or other higher-level elements such as adders in the problem, etc. In other words, there are many ways to improve this. Maybe even a common interface to such a system could be laid down where different SAT solvers could be plugged in and used without notice.

A friend of mine has once written me a mail describing his work, which essentially talks about paying for a webpage not by looking at adverts, but by running a javascript program. The idea is, of course, that we could write a SAT solver in javascript, and make people run the script by putting it on a well-visited webpage. The server would then distribute SAT problems to the javascripts, which would send some computed data back (like learnt clauses), while the person is surfing the page. Cheap (maybe even free) and essentially unlimited CPU time for hard problems. We could use it to answer some fun math questions, or break some cryptographic systems, for example… just have to find someone crazy enough to write a SAT solver in javascript! :)

# Distributed SAT solving is fun

I have lately been trying to keep my promise I made on the webpage of CryptoMiniSat: “The long-term goals of CryptoMiniSat are to be an efficient sequential, parallel and distributed solver”. The first step was to make a relatively efficient sequential solver. With the winning of the SAT Race’10, that was at least partially accomplished. The next step was to make the program parallel. Now, I have made it distributed. The parallel version was made using OpenMP, and you can try it using the “master” branch from the git repository. The distributed version uses MPI, which means it can probably work on most cluster/supercomputer architectures. This distributed version is currently alpha, and is not yet in the git repo.

My first impression of the distributed version is: it’s awsome to solve using 10-20 machines in parallel. The speed is just unbelievable. It’s also great to have the multi-threaded and distributed option in one program, as I can now launch just one MPI process on each machine, and the 8 cores on each machine can use threads to work together, instead of using the somewhat slower MPI. I haven’t measured the speedup, and I haven’t measured the MPI load, so I cannot say anything yet about efficiency. All I can say, is that using 160 threads is a little faster than using only one.

As for technical details, I am only sharing unitary and binary clauses. It’s a trivial and lazy choice, but it makes it easier to check for redundant clauses (thanks to Armin Biere for highlighting to me that this is important), and works well with CryptoMiniSat’s expand-contract loop, where a set of simplification operations are carried out at regular intervals to squeeze the most unitary and binary clauses out of the current knowledge stored inside the clause database.

I am looking forward to breaking a somewhat problematic crypto-problem with the new distributed CryptoMiniSat. Hopefully this will work, though maybe it’s some way off — I think the original time needed to break the problem might be in the hundreds of days with single-threaded non-distributed CryptoMiniSat. Fingers are crossed that the speedup is non-linear with the distributed version, and I can allocate enough computers for enough time to the job.

Edited to add — The solver has now been running stable on 100 computers, distributed. This is getting to be real fun. Next step is to deploy it on a 342-node cluster, each node double-core. Having 684 threads collaborate will feel mighty ;) Also, it will inspire me to build more heterogeneity into the architecture. It was never designed to run on this many nodes…

# FPGA programming and SAT

I have lately been trying to implement a cryptographic protocol (CryptoGPS) on an FPGA. First impressions are that VHDL is a difficult language, but once understood, it’s not all that difficult to work with it. It took me ~2 weeks to get a hold on the language, but now it feels relatively easy to program in it. I have been looking forward to getting down and dirty with FPGAs, because it’s one of my long-term goals to somehow integrate SAT solvers with FPGAs.

SAT solvers are extremely versatile tools that can resolve a number of very complex issues with relative ease. FPGAs are also extremely versatile (essentially,  they are reprogrammable hardware), and are very fast. Making once piece out of these two is like merging Batman with Superman. The trick is, of course, how to merge them? The problem with merging is that FPGAs are typically very good at executing a (preferably small) set of instructions over and over again on a relatively small dataset, while SAT solvers usually deal with large datasets, and execute an excruciatingly complex set of instructions on them over and over again.

I think a way to resolve this issue could be to use the FPGA as a sort of oracle for the SAT solver. The solver is running on a normal computer and at the same time it is connected to an FPGA that awaits its orders. The SAT solver regularly extracts a small set of constraints from its constraint database, sends it to the FPGA, which tries every possible combination of the variables, and sends back the result to the SAT solver. While the FPGA works its wonders, the SAT solver could be executing the normal SAT procedure, and regularly process the incoming data from the FPGA.

The FPGA could simply try every single combination of every variable setting on the clauses it received, and return compact information such as: “variable 5 must be TRUE” or “there is no solution”. More complex return values, such as “variable 10 is always equal to variable 2” could be returned, but it would make for a more complex VHDL design, leading to an overall slowdown. The simplier the logic, the more we can fit on a given FPGA, which means more parallelisation, and therefore a speedier solution overall. Since the number of variables in the clauses will be limited (checking for 30 is already at least 2^30 operations), we could store the clauses as constant-length double-bitfields in the FPGA, and use a bitmask for the current variable settings. This could allow for massively parallelised evaluation of clauses.

On the SAT solver side, we need some guiding principles along which to pick the constraints for the FPGA. The usual suspects could be constraints that contain the most active variables, for example. More complex metrics could minimise the number of variables inside the constraints while maximising the number of small constraints they are present in. Other metrics could perhaps concentrate on the current restart branch of the SAT solver, and try to ask the FPGA to prove that below a certain decision level, the current branch cannot contain a solution — this way, we could backjump using the data from the FPGA.

Would this architecture work? I think its performance would depend on whether there is any compact information that the FPGA could extract from the clauses sent to it. This depends on whether there are small “cores” in the problem that contain compact information, and whether we can extract these cores. The existence of such information-rich cores depend on many things. The clauses/variable ratio is one such metric, another would be the number of symmetries in the problem: the more symmetries, the less compact information can be extracted. Cryptographic problems come to my mind when seeing these criteria. They usually have a very high clauses/variables ratio and rarely, if ever, contain symmetries.

# Parallelisation

I have lately been reading the book Using OpenMP and thinking about how CryptoMiniSat could be parallelised. Apparently, there are multiple ways to achieve it. I haven’t yet had time to read through all previous approaches, but it seems there are at least two main ways to distribute the solving: through guiding paths and using the idea of ManySat.

The first approach, guiding paths, decomposes the problem into multiple sub-problems, setting some variables to `true` or `false`. The problems are then solved completely independently with a master coordinating the effort. I don’t like this approach for two reasons. Firstly, for cryptographic problems, the decomposition can be made very well since the problem in its most abstract form (i.e. key, plaintext, ciphertext) is known, and so a very good guiding path can be calculated. However, even with this very good guiding path, the expected total time to solve is far more than if the problem was given to one solver instance. In other words, if we decompose the problem into 4 sub-problems, and each sub-problem takes on average T time to solve, then the total average time is 4T, but if we gave the problem to one solver, it would have found the solution on average in e.g. 2T time. In other words, the gain is not as much as one would hope. I would guess that this ratio (2T/4T = 1/2 in this case) is even worse when the problem’s abstract form is totally unknown (which is most of the time), and the solver has to guess a guiding path.

The second approach, that of ManySat is to start the same solver with the same problem multiple times, but use different restart heuristics to solve the problem. Since modern DPLL-based SAT solvers use randomised algorithms, this helps to reduce the expected time to solve an instance. Also, the solvers share short clauses with one another to work in a more collaborative manner. However, I believe that given the distribution of the solving time of a problem with a given solver, even if more instances of it are launched (each with a random seed and different restart strategy), the minimum solving time of all launched solvers will not be as small as one would hope. It is extremely rare that a difficult instance can be solved in under e.g. 1 minute if the average time to solve it is 1000 minutes. So, if we were to launch 1000 solver instances, probably none would finish before 2-300 minutes. Of course I am not counting the effect of shared learnt clauses here, but I am not very convinced about them. Apparently, activity-based learnt clause sharing (instead of size-based) is more effective than size-based (see this), and ManySat shares them according to the size.

So, if all else fails, what is there left to do? Well, I bought the OpenMP book to find out. I wish to implement the idea of distributing clauses to different threads, thus distributing the BCP (Boolean Constraint Propagation) load. However, multi-threading can only go so far, so MPI will eventually need to be used, along with multiple instances of the solver on the same physical CPU, the way ManySAT is doing it — though I will try to implement some form of clause-sharing. The good thing is, that implementing MPI will also bring the possibility of running a problem instance on a huge cluster, something I am really looking forward to.

EDITED TO ADD (16/07/2010): I just realised that MiraXT does quite a number of things that I wish to implement into CryptoMiniSat.