Speeding up MiniSat with a one-liner

All SAT solvers must have a tri-state class that can hold the values true, false, and undefined. Although it’s not hard to write code to represent such a class, it’s hard to write such it so that all typical operations are fast. In this blog post I will compare three different ways of doing it.

The original MiniSat 2.0 code

The original MiniSat 2.0 used the following code for such a class:

All the operators are pretty straightforward and fast, except operator^ which flips true to false and false to true in case b is true. As you might have noticed it’s using a branch instruction to achieve this result which is weird and slow. Branching is known to be a headache as a wrong branch prediction can slow down the CPU quite a bit. Since b is supposed to be random (though it’s not that random), this might slow down the flip operation. Let’s call this version of the class solution (1).

Newest MiniSat code

Current MiniSat has changed the code to the following:

It doesn’t use a branch operator to flip values: operator^ uses a simple XOR. However, other operators, notably the comparison operator became much more involved, and thus expensive. Let’s call this solution to the problem solution (2).

Numerical trickery on the old code

Instead of using really tricky comparison and binary AND/OR operators, let’s try to use the original class, but replace the branch instruction with a bit of numerical trickery:

This makes ‘extensive’ use of the ALU of the chip for the flip operator to avoid using the branch instruction. However, it keeps other operators cheap as per the first version. Let’s call this solution to the problem solution (3).

Comparison

I have tried to solve two different problems from the SAT Competition of 2009 with all three solutions. First is a typical electronic engineering problem, 9dlx_vliw_at_b_iq2.cnf. The second is a typical cryptographic problem, mizh-md5-47-3.cnf. Here are the timing results for my i7-3612QM (Ivy Bridge):

Problem Solution(1) Solution(2) Solution(3)
9dlx_vliw_at_b_iq2.cnf 135.9s 138.8s 132.5s
mizh-md5-47-3.cnf 60.7s 60.5s 56.3s

So, in this very small test, solution(3) wins.

And now for a bit of history. I have tried the trick in solution(3) about 3 years ago. Back then it seemed slower to perform the numerical trick than to use the branch. It turns out that on modern CPUs either the ALU is faster, branch misprediction is more costly, or both (or, the compilers have evolved to compile my numerical trickery into some weird thing). Anyway, it’s kind of a cool speedup for a one-liner.