All SAT solvers must have a tri-state class that can hold the values true
, false
, and undefined
. Although it’s not hard to write code to represent such a class, it’s hard to write such it so that all typical operations are fast. In this blog post I will compare three different ways of doing it.
The original MiniSat 2.0 code
The original MiniSat 2.0 used the following code for such a class:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
class lbool { char value; explicit lbool(const char v) : value(v) { } public: lbool() : value(0) {} bool operator==(lbool b) const { return value == b.value; } bool operator!=(lbool b) const { return value != b.value; } lbool operator^(const bool b) const { return b ? lbool(-value) : lbool(value); } friend lbool toLbool(const char v); friend lbool boolToLBool(const bool b); friend class llbool; }; inline lbool toLbool(const char v) { return lbool(v); } const lbool l_True = toLbool( 1); const lbool l_False = toLbool(-1); const lbool l_Undef = toLbool( 0); |
All the operators are pretty straightforward and fast, except operator^
which flips true
to false
and false
to true
in case b
is true
. As you might have noticed it’s using a branch instruction to achieve this result which is weird and slow. Branching is known to be a headache as a wrong branch prediction can slow down the CPU quite a bit. Since b
is supposed to be random (though it’s not that random), this might slow down the flip operation. Let’s call this version of the class solution (1).
Newest MiniSat code
Current MiniSat has changed the code to the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
class lbool { uint8_t value; public: explicit lbool(uint8_t v) : value(v) { } lbool() : value(0) { } explicit lbool(bool x) : value(!x) { } bool operator == (lbool b) const { return ((b.value&2) & (value&2)) | (!(b.value&2)&(value == b.value)); } bool operator != (lbool b) const { return !(*this == b); } lbool operator ^ (bool b) const { return lbool((uint8_t)(value^(uint8_t)b)); } lbool operator && (lbool b) const { uint8_t sel = (this->value << 1) | (b.value << 3); uint8_t v = (0xF7F755F4 >> sel) & 3; return lbool(v); } lbool operator || (lbool b) const { uint8_t sel = (this->value << 1) | (b.value << 3); uint8_t v = (0xFCFCF400 >> sel) & 3; return lbool(v); } friend int toInt (lbool l); friend lbool toLbool(int v); }; inline int toInt (lbool l) { return l.value; } inline lbool toLbool(int v) { return lbool((uint8_t)v); } const lbool l_True ((uint8_t)0); const lbool l_False((uint8_t)1); const lbool l_Undef((uint8_t)2); |
It doesn’t use a branch operator to flip values: operator^
uses a simple XOR. However, other operators, notably the comparison operator became much more involved, and thus expensive. Let’s call this solution to the problem solution (2).
Numerical trickery on the old code
Instead of using really tricky comparison and binary AND/OR operators, let’s try to use the original class, but replace the branch instruction with a bit of numerical trickery:
1 2 3 4 5 |
lbool operator^(const bool b) const { //return b ? lbool(-value) : lbool(value); return lbool(value * (-2*(char)b + 1)); } |
This makes ‘extensive’ use of the ALU of the chip for the flip operator to avoid using the branch instruction. However, it keeps other operators cheap as per the first version. Let’s call this solution to the problem solution (3).
Comparison
I have tried to solve two different problems from the SAT Competition of 2009 with all three solutions. First is a typical electronic engineering problem, 9dlx_vliw_at_b_iq2.cnf. The second is a typical cryptographic problem, mizh-md5-47-3.cnf. Here are the timing results for my i7-3612QM (Ivy Bridge):
Problem | Solution(1) | Solution(2) | Solution(3) |
9dlx_vliw_at_b_iq2.cnf | 135.9s | 138.8s | 132.5s |
mizh-md5-47-3.cnf | 60.7s | 60.5s | 56.3s |
So, in this very small test, solution(3) wins.
And now for a bit of history. I have tried the trick in solution(3) about 3 years ago. Back then it seemed slower to perform the numerical trick than to use the branch. It turns out that on modern CPUs either the ALU is faster, branch misprediction is more costly, or both (or, the compilers have evolved to compile my numerical trickery into some weird thing). Anyway, it’s kind of a cool speedup for a one-liner.