Recovering XORs from a CNF | Wonderings of a SAT geek

Let’s suppose you are looking to recover XORs. It’s fun to recover XORs, because you could try to XOR the XORs together to obtain new information, like this:

$a \oplus b \oplus c = 0$
$a \oplus d \oplus b = 0$
therefore,
$c \oplus d = 0$
therefore,
$c = d$
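
To make the derivation above concrete, here is a minimal Python sketch. The representation is my own assumption for illustration: an XOR is a pair of a variable set and a right-hand side bit, and `xor_together` is a hypothetical helper name. XORing two constraints amounts to taking the symmetric difference of the variable sets (shared variables cancel) and XORing the right-hand sides.

```python
def xor_together(x, y):
    """Combine two XOR constraints, each given as (variables, rhs)."""
    vars_x, rhs_x = x
    vars_y, rhs_y = y
    # Variables that appear in both constraints cancel out.
    return (vars_x ^ vars_y, rhs_x ^ rhs_y)


# a + b + c = 0  and  a + d + b = 0  (addition modulo 2)
x1 = (frozenset("abc"), 0)
x2 = (frozenset("adb"), 0)

combined = xor_together(x1, x2)
print(sorted(combined[0]), combined[1])
```

The result contains only `c` and `d` with right-hand side 0, which is the $c \oplus d = 0$ from the example.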

Simple XOR recovery

The simplest way to recover an XOR is to look for the set of clauses that encodes it. For $a \oplus b \oplus c = 0$, that set is:

$\neg a \vee b \vee c$
$a \vee \neg b \vee c$
$a \vee b \vee \neg c$
$\neg a \vee \neg b \vee \neg c$

Finding such a set is not that hard: first sort the literals inside each clause, then sort the list of clauses by size and variable content, and finally walk through the list linearly. If you think through the first two steps, you will see that they ensure that the above 4 clauses end up next to each other in the list, making it trivial to find them.
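
The three steps above can be sketched roughly as follows. This is an illustration under my own assumptions, not CryptoMiniSat's code: clauses are lists of DIMACS-style integers (negative means negated), clauses are assumed not to contain a variable twice, the function name `find_xors_simple` is made up, and the sketch arbitrarily reports only XORs over three or more variables. An XOR over n variables needs all 2^(n-1) clauses over the same variables whose number of negations has the same parity.

```python
from itertools import groupby


def find_xors_simple(clauses):
    """Return a list of (variables, rhs) for XORs that are fully encoded."""
    # Step 1: sort the literals inside each clause by variable.
    normalized = [tuple(sorted(set(c), key=lambda l: (abs(l), l)))
                  for c in clauses]

    # Step 2: sort the clauses by size, then by variable content.
    def key(clause):
        return (len(clause), tuple(abs(l) for l in clause))

    normalized.sort(key=key)

    # Step 3: walk the list linearly; clauses over the same variables
    # are now adjacent, so groupby collects them in one pass.
    xors = []
    for (size, variables), group in groupby(normalized, key=key):
        if size < 3:
            continue  # arbitrary choice of this sketch
        by_parity = {0: set(), 1: set()}
        for clause in group:
            negations = sum(1 for l in clause if l < 0)
            by_parity[negations % 2].add(clause)
        for parity, found in by_parity.items():
            if len(found) == 2 ** (size - 1):
                # Clauses with an odd number of negations forbid the
                # odd-parity assignments, so the XOR equals 0.
                xors.append((variables, 1 - parity))
    return xors


cnf = [[-1, 2, 3], [1, -2, 3], [1, 2, -3], [-1, -2, -3]]
print(find_xors_simple(cnf))
```

With `a`, `b`, `c` numbered 1, 2, 3, the four clauses listed earlier are recognized as a single XOR over variables 1, 2, 3 with right-hand side 0.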

Improved XOR recovery

There is only one small glitch with the algorithm above: other sets of clauses can also make up XORs. For example, this set of clauses:
$\neg a \vee b \vee c$
$a \vee \neg b \vee c$
$a \vee b \vee \neg c$
$\neg a \vee \neg b$

Notice that all I took away was one literal from one clause. Now, if a CNF implies $\neg a \vee \neg b$, it must surely also imply $\neg a \vee \neg b \vee \neg c$, since the latter is less stringent than the former. So we could easily have the longer clause inside our CNF, but we don’t, for reasons of laziness: who wants to keep around data that is already implied by unit propagation? However, keeping in mind that the longer, less stringent clause is implied by the shorter one is important: it could lead to the discovery of more XORs.
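
One naive way to act on this observation is sketched below. It is only an illustration under my own assumptions, not the algorithm actually used in CryptoMiniSat: the names `covered` and `find_xors_improved` and the `max_size` limit are hypothetical, and clauses are again lists of DIMACS-style integers. For every full-size clause, the sketch enumerates the clauses the XOR would need and accepts each one if it, or any shorter clause made of a subset of its literals, is present in the CNF.

```python
from itertools import combinations, product


def covered(required, clause_set):
    """True if `required`, or a shorter clause implying it, is in the CNF."""
    return any(
        frozenset(sub) in clause_set
        for k in range(1, len(required) + 1)
        for sub in combinations(required, k)
    )


def find_xors_improved(clauses, max_size=4):
    """Return sorted (variables, rhs) pairs for XORs implied clause-wise."""
    clause_set = {frozenset(c) for c in clauses}
    xors = set()
    for base in clauses:
        variables = tuple(sorted({abs(l) for l in base}))
        n = len(variables)
        # Skip short clauses, overly long ones, and clauses that repeat
        # a variable; the subset enumeration is exponential in n.
        if n < 3 or n > max_size or n != len(base):
            continue
        parity = sum(1 for l in base if l < 0) % 2
        # All clauses over these variables with the same negation parity.
        required = [
            tuple(-v if s else v for v, s in zip(variables, signs))
            for signs in product((0, 1), repeat=n)
            if sum(signs) % 2 == parity
        ]
        if all(covered(r, clause_set) for r in required):
            xors.add((variables, 1 - parity))
    return sorted(xors)


cnf = [[-1, 2, 3], [1, -2, 3], [1, 2, -3], [-1, -2]]
print(find_xors_improved(cnf))
```

Here the missing clause over variables 1, 2, 3 with all literals negated is covered by the shorter clause `[-1, -2]`, so the XOR over 1, 2, 3 is still found even though only three full-size clauses are present.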

Results

Let’s move on to the magical territory of practical SAT solving. Let’s first take a typical industrial problem, UTI-20-10p0:

$ ./cryptominisat_old UTI-20-10p0.cnf
[...]
c Finding non-binary XORs:     0.10 s (found:       0, avg size: -nan)

The old CryptoMiniSat, version 2.9.0, seems to have no luck finding any XOR clauses. This is very typical: industrial instances seem not to contain any XOR clauses at all. Let’s look at the new algorithm now:

$ ./cryptominisat_new UTI-20-10p0.cnf
[...]
c XOR finding finished. Num XORs:   9288

All right, that seems better.

Edited to add: Thanks to Vegard Nossum for telling me something was odd with XOR recovery. As is usually the case, answering a question is much easier than knowing what question to ask.

Edited to add2: The above algorithm can be easily improved by using cached literals and stamping.