Proof Traces for SAT solvers

If you have a look at a modern SAT solver, say kissat or CryptoMiniSat, you’ll see that they have well over 20k lines of code. Reviewing that for bugs is basically impossible. So how can you trust that the solver is giving a correct result? Well, if the result is that the input formula is satisfiable, it’s quite easy: we can simply substitute the solution (all modern SAT solvers provide a satisfying assignment) into the formula, and it should satisfy each constraint. This is really fast to do, linear in the size of the formula.
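
As a minimal sketch of that check (the function name `satisfies` and the clause representation are my own choices for illustration, not any solver's API), verifying a claimed solution is a single pass over the clauses:

```python
def satisfies(clauses, assignment):
    """Return True if every clause has at least one literal made true.

    clauses: list of lists of non-zero ints (signed literals).
    assignment: dict mapping variable number -> bool.
    """
    for clause in clauses:
        # A literal is true when its sign agrees with the variable's value.
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False
    return True


cnf = [[1, 2], [-1, 2], [1, -2]]
print(satisfies(cnf, {1: True, 2: True}))   # True
print(satisfies(cnf, {1: True, 2: False}))  # False: clause "-1 2" is violated
```

Each literal is looked at once, which is where the linear running time comes from.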

However, if the solver gives a result that the formula is unsatisfiable (UNSAT), what do we do? Well, we need some form of assurance. One would be to check the code, but again, that’s just too much work. We could run it with another SAT solver, but there have been cases of the same bug being in more than one solver. Actually, this bug seems to have been a conceptual mistake copy-pasted, quite weird. In fact, there is a well-known GCC version that will miscompile most MiniSat-based SAT solvers such that they report wrong results(!). We could also do some fuzzing, but do you want to sit on an airplane knowing its flight control software was fuzz-tested for a full 10 minutes to “meet” compliance requirements? No. We need strong assurances. So, let’s ask the SAT solver to produce a proof for the UNSAT result! This is what proof traces are all about.

The Binary Resolution Operator

At its core, a proof trace essentially records the set of operations that must be performed by a system to arrive at the 0=1 equation (i.e. at obvious nonsense), starting with the input equations. Let’s see a simple example. Let’s have our input formula (so-called CNF) be:

 x1 V  x2 = TRUE
 x1 V -x2 = TRUE
-x1 V  x2 = TRUE
-x1 V -x2 = TRUE

Where x1 and x2 are boolean variables. Actually, let’s make this a bit easier to read by a computer:

 1  2 0
 1 -2 0
-1  2 0
-1 -2 0

This is the same as the above, but closer to the standard DIMACS format. Each line is called a clause, each line is made up of literals like 1, -1, 2, and -2, and each clause is terminated by a “0”, marking the end of the clause.
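
Reading this format into a program takes only a few lines. The sketch below uses a hypothetical helper name, `parse_dimacs`, and also skips the “c” comment lines and the “p cnf” header line that full DIMACS files carry:

```python
def parse_dimacs(text):
    """Parse DIMACS-style clause lines into a list of lists of ints."""
    clauses, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "cp":
            continue  # blank line, comment line, or "p cnf ..." header
        for token in line.split():
            lit = int(token)
            if lit == 0:          # "0" terminates the current clause
                clauses.append(current)
                current = []
            else:
                current.append(lit)
    return clauses


example = """
 1  2 0
 1 -2 0
-1  2 0
-1 -2 0
"""
print(parse_dimacs(example))  # [[1, 2], [1, -2], [-1, 2], [-1, -2]]
```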

The above CNF has no solution to its set of clauses: no matter what we set x1 and x2 to, at least one of the clauses is not satisfied. OK, so how do I prove that there is no solution? Well, the easiest way to do it is through the binary resolution operator. Binary resolution basically says that if one clause contains a literal and another clause contains its inverse, such as “x1” and “-x1”, then we can do this: (A OR x1) RESOLVE (B OR -x1) => A OR B, where “A” and “B” are any set of literals. So, e.g. “x1 V x2 V -x3” resolved with “x3 V x4” becomes “x1 V x2 V x4”. Nice, so let’s try to prove that the above problem is unsatisfiable:

 1  2 0 RESOLVE  1 -2 0 =>  1 0
-1  2 0 RESOLVE -1 -2 0 => -1 0
 1  0   RESOLVE -1 0    =>  0

In essence, we resolved 2 clauses to prove that x1 is TRUE, then resolved another two to prove that x1 is FALSE, and then we resolved “x1” with “-x1” to arrive at EMPTY=TRUE, i.e. 0=1, which is obviously nonsense.
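
The three resolution steps above can be replayed mechanically. This is only a sketch: `resolve` is a name I made up, and clauses are represented as sets of signed integers.

```python
def resolve(c1, c2, var):
    """Resolve two clauses on variable `var`.

    One clause must contain `var` and the other `-var`.
    Returns the resolvent as a frozenset of literals.
    """
    c1, c2 = frozenset(c1), frozenset(c2)
    if var in c2 and -var in c1:
        c1, c2 = c2, c1  # make c1 the clause holding the positive literal
    if var not in c1 or -var not in c2:
        raise ValueError(f"cannot resolve on variable {var}")
    return (c1 - {var}) | (c2 - {-var})


x1 = resolve({1, 2}, {1, -2}, 2)          # proves "1 0"
not_x1 = resolve({-1, 2}, {-1, -2}, 2)    # proves "-1 0"
empty = resolve(x1, not_x1, 1)            # the empty clause
print(sorted(x1), sorted(not_x1), sorted(empty))  # [1] [-1] []
```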

The RUP notation

In the so-called RUP notation (“Reverse Unit Propagation”), the above is simply written as:

a  1 0
a -1 0
a 0

Basically, we simply write down each derived clause one after the other as “added” (=”a”). It’s called Reverse Unit Propagation, because it’s possible to prove each of these clauses to be correct, by starting from the top, and performing simple propagation [where propagation is substitution of the values into the equations, and checking if any variable is forced to a value]. Let me explain. Say we want to prove “1 0” is correct. Then, we set all the literals’ inverted values, here, “x1=FALSE”, and propagate on the original clauses plus any clause that’s above “1 0” (and hence has been proven to be correct). If we set x1=FALSE, then the original formula’s clause “1 2 0” will propagate x2=TRUE and then the original formula’s “1 -2 0” conflicts immediately (all its literals are now FALSE). Hence, propagation led to a conflict! This means that indeed, “1 0” must hold.
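
Here is a small sketch of that check. The names `propagate` and `has_rup` are mine, and the propagation loop is deliberately naive (real checkers use watched literals), but the logic is the one described above: assume the negation of the clause, propagate, and expect a conflict.

```python
def propagate(clauses, assignment):
    """Unit-propagate to a fixpoint. Return True if a conflict is found.

    assignment: set of literals currently set to true (mutated in place).
    """
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause is already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return True  # every literal is false: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # forced value
                changed = True
    return False


def has_rup(clauses, candidate):
    """True if `candidate` can be checked by reverse unit propagation."""
    return propagate(clauses, {-lit for lit in candidate})


cnf = [[1, 2], [1, -2], [-1, 2], [-1, -2]]
print(has_rup(cnf, [1]))                 # True
print(has_rup(cnf + [[1]], [-1]))        # True
print(has_rup(cnf + [[1], [-1]], []))    # True: the empty clause follows
```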

What’s important to realize here is that each of the lines of the proof must be checkable by simple propagation. Let’s say we have this set of constraints:

8 9  2  3 0
8 9 -2  3 0
8 9  4 -3 0
8 9 -4 -3 0

Given these constraints, it is fairly easy to check that “8 9 0” holds. However, we can’t just write into the proof that “a 8 9 0”. Because for that, the propagation check as explained above should succeed: when I set both x8=FALSE and x9=FALSE, I should get a conflict. But I don’t! Instead when I substitute x8=FALSE and x9=FALSE into the equations, I get this reduced formula:

 2  3 0
-2  3 0
 4 -3 0
-4 -3 0

Which does not force any variable to any value (i.e. they don’t propagate anything) and none of them are the empty clause, so I’m stuck. The proof verifiers literally will say they are “stuck”. So, how can I prove “x8 OR x9”? Let’s do it like this:

a 8 9 -3 0
a 8 9  3 0
a 8 9 0

Let’s see how this works. First, let’s try to validate “8 9 -3” by setting x8=FALSE, x9=FALSE, x3=TRUE. Then the original constraint “8 9 4 -3 0” will propagate x4=TRUE, and the original constraint “8 9 -4 -3 0” will be immediately conflicting (all its literals are now FALSE). So simple propagation leads to a conflict, hence the clause “8 9 -3” must be correct. While this is not the goal (that’s “8 9 0”), we are getting there. We now prove “8 9 3” in a similar fashion, and then we prove “8 9 0” in the same fashion. Easy!

Faster Proof Verification: DRUP

While RUP is indeed a good format, we don’t always need all the constraints to be in the active set in order to prove UNSAT. Let’s quickly re-visit this CNF, and try to prove “8 9 0”:

8 9  2  3 0
8 9 -2  3 0
8 9  4 -3 0
8 9 -4 -3 0

In order to reduce memory load, let’s delete all the clauses we don’t need after we used them! These steps are going to be noted with “d” (=”delete”) instead of “a” (“add”):

a 8 9 -3 0
d 8 9  4 -3 0
d 8 9 -4 -3 0
a 8 9  3 0
d 8 9 -2 3 0
d 8 9  2 3 0
a 8 9  0
d 8 9 -3 0
d 8 9  3 0

NICE! So we first derived “8 9 -3 0” and then removed both of the clauses we used to create it, since they are not needed anymore! Begone “8 9 4 -3 0” and “8 9 -4 -3 0”! Same thing with “8 9 3 0”. Then, once we derived “8 9 0”, we even deleted the intermediary constraints “8 9 -3 0” and “8 9 3 0” we previously derived. Begone! Not needed anymore, all we need is “8 9 0”.
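
The memory saving can be seen in a toy checker. This is a sketch under my own assumptions: `check_drup` is a made-up name, the trace is parsed in the textual “a”/“d” notation used in this post, and the “peak” is simply the largest number of clauses held at once.

```python
def propagate(clauses, assignment):
    """Naive unit propagation. Return True if a conflict is reached."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return True
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return False


def check_drup(cnf, trace):
    """Replay a trace of "a"/"d" lines. Return (all_verified, peak_live)."""
    live = [frozenset(c) for c in cnf]
    peak = len(live)
    for line in trace.strip().splitlines():
        op, *numbers = line.split()
        clause = frozenset(int(n) for n in numbers[:-1])  # drop trailing 0
        if op == "d":
            live.remove(clause)  # raises ValueError if the clause is not live
        else:
            if not propagate(live, {-lit for lit in clause}):
                return False, peak
            live.append(clause)
            peak = max(peak, len(live))
    return True, peak


cnf = [[8, 9, 2, 3], [8, 9, -2, 3], [8, 9, 4, -3], [8, 9, -4, -3]]
trace = """
a 8 9 -3 0
d 8 9  4 -3 0
d 8 9 -4 -3 0
a 8 9  3 0
d 8 9 -2 3 0
d 8 9  2 3 0
a 8 9  0
d 8 9 -3 0
d 8 9  3 0
"""
print(check_drup(cnf, trace))  # (True, 5)
```

Without the “d” lines the same proof would end with all 7 clauses in memory. With them, the checker never holds more than 5.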

Extended resolution for proofs

Okay, I’m gonna cut this part short, mostly because I am terrible at this, but we have to talk about the elephant in the room: RAT, or, more beautifully, Resolution Asymmetric Tautologies. So, what is it? Essentially, it allows you to declare new variables in your proof and use them later. This is absolutely amazing, because with this, you can provide so-called extended resolution proofs (warning: 1968 research paper), which allows us to express certain proofs more compactly, in fact, exponentially more compactly!

Here’s how to do it. Let’s say you want to use the definition “x1 OR x2 = x99” in your proof, because it would make your proof smaller. This can happen, for example, because your constraints often have “x1 OR x2” in them, and now you can replace all of those parts with simply “x99”. Well then, you write this into your proof file:

a -99 2 1 0
a  99 -1 0
a  99 -2 0

Notice that this is just a simple OR gate definition. The only things RAT requires are that (1) the first literal of each clause is over the variable that’s being defined (here 99 or -99) and (2) the introduced variable must not be part of our input formula, so our input formula here should have at most 98 variables. That’s it. Now you can use x99 anywhere you like, as long as it makes sense. For example, you could take a clause “1 2 3 4 0” and convert it to “99 3 4 0”:

a  99 3 4 0
d  1 2 3 4 0

Notice that “99 3 4 0” is simply a RUP proof at this point: setting 99=FALSE, 3=FALSE, 4=FALSE will immediately propagate x1=FALSE, x2=FALSE (due to the clauses “99 -1 0” and “99 -2 0”), and x1,x2,x3, and x4 all being FALSE will immediately trigger a conflict for clause “1 2 3 4 0”, so we have the RUP property and we are fine. Then we can delete “1 2 3 4 0”, we don’t need it no more. Notice that I can’t write the reverse:

d  1 2 3 4 0
a  99 3 4 0

If I wrote that, “a 99 3 4 0” will not work, since it relies on “1 2 3 4 0” but we already deleted that, oops!
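
The difference between the two orderings shows up directly in the propagation check. In this sketch (helper names are mine), the first call still has “1 2 3 4 0” available and the second does not:

```python
def propagate(clauses, assignment):
    """Naive unit propagation. Return True if a conflict is reached."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return True
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return False


def has_rup(clauses, candidate):
    """True if `candidate` can be checked by reverse unit propagation."""
    return propagate(clauses, {-lit for lit in candidate})


definition = [[-99, 2, 1], [99, -1], [99, -2]]  # x99 = x1 OR x2

# "a 99 3 4 0" while "1 2 3 4 0" is still around: the check succeeds.
print(has_rup(definition + [[1, 2, 3, 4]], [99, 3, 4]))  # True

# "d 1 2 3 4 0" first, then "a 99 3 4 0": nothing conflicts anymore.
print(has_rup(definition, [99, 3, 4]))                   # False
```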

Besides the above trivial example with the OR gate, extended resolution can simulate, in polynomial time, some algorithms that are extremely hard (read: exponentially hard) for normal resolution to express. This means that using RAT, we can express even complicated mathematical concepts in this extremely simple system, for example in this case, pseudo-boolean reasoning.
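
Coming back to the OR gate definition, here is a rough sketch of what a RAT check on the first literal looks like. Treat it as an illustration of the idea rather than of how any real checker is implemented: the name `has_rat` is made up, and “1 2 3 4 0” stands in for the input formula.

```python
def propagate(clauses, assignment):
    """Naive unit propagation. Return True if a conflict is reached."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return True
            if len(unassigned) == 1:
                assignment.add(unassigned[0])
                changed = True
    return False


def has_rat(clauses, candidate):
    """Accept `candidate` if it has RUP, or RAT on its first literal."""
    if propagate(clauses, {-lit for lit in candidate}):
        return True  # plain RUP is enough
    if not candidate:
        return False
    pivot = candidate[0]
    for other in clauses:
        if -pivot not in other:
            continue
        # Resolve candidate with every clause containing the inverted pivot.
        resolvent = set(candidate) | (set(other) - {-pivot})
        if any(-lit in resolvent for lit in resolvent):
            continue  # tautology, trivially fine
        if not propagate(clauses, {-lit for lit in resolvent}):
            return False
    return True


clauses = [[1, 2, 3, 4]]
for step in ([-99, 2, 1], [99, -1], [99, -2]):
    print(step, has_rat(clauses, step))  # each definition clause: True
    clauses.append(step)

# If 99 already occurred in the formula, the definition is rejected:
print(has_rat([[1, 2, 3, 4], [-99, 5]], [99, -1]))  # False
```

The last call shows why the defined variable has to be fresh: once some other clause already mentions 99, the resolvents are no longer trivial.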

What the solver does vs. what the proof checker does

So let’s say you implement DRAT proof tracing in your SAT solver and your proofs are always verified. All good! Not so fast. One day you decide to check the proof that the DRAT checker recovered, and to your absolute horror, you realize that the checker thinks your proof is substantially different from what the SAT solver thought it was! How can that be?

Well, the DRAT checker recovered a proof. It’s a valid proof, and it can be recovered given the set of clauses that haven’t been deleted using the “d” operator, as above. But notice that we never told the proof checker which set of clauses we resolved on to get to a clause! It just kind of figured it out by itself. Now, we can look at this and say, this is great because (1) I don’t have to do hand-holding of the proof checker, and really interestingly, (2) the proof checker could actually recover different proofs from the same proof trace(!). Ambiguity can be useful. Well, that’s great, but what if I really-really want to know what the SAT solver actually did?

Enter FRAT. This proof format is super-close to DRAT, with the following changes. Firstly, FRAT numbers each clause with a unique clause ID. This is needed because the same clause can be in the solver’s memory more than once, and we need to be able to distinguish them from each other — they may have been derived in completely different ways! Secondly, FRAT allows an optional hint at how the clause was derived. Let me give a hands-on example, it should be quite clear. As a reminder, here is the input formula we had before:

 1  2 0
 1 -2 0
-1  2 0
-1 -2 0

And here is the DRAT proof:

a  1 0
a -1 0
a 0

Now, for the FRAT proof:

o 10  1  2 0         -- input clause, let's set its ID to 10
o 11  1 -2 0         -- input clause, let's set its ID to 11
o 12 -1  2 0         
o 13 -1 -2 0         -- last input clause
a 14  1  0 l 10 11 0 -- first resolvent
d 10  1  2 0 
d 11  1 -2 0 
a 15 -1  0 l 12 13 0 -- second resolvent
d 12 -1  2 0 
d 13 -1 -2 0 
a 16 0 l 14 15 0     -- empty clause! It's UNSAT!
f 14  1  0           -- finishup from down here
f 15 -1  0
f 16 0

Okkkay. So… the original clauses are part of the proof trace now, they start with “o”. And each clause has an ID, after o/a/d/f. This can be any number, but they must be non-clashing (obviously). Furthermore, we are allowed to have an “l” after the closing “0” when adding a clause — these are the clause IDs we had to resolve to arrive at the new clause. So, e.g. IDs 10 and 11, i.e. “1 2 0” and “1 -2 0” must resolve to “1 0” (clause ID 14). And finally, we must “finalize” all our clauses, i.e. we must account for all clauses remaining in memory, with the “f” command.

Firstly, finalization means that in case your SAT solver forgot to delete a clause in the proof trace, but deleted it from memory, finalization will fail, because the solver will not finalize that clause, and so the checker will complain. In essence, finalization forces the hand of the SAT solver writer to make sure that the “view of the world” from the perspective of the SAT solver and that of the proof verifier match when it comes to clauses in memory. This is not the case for DRAT — the proof verifier there could have millions of clauses more in memory than the SAT solver :S

Secondly, the “l” notation means that we can now be precise about what clauses were actually resolved by the SAT solver to arrive at a new clause. However, this is only a “can” — adding the “l” is not necessary, and of course can be stripped. This means that if we want to, we can have ambiguity, or if we want to, we can have precision. The best of both worlds!
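
To see hints and finalization at work, here is a toy checker for the textual notation used in this post. It is a sketch under several assumptions of mine: the function names are made up, “--” comments are stripped only because the example above uses them, every “a” line must carry hints (a real checker would fall back to searching), and hints are read as a propagation order, where each hinted clause must become unit or conflicting in turn.

```python
def parse_line(line):
    """Split one step into (op, clause_id, literals, hints)."""
    tokens = line.split("--")[0].split()
    op, cid, rest = tokens[0], int(tokens[1]), tokens[2:]
    end = rest.index("0")                      # end of the literal list
    lits = [int(t) for t in rest[:end]]
    hints = []
    if rest[end + 1:end + 2] == ["l"]:
        hints = [int(t) for t in rest[end + 2:-1]]
    return op, cid, lits, hints


def check_hints(live, clause, hints):
    """Follow the hinted clauses in order until one of them conflicts."""
    assignment = {-lit for lit in clause}
    for cid in hints:
        unassigned = [lit for lit in live[cid] if -lit not in assignment]
        if not unassigned:
            return True                        # conflict reached
        if len(unassigned) != 1:
            return False                       # hint is not unit here
        assignment.add(unassigned[0])
    return False


def check_frat(text):
    live, finalized = {}, set()                # live: clause ID -> literals
    for line in text.strip().splitlines():
        op, cid, lits, hints = parse_line(line)
        clause = frozenset(lits)
        if op in ("o", "a"):
            if cid in live or (op == "a" and not check_hints(live, clause, hints)):
                return False
            live[cid] = clause
        elif op == "d":
            if live.pop(cid, None) != clause:
                return False
        elif op == "f":
            if live.get(cid) != clause:
                return False
            finalized.add(cid)
    # Every clause still in memory must be finalized, and UNSAT must be shown.
    return finalized == set(live) and frozenset() in live.values()


proof = """
o 10  1  2 0
o 11  1 -2 0
o 12 -1  2 0
o 13 -1 -2 0
a 14  1  0 l 10 11 0
d 10  1  2 0
d 11  1 -2 0
a 15 -1  0 l 12 13 0
d 12 -1  2 0
d 13 -1 -2 0
a 16 0 l 14 15 0
f 14  1  0
f 15 -1  0
f 16 0
"""
print(check_frat(proof))                             # True
print(check_frat(proof.replace("f 16 0\n", "")))     # False: 16 not finalized
```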

Trusting trust

Proofs are cool, and are very important so we can be sure our results are correct. But… are we sure they are correct? Well, not so much. You see, the proof verifiers themselves are not verified. Say, frat-rs will say “VERIFIED” on a FRAT proof, but… frat-rs itself is over 5000 lines of Rust. And before you think I’m really stretching things here, let me just remind you that I personally found a segmentation fault in drat-trim, the official DRAT proof verifier, used at the SAT competition. (I actually found the bug accidentally while fuzzing my own SAT solver.)

So, what can we do? Well, we need another verifier. I’m not kidding. Basically, frat-rs can take a proof that the SAT solver provides, and translate it to a fully annotated, clean proof with all the deletions at their earliest possible points (see this hack). We can then write a slow, but verified proof checker, that can do the trivial checking that the resolutions are indeed correct, reaching the empty clause (i.e. 0=1). The pipeline would then be:

Airplane software code + compliance requirements ->SAT_query
solution, proof  = SAT_solver(SAT_query)
solution, clean_proof = frat-rs(SAT_query, solution, proof)
solution = verified_proof_checker(SAT_query, solution, clean_proof)

This means we don’t need to trust the SAT_solver or frat-rs. We can treat them as black boxes. All we need to trust is that the verified_proof_checker is correct: given the SAT_query, the solution can be trusted. In this sense, clean_proof is simply a hint for the verified_proof_checker to do its job. Notice that I completely skipped how SAT_query can be trusted. That’s an entirely different can of worms :D

In case you want to use such a verified proof checker, there is one in Lean 4, available here, and one for ACL2, available here. As the non-verified tool to “elaborate” simpler (e.g. DRAT) proofs into annotated (LRAT) proofs, you can use frat-rs.


So, we talked a lot about proof traces, proof formats, and how to trust the solution. In some cases, this is extremely important — some software and hardware is verified in this way that controls trains and airplanes, and industrial systems (think: nuclear reactors, rockets… Ariane 5 crash anyone?). We certainly don’t want any bugs in there.

What I didn’t talk about is that proof traces do something else as well: they allow us to understand how SAT solvers work. You see, while the proof is written in the forward fashion (the same direction that SAT solvers work), the proof actually only makes sense in the reverse direction. We always start at the empty clause, at the end of the proof, and walk backwards to see how the empty clause got proven! So, if you think about it, proofs can be used to look at SAT solvers in the reverse direction. And that’s super-exciting. For example, how much work did the SAT solver really have to do to reach the empty clause? How much complete nonsense did it do that was utterly useless to prove unsatisfiability? How many times did it forget proof fragments that it later had to re-learn? Or, what is the minimum memory footprint that its produced proof could fit in? And if we take advantage of DRAT’s and FRAT’s ambiguity: is there a proof with fewer steps it could have produced given the exact same set of resolvents? In my opinion, there is a whole wealth of data in proof traces that we should do research on to learn more about SAT solvers.
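
As a tiny illustration of the backwards walk, assume we already have, for each derived clause ID, the list of clause IDs it was derived from (for instance from FRAT “l” hints). The helper name and the extra clause 17 below are made up for the example.

```python
def needed_clauses(derived_from, empty_id):
    """Walk backwards from the empty clause and collect every clause ID
    that actually contributes to it."""
    needed, stack = set(), [empty_id]
    while stack:
        cid = stack.pop()
        if cid in needed:
            continue
        needed.add(cid)
        # Input clauses have no entry, so the walk stops there.
        stack.extend(derived_from.get(cid, []))
    return needed


# IDs as in the FRAT example, plus a hypothetical clause 17 that the
# solver derived but never used.
derived_from = {14: [10, 11], 15: [12, 13], 17: [10, 12], 16: [14, 15]}
needed = needed_clauses(derived_from, 16)
print(sorted(needed))                       # [10, 11, 12, 13, 14, 15, 16]
print(sorted(set(derived_from) - needed))   # [17]: wasted work
```

Counting what ends up outside the `needed` set is one crude way to measure how much of the solver’s work was useless for the final proof.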

(PS: Tracecheck is one of the earliest proof trace systems. I didn’t want to mention it because it’s not as relevant anymore as the above, and it would have made the story a tad more confusing)