Since the SAT solver fuzzing paper by Brummayer, Lonsing and Biere, SAT solver fuzzing has been quite the thing in the SAT community. Many bugs have since been discovered in almost all SAT solvers, including those of this author. In this blog post I will detail my SAT fuzzing setup: what fuzzers I use and how I use them.
The most obvious fuzzers to use are those present in the paper and accompanying website: CNFFuzz and FuzzSAT. These fuzzers randomly genereate high-quality structured problem instances. The emphasis here is on the word structured. Randomly generated instances, as the authors of the original paper note, do not excercise SAT solvers enough. SAT solvers are written with very specific problem types in mind: problems containing long chains of binary clauses, gates, etc.
Over the year(s) I have found, however, that these fuzzers are not enough. One of the revelations came when Vegard sent me lots of bugreports about CryptoMiniSat failing when Gaussian elimination was enabled. It turned out that although I ran the fuzzer for long hours, I only ran it with the default option, Gaussian turned off. So default options are sometimes not enough and when off-the track, systems can fail.
Vegard then sent me a fuzzer that wasn’t using CNFFuzz or FuzzSAT, but instead his own CNF generator for SHA-1 hash functions. Instances thus generated are not only structured, they are also very picky about the solution: they either have exactly one or zero. This helped fuzz solution reconstruction, which after blocked clause elimination may not be exactly easy. In fact, it’s so good at fuzzing that it finds a bug I have never been able to fix in CryptoMiniSat, to this day, about binary clauses. Whenever I block them, I cannot reconstruct the solution, ever, unless I disable equivalent literal replacement. The authors of the paper give a pretty complicated proof about why it should work in the presence of equivalent literal replacement, which I still haven’t understood.
When I implemented disconnected component analysis into CryptoMiniSat 3, I knew I had to be careful. In the SAT Competition’11 I failed terribly with it: CryptoMiniSat 2 memory-outed and crashed on many instances with disconnected components. The solution I came up with was to make a program that could concatenate two or more problems into one problem, renumbering the variables. This meta-fuzzer (that used fuzzers as inputs) was then used to fuzz the disconnected component handling system. Although this sounds sufficient, it is in fact far from it. Firstly, the problems generated are disconnected at toplevel — whereas problems in the real world sometimes disconnect later during the search. Secondly, it took 2, maybe 4 CNF inputs, but real problems sometimes have thousands of disconnected components. Both of these issues had to be addressed.
One of the most difficult parts of modern SAT solvers is time-outs of complicated algorithms. Time-outs sometimes make code quite convoluted, and thus are a source of bugs. However, most fuzzers generate problems that are too small for any time-outs to occur, thus severely limiting the scope of the fuzzer. So, the fuzzers don’t exercise a part of the system that is fully exercised by real-world problems. To fix this, I have written a short fuzzer script that generates enormous, but simple problems with millions of variables. This helped me fix a lot of issues in timeouts. I advise having a RAM-disk when using it though, as it writes gigabyte-size CNFs.
My fuzzing system
Fuzz CNF generators are nice, but not enough on their own. A system around them is necessary for the full benefit. Writing one sounds trivial at first, but my system around the fuzzers is about 1000 lines of python code, so maybe it’s not that trivial after all. It has to do a number of things to be truly useful.
First, it needs to test the solutions given by the solver. If the solution is SAT, it needs to read the (gzipped) CNF file, and check the (partial) solution. Second, if the solution is UNSAT, it either needs to call the DRUP checker to verify the proof, and/or it needs to call another solver to see if it finds a solution or not — and verify that the other solver’s solution is correct. I actually run both DRUP and the other solver, you can never be sure enough. Note that solution verification also complicates time-limiting: we don’t want to wait 30 minutes because a fuzz CNF is too complicated, and we don’t want to indicate an error just because the problem is hard.
Secondly, fuzzing is not only about generating CNFs that trigger bugs, but also about saving these problematic cases for later, as a form of regression tests. These regression tests need to be ran regularly, so that bugs that came up before don’t come up again. I have a large set of regression tests that I have accumulated over the years. These help me increase confidence that when I improve the solver, I don’t accidentally re-introduce bugs. My fuzzer system helps manage and run these regression tests.
Finally, we don’t just want to fuzz the solver through the CNF interface. SAT solvers typically expose a library interface where one can add clauses, solve (with assumptions) and check the conflict returned in terms of the assumptions given. For now, CryptoMiniSat’s library is fuzzed through a specialized CNF format, where “c Solver::solve()” lines are interspersed with the regular clauses. These lines indicate for the solver to call the solve() method and output the solution to a file. This file is then read by the fuzzing system and verified by copying parts of the file into another CNF and using the DRUP-checker and another SAT solver to verify the results. This simulates library usage without assumptions. Thanks to Lukas Prokop who alerted me to bugs in assumptions usage, assumptions-based solving is also simulated through “c Solver::solve( LIT LIT LIT…)” calls, where LITs are replaced with a random literals.
I want to do two things related to testing and fuzzing. First, I want to improve the checker so that it checks for more problems. I want to check the conflict returned when assumptions are used. Further, I want to check timeouts — this will require some coordination between the fuzzer system and the solver where times are printed that are later checked for correctness. Some optimisations must never take more than N seconds on a fast enough machine, and this could be checked relatively easily.
Second, I want to move CryptoMiniSat into a test-driven system. This will require some quite extensive code re-writing so that test-harnesses will be easier to write. However, it will help fix one issue that hasn’t been talked about, kind of like the elephant in the room: performance bugs. They are quite prevalent, and cannot be caught by simple fuzzing. They plague many SAT solvers, I’m sure, as I keep on finding them at every corner. They are the silent bugs that prevent solvers to be more performant, yet they don’t diminish the resulting solution — only the time takes to find it.