# The Inprocessing API of CryptoMiniSat

Many modern SAT solvers do a lot of what’s called inprocessing. These steps simplify the CNF into something that is easier to solve. In the compiler world, these are called rewritngs since the effectively rewrite (parts of) the formula to something else that retain certain properties, such as satisfiability. One of the most successful such rewrite rules for CNF is Bounded Variable Elimination (BVE, classic paper here), but there are many others. These rewrites are usually done by modern SAT solvers in a particular order that was found to be working well for their particular use-case, but they are not normally accessible from the outside.

Sometimes one wants to use these rewrite rules for something other than just solving the instance via the SAT solver. One such use-case is to use these rewrite rules to simplify the CNF in order to count the solution to it. In this scenario, the user wants to rewrite the CNF in a very particular way, and then extract the simplified CNF. Other use-cases are easy to imagine, such as e.g. MaxSAT, core counting, etc. Over the years, CryptoMiniSat has evolved such a rewrite capability. It is possible to tell CryptoMiniSat to simplify the formula exactly how the user wants the solver to be satisfied and then extract the simplified formula.

## Example Use-Case

Let’s say we have a CNF that we want to simplify:

``````p cnf 4 2
1 2 3 4 0
1 2 3 0``````

In this CNF, `1 2 3 4 0` is not needed, because it is subsumed by the clause `1 2 3 0`. You can run subsumption using CryptoMiniSat this way:

``````#include "cryptominsat5/cryptominisat.h"
#include <vector>
#include <cmath>
#include <iostream>

using namespace CMSat;
using namespace std;
#define lit(a) Lit(std::abs(a)-1, a<0)

int main() {
Solver s;
vector<Lit> cl;

cl = vector<Lit>{lit(1), lit(2), lit(3), lit(4)};
cl = vector<Lit>{lit(1), lit(2), lit(3)};

s.simplify(NULL, "occ-backw-sub");
s.start_getting_clauses();
while(s.get_next_clause(cl) {
for(const auto l: cl) cout << l << " ";
cout << endl;
}
s.end_getting_clauses()

return 0;
}``````

This code runs the inprocessing system `occ-backw-sub`, which stands for backwards subsumption using occurrence lists. The input CNF can be anything, and the output CNF is the simplified CNF. This sounds like quite a lot of code for simple subsumption, but this does a lot of things under the hood for things to be fast, and it is a lot more capable than just doing subsumption.

Notice that the first argument we passed to `simplify()` is `NULL`. This means we don’t care about any variables being preserved — any variable can (and will) be eliminated if `occ-bve` is called. In case some variables are important to you not to be eliminated, you can create a vector of them and pass the pointer here. If you have called the `renumber` API, then you can get the set of variables you had via `clean_sampl_and_get_empties()`. The numbering will not be preserved, but their set will be the same, though not necessarily the same size. This is because some variables may have been set, or some variables may be equivalent to other variables in the same set. You can get the variables that have been set via `get_zero_assigned_lits()`.

## Supported Inprocessing Steps

Currently, the following set of inprocessing steps are supported:

## Convenience Features Under the Hood

The steps above do more than what they say on the label. For example, the ones that start with `occ` build an occurrence list and use it for the next simplification stop if it also starts with `occ`. They also all make sure that memory limits and time limits are adhered to. The timeout multiplier can be changed via `set_timeout_all_calls(double multiplier)`. The time limits are entirely reproducible, there is no actual seconds, it’s all about an abstract “tick” that is ticking. This means that all bugs in your code are always reproducible. This helps immensely with debugging — no more frustrating Heisenbugs. You can check the `cryptominisat.h` file for all the different individual timeouts and memouts you can set.

Under the hood you also get a lot of tricks implemented by default. You don’t have to worry about e.g. strengthening running out of control, it will terminate in reasonable amount of ticks, even if that means it will not run to completion. And the next time you run it, it will start at a different point. This makes a big difference in case you actually want your tool to be usable, rather than just “publish and forget”. In many cases, simplification only makes things somewhat faster, and you want to stop performing the simplification after some time, but you also want your users to be able to report bugs and anomalies. If the system didn’t have timeouts, you’d run the risk of the simplifier running way too long, even though the actual solving would have taken very little time. And if the timeout was measured in seconds, you’d run the risk of a bug being reported but being irreproducible, because the exact moment the timeout hit for the bug to occur would be irreproducible.

## Making the Best of it All

This system is just an API — it doesn’t do much on its own. You need to play with it, and creatively compose simplifications. If you take a look at `cryptominisat.h`, it already has a cool trick, where it moves the simplified CNF from an existing solver to a new, clean solver through the API, called `copy_simp_solver_to_solver()`. It is also used extensively in Arjun, our CNF simplifier for counting. There, you can find the function that controls CryptoMiniSat from the outside to simplify the CNF in the exact way needed. It may be worthwhile reading through that function if you want to control CryptoMiniSat via this API.

The `simplify()` API can give you the redundant clauses, too (useful if you e.g. did ternary or hyper-binary resolution), and can give you the non-renumbered CNF as well — check out the full API in `cryptominisat.h`, or the Arjun code. Basically, there is a `red` and a `simplified` parameter you can pass to this function.

Perhaps I’ll expose some of this API via the Python interface, if there is some interest for it. I think it’s quite powerful and could help people who use CNFs in other scenarios, such as MaxSAT solving, core counting, core minimization, etc.

## Closing Thoughts

I think there is currently a lack of tooling to perform the already well-known and well-documented pre- and inprocessing steps that many SAT solvers implement internally, but don’t expose externally. This API is supposed to fill that gap. Although it’s a bit rough on the edges sometimes, hopefully it’s something that will inspire others to either use this API to build cool stuff, or to improve the API so others can build even cooler stuff. While it may sound trivial to re-implement e.g. BVE, once you start going into the weeds of it (e.g. dealing with the special case of detecting ITE, OR & AND gates’ and their lower resolvent counts, or doing it incrementally with some leeway to allow clause number increase), it gets pretty complicated. This API is meant to alleviate this stress, so researchers and enthusiasts can build their own simplifier given a set of working and tested “LEGO bricks”.