Category Archives: Development

Ganakv2 Released

May 12, 2025 | Categories: Development, Research, SAT, Tools | Tags: model counting, Release, research | Author: msoos

After many months of waiting for our paper to be accepted (preliminary PDF here), Ganak2 is finally released (GitHub code, released binaries), and it is easy to access and modify. I wish this release had come earlier, but double-blind requirements didn’t allow us to release the code any sooner.

Basically, Ganakv2 has been rewritten in many ways. Some of the original ideas by Thurley for SharpSAT have been kept alive, but many things are completely different and completely rewritten: well over 10’000 lines of change in total. This set of changes has allowed Ganak2 to win every available track at the Model Counting Competition of 2024. In other words, Ganak2 is incredibly fast, making it a state-of-the-art tool.

Chronological Backtracking, Enhanced SAT solving, S- and D-sets

The new Ganak has a pretty powerful SAT solver engine with its own restart strategy, polarity strategy, and variable branching strategy. This SAT solver is tightly integrated into Ganak and reuses the same data structures as Ganak, hence the transitions between counting and SAT solving are smooth. This relies on the observation that once every variable in the (minimal) independent set has been assigned, there is at most one solution, so we can just run a SAT solver; there is no need for a full-blown model counter. This idea has existed at least since GPMC, but we improved on it in important ways.

A very complicated but very important aspect of Ganak2 is our integration of Chronological Backtracking, which we adapted to the model counting setting. This required a lot of attention to detail, and to debug it, we took full advantage of the fuzzer written and maintained by Anna Latour. We further added very serious and thorough checking into Ganak2, with varying levels of internal checking and per-node checking of counts. This makes debugging effective: at the first node where the count is incorrect, Ganak immediately stops and errors out, along with a full human-readable debug log of what had happened. This was necessary, as some of the issues that Chronological Backtracking leads to are highly non-trivial, as mentioned in the paper.

Ganak2 now understands something we call a D-set: a set of variables that is potentially much larger than the projection set, but branching on it leaves the count unchanged. This allows Ganak to branch on far more variables than any other current model counter. We run an algorithm that takes advantage of Padoa’s theorem to make this D-set as large as possible. Further, Ganak2 takes advantage of projection set minimization, calling the minimal set an S-set, and runs a SAT solver as soon as all variables from the S-set have been decided. This allows us to run a SAT solver much earlier than any other model counter.

Component Discovery, Subsumption and Strengthening, and Counting over any Field

Since the paper, I have also added a number of pretty interesting improvements, especially related to component generation, which now uses a highly cache-optimized data structure with time stamping, a trick I learned from Heule et al.’s brilliant tree-based lookahead paper (worth a read). I also changed the hash function to chibihash: it is not only faster (on the small inputs we run it on), but also doesn’t rely on any specific instruction set, so Ganak2 can be compiled with Emscripten and run in your browser.

There is also a pretty experimental on-the-fly subsumption and strengthening that I do. It actually rewrites the whole search tree in order to achieve this, which is a lot harder to do than I imagined. It’s the most trivial subsumption & strengthening code you can imagine, but then the rewriting is hell on earth. However, the rewriting is necessary in case we want to be able to continue counting without restarting the whole counting process.

Ganak, unlike many of its counterparts, also manages memory very tightly and tries not to die due to running out of memory. The target memory usage of the cache can be given in MB via --maxcache. The default is 2500MB, which should cap out at about 4GB of total memory usage. More cache should lead to better performance, but we actually won (every available track of) the Model Counting Competition 2024 with rather low memory usage: I think I restricted it to 8-12GB, while other competitors ran with ~2x that.

Furthermore, Arjun, our CNF minimizer, has also been tuned. While this is not specifically Ganak, it gives a nice improvement, and has some cool ideas that I am pretty happy about. In particular, it runs the whole preprocessing twice, and it incorporates Armin Biere’s CadiBack for backbone detection. I wish I could improve this system a bit, because sometimes CadiBack takes >1000s to run, but without it, counting is even slower in most cases. But not always: sometimes backbone detection is just a waste of time.

The system now also allows counting over any field: all you need to do is implement the +, -, *, and / operators and the 0 and 1 constants. I have implemented integers, rationals, modulo-prime counting, and counting over the field of polynomials with rational coefficients. All you need to do is override the Field and FieldGen interfaces. I got modulo-prime counting working in under 20 minutes; it’s about 30 lines of code.

Floating Point Numbers

Notice that floating point numbers don’t form a field: floating-point addition is not associative, e.g. (0.1+0.2)+0.3 is not equal to 0.1+(0.2+0.3). In fact, Ganak2’s weighted model counting doesn’t currently support floating point numbers, because it doesn’t need to: we won literally every available track with arbitrary-precision rational numbers, where 0.3 is simply interpreted as 3/10. This sounds minor, but it also means that it’s actually trivial to check whether Ganak runs correctly: it simply needs to be run with a different configuration, and it should produce the same solution. Notice that this is not at all true for any other weighted model counter. Literally all of them, except for Ganak, will give different solutions if run with different seeds. Which is hilarious, since there is obviously only one correct solution. But since the order of additions changes the result in floating point, they can’t do anything… rational.

This whole floating point saga is actually quite annoying, as it also means we can’t easily check Ganak against any other counter. All other counters give inexact answers, because floating point rounds at every operation, so we can only compare Ganak2 to other counters if we allow a certain deviation. Which is pretty hilarious, given that counters all claim to use “infinite precision”. In fact, the only thing infinite about their results is likely the error. Since ((0.1+0.2)+0.3)-(0.1+(0.2+0.3)) is not 0, cancellation can in theory lead to unbounded relative error in case of negative weights.

Approximate Counting and Ganak

Ganak2 incorporates ApproxMC and can seamlessly transition into approximate model counting. To do this, simply pass the “--appmct T” flag, where T is the number of seconds after which Ganak will start counting in approximate mode. This can only be done for unweighted counting, as ApproxMC only supports unweighted counting. This seamless transition is very interesting to watch, and it demonstrates that a multi-modal counting framework, i.e. exact+approximate, is a very viable strategy.

What happens under the hood is currently unpublished, but basically, we stop the exact counting, check how much we have counted so far, subtract that from the formula, and pass it to ApproxMC. We then get the result from ApproxMC and add the partial count that Ganak counted, and display the result to the user. So the final count is partially approximate, and partially exact, so we actually give better approximation (epsilon and delta) guarantees than what we promise.
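Why the combined answer is at least as good as ApproxMC’s own guarantee can be sketched as follows, assuming ApproxMC’s usual (ε, δ) guarantee holds for the remaining formula. Let E be the exact partial count, R the true count of the remaining formula, N = E + R the true total, and A ApproxMC’s estimate of R:

```latex
\begin{aligned}
&\Pr\!\left[\frac{R}{1+\varepsilon} \le A \le (1+\varepsilon)R\right] \ge 1-\delta, \qquad N = E + R,\\
&E + A \;\le\; E + (1+\varepsilon)R \;=\; N + \varepsilon R \;\le\; (1+\varepsilon)N,\\
&E + A \;\ge\; E + \frac{R}{1+\varepsilon} \;\ge\; \frac{E + R}{1+\varepsilon} \;=\; \frac{N}{1+\varepsilon}.
\end{aligned}
```

So whenever ApproxMC’s bound holds (probability at least 1 − δ), E + A is within a factor (1 + ε) of N, and on the upper side the relative error is at most εR/N, which is smaller than ε whenever E > 0.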

Compiling Ganak

The new Ganak incorporates 8 libraries: GNU MP, CryptoMiniSat, Arjun, ApproxMC, SBVA, CaDiCaL, CadiBack, and BreakID. Furthermore, it optionally compiles in BuDDy, Oracle (from Korhonen and Jarvisalo), and FlowCutter. These are all individually mentioned and printed to the console when running. However, this means that building Ganak is not trivial. Hence, with the very generous help of Noa Aarts, Ganak has a Nix flake, so you can simply run:

git clone https://github.com/meelgroup/ganak
cd ganak
nix-shell

And it will reproducibly build Ganak and make it available in the path. All this needs is an installation of Nix, which is a one-liner and works incredibly reliably. Alternatively, the Ganak repository can be cloned together with its GitHub Actions, and then each push to the repository will build Ganak2 for 4 different platforms: Linux x86 and ARM, and Mac x86 and ARM.

Submit your Version of Ganak to the 2025 Competition

Ganak2 is highly extensible and modifiable. I strongly encourage you to extend, change, or improve it, and submit your improvement to the Model Counting Competition of 2025. The deadline is in 3 weeks, so it’s time to get your idea into Ganak and win. I promise I will have all my changes publicly available by then, and you can literally submit the same thing I will, if you like. I’d prefer if you put in some cool change, though.

Our tools for solving, counting and sampling

February 18, 2024 | Categories: Development, Research, SAT, Tools | Tags: ANF, CNF, model counting, sampling | Author: msoos

This post is just a bit of a recap of what we have developed over the years as part of our toolset of SAT solvers, counters, and samplers. Many of these tools depend on each other, and have taken greatly from other tools, papers, and ideas. The list of these dependencies is too long to give here: it probably starts somewhere around the Greek period and goes all the way to recent work such as SharpSAT-td or B+E. My personal work stretches back to the beginning of CryptoMiniSat in 2009, and the latest addition to our list is Pepin.

Overview

Firstly, when I say “we”, I loosely refer to the work of my colleagues and myself, often but not always as part of the research group led by Prof Kuldeep Meel. Secondly, almost all these tools depend on CryptoMiniSat, a SAT solver that I have been writing since around 2009. This is because most of these tools use DIMACS CNF as the input format and/or make use of a SAT solver, and CryptoMiniSat is excellent at reading, transforming, and solving CNFs. Thirdly, many of these tools have a Python interface, some connected to PySAT. Finally, all these tools are maintained by me personally. All of them have a static Linux executable as part of their release, many have a MacOS binary, and some even a Windows binary. All of them build with open source toolchains using open source libraries, and all of them are either MIT licensed or GPL licensed. There are no stale issues in their respective GitHub repositories, and most of them are fuzzed.

CryptoMiniSat

CryptoMiniSat (research paper) is our SAT solver that can solve, preprocess, and inprocess CNFs. It is currently approx. 30k+ lines of code, with a large part of the codebase dedicated to CNF transformations, which are also called “inprocessing” steps. These transformations are accessible to the outside via an API that many of the other tools take advantage of. CryptoMiniSat used to be a state-of-the-art SAT solver, and while it’s not too shabby even now, it hasn’t had the chance to shine at a SAT competition since 2020, when it came 3rd. It’s hard to keep a SAT solver competitive: there are many aspects to such an endeavor, but mostly it’s energy and time, some of which I have lately redirected into other projects, see below. Nevertheless, it’s a cornerstone of many of our tools, and e.g. large portions of ApproxMC and Arjun are in fact implemented in CryptoMiniSat, so that improvements in one tool can benefit all other tools.

Arjun

Arjun (research paper) is our tool to make CNFs easier to count with ApproxMC, our approximate counter. Arjun takes a CNF with or without a projection set, and computes a small projection set for it. What this means is that if, say, the question was: “How many solutions does this CNF have if we only count solutions as distinct over variables v4, v5, and v6?”, Arjun can compute that it’s in fact sufficient to compute the solutions over e.g. variables v4 and v5, and that will be the same as the number of solutions over v4, v5, and v6. This can make a huge difference for large CNFs where e.g. the original projection set can be 100k variables, but Arjun can compute a projection set sometimes as small as a few hundred. Hence, Arjun is used as a preprocessor for our model counters ApproxMC and GANAK.

ApproxMC

ApproxMC (research paper) is our probabilistically approximate model counter for CNFs. This means that when ApproxMC gives a result, it gives it in the form of “The model count is between 0.9*M and 1.1*M with a probability of 99%, and with a probability of 1%, it can be any value”. This is very often enough for most cases of counting, and is much easier to compute than an exact count. It counts by basically halving the solution space K times and then counting the remaining number of solutions. Then, the count is estimated to be 2^(how many times we halved)*(how many solutions remained). This halving is done using XOR constraints, which CryptoMiniSat is very efficient at. In fact, CryptoMiniSat is currently the only state-of-the-art SAT solver that can perform XOR reasoning.

UniGen

UniGen (research paper) is an approximate probabilistic uniform sample generator for CNFs. Basically, it generates samples that are probabilistically approximately uniform. This can be helpful, for example, if you want to generate test cases for a problem and you need the samples to be almost uniform. It uses ApproxMC to first count, and then the same idea as ApproxMC to sample: add enough XORs to repeatedly halve the solution space, and then take K random elements from the remaining (small) set of solutions. These will be the samples returned. Notice that UniGen depends on ApproxMC for counting, Arjun for projection minimization, and CryptoMiniSat for the heavy lifting of finding solutions or proving UNSAT.

GANAK

GANAK (research paper, binary) is our probabilistic exact model counter. In other words, it returns a solution such as “This CNF has 847365 solutions, with a probability of 99.99%, and with 0.01% probability, any other value”. GANAK is based on SharpSAT and some parts of SharpSAT-td and GPMC. In its currently released form, it is in its infancy: while usable, it needs e.g. Arjun to be run on the CNF first, and while competitive, its ease of use could be improved. Vast improvements are in the works, though, and hopefully things will be better for the next Model Counting Competition.

CMSGen

CMSGen (research paper) is our fast, weighted, uniform-like sampler, which means it tries its best to give uniform samples, but provides no guarantees of uniformity. Despite this, it is surprisingly good at generating uniform samples. While these samples cannot be trusted in scenarios where the samples must be uniform, they are very effective in scenarios where a less-than-uniform sample will only degrade the performance of a system. For example, they are great at refining machine learning models, where the samples are taken uniformly at random from the area of input where the ML model performs poorly, to further train (i.e. refine) the model on inputs where it is performing poorly. Here, if the sample is not uniform, it will only slow down the learning, but not make it incorrect. However, generating provably uniform samples in such scenarios may be prohibitively expensive. CMSGen is derived from CryptoMiniSat, but does not import it as a library.

Bosphorus

Bosphorus (research paper) is our ANF solver, where ANF stands for Algebraic Normal Form. It’s a format used widely in cryptography to describe constraints as multivariate polynomials over the field GF(2). Essentially, these are equations such as “a XOR b XOR (b AND c) XOR true = false”, where a, b, c are booleans. These allow some problems to be expressed in a very compact way, and solving them can often be tantamount to breaking a cryptographic primitive such as a symmetric cipher. Bosphorus takes such a set of polynomials as input and tries to simplify them via a set of inprocessing steps and SAT solving, and/or to solve them via translation to a SAT problem. It can output an equivalent CNF, too, that can e.g. be counted via GANAK, which will give the count of solutions to the original ANF. In this sense, Bosphorus is a bridge from ANF into our set of CNF tools above, allowing cryptographers to make use of the wide array of tools we have developed for solving, counting, and sampling CNFs.

Pepin

Pepin (research paper) is our probabilistically approximate DNF counter. DNF is basically the reverse of CNF: it’s trivial to ascertain whether there is a solution, but it’s very hard to know whether every assignment is a solution. However, it is actually extremely fast to probabilistically approximate how many solutions a DNF has. Pepin does exactly that. It’s one of the very few tools we have that doesn’t depend on CryptoMiniSat, as it deals with DNFs, and not CNFs. It basically blows all other such approximate counters out of the water, and of course its speed is basically incomparable to that of exact counters. If you need to count a DNF formula and you don’t need an exact result, Pepin is a great tool of choice.

Conclusions

My personal philosophy has been that if a tool is not easily accessible (e.g. having to email the authors) and has no support, it essentially doesn’t exist. Hence, I try my best to keep the tools I feel responsible for accessible and well-supported. In fact, this runs so deep, that e.g. CryptoMiniSat uses the symmetry breaking tool BreakID, and so I made that tool into a robust library, which is now being packaged by Fedora, because it’s needed by CryptoMiniSat. In other words, I am pulling other people’s tools into the “maintained and supported” list of projects that I work with, because I want to make use of them (e.g. BreakID now builds on Linux, MacOS, and Windows). I did the same with e.g. the Louvain Community library, which had a few oddities/issues I wanted to fix.

Another oddity of mine is that I try my best to make our tools make sense to the user, work as intended, give meaningful (error) messages, and good help pages. For example, none of the tools I develop call subprocesses that make it hard to stop a computation, and none use a random number seed that can lead to reproducibility issues. While I am aware that working tools are sometimes less respected than a highly cited research paper, and so in some sense I am investing my time in a slightly suboptimal way, I still feel obliged to make sure the tax money spent on my academic salary gives something tangible back to the people who pay for it.

The Inprocessing API of CryptoMiniSat

October 1, 2023 | Categories: Development, SAT, Tools | Tags: SAT | Author: msoos

Many modern SAT solvers do a lot of what’s called inprocessing. These steps simplify the CNF into something that is easier to solve. In the compiler world, these are called rewritings, since they effectively rewrite (parts of) the formula into something else that retains certain properties, such as satisfiability. One of the most successful such rewrite rules for CNF is Bounded Variable Elimination (BVE, classic paper here), but there are many others. These rewrites are usually done by modern SAT solvers in a particular order that was found to work well for their particular use-case, but they are not normally accessible from the outside.

Sometimes one wants to use these rewrite rules for something other than just solving the instance via the SAT solver. One such use-case is to use them to simplify the CNF in order to count its solutions. In this scenario, the user wants to rewrite the CNF in a very particular way and then extract the simplified CNF. Other use-cases are easy to imagine, such as MaxSAT, core counting, etc. Over the years, CryptoMiniSat has evolved such a rewrite capability. It is possible to tell CryptoMiniSat to simplify the formula exactly the way the user wants, and then extract the simplified formula.

Example Use-Case

Let’s say we have a CNF that we want to simplify:

p cnf 4 2
1 2 3 4 0
1 2 3 0

In this CNF, 1 2 3 4 0 is not needed, because it is subsumed by the clause 1 2 3 0. You can run subsumption using CryptoMiniSat this way:

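#include "cryptominisat5/cryptominisat.h"
#include <cstdlib>
#include <iostream>
#include <vector>

using namespace CMSat;
using namespace std;

// Convert a DIMACS literal (e.g. -3) to a CryptoMiniSat Lit (0-based variable)
static Lit lit(int a) { return Lit(std::abs(a) - 1, a < 0); }

int main() {
  Solver s;
  vector<Lit> cl;
  s.add_new_vars(4);

  // The two clauses of the CNF above: "1 2 3 4 0" and "1 2 3 0"
  cl = vector<Lit>{lit(1), lit(2), lit(3), lit(4)};
  s.add_clause(cl);
  cl = vector<Lit>{lit(1), lit(2), lit(3)};
  s.add_clause(cl);

  // Run only backward subsumption using occurrence lists
  s.simplify(NULL, "occ-backw-sub");

  // Print the simplified CNF, one clause per line
  s.start_getting_clauses();
  while(s.get_next_clause(cl)) {
    for(const auto l: cl) cout << l << " ";
    cout << endl;
  }
  s.end_getting_clauses();

  return 0;
}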

This code runs the inprocessing step occ-backw-sub, which stands for backwards subsumption using occurrence lists. The input CNF can be anything, and the output CNF is the simplified CNF. This may seem like quite a lot of code for simple subsumption, but the solver does a lot under the hood to be fast, and the API is a lot more capable than just doing subsumption.

Notice that the first argument we passed to simplify() is NULL. This means we don’t care about any variables being preserved: any variable can (and will) be eliminated if occ-bve is called. If some variables must not be eliminated, you can create a vector of them and pass the pointer here. If you have called the renumber API, then you can get the set of variables you had via clean_sampl_and_get_empties(). Their numbering will not be preserved, but they will correspond to the same set of variables, though the set is not necessarily the same size. This is because some variables may have been set, or some variables may be equivalent to other variables in the same set. You can get the variables that have been set via get_zero_assigned_lits().

Supported Inprocessing Steps

Currently, the following set of inprocessing steps are supported:

API name	Inprocessing performed
occ-backw-sub	Backwards subsumption using occurrence lists
occ-backw-sub-str	Backwards subsumption and strengthening using occurrence lists
occ-bce	Blocked clause elimination (paper)
occ-ternary-res	Ternary resolution (paper)
occ-lit-rem	Literal removal via strengthening
occ-cl-rem-with-orgates	OR-gate based clause removal (unpublished work, re-discovered by others)
occ-rem-with-orgates	OR-gate based literal removal
occ-bve	Bounded variable elimination (paper)
occ-bve-empty	Bounded variable elimination of empty resolvents only
intree-probe	Probe using in-tree probing, also doing hyper-binary resolution and transitive reduction (paper)
full-probe	Probe each literal individually (slow compared to intree-probe)
backbone	Backbone simplification (cadiback paper)
sub-impl	Subsume binary clauses with binary clauses (fast)
sub-str-cls-with-bin	Subsume and strengthen long clauses with binary clauses (fast)
sub-cls-with-bin	Subsume long clauses with binary clauses (fast)
distill-bins	Distill binary clauses
distill-cls	Distill long clauses (distillation paper)
distill-cls-onlyrem	Distill long clauses, but only remove clauses, don’t shorten them. Useful if you want to make sure BVE can run at full blast after this step.
clean-cls	Clean clauses of set literals, and delete satisfied clauses
must-renumber	Renumber variables to start from 0, in case some have been set to TRUE/FALSE or removed due to equivalent literal replacement.
must-scc-vrepl	Perform strongly connected component analysis and perform equivalent literal replacement.
oracle-vivify	Vivify clauses using the Oracle tool by Korhonen and Jarvisalo (paper). Slow but very effective.
oracle-vivif-sparsify	Vivify & sparsify clauses using the Oracle tool by Korhonen and Jarvisalo. Slow but very effective.

Convenience Features Under the Hood

The steps above do more than what they say on the label. For example, the ones that start with occ build an occurrence list and reuse it for the next simplification step if that step also starts with occ. They also all make sure that memory limits and time limits are adhered to. The timeout multiplier can be changed via set_timeout_all_calls(double multiplier). The time limits are entirely reproducible: there are no actual seconds involved, only an abstract “tick” counter that keeps ticking. This means that all bugs in your code are always reproducible, which helps immensely with debugging: no more frustrating Heisenbugs. You can check the cryptominisat.h file for all the different individual timeouts and memouts you can set.

Under the hood you also get a lot of tricks implemented by default. You don’t have to worry about e.g. strengthening running out of control: it will terminate in a reasonable number of ticks, even if that means it will not run to completion. And the next time you run it, it will start at a different point. This makes a big difference in case you actually want your tool to be usable, rather than just “publish and forget”. In many cases, simplification only makes things somewhat faster, and you want to stop performing the simplification after some time, but you also want your users to be able to report bugs and anomalies. If the system didn’t have timeouts, you’d run the risk of the simplifier running way too long, even though the actual solving would have taken very little time. And if the timeout was measured in seconds, you’d run the risk of a bug being reported but being irreproducible, because the exact moment the timeout hit for the bug to occur would be irreproducible.

Making the Best of it All

This system is just an API: it doesn’t do much on its own. You need to play with it, and creatively compose simplifications. If you take a look at cryptominisat.h, it already has a cool trick, copy_simp_solver_to_solver(), which moves the simplified CNF from an existing solver to a new, clean solver through the API. The API is also used extensively in Arjun, our CNF simplifier for counting. There, you can find the function that controls CryptoMiniSat from the outside to simplify the CNF in the exact way needed. It may be worthwhile reading through that function if you want to control CryptoMiniSat via this API.

The simplify() API can give you the redundant clauses, too (useful if you e.g. did ternary or hyper-binary resolution), and can give you the non-renumbered CNF as well — check out the full API in cryptominisat.h, or the Arjun code. Basically, there is a red and a simplified parameter you can pass to this function.

Perhaps I’ll expose some of this API via the Python interface, if there is some interest for it. I think it’s quite powerful and could help people who use CNFs in other scenarios, such as MaxSAT solving, core counting, core minimization, etc.

Closing Thoughts

I think there is currently a lack of tooling to perform the already well-known and well-documented pre- and inprocessing steps that many SAT solvers implement internally, but don’t expose externally. This API is supposed to fill that gap. Although it’s a bit rough around the edges sometimes, hopefully it’s something that will inspire others to either use this API to build cool stuff, or to improve the API so others can build even cooler stuff. While it may sound trivial to re-implement e.g. BVE, once you start going into the weeds of it (e.g. dealing with the special case of detecting ITE, OR, and AND gates and their lower resolvent counts, or doing it incrementally with some leeway to allow the number of clauses to increase), it gets pretty complicated. This API is meant to alleviate this stress, so researchers and enthusiasts can build their own simplifier given a set of working and tested “LEGO bricks”.

CryptoMiniSat 5.8.0 Released

July 26, 2020 | Categories: Development, Research, SAT | Tags: branching, local search, Release, SAT Race | Author: msoos

After many months of work, CryptoMiniSat 5.8.0 has been released. In this post I’ll go through the most important changes, and how they helped the solver to be faster and win a few awards, among them 1st place in the SAT incremental track, 3rd place in the SAT Main track, and 2nd and 3rd place in the SMT BitVector tracks together with the STP and MinkeyRink solvers.

Gauss-Jordan Elimination

First and foremost, Gauss-Jordan elimination at all levels of the search is now enabled by default. This is thanks to the work detailed in the CAV 2020 paper (video here). The gist of the paper is that we take advantage of the bit-packed matrix and some clever bit field filters to quickly check whether an XOR constraint is propagating, conflicting, or neither. This, and a variety of other improvements, led to a roughly 3-10x speedup of the Gauss-Jordan elimination procedure.

With this speedup, the overhead is quite small, and we enable G-J elimination at all times now. However, there are still limits on the size of the matrix, the number of matrices, and we disable it if it doesn’t seem to improve performance.

As a bit of reflection: our original paper with Nohl and Castelluccia on CryptoMiniSat, featuring Gauss-Jordan elimination at all levels of the search tree was published at SAT 2009. It took about 11 years of work, and in particular the work of Han and Jiang to get to this point, but we finally arrived. The difference is day and night.

Target Phases

This one is really cool, and it’s in CaDiCaL (direct code link here) by Armin Biere, description here (on page 8). If you look at the SAT Race of 2019, you will see that CaDiCaL solved a lot more satisfiable problems than any other solver. If you dig deep enough, you’ll see it’s because of target phases.

Basically, target phases are a variation of phase saving, but instead of saving the phase every time when backtracking, it only saves it when backtracking from a trail that is longer than any seen before. Furthermore, it does more than just this: sometimes it picks only the TRUE phase, and sometimes only the FALSE phase. To spice it up, you can keep a “local deepest” and a “global deepest” trail if you like, and even pick inverted phases.

It’s pretty self-explanatory if you read this code (basically, it just switches between normal, target, inverted, fixed FALSE, and fixed TRUE phases), and it helps tremendously. If you look at the graphs of the SAT 2020 competition results (slide no. 19 here), you will see a bunch of solvers being way ahead of the competition. That’s target phases right there.

CCAnr Local Search Solver

CryptoMiniSat gained a new local search solver, CCAnr (paper here), and it’s now the default. This is a local search solver by Shaowei Cai, who very kindly let me add his solver to CryptoMiniSat and allowed me to add him as an author to the version of CryptoMiniSat that participated in the SAT competition. It’s a local search solver, so it can only solve satisfiable instances, and it does so by always working on a complete candidate assignment that it tries to “massage” into a satisfying one.

Within CryptoMiniSat, CCAnr takes the starting candidate solution from the phases inside the CDCL solver, and tries to extend it to fit all the clauses. If it finds a satisfying assignment, this is emitted as a result. If it doesn’t, the best candidate solution (the one that satisfies the most clauses) is saved into the CDCL phase and is later used in the CDCL solver. Furthermore, some statistics during the local search phase are saved and then injected into the variable branching heuristics of the CDCL solver, see code here.

Hybrid Variable Branching

Variable branching in CryptoMiniSat has always been a mix of VSIDS (Variable State Independent Decaying Sum, paper here) and Maple (multi-arm bandit based, paper here) heuristics. However, both Maple and VSIDS have a bunch of internal parameters, and different settings work best for different types of SAT problems.

To get around the issue of trying to find a single optimal setting for all problems, CryptoMiniSat now uses a combination of configurations parsed from the command line, such as “maple1 + maple2 + vsids2 + maple1 + maple2 + vsids1”. This allows two different configurations each for Maple and VSIDS (1 and 2) to be specified and used, right from the command line. This configuration system allows a wider variety of problems to be solved efficiently.
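A sketch of how such a schedule string might be parsed and then cycled through. The function names, the accepted names (maple, vsids) and versions (1, 2), and switching to the next entry at, say, every restart are assumptions based only on the example string above; this is not CryptoMiniSat’s actual command-line parser.

```python
from itertools import cycle


def parse_branch_schedule(spec):
    """Parse e.g. "maple1 + vsids2" into [("maple", 1), ("vsids", 2)]."""
    schedule = []
    for token in spec.split("+"):
        token = token.strip()
        name = token.rstrip("0123456789")
        version = token[len(name):]
        if name not in ("maple", "vsids") or version not in ("1", "2"):
            raise ValueError(f"bad branching configuration: {token!r}")
        schedule.append((name, int(version)))
    return schedule


def config_sequence(spec):
    """Endless round-robin over the parsed configurations (e.g. one per restart)."""
    return cycle(parse_branch_schedule(spec))
```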

Final Remarks

CryptoMiniSat is now used in many systems. It is the default SAT solver in:

QBF solver Caqe, which regularly wins QBF competitions
SMT solvers STP and MinkeyRink (SMT competition results here and here), regularly placing 2nd and 3rd in the QF_BV track
Model counting system ApproxMC, which seems to have won 2 of the 3 Model Counting Competition 2020 awards
Uniform sampling system Unigen
ANF solving system Bosphorus

I think the above systems, especially given their track record of high performance in their respective fields, show that CryptoMiniSat is indeed a well-performing and reliable workhorse. This is thanks to many people, including, but not limited to, Kuldeep Meel, Kian Ming A. Chai, Trevor Hansen, Arijit Shaw, Dan Liew, Andrew V. Jones, Daniel Fremont, Martin Hořeňovský, and others who have all contributed pull requests and valuable feedback. Thanks!

As always, let me know if you have any feedback regarding the solver. You can create a GitHub issue here, and a pull request here. I am always interested in new use-cases and I am happy to help integrate it into new systems.

ApproxMCv3, a modern approximate model counter

November 25, 2018 | Development, Research, SAT | Gaussian elimination, model counting, Release, XOR | msoos

This blogpost and its underlying work have been brewing for many years, and I’m extremely happy to be able to share them with you now. Kuldeep Meel and I have been working very hard on speeding up approximate model counting for SAT, and I think we have made real progress. The research paper, accepted at AAAI-19, is available here. The code is available here (release with static binary here). The main result is that we can solve many more problems than before, and solving is orders(!) of magnitude faster than with the previous best system:

Background

The idea of approximate model counting, originally by Chakraborty, Meel and Vardi, was a huge hit back in 2013, and many papers have followed it, trying to improve on its results. All of them were basically tied to CryptoMiniSat, the SAT solver that I maintain, because all of them relied on adding XOR constraints to the regular CNF of a typical SAT problem.
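The core hashing idea behind approximate counting can be sketched on tiny formulas with brute-force enumeration. Adding m random XOR constraints splits the solution space into roughly 2^m cells, so the number of solutions in one cell, times 2^m, estimates the total. This is only an illustration: ApproxMC uses a SAT solver with XOR support (CryptoMiniSat) to count the solutions in a cell up to a threshold, chooses m adaptively, and takes the median over repeated runs to obtain its (ε, δ) guarantees. The function names here are hypothetical.

```python
import itertools
import random


def satisfies(clauses, assign):
    # assign[i] is the value of variable i + 1 (DIMACS literals are 1-based).
    return all(any(assign[abs(l) - 1] == (l > 0) for l in c) for c in clauses)


def random_xor(num_vars, rng):
    """Random XOR over 0-based variable indices: each included with probability 1/2."""
    variables = [v for v in range(num_vars) if rng.random() < 0.5]
    return variables, rng.randrange(2)


def cell_count(clauses, num_vars, xors):
    """Brute-force count of models that also satisfy every XOR (tiny formulas only)."""
    count = 0
    for assign in itertools.product([False, True], repeat=num_vars):
        if satisfies(clauses, assign) and all(
            sum(assign[v] for v in variables) % 2 == rhs for variables, rhs in xors
        ):
            count += 1
    return count


def estimate_count(clauses, num_vars, m, seed=0):
    """One hashing round: count a single cell and scale it up by 2^m."""
    rng = random.Random(seed)
    xors = [random_xor(num_vars, rng) for _ in range(m)]
    return cell_count(clauses, num_vars, xors) * 2 ** m
```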

So it made sense to examine what CryptoMiniSat could do to improve the speed of approximate counting. Interestingly, this coincided with me giving up on XORs in CryptoMiniSat. The problem was the following. A lot of new in- and preprocessing techniques were being invented, mostly by Armin Biere et al., and I quickly realised that I simply couldn’t keep adding them, because they didn’t take XOR constraints into account. They handled CNF just fine, but not XORs. So XORs became a burden, and I removed them in versions 3 and 4 of CryptoMiniSat. But there was a need for them, and Kuldeep made it very clear to me that this was an exciting area. So they had to come back.

Blast-Inprocess-Recover-Destroy

But how to both have and not have XOR constraints? Re-inventing all the algorithms for XORs was not a viable option. The solution I came up with was a rather trivial one: forget the XORs during inprocessing and recover them afterwards. The CNF would always remain the source of truth. Extracting all the XORs after in- and preprocessing allows running Gauss-Jordan elimination on the recovered XORs. So I can have my cake and eat it too.

The process is conceptually quite easy:

1. Blast all XORs in the input into clauses, using intermediate variables. I already had all the setup for this, as I was doing Bounded Variable Addition (also by Biere et al.), so I didn’t have to write code to “hide” these additional variables.
2. Perform pre- or inprocessing. I actually only do inprocessing nowadays (as it has a faster startup time). But preprocessing is just inprocessing at the start ;)
3. Recover the XORs from the CNF. There were some trivial methods around. They didn’t work as well as one would have hoped, but more on that later.
4. Run the CDCL and Gauss-Jordan code at the same time.
5. Destroy the XORs and goto 2.
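Step 1 can be sketched like this. A direct CNF encoding of a k-variable XOR needs 2^(k-1) clauses, so long XORs are first cut into chunks linked by fresh intermediate variables. The chunk size of 4 and the function names are assumptions for illustration, not CryptoMiniSat’s settings.

```python
import itertools


def xor_clauses(variables, rhs):
    """Direct CNF for x_1 ^ ... ^ x_k == rhs: forbid each assignment of wrong parity."""
    clauses = []
    for values in itertools.product([False, True], repeat=len(variables)):
        if sum(values) % 2 != rhs:
            # This clause is falsified by exactly this assignment.
            clauses.append([-v if val else v for v, val in zip(variables, values)])
    return clauses


def blast_xor(variables, rhs, next_var, cut=4):
    """Cut a long XOR into chunks of at most `cut` variables (cut >= 3), linked by
    fresh variables, and encode each chunk directly. Returns (clauses, next_var)."""
    assert cut >= 3
    clauses = []
    rest = list(variables)
    while len(rest) > cut:
        fresh, next_var = next_var, next_var + 1
        head, rest = rest[:cut - 1], rest[cut - 1:]
        # fresh == XOR of head, written as head ^ fresh == 0
        clauses += xor_clauses(head + [fresh], 0)
        rest = [fresh] + rest
    clauses += xor_clauses(rest, rhs)
    return clauses, next_var
```

For example, `blast_xor([1, 2, 3, 4, 5, 6, 7], 1, 8)` cuts the 7-variable XOR into three chunks using fresh variables 8 and 9, and `xor_clauses([1, 2, 3], 1)` yields exactly the four clauses shown in the next section.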

This system allows for everything to be in CNF form, lifting the XORs out when necessary and then forgetting them when it’s convenient. All of these steps are rather trivial, except, as I later found out, recovery.

XOR recovery

Recovering XORs sounds like a trivial task. Let’s say we have the following clauses:

 x1 V  x2 V  x3
-x1 V -x2 V  x3
 x1 V -x2 V -x3
-x1 V  x2 V -x3

Together, these clauses are equivalent to the XOR x1+x2+x3=1: each clause rules out exactly one of the four assignments with an even number of true variables. So recovering this is trivial, and has been done before, in particular by Heule in his PhD thesis. The issue is the following: a stronger set of clauses still implies the XOR, but doesn’t look the same. Let me give an example:

 x1 V  x2 V  x3
-x1 V -x2 V  x3
 x1 V -x2 V -x3
-x1 V  x2

This is almost the same as the previous set of clauses, but one of the clauses is missing a literal (-x3). That shortened clause is stronger than the one it replaces, so the set still implies the XOR, of course. Now what? And what to do when missing literals mean that an entire clause can be missing? The algorithm to recover XORs in such cases is non-trivial, not only because the number of possible combinations of missing literals and clauses is exponential, but also because the work must be done extremely fast, as SAT solvers are sensitive to time.
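A naive version of the check can be sketched with one entry per assignment of the candidate XOR’s variables. Every clause over a subset of those variables marks the assignments it forbids, so a shortened clause like -x1 V x2 covers two of them at once, and the XOR is recovered if every assignment with the wrong parity is forbidden. This brute-force sketch is exponential in the XOR size and assumes no clause contains both a literal and its negation; the algorithm in the paper is far more efficient, and the function names here are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")  # C++ would use an intrinsic such as __builtin_popcount


def recover_xor(base, clauses):
    """Naive check: do the clauses over base's variables imply an XOR over them?
    Returns the XOR's right-hand side (0 or 1), or None."""
    variables = sorted({abs(l) for l in base})
    pos = {v: i for i, v in enumerate(variables)}
    n = len(variables)
    forbidden = [False] * (1 << n)  # index a: bit i is the value of variables[i]
    for clause in clauses:
        if any(abs(l) not in pos for l in clause):
            continue  # clause mentions another variable; skip it
        mask = val = 0
        for l in clause:
            bit = 1 << pos[abs(l)]
            mask |= bit
            if l < 0:
                val |= bit  # a negative literal is false when its variable is True
        for a in range(1 << n):
            if (a & mask) == val:  # a falsifies this clause; shorter clauses hit several a
                forbidden[a] = True
    # The base clause forbids one assignment; the XOR must have the other parity.
    base_val = sum(1 << pos[abs(l)] for l in base if l < 0)
    rhs = 1 - popcount(base_val) % 2
    wrong_parity = (a for a in range(1 << n) if popcount(a) % 2 != rhs)
    return rhs if all(forbidden[a] for a in wrong_parity) else None
```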

The paper explains the algorithm, including all the bit-fiddling and the cache-friendly data layout, along with some fun algorithms that I’m sure some people will like. We even managed to use compiler intrinsics to emit target-specific assembly instructions for Hamming weight (popcount) calculation. It’s a blast. Take a look.

The results

The results, as shown above, speak for themselves. Problems that took thousands of seconds to solve can now be solved in under 20 seconds. The reason for such an incredible speedup is basically the following. CryptoMiniSatv2 was way too clunky and didn’t have all the fun stuff that CryptoMiniSatv5 has, plus its XOR handling was incorrect, losing XORs and the like. The published algorithm solves the underlying issue and allows CNF pre- and inprocessing to happen independently of XORs, thus enabling CryptoMiniSatv5 to be used in all its glory. And CryptoMiniSatv5 is fast, as per this year’s SAT Competition results.

Some closing words

First of all, I want to thank Kuldeep Meel, who got me into the National University of Singapore to do the work above and lots of other cool work that we will hopefully publish soon. I would also like to thank the National Supercomputing Center Singapore, which allowed us to run a ton of benchmarks on their machines, using at least 200 thousand CPU hours for this paper. This gave us the chance to debug all the weird edge cases and bring this system up to a speed where it beats the best exact counters by a wide margin. Finally, thanks to all the great people I had the chance to meet and sometimes work with at NUS; it was a really nice time.

Wonderings of a SAT geek

A blog about SAT solving and cryptography