
Predicting Clause Usefulness

In this post, I will be taking a shot at building prediction models for learnt clause usefulness by running over 150 unsatisfiable CNF problems, extracting over 140 features for each learnt clause, and then calculating whether the clause was used in the UNSAT proof. Here I need to thank Marijn Heule who helped me add a clause ID to each clause in his DRAT checker so I could perform supervised learning on the collected data. Thanks Marijn!

The first work on machine learning in SAT was done by the crowd creating portfolio solvers such as SATZilla, which calculate features of a CNF and then run the SAT solver that is predicted to solve that instance the fastest. This was achieved by coming up with a set of easy-to-calculate but useful features and then using a standard prediction method to fit SAT solvers to CNFs. The issue, as always with machine learning, is overfitting. Unfortunately, CNFs tended to stay very similar between old SAT Competitions, and overfitting was richly rewarded. Hence, these portfolio solvers sometimes did well in competitions but sometimes relatively poorly in practice.

The second wave of improvements came with MapleSAT, where a multi-armed bandit (MAB) framework was used to pick branching variables. This won the 2016 SAT Competition's Main Track and was a novel and truly interesting idea, manipulating the branching heuristic based on the feature that the researchers called "learning rate".

With this blog post, I hope to inspire a third wave of improvements in SAT solver performance. Learnt clause usefulness metrics have been a very hot potato since the glucose SAT solver, which engineered and used a new feature, glue, to decide usefulness. Maybe with the work and data provided below, we could use features not used before, or combine them in novel ways, to achieve better prediction accuracy.

A History of Clause Usefulness Heuristics

For a long time, learnt clauses were chosen to be kept around in the clause database based on a feature called their "activity", which measured how often the clause was involved in generating conflict clauses. This was the heuristic used in the original MiniSat.

The next big step came with glue, which was present in the earliest of the Glucose solver family. This was a huge step in a different direction — suddenly a new feature had been engineered, "glue", and the heuristic used to decide had changed. It was now the glue of the clause, relative to the other clauses, that determined whether the clause stayed or was discarded.

The next step was a hybrid approach that is still in use today. This says that a learnt clause with a very low glue should stay no matter what, clauses with slightly higher glues should stick around for a while at least, and clauses with much higher glues should only be kept around for a short while, based on their activities. This is the strategy used by MapleSAT, a very successful SAT solver.

DRAT and the Era of Knowing What’s Good

The DRAT-trim proof checker has opened up a new possibility in SAT solving. We could finally know how the UNSAT proof was built and, furthermore, which lemmas, i.e. which learnt clauses, were actually used to build that proof. This means that we could finally measure all sorts of features of clauses, e.g. their glue, their size, or their activity, store these along with the clause, and, once the SAT solver has finished and we know which clauses were actually useful, try to build predictive models using supervised machine learning. However, for whatever reason, nobody did this.

Thanks to Marijn Heule, who I asked to help me parse Clause IDs in DRAT, I have finally built a system that does exactly as above. It gathers over 140 features about each learnt clause, then, when the solving finishes, it runs DRAT and associates an "OK" or "BAD" nominal class value with the features. The output is an SQLite file for each solved problem. I have picked 153 problems from the past 6 years of SAT competitions that all solve as UNSAT in less than 10 minutes, and have run my setup on them. The solver used was CryptoMiniSat, and so as not to taint the results, I have set it not to delete any clause at all from the database (i.e. no clause cleaning was performed) and set it to preprocess the instance but not to perform any in-processing. Furthermore, the restart heuristic was set to be the one pioneered by swdia5by, i.e. glue-based and geometric restarts combined in an iterative way.

The Features

The features saved broadly fall into four categories:

  1. Features computed once in a while on the whole CNF. These are similar to those engineered in SATZilla and other portfolio solvers. Thanks go to Yuri Malitsky and Horst Samulowitz who sent me "features_fast.c". This includes things like statistics about clause and variable distributions, etc. These start with "feat." such as "feat.horn_min"
  2. Features of the previous restart. This includes things like the number of binary clauses learnt, average branch depth, average learnt clause glue size, clause size, etc. These start with “rst.” such as “rst.propLongRed” — the number of propagations made by long redundant (what most solvers call “learnt”) clauses.
  3. Features of the current clause such as its size, glue, its variables' average standardised activity (i.e. divided by "var_inc"), etc. These start with "cl." such as "cl.conflicts_this_restart", i.e. the number of conflicts in this restart.
  4. Some computed features, such as "cl.glue_rel", which is equal to "cl.glue" / "cl.glue_hist".

The final “class” attribute is set to either “OK” meaning the clause was used in the final UNSAT proof, or “BAD” meaning it was not. Obviously, we want to never keep the BAD ones, and only keep the OK ones. If we could predict based on the four types of features above (which are all available at the time of the clause creation) which clauses we should keep, then we potentially could solve problems a lot faster. Partially because we would just throw the clauses away that we predict to be useless, and partially because we could steer the solver towards regions of the search space that contain more useful clauses — e.g. by restarting when we decide that where we are is a poor place to be.

The Data

CSV is available HERE

First of all, a disclaimer. There are a million ways this can be done differently, improved, or made worse. I have collected over a hundred features and I think they are useful. I also have conducted the experiments on over 150 problems, each running at most 10 minutes (but at least 10'000 conflicts), using CryptoMiniSat, never deleting any clause. This constitutes millions of data points. I have spent over two years doing this, and I have been analysing different versions of this dataset for multiple months. Hence, I may well be biased, both in the way the data has been collected and in how I'm analysing it. To counter these biases, I am making the data available so you can perform your own analysis, and CryptoMiniSat is fully open source, including everything you need to completely and precisely re-generate this (or any other) data. See the bottom of this post for some how-tos.

I am only making available the sampled CSV for space and practicality reasons. I created a CSV that contains 153 problems’ datapoints, picking 10’000 randomly from each, which was then randomly sampled to 30’000 elements. This 51MB data is available here. I strongly suggest using Weka to read this file. In case you have trouble using Weka and/or creating predictive models, I’d suggest the free online course Data Mining with Weka.
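If you want to reproduce this kind of per-problem sampling yourself from the generated CSVs, something along these lines should work. This is a rough pandas sketch, not the script I actually used; the directory layout and output file name are made up:

```python
# Hypothetical reproduction of the sampling described above:
# at most 10'000 datapoints per problem, then 30'000 overall.
import glob
import pandas as pd

frames = []
for path in glob.glob("runs/*/data.sqlite.csv"):   # one CSV per solved problem (hypothetical layout)
    df = pd.read_csv(path)
    frames.append(df.sample(n=min(10_000, len(df)), random_state=0))

combined = pd.concat(frames, ignore_index=True)
combined.sample(n=30_000, random_state=0).to_csv("sample_30k.csv", index=False)
```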

Note: I have the 1530k line file as well, but it's 2.6GB. If you really need it, please let me know at my email. For almost all analysis, 30k datapoints is sufficient — Weka will not be able to practically work with any dataset over 100k elements. In case you want to perform deep learning, we can take a shot at that 2.6GB dataset together. However, I think clustering and per-cluster prediction models are a lower-hanging fruit.

Preliminary Analysis

So, what does the data tell us? Throughout, I will use a 66% training/test percentage split, using the 30k-line data file above. All results below are trivially reproducible.

Zero Rule Check
First of all let’s do a ZeroR analysis, i.e. decide to keep or not keep a clause without looking at any data. We get:

So we have about 50% useful and 50% useless clauses. Note that this may not be representative of problems that are difficult. We picked problems that solve in under 10 minutes and have at least 10'000 conflicts.
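For those who prefer scikit-learn over Weka, a ZeroR-style baseline is just a majority-class predictor. A minimal sketch, assuming a sampled CSV like the hypothetical "sample_30k.csv" above:

```python
# ZeroR-style baseline: always predict the majority class, 66%/34% split.
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("sample_30k.csv")            # hypothetical file name
X = data.select_dtypes("number")                # numeric features only; "class" and "fname" drop out
y = data["class"]                               # "OK" or "BAD"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.66, random_state=0)
zero_r = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
print("ZeroR accuracy:", zero_r.score(X_te, y_te))   # ~0.5 if the classes are balanced
```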

One Rule Check
Next up, let's do the slightly less trivial setup and do a OneR, i.e. one-rule, analysis. Here, Weka is only allowed to pick one feature to decide. One would hope this to be glue. It isn't. Let's set the minimum bucket size to 100, which means Weka won't give us 100 different small decisions (i.e. won't overfit, something one can observe in some portfolio solvers). We get:

What is very strange here is that the best predictor is not glue. In fact, the next best predictors are, in order:

  1. cl.decision_level (72.3%)
  2. rst.branchDepth (67.4%)
  3. rst.branchDepthDelta (67.6%)
  4. cl.decision_level_hist (67.1%)
  5. cl.backtrack_level_hist (67.1%)

So they all seem to have something to do with the branch depth/decision depth. If forced to use glue, Weka will draw the line at glue 15 and below being OK and 16 and above being BAD, giving 61.9% accuracy.
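A OneR-style ranking can be approximated outside Weka with one-level decision stumps, one per feature. This is only an approximation of Weka's OneR, so the exact numbers will differ; the CSV name is the hypothetical one from above:

```python
# Approximate OneR: fit a depth-1 stump on each numeric feature separately
# and rank the features by test accuracy.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("sample_30k.csv")            # hypothetical file name
y = data["class"]
numeric = data.select_dtypes("number").fillna(0)

scores = {}
for feat in numeric.columns:
    X_tr, X_te, y_tr, y_te = train_test_split(
        numeric[[feat]], y, train_size=0.66, random_state=0)
    stump = DecisionTreeClassifier(max_depth=1, min_samples_leaf=100)
    scores[feat] = stump.fit(X_tr, y_tr).score(X_te, y_te)

for feat, acc in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{feat}: {acc:.1%}")
```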

J48 Decision Tree
Let's see if we can get some interesting results with J48, a standard decision tree-building setup. I'm going to put a minimum of 400 elements in any bucket, or we will overfit.

Interesting… so we first branch on backtrack_level, which is not surprising given the OneR results above. Then we either use the glue distribution variance from the last time we measured the whole-CNF features, "feat.red_glue_distr_var", or the previous restart's average branch depth delta (i.e. backjump size), "rst.branchDepthDelta". The lower branches are interesting as well.
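The closest scikit-learn equivalent to the J48 run is a CART tree with a minimum leaf size. It is not the same algorithm as C4.5/J48, so the exact splits and accuracy will differ, but it is handy for quick experiments on the same (hypothetical) CSV:

```python
# CART tree with a 400-element minimum bucket, roughly mirroring the J48 setup.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("sample_30k.csv")            # hypothetical file name
X = data.select_dtypes("number").fillna(0)
y = data["class"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.66, random_state=0)
tree = DecisionTreeClassifier(min_samples_leaf=400).fit(X_tr, y_tr)
print("accuracy:", tree.score(X_te, y_te))
print(export_text(tree, feature_names=list(X.columns), max_depth=3))  # top of the tree, human-readable
```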

Random Decision Tree
Just for completeness' sake, let's try a RandomTree as well. Let's set the MinNum to 500 so we can combat overfitting. We then get a pretty complicated tree with the following stats:

This is fairly good performance, but the tree is hard to understand and its accuracy is worse than J48's, though not substantially.

There are other analyses that could be done, for example clustering of similar problems — you can find the CNF filename in feature “fname” (though you should never make any decisions based on that in your decision trees). Building problem classes may be interesting — we could cut up the CSV into different classes and create different prediction models for each. I am pretty sure this would significantly boost our prediction performance.

You can use Weka to visualize the data in many ways. Here is one:

You can use such visualisations to find anomalies, incorrect datapoints, or skews in the collection of data. If you do find any of these, please email me. I am happy to correct them of course — we might even achieve better prediction.

How to Do This at Home

It’s quite easy to create data for your own CNF instances. Just follow the README on the CryptoMiniSat GitHub repository, which says to get a clean Debian or Ubuntu distribution and then:

This will do predictive analysis using Python’s scikit-learn’s DecisionTreeClassifier (which may or may not be what you want) on a test problem and build a nice visual PNG tree too using graphviz:

Note that this decision tree is highly non-representative — it ran on a single, incredibly small instance. You can look at test_predict/data.sqlite.csv with Weka to run your own analysis and build your own, custom trees. To run this on a CNF of your own choice, run:

You will then have your scikit-learn tree generated and, more importantly, you will have the CSV under "myoutput/data.sqlite.csv" so you can read that with Weka and do your own analysis. Beware that you probably want to cut down the size of that CSV before loading it into Weka — but randomise first or you will only see the beginning of the solving. Concatenating CSVs of different runs is easy, just make sure to strip the topmost line from each CSV: it's the header and should not be repeated. For randomisation, use "shuf file"; for cutting the header off, use "tail -n +2 file"; and to get only the header, use "head -n 1 file".
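If you would rather stay in Python for the concatenate-and-shuffle step, pandas handles the headers for you. A small sketch; the file names here are made up:

```python
# Concatenate the CSVs of several runs, shuffle the rows, keep a Weka-sized sample.
import glob
import pandas as pd

runs = [pd.read_csv(p) for p in glob.glob("myoutput*/data.sqlite.csv")]   # hypothetical run directories
merged = pd.concat(runs, ignore_index=True)                # a single header is written automatically
shuffled = merged.sample(frac=1.0, random_state=0)         # randomise so you don't only see the start of the solving
shuffled.head(30_000).to_csv("merged_30k.csv", index=False)
```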


I think the above could help in choosing parameters/features and cutoffs when deciding whether to keep a learnt clause, and in determining when we are in a bad place in the search. This has been done experimentally until now, playing with the cutoffs, trying new cutoffs, and running all the problems on a large cluster, many times.

With the above data in hand, I think we could do better. Of course we will still be using clusters, verifying what the data analysis is indicating. But we may be able to check, engineer and determine the features and their cutoffs better. And finally, with all this data, I am hopeful that we will be able to have a discussion grounded in more data than just solving times.

Machine Learning and SAT

I have lately been digging myself into a deep hole with machine learning. While doing that, it occurred to me that the SAT community has essentially been trying to imitate some parts of ML in a somewhat poor way. Let me explain.

CryptoMiniSat and clause cleaning strategy selection

When CryptoMiniSat won the SAT Race of 2010, it was in large part because I realized that glucose at the time was essentially unable to solve cryptographic problems. I devised a system where I could detect which problems were cryptographic. It checked the activity stability of variables and if they were more stable than a threshold, it was decided that the problem was cryptographic. Cryptographic problems were then solved using a geometric restart strategy with clause activities for learnt database cleaning. Without this hack, it would have been impossible to win the competition.

Why it’s hard to eliminate variables

Let's examine why it's hard to eliminate variables. I remember the code I looked at in SatElite that did it: it was crazy clean code and looked like it was pretty easy to perform. In this post I'll examine how that simple code became more than 1'000 lines of code today.

What needs to be done, at first sight

At first sight, variable elimination is easy. We just:

  1. Build occurrence lists
  2. Pick a variable to eliminate
  3. Resolve every clause containing the positive literal of the variable with every clause containing the negative literal.
  4. Add newly resolved clauses into the system
  5. Remove original clauses.
  6. Goto 2.

These are all pretty simple steps at first sight, and one can imagine that implementing them is maybe 50-100 lines of code, no more. So, let’s examine them one-by-one to see how they get complicated.

Building occurrence lists

The idea is that we simply take every single clause, and for every literal it has, we insert a pointer to the clause into an array for that literal's occurrences. This sounds easy, but what happens if we are given 1M clauses, each with 1000 literals on average? If you think this is crazy, it isn't — it does in fact happen.
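As a sketch, the basic linking step itself is trivial — the trouble is purely the size of what gets linked:

```python
# Build occurrence lists: one list of clause indices per literal.
from collections import defaultdict

def build_occur_lists(clauses):
    occur = defaultdict(list)          # literal -> indices of clauses containing it
    for idx, clause in enumerate(clauses):
        for lit in clause:
            occur[lit].append(idx)     # with 1M clauses of ~1000 literals this is ~1e9 entries
    return occur
```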

One option is to estimate the amount of memory we would use and abort early because we don't want to run out of memory: first we check the potential size, then we link the clauses in. Unfortunately, aborting means we can't do variable elimination at all on such instances. Another possibility is to link in clauses only partially. For example, we don't link in clauses that are redundant but too long. Redundant clauses are ignored during resolution when eliminating, so this is OK, but then we will have to clean these clauses up later, when finishing up. However, if a redundant clause that hasn't been linked in backward-subsumes an irredundant clause (and thus becomes irredundant itself), we have to link it in asap. Optimisation leads to complexity.

We don't just want to link these clauses into some random data structure. I believe it was Armin Biere who put this idea into my head, or maybe someone else, but re-using watchlists for occurrence lists means we use our memory resources better: there won't be so much fragmentation. Furthermore, an advanced SAT solver uses implicit binary & tertiary clauses, so those are already linked into the watchlists. That saves memory.

Picking a variable to eliminate

The order in which you eliminate variables is a defining factor in the speed of the final solver. It is crucially important that this is done well. So, what can we do? We can either use some heuristic or precisely calculate the gain for each variable, and eliminate the best guess/calculated one first. These are both greedy algorithms, but I think given the complexity of the task, they are the best at hand.

Using precise calculation is easy: we just resolve all the relevant clauses but don't add the resolvents. It's very expensive, though. A better approach is to use a heuristic. Logically, clauses that have few literals in them are unlikely to resolve into tautologies. It's unlikely that two binary clauses' resolvent becomes a tautology. It's however likely that large clauses become tautological once resolved. I take this into account when calculating the elimination cost for a variable. Since redundant clauses are linked into the occurrence lists (so that I can subsume them), I have to skip them here.

It’s not enough to calculate the heuristic once, of course. We have to re-calculate after every elimination — the playing field has changed. Thus, for every clause you removed, you have to keep in mind which variables were affected, and re-calculate the cost for each after every variable elimination.

Resolving clauses

The base is easy. We add literals to a new array of literals and mark the literals that have been added in a quick-lookup array. If the opposite of a literal is added, the markings tell us and we can skip the rest — the resolvent is tautological. Things get hairy if the clause is not tautological.
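A sketch of this marking scheme follows. In a real solver the lookup array is allocated once and cleaned after use rather than re-created for every resolution; this toy version re-creates it each time:

```python
# Resolution with a quick-lookup "seen" array; literals are DIMACS-style ints.
def resolve_marked(c1, c2, var, num_vars):
    seen = [0] * (num_vars + 1)        # 0 = unseen, +1/-1 = polarity already added
    out = []
    for lit in list(c1) + list(c2):
        if abs(lit) == var:
            continue                   # drop the pivot variable (both polarities)
        sign = 1 if lit > 0 else -1
        if seen[abs(lit)] == -sign:
            return None                # opposite polarity present: tautology, skip the rest
        if seen[abs(lit)] == 0:
            seen[abs(lit)] = sign
            out.append(lit)
    return out
```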

What if the new clause is subsumed by already-existing clauses? Should we check for this? This is called forward-subsumption, and it's really expensive. Backward subsumption (which asks the question 'Does this clause subsume others?' instead of 'Is this clause subsumed by others?') would be cheaper, but that's not what we need here. We can thus try to subsume the clause only by e.g. binary & tertiary clauses and hope for the best.

What if the new clause can be subsumed by stamps? That’s easy to check for, but if the new clause was used to create the stamp, that would be a self-dependency loop and not adding the resolvent would lead to an incorrect result. We can use the stamps as long as the resolving clauses were not needed for the stamp: i.e. they are not binary clauses and on-the-fly hyper-binary resolution was used during every step of stamp generation. A similar logic goes for using the implication cache.

We could also virtually extend the clause with literals using watchlists/stamps/impl. cache and then try to subsume that virtual clause. I forgot what 3-letter acronym Biere et al. gave to this method (it's one of the 12 on slide 25 here), but, except for the acronym, this idea is pretty simple. You take a binary clause, e.g. x V ~y, and if x is in the newly created clause, but neither y nor ~y is, you add y to the clause. The clause is now bigger, so it has a larger chance of being subsumed. You now perform forward subsumption as above, but with the extended clause. Also, take care not to subsume clauses with themselves, which, as you might imagine, can get hairy.

If all of this sounds a bit intricate, this is not even the difficult part. The difficult part is keeping track of time. Where of course by time I don’t actually mean seconds — I mean computation steps that you have to define one way or another and increment counters and set limits. Remember: all this has to be deterministic.

Doing all of the above with a small but complicated instance is super-fast, under 0.001s. With a weird instance where one single literal may occur in more than a million clauses, it can be very, very expensive even for one single try — over 100s. That's about 5 orders of magnitude of difference. So, you have to be careful. The resolution we cannot skip, but we can abort it (and indicate it up in the call tree). Some of the other steps we can abort, but then the whole resolution has to be re-started. Some of the above is not critical at all, so you have to use a different time-limit for some steps, and mark them as too expensive, so at least the basic things get done. This gets complicated, because e.g. forward subsumption might be re-used in other parts of the solver, so you have to use a time-limit that isn't global.

Adding the newly resolved clauses

Adding clauses is simple: we create and link them in. However, we can do more. Since backward-subsumption is fast, we can do that with the newly created clauses. Note that this means the newly created clause could subsume some of the original clauses it was created from — which means the resolvents should be pre-generated and kept in memory.

Another thing: since we know the new clause needs to be added, we might as well shorten it beforehand in any way we can. At this point, we can make use of all the watchlists, stamps and implication cache we have to shorten the new clause: there are no problems with self-dependencies. It will pay off. However, note that shortening the clause before adding it means that we will have to reverse-shorten it later, when this clause might be part of a group of clauses that is touched by a new variable elimination round. So, we are working against ourselves in a way — especially because reverse shortening is pretty expensive and hairy, as explained above.

Although this is obvious, we still have to take care of time-outs. For example, if resolution took so much time that we are already out of time, we must exit asap and not worry about the resolvents. Don't link, don't remove, just exit. Time is of the essence.

Removing the original clauses

Easy, just unlink them from the occurrence lists. I mean, easy if you don’t care about time, of course. Because unlinking is an O(N^2) operation if you have N clauses and all of them contain the same literal X — the N-long occurrence list of literal X has to be read and updated N times. So, we don’t do this.

First of all, a special case: the two occurrence lists of the variable we are removing can simply be .clear()-ed. They are no longer needed. Secondly, we shouldn't unlink clauses one-by-one. Instead, we should mark the clause as removed, and then not care about the clause later. Once variable elimination is finished, we do a sweep of all the occurrence lists and clauses and remove the clauses that have been marked. This means that e.g. forward and backward subsumption get more hairy (we shouldn't subsume with a clause that's been marked as removed but is still in the occurrence list), but that O(N^2) becomes O(N), which for problems where N is large makes quite a bit of difference. Like, the difference of 100s vs. 10s for the exact same thing.
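A sketch of the mark-and-sweep idea, with hypothetical names:

```python
# Lazy clause removal: clauses get a "removed" flag; occurrence lists are
# only cleaned once, at the end, in a single pass.
class Clause:
    def __init__(self, lits):
        self.lits = lits
        self.removed = False           # marked instead of being unlinked immediately

def mark_removed(clause):
    clause.removed = True              # O(1), no occurrence-list traversal

def sweep(occur):
    # one O(N) pass over all occurrence lists, instead of O(N^2) unlinking
    for lit in occur:
        occur[lit] = [c for c in occur[lit] if not c.removed]
```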

The untold horrors

On top of what's above, you might like to generate some statistics about what worked and what didn't. You might like to dump these statistics to a database. You might like to not create resolvents that are not needed because the irredundant clauses form an AND/ITE gate. Or multiple gates. You might like to eliminate only a subset of variables at each call so that you don't make your system too sparse and thus reduce arc consistency. You might want to vary this limit based on the problem at hand. You might want to do many other things that are not detailed above.


Once I read through the above, I realized I kind of missed the essence: time-outs. It's mentioned here and there, but it's much more critical than it seems and makes things a hell of a lot harder. How do you cleanly exit from the middle of reverse-shortening while resolving because you ran out of time? I could just bury my head in the sand of course and say: I don't care. Or, I could make some messy algorithm that checks the return value of each call and returns a special value in case of time-outs. This needs to be done for every level of the call, which can be pretty deep, unless you like writing 1'500-line functions. I wanted to say writing & reading, but, really, nobody reads 1'500-line functions. They are throw-away, write-only code.

A note on learnt clauses

Learnt clauses are clauses derived while searching for a solution with a SAT solver in a CNF. They are at the heart of every modern so-called “CDCL” or “Conflict-Driven Clause-Learning” SAT solver. SAT solver writers make a very important difference between learnt and original clauses. In this blog post I’ll talk a little bit about this distinction, why it is important to make it, and why we might want to relax that distinction in the future.

A bit of terminology

First, let me call “learnt” clauses “reducible” and original clauses “irreducible”. This terminology was invented by Armin Biere I believe, and it is conceptually very important.

If a clause is irreducible it means that if I remove that clause from the clause database and solve the remaining system of constraints, I might end up with a solution that is not a solution to the original problem. However, these clauses might not be the “original” clauses — they might have been shortened, changed, or otherwise manipulated such as through equivalent literal replacement, strengthening, etc.

Reducible clauses on the other hand are clauses that I can freely remove from the clause database without the risk of finding a solution that doesn’t satisfy the original set of constraints. These clauses could be called “learnt” but strictly speaking they might not have been learnt through the 1st UIP learning process. They could have been added through hyper-binary resolution, they could have been 1UIP clauses that have been shortened/changed, or clauses obtained through other means such as Gaussian Elimination or other high-level methods.

The distinction

Reducible clauses are typically handled “without care” in a SAT solver. For example, during bounded variable elimination (BVE) resolutions are not carried out with reducible clauses. Only irreducible clauses are resolved with each other and are added back to the clause database. This means that during variable elimination information is lost. For this reason, when bounded variable addition (BVA) is carried out, one would not count the simplification obtained through the removal of reducible clauses, as BVE could then completely undo BVA. Naturally, the heuristics in both of these systems only count irreducible clauses.

Reducible clauses are also regularly removed or 'cleaned' from the clause database. The heuristics to perform this have been a hot topic in the past years and continue to be a very interesting research problem. In particular, the solver Glucose has won multiple competitions by mostly tuning this heuristic. Reducible clauses need to be cleaned from the clause database so that they won't slow the solver down too much. Although they represent information, if too many of them are present, propagation speed grinds to a near-halt. A balance must be achieved, and the balance lately has shifted much towards the "clean as much as possible" side — we only need to observe the percentage of clauses cleaned between MiniSat and recent Glucose to confirm this.

An observation about glues

Glues (used first by Glucose) are an interesting heuristic in that they are static in a certain way: they never degrade. Once a clause achieves glue status 2 (the lowest, and best), it can never lose this status. This is not true of dynamic heuristics such as clause activities (MiniSat) or other usability metrics (CryptoMiniSat 3). They are highly dynamic and will delete a clause eventually if it fails to perform well after a while. This makes a big difference: with glues, some reducible clauses will never be deleted from the clause database, as they have achieved a high enough status that most new clauses will have a lower status (a higher glue) and will be deleted instead in the next cleaning run.

Since Glucose doesn’t perform variable elimination (or basically any other optimization that could forcibly remove reducible clauses), some reducible clauses are essentially “locked” into the clause database, and are never removed. These reducible clauses act as if they were irreducible.

It’s also interesting to note that glues are not static: they are in fact updated. The way they are updated, however, is very particular: they can obtain a lower glue number (a higher chance of not being knocked out) through some chance encounters while propagating. So, if they are propagated often enough, they have a higher chance of obtaining a lower glue number — essentially having a higher chance to be locked into the database.

Some speculation about glues

What if these reducible clauses that are locked into the clause database are an important ingredient in giving glues the edge? In other words, what if it’s not only the actual glue number that is so wildly good at guessing the usefulness of a reducible clause, instead the fact that their calculation method doesn’t allow some reducible clauses ever to be removed also significantly helps?

To me, this sounds like a possibility. While searching and performing conflict analysis SAT solvers are essentially building a chain of lemmas, a proof. In a sense, constantly removing reducible clauses is like building a house and then knocking a good number of bricks out every once in a while. If those bricks are at the foundation of the system, what’s above might collapse. If there are however reducible clauses that are never “knocked out”, they can act as a strong foundation. Of course, it’s a good idea to be able to predict what is a good foundation, and I believe glues are good at that (though I think there could be other, maybe better measures invented). However, the fact that some of them are never removed may also play a significant role in their success.

Locking clauses

Bounded variable addition is potentially a very strong system that could help in shortening proofs. However, due to the original heuristics of BVE it cannot be applied if the clauses it removes are only reducible. So, it can only shorten the description of the original problem (and maybe incidentally some of the reducible clauses) but not the reducible clauses themselves. This is clearly not optimal for shortening the proof. I don't know how lingeling performs BVA and BVE, but I wouldn't be surprised if it has some heuristic where it treats some reducible clauses as irreducible (thereby locking them) so that it could leverage the compression function of BVA over the field of reducible clauses.

Unfortunately, lingeling's code is hard to read, and it's proprietary code, so I'd rather not read it in case some licensing problems turn up. No other SAT solver performs BVA as an in-processing method (riss performs it only as pre-processing, though it is capable of performing BVA as in-processing), so I'm left on my own to guess this and code it accordingly.

UPDATE: According to Norbert Manthey lingeling doesn’t perform BVA at all. This is more than a little surprising.

End notes

I believe it was first Vegard Nossum who put into my head the idea of locking some reducible clauses into the database. It only occurred to me later that glues automatically achieve that, and furthermore, they seem to automatically lock oft-propagated reducible clauses.

There are some problems with the above logic, though. I believe lingeling increments the glue counter of some (all?) reducible clauses on a regular basis, and lingeling is a good solver. That would defeat the above logic, though the precise way glues are incremented (and the way they are cleaned) in lingeling is not entirely clear to me. So some of the above could still hold. Furthermore, lingeling could be so well-performing for other reasons — there is more to SAT solvers than just search and resolution. Lately, up to 50% or more of the time spent in modern SAT solvers can be spent performing actions other than search.

Why CryptoMiniSat 3.3 doesn’t have XORs

I’ve had a number of inquiries about why CryptoMiniSat 3.3 doesn’t have native XOR clause support. Let me explain what I think the benefits and the drawbacks are of having native XOR support and why I made the choice of removing them.


The most trivial benefit is that users don't have to translate XORs into plain CNF. Although transforming is rather easy, some prefer not to go through the hassle. The principle is to cut the XOR into smaller XORs by introducing new variables and then representing the cut-down XORs by banning all the wrong combinations. For example, the XOR x+y+z+w=0 is cut into two XORs x+y+r=0, z+w+r=0, and then x+y+r=0 is represented as x ∨ y ∨ ¬r, x ∨ ¬y ∨ r, …
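Here is a rough sketch of that cutting in Python, assuming DIMACS-style integer variables. This illustrates the principle only, not how any particular solver or front-end implements it:

```python
# Split a long XOR constraint into 3-long XORs with fresh linking variables,
# then encode each small XOR by banning every assignment with the wrong parity.
from itertools import product

def xor_to_cnf(xor_vars, rhs, next_free_var):
    """xor_vars: list of variable numbers, rhs: 0 or 1. Returns (clauses, next_free_var)."""
    pieces, cur = [], list(xor_vars)
    while len(cur) > 3:
        link = next_free_var
        next_free_var += 1
        pieces.append((cur[:2] + [link], 0))    # x + y + link = 0
        cur = [link] + cur[2:]
    pieces.append((cur, rhs))

    clauses = []
    for piece_vars, piece_rhs in pieces:
        for values in product([0, 1], repeat=len(piece_vars)):
            if sum(values) % 2 != piece_rhs:    # this assignment violates the small XOR...
                clauses.append([v if val == 0 else -v   # ...so ban it with one clause
                                for v, val in zip(piece_vars, values)])
    return clauses, next_free_var

# e.g. xor_to_cnf([1, 2, 3, 4], 0, 5) cuts x1+x2+x3+x4=0 into x1+x2+x5=0 and x5+x3+x4=0
```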

A more substantial benefit is that propagation and conflict generation can be done natively. This is quite advantageous as it means that far fewer clauses and variables need to be visited while propagating and generating the UIP conflict. Fewer variables and clauses visited means fewer cache misses, and due to the more compact representation, less memory overhead and thus even better cache usage. (For the aficionados: an interesting side-effect of XORs is that blocked literals cannot be used for XOR clauses.)

The biggest benefit of having native XORs is that Gaussian elimination can be carried out at every level of the search tree, not just at top-level. Although most people I have talked with still think that CryptoMiniSat 2 performs Gaussian elimination only at top-level, this is not the case. Performing Gauss at top-level is a two-day hack. Performing it at every level took me about half a year to implement (and about 5K lines of code). It requires leaving traces of past computations and computing reasons for propagations and conflicts indicated by the Gaussian elimination algorithm. According to my highly unofficial count, exactly 2 people use Gaussian elimination at deeper than top-level, Vegard Nossum and Kuldeep S Meel. Out of these two users, only the latter has indicated speedups. It speeds up crypto problems, but not many others, so it's completely disabled by default.


The main drawback of having native XOR clauses alongside normal CNF clauses is the loss of simplicity and universality. Let me explain in detail.

Loss of simplicity. Once a variable is present in both a normal and an XOR clause, it cannot be eliminated in a simple way. I don't even want to try to express the clauses that would result from doing varelim in such cases — it's probably horribly convoluted. All algorithms in the code now need to check for XORs. On-the-fly clause strengthening cannot strengthen an XOR with a normal clause, of course. Clause cleaning has to take into account that when a 3-long XOR is shortened to a 2-long one, it indicates a new equivalent literal and that may lead to immediate UNSAT. I could go on, but the list is long.

Another problem is that XORs make the state more complex. Until now there were only clauses of the form a ∨ ¬b ∨ …, sometimes specially stored such as binary clauses. The state of the solver goes through many transformations — think of them as llvm passes — while it is in-processed and solved. The more complex the state, the larger and the more complex the code. While a simple clause-cleaning algorithm can be expressed in about 20 lines of code without implicit binary & tertiary clauses, the same algo blows up to about 100 lines of code with those optimizations. Once you add XOR clauses, it jumps to around double that. We just made our code 10x larger. Imagine adding native support for, e.g. at-most-n.

Loss of universality. If a variable is only in XOR clauses, it can now be eliminated at the XOR level. A pure variable at the XOR level indicates an XOR clause that can be removed and later always satisfied. XOR subsumption is when one XOR subsumes another: the larger can be removed and their XOR added in. All of these are implemented in CMS 2. Probably there are many others out there, though. We could try to implement them all, but this is a whole new universe that may open up the gates of hell. Worst of all, most of these techniques on XORs are probably already simulated by in-processing techniques at the CNF level. Pure XOR variables are simulated by blocked clause elimination. XOR-based variable elimination doesn't make practical sense at the CNF level, but it could be simulated by normal variable elimination if ignoring heuristics. I could go on.

What CryptoMiniSat 3.3 does

The compromise I came up with for CryptoMiniSat 3.0 is that I regularly search for XOR clauses at the top-level, perform top-level Gaussian elimination, and add the resulting new information to the CNF. Binary and unitary XOR clauses are the simplest to add back.

In the SAT'11 application benchmark set, out of 300 problems, 256 problems contain XOR clauses. The number of XOR clauses found, when they are found, is on average 2905, with an average size of 4.2 literals. Using Gaussian elimination, counting only when something was extracted, the average number of literal equivalences (binary XORs) extracted is 130, while the average number of unitary clauses extracted is 0.15.

Here is a typical example output:

Here, about 104K XOR clauses were found (using a sophisticated algorithm) within 2.11s, all of size 3. These XOR clauses were subjected to disconnected component analysis (within the XOR sphere) and they were found to belong to 2185 components. It is important to cut XORs into groups, because Gaussian elimination is roughly an O(n^2.5) algorithm, and if we can reduce the n by having more than one matrix, the speed is significantly increased. Next, these 2185 matrices were Gauss-eliminated and were found to contain 1275 binary XOR clauses, which were quickly marked as equivalent literals.
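For illustration only, here is what Gaussian elimination over GF(2) on such an XOR matrix boils down to, in toy Python. The real implementation uses a dense, bit-packed representation and is far faster; the function and its row format here are made up:

```python
# GF(2) elimination on XOR rows given as (set_of_vars, rhs); reports the
# unit clauses and binary XORs (i.e. literal equivalences) that fall out.
def gauss_xors(xors):
    rows = [(set(vs), rhs) for vs, rhs in xors]
    for i in range(len(rows)):
        vs, rhs = rows[i]
        if not vs:
            continue
        pivot = min(vs)                            # pick any variable of this row as pivot
        for j in range(len(rows)):
            if j != i and pivot in rows[j][0]:     # XOR row i into every other row with the pivot
                rows[j] = (rows[j][0] ^ vs, rows[j][1] ^ rhs)
    units  = [(vs, rhs) for vs, rhs in rows if len(vs) == 1]
    binxor = [(vs, rhs) for vs, rhs in rows if len(vs) == 2]   # x = y or x = ~y
    return units, binxor

# e.g. x1+x2=0, x2+x3=0, x1+x3+x4=1 yields the unit x4=1 and the equivalences x1=x3, x2=x3
print(gauss_xors([({1, 2}, 0), ({2, 3}, 0), ({1, 3, 4}, 1)]))
```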

To perform Gaussian elimination at top-level, CryptoMiniSat uses the excellent m4ri library, maintained by my old colleague and friend Martin Albrecht. It's a lightweight yet extremely fast Gaussian elimination system using all sorts of nifty tricks to do its job. Unfortunately, for large matrices it can still take a long while, as the algorithm itself is not cheap, so I have a cut-off for matrices that are too large.

What a future CryptoMiniSat 3.x could do

Once we have extracted the XORs, we could just as well keep them around during search and perform Gaussian elimination at levels below the top-level. This would bring us close to the old system in terms of real performance — native propagation (which would be unavailable) can only give a 1.5x-2x speedup, but Gaussian elimination at every level can give much more. I would need to clean up a lot of code and then, maybe, this would work. Maybe I'll do this one day. Though, after spending weeks doing it, probably people will still believe it only does top-level. At least right now, that is indeed the case.


Implementing a new theory such as XOR deep into the guts of a SAT solver is neither easy nor does it provide a clear advantage in most situations. Those who push for these native theories have either not tried implementing them in a complicated solver such as lingeling/SatELite/CryptoMiniSat/clasp, or have already bitten the bullet as the clasp group and I did, and it is probably limiting them in extending their systems with new techniques. The resulting larger internal state leads to edge cases, exceptions, and much-much more code.