Multi-threading

April 23, 2014SATParallel SAT Solvingmsoos

Just a short note: this Easter holiday I got a bit bored and did a mini-threading system for CryptoMiniSatv4. It will now be able to compete in the multi-threading track after quite a number of years of absence. Last time I did this, CryptoMiniSat won the UNSAT multi-threaded track, which was a bit funny because all I shared were unitary and binary clauses. For the simplicity and the fun, I’m doing the same this time around. Let’s hope all the super-complicated and extra-theoretical stuff will get won over.

Non-blocking datastructs

Some authors implement a non-blocking data-sharing construct to gain more performance out of the system. I think/hope that what is shared is more important than how. I use one global lock for the shared unit and one for the shared binary clauses. Since there won’t be more than 8 threads under normal circumstances and I have to lock about once per 10sec per thread for about 0.01s, there won’t be any contention. Under these circumstances, simplicity wins over the complexity of using non-locking datastructures. Also, these fancy datastructs probably don’t have an implementation that is C++11 (i.e. portable) and is already implemented in gcc 4.7.1 (which doesn’t support a number of the threading constructs), the compiler used in the competition.

Library usage

This time around I didn’t mess it up: multi-threading can be used from the library interface. The system completely hides the fact that multiple threads are competing in the background to solve the problem. The threads are cleanly and safely stopped and then re-started when the user asks for another solve() operation, optionally with assumptions. In other words, the user doesn’t see and doesn’t have to bother with the complexity of threads. All of that is abstracted away behind a neat and clean API. This means that e.g. the different threads can have radically different variable numbers. They could have renumbered the variables differently, added new and different BVA variables, etc. All of this is hidden away and the user is presented with and always communicates using his/her own variable numbers.

Final notes

This simple multi-threading system means that I can now port my old MPI code to use the new CryptoMiniSat. Maybe, once again, CryptoMiniSat will run on a cluster. Fun times ahead!

Why it’s hard to eliminate variables

April 16, 2014Research, SATOptimisation, resolution, simplification, variable eliminationmsoos

Let’s examine why it’s hard to eliminate variables. I remember the code I looked at in SatElite that did it: it was crazy clean code and looked like it was pretty easy to perform. In this post I’ll examine how that simple code became more than a 1’000 lines of code today.

What needs to be done, at first sight

At first sight, variable elimination is easy. We just:

Build occurrence lists
Pick a variable to eliminate
Resolve every clause having the positive literal of the variable with negative ones.
Add newly resolved clauses into the system
Remove original clauses.
Goto 2.

These are all pretty simple steps at first sight, and one can imagine that implementing them is maybe 50-100 lines of code, no more. So, let’s examine them one-by-one to see how they get complicated.

Building occurrence lists

The idea is that we simply take every single clause, and for every literal they have, we insert a pointer to the clause into an array for that literal’s occurrences. This sounds easy, but what happens if we are given 1M clauses, each with 1000 literals on average? If you think this is crazy, it isn’t, and does in fact happen.

One option is we estimate the amount of memory we would use and abort early because we don’t want to run out of memory. So, first we check the potential size, then we link them in. Unfortunately, this means we can’t do variable elimination at all. Another possibility is that we link in clauses only partially. For example, we don’t link in clauses that are redundant but too long. Redundant clauses are ignored during resolution when eliminating, so this is OK, but then we will have to clean these clauses up later, when finishing up. However, if a redundant clause that hasn’t been linked in backward-subsumes an irredudant clause (and thus becomes irredundant itself), we have to link it in asap. Optimisation leads to complexity.

We don’t just want to link these clauses in to some random datastructure. I believe it was Armin Biere who put this idea into my head, or maybe someone else, but re-using watchlists for occurrence lists means we use our memory resources better: there won’t be so much fragmentation. Furthermore, an advanced SAT solver uses implicit binary & tertiary clauses, so those are linked in already into the watchlists. That saves memory.

Picking a variable to eliminate

The order in which you eliminate clauses is a defining part of the speed we get with the final solver. It is crucially important that this is done well. So, what can we do? We can either use some heuristic or precisely calculate the gain for each variable, and eliminate the best guess/calculated one first. These are both greedy algorithms but I think given the complexity of the task, they are the best at hand.

Using precise calculation is easy, we just resolve all the relevant clauses but don’t add the resolvents. It’s very expensive though. A better approach is to use a heuristic. Logically, clauses that have few literals in them are likely not to resolve such that they become tautologies. It’s unlikely that two binary clauses’ resolvent becomes a tautology. It’s however likely that large clauses become tautological once resolved. I take this into account when calculating elimination cost for variable. Since redundant clauses are linked in the occurrence lists so that I can subsume them, I have to skip them.

It’s not enough to calculate the heuristic once, of course. We have to re-calculate after every elimination — the playing field has changed. Thus, for every clause you removed, you have to keep in mind which variables were affected, and re-calculate the cost for each after every variable elimination.

Resolving clauses

The base is easy. We add literals to a new array of literls and mark the literals that have been added in a quick-lookup array. If the opposite of a literal is added, the markings tell us and we can skip the rest — the resolvent is tautological. Things get hairy if the clause is not tautological.

What if the new clause is subsumed by already-existing clauses? Should we check for this? This is called forward-subsumption, and it’s really expensive. Backward subsumption (which asks the question ‘Does this clause subsume others?’ instead of ‘Is this clause subsumed by others?’) would be cheaper, but that’s not the case here. We can thus try to subsume the clause only by e.g. binary&tertiary clauses and hope for the best.

What if the new clause can be subsumed by stamps? That’s easy to check for, but if the new clause was used to create the stamp, that would be a self-dependency loop and not adding the resolvent would lead to an incorrect result. We can use the stamps as long as the resolving clauses were not needed for the stamp: i.e. they are not binary clauses and on-the-fly hyper-binary resolution was used during every step of stamp generation. A similar logic goes for using the implication cache.

We could also virtually extend the clause with literals using watchlists/stamps/impl. cache and then try to subsume that virtual clause. I forgot what 3-letter acronym Biere et al. gave to this method (it’s one of the 12 on slide 25 here), but, except for the acronym, this idea is pretty simple. You take a binary clause, e.g. xV~y, and if x is in the newly created clause, but y and ~y is not, you add y to the clause. The clause is now bigger, so has a larger chance to be subsumed. You now perform forward subsumption as above, but with the extended clause. Also, take care not to subsume clauses with themselves, which, as you might imagine, can get hairy.

If all of this sounds a bit intricate, this is not even the difficult part. The difficult part is keeping track of time. Where of course by time I don’t actually mean seconds — I mean computation steps that you have to define one way or another and increment counters and set limits. Remember: all this has to be deterministic.

Doing all of the above with a small but complicated instance is super-fast, under 0.001s. With a weird instance where one single literal may occur in more than a million clauses, it can be very-very expensive even for one single try — over 100s. That’s about 5 orders of magnitude of difference. So, you have to be careful. The resolution we cannot skip, but we can abort it (and indicate it up in the call tree). Some of the others we can abort, but then the whole resolution has to be re-started. Some of the above is not critical at all, so you have to use a different time-limit for some, and mark them as too expensive, so at least the basic things get done. This gets complicated, because e.g. forward-subsumption you might want to re-use at other parts of the solver so you have to use a time-limit that isn’t global.

Adding the newly resolved clauses

Adding clauses is simple: we create and link them in. However, we can do more. Since backward-subsumption is fast, we can do that with the newly created clauses. Note that this means the newly created clause could subsume some of the original clauses it was created from — which means the resolvents should be pre-generated and kept in memory.

Another thing: since we know the new clause needs to be added, we might as well shorten it before in any way we can. At this point, we can make use of all the watchlists, stamps and implication cache we have to shorten the new clause: there are no problems with self-dependencies. It will pay off. However, note that shortening the clause before adding it means that we will have to reverse-shorten it later, when this clause might be part of a group of clauses that is touched by a new variable elimination round. So, we are working against ourselves in a way — especially because reverse shortening is pretty expensive and hairy as explained above.

Although this is obvious, but we still have to take care of time-outs. For example, if resolution took so much time that we are already out of time, we must exit asap and not worry about the resolvents. Don’t link, don’t remove, just exit. Time is of essence.

Removing the original clauses

Easy, just unlink them from the occurrence lists. I mean, easy if you don’t care about time, of course. Because unlinking is an O(N^2) operation if you have N clauses and all of them contain the same literal X — the N-long occurrence list of literal X has to be read and updated N times. So, we don’t do this.

First of all, a special case: the two occurrence lists of the variable we are removing can simply be .clear()-ed. It’s no longer needed. Secondly, we shouldn’t unlink clauses one-by-one. Instead, we should mark the clause as removed, and then not care about the clause later. Once variable elimination is finished, we do a sweep of all the occurrence lists and clauses and remove the clauses that have been marked. This means that e.g. forward and backward subsumption gets more hairy (we shouldn’t subsume with a clause that’s been marked as removed but is still in the occurrence list) but that O(N^2) becomes O(N) which for problems where N is large makes quite a bit of difference. Like, the difference of 100s vs. 10s for a the same exact thing.

The untold horrors

On top of what’s above, you might like to generate some statistics about what worked and what didn’t. You might like to dump these statistics to a database. You might like to not create resolutions that are not needed as the irreduntant clauses form an AND/ITE gate. Or multiple gates. You might like to eliminate only a subset of variables at each call so that you don’t make your system too sparse and thus reduce arc consistency. You might want to vary this limit based on the problem at hand. You might want to do many other things that are not detailed above.

Conclusions

Once I read through the above, I realized I kind of missed the essence: time-outs. It’s mentioned here and there, but it’s much more critical than it seems and makes things a hell of a lot harder. How do you cleanly exit from the middle of reverse-shortening while resolving because you ran out of time? I could just bury my head in sand of course and say: I don’t care. Or, I could make some messy algorithm that checks return values of each call and return a special value in case of time-outs. This needs to be done for every level of the call, which can be pretty deep, unless you like writing 1’500 line functions. I wanted to say writing&reading, but, really, nobody reads 1’500 line functions. They are throw-away,write-only code.

CryptoMiniSat 4 released

April 9, 2014SAT, ToolsCollaborative effort, Release, SAT Racemsoos

[wpdm_file id=1]
CryptoMiniSat 4 is now available for download. This version brings a number of substantial improvements and picks up speed to be as good as the best solvers out there. It now has a much improved library interface as well as a simple but powerful python interface.

SAT Competition 2014

This release is made ahead of the SAT competition 2014 deadlines so anybody can compete and actually have a chance to win. Unfortunately, the way I see it, it’s not possible to use newer versions of lingeling or riss (see license for for details), MiniSat is rather old and glucose doesn’t have new simplification techniques. If you feel the same way, and you rather not write 30K LoC of code, you might enjoy playing with CryptoMiniSat v4 and submitting it to the competition. You can change as much as you like, it’s LGPLv2 — just don’t call it CryptoMiniSat.

Improvements and techniques

Here is a non-exhaustive list of techniques used in CryptoMiniSat v4:

Variable elimination and replacement, strengthening, subsumption, vivification
On-the-fly stamping, literal caching, hyper-binary resolution and transitive reduction during failed literal probing
Bounded variable addition with hack to allow 2-literal diff
DRUP-based unsatisfiable proof logging
Gate-based clause shortening and removal
XOR recovery and manipulation (NOTE: uses the M4RI library that is GPL, if you want LGPL, compile without it)
Precise time- and memory tracking. No time or memory-outs on weird CNFs
Precise usefulness tracking of all clauses
Clause usefulness-based redundant clause removal. Glues are not used by default, but glues are tracked and can be used (command line option)
Variable renumbering and variable number hiding. Thanks to this, XOR clauses are cut and the added variables are transparent to the user.
SQL-based data logging and AJAX-based powerful data display
And of course many-many more

All of the above are implemented as inprocessing techniques. I do not believe in preprocessing and the solver does not in fact use preprocessing at all — it immediately starts to solve instead. This, as everything else, is configurable and you can change it by passing `’–presimp 1’` as a command-line option. There are a total of 120 command-line options so you can tune the solver as you like.

Python interface

It’s intuitive and fun to use:

>>> from pycryptosat import Solver
>>> s = Solver()
>>> s.add_clause([-1])
>>> s.add_clause([1, 2])
>>> sat, solution = s.solve()
>>> print sat
True
>>> print solution[1]
False
>>> print solution[2]
True

You can even have assumptions:

>>> from pycryptosat import Solver
>>> s = Solver()
>>> s.add_clause([-1])
>>> sat, solution = s.solve([1])
>>> print sat
False
>>> sat, solution = s.solve()
>>> print sat
True

All the power of the SAT solver in a very accessible manner. XOR clauses are trivial, too:

>>> from pycryptosat import Solver
>>> s = Solver()
>>> s.add_xor_clause([1, 2], false)
>>> sat, solution = s.solve([1])
>>> print sat
True
>>> print solution[1]
True
>>> print solution[2]
True

Where the second argument is the right hand side (RHS) of the equation v1 XOR v2 = False.

C++ interface

Usage is pretty simple, and the header files have been significantly cleaned up:

#include 
#include 
#include 
using std::vector;
using namespace CMSat;

int main()
{
    Solver solver;
    vector clause;

    //adds "1 0"
    clause.push_back(Lit(0, false));
    solver.add_clause(clause);

    //adds "-2 0"
    clause.clear();
    clause.push_back(Lit(1, true));
    solver.add_clause(clause);

    //adds "-1 2 3 0"
    clause.clear();
    clause.push_back(Lit(0, true));
    clause.push_back(Lit(1, false));
    clause.push_back(Lit(2, false));
    solver.add_clause(clause);

    lbool ret = solver.solve();
    assert(ret == l_True);
    assert(solver.get_model()[0] == l_True);
    assert(solver.get_model()[1] == l_False);
    assert(solver.get_model()[2] == l_True);

    return 0;
}

Some suggestions where you can improve the solver to compete

Here is a non-exhaustive list of things that you can improve to win at the competition:

Add your own weird idea. You can add new variables if you like, use the occurrence lists already built, and take advantage of all the datastructures (such as stamps, literal cache) already present.
Tune the parameters. I only have exactly one i7-4770 to tune the parameters. You might have more. All parameters are accessible from command line, so tuning should be trivial.
Use glues to clean clauses. Or use a combination of glues and usefulness metrics. All the metrics are at your fingertips.
Make bounded variable addition work for learnt clauses. I could never figure this one out.
Improve the ordering of variable elimination. Makes a huge difference.
Try a different approach: I use the ‘heavy’ approach where I don’t remove all clauses that I can as I like strong propagation properties. You might try the ‘light’ approach where everything is removed if possible. Just set variable elimination to 100% and add blocked clause elimination. It might work.

For example, below is the code that calculates which clause should be cleaned or kept. You can clearly see how easily this can be changed using the data elements below:

bool Solver::reduceDBStructPropConfl::operator() (const ClOffset xOff, const ClOffset yOff) {
    const Clause* x = clAllocator.getPointer(xOff);
    const Clause* y = clAllocator.getPointer(yOff);

    uint64_t x_useful = x->stats.propagations_made
                        + x->stats.conflicts_made;
    uint64_t y_useful = y->stats.propagations_made
                        + y->stats.conflicts_made;
    return x_useful < y_useful;
}

//the data you can use to hack the above calculation:
struct ClauseStats
{
    uint32_t glue;    ///
If you were thinking about submitting your weird hack to the MiniSat hacktrack, think about doing the same to CrytoMiniSat v4. You might actually win the real competition. You can change as much as you like.
I will submit a description of CryptoMiniSat v4, your description can simply say that it's the same except for xyz that you changed. The point of the descriptions is so that people can read what you did and why and then comprehend the results in that light. Just explain carefully what you did and why, and you should be fine.
Thanks
Many-many thanks to Martin Maurer who has submitted over 100 bug reports through the GitHub issue system. Kudos to all who have helped me use, debug and improve the solver. To name just a few: Vegard Nossum, Martin Albrecht, Karsten Nohl, Luca Melette, Vijay Ganesh and Robert Aston.

Speeding up MiniSat with a one-liner

March 12, 2014Development, SAThack, MiniSatmsoos

All SAT solvers must have a tri-state class that can hold the values `true`, `false`, and `undefined`. Although it’s not hard to write code to represent such a class, it’s hard to write such it so that all typical operations are fast. In this blog post I will compare three different ways of doing it.

The original MiniSat 2.0 code

The original MiniSat 2.0 used the following code for such a class:

changed the code to the following:
lbool operator^(const bool b) const
{
  //return b ? lbool(-value) : lbool(value);
  return lbool(value * (-2*(char)b + 1));
}

This makes ‘extensive’ use of the ALU of the chip for the flip operator to avoid using the branch instruction. However, it keeps other operators cheap as per the first version. Let’s call this solution to the problem solution (3).
Comparison
I have tried to solve two different problems from the SAT Competition of 2009 with all three solutions. First is a typical electronic engineering problem, 9dlx_vliw_at_b_iq2.cnf. The second is a typical cryptographic problem, mizh-md5-47-3.cnf. Here are the timing results for my i7-3612QM (Ivy Bridge):


Problem
Solution(1)
Solution(2)
Solution(3)


9dlx_vliw_at_b_iq2.cnf
135.9s
138.8s
132.5s


mizh-md5-47-3.cnf
60.7s
60.5s
56.3s


So, in this very small test, solution(3) wins. 
And now for a bit of history. I have tried the trick in solution(3) about 3 years ago. Back then it seemed slower to perform the numerical trick than to use the branch. It turns out that on modern CPUs either the ALU is faster, branch misprediction is more costly, or both (or, the compilers have evolved to compile my numerical trickery into some weird thing). Anyway, it’s kind of a cool speedup for a one-liner.

On variable renumbering

December 18, 2013Development, SATBCP, bug, extended resolution, memory layout, variable eliminationmsoos

Variable renumbering in SAT solvers keeps a mapping between the external variable numbers that is visible to the users and the internal variable numbers that is visible the to the system. The trivial mapping that most SAT solvers use is the one-to-one mapping where there is no difference between outer and internal variables. A smart mapping doesn’t keep track of all data related to variables that have been set or eliminated internally, so the internal datastructures can be smaller.

Advantages

Having smaller internal data structures help in achieving a lower memory footprint and better cache usage.

The memory savings are useful because some CNFs have tens of millions of variables. If the solver uses the typical watched literal scheme, it needs 2 arrays for each variable. If we are using 64b pointers and 32b array sizes, it’s 32B for each variable, so 32MB for every million variable only to keep the watching literal array(!). I have seen people complaining that their 100M variable problem runs out of memory — if we count that right that’s 3.2GB of memory only to hold the watching literal array pointers and sizes, not any data at all.

As for the CPU cache benefits: modern CPUs work using cache lines which are e.g. 64B long on Intel Sandy Bridge. If half of the variables are set already, the array holding the variable values — which will be accessed non-stop during propagation — will contain 50% useless data. In practice the speedup achieved can be upwards of 10%.

The simple problems

One problem with having a renumbering scheme is that you need to keep track of which datastructure is numbered in which way. The easy solution is to renumber absolutely everything. This is costly, however, as the mapping has to change every once in a while when new variables have been set. In this case, if everything is renumbered, then the eliminated variables‘ data needs to be updated according to the new mapping every time. This might be quite significant. So, it’s best not to renumber that. Similarly, if disconnected component analysis is used, then the disconnected components’ saved solutions need to be renumbered as well, along with the clauses that have been moved to the components.

An approach I have found to be satisfying is to keep every dynamic datastructure such as variable states (eliminated/decomposed/etc.), variable values (True/False/Unknown), clauses’ literals, etc. renumbered, while keeping mostly static datastructures such as eliminated clauses or equivalent literal maps non-renumbered. This works very well in practice as it allows the main system to shuffle the mapping around while not caring about all the other systems’ data.

The hard problem

The above is all fine and dandy until bounded variable addition (BVA) comes to the scene. This technique adds new variables to the problem to simplify it. These new variables will look like new outer variables, which seems good at first sight: the system could simply print the solution to all variables except the last N that were added by BVA and are not part of the original problem. However, if the caller adds new variables after the call to solve(), what can we do? The actual variables by the caller and the BVA variables will be mixed up: start with a bunch of original variables, sprinkle the end with some BVA, then some original variables, then some BVA…

The trivial solution to this is to have another mapping, one that translates variable numbers between the BVA and non-BVA system. As you might imagine, this complicates everything. Another solution is to forcibly eliminate all BVA variables after the call to solve(), let the user add the new clauses, and perform BVA again. Another even more complicated solution is to keep track of the variables being added, then re-number all variables inside all datastructures to move all BVA variables to the end of the variable array. This is expensive but only needs to be done once after the call to solve(), which may be acceptable. Currently, CryptoMiniSat uses the trivial scheme. Maybe I’ll move to the last (and most complicated) system later on.

Conclusion

Variable renumbering is not for the faint of heart. Bugs become significantly harder to track, as all debug messages need to be translated to a common variable numbering or they make no sense at all. It’s also very easy to introduce bugs through variable renumbering. A truly difficult bug I had was when the disconnected component finder’s sub-solver renumbered its internal variables and when I tried to import some values from the sub-solver back to the main solver, I used the wrong variable numbers.

Wonderings of a SAT geek

A blog about SAT solving and cryptography