Category Archives: Research

Research-related information

Why CryptoMiniSat 3.3 doesn’t have XORs

August 1, 2013Research, SATGaussian elimination, XORmsoos

I’ve had a number of inquiries about why CryptoMiniSat 3.3 doesn’t have native XOR clause support. Let me explain what I think the benefits and the drawbacks are of having native XOR support and why I made the choice of removing them.

Benefits

The most trivial benefit is that users don’t have to translate XORs into plain CNF. Although transforming is rather easy, some prefer not to go through the hassle. The principle is to cut the XOR into smaller XORs by introducing new variables and then representing the cut-down XORs by banning all the wrong combinations. For example the the XOR $x+y+z+w=0$ is cut into two XORs $x+y+r=0$ , $z+w+r=0$ and then $x+y+r=0$ is represented as $x \vee y \vee r, x \vee \neg y \vee \neg r, \ldots$ .

A more substantial benefit is that propagation and conflict generation can be done natively. This is quite advantageous as it means that far less clauses and variables need to be visited while propagating and generating the UIP conflict. Less variables and clauses visited means less cache-misses and due to the more compact representation, less memory overhead and thus even better cache usage. (For the aficionados: an interesting side-effect of XORs is that blocked literals cannot be used for XOR clauses).

The biggest benefit of having native XORs is that Gaussian elimination can be carried out at every level of the search tree, not just at top-level. Although most people I have talked with still think that CryptoMiniSat 2 only performs Gaussian elimination only at top-level, this is not the case. Performing Gauss at top-level is a two-day hack. Performing it at every level took me about half a year to implement (and about 5K lines of code). It requires leaving traces of past computations and computing reasons for propagations and conflicts indicated by the Gaussian elimination algorithm. According to my highly unofficial count, exactly 2 people use Gaussian elimination at deeper than top-level, Vegard Nossum and Kuldeep S Meel. Out of these two users, only the latter have indicated speedups. It speeds up crypto problems, but not many others, so it’s completely disabled by default.

Drawbacks

The main drawback of having native XOR clauses along normal CNF clauses is the loss of simplicity and universality. Let me explain in detail.

Loss of simplicity. Once a variable present in both a normal and an XOR clause, it cannot be eliminated in a simple way. I don’t even want to try to express the clauses that would result from doing varelim in such cases — it’s probably horribly convoluted. All algorithms in the code now need to check for XORs. On-the-fly clause strengthening cannot strengthen an XOR with a normal clause, of course. Clause cleaning has to take into account that when a 3-long XOR is shortened to a 2-long one, it indicates a new equivalent literal and that may lead to immediate UNSAT. I could go on, but the list is long.

Another problem is that XORs make the state more complex. Until now there were only clauses of the form $a \vee \neg b \ldots$ , sometimes specially stored such as binary clauses. The state of the solver goes through many transformations — think of them as llvm passes — while it is in-processed and solved. The more complex the state, the larger and the more complex the code. While a simple clause-cleaning algorithm can be expressed in about 20 lines of code without implicit binary&tertiary clauses, the same algo blows up to about 100 lines of code with those optimizations. Once you add XOR clauses, it jumps to around double that. We just made our code 10x larger. Imagine adding native support for, e.g. at-most-n.

Loss of universality. If a variable is only XOR clauses, it can now be eliminated at the XOR level. A pure variable at the XOR level indicates an XOR clause that can be removed and later always satisfied. XOR subsumption is when an XOR subsumes the other: the larger can be removed and their XOR added in. All of these are implemented in CMS 2. Probably there are many others out there, though. We could try to implement them all, but this is a whole new universe that may open up the gates of hell. Worst of all, most of these techniques on XORs are probably already simulated by in-processing techniques at the CNF level. Pure XOR variables are simulated by blocked clause elimination. XOR-based variable elimination doesn’t make practical sense at the CNF level, but it could be simulated by normal variable elimination if ignoring heuristics. I could go on.

What CryptoMiniSat 3.3 does

The compromise I came up with for CryptoMiniSat 3.0 is that I regularly search for XOR clauses at the top-level, perform top-level Gaussian elimination, and add the resulting new information to the CNF. Binary and unitary XOR clauses are the simplest to add back.

In the SAT’11 application benchmark set, out of 300 problems, 256 problems contain XOR clauses. the number of XOR clauses found, when they are found, is on average 2905 with an average size of 4.2 literals. Using Gaussian elimination, counting only when something was extracted, the average number of literal equivalences (binary XORs) extracted is 130, while the number of unitary clauses extracted is 0.15.

Here is a typical example output:

c Reading file 'valves-gates-1-k617-unsat.shuffled-as.sat03-412.cnf.gz'
c -- header says num vars:         985042
c -- header says num clauses:     3113540
[.. after about 124K conflicts ..]
c XOR finding  Num XORs: 104091 avg size:  3.0 T: 2.11
c Cut XORs into 2185 block(s) sum vars: 180818 T: 0.11
c Extracted XOR info. Units: 0 Bins: 1275 0-depth-assigns: 0 T: 8.34

Here, about 104K XOR clauses were found (using a sophisticated algorithm) within 2.11s, all of size 3. These XOR clauses were subjected to disconnected component analysis (within the XOR sphere) and they were found to belong to 2185 components. It is important to cut XORs into groups, because Gaussian elimination is an $O(n^{\approx 2.5})$ algorithm and if we can reduce the n by having more than one matrix, the speed is significantly increased. Next, these 2185 matrices were Gauss-eliminated and were found to contain 1275 binary XOR clauses, which were quickly marked as equivalent literals.

To perform Gaussian elimination at top-level, CryptoMiniSat uses the excellent m4ri library, maintained by my old college and friend Martin Albrecht. It’s a lightweight yet extremely fast Gaussian elimination system using all sorts of nifty tricks to do its job. Unfortunately, for large matrices it can still take a long while as the algorithm itself is not cheap, so I have a cut-off for matrices that are too large.

What a future CryptoMiniSat 3.x could do

Once we have extracted the XORs, we could just as well keep them around during search and perform Gaussian elimination at levels below the top-level. This would bring us close to the old system in terms of real performance — native propagation (which would be unavailable) can only give a 1.5x-2x speedup, but Gaussian elimination at every level can give much more. I would need to clean up a lot of code and then, maybe, this would work. Maybe I’ll do this one day. Though, after spending weeks doing it, probably people will still believe it only does top-level. At least right now, it’s the case.

Conclusions

Implementing a new theory such as XOR deep into the guts of a SAT solver is neither easy nor does it provide a clear advantage in most situations. Those who push for these native theories have either not tried implementing them into a complicated solver such as lingeling/SatELite/CryptoMiniSat/clasp or have already bit the bullet such as the clasp group and I did, and it is probably limiting them in extending the system with new techniques. The resulting larger internal state leads to edge cases, exceptions, and much-much more code.

Handling disconnected components

July 31, 2013Research, SATautarky, componentsmsoos

In CryptoMiniSat 3.2 and above there is a disconnected component finding&solving system. This finds toplevel disconnected components regularly and solves them by launching an instance of the solver itself. If the component is unsatisfiable (UNSAT), the whole system is UNSAT and solving can be immediately aborted. If the component is satisfiable (SAT) the component’s solution is saved and later when printing the solution to the whole problem the component’s satisfying assignment is also printed.

Why deal with disconnected components?

Disconnected components in SAT instances are problematic because the search logic of the solver doesn’t effectively restrict its search to a single component at a time, making its strategy ineffective. Firstly, learnt clause cleaning and variable activity heuristics behave worse. Secondly, solutions to components are thrown away when the search restarts when another component’s solution is too difficult to find. Note that some solvers, e.g. Marijn Heule’s march, do restrict their search to one component at a time, though I personally don’t know of any other.

Disconnected components seem to be quite prevalent. In the SAT’11 competition‘s set of 300 Application instances, 53 contained disconnected components at top-level at the beginning of the search. On average the number of disconnected components for these 53 instances were 393(!). When doing top-level analysis on a regular basis during search, the number of problems containing disconnected components rises to 58 and the average number of components found decreases to 294.

Finding disconnected components

The algorithm I use to find disconnected components is relatively easy. It essentially performs the following: it initializes a set of datastructures that hold information about the found component(s) and then feeds each clause one-by-one to a function that creates new component(s) and/or merges existing components. The datastructures needed are:

Array ‘varToComp’ that indicates which variable is in which component. If a variable is in no component yet, it has special value MAX
A map of arrays ‘compToVar’ indexed by the component. For example, compToVar[10] points to the array of variables in component number 10
Variable ‘nextComp’, an ever-incrementing index indicating the maximum component number used
Variable ‘realComps’ indicating the actual number of components — some components counted by ‘nextComp’ may have been merged, and are now empty

The algorithm, in pseudo-python is:

nextComp = 0;
realComps = 0;
varToComp = [MAXVAR] * numVars();
compToVar = {};

for clause in clauses:
  handleClause(clause);

def handleClause(clause) :
  newComp = []
  compsToMerge = set();
  
  for lit in clause:
    var = lit.var();
    if varToComp[var] != MAX :
      comp = varToComp[var];
      compsToMerge.insert(comp);
    else:
      newComp.append(var);

 
    #no components to merge, just put all of them into one component
    if len(compsToMerge) == 1 :
      comp = compsToMerge.pop();
      for var in newComp:
        varToComp[var] = comp
        compsToVar[comp].append(var);

      return;
  
    #delete components to merge and put their variables into newComp
    for comp in compsToMerge:
      vars = compToVar[comp];
      for var in vars:
        newComp.append(var);

      compToVar.erase(comp);
      realComps--;
    
    #mark all variables in newComp belonging to 'nextComp'
    for var in newComp:
      varToComp[var] = nextComp;
    
    compToVar[nextComp] = newComp;
    nextComp++;
    realComps++;

There are ways to make this algorithm faster. A trivial one is to remove the ‘set()’ and use an array initialized to 0 and mark components already seen. A fixed-sized array of the size of variables will do the job. Similarly, the use of ‘nextComp’ forces us to use a map for ‘compToVar’ which is slow — a smarter algorithm will use a round-robin system with an array of arrays. We can also avoid having too many small components at the beginning by calling the handleClause() function with the longer clauses first. It doesn’t give any guarantees, but in practice works quite well.

Finally, we need to add approximate time measurements so that we could exit early in case the algorithm takes too much time. According to my measurements, on the SAT’11 Application instances it took only 0.186s on average to find all components, and in only 34 cases out of 1186 did it take longer than the timeout — which was set to around 7s. At the same time, the maximum time ever spent on finding components was 7.8s. In other words, it is almost free to execute the algorithm.

Handling disconnected components

CryptoMiniSat handles found components in the following way. First, it counts the number of components found. If there is only one component it simply continues solving it. If there are more than one, it orders them according to the number of variables inside and picks the smallest one, solves it with a sub-solver, and checks it solution. If it’s UNSAT, it exits with UNSAT, otherwise it saves the solution for later and picks the next smallest component until there is only one left. When there is only one left, it continues its job with this, largest, component.

The sub-solver CryptoMiniSat launches is a bit special. First of all, it renumbers the variables so if variables 100..120 are in a component, the sub-solver will be launched with variables 0..20. This saves memory in the sub-solver but it means variables must be back-numbered when extracting solutions. Since the subsolvers themselves also internally re-number variables for further speedup, this can get a little bit complicated.

Further, if the sub-solver is handling an extremely small problem (<50 vars) most of its internal functionality is turned off to reduce build-up and tear-down speed. This is important, because in case there are 5000 components we need to go through, even a 0.01s build-up&tear-down time is unacceptable. Finding 5000 components, by the way, is not some wild imaginary number: it happens with the instance transport-[...]10packages-2008seed.040 from SAT'11. This instance actually contains ~5500 components at beginning of the search at toplevel, with an average of 8 variables each.

Demo

Let’s see an example output from CryptoMiniSat:

c Reading file 'traffic_3b_unknown.cnf.gz'
[..]
c -- clauses added: [..] 533919 irredundant 
c -- vars added      39151
[..]
c Found components: 23 BP: 8.85M time: 0.06 s
c large component     0 size:        833
c large component     1 size:       1071
c large component     2 size:       4879
c large component     4 size:        357
c large component     5 size:      14994
c large component     6 size:        476
c large component     7 size:        952
c large component     9 size:        952
c large component    10 size:        595
c large component    11 size:      10234
c large component    16 size:        476
c large component    17 size:        476
c large component    19 size:        476
c Not printed total small (<300 vars) components:10 vars: 2380

In total 23 components were found, the largest component containing 15K variables. The other 22 components contain 10K to as little as less-than 300 variables. Notice that finding the components was essentially free at 0.06s (on a i7-3612QM).

Another typical output is:

c Reading file 'transport-transport-city-sequential-25nodes-1000size-3degree-100mindistance-3trucks-10packages-2008seed.060-SAT.cnf.gz'
[...]
c -- clauses added [..] 3869060 irredundant
c -- vars added     723130
[...]
c Found component(s): 2779 BP: 253.02M time: 0.58 s
c large component  2778 size:     657548
c Not printed total small (<300 vars) components:2778 vars: 25002
c Coming back to original instance, solved 2778 component(s), 25002 vars T: 1.46

In this case there were 2.8K small disconnected components with (on average) 9 variables each within a problem that originally contained 723K variables and 3.9M clauses. The components were found in 0.58s and all but the largest were solved within 1.46s. This means that 2.8K CryptoMiniSat sub-solvers were launched, initialized, ran and destroyed within a span of 1.46s. The largest component with 658K variables is what's left to the solver to deal with. The rest have been removed and their solutions saved.

Conculsions

It's fun, useful, and relatively easy to find disconnected components in SAT problems. They appear at the beginning of the search for some problems, and for some more, during search. Although it is not a game-changer, it is a small drop of water in a cup that gets mostly filled with small drops anyway.

People doing problem modeling for a living are probably crying in horror and blame the modeler who created an instance with thousands of disconnected components. As someone who deals with CNFs though, I don't have the liberty of blaming anyone for their instances and have to deal with what comes my way.

Certified UNSAT and CryptoMiniSat

April 25, 2013Research, SATDRUP, hyper-binary resolution, SAT Race, UNSATmsoos

Marijn Heule kindly sent me an email on the 10th of April about DRUP, the new system used this year in the SAT Competition’13 for the UNSAT track. He kindly encouraged me to implement the DRUP system. He personally implemented it into Minisat which was a very helpful lead for me. In this post I will talk about my experiences in implementing DRUP into CryptoMiniSat within a span of 3 days.

Implementation complexity

It took only 3 days, about 1400 lines of code to implement DRUP:

git diff f27c74bbd  c0b6ccc10 --shortstat
 25 files changed, 1372 insertions(+), 307 deletions(-)

It turns out that the biggest problem is that whenever I shorten a clause, I first have to add the shortened version, and then delete the old one. Since I always do in-place literal deletion, this means I have to save the old clause into a temporary place, add the new one and finally delete the old one that has been saved. I will eventually write a C++ wrapper that does this for me, but currently, it’s a lot of

vector origCl(cl.size());
std::copy(cl.begin(), cl.end(), origCl.begin());
[blah...]
drup << cl << " 0" << endl;
drup << "d " << origCl << " 0" << endl;

So, it's a bit messy code. Other than this, the implementation went very smoothly. The biggest pain was not to forget to add to the DRUP output all changed clauses. Since I have implicit binary and tertiary clauses and I manipulate them in-place, they are changed in quite complicated code paths.

If you don't have such complicated code paths, you should be able to implement DRUP within a day or less. I encourage you to do so, it's quite fun!

Remaining uncertainties

I am a bit confused about whether some of the optimisations in CryptoMiniSat work with DRUP. I have been fuzzing the DRUP implementation for about ~1000 CPU hours, but not with all optimisations turned on. Some are a bit shaky. In particular, XOR and stamping&caching come to mind.

I cannot turn DRUP on for the top-level XOR manipulation because otherwise I would need to tell DRUP every Gaussian elimination step. Not funny, and not fast. Well, XOR is not such a big thing, and it is no longer natively implemented in CryptoMiniSat, so not a big deal, really.

The other, more troubling one is stamping and implied literal caching. Luckily I have on-the-fly hyper-binary resolution (this is needed for DRUP with Stalmarck if you think about it), so the binary clauses stored by caching and stamping are there... but they may get deleted by variable elimination, blocked clause elimination and... well, maybe nothing else. Hopefully not. Anyway, I never block binary clauses (does clause blocking ever help? I am confused) and I can of course not delete binary eliminated clauses from DRUP. However... that may make the verification very slow. So, I am at crossroads here. I think I will submit a version with stamping&caching and one without.

In the end, every optimisation can be turned on except for XOR. I find that exceptionally good given the number of tweaks/hacks used by CryptoMiniSat.

Long-term advatages of having DRUP

I think DRUP allows for a lot of possibilities. Naturally I first want to draw resolution graphs. There are plenty of libraries for 3D drawing, and I have already ordered the LEAP controller (a 3D controller), which will come handy to play with the resolution graphs (zoom&out, rotate, etc.).

From there, I want to get stats out of the graph, and I want to present it next to/with the stats that I already generate. For example, how many of the deleted clauses get re-learnt later? How many clauses get used in the resolution graph with the empty node? How often when cleaning with glues? How often when cleaning with activities? For which types of instances?

Linking this with real-world instances by coloring the graph points according to e.g. filter functions in stream ciphers is not very hard and should be quite a lot of fun.

Acknowledgements

I think Marijn Heule deserves a lot of thanks for the work he has put into DRUP (webpage, example, DIFF for MiniSat) and all the help he has given me. I had some initial doubts about whether it's possible to implement at all and I had some minor problems with the checker --- he always replied kindly and promptly. Thanks!

A variable elimination improvement

April 21, 2013Research, SATvariable eliminationmsoos

Lately, I have been thinking about how to improve variable elimination. It’s one of the most important things in SAT solvers, and it’s not exactly easy to do right.

Variable elimination

Variable elimination simply resolves every occurrence of a literal $v1$ with every occurrence of the literal $\neg v1$ , removes the original clauses and adds the resolvents. For example, let’s take the clauses

$v1 \vee v2 \vee v3$
$v1 \vee v4 \vee v5$
$\neg v1 \vee v10 \vee v11$
$\neg v1 \vee v12 \vee v13$

When $v1$ gets eliminated the resolvents become

$v2 \vee v3 \vee v10 \vee v11$
$v2 \vee v3 \vee v12 \vee v13$
$v4 \vee v5 \vee v10 \vee v11$
$v4 \vee v5 \vee v12 \vee v13$

The fun comes when the resolvents are tautological. This happens in this case for example:

$v1 \vee v4$
$\neg v1 \vee v5\vee \neg v4$

The resolvent is the clause

$v4 \vee \neg v4 \vee v5$

Which contains both a literal and its negation and is therefore always true. It’s good to find variables we can eliminate without and side-effects, i.e. variables that eliminate without leaving any resolvents behind. However, it’s not so cheap to find these. Until now.

A fast procedure for calculating the no. of non-tautological resolvents

The method I came up with is the following. For every clause where $v1$ is inside, I go through every literal and in an array the size of all possible literals, I set a bit. For every clause, I set a different bit. When going through every clause of every literal where $\neg v1$ is present, I calculate the hamming weight (a popcount(), a native ASM instruction on modern CPUs) of the array’s inverse literals and substruct this hamming weight from the number of clauses $v1$ was inside. I sum up all these and then the final count will be the number of non-tautological resolvents. Here is a pseudo-code:

mybit = 1
num = 0
for clause in clauses[v1]:
    for l in clause:
        myarray[l] |= mybit

    mybit = mybit << 1
    num += 1

count = 0
for clause in clauses[not v1]:
    tmp = 0
    for l in clause:
        tmp |= myarray[not l]
    count += num - popcount(tmp)

print "number of non-tautological resolvents: %d" % count

I think this is pretty neat. Notice that this is linear in the number of literals in the clauses where $v1$ and $\neg v1$ is present. The only limitation of this approach is that ‘myarray’ has to have enough bits in its elements to hold ‘num’ number of bits. This is of course non-trivial and can be expensive in terms of memory (and cache-misses) but I still find this approach rather fun.

Using this procedure, I can check whether all resolvents are tautological, and if so, remove all the clauses and not calculate anything at all. Since this happens very often, I save a lot of calculation.

CryptoMinisat 3.1 released

April 14, 2013Research, SATBCP, binary clauses, Optimisation, Release, research, stampingmsoos

CryptoMinisat 3.1 has been released. The short changelog is:

$ git diff cryptoms-3.0 cryptoms-3.1 --shortstat
 84 files changed, 3079 insertions(+), 2751 deletions(-)

The changes made were threefold. First, memory usage has been greatly reduced. This is crucial, because memory usage was over 7GB on certain instances. Secondly, the implication cache wasn’t very well-used and an idea that came to my mind greatly improved performance on most problems. Finally, time limiting of some inprocessing techniques on certain types of problems has been improved.

Memory usage reduction

On instances that produced a lot of long learnt clauses the memory usage was very high. These learnt clauses were all automatically linked in to the occurrence list and consequently took large amounts of memory, sometimes up to 10GB. On other instances, the original clauses were too numerous and too large, so putting even them into the occurrence list was too much. On these instances, variable elimination is not carried out (or carried out only later, when enough original clauses have been removed/shortened). To debug some of these problems, I wrote a fuzzer that generates extremely large problems with many binary and many long clauses, it’s available here as “largefuzzer”. It’s actually quite nice with many-many binary clauses so it also can fuzz the problems encountered with probing of extremely weird and large instances.

Implied literal usage improvement

CryptoMiniSat uses implied literals, i.e. caches what literals were propagated by each literal during probing. It then re-uses this information to subsume and/or strengthen clauses. This is kind of similar to stamping though uses more memory. It is actually useful to have alongside stamping, and I now do both — propagating DFS that stamping requires is expensive though updating cache during DFS is just as easy as during quasi-BFS.

The trick I discovered while playing around with cached implied literals is that if literal L1 propagates L2 and also !L2 then that means there are conceptually two binary clauses in the solver (!L1, L2), (!L1, !L2), so !L1 is TRUE. This is of course trivial, but I never checked for this. The question most would raise is: why would L1 propagate both L2 and !L2 and not fail? The answer is kind of tricky, but very interesting. Let’s say at one point, L1 propagates L2 due to a learnt clause, but that learnt clause is then removed. A new learnt clause is then later learnt, and with that learnt clause in place, L1 propagates !L2. Now, without caching, this would be ignored. Caching memorizes past conceptual binary clauses and re-uses this information.

This is not an optimization that only looks good on paper, it is very good to have. With this one optimization, I gained 5 instances from the SAT Comp’09 instances with a 1000s timeout (196 solved -> 201 solved). I can’t right now imagine how this could be done with stamping effectively, but that doesn’t mean it’s not possible. Though, according to my experience, stamping doesn’t preserve that much information over time as it’s being updated (renumbered) frequently while the cache is only improved over time, never shrunk. A possibility would be to have more than one stamp system and round-robin selecting them. However that would mean that sorting of clauses (for shrinking) would need to be done more than once, and sorting them is already relatively expensive. I sometimes feel that what stamping gains in memory it looses on sorting (i.e. processing time) and lower coverage (re-numbering).

More precise time-limiting

Martin Maurer has been kind enough to file a lot of bug reports about probing and variable elimination taking too much time, sometimes upwards of 150s when they should take around 20-30s maximum. While investigating, it tuned out that the problem was very weird indeed. While trying to eliminate or probe one variable the time for that one variable took upwards of 100s. This was completely unexpected as the code only checked for timeouts on a per-variable basis. In the end, the code had to be improved to track time on an intra-variable basis in both systems. While at it, I also added intra-variable time-tracking to implicit clause subsumption and strengthening too. So, over-times should less prevalent from now on. As an interesting side-note, time-limiting on probing is now so fine-grained that a 32-bit unsigned integer would overflow within 15s if used as the time-tracker.

Wonderings of a SAT geek

A blog about SAT solving and cryptography

Category Archives: Research

Why CryptoMiniSat 3.3 doesn’t have XORs

Benefits

Drawbacks

What CryptoMiniSat 3.3 does

What a future CryptoMiniSat 3.x could do

Conclusions

Handling disconnected components

Why deal with disconnected components?

Finding disconnected components

Handling disconnected components

Demo

Conculsions

Certified UNSAT and CryptoMiniSat

Implementation complexity

Remaining uncertainties

Long-term advatages of having DRUP

Acknowledgements

A variable elimination improvement

Variable elimination

A fast procedure for calculating the no. of non-tautological resolvents

CryptoMinisat 3.1 released

Memory usage reduction

Implied literal usage improvement

More precise time-limiting