autarky | Wonderings of a SAT geek

In CryptoMiniSat 3.2 and above there is a disconnected component finding&solving system. This finds toplevel disconnected components regularly and solves them by launching an instance of the solver itself. If the component is unsatisfiable (UNSAT), the whole system is UNSAT and solving can be immediately aborted. If the component is satisfiable (SAT) the component’s solution is saved and later when printing the solution to the whole problem the component’s satisfying assignment is also printed.

Why deal with disconnected components?

Disconnected components in SAT instances are problematic because the search logic of the solver doesn’t effectively restrict its search to a single component at a time, making its strategy ineffective. Firstly, learnt clause cleaning and variable activity heuristics behave worse. Secondly, solutions to components are thrown away when the search restarts when another component’s solution is too difficult to find. Note that some solvers, e.g. Marijn Heule’s march, do restrict their search to one component at a time, though I personally don’t know of any other.

Disconnected components seem to be quite prevalent. In the SAT’11 competition‘s set of 300 Application instances, 53 contained disconnected components at top-level at the beginning of the search. On average the number of disconnected components for these 53 instances were 393(!). When doing top-level analysis on a regular basis during search, the number of problems containing disconnected components rises to 58 and the average number of components found decreases to 294.

Finding disconnected components

The algorithm I use to find disconnected components is relatively easy. It essentially performs the following: it initializes a set of datastructures that hold information about the found component(s) and then feeds each clause one-by-one to a function that creates new component(s) and/or merges existing components. The datastructures needed are:

Array ‘varToComp’ that indicates which variable is in which component. If a variable is in no component yet, it has special value MAX
A map of arrays ‘compToVar’ indexed by the component. For example, compToVar[10] points to the array of variables in component number 10
Variable ‘nextComp’, an ever-incrementing index indicating the maximum component number used
Variable ‘realComps’ indicating the actual number of components — some components counted by ‘nextComp’ may have been merged, and are now empty

The algorithm, in pseudo-python is:

nextComp = 0;
realComps = 0;
varToComp = [MAXVAR] * numVars();
compToVar = {};

for clause in clauses:
  handleClause(clause);

def handleClause(clause) :
  newComp = []
  compsToMerge = set();
  
  for lit in clause:
    var = lit.var();
    if varToComp[var] != MAX :
      comp = varToComp[var];
      compsToMerge.insert(comp);
    else:
      newComp.append(var);

 
    #no components to merge, just put all of them into one component
    if len(compsToMerge) == 1 :
      comp = compsToMerge.pop();
      for var in newComp:
        varToComp[var] = comp
        compsToVar[comp].append(var);

      return;
  
    #delete components to merge and put their variables into newComp
    for comp in compsToMerge:
      vars = compToVar[comp];
      for var in vars:
        newComp.append(var);

      compToVar.erase(comp);
      realComps--;
    
    #mark all variables in newComp belonging to 'nextComp'
    for var in newComp:
      varToComp[var] = nextComp;
    
    compToVar[nextComp] = newComp;
    nextComp++;
    realComps++;

There are ways to make this algorithm faster. A trivial one is to remove the ‘set()’ and use an array initialized to 0 and mark components already seen. A fixed-sized array of the size of variables will do the job. Similarly, the use of ‘nextComp’ forces us to use a map for ‘compToVar’ which is slow — a smarter algorithm will use a round-robin system with an array of arrays. We can also avoid having too many small components at the beginning by calling the handleClause() function with the longer clauses first. It doesn’t give any guarantees, but in practice works quite well.

Finally, we need to add approximate time measurements so that we could exit early in case the algorithm takes too much time. According to my measurements, on the SAT’11 Application instances it took only 0.186s on average to find all components, and in only 34 cases out of 1186 did it take longer than the timeout — which was set to around 7s. At the same time, the maximum time ever spent on finding components was 7.8s. In other words, it is almost free to execute the algorithm.

Handling disconnected components

CryptoMiniSat handles found components in the following way. First, it counts the number of components found. If there is only one component it simply continues solving it. If there are more than one, it orders them according to the number of variables inside and picks the smallest one, solves it with a sub-solver, and checks it solution. If it’s UNSAT, it exits with UNSAT, otherwise it saves the solution for later and picks the next smallest component until there is only one left. When there is only one left, it continues its job with this, largest, component.

The sub-solver CryptoMiniSat launches is a bit special. First of all, it renumbers the variables so if variables 100..120 are in a component, the sub-solver will be launched with variables 0..20. This saves memory in the sub-solver but it means variables must be back-numbered when extracting solutions. Since the subsolvers themselves also internally re-number variables for further speedup, this can get a little bit complicated.

Further, if the sub-solver is handling an extremely small problem (<50 vars) most of its internal functionality is turned off to reduce build-up and tear-down speed. This is important, because in case there are 5000 components we need to go through, even a 0.01s build-up&tear-down time is unacceptable. Finding 5000 components, by the way, is not some wild imaginary number: it happens with the instance transport-[...]10packages-2008seed.040 from SAT'11. This instance actually contains ~5500 components at beginning of the search at toplevel, with an average of 8 variables each.

Demo

Let’s see an example output from CryptoMiniSat:

c Reading file 'traffic_3b_unknown.cnf.gz'
[..]
c -- clauses added: [..] 533919 irredundant 
c -- vars added      39151
[..]
c Found components: 23 BP: 8.85M time: 0.06 s
c large component     0 size:        833
c large component     1 size:       1071
c large component     2 size:       4879
c large component     4 size:        357
c large component     5 size:      14994
c large component     6 size:        476
c large component     7 size:        952
c large component     9 size:        952
c large component    10 size:        595
c large component    11 size:      10234
c large component    16 size:        476
c large component    17 size:        476
c large component    19 size:        476
c Not printed total small (<300 vars) components:10 vars: 2380

In total 23 components were found, the largest component containing 15K variables. The other 22 components contain 10K to as little as less-than 300 variables. Notice that finding the components was essentially free at 0.06s (on a i7-3612QM).

Another typical output is:

c Reading file 'transport-transport-city-sequential-25nodes-1000size-3degree-100mindistance-3trucks-10packages-2008seed.060-SAT.cnf.gz'
[...]
c -- clauses added [..] 3869060 irredundant
c -- vars added     723130
[...]
c Found component(s): 2779 BP: 253.02M time: 0.58 s
c large component  2778 size:     657548
c Not printed total small (<300 vars) components:2778 vars: 25002
c Coming back to original instance, solved 2778 component(s), 25002 vars T: 1.46

In this case there were 2.8K small disconnected components with (on average) 9 variables each within a problem that originally contained 723K variables and 3.9M clauses. The components were found in 0.58s and all but the largest were solved within 1.46s. This means that 2.8K CryptoMiniSat sub-solvers were launched, initialized, ran and destroyed within a span of 1.46s. The largest component with 658K variables is what's left to the solver to deal with. The rest have been removed and their solutions saved.

Conculsions

It's fun, useful, and relatively easy to find disconnected components in SAT problems. They appear at the beginning of the search for some problems, and for some more, during search. Although it is not a game-changer, it is a small drop of water in a cup that gets mostly filled with small drops anyway.

People doing problem modeling for a living are probably crying in horror and blame the modeler who created an instance with thousands of disconnected components. As someone who deals with CNFs though, I don't have the liberty of blaming anyone for their instances and have to deal with what comes my way.

While reading this thesis, I have had a thought about autarkies. Let me first define what an autarky in SAT solving is:

A partial assignment phi is called a weak autarky for F if “phi*F is a subset of F” holds, while phi is an autarky for F if phi is a weak autarky for all sub-clause-sets F' of F. (Handbook of Satisfiability, p. 355)

What does this mean? Well, it means that in the clause-set {x V y, x} the assignment y=False is a weak autarky, because assigning y to False will lead to a clause that is already in the clause set (the clause x), while the assignment of x=True is a (normal) autarky, because it will satisfy all clauses that x is in. The point of autarkies is that we can remove clauses from the clause set by assigning values to a set of variables, while not changing the (un)satisfiability property of the original problem. In other words, autarkies are a cheap way of reducing the problem size. The only problem is that it seems to be relatively expensive to search for them, so conflict-driven SAT solvers don’t normally search for them (but, some lookahead solvers such as march by Marijn Heule do).

So, what is the idea? Well, I think we can have autarkies for equivalent literals, too. Here is an example CNF:

 a V d V -f
-b V d V -f
-a V e
 b V e

setting a = -b will not change the satisfiability of the problem.

We can add any number of clause pairs of the form X V Y where Y is any set of literals not in {a, -a, b, -b}, and X is (-)a in clause 1 and (-)b in clause 2. Further, one of the two variables, say, a can be in any clauses at will (in any literal form), though variable b can then only be in clauses defined by the clause pairs. Example:

 a V  d V -f
-b V  d V -f
-a V  e
 b V  e
 a V  c V -h
 a V -f

An extension that is pretty simple from here, is that we can even have clauses whose Y part is somewhat different: some literals can be missing, though again in a controlled way. For example:

 a V d
-b V d V -f
-a V e
 b V e

is possible. It would restrict b a bit more than necessary, but if a is the only one of the pairs missing literals, we are still OK I believe.

Finally, let me give an example of what these higher-level autarkies are capable of. Let’s assume we have the clause set:

 a V  d
-b V  d V -f
-a V  e
 b V  e
 a V  c V -h
 a V -f
 a V  b V  f V -g

setting a=-b now simplifies this to:

 a V  d
-a V  e
 a V  c V -h
 a V -f

which is equisatisfiable to the first one. Further, if the latter is satisfiable, computing the satisfying solution to the former given a solution to the latter is trivial.

Wonderings of a SAT geek

A blog about SAT solving and cryptography

Tag Archives: autarky

Handling disconnected components

Why deal with disconnected components?

Finding disconnected components

Handling disconnected components

Demo

Conculsions

Note to self: higher level autarkies