MiniSat FAQ | Wonderings of a SAT geek

This is just a short list of things that is often asked about MiniSat. If you have more questions, maybe you can browse the MiniSat google groups, or read the other FAQ by David A. Wheeler, and maybe the definition of the DIMACS file format. I also suggest to attentively read the ./minisat --help:

USAGE: ./minisat [options]  

  where input may be either in plain or gzipped DIMACS.

OPTIONS:

  -polarity-mode = {true,false,rnd}
  -decay         = <num> [ 0 - 1 ]
  -rnd-freq      = <num> [ 0 - 1 ]
  -verbosity     = {0,1,2}

How are variables picked?

The function responsible is pickBranchLit() in MiniSat. There are two methods of picking. One that picks from an ordered heap, and another that picks totally randomly. The random pick is done 2% of the time (see random_var_freq), though this is adjustable from command line (see -rnd-freq=X option). The ordered heap is simply an ordered set of variables. A variable is higher in the list if it appeared lots of times in the clauses that were involved in the generated conflict (see in function analyze() the part with varBumpActivity()).

Why doesn’t MiniSat find the solution to my problem that has at most 2-3 really important variables?

This is related to the previous question. You probably have a LOT of internal variables (that are not important). Since MiniSat tries to explore the search space by sometimes (2%) picking variable at random, it can get lost at a place in the search space that is uninteresting. Try lowering the random pick frequency by giving the -rnd-freq=X, setting X to be very-very low.

MiniSat is taking too long to solve my cryptographic function!

Try giving some randomly selected, randomly assigned “help bits”. These “help bits” should be of course your important variables (e.g. internal state or key). Run these random experiments with N number of help bits for 1000 times, and average the time. Now run with N-1 help bits, and average the time. Repeat with N-2, N-3… until you have enough point on the graph. The produced graph should be easy to extrapolate.

Why doesn’t MiniSat recognise my `p cnf VAR CLAUSES` ?

Because MiniSat doesn’t read that (see parse_DIMACS_main()). It doesn’t need to, as it automatically adds variables (see newVar()) and clauses (see addClause()).

How do I make my CNF file prettier? It looks like a spaghetti!

Use comments. They are lines starting with a c, like this:

c ------------
c clauses that code down equation number 3749
10 -2 7 0
14 -8 2 0
9 -1 5 0
c -----------

Does variable numbering matter?

No. But take care: if your variable numbers start at 100000, or a large range of variable numbers are unused, then MiniSat will have to store and work with useless memory contents. So, in general, use every variable number possible from 1 up. Don’t leave gaps, and don’t start numbering form large variable numbers.

If I change the order of the clauses, MiniSat solves faster/slower!

That’s by chance. MiniSat inserts the first clause in the DIMACS into file into the beginning of its literal watch-list (see attachClause()). When MiniSat starts propagating, it will propagate the first possible propagatable variable. So, depending on which is the first, second, etc. MiniSat will behave differently.

If you need to have an average time to solve for a certain problem, you should permutate the clauses and also change the internal seed of MiniSat. This can by done easily by CryptoMiniSat, through the -randomize=X command-line option.

MiniSat is so fast to solve a problem whose unknown variables are almost all `false`!

This is because MiniSat guesses unknown variables to false by default (see pickBranchLit() function’s switch directive). To change it, use the command-line option -polarity-mode={true, false, rnd}.

MiniSat is so slow to solve a problem whose unknown variables are almost all `true`!

See previous question.

I have a simple linear set of equations, and MiniSat cannot solve it!

Let’s suppose we have a linear set of equations like: (+ is the binary XOR, variables are binary)

a + b + c + ... = true
a + c + e + ... = false
...

Let’s convert these equations to CNF! The SAT solver will not solve it in any reasonable amount of time. However, Gaussian elimination will solve it in seconds! Why? Because MiniSat doesn’t understand the structure of the problem. It will simply guess some of the variables and try to make propagations. However, if the equations are dense, then MiniSat will need to guess a lot of variables before any propagations can be made. So, the search tree will be huge.

To overcome this problem, CryptoMiniSat can do Gaussian elimination on the xor-clauses if you give the “–gaussuntil=100” option, where 100 is a parameter you might need to adjust (it affects the depth of the search tree until which Gauss. elim. is executed).

How can I make my solving faster?

This is too complex of a topic to go into. What I suggest are the following:

Have a look at the conversion of your functions to CNF. Maybe some can be improved? Try to use the excellent espresso logic minimiser or use Boom.
Do you have a lot of XOR functions? Maybe you should consider using CryptoMiniSat. It handles XOR-s natively
Maybe you have forgotten to add the redundant clauses during the Tseitin transformation? (see page 5 of article by Een et al.)
Try to encode your problem by minimising the size and the number of clauses instead of the number of variables. Naturally, there is a trade-off, but focusing solely on the number of variables can lead to problems
Have a look at the statistics that CryptoMiniSat can produce for you
Maybe other tools would work better. Grobner base algorithms are a starting point

How can I make MiniSat find ALL the solutions to a problem?

First of all, make sure the number of solutions is not too big. Then simply solve once, and output the solution to a file: ./minisat satfile solution. Then simply put in a clause into the satfile that is false for the found solution. E.g. let’s suppose the solution was

1 = true, 2 = true, 3 = false

Then put into the satfile the line:

-1 -2 3 0

and run MiniSat again. If the solution is UNSAT, then there are no other solutions. If it is SAT, then again put in a clause that bans that one solution. Iterate this as many times as you need a solution.(there is another way of doing this by using MiniSat as a library. It’s faster, too).

There is a somewhat more detailed example in this FAQ, under the heading “Getting more solutions”.

Why is MiniSat giving me UNSAT without even looking at my file?

You probably have a line in your file that only has a 0 in it:

this is an empty clause and so it is unsatisfiable. Therefore the system of clauses is unsatisfiable. Remove this clause.

How can I control the number of clauses learnt during solving?

Tweak the learntsize_factor and the learntsize_inc variables in MiniSat. Learning a lot of clauses might slow down the solver since it needs to jump to these clauses often to see if any of them make a conflict/propagation. But having too few, learning doesn’t help to trim the search tree. But the trade-off can be different for your problem. The values that are used in MiniSat are 1/3 for the init (i.e. learntsize_factor) and 1.1 for the increase per restart (i.e. learntsize_inc).

I want to make MiniSat run faster. What can I do?

If you use the gcc compiler, compile with flags "-O3 -g0 -march=native".

I want to know exactly what happens during solving!

Try to use CryptoMiniSat and turn on the VERBOSE_DEBUG option. It gives a lot of quite understandable details of MiniSat’s inner workings. Use small example problems, and try also CryptoMiniSat’s graphing tool.

How does MiniSat keep track of which variable caused a propagation/conflict?

It’s always the first literal in the clause that is causing the propagation or conflict (see propagate()‘s first reference variable).

Why doesn’t MiniSat simply use a `vector<Lit>` to store the literals in the clauses?

Probably because from a cache&program-flow point of view, it’s simply not a good idea. vector will store a pointer to an allocated memory space, which is probably somewhere else than the other datas(size, marking, etc) of the clause, so the processor would have to jump there every time. Also, when accessing the Clause’s info, e.g. size(), data that is adjacent to this memory location will be pulled into the cache automatically, pulling in exactly what will be looked at next: the literals.

What is `level` and `trail_lim`?

vec<level> stores the assignment level of the variables (therefore it is of size nVars()). If variable 10 is set, then you have level[10] set to the decision level at which it was set. The decision level at any point in time can be asked by simply calling decisionLevel(), which simply returns trail_lim.size(). vec<trail> stores the assignement level at which a certain variable was assigned. trail_lim stores the assignement level at which guesses were made during the solving.

When guessing (=branching on) a variable, the following is called:

trail_lim.push(trail.size());
uncheckedEnqueue(guess);

i.e. inside trail_lim the current assignement level (trail.size()) is stored, and the newly guessed variable is pushed on the assignment stack (and the reason for this variable assignement will be NULL to indicate that it’s not a propagation, but a guess)