statistics | Wonderings of a SAT geek

Lately, I have been putting a lot of effort into collecting data about solver behaviour and dumping it on-the-fly to a MySQL database. Some of this data is then presented by a PHP&javascript interface, such as this.The goal is to better understand how solving works, and thus to possibly come up with better strategies for solving. The requirements of the data gathered are: speed to calculate, size to store and usefulness.

Gathered data

There are some obvious statistics that one would like to collect, such as number of conflicts, propagations and conflicts per second, etc — these are all gathered by almost every SAT solver out there. However, it would be interesting to have more. Here is what CryptoMiniSat now gathers and dumps using highly efficient prepared bulk insert statements, using at most 5% of time.

For each variable, dumped once in a while:

no. times propagated false, and true (separately)
no. times decided false and true (sep.)
no. times flipped polarity
avg & variance of decision level
avg & variance of propagation level

For each clause larger than 3-long, dumped once in a while:

activity
conflict at which it was introduced (if learnt)
number of propagations made
number of conflicts made
number of times any of its literal was looked at during BCP
number of times it was looked at (dereferenced) during BCP
number of times used to resolve with during learnt clause generation
number of resolutions needed to during its generation (if learnt clause)

For earch restart:

no. of reducible&irreducible (i.e. learnt&non-learnt) binary clauses
no. of red&irred tertiary clauses (separately)
no. of red&irred long clauses (sep)
avg, variance, min and max of learnt clause glues
avg, var, min, max of learnt clause sizes
avg, var, min, max of number of resolutions for 1st UIP
avg,var,min,max of branch depths
avg,var,min,max of backjump length –in the number of literals unassigned
avg,var,min,max of backjump lenght — in the number of levels backjumped
avg, var, min, max times a conflict followed a conflict without decisions being made
avg,var,min,max of agility
no. of propagations by red&irred binary clauses
no. of props by red&irred tertiary clauses
no. of props by red&irred long clauses
no. of conflicts by red&irred binary clauses (separately)
no. of conflicts by red&irred tertiary clauses (sep)
no. of conflicts by red&irred long clauses (sep)
no. of learnt unit, binary, tertiary, and long clauses (sep)
avg,var, min,max of watchlist size traversed during BCP
time spent
no. of decisions
no. of propagations
no. of variables flipped
no. of variables set positive&negative (sep)
no. of variables currently unset
no. of variables currently replaced
no. of variables currently eliminated
no. of variables currently set

For each learnt clause database cleaning:

for reducible clauses removed: all the data in the “clauses larger than 3-long” data above, summed up
for reducible clauses not removed: all the data in the “clauses larger than 3-long” data above, summed up
no. of clauses removed
no. of clauses not removed
for all irred clauses (these are not touched): all the data in the “clauses larger than 3-long” data above, summed up

For every N conflicts:

clause size distribution
clause glue distribution
clause size and glue scatter data

This is all, and is not all mine. Suggestions were made by many, some as much as a year ago: George Katsirelos, Luca Melette, Vegard Nossum, Valentin-Mayer Eichberger, Ben Schlabs, Said Jabbour and Niklas Sörensson. Naturally, none of these people necessarily approve of how I gather data or what I do with the data gathered, but they helped, so listing them is only fair.

What’s not yet or cannot be gathered

Two suggestions are still on the TODO list:

counting the number of conflicts done while a learnt clause was “locked”, i.e. has propagated in the current search tree. This stat of a learnt clause could tell us if the clause seemed essential to the search or not. If a clause propagated once, at the bottom of the search tree, and then the variable propagated was quickly unset, it’s not the same as if the same clause propagated at the top of the search tree, and then the variable it set was essentially never unset.
for each variable, counting the number of conflicts done while the variable was set. This is interesting, because if a variable was propagated only once, at top-level, it will seem like it was never set (set only once), yet it may have had a huge influence on the search tree and consequently on the learnt clauses.

Both of these require a change in the algorithm used to backjump and although I am a bit worried about the performance, I will add these soon.

Unfortunately, the data about variables can’t really be dumped so often, because it’s huge for large problems with millions of variables. So I guess I will only dump that for the most active N variables, for some sane definition of “active”. Similarly, the data about individual clauses is not dumped, only in a summed-up way during database cleaning.

Suggestions?

In case you have any ideas what else could be interesting to dump, please put it as a comment. The idea is to dump data that is cheap to compute and cheap to dump yet would be interesting for some reason or another. I am prepared to add stuff to datastructures, as you have probably guessed from the above. Yes, it makes the clause database less space-efficient, and so increases cache-miss. But on large problems, it’s going to be a cache-miss most of the time anyway, and a cache-fetch brings in 128B of data, which should be plenty for both the stats and the clause. Similarly with variables. So, don’t refrain from suggesting some stuff that takes space. Also, weird stuff is interesting. The most weird stat on the list above is definitely the “conflict after a conflict without decisions” (suggested by Said Jabbour) which I would have guessed to be zero, or near-zero, yet is pretty high, in the 30+% range.

Suggestions are welcome!

Wonderings of a SAT geek

A blog about SAT solving and cryptography

Tag Archives: statistics

Collecting solver data

Gathered data

What’s not yet or cannot be gathered

Suggestions?