Collecting solver data

Lately, I have been putting a lot of effort into collecting data about solver behaviour and dumping it on-the-fly to a MySQL database. Some of this data is then presented by a PHP&javascript interface, such as this.The goal is to better understand how solving works, and thus to possibly come up with better strategies for solving. The requirements of the data gathered are: speed to calculate, size to store and usefulness.

Gathered data

There are some obvious statistics that one would like to collect, such as number of conflicts, propagations and conflicts per second, etc — these are all gathered by almost every SAT solver out there. However, it would be interesting to have more. Here is what CryptoMiniSat now gathers and dumps using highly efficient prepared bulk insert statements, using at most 5% of time.

For each variable, dumped once in a while:

  • no. times propagated false, and true (separately)
  • no. times decided false and true (sep.)
  • no. times flipped polarity
  • avg & variance of decision level
  • avg & variance of propagation level

For each clause larger than 3-long, dumped once in a while:

  • activity
  • conflict at which it was introduced (if learnt)
  • number of propagations made
  • number of conflicts made
  • number of times any of its literal was looked at during BCP
  • number of times it was looked at (dereferenced) during BCP
  • number of times used to resolve with during learnt clause generation
  • number of resolutions needed to during its generation (if learnt clause)

For earch restart:

  • no. of reducible&irreducible (i.e. learnt&non-learnt) binary clauses
  • no. of red&irred tertiary clauses (separately)
  • no. of red&irred long clauses (sep)
  • avg, variance,  min and max of learnt clause glues
  • avg, var, min, max of learnt clause sizes
  • avg, var, min, max of number of resolutions for 1st UIP
  • avg,var,min,max of branch depths
  • avg,var,min,max of backjump length –in the number of literals unassigned
  • avg,var,min,max of backjump lenght — in the number of levels backjumped
  • avg, var, min, max times a conflict followed a conflict without decisions being made
  • avg,var,min,max of agility
  • no. of propagations by red&irred binary clauses
  • no. of props by red&irred tertiary clauses
  • no. of props by red&irred long clauses
  • no. of conflicts by red&irred binary clauses (separately)
  • no. of conflicts by red&irred tertiary clauses (sep)
  • no. of conflicts by red&irred  long clauses (sep)
  • no. of learnt unit, binary, tertiary, and long clauses (sep)
  • avg,var, min,max of watchlist size traversed during BCP
  • time spent
  • no. of decisions
  • no. of propagations
  • no. of variables flipped
  • no. of variables set positive&negative (sep)
  • no. of variables currently unset
  • no. of variables currently replaced
  • no. of variables currently eliminated
  • no. of variables currently set

For each learnt clause database cleaning:

  • for reducible clauses removed: all the data in the “clauses larger than 3-long” data above, summed up
  • for reducible clauses not removed: all the data in the “clauses larger than 3-long” data above, summed up
  • no. of clauses removed
  • no. of clauses not removed
  • for all irred clauses (these are not touched): all the data in the “clauses larger than 3-long” data above, summed up

For every N conflicts:

  • clause size distribution
  • clause glue distribution
  • clause size and glue scatter data

This is all, and is not all mine. Suggestions were made by many, some as much as a year ago: George Katsirelos, Luca Melette, Vegard Nossum, Valentin-Mayer Eichberger, Ben Schlabs,  Said Jabbour and Niklas Sörensson. Naturally, none of these people necessarily approve of how I gather data or what I do with the data gathered, but they helped, so listing them is only fair.

What’s not yet or cannot be gathered

Two suggestions are still on the TODO list:

  • counting the number of conflicts done while a learnt clause was “locked”, i.e. has propagated in the current search tree. This stat of a learnt clause could tell us if the clause seemed essential to the search or not. If a clause propagated once, at the bottom of the search tree, and then the variable propagated was quickly unset, it’s not the same as if the same clause propagated at the top of the search tree, and then the variable it set was essentially never unset.
  • for each variable, counting the number of conflicts done while the variable was set. This is interesting, because if a variable was propagated only once, at top-level, it will seem like it was never set (set only once), yet it may have had a huge influence on the search tree and consequently on the learnt clauses.

Both of these require a change in the algorithm used to backjump and although I am a bit worried about the performance, I will add these soon.

Unfortunately, the data about variables can’t really be dumped so often, because it’s huge for large problems with millions of variables. So I guess I will only dump that for the most active N variables, for some sane definition of “active”. Similarly, the data about individual clauses is not dumped, only in a summed-up way during database cleaning.


In case you have any ideas what else could be interesting to dump, please put it as a comment. The idea is to dump data that is cheap to compute and cheap to dump yet would be interesting for some reason or another. I am prepared to add stuff to datastructures, as you have probably guessed from the above. Yes, it makes the clause database less space-efficient, and so increases cache-miss. But on large problems, it’s going to be a cache-miss most of the time anyway, and a cache-fetch brings in 128B of data, which should be plenty for both the stats and the clause. Similarly with variables. So, don’t refrain from suggesting some stuff that takes space. Also, weird stuff is interesting. The most weird stat on the list above is definitely the “conflict after a conflict without decisions” (suggested by Said Jabbour) which I would have guessed to be zero, or near-zero, yet is pretty high, in the 30+% range.

Suggestions are welcome!






0 responses to “Collecting solver data”

  1. Matthew Gwynne Avatar

    This is really a great thing! I imagine that a lot of people hack existing solvers to output data that they need/want so that they can understand either
    (a) the solver acting on their special instances, or
    (b) some new technique that they’ve added to the solver.

    Will be good to see what we can learn, extract and visualise from all this data, and also, possibly the new collaborations with other areas that it brings. The portfolio guys already use some “machine learning for SAT” but this is “big data for SAT” in a different direction :).

    Will be interesting to see all the things it gets used for!

  2. Vegard Avatar

    Here’s one idea for the variable selection heuristic. It’s a bit tricky to describe, so please bear with me. Here goes:

    Intuitively, it’s a measure of “dynamicity” of the variable selection heuristic. Dynamicity actually just means something like how quickly or how much the ordering of the variables changes per time unit.

    Ideally, we’d like to see a single number that characterises the dynamicity. We should also calculate this number from the search tree rather than any heuristic-specific data structures (such as the VSIDS heap containing the variable activities at each and every decision).

    Let’s first try to imagine what the least dynamic variable selection heuristic would be. It would be a heuristic that always picks variables in the same order, right? So a heuristic where every variable is always picked at the same decision level every time is the most static heuristic imaginable, and would have dynamicity 0.

    We see that dynamicity has something to do with the depth at which each variable is picked. Good! This is a property that we can calculate as a function of the search tree.

    It seems logical that we inspect the trail at every leaf of the search tree (i.e. at every conflict), since this is when we have maximal paths from the root to a leaf (where “maximal” doesn’t mean “longest”, but paths that are not included in any other paths). I propose the following: for each variable, we keep a list of depths at which this variable was branched on. Then, at every conflict, we inspect the trail and update these lists accordingly (i.e. if variable 5 was picked at decision level 20, we add 20 to 5’s list).

    Now we have reduced our problem to the following: Given a list of decision levels, how much variation is there in the list? And that’s pretty easy to answer. We can use e.g. standard deviation as a measure of variability.

    We can keep collecting decision levels for each variable until we restart, for example. So at each restart, we calculate the standard deviation of each variable’s list of decision levels. Having a measure of variability for each variable, we can now calculate the “average” variability of all variables!

    This sounds a bit tricky, but if you think about it, it’s quite straightforward. To implement this, you would need:

    – Each variable should have a vec which is its list of decision levels where it was picked.

    – Every time a decision is made, the current decision level is pushed onto the list of the variable that was picked.

    – At every restart, you go through every variable and calculate the standard deviation of the numbers in each list. Then you calculate the average of these standard deviations and report it to the user.

  3. Vegard Avatar

    I forgot one thing: at every restart, the per-variable lists should be cleared. (This way, data does not accumulate between resets, which would anyway slow down the solver more and more.)

    Also, to clarify. I wrote “at every conflict, we inspect the trail and update these lists accordingly”. In the actual implementation, I would update the list just after each decision has been made, as I wrote later.

    1. Vegard Avatar

      As a special case, if a variable has just 0 or 1 decisions in a restart, we should maybe hard-code its variability to 0. Since you can’t calculate the standard deviation for less than two numbers.

      Other measures of variability can also be used. For example, we could use the range as variability, i.e. the difference between the largest and smallest values. This one works for lists of all sizes and even has the added advantage of not requiring that we keep track of the decision-levels using a list! (Instead, we can just associate two numbers with each variable: the minimum and the maximum decision level at which the variable was picked since the last restart.)