Tag Archives: research

Note to self: higher level autarkies

While reading this thesis, I have had a thought about autarkies. Let me first define what an autarky in SAT solving is:

A partial assignment phi is called a weak autarky for F if “phi*F is a subset of F” holds, while phi is an autarky for F if phi is a weak autarky for all sub-clause-sets F' of F. (Handbook of Satisfiability, p. 355)

What does this mean? Well, it means that in the clause-set {x V y, x} the assignment y=False is a weak autarky, because assigning y to False will lead to a clause that is already in the clause set (the clause x), while the assignment of x=True is a (normal) autarky, because it will satisfy all clauses that x is in. The point of autarkies is that we can remove clauses from the clause set by assigning values to a set of variables, while not changing the (un)satisfiability property of the original problem. In other words, autarkies are a cheap way of reducing the problem size. The only problem is that it seems to be relatively expensive to search for them, so conflict-driven SAT solvers don’t normally search for them (but, some lookahead solvers such as march by Marijn Heule do).

So, what is the idea? Well, I think we can have autarkies for equivalent literals, too. Here is an example CNF:

 a V d V -f
-b V d V -f
-a V e
 b V e

setting a = -b will not change the satisfiability of the problem.

We can add any number of clause pairs of the form X V Y where Y is any set of literals not in {a, -a, b, -b}, and X is (-)a in clause 1 and (-)b in clause 2. Further, one of the two variables, say, a can be in any clauses at will (in any literal form), though variable b can then only be in clauses defined by the clause pairs. Example:

 a V  d V -f
-b V  d V -f
-a V  e
 b V  e
 a V  c V -h
 a V -f

An extension that is pretty simple from here, is that we can even have clauses whose Y part is somewhat different: some literals can be missing, though again in a controlled way. For example:

 a V d
-b V d V -f
-a V e
 b V e

is possible. It would restrict b a bit more than necessary, but if a is the only one of the pairs missing literals, we are still OK I believe.

Finally, let me give an example of what these higher-level autarkies are capable of. Let’s assume we have the clause set:

 a V  d
-b V  d V -f
-a V  e
 b V  e
 a V  c V -h
 a V -f
 a V  b V  f V -g

setting a=-b now simplifies this to:

 a V  d
-a V  e
 a V  c V -h
 a V -f

which is equisatisfiable to the first one. Further, if the latter is satisfiable, computing the satisfying solution to the former given a solution to the latter is trivial.

Fun facts about SAT solving (part 1)

Last time I gave a talk, I got some quite deserved fire about the way I approach SAT solving: in a more practical than scientific way. So, to give some food for thought for those who wish to approach SAT from a more scientific viewpoint, and to demonstrate to what lengths I have gone to give (at least to myself) a scientific viewpoint, let me talk a little bit about a fun fact that I have found. The fun fact, as is quite usual for SAT, seems entirely trivial, even banal, but has somehow (so far) eluded proper explanation for me. Maybe you will be the one to give the right explanation :)

So, I generate a Grain cipher instance using, e.g. the Grain-of-Salt tool that I developed. If you don’t want to generate these yourself, there are a couple (multi-hundred thousand) pre-generated such problems you can download from here. I launch MiniSat 2.1 “core”(i.e. the most simple version) on these problems with a little twist: I write to a file the length of each generated learnt clause. Now, there are two kinds of CNF files I use: one that has 60 randomly picked randomly set state variables, and another one that is plain, and doesn’t have any variables set. Obviously, the CNF that has variables set should be easier to solve, and if Grain is a proper cipher, it should be about 2^60 easer to solve. Now, I print the average learnt clause length that I observe for the CNF that has 0 variables set, and I get this graph:

I am sure one could sing long and elaborate songs about this graph, but instead, I will just say: it has a strange curvy thing at around clause length 120. Now, I will generate a similar graph, but this time, I will use a CNF that has 60 state (i.e. important) variables set. Note that this should indeed be 2^60 times easier to solve (since Grain is a nice cipher, and my CNF generation abilities aren’t that poor). So the same graph for this CNF looks like this:

All right. This one, as I am sure you have noticed, has a curvy thing at around 60. Using elementary math, 120-60 = 60. Right, so one could reason: well, there were X unknowns, I have set 60 of them, now there are only X-60 left, so the average clause length should be 60 less! That I think doesn’t explain much, if anything, however. First off, the number of unknowns were in fact 160 — that’s the unknown state of Grain that we were trying to solve. Second, the learnt clauses don’t only contain state variables… in fact, a lot of variables they contain are not state variables at all! Furthermore, why should the 1st UIP scheme be so exact about clause sizes? After all, it’s just a graph-analysis algorithm working at the local conflict — it has no clue about the problem it is working with, much less about the number of state bits set in our instance of Grain.

One could say that this is just a dumb example, and it has no real meaning. Maybe I just stumbled upon it, after all, 120/2 = 60, etc. However, this doesn’t seem to be the case. I can, in fact, reproduce this for a lot of X’s for any of the following ciphers: Grain, Trivium, Bivium, HiTag2 and Crypto-1. Let me show one from HiTag2. Again, we are solving for the state, and we don’t set any state bits (i.e. this is the full HiTag2 problem):

Okay, there is a bumpy thing at around 23. Now, let’s set 10 randomly picked state variables to random values, and run the example again:

There seems to be a bumpy thing at around 13. 23-10 = 13. That’s about right.

By the way, I have stumbled upon the above fun fact about 2 years ago (well before CryptoMiniSat was born), when working on my diploma thesis — these graphs are actually taken verbatim from my thesis. I have worked a lot on SAT in the past 2 years, but this has been haunting me ever since. Maybe someone could explain it to me?

PS: Does anyone know if someone has done an in-depth analysis of learnt clause length distributions, maybe even three-dimensional, showing the time on the Z axis? It would be awesome to do that, if it hasn’t been done yet!

Come to Hackito Ergo Sum’11

There is going to be a nice hacker conference in Paris real soon now: Hackito Ergo Sum, between the 7th and the 11th of April. There will be plenty of interesting talks, among them, I will also be presenting some fun crypto-breaking :) The schedule is here, I am scheduled for the first day, on Thursday at 11:30 AM. If you are in or near Paris, come around for the conference, I am sure it will be a really nice one :) If you are interested in meeting me, we can also meet in Paris, just drop me a mail, but better yet, come and chat with me at the conf, I will be there all day long.

The HES conference mainly focuses on attacks: against software, crypto, infrastructure, hardware, and the like. If you are interested what the current trend in attacking these systems is, and/or you are interested in making your organisation safer from these attacks, it’s a good idea to come and visit to get the hang of what’s happening. It’s also a great opportunity to meet all the organisations and people who are driving the change in making software and hardware systems more secure by demonstrating their weaknesses and presenting possibilities for securing them.

Meet you at HES’11 ;)

Stats on CryptoMiniSat’s development

I have just discovered the gitstats tool, and quickly put together a set of statistics on the development of CryptoMiniSat, the SAT solver I have been working on lately. I have been using the GIT repository for most of my work on CryptoMiniSat, doing ~2000 commits since its inception, so it gives a good picture of how things evolved, and how I usually work. Firstly, the hours I usually work:

I basically work almost as much at midnight as I do at midday. The afternoon is clearly the most active and practically useful part of my day. Next, the graph to show which days I work the most:

Essentially, I worked as much on CryptoMiniSat during Sunday as I did during Monday or Friday. Saturday was the day I must have been the most fed up with development. Finally, the number of commits by months:

The 2nd and 3rd months of 2010 was a struggle because of the SAT and PoS deadlines. I tried to put together a credible research paper, which finally got accepted at PoS. During the 6th and 7th months of 2010 I was mostly looking forward to the results of the SAT Race, so I wasn’t developing so much.

The 2010 SAT Conference

I am currently at the SAT 2010 conference, in Edinburgh. It is my second SAT conference, but the first one where I finally have a chance to know who is who. I had a chance to talk to some great researchers. For instance, I talked to (in no particular order) Armin Biere (author of PrecoSat and Lingeling), Niklas Een, Niklas Sörensson (authors of MiniSat — new version is out, btw), Marijn Heule (author of the march_* set of solvers), Matti Järvisalo (one of the authors of Blocked Clause Elimination), the author of MiraXT, and many others. It’s really a great place to meet up people and get a grasp on what SAT is really about. It’s also a great way to understand the motivation behind certain ideas, and to hear the problems encountered by others.

For instance, I now know that bugs, about which I have already written a blog post about is a problem that comes up for nearly everyone, and tackling them is one of the most difficult things for the developers. I have also finally understood the reason behind the MiniSat Template Library (MTL), which I’ve always found odd: it re-implements many things in the STL (the Standard Template Library), which I have always found unnecessary. Although the reasons are now clear (better control, easier debugging, less memory overhead), I think I will still replace it with STL. I have also had a chance to talk to the kind reviewer who did such an extensive review of my paper, which was quite an enlightening discussion. If you are interested in SAT, I encourage you to attend next year: there are usually some workshops associated with the conference, where it is easier to have a paper accepted: e.g. this year, a Masters student got his quite interesting paper (long version) accepted, and I am sure he has benefited a lot from it.