Presentation at Hackito Ergo Sum

The HES’11 event was great: I had the pleasure of listening to some awesome presentations, and to meet some great people. The most interesting presentation from a non-technical point of view was the attacks at the automount feature of Linux, which everybody thinks is completely secure, but is in fact very flawed due to some buggy rendering libraries. It’s quite interesting that almost everyone thinks that their Linux installation is secure, when in fact if Linux was mainstream, viruses would be abound — but Linux is only a minor player, so malicious software is rarely written for it.

My presentation is available here. I tried mostly to demonstrate how SAT solvers work as an element of the technique that can most amply described as:

As the graphics show, the SAT solver is in fact only one player in this environment. As it turns out, it is the very last step after obtaining the cipher, creating equations describing the cipher, and converting the ANF equations into CNF. The best way to create equations from the original cipher is to use the excellent Sage Maths library for this, a tutorial of which is here. Then, the ANF created by Sage can be transcribed into CNF using, e.g. the anf2cnf tool by Martin Albrecht and me. Finally, the CNF must be solved with a SAT solver to recover the key of the cipher. This last step can be carried out by many SAT solvers, such as lingeling or MiniSat, but I prefer CryptoMiniSat, since I am the main developer for that SAT solver, and it is also very convenient to use in this domain due to some domain-specific advantages it has over other solvers. The middle two steps of the diagram are all automated by the Grain-of-Salt tool if you don’t want to use Sage, and it also contains some example ciphers, so you don’t even have to do step no. 1 in case you wish to work on one of multiple pre-defined industrial ciphers.

In case you are interested in the visualisations I used during my presentation, here is the set of tools I used. For the 3D visualisation, I used 3Dvis by Carsten Sinz — it’s a great tool to extract some structure from problems already in CNF. In case you still have the ANF, it contains more structure, though, and so it is more interesting to look at it that way. Unfortunately, that is rarely the case for typical SAT problems, and so one must often resort back to 3Dvis. For the example search tree, I used CryptoMiniSat 1.0 and gnuplot, and for the example real-time search, I used CryptoMiniSat 2.9.0, available from the same place. Unfortunately, CryptoMiniSat 2.9.0 cannot generate a search tree yet, but this eventually will be included, with time — especially if you join the effort of developing the solver. We are always looking forward to people joining in and helping out with various issues from graph generation to algorithm performance tuning, or even just some fun research.

Come to Hackito Ergo Sum’11

There is going to be a nice hacker conference in Paris real soon now: Hackito Ergo Sum, between the 7th and the 11th of April. There will be plenty of interesting talks, among them, I will also be presenting some fun crypto-breaking :) The schedule is here, I am scheduled for the first day, on Thursday at 11:30 AM. If you are in or near Paris, come around for the conference, I am sure it will be a really nice one :) If you are interested in meeting me, we can also meet in Paris, just drop me a mail, but better yet, come and chat with me at the conf, I will be there all day long.

The HES conference mainly focuses on attacks: against software, crypto, infrastructure, hardware, and the like. If you are interested what the current trend in attacking these systems is, and/or you are interested in making your organisation safer from these attacks, it’s a good idea to come and visit to get the hang of what’s happening. It’s also a great opportunity to meet all the organisations and people who are driving the change in making software and hardware systems more secure by demonstrating their weaknesses and presenting possibilities for securing them.

Meet you at HES’11 ;)

Recovering XORs from a CNF

Let’s suppose you are looking to recover XORs. It’s fun to recover XORs, because you could try to XOR the XORs together to obtain new information, like this:

a \oplus b \oplus c = 0
a \oplus d \oplus b = 0
therefore,
c \oplus d = 0
therefore,
c = d

Simple XOR recovery

The simplest way to recover an XOR clause is to look for clauses that encode it, which is the following set of clauses:

\neg a \vee b \vee c
a \vee \neg b \vee c
a \vee b \vee \neg c
\neg a \vee \neg b \vee \neg c

Finding such a set is not that hard: we need to first sort each clause according to its literals, then sort the list of clauses according to their sizes and variable contents and then go through the list of clauses linearly. If you think through the first two steps, you will see that it ensures that the above 4 clauses are next to each other in the list of clauses, making it trivial to find them.

Improved XOR recovery

There is only one small glitch with the algorithm above: other sets of clauses can also make up XORs. For example, this set of clauses:
\neg a \vee b \vee c
a \vee \neg b \vee c
a \vee b \vee \neg c
\neg a \vee \neg b

Notice that all I took away was one literal from one clause. Now, if a CNF implies \neg a \vee \neg b it must surely also imply \neg a \vee \neg b \vee \neg c since this latter one is less stringent than the first one. So, we could easily have it inside our CNF, but we don’t for reasons of lazyness: who wants to keep around data that is implied by unit propagation already? However, keeping in mind that the second, less stringent clause is implied by the first one is important: it could lead to the discovery of more XORs.

Results

Let’s move on to the magical territory of practical SAT solving. Let’s first take a typically industrial problem, UTI-20-10p0:

$./cryptominisat_old UTI-20-10p0.cnf
[...]
c Finding non-binary XORs:     0.10 s (found:       0, avg size: -nan)

The old, 2.9.0 CryptoMiniSat seems to have no luck at finding any XOR clauses. This is very typical: industrial instances seem not to contain any XOR clauses at all. Let’s look at the new algorithm now:

$./cryptominisat_new UTI-20-10p0.cnf
[...]
c XOR finding finished. Num XORs:   9288 

All right, that seems better.

Edited to add: Thanks to Vegard Nossum for telling me something was odd with XOR recovery. As usually is the case, answering a question is much easier than knowing what question to ask.

Edited to add2: The above algorithm can be easily improved by using cached literals and stamping.

CryptoMiniSat in SAT Competition’11

I have submitted three versions of CryptoMiniSat to the 2011 SAT Competition, all available here. The releases are commonly called “Strange Night”, after the music track of the same name. Inside the archive you will find 5 binaries and a PDF description. The five binaries are made up of two single-threaded, two multi-threaded, and a unified single- and multi-threaded binary. This latter is called techdemo, as it is more of a technological demonstrator than anything else, and was developed in collaboration with George Katsirelos and Laurent Simon. All versions are collaborative, however, as they all have the hands of some people from the CryptoMiniSat development mailing list on them.

To be honest, I am not overly optimistic of these binaries, especially since I couldn’t get the 3.0 version of CryptoMiniSat ready for the competition, on which I have been working on the past two months. I have tried to backport some features to the 2.9.0 branch to make these releases, and so the resulting executables are pretty unstable. The techdemo version is a broken 3.0 release, as it is missing the signature feature of fully shared clauses between threads. Since I have finally obtained a 4-core CPU — with hyper-threading for 8 “independent” threads — I am all the more optimistic about a fully-shared scheme. The upcoming 3.0 release will be specifically made to work in multi-threaded environments, and will suffer if launched in single-threaded mode. This architectural decision was made because multi-threading nowadays no longer seems optional: it’s a must. On the CryptoMiniSat mailing list, some people are reporting performance on 64-core machines — optimising for single-threaded environment (and having multi-threading as an afterthought) in these times seems like a broken approach.

For those interested in the actual sourcecode of what has been submitted, everything is available from GIT, as usual:

  1. Strange Night1 – single-threaded
  2. Strange Night1 – multi-threaded
  3. Strange Night2 – single-threaded
  4. Strange Night2 – multi-threaded
  5. Tech Demo

Since these are pretty unstable, I wouldn’t use them in a production environment… happy bug-hunting ;)

Visiting Linz

Lately I had the pleasure of visiting Linz, Armin Biere’s workplace, where I gave a quick talk on SAT solver architectures. To me, it was really interesting to think through that presentation — not because it was entirely new or exciting, but because it recapped on many issues that have been bothering me lately. Namely, that it’s difficult to make a really versatile SAT solver, because the low-level choices that must be made (e.g. watchlist scheme) determines so many things when one must make higher-level architectural decisions such as clause sharing or even something as simple as hyper-binary resolution. As for this latter, thanks to Armin Biere’s thoughts I have finally managed to realise why my hyper-binary resolution was so slow: I lately took the decision not to fully propagate binary clauses before propagating normal (i.e. larger) clauses, which meant that doing hyper-binary resolution was much slower as I had to re-calculate the binary graph. The fact of not fully propagating binary clauses before normal clauses seemed also to influence my much higher-level choice of using cached implications, as they (intuitively, and also in practice) help much more if binary clauses are not fully propagated before normal clauses. This latter influence is interesting to think through, as something this trivial shouldn’t — at least in principle — influence such a high-level decision.

Also thanks to Armin Biere, I have managed to grasp a better understanding of lingeling and its superior watchlist scheme. Some of the architectural elements of lingeling’s watchlist scheme are really interesting, and when they get published I will definitely port some of them to CryptoMiniSat. It seems to use much less space, and stores information in a more cache-friendly manner, aiding the processor in its job. A related, interesting pointer that I have learnt is this paper that originally introduced blocking literals, but also talks about a range of other ideas that can help. All in all, it was great to visit Linz and the group of Armin Biere, as I have managed to learn a lot from him and his colleagues.