# Presentation at Hackito Ergo Sum

The HES’11 event was great: I had the pleasure of listening to some awesome presentations, and to meet some great people. The most interesting presentation from a non-technical point of view was the attacks at the automount feature of Linux, which everybody thinks is completely secure, but is in fact very flawed due to some buggy rendering libraries. It’s quite interesting that almost everyone thinks that their Linux installation is secure, when in fact if Linux was mainstream, viruses would be abound — but Linux is only a minor player, so malicious software is rarely written for it.

My presentation is available here. I tried mostly to demonstrate how SAT solvers work as an element of the technique that can most amply described as:

As the graphics show, the SAT solver is in fact only one player in this environment. As it turns out, it is the very last step after obtaining the cipher, creating equations describing the cipher, and converting the ANF equations into CNF. The best way to create equations from the original cipher is to use the excellent Sage Maths library for this, a tutorial of which is here. Then, the ANF created by Sage can be transcribed into CNF using, e.g. the anf2cnf tool by Martin Albrecht and me. Finally, the CNF must be solved with a SAT solver to recover the key of the cipher. This last step can be carried out by many SAT solvers, such as lingeling or MiniSat, but I prefer CryptoMiniSat, since I am the main developer for that SAT solver, and it is also very convenient to use in this domain due to some domain-specific advantages it has over other solvers. The middle two steps of the diagram are all automated by the Grain-of-Salt tool if you don’t want to use Sage, and it also contains some example ciphers, so you don’t even have to do step no. 1 in case you wish to work on one of multiple pre-defined industrial ciphers.

In case you are interested in the visualisations I used during my presentation, here is the set of tools I used. For the 3D visualisation, I used 3Dvis by Carsten Sinz — it’s a great tool to extract some structure from problems already in CNF. In case you still have the ANF, it contains more structure, though, and so it is more interesting to look at it that way. Unfortunately, that is rarely the case for typical SAT problems, and so one must often resort back to 3Dvis. For the example search tree, I used CryptoMiniSat 1.0 and gnuplot, and for the example real-time search, I used CryptoMiniSat 2.9.0, available from the same place. Unfortunately, CryptoMiniSat 2.9.0 cannot generate a search tree yet, but this eventually will be included, with time — especially if you join the effort of developing the solver. We are always looking forward to people joining in and helping out with various issues from graph generation to algorithm performance tuning, or even just some fun research.

# anf2cnf script released

I have finally managed to fix the script that converts ANF problems to CNF format in the Sage math system. The original script was having some problems that I blogged about. The new script has corrected most of the shortcomings of the original script, as well as added some textual help for the user.

For instance, the equations

sage: print two_polynoms
[x0*x1 + 1, x0*x1 + x1]

that last time required 13 clauses and 4 variables in CNF, now look like this:

sage: print anf2cnf.cnf(two_polynoms)
p cnf 3 6
c ------------------------------
c Next definition: x0*x1 + 1
3 0
c ------------------------------
c Next definition: x0*x1 + x1
3 -2 0
-3 2 0
c ------------------------------
c Next definition: monomial x0*x1
1 -3 0
2 -3 0
3 -1 -2 0

which is 1 variable and 7 clauses shorter than the original, not to mention the visually cleaner look and human-parseable output. The new script is available here. Hopefully, some of my enhancements included in the Grain-of-Salt package will be included in this script. The problem is mainly that Grain-of-Salt uses radically different data structures, and is written in a different programming language, so porting is not trivial.

# anf2cnf hell in Sage

There is an ANF (Algebraic Normal Form) to CNF (Conjunctive Normal Form) converter by Martin Albrecht in Sage. Essentially, it performs the ANF to CNF conversion that I have described previously in this blog entry. Me, as unsuspecting as anyone else, have been using this for a couple of days now. It seemed to do its job. However, today, I wanted to backport some of my ideas to this converter. And then it hit me.

Let me illustrate with a short example why I think something is wrong with this converter. We will try to encode that variable 0 and variable 1 cannot both be TRUE. This is as simple as saying x0*x1 = 0 in plain old math. In Sage this is done like this:

sage: B = BooleanPolynomialRing(10,'x')
sage: anf2cnf = ANFSatSolver(B)
sage: polynom = B.gen(0)*B.gen(1)
sage: print polynom
x0*x1

So far, so good. Let’s try to make a CNF out of this:

sage: print anf2cnf.cnf([polynom])
p cnf 4 6
2 -4 0
3 -4 0
4 -2 -3 0
1 0
4 1 0
-4 -1 0

Oooops. Why do we need 6 clauses to describe this? It can be described with exactly one:

p cnf 2 1
-1 -2

This lonely clause simply bans the solution 1 = TRUE, 2 = TRUE, which was our original aim.

sage: print two_polynoms
[x1*x2 + 1, x1*x2 + x1]
sage:  print anf2cnf.cnf(two_polynoms)
p cnf 4 13
2 -4 0
3 -4 0
4 -2 -3 0
1 0
4 0
2 -4 0
3 -4 0
4 -2 -3 0
1 0
4 2 1 0
-4 -2 1 0
-4 2 -1 0
4 -2 -1 0

Notice that clause 2 -4 0 and the two following it have been repeated twice, as well as the clause setting 1 to TRUE.

I have been trying to get around these problems lately. When ready, the new script will be made available, along with some HOWTO. It will have some minor shortcomings, but already, the number of clauses in problem descriptions have dramatically dropped. For example, originally, the description of an example problem in CNF contained 221’612 clauses. After minor corrections, the same can now be described with only 122’042 clauses. This of course means faster solving, cleaner and even human-readable CNF output, etc. Fingers are crossed for an early release ;)

# Truth table to ANF conversion

It is sometimes important to be able to quickly convert a binary function, described in as a truth table, to its Algebraic Normal Form (or simply, ANF). The truth table basically lists the points where a function is 1 and when it is 0. For instance, if the function has two variables, and its truth table is 0110 or, in hexadecimal form, 0x5 then the ANF of this function is simply x0 + x1, where x0 is the first variable of the function, and x1 is the second.

When it comes to more complex functions, an automatic method for this conversion is helpful. The Sage software can do exactly that:

sage: from sage.crypto.boolean_function import BooleanFunction
sage: B = BooleanFunction("0123456789ABCDEF")
sage: B.algebraic_normal_form()
x0*x1*x2 + x0*x1*x3 + x0*x1*x4 + x0*x1*x5 + x0*x2
+ x0*x3 + x1*x2 + x1*x4 + x2 + 1

Here, we used Sage to convert the function given by the truth table (in hex) 0x0123456789ABCDEF to its ANF equivalent.

I have long searched for a method for this, but at the end it was Luk Bettale who showed me the way. Many thanks to him for this!

# ANF to CNF conversion

Algebraic Normal Form, or simply ANF, is the form used to describe cryptographic functions. It looks like this:
$a*b \oplus b*c \oplus b \oplus d = 0\\b*c \oplus c \oplus a = 0\\\ldots$
Where $a,\ldots d$ are binary variables, the $\oplus$ sign is binary XOR, and the $*$ sign is binary AND. Every item that is connected with the XOR-s is called a monomial, and its degree is the number of independent variables inside it. So, $a*b$ is a 2-degree monomial in the first equation, and $c$ is a 1-degree monomial in the second equation.

An important problem in SAT is to translate an ANF into the input of SAT solvers, Conjunctive Normal Form, or simply CNF. A CNF formula looks like this:
$a \vee b \vee c = 1\\a \vee \neg d = 1\\\ldots$
Where again $a,\ldots d$ are binary variables, the $\vee$ sign is the binary OR, and the $\neg$ sign is the binary NOT (i.e. inverse).

### The scientific reference paper

The most quoted article about ANF-to-CNF translation is that by Courtois and Bard, which advocates for the following translation process:

1. Introduce internal variables for every monomial of degree larger than 1
2. Describe the equations as large XORs using the recently introduced variables

### The example problem in CNF

According to the original method, the equations presented above are first translated to the following form:
$v1 = a*b\\v2 = b*c\\v1 \oplus v2 \oplus b \oplus d = 0\\v2 \oplus c \oplus a = 0$

Where $v1, v2$ are fresh binary variables. Then, each of the above equations are translated to CNF. The internal variables are translated as such:

1. Translation of $v1 = a*b$:
$v1 \vee \neg a \vee \neg b = 1\\\neg v1 \vee a = 1\\\neg 1 \vee b = 1$
2. Translation of $v2 = b*c$
$v2 \vee \neg b \vee \neg c = 1\\\neg v2 \vee b = 1\\\neg v2 \vee c = 1$
3. Translation of $v1 + v2 + b + d = 0$:
$\neg v1 \vee v2 \vee b \vee d = 1\\v1 \vee \neg v2 \vee b \vee d = 1\\v1 \vee v2 \vee \neg b \vee d = 1\\v1 \vee v2 \vee b \vee -d = 1\\\neg v1 \vee \neg v2 \vee \neg b \vee d = 1\\\neg v1 \vee \neg v2 \vee b \vee \neg d = 1\\\neg v1 \vee v2 \vee \neg b \vee \neg d = 1\\v1 \vee \neg v2 \vee \neg b \vee \neg d = 1$
4. Translation of $v2 + c + a = 0$ :
$v2 \vee c \vee \neg a = 1\\v2 \vee \neg c \vee a = 1\\\neg v2 \vee c \vee a = 1\\\neg v2 \vee \neg c \vee \neg a = 1$

We are now done. The final CNF file is this. This final CNF  has a small header, and some  fluffs have been removed: variables are not named, but referred to with a number, and the = true-s have been replaced with a line-ending 0.

As you can imagine, there are many ways to enhance this process. I have written a set of them down in this paper. The set of improvements in a nutshell are the following:

1. If a variable’s value is given, (e.g. a = true), first substitute this value in the ANF, and transcribe the resulting ANF to CNF.
2. If there are two monomials, such as: a*b + b in an equation, make a non-standard monomial (-a)*b from the two, and transcribe this to CNF. Since the CNF can use negations, this reduces the size of the resulting CNF
3. If the ANF can be described using Karnaugh maps shorter than with the method presented above, use that translation method.

### An automated way

I just got a nice script to perform step (1) from Martin Albrecht, one of the developers of Sage:

sage: B = BooleanPolynomialRing(4500,'x')
sage: L = [B.random_element(degree=2,terms=10)
for _ in range(4000)]
sage: s = [B.gen(i) + GF(2).random_element()
for i in range(1000)]
sage: %time F =
mq.MPolynomialSystem(L+s).
eliminate_linear_variables(maxlength=1)
CPU time: 1.03 s,  Wall time: 1.11 s

In this code, Martin generates a boolean system of equations with 4500 variables, with 4000 random equations each with randomly selected monomials of degree 2 and of XOR size 10. Then, he sets 1000 random variables to a random value (true or false), and finally, he substitutes the assigned values, and simplifies the system. If you think this is a trivial issue, alas, it is not. Both Maple and Sage take hours to perform it if using the standard eval function. The function above uses a variation of the ElimLin algorithm from the Courtois-Bard paper to do this efficiently.