CNF | Wonderings of a SAT geek

Algebraic Normal Form, or simply ANF, is the form used to describe cryptographic functions. It looks like this:
$a*b \oplus b*c \oplus b \oplus d = 0\\b*c \oplus c \oplus a = 0\\\ldots$
Where $a,\ldots d$ are binary variables, the $\oplus$ sign is binary XOR, and the $*$ sign is binary AND. Every item that is connected with the XOR-s is called a monomial, and its degree is the number of independent variables inside it. So, $a*b$ is a 2-degree monomial in the first equation, and $c$ is a 1-degree monomial in the second equation.

An important problem in SAT is to translate an ANF into the input of SAT solvers, Conjunctive Normal Form, or simply CNF. A CNF formula looks like this:
$a \vee b \vee c = 1\\a \vee \neg d = 1\\\ldots$
Where again $a,\ldots d$ are binary variables, the $\vee$ sign is the binary OR, and the $\neg$ sign is the binary NOT (i.e. inverse).

The scientific reference paper

The most quoted article about ANF-to-CNF translation is that by Courtois and Bard, which advocates for the following translation process:

Introduce internal variables for every monomial of degree larger than 1
Describe the equations as large XORs using the recently introduced variables

The example problem in CNF

According to the original method, the equations presented above are first translated to the following form:
$v1 = a*b\\v2 = b*c\\v1 \oplus v2 \oplus b \oplus d = 0\\v2 \oplus c \oplus a = 0$

Where $v1, v2$ are fresh binary variables. Then, each of the above equations are translated to CNF. The internal variables are translated as such:

Translation of $v1 = a*b$ :
$v1 \vee \neg a \vee \neg b = 1\\\neg v1 \vee a = 1\\\neg 1 \vee b = 1$
Translation of $v2 = b*c$
$v2 \vee \neg b \vee \neg c = 1\\\neg v2 \vee b = 1\\\neg v2 \vee c = 1$
Translation of $v1 + v2 + b + d = 0$ :
$\neg v1 \vee v2 \vee b \vee d = 1\\v1 \vee \neg v2 \vee b \vee d = 1\\v1 \vee v2 \vee \neg b \vee d = 1\\v1 \vee v2 \vee b \vee -d = 1\\\neg v1 \vee \neg v2 \vee \neg b \vee d = 1\\\neg v1 \vee \neg v2 \vee b \vee \neg d = 1\\\neg v1 \vee v2 \vee \neg b \vee \neg d = 1\\v1 \vee \neg v2 \vee \neg b \vee \neg d = 1$
Translation of $v2 + c + a = 0$ :
$v2 \vee c \vee \neg a = 1\\v2 \vee \neg c \vee a = 1\\\neg v2 \vee c \vee a = 1\\\neg v2 \vee \neg c \vee \neg a = 1$

We are now done. The final CNF file is this. This final CNF has a small header, and some fluffs have been removed: variables are not named, but referred to with a number, and the = true-s have been replaced with a line-ending 0.

As you can imagine, there are many ways to enhance this process. I have written a set of them down in this paper. The set of improvements in a nutshell are the following:

If a variable’s value is given, (e.g. a = true), first substitute this value in the ANF, and transcribe the resulting ANF to CNF.
If there are two monomials, such as: a*b + b in an equation, make a non-standard monomial (-a)*b from the two, and transcribe this to CNF. Since the CNF can use negations, this reduces the size of the resulting CNF
If the ANF can be described using Karnaugh maps shorter than with the method presented above, use that translation method.

An automated way

I just got a nice script to perform step (1) from Martin Albrecht, one of the developers of Sage:

sage: B = BooleanPolynomialRing(4500,'x')
sage: L = [B.random_element(degree=2,terms=10) 
      for _ in range(4000)]
sage: s = [B.gen(i) + GF(2).random_element() 
      for i in range(1000)]
sage: %time F = 
      mq.MPolynomialSystem(L+s).
      eliminate_linear_variables(maxlength=1)
CPU time: 1.03 s,  Wall time: 1.11 s

In this code, Martin generates a boolean system of equations with 4500 variables, with 4000 random equations each with randomly selected monomials of degree 2 and of XOR size 10. Then, he sets 1000 random variables to a random value (true or false), and finally, he substitutes the assigned values, and simplifies the system. If you think this is a trivial issue, alas, it is not. Both Maple and Sage take hours to perform it if using the standard eval function. The function above uses a variation of the ElimLin algorithm from the Courtois-Bard paper to do this efficiently.

I recently got a question that intrigued me:

I am new to this SAT solving world but I was wondering whether you thought considerable speedups were possible for crypto type problems (multivariate polys over GF(2)) by simply never converting the problem to cnf at all and thereby avoiding the combinatorial explosion that results in the conversion process. That is using the original xor formulation.

First of all, the question is a follow-up to xor-clauses: they implement XOR-s natively. Using them avoids a number of problems relating to the increase of variables. Why not implement monomials (i.e. “a*b” or “a*b*c”, where “*” is binary AND and variables are binary) natively? They are the only thing left to do. Personally, I am not overly optimistic about them, though. Let me got through some of my reasons here.

Firstly, the “exponential explosion” expressed in the question is in fact much less existent than people tend to think. The reason is that the intelligent variable activity heuristics, unit propagation, and conflict generation tend to take care of a lot of potential problems. Since the propagation of a variable will entail the propagation of many others (it depends, for crypto, around ~100), there is no real explosion, since there is not really 2^n, but more like 2^(n/100) combinations that need to be explored. This argumentation takes away some of the potential benefits that native monomials could bring.

The real problem, though, is the following. By moving monomials into cryptominisat and thus potentially speeding up the solving, conflict generation could become much more complex. So, if moving to an internal monomial representation entails making a mess of conflict generation, then using monomials internally may only make the solving slower.

Another reason that native monomials may not speed up solving so much, is that a lot of clauses inserted when converting monomials are binary clauses, which are extremely well dealt with in the CNF world — it would be hard to do it any better.

As a last, but very minor point, using monomials would increase the complexity of the program, which would mean not only a lot of man-hours lost debugging it, but also a loss of performance due to a (probably non-negligible) increase of instruction cache misses.

Oh well, so those are my reasons. I would be interested if someone has some comments on these, though.

Wonderings of a SAT geek

A blog about SAT solving and cryptography

Tag Archives: CNF

ANF to CNF conversion

The scientific reference paper

The example problem in CNF

An automated way

Could monomials be handled natively from SAT solvers?