Tag Archives: ANF

Bosphorus, an ANF and CNF simplifier and converter

I am happy to finally release a piece of work that I have started many years ago at Security Research Labs (many thanks to Karsten Nohl there). Back in the days, it helped us to break multiple real-world ciphers. The released system is called Bosphorus and has been released with major, game-changing work by Davin Choo and Kian Ming A. Chai from DSO National Laboratories Singapore and great help by Kuldeep Meel from NUS. The paper will be published at the DATE 2019 conference.

ANFs and CNFs

Algebraic Normal Form is a form that is used by most cryptographers to describe symmetric ciphers, hash algorithms, and lately a lot of post-quantum asymmetric ciphers. It’s a very simple notation that basically looks like this:

x1 ⊕ x2 ⊕ x3 = 0
x1 * x2 ⊕ x2 * x3 + 1 = 0

Where “⊕” represents XOR and “*” represents the AND operator. So the first line here is an XOR of binary variables x1, x2 and x3 and their XOR must be equal to 0. The second line means that “(x1 AND x2) XOR (x2 AND x3)” must be equal to 1. This normal form allows to see a bunch of interesting things. For example, it allows us to see the so-called “maximum degree” of the set of equations, where the degree is the maximum number of variables AND-ed together in one line. The above set of equations has a maximum degree of 2, as (x1*x2) is of degree 2. Degrees can often be a good indicator for the complexity of a problem.

What’s good about ANFs is that there are a number of well-known algorithms to break problems described in them. For example, one can do (re)linearization and Gauss-Jordan elimination, or one could run Grobner-basis algorithms such as F4/F5 on it. Sometimes, the ANFs can also be solved by converting them to another normal form, Conjunctive Normal Form (CNF), used by SAT solvers. The CNF normal form looks like:

x1 V x2 V x3
-x1 V x3

Where x1, x2 and x3 are binary variables, “V” is the logical OR, and each line must be equal to TRUE. Using CNF is interesting, because the solvers used to solve them, SAT solvers, typically provide a different set of trade-offs for solving than ANF problem solvers. SAT solvers tend to use more CPU time but a lot less memory, sometimes making problems viable to solve in the “real world”. Whereas sometimes breaking of a cipher is enough to be demonstrated on paper, it also happens that one wants to break a cipher in the real world.

Bridging and Simplifying

Bosphorus is I believe a first of its kind system that allows ANFs to be simplified using both CNF- and ANF-based systems. It can also convert between the two normal forms and can act both as an ANF and a CNF preprocessor, like SatELite (by Een and Biere) was for CNF. I believe this makes Bosphorus unique and also uniquely useful, especially if you are working on ANFs.

Bosphorus uses an iterative architecture that performs the following set of steps, either until it runs out of time or until fixedpoint:

  1. Replace variables and propagate constants in the ANF
  2. Run limited Extended Linarization (XL) and inject back unit and binary XORs
  3. Run limited ElimLin and inject back unit and binary XORs
  4. Convert to CNF, run a SAT solver for a limited number of conflicts and inject back unit and binary (and potentially longer) XORs

In other words, the system is an iterative simplifier/preprocessor that invokes multiple reasoning systems to try to simplify the problem as much as possible. It can outright solve the system, as most of these reasoning systems are complete, but the point is to run them only to a certain limit and inject back into the ANF the easily “digestible” information. The simplified ANF can then either be output as an ANF or a CNF.

Bosphorus can also take a CNF as input, perform the trivial transformation of it to ANF and then treat it as an ANF. This allows the CNF to be simplified using techniques previously unavailable to CNF systems, such as XL.

ANF to CNF Conversion

I personally think that ANF-to-CNF conversion is actually not that hard, and that’s why there hasn’t been too much academic effort devoted to it. However, it’s an important step without which a lot of opportunities would be missed.

The implemented system contains a pretty advanced ANF-to-CNF converter, using Karnaugh tables through Espresso, XOR cutting, monomial reuse, etc. It should give you a pretty optimal CNF for all ANFs. So you can use Bosphorus also just as an ANF-to-CNF converter, though it’s so much more.

Final Thoughts

What I find coolest about Bosphorus is that it can simplify/preprocess ANF systems so more heavyweight ANF solvers can have a go at them. Its ANF simplification is so powerful, it can even help to solve some CNFs by lifting them to ANF, running the ANF simplifiers, converting it back to CNF, and solving that(!). I believe our initial results, published in the paper, are very encouraging. Further, the system is in a ready-to-use state: there is a Docker image, the source should build without a hitch, and there is even a precompiled Linux binary. Good luck using it, and let me know how it went!

Presentation at Hackito Ergo Sum

The HES’11 event was great: I had the pleasure of listening to some awesome presentations, and to meet some great people. The most interesting presentation from a non-technical point of view was the attacks at the automount feature of Linux, which everybody thinks is completely secure, but is in fact very flawed due to some buggy rendering libraries. It’s quite interesting that almost everyone thinks that their Linux installation is secure, when in fact if Linux was mainstream, viruses would be abound — but Linux is only a minor player, so malicious software is rarely written for it.

My presentation is available here. I tried mostly to demonstrate how SAT solvers work as an element of the technique that can most amply described as:

As the graphics show, the SAT solver is in fact only one player in this environment. As it turns out, it is the very last step after obtaining the cipher, creating equations describing the cipher, and converting the ANF equations into CNF. The best way to create equations from the original cipher is to use the excellent Sage Maths library for this, a tutorial of which is here. Then, the ANF created by Sage can be transcribed into CNF using, e.g. the anf2cnf tool by Martin Albrecht and me. Finally, the CNF must be solved with a SAT solver to recover the key of the cipher. This last step can be carried out by many SAT solvers, such as lingeling or MiniSat, but I prefer CryptoMiniSat, since I am the main developer for that SAT solver, and it is also very convenient to use in this domain due to some domain-specific advantages it has over other solvers. The middle two steps of the diagram are all automated by the Grain-of-Salt tool if you don’t want to use Sage, and it also contains some example ciphers, so you don’t even have to do step no. 1 in case you wish to work on one of multiple pre-defined industrial ciphers.

In case you are interested in the visualisations I used during my presentation, here is the set of tools I used. For the 3D visualisation, I used 3Dvis by Carsten Sinz — it’s a great tool to extract some structure from problems already in CNF. In case you still have the ANF, it contains more structure, though, and so it is more interesting to look at it that way. Unfortunately, that is rarely the case for typical SAT problems, and so one must often resort back to 3Dvis. For the example search tree, I used CryptoMiniSat 1.0 and gnuplot, and for the example real-time search, I used CryptoMiniSat 2.9.0, available from the same place. Unfortunately, CryptoMiniSat 2.9.0 cannot generate a search tree yet, but this eventually will be included, with time — especially if you join the effort of developing the solver. We are always looking forward to people joining in and helping out with various issues from graph generation to algorithm performance tuning, or even just some fun research.

anf2cnf script released

I have finally managed to fix the script that converts ANF problems to CNF format in the Sage math system. The original script was having some problems that I blogged about. The new script has corrected most of the shortcomings of the original script, as well as added some textual help for the user.

For instance, the equations

sage: print two_polynoms
[x0*x1 + 1, x0*x1 + x1]

that last time required 13 clauses and 4 variables in CNF, now look like this:

sage: print anf2cnf.cnf(two_polynoms)
p cnf 3 6
c ------------------------------
c Next definition: x0*x1 + 1
3 0
c ------------------------------
c Next definition: x0*x1 + x1
3 -2 0
-3 2 0
c ------------------------------
c Next definition: monomial x0*x1
1 -3 0
2 -3 0
3 -1 -2 0

which is 1 variable and 7 clauses shorter than the original, not to mention the visually cleaner look and human-parseable output. The new script is available here. Hopefully, some of my enhancements included in the Grain-of-Salt package will be included in this script. The problem is mainly that Grain-of-Salt uses radically different data structures, and is written in a different programming language, so porting is not trivial.

anf2cnf hell in Sage

There is an ANF (Algebraic Normal Form) to CNF (Conjunctive Normal Form) converter by Martin Albrecht in Sage. Essentially, it performs the ANF to CNF conversion that I have described previously in this blog entry. Me, as unsuspecting as anyone else, have been using this for a couple of days now. It seemed to do its job. However, today, I wanted to backport some of my ideas to this converter. And then it hit me.

Let me illustrate with a short example why I think something is wrong with this converter. We will try to encode that variable 0 and variable 1 cannot both be TRUE. This is as simple as saying x0*x1 = 0 in plain old math. In Sage this is done like this:

sage: B = BooleanPolynomialRing(10,'x')
sage: load anf2cnf.py
sage: anf2cnf = ANFSatSolver(B)
sage: polynom = B.gen(0)*B.gen(1)
sage: print polynom
x0*x1

So far, so good. Let’s try to make a CNF out of this:

sage: print anf2cnf.cnf([polynom])
p cnf 4 6
2 -4 0
3 -4 0
4 -2 -3 0
1 0
4 1 0
-4 -1 0

Oooops. Why do we need 6 clauses to describe this? It can be described with exactly one:

p cnf 2 1
-1 -2

This lonely clause simply bans the solution 1 = TRUE, 2 = TRUE, which was our original aim.

Let me just mention one more thing about this converter: it repeats definitions. For example:

sage: print two_polynoms
[x1*x2 + 1, x1*x2 + x1]
sage:  print anf2cnf.cnf(two_polynoms)
p cnf 4 13
2 -4 0
3 -4 0
4 -2 -3 0
1 0
4 0
2 -4 0
3 -4 0
4 -2 -3 0
1 0
4 2 1 0
-4 -2 1 0
-4 2 -1 0
4 -2 -1 0

Notice that clause 2 -4 0 and the two following it have been repeated twice, as well as the clause setting 1 to TRUE.

I have been trying to get around these problems lately. When ready, the new script will be made available, along with some HOWTO. It will have some minor shortcomings, but already, the number of clauses in problem descriptions have dramatically dropped. For example, originally, the description of an example problem in CNF contained 221’612 clauses. After minor corrections, the same can now be described with only 122’042 clauses. This of course means faster solving, cleaner and even human-readable CNF output, etc. Fingers are crossed for an early release ;)

Truth table to ANF conversion

It is sometimes important to be able to quickly convert a binary function, described in as a truth table, to its Algebraic Normal Form (or simply, ANF). The truth table basically lists the points where a function is 1 and when it is 0. For instance, if the function has two variables, and its truth table is 0110 or, in hexadecimal form, 0x5 then the ANF of this function is simply x0 + x1, where x0 is the first variable of the function, and x1 is the second.

When it comes to more complex functions, an automatic method for this conversion is helpful. The Sage software can do exactly that:

sage: from sage.crypto.boolean_function import BooleanFunction
sage: B = BooleanFunction("0123456789ABCDEF")
sage: B.algebraic_normal_form()
x0*x1*x2 + x0*x1*x3 + x0*x1*x4 + x0*x1*x5 + x0*x2
+ x0*x3 + x1*x2 + x1*x4 + x2 + 1

Here, we used Sage to convert the function given by the truth table (in hex) 0x0123456789ABCDEF to its ANF equivalent.

I have long searched for a method for this, but at the end it was Luk Bettale who showed me the way. Many thanks to him for this!