On Testing and Security Engineering

I have been working in a large organization for quite a while now, and I have seen a lot of testing going on. As an information security engineer, I naturally aligned more with testing, and indeed, information security assurance does align well with testing: it’s done on a continuous basis, its results usually mean work for developers, operations people, system architects, etc., and not caring about it is equivalent to accepting unknown risks. Since I have been working in an environment where testing was paramount, I have been digging more and more into the testing literature.

Testing and Information Security, Worlds Apart

What was most surprising is that security engineering is miles apart from the testing community in general. Just listen to the largest testing conference’s keynote and be astounded: no security folks in the audience at all. As a white-hat hacker I can understand where this is coming from: information security seems to have evolved on its own, with little influence from the testing crowd. When I picked Information Security as my Masters, it was the very first year that kind of Masters was taught at my university. At the largest European security conference I go to every year, the Chaos Communication Congress (in Germany), I have never met any testers. It seems that a lot of the security know-how came about from the underground, people hacking together and creating know-how from scratch. If you look at one of the main reference books, Security Engineering by Ross Anderson (first published 2001), you will scarcely find a reference to testing and the testing literature.

Commonalities and Differences

I find this strict separation eerie. First of all, there is incredible overlap. For example, if you want to properly fuzz a complicated system, say, Mozilla’s JavaScript engine, you need to have the component under test (in this case, SpiderMonkey) made into a module so it can be fed input directly. Such stand-alone executables are normally created by and for testers (or developers who care about testing, see TDD) so that modules can be properly tested. Without such an executable, trying to fuzz Firefox’s JavaScript engine would be quite futile: Firefox startup times would kill your fuzzing session’s efficiency. Similarly, protocol and deployment diagrams used for functional and performance testing and validation are extremely useful for security architecture reviews. Note that testers almost always know more about the overall system and the context it is deployed in than developers, who in turn tend to be more focused on individual subsystems.
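
To make this concrete, here is a minimal sketch of what such a stand-alone fuzz target can look like, using the libFuzzer-style entry point. Note that evaluate_js_source() is a made-up stand-in for “a module entry point that can be fed input directly”, not the real SpiderMonkey embedding API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for the module's real entry point. In a real setup this would be
// whatever function the stand-alone module exposes for direct input.
static bool evaluate_js_source(const std::string& source) {
    return !source.empty();  // placeholder logic
}

// libFuzzer-style harness: the fuzzer calls this function millions of times
// with mutated inputs, with no browser start-up cost inside the loop.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    evaluate_js_source(std::string(reinterpret_cast<const char*>(data), size));
    return 0;  // non-zero return values are reserved by libFuzzer
}
```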

Secondly, handling security issues similarly to how you would handle high/medium/low priority issues caught by testers improves the chances that security is taken into consideration at the right level and in the right way. Just as you wouldn’t ship a game that crashes after 10 minutes of play (and you might remove the functionality that crashes it as a stopgap), you wouldn’t ship a system that has an SQL injection vulnerability in it (and you might remove the functionality that allows for that unescaped input as a stopgap). In other words, insecurity is created by some functionality (note that no functionality means perfect security), and that functionality can be changed, disabled or restricted to allow for lower-risk operation. Or indeed, the risk can simply be accepted, as it is sometimes accepted for functional or deployment issues. In this sense, information security engineers highlight risks that management can then decide to either accept, mitigate, or eliminate. This is very similar to, say, a functional issue that may endanger the reputation of the company, or even human life if it’s a medical or safety device, in case it is released.
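
To make the SQL injection point a bit more concrete, here is a small sketch using SQLite’s C API purely as an example (the table, column and function names are made up): the same lookup functionality can be written so that user input stays data instead of becoming part of the query.

```cpp
#include <sqlite3.h>
#include <string>

// Vulnerable pattern: splicing user input straight into the SQL text, e.g.
//   "SELECT id FROM users WHERE name = '" + user_input + "';"
// lets an input like  x' OR '1'='1  change the meaning of the query.

// Lower-risk pattern: a prepared statement with a bound parameter keeps the
// input strictly as data.
bool user_exists(sqlite3* db, const std::string& user_input) {
    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, "SELECT id FROM users WHERE name = ?1;",
                           -1, &stmt, nullptr) != SQLITE_OK) {
        return false;
    }
    sqlite3_bind_text(stmt, 1, user_input.c_str(), -1, SQLITE_TRANSIENT);
    const bool found = (sqlite3_step(stmt) == SQLITE_ROW);
    sqlite3_finalize(stmt);
    return found;
}
```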

Finally, it must be acknowledged that information security is different in a number of ways from normal testing. First of all, the risks are often larger and harder to control for. If you inadvertently delete your customers’ data, it’s a shame and you might lose some customers. If that same data ends up in someone else’s hands, you’ll be in for a much rougher ride. Systems that are typically low-risk from a functional standpoint can carry large security risks, as one can see time and again on the Full Disclosure mailing list. Similarly, testing of systems, even end-to-end, often doesn’t need to take into consideration a number of factors that security folk have to: the legal landscape, trust in third parties such as governmental organisations, incentives of both inside and outside threat agents, etc. In general, a security engineer needs to take into account a broader context and may need to make calls on higher-than-average risk mitigation strategies. But note that risk mitigation is still the key component here, and controlling for risk often requires changes that are not unlike what testers might demand of developers to fix performance, deployment, or functional issues.


Successful companies seem to have capitalized on some of this divide. Codenomicon clearly appeals to the testing crowd with its intuitive interface and the way it can be integrated into test system automation. The same can be said of Nessus. Both companies seem to have packaged up information security know-how into something that testers who are not necessarily experts in information security can use.

Given the divide between these two crowds, I believe there is an opportunity here. Testing is a well-established area with a large knowledge base. Similarly, information security can now be considered a well-established discipline with its own certifications, knowledge base, etc. Intermingling of the two crowds, and hence some of the know-how, would I think serve both communities quite well. There is some obvious low-hanging fruit such as more effective fuzzing and automated test case generation. The non-obvious gains are probably more interesting, and I believe they will only come about once more bridges have been built.

Testing CryptoMiniSat using GoogleTest

Lately, I have been working quite hard on writing module tests for CryptoMiniSat using GoogleTest. I’d like to share what I’ve learnt and what surprised me most about this exercise.

An example

First of all, let me show what a typical test looks like:

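(In rough, sketched form; the fixture name and the helper calls, add_clause_outside(), str_to_cl() and check_zero_assigned_lits_contains(), are illustrative stand-ins rather than the exact names in the repository.)

```cpp
TEST_F(intree, fail_and_enqueue)
{
    // Three binary clauses: probing "2" propagates 1, then 3, which conflicts
    // with the clause (-3, -2), so "-2" must end up enqueued at top level.
    s->add_clause_outside(str_to_cl("-2, 1"));
    s->add_clause_outside(str_to_cl("-1, 3"));
    s->add_clause_outside(str_to_cl("-3, -2"));

    s->intree_probe();
    check_zero_assigned_lits_contains(s, "-2");
}
```
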
Here we are checking that intree probing finds that the set of three binary clauses causes a failure and enqueues “-2” at top level. If one looks at it, it’s a fairly trivial test. It turns out that most are, in fact, fairly trivial if the system is set up well. This test’s setup is the following test fixture:

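(Sketched here in rough form; the constructor signature and the exact configuration members are assumptions used for illustration, not necessarily the real CryptoMiniSat ones.)

```cpp
#include <atomic>
#include "gtest/gtest.h"
// CryptoMiniSat's internal headers (providing Solver and SolverConf) are
// assumed to be available, as they are when the tests live in the source tree.

struct intree : public ::testing::Test {
    void SetUp() override {
        s = new Solver(&conf, &must_interrupt);
        s->new_vars(20);
    }
    void TearDown() override {
        delete s;
    }

    SolverConf conf;
    std::atomic<bool> must_interrupt{false};
    Solver* s = nullptr;
};
```
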
The fixture hides no complexity, either. In fact, what’s surprising about this test is how easy it really is to write.

The fiddly parts

Looking at the test code, it’s easy to see if a module implements a bit too much:

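(Again a rough sketch, with a fixture along the same lines, here called probe; the configuration switches and the probe_literal() helper are stand-ins for whatever the real names are, and the clause set is illustrative.)

```cpp
TEST_F(probe, first_uip)
{
    // None of these subsystems are needed for this scenario, so all of them
    // are switched off -- otherwise e.g. Stalmarck could find "-2" on its own
    // by probing 3 and -3.
    s->conf.doStamp = false;
    s->conf.doHyperBinRes = false;
    s->conf.doStalmarck = false;

    // Probing "-1" propagates 2, which leads to a conflict; 1st UIP conflict
    // analysis then yields the unit "-2" at top level.
    s->add_clause_outside(str_to_cl("1, 2"));
    s->add_clause_outside(str_to_cl("-2, 3"));
    s->add_clause_outside(str_to_cl("-2, -3"));

    probe_literal(s, "-1");
    check_zero_assigned_lits_contains(s, "-2");
}
```
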
This is testing the same principle, probing, but using the non-intree version. This probing type has been in CryptoMiniSat for a long while, and the feature creep shows. Stamping, hyper-binary resolution and Stalmarck are all in the module, separately controllable. This test needs none of these functionalities and is only interested in whether the 1st UIP (i.e. -2) will be caught when “-1” is probed. Note that Stalmarck could set -2 by probing 3 and -3, so we need to make sure to turn it off.

What I learned

The surprising thing about this module testing exercise was that it was much easier than I first thought. I have written about 100 tests for 11 modules in just a week. The vast majority of the tests are very simple to set up, execute and check. However, for tests to be easy to write and reliable to run, there are some prerequisites. Any time the tests were flaky, weird or plain wrong, either the module under test didn’t expose enough control or it had too many functionalities. In other words, if the module is not self-contained and manageable, it cannot be tested. What’s interesting about that is that, based solely on the test code, one can discern the quality of the code being tested. That, I think, was my main takeaway.

Interestingly, I have found no actual bugs in the code. However, while trying to hunt down why a test wasn’t working, I have often found places to clean up, fix up, and generally make more sane and readable. In other words, just because the tests haven’t caught any real bugs doesn’t mean they were useless. I find this fascinating, even though it only takes a cursory look at the testing literature to see that there is more to testing than just quality, risk and performance metrics. It’s fun to see that first-hand, though.

Future Work

I have written tests for all the main modules of CryptoMiniSat except for the occurrence-based systems. These are up next, and they are the most hairy and the most interdependent. For example, bounded variable elimination (BVE) calls backward subsumption for each clause it creates. So testing BVE in isolation will need either some stubbing or more control parameters to turn off subsystems, as sketched below. Self-subsuming resolution is similarly interdependent with subsumption. Handling these interdependencies in a sane way will be a fun challenge.
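
One possible way to do the stubbing, sketched under the assumption that the occurrence-based code can be split along a small interface (all names below are made up for illustration), is to hand BVE a recording stub instead of the real subsumer:

```cpp
#include <vector>

using Lit = int;  // stand-in for CryptoMiniSat's literal type

// Hypothetical seam: the slice of backward subsumption that BVE calls into.
struct SubsumerIface {
    virtual ~SubsumerIface() = default;
    virtual void backward_subsume(const std::vector<Lit>& clause) = 0;
};

// Test-only stub: records the clauses BVE would have handed over, so a test
// can check BVE's own behaviour without running real subsumption.
struct RecordingSubsumer final : public SubsumerIface {
    std::vector<std::vector<Lit>> seen;
    void backward_subsume(const std::vector<Lit>& clause) override {
        seen.push_back(clause);
    }
};
```

The other option mentioned above, adding more control parameters to switch subsystems off, avoids the interface change but keeps the modules coupled.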

CryptoMiniSat: 8000 commits later

The GitHub repository for CryptoMiniSat just hit 8000 commits. To celebrate this rather weird and crazy fact, let me put together a bit of a history.

The Beginnings

CryptoMiniSat began as a way of trying to prove that a probabilistic cryptographic scheme was not possible to break using SAT solvers. This was the year 2009, and I was working in Grenoble at INRIA. It was a fun time and I was working really hard to prove what I wanted to prove, to the point that I created a probabilistic SAT solver where one could add probability weights to clauses. The propagation and conflict engine would only act if the conflict or propagation was supported by multiple clauses. Thus began a long line of unpublished work of mine in SAT. This SAT solver, curiously, still works and solves problems quite well. It’s a lot of fun, actually: just add some random clause into the database that bans the only correct solution and the solver will still find the correct solution. A bit of a hack, but hey, it works. Also, it’s so much faster than doing that kind of solving with the “right” systems that it’s not even worth comparing.

The same year, my friend, who later became my boss, came around because he had the code of the Mifare protocol in his hands but couldn’t crack the crypto. He pushed me to use the SAT solver for something practical, and in just two days we did it: we modelled the cipher and could break it, but it took too much time to solve. So for the rest of the year I was working on improving the solver as much as I could. He had the idea for the XOR clauses, which I implemented in one night in a flash of crazy inspiration.

Winning the 2010 SAT Race

Then came the end of 2009 and I was bored in Paris, wanting to work on SAT solvers no matter what. There was a completely unrelated project that I had to work on for about 2 months to get a full year of salary. This was all thanks to Claude Castelluccia, who pulled that project together amazingly well; we did so well that we got selected as a “Project Phare”, whose exact meaning I have already forgotten, but we were around the top 5%. So I had 10 months to spare, and I worked all weekdays and all weekends on my shiny new SAT solver, which my friend, later boss, also thought would be good to call CryptoMiniSat. I picked the name at the EuroCrypt’09 conference, which he had flown in from Berlin to attend. I wanted to give credit to the MiniSat guys, something that later nobody else did.

I put the solver through a lot of stress testing on Grid’5000, the French supercomputing cluster, and submitted it to the SAT Race. Winning it was something completely unanticipated. What was even more unanticipated was that everybody thought the win was because the solver had XOR clause support. Of course that had nothing to do with it; Gaussian elimination was not even compiled into the executable. It was interesting to win, but once I was back in the lab, things got a bit weird. My supervisor there was clearly not impressed. They kept on using MiniSat without ever asking me about SAT solvers.

The Berlin years: 2010-2014

I went to work for my friend in Berlin, with whom I had already worked on Crypto-1. I was now working in security consultancy, but of course I continued to work on my SAT solver and submitted it to the SAT Competition of 2011. The system was exceedingly fast; it should have won, but didn’t, because it ran out of memory on a number of input instances. The problem was multi-fold, but essentially, the number of inprocessing systems grew so large that making sure none of them blew up was hard. Also, my testing was inadequate, something that I didn’t properly start fixing for years. I still won some prizes, including a gold medal for parallel solving, which was super-funny as all I did was add a blocking structure that shared unit and binary clauses, and that’s it. I fondly remember the descriptions of some of the parallel solvers I beat that were so complicated even to explain that I gave up reading them. I can only wonder how much time it took to write those solvers.

The years rolled by and I never got around to properly testing or performance-testing the system. Whenever I submitted it, it did OK and I wasn’t too sad about it, but it never did too well. There was always something that didn’t work, that broke just on the last day. Every time there was some unanticipated input with 2000 disconnected components or the like. It was fun to compete, and I tried to clean up the code, but I never got to win.

The past year of CryptoMiniSat

At the beginning of this year I decided to do something slightly different. It was clear that CryptoMiniSat was not winning any major prizes. At the same time, I started working at another company, Cisco, where I saw a lot of work being done on testing. This inspired me to put a lot more effort into both performance and regular testing. It’s strange how an environment influences you. I started putting in hooks and building systems around the program. I built an AWS system for performance testing. I spun up a Jenkins server to perform regular long-running tests. I added code coverage metrics. I hooked in Coverity for good. I created ways to dump internal state and check it. Some of these were already present for the SAT Race 2015, some weren’t. But I was quite content to win a difficult prize there, the incremental track; I think that validated my approach to testing. The incremental prize was not difficult to win because it was difficult to be fast. It was difficult to win because you had to return the right result even in the middle of the run. The race also pointed to where bugs were still hiding: the parallel solving didn’t work because I didn’t early-exit the other threads when one thread found the satisfying solution. I never properly performance-tested the parallel version.


There aren’t many conclusions to be had. It turns out testing is hard, and performance testing is both hard and expensive. I tried to escape them both but in the end, they both bit me. I’m now concentrating on these areas. I think CryptoMiniSat has potential, it just needs to work right :)

STP and CryptoMiniSat in the competitions of 2015

Both the SAT Race and the SMT competition of 2015 are now over. The results are in and it’s time to draw some conclusions. CryptoMiniSat won the SAT Race’s incremental solving track, and CVC4, the SMT solver that uses (among other solvers) CryptoMiniSat, won the SMT competition. These two wins are not unrelated: CryptoMiniSat is used in an incremental way by the CVC4 team. What CryptoMiniSat and STP (the other project I work on) haven’t done is win their respective main competitions. Let me reflect on these quickly.

The SMT competition

The SMT competition’s quantifier-free bit-vector track (QF_BV) was won by Boolector. Boolector is an amazing piece of engineering and so is the SAT solver it uses, lingeling. Second was CVC4 using CryptoMiniSat, and third was STP. The reason STP came third was that I managed to introduce a bug into STP and never fuzzed it. So now fuzzing is part of the continuous integration suite. Without this bug, STP would have come pretty close to Boolector, which I’m proud of.

In general, STP needs to be greatly overhauled. It’s a legacy codebase with global variables, weird dependencies and a long list of weirdnesses. It’s also performing amazingly well, so one needs to be careful in how one approaches it. I have been trying to cut it up and slice it any way I can so that it makes more sense to read the code and work on it, but it’s still quite hard.

The SAT competition

The SAT competition had its share of surprises. One such surprise was that lingeling didn’t win any main prizes. I wonder why it wasn’t submitted to the incremental track, for example. Another surprise was the 1st and 2nd places of two solvers by the same author, Jingchao Chen, whom unfortunately I have never met or conversed with. I was also a bit disappointed that CryptoMiniSat didn’t make it very high on the list. There were a number of issues I tried to fix, but apparently that wasn’t enough. I will be interested to see the sources of the winning solvers and whether I can learn something from them.

In general, it’s really hard to create winning SAT solvers that behave well in all circumstances, because it takes a massive amount of time to code them up and properly test them. The latter in particular is really complicated, and writing good test harnesses takes concentrated effort. My test scripts are easily over a thousand lines of code, mostly Python. The continuous integration and build systems (both necessary for timely releases) are another thousand, and my AWS scripts for performance testing are almost a thousand as well. That’s more code than some early, well-performing SAT solvers had.

Most competing SAT solvers (with the exception of lingeling) don’t have such elaborate test and release systems in place. Although maintaining such systems seems like a burden, in the long run they pay off well. For example, the incremental track is particularly hard to compete in without proper tests, and hence CryptoMiniSat had a good chance there. I don’t see why lingeling didn’t compete, given that it must have extremely elaborate test and release systems, too.


It has been a good year for CryptoMiniSat and STP. Some of their strengths have been highlighted more than others, and progress has been made on all fronts. They might both be released as part of standard packages in Debian, Fedora and other distros. All this sounds great. I have put lots of effort into both, and some of that effort is paying off, which is good to see.

On the CryptoMiniSat side, in the future, I need to concentrate more on experimentation so it can be as agile as some of the newly winning solvers. On the STP side, I need to put more effort into fuzzing, testing, and refactoring.

Setting up encrypted mail in Chrome and Gmail

The use of Gmail is now ubiquitous. Unfortunately, it’s easy to read email in transit, and some national governments abuse their power to do so. I have always used PGP to encrypt email, and today I thought I’d put down how to communicate with me, or with your friends, using signed and encrypted mail. I think the biggest reason email encryption is not being used is that it’s hard to set up. So, here is a simple, step-by-step tutorial that is easy to follow.

Installing and creating a key

  1. Install Mailvelope. Click “Add to Chrome”; a pop-up appears; click “Add”
  2. A little padlock icon appears at the top right of your Chrome window
  3. Click the little padlock icon, then click “Options”
  4. At the bottom, click “Generate key”
  5. Fill in Name (you can put a fictitious name, that’s fine!) and Email (the address you want to use, e.g. Jon.Doe@gmail.com), and put in a password that you will remember. This password is never sent anywhere. It’s used so that when you want to read email that is encrypted to you, the encryption keys can be accessed.
  6. Click submit, wait for the generation to finish.
  7. Setup is done!

Obtaining keys of others

You need to get the keys of others so you can send encrypted email to them. Here is how.

  1. Look for your friend’s email here and copy the gibberish text that appears, e.g. mine
  2. Click the little padlock icon on the top right of your Chrome
  3. Click “setup” on the list of things
  4. Paste the gibberish text in the textbox
  5. Click Import
  6. It should say “success … Mate Soos [or your friend’s fictitious name]… imported”
  7. Done!

Sending encrypted email

  1. Go to gmail.com
  2. Click compose. Fill in subject and recipient (these will not be encrypted)
  3. You will see a little notepad icon in the email text body. Click it.
  4. 1st Pop-up appears. Write your mail here! Note that these drafts are not saved, so you need to be careful
  5. Click “Encrypt”
  6. 2nd Pop-up comes up. Your own email address will be automatically added to “Encrypt for”
  7. In the drop-down menu at the top, select the destination email address (which you must have imported as per the setup above). Once you have selected the destination, click “Add” (don’t forget to click “Add”, it’s easy to forget)
  8. You now have 2 email addresses in the bottom list. One is yourself, one is the recipient
  9. Click OK. The 2nd pop-up disappears and your email now appears encrypted in the 1st pop-up.
  10. Click “Transfer”, 1st Pop-up disappears
  11. You are now in the compose window again. Don’t change anything, just click “Send”
  12. Done!

Reading encrypted email

  1. Go to gmail.com
  2. Click on the email you want to read
  3. The email window opens; the email is shown in yellow with a lock on top. Click it!
  4. Enter the password you used above
  5. The decrypted email appears

Backing up your private keys

  1. In Chrome, click the top right padlock
  2. Click Options
  3. Click Export
  4. Popup appears. Click Download
  5. The file “all_keys.asc” is now saved to disk
  6. Keep this file backed up. Done!

Sending your public key to others

You need to send your public key to your recipient so they can send you encrypted mail (and verify your signature). Also, it’s a good idea to put it on a public site, like your blog or on pgp.mit.edu.

  1. In Chrome, click the top right padlock
  2. Click Options
  3. Pop-up appears. On the bottom, click the email address that says “primary”
  4. 2nd pop-up appears. On the top menu, click “Export”
  5. Here, make sure you have “Public” selected (it’s the default)
  6. Click download.
  7. Attach that file to an email to me, or to anyone whom you want to be able to send you encrypted mail
  8. Upload the contents of the file here

Try not to confuse the private and the public keys. Send the public key to everyone. Never send the private key to anyone, ever.

Closing thoughts

Security professionals would point out that trusting the public key is not discussed above. It’s true, it’s not discussed, and it’s not easy to know what to trust. However, the vast majority of the time the resurrecting duckling principle will work well: you trust the first thing you see, just like a duckling trusting that the first thing it sees is its mother. It’s imperfect, but the web of trust is very unintuitive and therefore may bring fewer benefits than most think. A technology that is good but unintuitive often harms more than most people would admit.

Trusting the first key will be good for most people. In case you are in danger of being seriously harmed for the information you are about to send, though, please try to make sure the key you got is legitimate, e.g. by phoning up your recipient and confirming the key fingerprint (displayed on the setup page).

Note that if your computer is compromised, the above will not help: an attacker will be able to read your data and your password as you type it. So, try not to visit dodgy sites, allow Windows and Mac to encrypt your drive, choose a strong login password, use a password manager for all sites, and keep your Mac and Windows updated.