The Cult of Security/Privacy By Design

I have been reading a lot of privacy literature lately, and the more I read, the more frustrated I got with the “Privacy by Design” motto. Something about it was not right, and it made me realise that the “Security by Design” motto is similarly hyped and just as misguided. Let me explain.

Why security/privacy wasn’t baked in at the start

First of all, the basics: “X by Design” means that security/privacy is baked into the product from inception and design through implementation, all the way to operation. This of course sounds great, and one can easily see that retroactively implementing security or privacy features/requirements into an already existing product is a lot harder than baking them in from the start. Where I think this starts to fall apart is that often, the reason something is only “an afterthought” is not that it wasn’t considered while e.g. conceptualizing the product. It was considered, but either:

  1. There weren’t any people available who could help, i.e. the security/privacy personnel were busy, too expensive, or too complicated to acquire/hire/consult, or
  2. It wasn’t considered important to involve the right people/team because the project/product was supposed to be small/unimportant and security/privacy-irrelevant, but over time it grew to be large/important.

In other words, it was either resource constraints or a legitimate business reason not to spend resources on something that was (meant to be) small/irrelevant.

Reminiscing about the past

What I see in a lot of these “X by Design” discussions is a set of technical people reminiscing about how awesome it would have been if e.g. Google had built their search engine in 1998 with security/privacy in mind. Yes, it would have been, but the engineers writing the code had no idea whether it would take off, and had neither the time nor the money to spend on expensive consultants and long consultation periods to bake security/privacy in. The same can be said of many other startups’ products and of most projects in large companies that grow to become products of their own.

If security and privacy engineers had to be involved in every smaller project, innovation would seriously suffer: there are simply too few of us, and we would take too long to take a sufficiently good look at everything (much of which might not take off anyway). So, since we cannot be involved at the beginning of every project, the best we can hope for is to fix things afterwards; there is no point in reminiscing about how awesome it would have been had we been there when Twitter started in 2006. We weren’t there, it has now taken off, and they are offering value to their customers, bringing in revenue which they are (hopefully) using to fix their security/privacy issues. I think it’s too optimistic to assume they could have afforded the delay-to-market and the price attached to doing it “right” from the start. Most probably, they couldn’t.

Living in the present

All in all, I think most of the “X by Design” talk is wishful thinking with some sermons attached to it, telling long tales of how things could have been so much better had the right security/privacy people been consulted. What these tales often forget to mention is that when the idea was born and the project/product was created, the situation was not even remotely amenable to such an intervention. Hence, I think it’s time to let go, for the most part, and concentrate on doing what needs to be done: making sure we have solid processes and good, flexible patterns for fixing already existing products.

CryptoMiniSat and Parallel SAT Solving

Since CryptoMiniSat has been getting quite a number of awards with parallel SAT solving, it’s about time I talk about how it does that. There is a ton of literature on parallel SAT solving, and unfortunately I have barely had time to read any of it. The only research within the parallel SAT solving area that I think has truly weathered the test of time is Plingeling and Treengeling, and they are really interesting to play with. The rest most likely has some merit too, but I am usually suspicious, as the results are often (unintentionally) skewed to show how well the new idea performs, and in the end those approaches rarely win many awards, especially not in the long run (this is where Plingeling and Treengeling truly shine). I personally haven’t published what I do in this area because I have always found it a bit too easy, and hence of little merit for publication; but maybe one day I will.

Note: by unintentionally skewed results I mean that as you change parameters, some will inevitably perform better than others simply because of randomness in SAT solving. This randomness is easy to mistake for a genuine improvement. It has happened to everyone, I’m sure, including myself.

Exploiting CryptoMiniSat-specific features

CryptoMiniSat has many different inprocessing systems and many parameters to turn them on/off or to tune them; its over 60k lines of code are what allow this kind of flexibility. This is unlike the Maple*/Glucose* set of solvers, all descended from MiniSat, which can basically do one thing, and one thing only, really well. That seriously helps in the single-threaded setup, but may be an issue when it comes to multi-threading. They have (almost) no inprocessing (there is now vivification in some Maple* solvers) and no complicated preprocessing techniques other than BVE, subsumption and self-subsuming resolution. So there is little to turn on and off, and there are very few parameters; the few that are there are all hard-coded into the solver, making them difficult to change.
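To make the contrast concrete, here is a tiny sketch of the difference between a hard-coded magic constant and run-time parameters that can be toggled or tuned per run. It is purely illustrative; the struct and its fields are hypothetical, not CMS’s actual configuration code.

    // Hard-coded, MiniSat-style: the knob lives in the source code and
    // changing it means editing and recompiling the solver.
    static const double var_decay = 0.95;

    // Configurable, CMS-style: the same kind of knobs are run-time
    // parameters, so every run (or, later, every thread) can differ.
    struct SolverConf {
        double var_decay       = 0.95;
        bool   do_bve          = true;   // bounded variable elimination
        bool   do_intree_probe = true;   // in-tree probing
        bool   do_distill      = true;   // clause distillation/vivification
        // ...a real solver of this kind has dozens more knobs
    };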

CryptoMiniSat in parallel mode

To run in parallel mode, CMS takes advantage of its potential heterogeneity by running N different threads, each with radically different parameter settings, and exchanging nothing but unit and binary clauses(!) through the most rudimentary locking system. No exchange of longer clauses, no lockless exchanges, no complicated multi-lock system: one lock for unit clauses, one for binary clauses, even for 24 threads. Is this inefficient? Yes, but it seems good enough, and I haven’t really had too many people asking for parallel performance. To illustrate, here are the parameter sets of the different threads used, and here is the sharing and locking system. It’s seriously simple; I suggest you take a peek, especially at the parameter sets.
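To make this more tangible, here is a minimal sketch of what such a rudimentary scheme could look like. This is not CMS’s actual code; all names are hypothetical and the CDCL loop itself is elided. The point is simply that the only shared state is a unit-clause store and a binary-clause store, each behind its own single lock, while every worker thread runs with its own, deliberately different configuration.

    #include <cstdint>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <utility>
    #include <vector>

    using Lit = int32_t;  // hypothetical literal encoding

    // The only shared state: found unit and binary clauses,
    // each guarded by a single lock, even with 24 worker threads.
    struct SharedPool {
        std::mutex unit_lock;
        std::vector<Lit> units;

        std::mutex bin_lock;
        std::vector<std::pair<Lit, Lit>> binaries;

        void add_unit(Lit l) {
            std::lock_guard<std::mutex> g(unit_lock);
            units.push_back(l);
        }
        void add_binary(Lit a, Lit b) {
            std::lock_guard<std::mutex> g(bin_lock);
            binaries.emplace_back(a, b);
        }
        // A worker copies out everything it has not yet imported.
        std::vector<Lit> units_since(size_t& seen) {
            std::lock_guard<std::mutex> g(unit_lock);
            std::vector<Lit> fresh(units.begin() + seen, units.end());
            seen = units.size();
            return fresh;
        }
    };

    // Each worker gets a radically different (hypothetical) parameter set
    // and only ever talks to the others through the shared pool.
    struct Config { bool do_inprocessing; int restart_style; double var_decay; };

    void solve_one(const Config& cfg, SharedPool& pool) {
        size_t units_seen = 0;
        // The CDCL search loop would go here: periodically import with
        // pool.units_since(units_seen), and export newly learnt unit and
        // binary clauses with pool.add_unit() / pool.add_binary().
        (void)cfg; (void)pool; (void)units_seen;
    }

    int main() {
        SharedPool pool;
        std::vector<Config> cfgs = {
            {true,  0, 0.95},   // heavy inprocessing, agile restarts
            {false, 1, 0.80},   // no inprocessing, monolithic search
        };
        std::vector<std::thread> workers;
        for (const Config& c : cfgs)
            workers.emplace_back(solve_one, std::cref(c), std::ref(pool));
        for (std::thread& w : workers) w.join();
    }

A real solver would of course also need to filter duplicate clauses and attach the imported ones properly; the sketch only shows the sharing and locking skeleton described above.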

Note that the literature is full of papers explaining what kinds of complicated methods can be used to exchange clauses using different heuristics, with pretty graphs, complicated reasoning, etc. I have to admit that doing that might be useful; however, just running heterogeneous solvers in parallel and exchanging unit and binary clauses performs really well. In fact, it performs so well that I have never, ever, in the entire 7-year development history of CMS, run even one full experiment to check parallel performance. I usually concentrate on single-threaded performance, because checking parallel performance is really expensive.

Checking the performance of a 24-thread setup is about 15x more expensive than checking the single-threaded variant. I don’t really want to burn the resources for that, as I think it’s good enough as it is. It mostly beats solvers that have horrendously complicated systems inside them, backed by many research papers, etc. I think the current performance is proof enough that making things complicated is not the only way to go. Maybe one day I will implement some more sophisticated clause sharing, e.g. sharing clauses longer than binary, and then I won’t be able to claim that I am doing something quite so simple. I will think about it.

Conclusions

I am kinda proud of the parallel performance of CMS, as it showcases the heterogeneity of the system and the different capabilities of the solver. It’s basically doing a form of acrobatics: with one set of parameters the solver can behave like a very agile SAT solver, and with another like a huge monolith. Since there are many different parameters, there are many different dimensions, and hence many orthogonal parameter sets. It’s sometimes interesting to read through the different parameter settings and wonder why one set works so much better than another on a particular type of benchmark. Maybe there is some value in investigating that.

CryptoMiniSat 5.6.3 Released

The latest CryptoMiniSat, version 5.6.3, has been released. This release marks the 12’000th commit to this solver, which has weathered more than I originally intended it to weather. It’s been an interesting ride, and for this release I have a lot to thank Kuldeep and NSCC’s ASPIRE-1 cluster. I have burned over 200k CPU hours to make this release, so it’s a pretty well-performing one (outperforming anything out there by a wide margin), though I’m working very hard to make sure that neither I nor anyone else will have to burn anything close to that to make a well-performing SAT solver.

The solver has some interesting new algorithms inside, the most interesting of which is Gauss-Jordan elimination using a Simplex-like method, generously contributed by Jie-Hong Roland Jiang and Cheng-Shen Han from the National Taiwan University. This addition should finally settle the issues regarding Gaussian vs Gauss-Jordan elimination in SAT solvers. Note that to use this novel system, you must configure with “cmake -DUSE_GAUSS=ON ..” and then re-compile.
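For readers who have not seen this before: XOR constraints recovered from the CNF can be treated as rows of a matrix over GF(2) and kept in reduced row echelon form, so that implied units and conflicts among the XORs are detected cheaply. The sketch below is only a plain, dense illustration of that linear algebra; it is not the contributed implementation, and it shows neither the Simplex-like bookkeeping nor any solver integration.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // One XOR constraint x_a ^ x_b ^ ... = rhs, as a row over GF(2).
    struct XorRow {
        std::vector<uint8_t> coeff;  // coeff[v] == 1 iff variable v appears
        uint8_t rhs;                 // required parity
    };

    // Gauss-Jordan elimination: reduce to reduced row echelon form.
    // A row left with exactly one variable is an implied unit; an
    // all-zero row with rhs == 1 is a conflict.
    void gauss_jordan(std::vector<XorRow>& rows, size_t num_vars) {
        size_t pivot_row = 0;
        for (size_t col = 0; col < num_vars && pivot_row < rows.size(); ++col) {
            // Find a row with a 1 in this column to use as the pivot.
            size_t sel = pivot_row;
            while (sel < rows.size() && rows[sel].coeff[col] == 0) ++sel;
            if (sel == rows.size()) continue;
            std::swap(rows[pivot_row], rows[sel]);
            // Eliminate the column from every other row; clearing the rows
            // above the pivot as well is what makes this Gauss-Jordan
            // rather than plain Gaussian elimination.
            for (size_t r = 0; r < rows.size(); ++r) {
                if (r == pivot_row || rows[r].coeff[col] == 0) continue;
                for (size_t v = 0; v < num_vars; ++v)
                    rows[r].coeff[v] ^= rows[pivot_row].coeff[v];
                rows[r].rhs ^= rows[pivot_row].rhs;
            }
            ++pivot_row;
        }
    }

    int main() {
        // x0 ^ x1 = 1 and x1 = 1, so after elimination: x0 = 0, x1 = 1.
        std::vector<XorRow> rows = { {{1, 1}, 1}, {{0, 1}, 1} };
        gauss_jordan(rows, 2);
        for (const XorRow& r : rows)
            std::cout << int(r.coeff[0]) << " " << int(r.coeff[1])
                      << " | " << int(r.rhs) << "\n";  // "1 0 | 0" then "0 1 | 1"
    }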

What’s also interesting is what’s not inside, though. I have been reading (maybe too much) Nassim Taleb, and he is very much into via negativa. So I tried removing algorithms that have been in the solver for a while and whose usefulness mostly nobody would question. In the end I removed the following algorithms from running by default, each removal leading to better solving time:

  • Regular probing. Intree probing is significantly better, so regular probing is not needed. Thanks Matti/Marijn/Armin!
  • Stamping. This was a big surprise, especially because I also had to remove caching, which is my own, crappy (“different”) version of stamping. I knew that it wasn’t always so good to have, but damn. It was a hard call, but if it’s just slowing it down, what can I do. It’s weird.
  • Burst searching. This is when I search for a short period with high randomness all over the search space. I thought it would allow me to explore parts of the search space that VSIDS/Maple doesn’t reach. Why this slows the solver down so much may say more about search heuristics/variable bumping/clause bumping than anything else.
  • Note that I never had blocked clause elimination. It doesn’t work well for incremental solving. In fact, it doesn’t work at all, though apparently the authors have some new work that allows it to work, super-interesting!

I’m nowadays committed to understanding this damned thing rather than adding another impossible-to-explain magic constant to make the solving 10% faster. I think there is interesting stuff out there that could make SAT solvers 10x, not 10%, faster.

IT Security Differently

Compliance and regulations are one way to achieve IT security. If one looks at industries that have been around for a very long time and have very high stakes, for example commercial airline travel, mining, oil & gas, etc., one finds compliance and regulations everywhere. It’s how safety is managed in these environments. I have always been fascinated by safety incidents and have read a lot of reports about them; these are almost always free to read and very detailed, unlike IT security incident reports. See for example the now very famous Challenger Accident Report (“For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”) or the similarly famous, and more recent, AF-447 accident report. These are fascinating reads, and if you are willing to read between the lines, they all talk about systems issues, not a single person making a single mistake.

Testing and pentesting, a road to effectiveness

I have been involved in computer security and security testing for a while and I think it’s time to talk about some aspects of it that get ignored, mostly for the worse. Let me just get this out of the way: security testing (or pentesting, if you like) and testing are very closely related.

The Testing Pyramid

What’s really good about security testing being so close to testing is that you can apply the standard, well-known and widely used techniques from testing to the relatively new field of security testing. First of all, consider the classic testing pyramid chart.
