Predicting Clause Usefulness

In this post, I will be taking a shot at building prediction models for learnt clause usefulness by running over 150 unsatisfiable CNF problems, extracting over 140 features for each learnt clause, and then calculating whether the clause was used in the UNSAT proof. Here I need to thank Marijn Heule, who helped me add a clause ID to each clause in his DRAT checker so I could perform supervised learning on the collected data. Thanks Marijn!
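To make the setup concrete, here is a minimal sketch of the kind of supervised learning I have in mind, using scikit-learn. The file name, column names and classifier choice below are placeholders for illustration only, not the actual pipeline or feature set used for this post.

```python
# A minimal sketch of the supervised setup described above, not the actual
# pipeline: "clause_features.csv", the column names and the classifier choice
# are placeholders for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# One row per learnt clause: the feature columns plus a binary label saying
# whether the DRAT checker found the clause in the UNSAT proof.
data = pd.read_csv("clause_features.csv")
X = data.drop(columns=["clause_id", "used_in_proof"])
y = data["used_in_proof"]

# Hold out a test set so the usefulness predictor is evaluated on clauses
# (ideally: whole CNF instances) it has never seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```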

The first work on machine learning in SAT was done by the crowd creating portfolio solvers such as SATZilla, which calculate features of a CNF and then run the SAT solver that is predicted to solve that instance the fastest. This was achieved by coming up with a set of easy-to-calculate but useful features and then using a standard prediction method to fit SAT solvers to CNFs. The issue, as always with machine learning, is overfitting. Unfortunately, CNFs tended to stay very similar between old SAT Competitions, and overfitting was richly rewarded. Hence, these portfolio solvers sometimes did well in competitions but sometimes relatively poorly in practice.
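To illustrate the portfolio idea, here is a toy sketch: compute a few cheap structural features of the CNF and let a trained model pick the solver. The features, solver names and data below are made up for the example and are nowhere near SATZilla's real feature set or model.

```python
# Toy illustration of the portfolio idea; features, solvers and data are
# invented for the example, this is not SATZilla.
from sklearn.neighbors import KNeighborsClassifier

def cnf_features(clauses, num_vars):
    """A few cheap structural features of a CNF, as a fixed-length vector."""
    num_clauses = len(clauses)
    lens = [len(c) for c in clauses]
    return [
        num_vars,
        num_clauses,
        num_clauses / max(num_vars, 1),      # clause-to-variable ratio
        sum(lens) / max(num_clauses, 1),     # average clause length
        sum(1 for l in lens if l == 2),      # number of binary clauses
    ]

# Pretend training set: features of past instances plus the solver that was
# fastest on each of them (purely made-up data).
past = [
    (cnf_features([[1, -2], [2, 3], [-1, -3]], 3), "solver_A"),
    (cnf_features([[1, 2, 3, 4], [-1, -2, -3, -4]], 4), "solver_B"),
]
X, y = zip(*past)
portfolio = KNeighborsClassifier(n_neighbors=1).fit(list(X), list(y))

# At solve time: compute the same features for the new CNF and run whichever
# solver the model predicts will finish fastest.
new_cnf = [[1, -3], [2, 3], [-2, -1]]
print(portfolio.predict([cnf_features(new_cnf, 3)]))
```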

The second wave of improvements came with MapleSAT, where a multi-armed bandit (MAB) framework was sometimes used to pick branching variables. This won the 2016 SAT Competition’s Main Track and was a novel and truly interesting idea, manipulating the branching heuristic based on a feature the researchers called “learning rate”.
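For the curious, here is a rough sketch of the exponential recency-weighted average (ERWA) update that, as far as I understand the paper, sits at the heart of learning rate branching. The real MapleSAT code handles much more (reason-side rate, decay for unassigned variables, a decaying step size, etc.), so treat this as a conceptual outline only, not actual MapleSAT code.

```python
# Conceptual sketch of learning rate branching (LRB) as I read the paper;
# not MapleSAT code, and many details of the real heuristic are omitted.
class LRBScores:
    def __init__(self, num_vars, alpha=0.4):
        self.alpha = alpha                        # step size (decays over time in the paper)
        self.q = [0.0] * (num_vars + 1)           # EMA of each variable's "learning rate"
        self.assigned_at = [0] * (num_vars + 1)   # learnt-clause count when var was assigned
        self.participated = [0] * (num_vars + 1)  # learnt clauses the var took part in
        self.learnt_counter = 0                   # total learnt clauses so far

    def on_assign(self, var):
        self.assigned_at[var] = self.learnt_counter
        self.participated[var] = 0

    def on_conflict(self, vars_in_learnt_clause):
        self.learnt_counter += 1
        for v in vars_in_learnt_clause:
            self.participated[v] += 1

    def on_unassign(self, var):
        interval = self.learnt_counter - self.assigned_at[var]
        if interval > 0:
            # The "learning rate" of this variable over its assignment period.
            reward = self.participated[var] / interval
            self.q[var] = (1 - self.alpha) * self.q[var] + self.alpha * reward

    def pick_branch_var(self, unassigned_vars):
        # Branch on the unassigned variable with the highest estimated learning rate.
        return max(unassigned_vars, key=lambda v: self.q[v])
```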

With this blog post, I hope to inspire a third wave of improvements in SAT solver performance. Learnt clause usefulness metrics have been a very hot potato since the glucose SAT solver, which engineered and used a new feature, glues, to decide usefulness. Maybe with the work and data provided below, we could use features not used before, or combine them in novel ways, to achieve better prediction accuracy.
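For reference, the glue (also called literal block distance, LBD) of a learnt clause is simply the number of distinct decision levels among its literals at the moment the clause is learnt. A minimal sketch:

```python
# Glue (LBD) of a learnt clause: the number of distinct decision levels among
# its literals. `level` maps each variable to the decision level at which it
# was assigned when the clause was learnt.
def glue(learnt_clause, level):
    return len({level[abs(lit)] for lit in learnt_clause})

# Example: literals assigned at levels 3, 3 and 7 give a glue of 2; a low
# glue is what glucose treats as a sign the clause is likely to be useful.
print(glue([-1, 4, 9], {1: 3, 4: 3, 9: 7}))   # -> 2
```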
Continue reading

IT Security Differently

Compliance and regulations are one way to achieve IT security. If one looks at industries that have been around for a very long time and have very high stakes, for example commercial airline travel, mining, oil & gas, etc., one can find compliance and regulations everywhere. It’s how safety is managed in these environments. I have always been fascinated with safety incidents and read a lot of reports around them — these are almost always free to read and very detailed, unlike IT security incident reports. See for example the now very famous Challenger Accident Report (“For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”) or the similarly famous, and more recent, AF-447 accident report. These are fascinating reads and, if you are willing to read between the lines, they all talk about systems issues — not a single person making a single mistake.
Continue reading

Testing and pentesting, a road to effectiveness

I have been involved in computer security and security testing for a while and I think it’s time to talk about some aspects of it that get ignored, mostly for the worse. Let me just get this out of the way: security testing (or pentesting, if you like) and testing are very closely related.

The Testing Pyramid

What’s really good about security testing being so close to testing is that you can apply the standard, well-known and widely used techniques from testing to the relatively new field of security testing. First of all, this chart:

Continue reading

Non-reproducible results with MapleCOMSPS

I’ve been reading through the source code of the 2016 SAT Competition Main Track winner, MapleCOMSPS_DRUP, and I found that it has an important piece of code regulating its behaviour that depends on timing:
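The snippet itself is in the full post; purely to illustrate the pattern, here is a made-up sketch, with placeholder names and not MapleCOMSPS code, of a heuristic choice that depends on elapsed CPU time. Two runs of the same binary on the same CNF can then diverge simply because the machine happened to be faster or slower.

```python
# Illustration only, with placeholder names, of what timing-dependent solver
# behaviour looks like in general; this is not MapleCOMSPS code.
import time

def pick_heuristic(start_time, switch_after_seconds=2500.0):
    """Return which branching heuristic to use, based on elapsed CPU time."""
    elapsed = time.process_time() - start_time
    return "heuristic_A" if elapsed < switch_after_seconds else "heuristic_B"

# Called inside the search loop, this may give a different answer depending on
# how fast this particular run happens to be: the decision is not a pure
# function of the formula and the solver's state, so runs are not reproducible.
start = time.process_time()
print(pick_heuristic(start))
```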

Continue reading

Why Most Published Research Findings Are False

I read this paper about most research findings being false. Given that most research papers in SAT use an incredibly small sample size (especially considering that it’s cheap to have large sample sizes relative to, e.g., medical trials), and the samples are very often hand-picked, it’s easy to see why this could be the case. But that article lists a number of other factors, too, and they are interesting to consider as well. Only a few true innovations stick around in SAT (glues, VSIDS, UIP, restarts, etc.). Most are forgotten because, frankly, they didn’t show the promise they purported to have. It’d be interesting to force authors to e.g. run their systems on much larger sample sizes (e.g. 2-3000 instances from SAT competitions) with much longer timeouts (e.g. 5000s). Then those implementing SAT solvers wouldn’t have to wade through piles of articles to get to something worth implementing. One is allowed to dream.
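To put a rough number on why sample size matters, here is a purely synthetic simulation: two "solvers" with identical true performance are compared on 20 versus 2000 instances, and the small sample regularly manufactures a sizeable apparent difference out of pure noise. The runtime distribution is invented and has nothing to do with any real solver.

```python
# Synthetic illustration of the sample-size point: both "solvers" draw their
# runtimes from the same distribution, so any apparent speedup is noise.
import random

random.seed(0)

def fake_runtimes(n):
    # Made-up lognormal runtimes; no real solver data involved.
    return [random.lognormvariate(5, 1) for _ in range(n)]

def apparent_speedup(n):
    a, b = fake_runtimes(n), fake_runtimes(n)
    return sum(a) / sum(b)

for n in (20, 2000):
    speedups = [apparent_speedup(n) for _ in range(1000)]
    big_diff = sum(1 for s in speedups if s < 0.8 or s > 1.25)
    print(f"{n:5d} instances: {big_diff / 10:.1f}% of experiments show a >25% apparent difference")
```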