
Predicting Clause Usefulness

In this post, I will be taking a shot at building prediction models for learnt clause usefulness by running over 150 unsatisfiable CNF problems, extracting over 140 features for each learnt clause, and then calculating whether the clause was used in the UNSAT proof. Here I need to thank Marijn Heule who helped me add a clause ID to each clause in his DRAT checker so I could perform supervised learning on the collected data. Thanks Marijn!

The first work on machine learning in SAT was done by the crowd creating portfolio solvers such as SATZilla, which calculate features of a CNF and then run the SAT solver that is predicted to solve that instance the fastest. This was achieved by coming up with a set of easy-to-calculate but useful features and then using a standard prediction method to fit SAT solvers to CNFs. The issue, as always with machine learning, is overfitting. Unfortunately, CNFs tended to stay very similar between old SAT Competitions, and overfitting was richly rewarded. Hence, these portfolio solvers sometimes did well in competitions but sometimes relatively poorly in practice.

The second wave of improvements came with MapleSAT, where a multi-armed bandit (MAB) framework was used to pick branching variables. This won the 2016 SAT Competition's Main Track and was a novel and truly interesting idea, manipulating the branching heuristic based on the feature that the researchers called "learning rate".

With this blog post, I hope to inspire a third wave of improvements in SAT solver performance. Learnt clause usefulness metrics have been a very hot potato since the glucose SAT solver, which engineered and used a new feature, glue, to decide usefulness. Maybe with the work and data provided below, we could use features not used before, or combine them in novel ways, to achieve better prediction accuracy.

A History of Clause Usefulness Heuristics

For a long time, learnt clauses were chosen to be kept around in the clause database based on a feature called their "activity", which measured how often the clause was involved in generating conflict clauses. This was the heuristic used in the original MiniSat.

The next big step came with glue, which was present in the earliest of the Glucose solver family. This was a huge step in a different direction — suddenly a new feature had been engineered, "glue", and the heuristic used to decide had changed. It was now the glue of the clause, relative to the other clauses, that determined whether the clause stayed or was discarded.

The next step was a hybrid approach that is still in use today. This says that a learnt clause with a very low glue should stay no matter what, clauses with slightly higher glues should stick around for a while at least, and clauses with much higher glues should only be kept around for a short while, based on their activities. This is the strategy used by MapleSAT, a very successful SAT solver.

DRAT and the Era of Knowing What’s Good

The DRAT-trim proof checker has opened up a new possibility in SAT solving. We could finally know how the UNSAT proof was built and, furthermore, which lemmas, i.e. which learnt clauses, were actually used to build that proof. This means that we could finally measure all sorts of features of clauses, e.g. their glue, their size, or their activity, store these along with the clause, and, once the SAT solver has finished and we know which clauses were actually useful, try to build predictive models using supervised machine learning. However, for whatever reason, nobody did this.

Thanks to Marijn Heule, who I asked to help me parse Clause IDs in DRAT, I have finally built a system that does exactly as above. It gathers over 140 features about each learnt clause, then, when the solving finishes, it runs DRAT and associates an "OK" or "BAD" nominal class value with the features. The output is an SQLite file for each solved problem. I have picked 153 problems from the past 6 years of SAT competitions that all solve as UNSAT in less than 10 minutes, and have run my setup on them. The solver used was CryptoMiniSat, and so as not to taint the results, I have set it not to delete any clause at all from the database (i.e. no clause cleaning was performed) and set it to preprocess the instance but not to perform any in-processing. Furthermore, the restart heuristic was set to be the one pioneered by swdia5by, i.e. glue-based and geometric restarts combined in an iterative way.

The Features

The features saved broadly fall into four categories:

  1. Features computed once in a while on the whole CNF. These are similar to those engineered in SATZilla and other portfolio solvers. Thanks go to Yuri Malitsky and Horst Samulowitz who sent me "features_fast.c". This includes things like statistics about clause and variable distributions, etc. These start with "feat." such as "feat.horn_min"
  2. Features of the previous restart. This includes things like the number of binary clauses learnt, average branch depth, average learnt clause glue size, clause size, etc. These start with “rst.” such as “rst.propLongRed” — the number of propagations made by long redundant (what most solvers call “learnt”) clauses.
  3. Features of the current clause such as its size, glue, its variables' average standardised activity (i.e. divided by "var_inc"), etc. These start with "cl." such as "cl.conflicts_this_restart", i.e. the number of conflicts in this restart.
  4. Some computed features, such as "cl.glue_rel", which is equal to "cl.glue" / "cl.glue_hist".

The final “class” attribute is set to either “OK” meaning the clause was used in the final UNSAT proof, or “BAD” meaning it was not. Obviously, we want to never keep the BAD ones, and only keep the OK ones. If we could predict based on the four types of features above (which are all available at the time of the clause creation) which clauses we should keep, then we potentially could solve problems a lot faster. Partially because we would just throw the clauses away that we predict to be useless, and partially because we could steer the solver towards regions of the search space that contain more useful clauses — e.g. by restarting when we decide that where we are is a poor place to be.

The Data

CSV is available HERE

First of all, a disclaimer. There are a million ways this can be done differently, improved, or made worse. I have collected over a hundred features and I think they are useful. I also have conducted the experiments on over 150 problems, each running at most 10 minutes (but at least 10'000 conflicts), using CryptoMiniSat, never deleting any clause. This constitutes millions of data points. I have spent over two years doing this, and I have been analysing different versions of this dataset for multiple months. Hence, I may well be biased, both in the way the data has been collected and in how I'm analysing it. To counter these biases, I am making the data available so you can perform your own analysis, and CryptoMiniSat is fully open source, including everything you need to completely and precisely re-generate this (or any other) data. See the bottom of this post for some how-tos.

I am only making available the sampled CSV for space and practicality reasons. I created a CSV that contains 153 problems’ datapoints, picking 10’000 randomly from each, which was then randomly sampled to 30’000 elements. This 51MB data is available here. I strongly suggest using Weka to read this file. In case you have trouble using Weka and/or creating predictive models, I’d suggest the free online course Data Mining with Weka.
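If you want to reproduce this kind of per-problem sampling yourself from the generated CSVs, something along these lines should work. This is a rough pandas sketch, not the script I actually used; the directory layout and output file name are made up:

```python
# Hypothetical reproduction of the sampling described above:
# at most 10'000 datapoints per problem, then 30'000 overall.
import glob
import pandas as pd

frames = []
for path in glob.glob("runs/*/data.sqlite.csv"):   # one CSV per solved problem (hypothetical layout)
    df = pd.read_csv(path)
    frames.append(df.sample(n=min(10_000, len(df)), random_state=0))

combined = pd.concat(frames, ignore_index=True)
combined.sample(n=30_000, random_state=0).to_csv("sample_30k.csv", index=False)
```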

Note: I have the 1530k line file as well, but it's 2.6GB. If you really need it, please let me know at my email. For almost all analysis, 30k datapoints is sufficient — Weka will not be able to practically work with any dataset over 100k elements. In case you want to perform deep learning, we can take a shot at that 2.6GB dataset together. However, I think clustering and per-cluster prediction models are a lower-hanging fruit.

Preliminary Analysis

So, what does the data tell us? Throughout, I will use a 66% training/test percentage split, using the 30k-line data file above. All results below are trivially reproducible.

Zero Rule Check
First of all let’s do a ZeroR analysis, i.e. decide to keep or not keep a clause without looking at any data. We get:

So we have about 50% useful and 50% useless clauses. Note that this may not be representative of problems that are difficult. We picked problems that solve in under 10 minutes and have at least 10'000 conflicts.
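For those who prefer scikit-learn over Weka, a ZeroR-style baseline is just a majority-class predictor. A minimal sketch, assuming a sampled CSV like the hypothetical "sample_30k.csv" above:

```python
# ZeroR-style baseline: always predict the majority class, 66%/34% split.
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("sample_30k.csv")            # hypothetical file name
X = data.select_dtypes("number")                # numeric features only; "class" and "fname" drop out
y = data["class"]                               # "OK" or "BAD"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.66, random_state=0)
zero_r = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
print("ZeroR accuracy:", zero_r.score(X_te, y_te))   # ~0.5 if the classes are balanced
```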

One Rule Check
Next up, let's do the slightly less trivial setup and do a OneR, i.e. one-rule, analysis. Here, Weka is only allowed to pick one feature to decide. One would hope this to be glue. It isn't. Let's set the minimum bucket size to 100, which means Weka won't give us 100 different small decisions (i.e. won't overfit, something one can observe in some portfolio solvers). We get:

What is very strange here is that the best predictor is not glue. In fact, the next best predictors are, in order:

  1. cl.decision_level (72.3%)
  2. rst.branchDepth (67.4%)
  3. rst.branchDepthDelta (67.6%)
  4. cl.decision_level_hist (67.1%)
  5. cl.backtrack_level_hist (67.1%)

So they all seem to have something to do with the branch depth/decision depth. If forced to use glue, Weka will draw the line at glue 15 and below being OK and 16 and above being BAD, giving 61.9% accuracy.
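A OneR-style ranking can be approximated outside Weka with one-level decision stumps, one per feature. This is only an approximation of Weka's OneR, so the exact numbers will differ; the CSV name is the hypothetical one from above:

```python
# Approximate OneR: fit a depth-1 stump on each numeric feature separately
# and rank the features by test accuracy.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("sample_30k.csv")            # hypothetical file name
y = data["class"]
numeric = data.select_dtypes("number").fillna(0)

scores = {}
for feat in numeric.columns:
    X_tr, X_te, y_tr, y_te = train_test_split(
        numeric[[feat]], y, train_size=0.66, random_state=0)
    stump = DecisionTreeClassifier(max_depth=1, min_samples_leaf=100)
    scores[feat] = stump.fit(X_tr, y_tr).score(X_te, y_te)

for feat, acc in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{feat}: {acc:.1%}")
```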

J48 Decision Tree
Let's see if we can get some interesting results with J48, a standard decision tree-building setup. I'm going to put a minimum of 400 elements in any bucket, or we will overfit.

Interesting… so we first branch on backtrack_level, which is not surprising given the OneR results above. Then we either use the glue distribution variance from the last time we measured the whole-CNF features, "feat.red_glue_distr_var", or the previous restart's average branch depth delta (i.e. backjump size), "rst.branchDepthDelta". The lower branches are interesting as well.
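The closest scikit-learn equivalent to the J48 run is a CART tree with a minimum leaf size. It is not the same algorithm as C4.5/J48, so the exact splits and accuracy will differ, but it is handy for quick experiments on the same (hypothetical) CSV:

```python
# CART tree with a 400-element minimum bucket, roughly mirroring the J48 setup.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("sample_30k.csv")            # hypothetical file name
X = data.select_dtypes("number").fillna(0)
y = data["class"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.66, random_state=0)
tree = DecisionTreeClassifier(min_samples_leaf=400).fit(X_tr, y_tr)
print("accuracy:", tree.score(X_te, y_te))
print(export_text(tree, feature_names=list(X.columns), max_depth=3))  # top of the tree, human-readable
```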

Random Decision Tree
Just for completeness' sake, let's try a RandomTree as well. Let's set the MinNum to 500 so we can combat overfitting. We then get a pretty complicated tree with the following stats:

This is fairly good performance, but the tree is hard to understand and its accuracy is worse than J48's, though not substantially.

There are other analyses that could be done, for example clustering of similar problems — you can find the CNF filename in feature “fname” (though you should never make any decisions based on that in your decision trees). Building problem classes may be interesting — we could cut up the CSV into different classes and create different prediction models for each. I am pretty sure this would significantly boost our prediction performance.

You can use Weka to visualize the data in many ways. Here is one:

You can use such visualisations to find anomalies, incorrect datapoints, or skews in the collection of data. If you do find any of these, please email me. I am happy to correct them of course — we might even achieve better prediction.

How to Do This at Home

It’s quite easy to create data for your own CNF instances. Just follow the README on the CryptoMiniSat GitHub repository, which says to get a clean Debian or Ubuntu distribution and then:

This will do predictive analysis using Python’s scikit-learn’s DecisionTreeClassifier (which may or may not be what you want) on a test problem and build a nice visual PNG tree too using graphviz:

Note that this decision tree is highly non-representative — it ran on a single, incredibly small instance. You can look at test_predict/data.sqlite.csv with Weka to run your own analysis and build your own, custom trees. To run this on a CNF of your own choice, run:

You will then have your scikit-learn tree generated and, more importantly, you will have the CSV under "myoutput/data.sqlite.csv" so you can read that with Weka and do your own analysis. Beware that you probably want to cut down the size of that CSV before loading it into Weka — but randomise first or you will only see the beginning of the solving. Concatenating CSVs of different runs is easy, just make sure to strip the topmost line from each CSV: it's the header and should not be repeated. For randomisation, use "shuf file"; for cutting the header off, use "tail -n +2 file"; and to get only the header, use "head -n 1 file".
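If you would rather stay in Python for the concatenate-and-shuffle step, pandas handles the headers for you. A small sketch; the file names here are made up:

```python
# Concatenate the CSVs of several runs, shuffle the rows, keep a Weka-sized sample.
import glob
import pandas as pd

runs = [pd.read_csv(p) for p in glob.glob("myoutput*/data.sqlite.csv")]   # hypothetical run directories
merged = pd.concat(runs, ignore_index=True)                # a single header is written automatically
shuffled = merged.sample(frac=1.0, random_state=0)         # randomise so you don't only see the start of the solving
shuffled.head(30_000).to_csv("merged_30k.csv", index=False)
```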


I think the above could help in choosing parameters/features and cutoffs when deciding whether to keep a learnt clause, and in determining when we are in a bad place in the search. This has been done experimentally until now, playing with the cutoffs, trying new cutoffs, and running all the problems on a large cluster, many times.

With the above data in hand, I think we could do better. Of course we will still be using clusters, verifying what the data analysis is indicating. But we may be able to check, engineer and determine the features and their cutoffs better. And finally, with all this data, I am hopeful that we will be able to have a discussion grounded in more data than just solving times.

Machine Learning and SAT

I have lately been digging myself into a deep hole with machine learning. While doing that, it occurred to me that the SAT community has essentially been trying to imitate some parts of ML in a somewhat poor way. Let me explain.

CryptoMiniSat and clause cleaning strategy selection

When CryptoMiniSat won the SAT Race of 2010, it was in large part because I realized that glucose at the time was essentially unable to solve cryptographic problems. I devised a system where I could detect which problems were cryptographic. It checked the activity stability of variables and if they were more stable than a threshold, it was decided that the problem was cryptographic. Cryptographic problems were then solved using a geometric restart strategy with clause activities for learnt database cleaning. Without this hack, it would have been impossible to win the competition.

Why it’s hard to eliminate variables

Let's examine why it's hard to eliminate variables. I remember the code I looked at in SatElite that did it: it was crazy clean code and looked like it was pretty easy to perform. In this post I'll examine how that simple code became more than 1'000 lines of code today.

What needs to be done, at first sight

At first sight, variable elimination is easy. We just:

  1. Build occurrence lists
  2. Pick a variable to eliminate
  3. Resolve every clause containing the positive literal of the variable with every clause containing the negative literal.
  4. Add newly resolved clauses into the system
  5. Remove original clauses.
  6. Goto 2.

These are all pretty simple steps at first sight, and one can imagine that implementing them is maybe 50-100 lines of code, no more. So, let’s examine them one-by-one to see how they get complicated.

Building occurrence lists

The idea is that we simply take every single clause, and for every literal it has, we insert a pointer to the clause into an array for that literal's occurrences. This sounds easy, but what happens if we are given 1M clauses, each with 1000 literals on average? If you think this is crazy, it isn't — it does in fact happen.
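As a sketch, the basic linking step itself is trivial — the trouble is purely the size of what gets linked:

```python
# Build occurrence lists: one list of clause indices per literal.
from collections import defaultdict

def build_occur_lists(clauses):
    occur = defaultdict(list)          # literal -> indices of clauses containing it
    for idx, clause in enumerate(clauses):
        for lit in clause:
            occur[lit].append(idx)     # with 1M clauses of ~1000 literals this is ~1e9 entries
    return occur
```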

One option is to estimate the amount of memory we would use and abort early because we don't want to run out of memory: first we check the potential size, then we link the clauses in. Unfortunately, aborting means we can't do variable elimination at all on such instances. Another possibility is to link in clauses only partially. For example, we don't link in clauses that are redundant but too long. Redundant clauses are ignored during resolution when eliminating, so this is OK, but then we will have to clean these clauses up later, when finishing up. However, if a redundant clause that hasn't been linked in backward-subsumes an irredundant clause (and thus becomes irredundant itself), we have to link it in asap. Optimisation leads to complexity.

We don't just want to link these clauses into some random data structure. I believe it was Armin Biere who put this idea into my head, or maybe someone else, but re-using watchlists for occurrence lists means we use our memory resources better: there won't be so much fragmentation. Furthermore, an advanced SAT solver uses implicit binary & tertiary clauses, so those are already linked into the watchlists. That saves memory.

Picking a variable to eliminate

The order in which you eliminate variables is a defining factor in the speed of the final solver. It is crucially important that this is done well. So, what can we do? We can either use some heuristic or precisely calculate the gain for each variable, and eliminate the best guess/calculated one first. These are both greedy algorithms, but I think given the complexity of the task, they are the best at hand.

Using precise calculation is easy: we just resolve all the relevant clauses but don't add the resolvents. It's very expensive, though. A better approach is to use a heuristic. Logically, clauses that have few literals in them are unlikely to resolve into tautologies. It's unlikely that two binary clauses' resolvent becomes a tautology. It's however likely that large clauses become tautological once resolved. I take this into account when calculating the elimination cost for a variable. Since redundant clauses are linked into the occurrence lists (so that I can subsume them), I have to skip them here.

It’s not enough to calculate the heuristic once, of course. We have to re-calculate after every elimination — the playing field has changed. Thus, for every clause you removed, you have to keep in mind which variables were affected, and re-calculate the cost for each after every variable elimination.

Resolving clauses

The base is easy. We add literals to a new array of literals and mark the literals that have been added in a quick-lookup array. If the opposite of a literal is added, the markings tell us and we can skip the rest — the resolvent is tautological. Things get hairy if the clause is not tautological.
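A sketch of this marking scheme follows. In a real solver the lookup array is allocated once and cleaned after use rather than re-created for every resolution; this toy version re-creates it each time:

```python
# Resolution with a quick-lookup "seen" array; literals are DIMACS-style ints.
def resolve_marked(c1, c2, var, num_vars):
    seen = [0] * (num_vars + 1)        # 0 = unseen, +1/-1 = polarity already added
    out = []
    for lit in list(c1) + list(c2):
        if abs(lit) == var:
            continue                   # drop the pivot variable (both polarities)
        sign = 1 if lit > 0 else -1
        if seen[abs(lit)] == -sign:
            return None                # opposite polarity present: tautology, skip the rest
        if seen[abs(lit)] == 0:
            seen[abs(lit)] = sign
            out.append(lit)
    return out
```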

What if the new clause is subsumed by already-existing clauses? Should we check for this? This is called forward-subsumption, and it's really expensive. Backward subsumption (which asks the question 'Does this clause subsume others?' instead of 'Is this clause subsumed by others?') would be cheaper, but that's not what we need here. We can thus try to subsume the clause only by e.g. binary & tertiary clauses and hope for the best.

What if the new clause can be subsumed by stamps? That’s easy to check for, but if the new clause was used to create the stamp, that would be a self-dependency loop and not adding the resolvent would lead to an incorrect result. We can use the stamps as long as the resolving clauses were not needed for the stamp: i.e. they are not binary clauses and on-the-fly hyper-binary resolution was used during every step of stamp generation. A similar logic goes for using the implication cache.

We could also virtually extend the clause with literals using watchlists/stamps/impl. cache and then try to subsume that virtual clause. I forgot what 3-letter acronym Biere et al. gave to this method (it's one of the 12 on slide 25 here), but, except for the acronym, this idea is pretty simple. You take a binary clause, e.g. x V ~y, and if x is in the newly created clause, but neither y nor ~y is, you add y to the clause. The clause is now bigger, so it has a larger chance of being subsumed. You now perform forward subsumption as above, but with the extended clause. Also, take care not to subsume clauses with themselves, which, as you might imagine, can get hairy.

If all of this sounds a bit intricate, this is not even the difficult part. The difficult part is keeping track of time. Where of course by time I don’t actually mean seconds — I mean computation steps that you have to define one way or another and increment counters and set limits. Remember: all this has to be deterministic.

Doing all of the above with a small but complicated instance is super-fast, under 0.001s. With a weird instance where one single literal may occur in more than a million clauses, it can be very, very expensive even for one single try — over 100s. That's about 5 orders of magnitude of difference. So, you have to be careful. The resolution we cannot skip, but we can abort it (and indicate it up in the call tree). Some of the other steps we can abort, but then the whole resolution has to be re-started. Some of the above is not critical at all, so you have to use a different time-limit for some steps, and mark them as too expensive, so at least the basic things get done. This gets complicated, because e.g. forward subsumption might be re-used in other parts of the solver, so you have to use a time-limit that isn't global.

Adding the newly resolved clauses

Adding clauses is simple: we create and link them in. However, we can do more. Since backward-subsumption is fast, we can do that with the newly created clauses. Note that this means the newly created clause could subsume some of the original clauses it was created from — which means the resolvents should be pre-generated and kept in memory.

Another thing: since we know the new clause needs to be added, we might as well shorten it beforehand in any way we can. At this point, we can make use of all the watchlists, stamps and implication cache we have to shorten the new clause: there are no problems with self-dependencies. It will pay off. However, note that shortening the clause before adding it means that we will have to reverse-shorten it later, when this clause might be part of a group of clauses that is touched by a new variable elimination round. So, we are working against ourselves in a way — especially because reverse shortening is pretty expensive and hairy, as explained above.

Although this is obvious, we still have to take care of time-outs. For example, if resolution took so much time that we are already out of time, we must exit asap and not worry about the resolvents. Don't link, don't remove, just exit. Time is of the essence.

Removing the original clauses

Easy, just unlink them from the occurrence lists. I mean, easy if you don’t care about time, of course. Because unlinking is an O(N^2) operation if you have N clauses and all of them contain the same literal X — the N-long occurrence list of literal X has to be read and updated N times. So, we don’t do this.

First of all, a special case: the two occurrence lists of the variable we are removing can simply be .clear()-ed. They are no longer needed. Secondly, we shouldn't unlink clauses one-by-one. Instead, we should mark the clause as removed, and then not care about the clause later. Once variable elimination is finished, we do a sweep of all the occurrence lists and clauses and remove the clauses that have been marked. This means that e.g. forward and backward subsumption get more hairy (we shouldn't subsume with a clause that's been marked as removed but is still in the occurrence list), but that O(N^2) becomes O(N), which for problems where N is large makes quite a bit of difference. Like, the difference of 100s vs. 10s for the exact same thing.
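A sketch of the mark-and-sweep idea, with hypothetical names:

```python
# Lazy clause removal: clauses get a "removed" flag; occurrence lists are
# only cleaned once, at the end, in a single pass.
class Clause:
    def __init__(self, lits):
        self.lits = lits
        self.removed = False           # marked instead of being unlinked immediately

def mark_removed(clause):
    clause.removed = True              # O(1), no occurrence-list traversal

def sweep(occur):
    # one O(N) pass over all occurrence lists, instead of O(N^2) unlinking
    for lit in occur:
        occur[lit] = [c for c in occur[lit] if not c.removed]
```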

The untold horrors

On top of what's above, you might like to generate some statistics about what worked and what didn't. You might like to dump these statistics to a database. You might like to not create resolvents that are not needed because the irredundant clauses form an AND/ITE gate. Or multiple gates. You might like to eliminate only a subset of variables at each call so that you don't make your system too sparse and thus reduce arc consistency. You might want to vary this limit based on the problem at hand. You might want to do many other things that are not detailed above.


Once I read through the above, I realized I kind of missed the essence: time-outs. It's mentioned here and there, but it's much more critical than it seems and makes things a hell of a lot harder. How do you cleanly exit from the middle of reverse-shortening while resolving because you ran out of time? I could just bury my head in the sand of course and say: I don't care. Or, I could make some messy algorithm that checks the return value of each call and returns a special value in case of time-outs. This needs to be done for every level of the call, which can be pretty deep, unless you like writing 1'500-line functions. I wanted to say writing & reading, but, really, nobody reads 1'500-line functions. They are throw-away, write-only code.

A note on learnt clauses

Learnt clauses are clauses derived while searching for a solution with a SAT solver in a CNF. They are at the heart of every modern so-called “CDCL” or “Conflict-Driven Clause-Learning” SAT solver. SAT solver writers make a very important difference between learnt and original clauses. In this blog post I’ll talk a little bit about this distinction, why it is important to make it, and why we might want to relax that distinction in the future.

A bit of terminology

First, let me call “learnt” clauses “reducible” and original clauses “irreducible”. This terminology was invented by Armin Biere I believe, and it is conceptually very important.

If a clause is irreducible it means that if I remove that clause from the clause database and solve the remaining system of constraints, I might end up with a solution that is not a solution to the original problem. However, these clauses might not be the “original” clauses — they might have been shortened, changed, or otherwise manipulated such as through equivalent literal replacement, strengthening, etc.

Reducible clauses on the other hand are clauses that I can freely remove from the clause database without the risk of finding a solution that doesn’t satisfy the original set of constraints. These clauses could be called “learnt” but strictly speaking they might not have been learnt through the 1st UIP learning process. They could have been added through hyper-binary resolution, they could have been 1UIP clauses that have been shortened/changed, or clauses obtained through other means such as Gaussian Elimination or other high-level methods.

The distinction

Reducible clauses are typically handled “without care” in a SAT solver. For example, during bounded variable elimination (BVE) resolutions are not carried out with reducible clauses. Only irreducible clauses are resolved with each other and are added back to the clause database. This means that during variable elimination information is lost. For this reason, when bounded variable addition (BVA) is carried out, one would not count the simplification obtained through the removal of reducible clauses, as BVE could then completely undo BVA. Naturally, the heuristics in both of these systems only count irreducible clauses.

Reducible clauses are also regularly removed or 'cleaned' from the clause database. The heuristics to perform this have been a hot topic in the past years and continue to be a very interesting research problem. In particular, the solver Glucose has won multiple competitions by mostly tuning this heuristic. Reducible clauses need to be cleaned from the clause database so that they won't slow the solver down too much. Although they represent information, if too many of them are present, propagation speed grinds to a near-halt. A balance must be achieved, and the balance lately has shifted much towards the "clean as much as possible" side — we only need to observe the percentage of clauses cleaned between MiniSat and recent Glucose to confirm this.

An observation about glues

Glues (used first by Glucose) are an interesting heuristic in that they are static in a certain way: they never degrade. Once a clause achieves glue status 2 (the lowest, and best), it can never lose this status. This is not true of dynamic heuristics such as clause activities (MiniSat) or other usability metrics (CryptoMiniSat 3). They are highly dynamic and will delete a clause eventually if it fails to perform well after a while. This makes a big difference: with glues, some reducible clauses will never be deleted from the clause database, as they have achieved a high enough status that most new clauses will have a lower status (a higher glue) and will be deleted instead in the next cleaning run.

Since Glucose doesn’t perform variable elimination (or basically any other optimization that could forcibly remove reducible clauses), some reducible clauses are essentially “locked” into the clause database, and are never removed. These reducible clauses act as if they were irreducible.

It’s also interesting to note that glues are not static: they are in fact updated. The way they are updated, however, is very particular: they can obtain a lower glue number (a higher chance of not being knocked out) through some chance encounters while propagating. So, if they are propagated often enough, they have a higher chance of obtaining a lower glue number — essentially having a higher chance to be locked into the database.

Some speculation about glues

What if these reducible clauses that are locked into the clause database are an important ingredient in giving glues the edge? In other words, what if it’s not only the actual glue number that is so wildly good at guessing the usefulness of a reducible clause, instead the fact that their calculation method doesn’t allow some reducible clauses ever to be removed also significantly helps?

To me, this sounds like a possibility. While searching and performing conflict analysis SAT solvers are essentially building a chain of lemmas, a proof. In a sense, constantly removing reducible clauses is like building a house and then knocking a good number of bricks out every once in a while. If those bricks are at the foundation of the system, what’s above might collapse. If there are however reducible clauses that are never “knocked out”, they can act as a strong foundation. Of course, it’s a good idea to be able to predict what is a good foundation, and I believe glues are good at that (though I think there could be other, maybe better measures invented). However, the fact that some of them are never removed may also play a significant role in their success.

Locking clauses

Bounded variable addition is potentially a very strong system that could help in shortening proofs. However, due to the original heuristics of BVE it cannot be applied if the clauses it removes are only reducible. So, it can only shorten the description of the original problem (and maybe incidentally some of the reducible clauses) but not the reducible clauses themselves. This is clearly not optimal for shortening the proof. I don't know how lingeling performs BVA and BVE, but I wouldn't be surprised if it has some heuristic where it treats some reducible clauses as irreducible (thereby locking them) so that it could leverage the compression function of BVA over the field of reducible clauses.

Unfortunately, lingeling's code is hard to read, and it's proprietary code, so I'd rather not read it in case some licensing problems turn up. No other SAT solver performs BVA as an in-processing method (riss performs it only as pre-processing, though it is capable of performing BVA as in-processing), so I'm left on my own to guess this and code it accordingly.

UPDATE: According to Norbert Manthey lingeling doesn’t perform BVA at all. This is more than a little surprising.

End notes

I believe it was first Vegard Nossum who put into my head the idea of locking some reducible clauses into the database. It only occurred to me later that glues automatically achieve that, and furthermore, they seem to automatically lock oft-propagated reducible clauses.

There are some problems with the above logic, though. I believe lingeling increments the glue counter of some (all?) reducible clauses on a regular basis, and lingeling is a good solver. That would defeat the above logic, though the precise way glues are incremented (and the way they are cleaned) in lingeling is not entirely clear to me. So some of the above could still hold. Furthermore, lingeling could be so well-performing for other reasons — there is more to SAT solvers than just search and resolution. Lately, up to 50% or more of the time spent in modern SAT solvers can be spent performing actions other than search.

Why CryptoMiniSat 3.3 doesn’t have XORs

I’ve had a number of inquiries about why CryptoMiniSat 3.3 doesn’t have native XOR clause support. Let me explain what I think the benefits and the drawbacks are of having native XOR support and why I made the choice of removing them.


The most trivial benefit is that users don't have to translate XORs into plain CNF. Although transforming is rather easy, some prefer not to go through the hassle. The principle is to cut the XOR into smaller XORs by introducing new variables and then representing the cut-down XORs by banning all the wrong combinations. For example, the XOR x+y+z+w=0 is cut into two XORs x+y+r=0, z+w+r=0, and then x+y+r=0 is represented as x ∨ y ∨ ¬r, x ∨ ¬y ∨ r, …
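Here is a rough sketch of that cutting in Python, assuming DIMACS-style integer variables. This illustrates the principle only, not how any particular solver or front-end implements it:

```python
# Split a long XOR constraint into 3-long XORs with fresh linking variables,
# then encode each small XOR by banning every assignment with the wrong parity.
from itertools import product

def xor_to_cnf(xor_vars, rhs, next_free_var):
    """xor_vars: list of variable numbers, rhs: 0 or 1. Returns (clauses, next_free_var)."""
    pieces, cur = [], list(xor_vars)
    while len(cur) > 3:
        link = next_free_var
        next_free_var += 1
        pieces.append((cur[:2] + [link], 0))    # x + y + link = 0
        cur = [link] + cur[2:]
    pieces.append((cur, rhs))

    clauses = []
    for piece_vars, piece_rhs in pieces:
        for values in product([0, 1], repeat=len(piece_vars)):
            if sum(values) % 2 != piece_rhs:    # this assignment violates the small XOR...
                clauses.append([v if val == 0 else -v   # ...so ban it with one clause
                                for v, val in zip(piece_vars, values)])
    return clauses, next_free_var

# e.g. xor_to_cnf([1, 2, 3, 4], 0, 5) cuts x1+x2+x3+x4=0 into x1+x2+x5=0 and x5+x3+x4=0
```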

A more substantial benefit is that propagation and conflict generation can be done natively. This is quite advantageous as it means that far fewer clauses and variables need to be visited while propagating and generating the UIP conflict. Fewer variables and clauses visited means fewer cache misses, and due to the more compact representation, less memory overhead and thus even better cache usage. (For the aficionados: an interesting side-effect of XORs is that blocked literals cannot be used for XOR clauses.)

The biggest benefit of having native XORs is that Gaussian elimination can be carried out at every level of the search tree, not just at top-level. Although most people I have talked with still think that CryptoMiniSat 2 performs Gaussian elimination only at top-level, this is not the case. Performing Gauss at top-level is a two-day hack. Performing it at every level took me about half a year to implement (and about 5K lines of code). It requires leaving traces of past computations and computing reasons for propagations and conflicts indicated by the Gaussian elimination algorithm. According to my highly unofficial count, exactly 2 people use Gaussian elimination at deeper than top-level, Vegard Nossum and Kuldeep S Meel. Out of these two users, only the latter has indicated speedups. It speeds up crypto problems, but not many others, so it's completely disabled by default.


The main drawback of having native XOR clauses alongside normal CNF clauses is the loss of simplicity and universality. Let me explain in detail.

Loss of simplicity. Once a variable is present in both a normal and an XOR clause, it cannot be eliminated in a simple way. I don't even want to try to express the clauses that would result from doing varelim in such cases — it's probably horribly convoluted. All algorithms in the code now need to check for XORs. On-the-fly clause strengthening cannot strengthen an XOR with a normal clause, of course. Clause cleaning has to take into account that when a 3-long XOR is shortened to a 2-long one, it indicates a new equivalent literal and that may lead to immediate UNSAT. I could go on, but the list is long.

Another problem is that XORs make the state more complex. Until now there were only clauses of the form a ∨ ¬b ∨ …, sometimes specially stored such as binary clauses. The state of the solver goes through many transformations — think of them as llvm passes — while it is in-processed and solved. The more complex the state, the larger and the more complex the code. While a simple clause-cleaning algorithm can be expressed in about 20 lines of code without implicit binary & tertiary clauses, the same algo blows up to about 100 lines of code with those optimizations. Once you add XOR clauses, it jumps to around double that. We just made our code 10x larger. Imagine adding native support for, e.g. at-most-n.

Loss of universality. If a variable is only in XOR clauses, it can now be eliminated at the XOR level. A pure variable at the XOR level indicates an XOR clause that can be removed and later always satisfied. XOR subsumption is when one XOR subsumes another: the larger can be removed and their XOR added in. All of these are implemented in CMS 2. Probably there are many others out there, though. We could try to implement them all, but this is a whole new universe that may open up the gates of hell. Worst of all, most of these techniques on XORs are probably already simulated by in-processing techniques at the CNF level. Pure XOR variables are simulated by blocked clause elimination. XOR-based variable elimination doesn't make practical sense at the CNF level, but it could be simulated by normal variable elimination if ignoring heuristics. I could go on.

What CryptoMiniSat 3.3 does

The compromise I came up with for CryptoMiniSat 3.0 is that I regularly search for XOR clauses at the top-level, perform top-level Gaussian elimination, and add the resulting new information to the CNF. Binary and unitary XOR clauses are the simplest to add back.

In the SAT'11 application benchmark set, out of 300 problems, 256 problems contain XOR clauses. The number of XOR clauses found, when they are found, is on average 2905, with an average size of 4.2 literals. Using Gaussian elimination, counting only when something was extracted, the average number of literal equivalences (binary XORs) extracted is 130, while the average number of unitary clauses extracted is 0.15.

Here is a typical example output:

Here, about 104K XOR clauses were found (using a sophisticated algorithm) within 2.11s, all of size 3. These XOR clauses were subjected to disconnected component analysis (within the XOR sphere) and they were found to belong to 2185 components. It is important to cut XORs into groups, because Gaussian elimination is roughly an O(n^2.5) algorithm, and if we can reduce the n by having more than one matrix, the speed is significantly increased. Next, these 2185 matrices were Gauss-eliminated and were found to contain 1275 binary XOR clauses, which were quickly marked as equivalent literals.
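For illustration only, here is what Gaussian elimination over GF(2) on such an XOR matrix boils down to, in toy Python. The real implementation uses a dense, bit-packed representation and is far faster; the function and its row format here are made up:

```python
# GF(2) elimination on XOR rows given as (set_of_vars, rhs); reports the
# unit clauses and binary XORs (i.e. literal equivalences) that fall out.
def gauss_xors(xors):
    rows = [(set(vs), rhs) for vs, rhs in xors]
    for i in range(len(rows)):
        vs, rhs = rows[i]
        if not vs:
            continue
        pivot = min(vs)                            # pick any variable of this row as pivot
        for j in range(len(rows)):
            if j != i and pivot in rows[j][0]:     # XOR row i into every other row with the pivot
                rows[j] = (rows[j][0] ^ vs, rows[j][1] ^ rhs)
    units  = [(vs, rhs) for vs, rhs in rows if len(vs) == 1]
    binxor = [(vs, rhs) for vs, rhs in rows if len(vs) == 2]   # x = y or x = ~y
    return units, binxor

# e.g. x1+x2=0, x2+x3=0, x1+x3+x4=1 yields the unit x4=1 and the equivalences x1=x3, x2=x3
print(gauss_xors([({1, 2}, 0), ({2, 3}, 0), ({1, 3, 4}, 1)]))
```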

To perform Gaussian elimination at top-level, CryptoMiniSat uses the excellent m4ri library, maintained by my old colleague and friend Martin Albrecht. It's a lightweight yet extremely fast Gaussian elimination system using all sorts of nifty tricks to do its job. Unfortunately, for large matrices it can still take a long while, as the algorithm itself is not cheap, so I have a cut-off for matrices that are too large.

What a future CryptoMiniSat 3.x could do

Once we have extracted the XORs, we could just as well keep them around during search and perform Gaussian elimination at levels below the top-level. This would bring us close to the old system in terms of real performance — native propagation (which would be unavailable) can only give a 1.5x-2x speedup, but Gaussian elimination at every level can give much more. I would need to clean up a lot of code and then, maybe, this would work. Maybe I'll do this one day. Though, after spending weeks doing it, probably people will still believe it only does top-level. At least right now, that is indeed the case.


Implementing a new theory such as XOR deep into the guts of a SAT solver is neither easy nor does it provide a clear advantage in most situations. Those who push for these native theories have either not tried implementing them in a complicated solver such as lingeling/SatELite/CryptoMiniSat/clasp, or have already bitten the bullet as the clasp group and I did, and it is probably limiting them in extending their systems with new techniques. The resulting larger internal state leads to edge cases, exceptions, and much-much more code.