A Tale of Shift Left, Shift Right, and MEV

The NIST Cybersecurity Framework’s five Functions are: Identify, Protect, Detect, Respond, and Recover. Within this framework “shift left” means that we shift our focus towards Identify and Protect, i.e. towards fixing issues before our system is deployed into production. Say, if you have a bug in your Apple Store/Google Play app, you’d prefer fixing it before you deploy it to the internet, or, similarly you prefer fixing the bug in your cryptocurrency contract before you deploy it in the wild (i.e. “on the mainnet”). While shift left makes lots of sense (let’s prevent the issue in the first place!), in many cases, preventing all issues is likely impossible.

Shifting Left

Firstly, it is important to state that preventing all issues is often impossible because of what safety practitioners call the “blunt end”. When writing code, practitioners, i.e. developers, are at what safety literature would call the “sharp end”: they are like airplane pilots, directly at the control of their craft. While it’s easy to imagine that pilots are in “full control”, in reality, airplane pilots depend on and are affected by many things: accurate weather reports, quality and amount of training, certification bodies, proper aircraft maintenance, production pressure from corporate, etc. These are what safety literature calls the “blunt end” — things that contribute to both success and failure, but are not directly visible/obvious. When the 10th plane drops out of the sky this same year, and Boeing keeps blaming the (now dead) pilots, you know it’s the blunt end you ought to be looking at.

What this boils down to in terms of shift left is that while developers are in “full control”, in reality, there is a large blunt end that they have little to no control over, and which may set up the conditions such that failure is sometimes inevitable, even with the most careful of developers. An easy example of this is previously unknown vulnerability classes: for example unknown issues with CPU cache timing side-channels in classical IT security, or flash loan attacks for cryptocurrency contracts, which are a relatively new phenomenon, allowing anyone to instantly obtain a large capital for a fraction of time to break assumptions about a contract, which can help attackers. Another slightly less obvious example is production pressure, where competing entities are known to be working on the same idea/system and releasing first is critical for business success. It is one thing to “ask for time”, it’s another thing when you gotta release or it was all for nothing. While techniques like fuzzing, formal verification, coding best practices, etc. can help, they are neither bulletproof, nor are they reasonable to expect to be done fully for all contracts.

Notice that given the nature of cryptocurrencies where “code is the law”, it seems counter-intuitive to think that there are factors at play other than the code itself. On the surface, the rules of the game seem set and all mistakes seem to happen to/with code, hence the culprit is always clear. In other words, it seems as if there is no “blunt end”. However, as with all interesting endeavors, in my view, blunt ends are abound, even if they only manifest themselves in code in subtle ways. For example, before overflow protection was enabled by default on the most used Ethereum compiler, Solidity, overflow issues were possible (e.g. issues where 255+1=0, oops!), and could have lead to attacks. In other words, changing the compiler defaults can prevent a type of attack. Similarly, the SMT Checker, shipped as part of Solidity and Remix, can also warn developers of potential mistakes. These systems change the landscape, preventing the “sharp end”, i.e. developers, from making mistakes.

Shifting Right

The IT security sector has realized for some time now that spending all their effort on shift left, i.e. identifying issues and preventing them pre-deployment, is not adequate. With some effort, time, and money being spent on detection, adequate response, and recovery, i.e. post-deployment and hence on the “right” side, the impact of attacks or unforeseen events can be minimized. This is incredibly important, because as per above, it is often impossible to prevent all attacks. From phishing attacks, novel attack classes, persistent attackers, to production pressures, some attacks will go through, and be successful — at which point the only question is how successful will they be, i.e. will they reach all (or any) of their intended goals, or will they be stopped early, minimizing the damage caused.

There are some interesting properties of spending proper time dealing with post-production incidents. Firstly, it allows the system designers and developers to understand what attack patterns are typical against systems. In classical IT security, e.g. honeypots are often used to understand attacker behavior. These are (often under-protected) systems deployed either on the public internet, or more interestingly, on the intranet(s) of organizations, to see if (and how) attackers try to attack it. These systems can help spot (novel) attack patterns, and attackers in general. Other information-sharing systems are also in-place in the traditional IT security world: in Germany the BSI is responsible to warn large enterprises of (novel/successful) attack patterns in the wild. Within the space of cryptocurrencies, attacks tend to be on the public chain, and can be observed directly both as they happen (unless they are in a so-called private mempool, more on this later), and retrospectively, thanks to the public ledger. There is plenty to learn from how others succeed, or fail.

MEV, and On-Chain Security-as-a-Service

With all the above out of the way, let’s get to the meat of this article: how behavior patterns in the cryptocurrency world that are often cited as undesirable are actually doing a form of shift right. This inadvertent shift right is something I believe we could exploit to improve the safety of the chain overall.

Generalized frontrunning is a technique that consists of people listening to the set of sent, but unconfirmed transactions on the chain, and seeing if copying the same transaction themselves would make them money. They then submit the same transaction with a higher fee, giving more fee to the miners who confirm transactions. This leads to both transactions being executed, but theirs being executed first (and the 2nd likely failing), thereby giving them the money. Clearly, miners (who confirm transactions) could always do such frontrunning, and could do it much easier, since they decide who is going to be first, so they can make themselves first without paying a higher (or any) fee. Transaction frontrunning is slightly more complicated: it tries to both front- and back-run a transaction (also called sandwiching), manipulating the price before the transaction is executed (the “front” part) and then selling the excess (the “back” part). Generalized and transaction frontrunning attacks all fit under the umbrella of transaction reordering attacks and are also called Miner Extractable Value, or MEV for short. MEV is big business, with 8 million USD extracted in the past 30 days on the Ethereum chain, as of writing.

When you perform generalized frontrunning, you do two things: (1) you constantly check for potential value to be extracted, such an attack would do, and (2) you make sure your transaction runs first (hence the name “front-run”) that extracts the value to get the value for yourself. This clearly maps to the post-deployment part of the Cybersecurity Framework of NIST: detect & respond. In other words, it would actually make sense to constantly run a bot on-chain to detect and front run attacks against your own contracts. In a sense, the frontrunners are operating what in the traditional IT security world would be called a SOC — a Security Operations Center that is constantly on the lookout for attacks.

In this sense, MEV is not undesired, but a price to pay to have a non-stop security system. Of course, it is not a given that MEV bot operators extracting MEV will give back your money for the attack they front-ran. However, I posit that it’s much more likely that you’ll get your money back from them than the attackers. For example, at the MEV day for Devconnect, there was a discussion I participated in where a large MEV bot operator mentioned that they were willing to give back the funds of attacks they front-ran for a 10% fee. I expect that large organizations will in the future run their own MEV bots, and medium-sized ones will either simply pay the fee (10% or whatever it will be) when an attack happens, or perhaps pay a monthly fee for the protection service provided.

(Footnote: there is not only bad MEV. For example, backrunning (i.e. running after) liquidation events are incentivized to happen, and are considered normal for the system to run as designed.)

Current Solutions to MEV, and their Impact

There are two obvious ways to deal with the MEV issue: creating economic incentives to reduce their impact, and hiding pending transactions. Let’s to to get into both.

One way to deal with MEV is by not making pending transaction visible. So-called private mempools are operated by some miners (i.e. confirmers of transactions) that don’t allow the public to see the not-yet confirmed transactions. This is advantageous because it means that attackers cannot run clients to frontrun/sandwich/etc. pending transactions sent to these private mempools. In a way, private mempools are kind of a trusted environment, where you can be safe from people trying to extract MEV. However, it also means that malicious behavior submitted to these private pools cannot be front-run. Since these private pools are trust-based, it would be possible to allow some trusted parties to observe these private pools and frontrun attacks. Rather than being purely trust-based, it is possible that some sort of incentives-based system could also be set up, which seems like an open question/potential for improvement. This would be similar to what Flashbots does. Flashbots tries to use economic incentives through an open marketplace to make MEV extraction less disruptive to the ecosystem (among other benefits). However, it does not work for private mempools, since, well, they are private, and Flashbots requires publicly visible unconfirmed transactions to be able to work, due to its open marketplace concept.

MEV is not only big business, it’s also wasteful: often, the original transaction will fail, but still execute, consuming computational resources and burning money. MEV systems trying to outbid one another for the prize are effectively participating in an all-pay auction, i.e. everyone pays for the good but only one gets it. Thankfully Flashbots solves this by running an off-chain auction that is not all-pay. However, once we move from proof-of-work to proof-of-stake (i.e. to ETH 2.0), the attack transaction can be directly proposed to the next validator, skipping the public pool entirely. Essentially, the attacker can choose be in the private mempool by default (with some caveats, see Proposer-Builder Separation, and MEV-Boost), rather than by exception.

Conclusions

While Miner Extractable Value (MEV) seems on the surface to be an undesirable side-effect of the combination of current cryptocurrency protocols and the public nature of digital ledgers, it can also be seen as a feature. This feature could allow one to perform detection & response to attacks, allowing legitimate entities to mitigate the impact of attacks on contracts post-deployment. There is some evidence that some people running MEV extraction bots are already effectively providing a form of post-deployment attack-mitigation-as-a-service for a fee. However, as usual, preventative measures are both more reliable and cheaper, and should be the focus of most efforts against attacks.