I have been trying to debug why some MiniSat-based solvers perform better at unit propagation than CryptoMiniSat. It took me exactly 3 full days to find out and I’d like to share this tidbit of information with everyone, as I think it might be of interest and I hardly believe many understand it.
The mystery of the faster unit propagation was that I tried my best to make my solver behave exactly as MiniSat to debug the issue, but even though it was behaving almost identically, it was still slower. It made no sense and I had to dig deeper. You have to understand that both solvers use their own memory managers. In fact, CryptoMiniSat had a memory manager before MiniSat. Memory managers are used so that the clauses are put close to one another so there is a chance that they are in the same memory page, or even better, close enough for them not to waste memory in the cache. This means that a contiguous memory space is reserved where the clauses are placed.
The high-level logic when consolidating memory, i.e. removing unused clauses from this space is the following in CryptoMiniSat:
n = malloc(enough_space) start = n for offset in clauses: Clause* old_cl = base + offset copy(n, old_cl, cl->size()) old_cl->new_offset = n-start offset = n-start n += cl.size() for w in watches: w.offset = (base + w.offset)->new_offset free(base) base = start
In contrast, in MiniSat it’s:
n = malloc(enough_space) : start = n for w in watches: Clause* old_cl = base + w.offset if !old_cl->updated: copy(n, old_cl, old_cl->size()) old_cl->new_offset = n-start old_cl->updated = true w.offset = n-start n += old_cl->size() else: w.offset = old_cl->new_offset for offset in clauses: offset = (base + offset)->new_offset free(base) base = start
So the difference seemed like this: CryptoMiniSat was a bit simpler, not using the ‘updated’ flag, and MiniSat was a bit more complex. But if you look closely, you’ll discover something quite mind-bending. When I realized I had to stop working for about 10 minutes. The genius is right in there.
MiniSat puts clauses next to each other in memory that are next to each other in the watchlist. This, heuristically speaking, means that with a good chance, these clauses will be dereferenced one after the other — and one will pull in the other due to automatic pre-caching of contiguous memory blocks. Note that especially if blocking literals are used, watchlist elements tend to stay close to one another.
So that is how I wasted three days of my life. It was a fun chase, though. I wonder how many more are in there. I know of only one more, which I also found to be a lot of fun, and it’s about the position of the size of the clause in the clause data structure. But I’ll let you figure that one out.