I read this paper about most research findings being false. Given that most research papers in SAT use incredibly small sample sizes (especially considering that large samples are cheap here relative to, e.g., medical trials), and that the samples are very often hand-picked, it’s easy to see why this could be the case. But that article lists a number of other factors, too, and they are interesting to consider as well. Only a few true innovations stick around in SAT (glues, VSIDS, UIP, restarts, etc.). Most are forgotten because, frankly, they didn’t show the promise they purported to have. It’d be interesting to force authors to, e.g., run their systems on much larger sample sizes (e.g. 2000-3000 instances from SAT competitions) with much longer timeouts (e.g. 5000s). Then those implementing SAT solvers wouldn’t have to wade through piles of articles to get to something worth implementing. One is allowed to dream.
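
To make the sample-size point a bit more concrete, here is a minimal simulation sketch. It is not taken from the paper or from any real benchmark. It assumes two solvers that are statistically identical: each solves any given instance within the timeout with the same probability, independently of the other. It then counts how often the "new" solver appears to win by a margin purely by chance. The function name `false_win_rate`, the solve probability of 0.6 and the 5% margin are all hypothetical choices made for illustration.

```python
import random


def false_win_rate(n_instances, trials=500, p_solve=0.6, margin=0.05, seed=0):
    """Fraction of trials in which a solver identical to the baseline
    still appears to beat it by at least `margin * n_instances` instances.

    Assumption: every instance is solved independently with probability
    `p_solve` by both solvers, so any apparent win is pure noise.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = 0
    for _ in range(trials):
        # Number of instances each solver finishes within the timeout.
        baseline = sum(rng.random() < p_solve for _ in range(n_instances))
        candidate = sum(rng.random() < p_solve for _ in range(n_instances))
        if candidate - baseline >= margin * n_instances:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Compare a typical small benchmark set with a competition-sized one.
    for n in (50, 2500):
        print(f"{n} instances: apparent-win rate {false_win_rate(n):.3f}")
```

Under these assumptions, the small set should report a spurious "improvement" far more often than the large one. Real instances are neither independent nor equally hard, and hand-picking only makes matters worse, so treat this as a lower bound on the problem rather than a model of it.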