Wow, that's kinda interesting. I somehow cannot get rid of the feeling of having a really nice hammer and then treating everything as a nail. There are other uses of deep/machine learning that could help SAT solvers.
For example, one could try to better guess the learnt clauses to keep/throw away or to restart when the search space is deemed non-interesting through prediction models built using machine learning. See (my) blogpost here: https://www.msoos.org/2018/01/predicting-clause-usefulness/ (sorry, self-promotion, but relevant)
Let's not forget the work that could be done on auto-configuring SAT solvers, tuning their configuration to the instance, as per the competition at: http://aclib.net/cssc2014/
Author here. We weren't shooting for low-hanging fruit, and we realize that we are still very far from contributing to the state-of-the-art. We tried to approach this project as scientists instead of as engineers. I personally found NeuroSAT's success to be extremely surprising, and I think it underscores how little we understand about the capabilities and limitations of neural networks. Our approach might prove to be a dead-end but I think there is some chance we have stumbled on something powerful.
I have not been too impressed with fancier SAT solvers. I'm currently drafting a blog post where I compare Minisat, Picosat, Cryptominisat, Lingeling and Glucose. (Those were all that I could easily get running on Linux. Open to more suggestions.)
The venerable Minisat was (for the most part) the fastest in my simple use case. What I did notice though was that the "smarter" solvers were more consistent. A poorly written CNF might take 50x longer than it should (when compared to N-1 and N+1 instances). Minisat (but moreso Picosat) would occasionally hit an edge case or something and slow way down. Fancier solvers produced a nice clean line on the graphs, without anomalous 50x spikes.
For smaller problems, the fancier solvers are actually worse than MiniSat/Picosat. That's because of the overhead of creating and maintaining datastructures and the fancy preprocessing. The larger, more complex solvers are meant to have a better chance to solve more complex problems and will use hybrid strategies to make sure they don't accidentally go down into some rabbit hole. So I would expect them to have better consistency.
Note that winning in the competition, until 2017, meant you solved the most instances, each with a ~5000s timeout. So if you solved every single one within 4999s you won, you were the best... This obviously encouraged incredibly long startup times that are gained back over the overall 5000s timeout. It clearly does not mimic normal use-cases.
One thing to try, especially with Glucose: turn off the preprocessing. The preprocessing seems to help with really "hard" instances like you see in SAT competitions, but takes time and doesn't really help with simpler ones.
We use SAT solvers extensively in the Phil crossword construction tool, to figure out when the grid is overconstrained, and which letters have only a single value in all possible solutions. We ended up running Glucose (compiled to wasm and running in the browser) because the simpler solvers couldn't solve many instances quickly, and with preprocessing off but most other parameters at their defaults. I didn't do a careful evaluation of all solvers, but it might be interesting.
I agree they'd be better for improving something human driven. I previously suggested plugging learning architectures provers to either spot opportunities to feed terms to automated solvers or create new tactics. Similarly, they could augment the automated solvers to improve their heuristics. Improving heuristics or models of data that changes quite a bit in different use cases seems to be the strong suite of these learning architectures. Whereas, there's already efficient algorithms for things like search or validating stuff. Might as well keep them in initial design attempts.
Improving heuristics... everybody has been doing that, but based on _very_ sparse information, mostly just running times and number of conflicts&decisions. I leads to poor understanding and slow improvement of heuristics.
I think we ought to collect a lot more data, which would allow for better heuristics. But collecting data incurs running overhead, so it must be configurable -- collect when not in a "competition mode" and analyse, then during in the competition, only collect the data really needed for the computed prediction algorithm to correctly function.
The SAT solvers related to Lingeling use kNN to "predict the origin of the benchmark" (I assume to figure out which bucket or set of SAT problems the system was given and then tune itself to the optimal settings in the bucket).
For a second I was wondering whether test prep for American college-bound high schoolers had entered an amazing AI-fueled arms race. Then I realized this was actually about the Boolean satisfiability problem.