This year I was part of the organizing team of the SAT Competition 2014, a competitive event for solvers for the Boolean Satisfiability (SAT) problem. It is organized – usually annually – as satellite event to the International Conference on Theory and Applications of Satisfiability Testing.
SAT solvers are crucial in many applications, from resolving package dependencies in Linux package managers to hardware and software verification and planning in air traffic control. Many problems can be encoded as SAT instances and solved very effectively by general purpose SAT solvers.
The competition was a huge success in terms of participation with 79 participants from all over the world and 137 submitted solvers. About 10 years worth of CPU time were spent on evaluating the solvers. The winners were awarded silver medals at the FLoC Olympic Games 2014 in Vienna.
SAT Challenge 2012, SAT Competition 2013 and now SAT Competition 2014 were conducted using EDACC, a framework for experiments with solvers on computer clusters, on which I work on as core developer. It was started as student project at Ulm University and further improved as part of various bachelor and master theses. The hardware resources for the SAT Competition 2014 were provided by the Texas Advanced Computing Center (TACC). The solvers ran on the very impressive Lonestar cluster, which consists of around 22000 Cores. Sadly you cannot allocate all of them at once but at times we had more than 1000 cores reserved 🙂
Naturally there are always unforeseen issues when running such competitions and this year was not different. The main technical issues were due to the large UNSAT proof files generated by solvers. These proof files can grow up to hundreds of GB in size and checking them for correctness often takes about as long as running the solver itself. Unfortunately, EDACC can not parallelize the proof verification yet which can leave the other cores idle while verification is running. This would be a great feature to implement in the future.
Another issue was caused by TCP connection timeouts between the computation clients running on the cluster nodes and the database server. This issue was quite hard to track down because the clients seemed to randomly stop for no reason. Such timeouts were also never an issue on another cluster before. The log files written by the client helped enormously to track down the issue and we worked around it by periodically sending keep-alive queries over the connection.
Overall, it was a lot of work but it’s great to have a university project like EDACC used on such a scale with a real purpose.