CM Prover – Evaluation on the CASC-25 Problems

Result Tables

The CM Prover, embedded in the PIE system for preprocessing, has been tested on the two relevant categories FNE (valid first-order formulas without equality) and FEQ (valid first-order formulas with equality) of the CASC-25, which took place in 2015.

The tests have been run in March and April 2016 on a Linux computer with i3-4150 CPU @ 3.50GHz. Virtual memory was limited to 3GB. The timeout for the FNE problems was 400s, for the FEQ problems 300s. The Prolog system used was SWI Prolog, Version 7.2.3. Inputs were taken directly from TPTP Version 6.2.0, without any offline preprocessing.

Several configurations of CM have been tried. CM-n1 to CM-n8 are configurations used for problems without equality, CM-e1 to CM-e4 configurations used with equality. The CM-ALL columns display the results that could be solved by either one of the configurations with the shortest of their running times. The following tables show these results, embedded into the results for other provers from the CASC-25 competition.

The sources of the CASC-25 results are the tables:

Conditions at the CASC-25 competition were a timeout of 300s, an E5-2609 2.40GHz CPU, and a memory limit of 32GB.

Tested Configurations

The tables below shows the essential settings of the various tested configurations of CM.

The exact configurations were chosen automatically depending on the number of clauses of the clausified problem and on whether it is Horn.

Reduction is understood in the sense of the model elimination calculus, where it is required for completeness of non-Horn formulas. For Horn formulas, by default reduction is not performed. Thus configurations with restrictions of reduction just affect non-Horn problems.

The default mode for the depth measure used at iterative deepening is a combination of the tableau height and the size of processed clauses. Only other modes are indicated in the table below.

``Dynamic depth increase'' (see CM Prover Main Page) was used in all experiments.

In the table below settings are shown in different columns, depending on whether they affect completeness. Also configurations with empty field Settings leading to incompleteness are incomplete in some cases, in particular because the configured automatic choice of the concrete settings does exclude reduction for very large non-Horn formulas.

Some information on the settings shown in the tables is provided at the CM Prover Main Page.

Configurations Used for the FNE Problems
Configuration Settings that do not affect completeness Settings leading to incompleteness
CM-n1 antilemmas, sideways lemmas reduction followed by cut
CM-n2 antilemmas, sideways lemmas sideways lemmas followed by cut
CM-n3 no tautology/subsumption disequalities no reduction
CM-n4 antilemmas, sideways lemmas reduction followed by cut, sideways lemmas followed by cut, timeout 2s at each depth
CM-n5 antilemmas, tableau height as depth measure timeout 2s at each depth
CM-n6 antilemmas, tableau height as depth measure
CM-n7antilemmastimeout 2s at each depth
CM-n8number of subgoals as depth measure

Configurations Used for the FEQ Problems
Configuration Settings that do not affect completeness Settings leading to incompleteness
CM-e2sideways lemmas reduction followed by cut
CM-e3no tautology/subsumption disequalities timeout 5s at each depth
CM-e4tableau height as depth measure, sideways lemmas

Proof Output

All reported experiments were run with proof construction switched on, effecting that during proving a term representing a clausal tableaux is constructed. After the problem has been solved, the proof is then printed. This succeeded on all considered problems, except for four problems from FNE (SYO525+1.015, SYO525+1.018, SYN986+1.004, SYO525+1.021), where the write_canonical and writeq predicates of SWI Prolog produced a segmentation violation, seemingly due to the size of the proof terms.

For this reason, recent versions of PIE includes the possibility to print proof terms with a Prolog-implemented predicate, which succeeded on the three SYO problems, where the file size with the printed proof term is up to 211MB, requiring 44s for printing (SYO525+1.021). For SYN986+1.004, the printing had been manually stopped after the file size reached 5GB.

So far, preprocessing operations are not considered in the proof representation of PIE and CM.

Experiments with YAP Prolog

YAP Prolog is known to have superior performance to SWI Prolog. Thus some experiments have been also done with YAP Prolog. Unfortunately, it seems currently too less stable for systematic use over a set of problems: The stable version 6.2.2 got stuck at some problems (e.g. CSR047+3) seemingly due to a system bug. The development version 6.3.4 succeeded on these problems, but for other ones sometimes showed different behavior on different runs. This might be related to the use of timeouts, but has not yet been fully tracked down. Nevertheless, it might be useful to use YAP Prolog when trying to prove singular difficult problems. Some hints and scripts for using CM/PIE with YAP Prolog are provided with the system.