The CM Prover, embedded in the PIE system for preprocessing, has been tested on the two relevant categories FNE (valid first-order formulas without equality) and FEQ (valid first-order formulas with equality) of the CASC-25, which took place in 2015.
The tests have been run in March and April 2016 on a Linux computer with i3-4150 CPU @ 3.50GHz. Virtual memory was limited to 3GB. The timeout for the FNE problems was 400s, for the FEQ problems 300s. The Prolog system used was SWI Prolog, Version 7.2.3. Inputs were taken directly from TPTP Version 6.2.0, without any offline preprocessing.
Several configurations of CM have been tried. CM-n1 to CM-n8 are configurations used for problems without equality, CM-e1 to CM-e4 configurations used with equality. The CM-ALL columns display the results that could be solved by either one of the configurations with the shortest of their running times. The following tables show these results, embedded into the results for other provers from the CASC-25 competition.
The sources of the CASC-25 results are the tables:
Conditions at the CASC-25 competition were a timeout of 300s, an E5-2609 2.40GHz CPU, and a memory limit of 32GB.
The tables below shows the essential settings of the various tested configurations of CM.
The exact configurations were chosen automatically depending on the number of clauses of the clausified problem and on whether it is Horn.
Reduction is understood in the sense of the model elimination calculus, where it is required for completeness of non-Horn formulas. For Horn formulas, by default reduction is not performed. Thus configurations with restrictions of reduction just affect non-Horn problems.
The default mode for the depth measure used at iterative deepening is a combination of the tableau height and the size of processed clauses. Only other modes are indicated in the table below.
``Dynamic depth increase'' (see CM Prover Main Page) was used in all experiments.
In the table below settings are shown in different columns, depending on whether they affect completeness. Also configurations with empty field Settings leading to incompleteness are incomplete in some cases, in particular because the configured automatic choice of the concrete settings does exclude reduction for very large non-Horn formulas.
Some information on the settings shown in the tables is provided at the CM Prover Main Page.
Configuration | Settings that do not affect completeness | Settings leading to incompleteness |
---|---|---|
CM-n1 | antilemmas, sideways lemmas | reduction followed by cut |
CM-n2 | antilemmas, sideways lemmas | sideways lemmas followed by cut |
CM-n3 | no tautology/subsumption disequalities | no reduction |
CM-n4 | antilemmas, sideways lemmas | reduction followed by cut, sideways lemmas followed by cut, timeout 2s at each depth |
CM-n5 | antilemmas, tableau height as depth measure | timeout 2s at each depth |
CM-n6 | antilemmas, tableau height as depth measure | |
CM-n7 | antilemmas | timeout 2s at each depth |
CM-n8 | number of subgoals as depth measure |
Configuration | Settings that do not affect completeness | Settings leading to incompleteness |
---|---|---|
CM-e1 | antilemmas | |
CM-e2 | sideways lemmas | reduction followed by cut |
CM-e3 | no tautology/subsumption disequalities | timeout 5s at each depth |
CM-e4 | tableau height as depth measure, sideways lemmas |
All reported experiments were run with proof construction switched
on, effecting that during proving a term representing a clausal tableaux
is constructed. After the problem has been solved, the proof is then
printed. This succeeded on all considered problems, except for four
problems from FNE (SYO525+1.015, SYO525+1.018, SYN986+1.004,
SYO525+1.021), where the write_canonical
and
writeq
predicates of SWI Prolog produced a
segmentation violation, seemingly due to the size of the proof
terms.
For this reason, recent versions of PIE includes the possibility to print proof terms with a Prolog-implemented predicate, which succeeded on the three SYO problems, where the file size with the printed proof term is up to 211MB, requiring 44s for printing (SYO525+1.021). For SYN986+1.004, the printing had been manually stopped after the file size reached 5GB.
So far, preprocessing operations are not considered in the proof representation of PIE and CM.
Contact: info@christophwernhard.com