The CM Prover, embedded in the PIE system for preprocessing, has been tested on the two relevant categories FNE (valid first-order formulas without equality) and FEQ (valid first-order formulas with equality) of the CASC-25, which took place in 2015.

The tests have been run in March and April 2016 on a Linux computer with i3-4150 CPU @ 3.50GHz. Virtual memory was limited to 3GB. The timeout for the FNE problems was 400s, for the FEQ problems 300s. The Prolog system used was SWI Prolog, Version 7.2.3. Inputs were taken directly from TPTP Version 6.2.0, without any offline preprocessing.

Several configurations of CM have been tried. *CM-n1* to
*CM-n8* are configurations used for problems without equality,
*CM-e1* to *CM-e4* configurations used with equality. The
*CM-ALL* columns display the results that could be solved by either
one of the configurations with the shortest of their running times. The
following tables show these results, embedded into the results for other
provers from the CASC-25 competition.

- Result table of CM for the 150 FNE problems (FOF theorems without equality) of CASC-25
- Result table of CM for the 250 FEQ problems (FOF theorems with equality) of CASC-25
- Logfiles

The sources of the CASC-25 results are the tables:

- CASC-25 Results for FNE (FOF Theorems without Equality)
- CASC-25 Results for FEQ (FOF Theorems with Equality)

Conditions at the CASC-25 competition were a timeout of 300s, an E5-2609 2.40GHz CPU, and a memory limit of 32GB.

The tables below shows the essential settings of the various tested configurations of CM.

The exact configurations were chosen automatically depending on the number of clauses of the clausified problem and on whether it is Horn.

*Reduction* is understood in the sense of the model elimination
calculus, where it is required for completeness of non-Horn formulas.
For Horn formulas, by default reduction is not performed. Thus
configurations with restrictions of reduction just affect non-Horn
problems.

The default mode for the depth measure used at iterative deepening is a combination of the tableau height and the size of processed clauses. Only other modes are indicated in the table below.

``Dynamic depth increase'' (see CM Prover Main Page) was used in all experiments.

In the table below settings are shown in different columns, depending
on whether they affect completeness. Also configurations with empty
field *Settings leading to incompleteness* are incomplete in some
cases, in particular because the configured automatic choice of the
concrete settings does exclude reduction for very large non-Horn
formulas.

Some information on the settings shown in the tables is provided at the CM Prover Main Page.

Configuration | Settings that do not affect completeness | Settings leading to incompleteness |
---|---|---|

CM-n1 | antilemmas, sideways lemmas | reduction followed by cut |

CM-n2 | antilemmas, sideways lemmas | sideways lemmas followed by cut |

CM-n3 | no tautology/subsumption disequalities | no reduction |

CM-n4 | antilemmas, sideways lemmas | reduction followed by cut, sideways lemmas followed by cut, timeout 2s at each depth |

CM-n5 | antilemmas, tableau height as depth measure | timeout 2s at each depth |

CM-n6 | antilemmas, tableau height as depth measure | |

CM-n7 | antilemmas | timeout 2s at each depth |

CM-n8 | number of subgoals as depth measure |

Configuration | Settings that do not affect completeness | Settings leading to incompleteness |
---|---|---|

CM-e1 | antilemmas | |

CM-e2 | sideways lemmas | reduction followed by cut |

CM-e3 | no tautology/subsumption disequalities | timeout 5s at each depth |

CM-e4 | tableau height as depth measure, sideways lemmas |

All reported experiments were run with proof construction switched
on, effecting that during proving a term representing a clausal tableaux
is constructed. After the problem has been solved, the proof is then
printed. This succeeded on all considered problems, except for four
problems from FNE (SYO525+1.015, SYO525+1.018, SYN986+1.004,
SYO525+1.021), where the `write_canonical`

and
`writeq`

predicates of SWI Prolog produced a
segmentation violation, seemingly due to the size of the proof
terms.

For this reason, recent versions of PIE includes the possibility to print proof terms with a Prolog-implemented predicate, which succeeded on the three SYO problems, where the file size with the printed proof term is up to 211MB, requiring 44s for printing (SYO525+1.021). For SYN986+1.004, the printing had been manually stopped after the file size reached 5GB.

So far, preprocessing operations are not considered in the proof representation of PIE and CM.

Contact: info@christophwernhard.com