CM Prover – Evaluation on the TPTP
The CM Prover,
embedded in the PIE system for
preprocessing, has been tested on all suitable TPTP
problems. In
different tested configurations it could, on current hardware with a
timeout of 600s, solve 1627 problems of the 2143 problems without equality
and 2955 of the 11321 problems with equality.
Table Explanation
- TPTP problems are shown ordered according to their most recent
TPTP rating value and their name.
- For each problem the most recent TPTP difficulty rating value and
its attributes according to the TPTP Specialist Problem Class
(SPC) field is shown.
- Result values are shown in seconds for the problems that
could be proved by PIE/CM in the respective configuration.
Problem Selection
- Problems were selected on the basis of the Specialist Problem
Class (SPC) field taking into account that
- CM only accepts first-order formulas and clausal formulas, in
contrast to higher-order or typed formulas,
- CM does not in general detect satisfiability,
- CM is on TPTP problems usually invoked in a goal oriented
way, that is, with respect to a distinguished theorem.
Hence, problems that according to the SPC field values are not in a
logic accepted by CM, are known to be satisfiable, or are without
distinguished theorem have not been selected.
- More precisely, all problems of the TPTP have been selected whose
set of SPC values is a superset of one of the following sets: {CNF,
UNS}, {CNF, UNK}, {CNF, OPN}, {FOL, THM}, {FOL, UNK}, {FOL, OPN}.
Those of these problems whose set of SPC values includes NEQ have
been used for the tests without equality. In the TPTP 7.1.0 these are
2143 problems. The remaining problems have been used for the tests
with equality. In the TPTP 7.1.0 these are 11321 problems.
Prover Configurations
- In all configurations proof construction was switched on,
effecting that during proving a term that represents a closed
clausal tableau is constructed. It documents the proof of the core
proving phase (but not the preprocessing) in detail. Proof terms
of CM are, for example, with suitably restricted preprocessing
sufficiently detailed to extract Craig-Lyndon interpolants.
- The configurations n1-n8 and e1-e4 are as described in the evaluation on the CASC-25
problems.
- Configuration n9 is like n6, but not with tableau height as
depth measure. It is suggested as standard configuration of the
system.
Measured Time
- Inputs were taken directly from the TPTP without any offline
preprocessing.
- The shown time is the CPU time as determined by SWI Prolog. It
includes preprocessing but not the printing of the proof
term. (For some problems the printed proof terms take several
gigabytes.)
Hardware
- The computations were performed on the Bull Cluster Taurus
at the Center for Information Services and High Performance Computing (ZIH) at
TU Dresden.
Resource Limitations
- Timeout: 600s wall clock
- Memory: 10G
Software Versions
Log Files
- The log files can be grepped for
cm_result
to extract
Prolog-readable information about the results.
- The log files include representations of the proofs as
Prolog-readable terms. (In some cases where the printed representation
takes several gigabytes, the printout might be aborted due to the
timeout.)
- Note: For the problem LCL418-1 (which could not be solved) in some
cases two log files are in the archives. It entered the list of
selected problems twice as it was classified in the TPTP 7.1.0 as
UNK as well as OPN.
Contact: info@christophwernhard.com