CD Tools main page

# FragSeq Data: Formula Instances in Proof Fragments

Draft version, Sep 4, 2022

## Outline

The provided data are extracted from CD proofs as basis for experiments with other systems and techniques, for example, machine learning or computing tree edit distance measures. A particular working thesis to be investigated with such techniques is formulated at the end of this outline, after the necessary concepts have been introduced.

We call the format of the provided data FragSeq, suggesting sequences of proof fragments.

The considered data are composed of pieces that are extracted individually from proofs. Hence we outline their content in the following from the perspective of a single given proof, that is, a D-term that proves a given goal from given axioms. As a particular example we consider a proof of problem LCL044-1. Its axioms are, in Prolog notation:

 1 i(A,i(n(A),B)) 2 i(A,i(B,A)) 3 i(i(A,i(B,C)),i(B,i(A,C))) 4 i(i(A,B),i(i(C,A),i(C,B))) 5 i(i(A,B),i(i(n(A),B),B))

Its goal is:

 i(a,n(n(a)))

### Open Nodes, Sequences of Proof Fragments (Frags) and Frontiers

We generalize D-terms to permit, in addition to leaf nodes representing an axiom, also leaf nodes considered as open, which we label by 0.

We ascribe to the proof a sequence of construction steps that lead from a one-node tree consisting of a single open leaf to the proof, where in each step an open leaf is replaced either by an axiom leaf or by an inner node with two open children. We call the D-terms appearing in this sequence, where all but the last have at least one open leaf, proof fragments, or briefly frags.

We assume a specific sequence of construction steps, characterized by the replaced node in each step the leaf that appears first in depth-first left-to-right order. (There is actually a vast number of ways to ascribe a sequence of construction steps to a proof tree. For an example tree of size 7 more than 2 million such sequences are possible, such that we defer the consideration of sequences characterized in alternate ways to future work).

The following image shows the D-terms of the proof sequence for the considered proof of LCL044-1, which appears as last member, step 8, in the sequence. The IPT (in-place theorem) of a node of a D-term with respect to a goal and axioms is its conclusion formula under the most general substitution that takes account of the goal instantiation at the root of the D-term, of copies of axioms at leaves that are not open, and of the relationships induced by detachment.

The frontier of a proof fragment is the sequence of IPTs at its open leafs (in order as these appear depth-first left-to-right) with respect to the overall proof goal and the given axioms.

The following table shows for each proof fragment in the sequence the operation that was applied to close the first open leaf of its predecessor and the resulting frontier. The operations are as follows: d for replacement by an inner node with two open successors; axiom(N) for replacement by a leaf representing axiom N; or the special value init for the first member of the sequence.

 1 init [i(a,n(n(a)))] 2 d [i(A,i(a,n(n(a)))), A] 3 d [i(B,i(A,i(a,n(n(a))))), B, A] 4 axiom(5) [i(C,i(a,n(n(a)))), i(n(C),i(a,n(n(a))))] 5 d [i(D,i(C,i(a,n(n(a))))), D, i(n(C),i(a,n(n(a))))] 6 axiom(3) [i(a,i(C,n(n(a)))), i(n(C),i(a,n(n(a))))] 7 axiom(1) [i(n(n(a)),i(a,n(n(a))))] 8 axiom(2) []

The construction steps can also be viewed as operations that are directly performed on frontiers. For example, viewed in this way, the frontier of step 3 in the above sequence is obtained from that of 2 by replacing the first member with i(B,Y) and B, where B is a fresh variable and Y is bound to the replaced first member. The frontier of step 4 is obtained from that of 3 by applying the MGU of its first member with a fresh copy of axiom 5 and then removing this first member.

### Variable Scopes

The scope of variables occurring in the formulas associated with a D-term cover the whole D-term (for tableaux, such variables are known as rigid). Hence, a variable may occur in different members of a frontier, expressing constraints between these.

In addition, although different frontiers represent different stages of the proof construction, we use here global variable names to display them, which facilitates to track the changes from one stage to the next.

### Negative Examples: Nonfrags

Our data sets do not just represent the sequence of frontiers that leads to the proof, but also negative examples, data for steps that do not lead to the proof. We represent these as a special kind of partial proofs, which we call nonfrags, and also provide their frontiers. A nonfrag is obtained from a frag, an actual partial proof in the sequence of frags that leads to the full proof, by a single step with an operation other than that used to obtain the successor of the frag in the sequence. An additional prerequisite for the nonfrags is that they have a defined most general theorem with respect to the overall goal, and thus have a defined frontier. As an example, consider step 3 of our example. Its successor, step 4 is obtained from 3 by the axiom(5) operation. Hence, the "other" possible steps would be d, axiom(1), axiom(2), axiom(3), axiom(4). For each of these other steps, the data set contains a nonfrag with the frag of 3 as parent. Their respective frontiers are:

 [i(C,i(B,i(A,i(a,n(n(a)))))), C, B, A] [B, n(B)] [i(a,n(n(a))), A] [i(a,i(A,n(n(a)))), A] [i(C,n(n(a))), i(a,C)]

### Working Thesis

A particular working thesis to be addressed with the data is the conjecture that there are general characteristics of and relationships between frontiers and also the axioms that help to distinguish between the step that leads from a proof fragment to its proper successor and the misdirected steps leading to a "nonfrag". If such a distinction is possible, it may be applied to guide a theorem prover.

## Documentation, Tools and Data

• fragseq_doc.txt. Documentation of the data format FragSeq.
• outputs_of_show_frontiers_LCL044-1.txt, outputs_of_show_frontiers_LCL082-1.txt. Outputs of the show_frontiers program for example proofs, which indicate frontiers of frags and nonfrags in different human-readable forms.
• show_frontiers.pl. A small program to extract and print out information about the frontiers in human-readable form from the data.
• fragseq_triples_ccs_sgcd.pl.gz. FragSeq data for proofs of 176 CD problems in the TPTP, obtained by CCS and SGCD. (gzipped to 6 MB; expands into 71 MB; 797,646 facts, 176 proofs; 34,894 frags; 68,949 nonfrags). Created from the proof collection results_ccs_sgcd.pl.
• fragseq_triples_ccs_sgcd_LCL082-1.pl. Brief excerpt from fragseq_triples_ccs_sgcd.pl with just the data for problem LCL082-1, e.g., for debugging.
• mk_fragdata.pl Module to generate FragSeq data from proofs represented with the result/5 predicate (specified in exp/cdex_testing).
• Proof collections, represented with the result/5 predicate (specified in exp/cdex_testing).
• results_ccs_sgcd.pl. 176 CD proofs obtained with CCS and SGCD. After n-simplification, which took effect for 2 of the proofs. At most one proof per problem. Chosen among various results of experiments by the following criteria: preference if solution by CCS (in order found by CCS configured for minimal csize), smaller csize, smaller tsize, smaller height. The problems are from TPTP 8.0.0.
• results_prover9.pl. 168 CD proofs obtained with Prover9. After n-simplification, which took effect for 72 of the proofs. At most one proof per problem. The problems are from TPTP 8.0.0.

CD Tools main page