[Draft, Aug 30, 2022]

==============================================================================
==============================================================================
FragSeq Data - Documentation
==============================================================================
==============================================================================

==============================================================================
Overview
==============================================================================

The considered data are intended for use with external tools. Their
representation is in Prolog-readable form to make it easy to write small
Prolog programs that produce extracts and convert these to the specific forms
required by these tools. An example for such a small program is provided
(show_frontiers.pl), which shows a conversion to a human-readable form.

All data are represented by subject-property-object triples with the spo/3
predicate.

Entities are represented by atoms. The 'type' property indicates their type.
The following entity types are used: 'proof', 'fragseq', 'frag', 'nonfrag'.

The atoms representing entities are build with reqular schemes that start with
<Type>_<ProblemId>_<LocalProofID>_ such that grep can be easily used to
extract data pertaining to a specific problem or proof from a file with one
fact per line.

Literal values can be Prolog terms, including atoms, numbers, compund terms,
and lists of terms.

The terminology (e.g., "D-term", "MGT", "IPT", "tree size", "compacted size",
"height") follows the paper Christoph Wernhard and Wolfgang Bibel: "Learning
from Łukasiewicz and Meredith: Investigations into Proof Structures". In
"Automated Deduction: CADE 2021", volume 12699 of LNCS (LNAI), pages 58-75.
Springer, 2021. (Preprint: https://arxiv.org/abs/2104.13645) and the
implementation framework CD Tools (e.g., "IF", "CF" form)
(http://cs.christophwernhard.com/cdtools/downloads/cdtools/).

==============================================================================
Type proof
==============================================================================

Represents a proof (D-term).

------------------------------------------------------------------------------
Properties
------------------------------------------------------------------------------

Properties describing the proof:

  problem : Atom

      The proven problem as TPTP identifier, e.g., 'LCL082-1'.

  local_proof_id : Atom

      Together with the value of 'problem' uniquely identifies the proof. May
      be, for example, an identifier of a series of experiments with a prover
      in a specific configuration.

  dterm : D-term

      The D-term.

  csize, tsize, height : NatNum

      Dimensions (compacted size, tree size, height) of the D-term.

  goal : IF-term

      Goal as term in IF form, e.g., i(a,i(b,a)).

  axiom_1, axiom_2, ... : IF-term

      Axioms as terms in IF form with variables, e.g., i(X,i(Y,X)). The number
      postfix n identifies the respective n-th axiom in the D-term.

  number_of_axioms : NatNum >= 1

      Number of axioms.

==============================================================================
Type fragseq
==============================================================================

Represents a sequence of frags that specify a proof construction in a
step-by-step way by closing an open leaf in each step, either by setting it to
a specific axiom or to an inner node with two new open children.

So far, only a single specific construction order is, called 'dflr', is
supported: in each step the first open node (in depth-first left-to-right
order) is closed.

Note that there is a vast number of possible such construction orders. For
example, the proof d(d(d(1,1),d(d(d(1,d(1,1)),1),1)),1) of 'LCL082-1' with
tree size 7 has 2,365,000 different such orders.

------------------------------------------------------------------------------
Properties
------------------------------------------------------------------------------

  proof : proof

      The proof.

  length : NatNum >= 1

      The length of the fragseq. This is (t+1)*2, where t is the tree size
      (number of inner nodes) of the proof.

  order: Atom

      An identifier of the construction order. The order described above is
      specified with 'dflr'.

==============================================================================
Type frag
==============================================================================

Represents a fragment of a proof, i.e., a partially constructed proof, as a
member of a fragseq.

------------------------------------------------------------------------------
Properties
------------------------------------------------------------------------------

  fragseq : fragseq

      The fragseq which contains the frag.

  position : NatNum >= 1
    
      The position of the frag in its containing fragseq.

  parent : frag

      The preceding frag in the fragseq. Not provided for the first frag in
      the fragseq.

  modified_node : NatNum >= 1

      A number n that specifies the n-th open node (in depth-first
      left-to-right order) of the parent that is modified to obtain the
      present frag. Not provided for the first frag in the fragseq.
      
  modification : Term

      A term indicating the modifying operation applied to the parent's open
      node to obtained the present frag. Possible values: 'init' for the first
      frag in the fragseq, 'd' for setting the open node to an inner node,
      'axiom'(N), with N an axiom number, for setting it to a leaf with the
      specified axiom.

  frontier : List of IF-term

      A list of terms in IF form representing the IPTs at the open leaves, in
      depth-first left-to-right order.

      Note that a variable may appear in different list members, reflecting
      that the scope of the variables covers the full tree (variables are
      "rigid").

  frontier_varnums : List of NatNum >= 0

      A list of numbers whose length is the number of variables in the value
      of frontier, ordered to match the output of the ISO predicate
      term_variables/2. Can be used to give the variables in frontiers global
      names whose scope spans all frontiers in a sequence. The following code
      fragment shows how variables are bound accordingly to terms '$VAR'(N)
      which can be handled specially by Prolog's predicates to write terms.

      set_varnums(Frontier, Varnums) :-
 	    term_variables(Frontier, Vs),
	    set_varnums_1(Varnums, Vs).

      set_varnums_1([N|Ns], ['$VAR'(N)|Xs]) :-
  	    !,
	    set_varnums_1(Ns, Xs).
      set_varnums_1([], []) :-
   	    !.

  frontier_length : NatNum >= 0

      The length of 'frontier', i.e., the number of open leaves in the D-term.

==============================================================================
Type nonfrag
==============================================================================

Intended as counterexamples, in particular, for comparison with the respective
actual successors in the fragseqs.

In general, the first open leaf of a given proof fragment can be closed in
different ways: replaced by an inner node with two open successors or replaced
by a leaf representing a particular axiom. Only one of these possibilities is
actually applied in the proof and results in the successor proof fragment that
appears in the proof construction sequence. We call the fragments that are
obtained by one of the other possibilities nonfrags and provide their
frontiers as negative examples, associated with the parent proof fragment from
which they can be derived in one step. A prerequisite for the nonfrags is that
they have a most general theorem with respect to the overall goal, and thus a
defined frontier.
	
All possible nonfrags are considered, i.e., there is a nonfrag for each
possibility to close the open node of its parent in any other than the
parent's successor frag, provided the obtained d-term has a defined MGT (and
hence a defined frontier) for the overall goal.

------------------------------------------------------------------------------
Properties
------------------------------------------------------------------------------

  frontier, frontier_length, modified_node, modification

      These properties are used analogously as for type 'frag'.

  parent : frag

      The frag in the fragseq from which the nonfrag was derived as described
      above.

  frontier_varnums

      As for type frag, except that variables in the value of frontier that
      are not present in the parent frag or its preceding frags may be
      associated with arbitrary numbers not used in these frags, and thus
      represent locally scoped variables. Such numbers may, however, be also
      associated with global variables in the value of frontier_varnums of
      frags that appear after the parent in the sequence of frags.