[Agda] notes on type check performance
Sergei Meshveliani
mechvel at botik.ru
Fri May 12 20:43:29 CEST 2017
Dear list,
These are several notes concerning the type check performance.
I. Sharing representation
=========================
Probably the current problem is implementing a real sharing
representation of a term.
For example, to make the type checker work with
let T = (A → A) → (A → A) → (A → A)
in
T × T
instead of its expanded form.
Agda has the --sharing option, but it helps in rare cases, at least,
no in my DoCon-A project.
II. One of the principal exponent cost source
=============================================
The following is about the source of the type check cost which has a
fundamental origin and is common to Agda and Coq.
- On 2017-01-13 Robbert Krebbers wrote: --------
This may be related: having large chains of coercions is also
problematic in Coq, see [1]: There is also a severe efficiency issue:
the complexity of Coq's term comparison algorithm is exponential in the
length of the coercion chain. While this is clearly a problem specific
to the current Coq implementation, it is hard and unlikely to be
resolved soon, so it seems prudent to seek a design that does not run
into it.
The paper [1] describes another way of organizing algebraic hierarchies
in Coq so that these long chains of coercions can be avoided. It quite
crucially relies on Coq's canonical structures mechanism, so I have no
clue whether you can practically do something similar in Agda.
[1] François Garillot, Georges Gonthier, Assia Mahboubi, Laurence
Rideau. Packaging Mathematical Structures.
TPHOLs'09. https://hal.inria.fr/inria-00368403v2
----------------------------------------------------------------------
The paper [1] writes in introduction:
"
While this unbundling allows for maximal flexibility, it also induces a
proliferation of arguments that is rapidly overwhelming. A typical
algebraic structure, such as a ring, involves half a dozen constants and
even more axioms. Moreover such structures are often nested, e.g., for
the Cayley-Hamilton theorem one needs to consider the ring of
polynomials over the ring of matrices over a general commutative ring.
The size of the terms involved grows as C^n , where C is the number of
separate components of a structure, and n is the structure nesting
depth. For Cayley-Hamilton we would have C = 15 and n = 3, and thus
terms large enough to make theorem proving impractical, given that
algorithms in user-level tactics are more often than not nonlinear.
"
I am trying to understand this example, and thus to understand the
nature of this particular source of the exponent cost grow effect.
It is about the construction of
Pol (SquareMatrix (R : CommutativeRing)),
and let us imagine that this is coded in Agda.
Here CommutativeRing is a record defining a notion of an abstract
commutative ring - something similar to CommutativeRing of Agda Standard
library.
a) CommutativeRing has the operations
0#, 1#, _+_, _-_, _*_, zeroLaw, unityLaw, +oppositeLaw, +-assoc,
+-comm, *-assoc, *-comm, distributivityLawFor+*,
may be something else. All right, let us assume that there are about
15 operations ("constants and axioms", as the paper writes).
b) SquareMatrix over any (R : CommutativeRing)
is a data structure to be defined in application.
This is something like
let C = Carrier R in List (Vec C n),
with the condition of length of the vector list equal to n.
If R : CommutativeRing, then indeed, for SquareMatrix R
there can be implemented the operations
#0, #1, _+_, _-_, _*_, zeroLaw, unityLaw, +oppositeLaw,
+-assoc, +-comm; *-assoc, distributivityLawFor+*.
Here _*_ is multiplication of two n × n matrices, and the
implementation actually proves that these matrix operations satisfy the
axioms of Ring.
This makes it the instance squareMatrixRing : Ring.
c) For any (R : Ring) an univariate polynomial domain Pol R
can be defined as a data structure.
There can be defined the operations 0#, 1#, _+_, _*_ for Pol R.
If R is commutative by _*_, then Pol R can be proved a certain
CommutativeRing instance.
If R is not commutative, then, I expect, probably, Pol R can be proved
a certain Ring instance.
d) So, we have that the construct of
Pol (SquareMatrix (R : CommutativeRing))
is expressed in the Agda program as
∀ R : CommutativeRing, -- abstract
∀ n : ℕ, n > 0,
mR = (squareMatrixRing R n) : Ring,
pmR = (polynomialRing mR) : Ring.
About 15 operation instances are implemented in mR,
Implementing _+_ for mR needs to use only 0# and _+_ of R.
Implementing _*_ for mR needs to use almost all operations of R,
this makes it a certain term depending of 15 argument operations.
Further, about 15 operation instances are implemented in pmR,
Implementing _*_ for pmR uses almost all operations of mR.
Assume that each operation instance implementation op in mR is
represented merely as a _tree_ where each node expresses an operation.
This node has several outbound edges, each edge points to some operation
op' in R which is used in the implementation of op.
The paper authors write:
"The size of the terms involved grows as C^n".
I understand this so that the size of the trees involved in the
described above Agda program for
pmR = (polynomialRing mR) : Ring
is bounded by 15 ^ 3
- provided that the above trees are stored in memory in expanded
unfolded form.
In various unlucky examples this upper bound of 15^3 is reached.
Assume roughly that each operation in the mR instance is a term in
which occur 15 operations from R
(like it is for the _*_ instance).
So the tree size for mR (the type expression size) is 15^2.
Assume roughly that each operation in the pmR instance is a term in
which occur 15 operations from mR (like it is for the _*_ instance).
Then, we have that that the tree for pmR has 15^3 leaves.
And note that in this term the terms for mR._+_ and mR._*_ are copied
many times. And the implementation term for rm._*_ is large.
I guess that this is what the authors wrote about.
II.2. What DoCon-A has with this respect
========================================
* DoCon-0.04.1 has the generic tower from Semigroup and Monoid up to
CommutativeRing, IntegralRing, FactorizationMonoid, FactorizationRing,
UniqueFactorizationRing, Field.
* FactorizationRing relies on the structure of Multiset and on the
instance of CommutativeMonoid for (Multiset D) for D : DecSetoid and
for the sum operation on multisets.
Various theorems rely on the instance of CommutativeMonoid Nonzero
for multipication on nonzero elements in any R : IntegralRing.
* There are implemented all the above instances for Integer, except
Field.
And the (FactorizationRing Integer) instance is implemented via
(FactorizationNonoid (Nat \ 0)).
This concerns _only an ordinary textbook on basic algebra_, a book of
about, say, 200 pages.
For this particular textbook, the depth of term for the instances
involved is bounded.
The depth in the above example from [1] is 3.
(1) A similar depth bound in DoCon-A-0.04.1 is somewhat about 5.
(2) Now, DoCon-A-0.04.1 needs 8 Gb memory and 64 minutes to type-check
on a 3 GHz machine.
(3) On the other hand, the corresponding definitions and rigorous
constructive proofs take only about 200 pages in a textbook, and a human
checks these proofs by reading this book. A human will check it,
probably, in 20 days. But the type checker processes the text millions
times faster.
We see that the checker of (3) takes a small resource on this term of
size C^5, while that checker of (2) takes a large resource.
May be, this inefficiency can be improved by introducing the let-in
representation of terms.
The paper [1] describes a certain subtle approach taken in Coq which
handles the problem. So far, I do not understand it
(I need to look further into the paper).
And I wonder, why did not they simply apply the let-in representation of
terms.
II.3. Comparing to Glasgow Haskell
==================================
DoCon (written in Haskell) has been tested on computing in the domains
like
Matrix (Pol [x, y] ((Pol t (Fraction Integer)) / (f))),
which have a greater term size.
And there are examples in practice that have sense and that have even
more levels in the domain construct. And we need to assume that the
depth in this tower of constructs is not bounded - similarly as the data
item in Agda may have arbitrary many constructor occurrences.
Glasgow Haskell compiles fast the above constructs of a great depth
(with nested classes and instances).
Though, without dependent type, the problem is, probably, easier.
Regards,
------
Sergei
More information about the Agda
mailing list