[Agda] about serialization

Tue Apr 4 16:29:37 CEST 2017

On Tue, 2017-04-04 at 12:02 +0200, Ulf Norell wrote:
> On Sat, Apr 1, 2017 at 2:45 PM, Sergei Meshveliani <mechvel at botik.ru>
> wrote:
>         
>         But anyway I need to understand: what generally the type
>         checker does in
>         the examples of this kind.
>         Here is a simplified example.
>         
>         module OfSemigroup  is given an arbitrary semigroup H as a
>         parameter,
>         and it defines certain items using H.
>         
>         module OfRing  is given an arbitrary ring R as a parameter.
>         It defines  *-semigroup-2  -- a direct product of the
>         multiplcative
>         semigroup of R with itself.
>         
>         module M is given an arbitrary ring R as a parameter.
>         And its only implementation is
>             where
>             open OfRing R using (*-semigroup-2)
>             module OfMulSemigroup = OfSemigroup *-semigroup-2
>         
>         Suppose that M1 and M2 have been type-checked earlier.
>         When type-checking M,
>         * what is substituted to where?
>         * what is being normalized?
>         * what part can be expensive?
> 
> 
> Nothing is normalised and the only type checking that happens is
> to check that the given module arguments match the types expected
> by the module. The arguments are substituted into the types of the
> module definitions, but not the bodies. Nevertheless, this may create
> very big types that take a long time to serialise. What does the
> profile information tell you about where the time is spent? My money
> would be on serialisation.
> 

Thank you for explanation.

In the real example, it reports for  -M8G  and  3 GHz machine:

  Total             122,391ms 
  Miscellaneous      51,607ms                    
  UFtRing0           70,784ms      

  Serialization      49,883ms (61,239ms) 
  Deserialization    51,303ms,

where  M.agda = UFtRing0.agda,
the first 3 lines are of  profile.modules:10,
the last  2 lines are of  profile:7. 

I am trying to understand what is happening there so that takes a great
cost.
Let us, so far, consider only the cost of Serialization.

For the above M.agda, I have in reality quite a small source:

-------------------------------------------------------------------
module UFtRing0 where
open import Data.Product using (_,_)

-- of application-----   
open import Structures3 using (UpIntegralRing; UpIntegralRing-with∣?)
open import Structures4 using (module OverIntegralRing-0)
open import Structures6 using (module OverIntegralRing-2)
open import FtMonoid1   using (module OfCCMonoid-1)
open import FtRing      using (module OverIntegralRing-5)

module UniqueFtRing-0 {α α=}
             (upIR∣? : UpIntegralRing-with∣? α α=)
             (open OverIntegralRing-2 upIR∣? using (Prime∣split))
             (open OverIntegralRing-5 upIR∣? using (FactorizationRing))
             (ftRing      : FactorizationRing)
             (prime∣split : Prime∣split)
  where
  open OverIntegralRing-0 upIR∣? using (_∣'?_; *upCMon-NZ; *cancel-nz)

  module NZPack1 = OfCCMonoid-1 (*upCMon-NZ , _∣'?_) *cancel-nz
----------------------------------------------------------------------

The last line  "module NZPack1 = ..."
takes 93 % of the type check cost. This is visible when we comment out
this line.

It remains to see what happens when type checking this last line.

OfCCMonoid-1  is defined in previous .agda file:

  module OfCCMonoid-1 {α α=}
                  (upCMon∣? : UpCommutativeMonoid-with∣? α α=)
                  (let upCMon = proj₁ upCMon∣?)
                  (open UpCommutativeMonoid upCMon using (setoid; _∙_))
                  (cancel : LCancellation setoid _∙_)
     where ...

(1) So, the type checker checks that  *upCMon-NZ :  UpCommutativeMonoid.
This has been checked earlier, in the module  OverIntegralRing-0.agda,
so its Serialization cost here is zero
Right?

(2) The type checker processes all the signatures in the module  
    OfCCMonoid-1,  with substituting there the parameter values.

The main parameter value is  *upCMon-NZ,
(the parameters  _∣'?_ and *cancel-nz  give a smaller cost, also they
are considered similarly).

*upCMon-NZ  is imported from OverIntegralRing-0 as ready.
This (nested record structure) is the instance of a generic structure of
CommutativeMonoid for the case of the monoid of nonzero elements in the
given IR : IntegralRing,  for multiplication in IR.  It may be a large
term.

And  *upCMon-NZ  is substituted into all signatures in OfCCMonoid-1.
C = Carrier  is extracted from  upCMon∣?;  the latter is replaced
according to  *upCMon-NZ,  so that  C  is being replaced in  with
Nonzero  in  OfCCMonoid-1.

For example, in
  pairValue : C × ℕ → C
  pairValue = uncurry _^_

the signature is replaced with  
                          pairValue : Nonzero × ℕ → Nonzero,

and the second line is not touched.
Right?

The signature 
    pairToAsd : C × ℕ → Associated × ℕ

is replaced with 
    pairToAsd : Nonzero × ℕ → Associated-NZ × ℕ

Here a notion of  Associated  is imported by
      open OfAssociated upCMon∣? using (Associated; ...),

upCMon∣? is replaced there (in a previous .agda module) with *upCMon-NZ,
and  Associated  is replaced there with a certain  Associated-NZ,
and this has been done when type checking previous .agda modules.

In the definition

  record Factorization (a : C) : Set (α ⊔ α=)
    where
    constructor factorization′

    field  multiset  : Multiset
           allPrime  : All IsPrime $ msKeys multiset
           productEq : a ~ msValue multiset

Multiset  is replaced with certain Multiset-NZ,
IsPrime  with IsPrime-NZ,
Both these values are checked and ready in previous .agda modules.

OfCCMonoid-1  is so that (according to your explanation) only the above
trivial substitutions are done for the values prepared and checked in
previous .agda files.  

Only the size of terms being substituted may be large.
For example,  *upCMon-NZ  is a structure which contains a proof for that
nonzero elements  Nonzero  in  upIR : IntegralRing  form a commutative
monoid for multiplication, and also implementation for several
operations for Monomial.

These construct and proof are simple conceptually, but take 
about 150 nonempty lines of the source in the module devoted to
substructures (sub-semigroup, sub-monoid, etc).
I do not know: how the term for this  *upCMon-NZ instance is stored in
the memory.
In the .agdai file, it may take much size (agdai files are large).
But this refers to Deserialization, right?
And symbolic substitutions with this term refer to Serialization. 
Right?
I do not know how to find: 
which substitution is the most expensive here.

Intuitively, noone can (probably) be expensive for serialization.
Because a thing there is unfolded once in the memory, and then many
references are copied (?).
But I do not know how Agda performs these substitutions. 
May be,             
           profile.definitions:10 
will help? 
The declaration 
    module NZPack1 = OfCCMonoid-1 (*upCMon-NZ , _∣'?_) *cancel-nz

leads to substituting of the parts of *upCMon-NZ into the definitions in
the module  OfCCMonoid-1  which is type-checked in previous .agda files.
How to measure the cost of any of these substitutions? 

Thanks,

------
Sergei