[Agda] Nat arithmetic performance

Sergei Meshveliani mechvel at botik.ru
Sat Sep 24 19:13:15 CEST 2016


On Sat, 2016-09-24 at 16:40 +0200, Ulf Norell wrote:
> Oh, and the reason your Haskell program doesn't have the same problem
> is that you made a mistake in the implementation sum': the recursive
> call is to the library sum instead of sum'.
> 

I see. Thank you. 
After fixing sum', the performance ratio is only about  3/2.

As to GHC, its performance on  foldr, foldl, sum, sum' below, and such,
depends on whether the compliler detects the possibility to apply strict
evaluation for this particular function ("strictness analysis").
Its solution is not stable, a small change in a program may cause an
expensive laziness (the -O key helps with this respect in some cases).
And in our example (as in many others), expensive laziness still
preserves the asymptotic cost order.
(I never used strictness annotation, and still DoCon (in Haskell) is
fast enough).    
On the other hand, Agda + MAlonzo compiles into GHC. So I hoped that the
function  sum  in my Agda program will be compiled into this  sum' in
Haskell. So that the performance is expected to be the same.
And now I see that it is almost the same.

------
Sergei


> 
> On Sat, Sep 24, 2016 at 4:36 PM, Ulf Norell <ulf.norell at gmail.com>
> wrote:
>         On Sat, Sep 24, 2016 at 12:52 PM, Sergei Meshveliani
>         <mechvel at botik.ru> wrote:
>                 People,
>                 
>                 Please, find attached a certain test for performance
>                 of arithmetic in
>                 Agda, as compared to ghc-7.10.2.
>                 
>                 Shortly saying, it is
>                                    sum [x*x - y*y | x, y <- [n .. 0]],
>                                    n = 1000, ..., 6000, ...
>                 
>                 The Hasell program is also there.
>                 
>                 {-
>                 --------------------------------------------------------------------
>                 Timing for  Agda of September 16, 2016,  ghc-7.10.2,
>                 3 GHz computer
>                 
>                  n    |  s (control)    |    time [sec]
>                       |                 | Agda     ghc-7.10.2 -O
>                 --------------------------------------------------
>                  1000 |    166499833500 |  0.3  |  0.04
>                  2000 |   2665332667000 |  1.3  |  0.2
>                  3000 |  13495498500500 |  3.0  |  0.4
>                  4000 |  42655997334000 |  5.1  |  0.8
>                  5000 | 104145829167500 |  8.0  |  1.6
>                  6000 | 215963994001000 | 11.9  |  2.5
>                 -}
>                 
>                 The cost order of  n^2  is as needed.
>                 Bug GHC is somewhat 5 times faster.
>                 I think, this factor of 5 is not so important.
>                 But still, what may be its source?
>         
>         
>         You get a clue that something is wrong if you run with +RTS
>         -s:
>         
>         
>         $ ./NatArithPerformance +RTS -s
>         n = 5000
>         s =  104145829167500
>            8,616,354,344 bytes allocated in the heap
>            6,531,203,072 bytes copied during GC
>            1,593,171,592 bytes maximum residency (13 sample(s))
>                4,362,336 bytes maximum slop
>                     2772 MB total memory in use (0 MB lost due to
>         fragmentation)
>         
>         
>                                              Tot time (elapsed)  Avg
>         pause  Max pause
>           Gen  0     15734 colls,     0 par    3.300s   3.427s
>         0.0002s    0.0016s
>           Gen  1        13 colls,     0 par    2.329s   3.367s
>         0.2590s    1.5709s
>         
>         
>           INIT    time    0.000s  (  0.003s elapsed)
>           MUT     time    3.664s  (  3.460s elapsed)
>           GC      time    5.629s  (  6.794s elapsed)
>           EXIT    time    0.029s  (  0.275s elapsed)
>           Total   time    9.324s  ( 10.531s elapsed)
>         
>         
>           %GC     time      60.4%  (64.5% elapsed)
>         
>         
>           Alloc rate    2,351,612,899 bytes per MUT second
>         
>         
>           Productivity  39.6% of total user, 35.1% of total elapsed
>         
>         
>         The key number here is total memory in use: 2772MB. You'll
>         find that the Haskell program
>         runs in constant space. In this case your problem is sum:
>         
>         
>         sum : List ℕ → ℕ
>         sum []       =  0
>         sum (n ∷ ns) =  n + (sum ns)
>         
>         
>         For long lists this will build up a huge thunk of +s in memory
>         before it can start evaluating. If you change it to
>         
>         open import Agda.Builtin.Strict
>         
>         foldl' : ∀ {a b} {A : Set a} {B : Set b} → (B → A → B) → B →
>         List A → B
>         foldl' f z [] = z
>         foldl' f z (x ∷ xs) = primForce (f z x) λ z' → foldl' f z' xs
>         
>         sum : List ℕ → ℕ
>         sum = foldl' _+_ 0
>         
>         
>         we get instead
>         
>         
>         n = 5000
>         s =  104145829167500
>            8,201,151,016 bytes allocated in the heap
>                3,231,768 bytes copied during GC
>                  195,104 bytes maximum residency (2 sample(s))
>                   48,648 bytes maximum slop
>                        2 MB total memory in use (0 MB lost due to
>         fragmentation)
>         
>         
>                                              Tot time (elapsed)  Avg
>         pause  Max pause
>           Gen  0     15848 colls,     0 par    0.039s   0.049s
>         0.0000s    0.0009s
>           Gen  1         2 colls,     0 par    0.001s   0.010s
>         0.0049s    0.0096s
>         
>         
>           INIT    time    0.000s  (  0.002s elapsed)
>           MUT     time    3.335s  (  3.393s elapsed)
>           GC      time    0.039s  (  0.059s elapsed)
>           EXIT    time    0.000s  (  0.000s elapsed)
>           Total   time    3.376s  (  3.454s elapsed)
>         
>         
>           %GC     time       1.2%  (1.7% elapsed)
>         
>         
>           Alloc rate    2,459,180,636 bytes per MUT second
>         
>         
>           Productivity  98.8% of total user, 96.6% of total elapsed
>         
>         
>         On my machine that's within 25% of the corresponding Haskell
>         program (which runs in 2.7s).
>         
>         
>         / Ulf
>         
>         
> 
> 




More information about the Agda mailing list