<div dir="ltr">Oh, and the reason your Haskell program doesn't have the same problem is that you made a mistake in the implementation sum': the recursive call is to the library sum instead of sum'.<div><br></div><div>/ Ulf</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Sep 24, 2016 at 4:36 PM, Ulf Norell <span dir="ltr"><<a href="mailto:ulf.norell@gmail.com" target="_blank">ulf.norell@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5">On Sat, Sep 24, 2016 at 12:52 PM, Sergei Meshveliani <span dir="ltr"><<a href="mailto:mechvel@botik.ru" target="_blank">mechvel@botik.ru</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">People,<br>
<br>
Please, find attached a certain test for performance of arithmetic in<br>
Agda, as compared to ghc-7.10.2.<br>
<br>
Shortly saying, it is<br>
sum [x*x - y*y | x, y <- [n .. 0]],<br>
n = 1000, ..., 6000, ...<br>
<br>
The Hasell program is also there.<br>
<br>
{-<br>
------------------------------<wbr>------------------------------<wbr>--------<br>
Timing for Agda of September 16, 2016, ghc-7.10.2, 3 GHz computer<br>
<br>
n | s (control) | time [sec]<br>
| | Agda ghc-7.10.2 -O<br>
------------------------------<wbr>--------------------<br>
1000 | 166499833500 | 0.3 | 0.04<br>
2000 | 2665332667000 | 1.3 | 0.2<br>
3000 | 13495498500500 | 3.0 | 0.4<br>
4000 | 42655997334000 | 5.1 | 0.8<br>
5000 | 104145829167500 | 8.0 | 1.6<br>
6000 | 215963994001000 | 11.9 | 2.5<br>
-}<br>
<br>
The cost order of n^2 is as needed.<br>
Bug GHC is somewhat 5 times faster.<br>
I think, this factor of 5 is not so important.<br>
But still, what may be its source?<br></blockquote><div><br></div></div></div><div>You get a clue that something is wrong if you run with +RTS -s:</div><div><br></div><div>$ ./NatArithPerformance +RTS -s</div><div><div>n = 5000</div><div>s = 104145829167500</div><div> 8,616,354,344 bytes allocated in the heap</div><div> 6,531,203,072 bytes copied during GC</div><div> 1,593,171,592 bytes maximum residency (13 sample(s))</div><div> 4,362,336 bytes maximum slop</div><div> 2772 MB total memory in use (0 MB lost due to fragmentation)</div><div><br></div><div> Tot time (elapsed) Avg pause Max pause</div><div> Gen 0 15734 colls, 0 par 3.300s 3.427s 0.0002s 0.0016s</div><div> Gen 1 13 colls, 0 par 2.329s 3.367s 0.2590s 1.5709s</div><div><br></div><div> INIT time 0.000s ( 0.003s elapsed)</div><div> MUT time 3.664s ( 3.460s elapsed)</div><div> GC time 5.629s ( 6.794s elapsed)</div><div> EXIT time 0.029s ( 0.275s elapsed)</div><div> Total time 9.324s ( 10.531s elapsed)</div><div><br></div><div> %GC time 60.4% (64.5% elapsed)</div><div><br></div><div> Alloc rate 2,351,612,899 bytes per MUT second</div><div><br></div><div> Productivity 39.6% of total user, 35.1% of total elapsed</div><div><br></div></div><div>The key number here is total memory in use: 2772MB. You'll find that the Haskell program</div><div>runs in constant space. In this case your problem is sum:</div><div><br></div><div><div>sum : List ℕ → ℕ</div><div>sum [] = 0</div><div>sum (n ∷ ns) = n + (sum ns)</div></div><div><br></div><div>For long lists this will build up a huge thunk of +s in memory before it can start evaluating. If you change it to</div></div></div><br>open import Agda.Builtin.Strict<br><br>foldl' : ∀ {a b} {A : Set a} {B : Set b} → (B → A → B) → B → List A → B<br>foldl' f z [] = z<br>foldl' f z (x ∷ xs) = primForce (f z x) λ z' → foldl' f z' xs<br><br>sum : List ℕ → ℕ<br><div class="gmail_extra"><div class="gmail_quote"><div><div>sum = foldl' _+_ 0</div></div></div></div><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>we get instead</div><div><br></div><div><div>n = 5000</div><div>s = 104145829167500</div><div> 8,201,151,016 bytes allocated in the heap</div><div> 3,231,768 bytes copied during GC</div><div> 195,104 bytes maximum residency (2 sample(s))</div><div> 48,648 bytes maximum slop</div><div> 2 MB total memory in use (0 MB lost due to fragmentation)</div><div><br></div><div> Tot time (elapsed) Avg pause Max pause</div><div> Gen 0 15848 colls, 0 par 0.039s 0.049s 0.0000s 0.0009s</div><div> Gen 1 2 colls, 0 par 0.001s 0.010s 0.0049s 0.0096s</div><div><br></div><div> INIT time 0.000s ( 0.002s elapsed)</div><div> MUT time 3.335s ( 3.393s elapsed)</div><div> GC time 0.039s ( 0.059s elapsed)</div><div> EXIT time 0.000s ( 0.000s elapsed)</div><div> Total time 3.376s ( 3.454s elapsed)</div><div><br></div><div> %GC time 1.2% (1.7% elapsed)</div><div><br></div><div> Alloc rate 2,459,180,636 bytes per MUT second</div><div><br></div><div> Productivity 98.8% of total user, 96.6% of total elapsed</div></div><div><br></div><div>On my machine that's within 25% of the corresponding Haskell program (which runs in 2.7s).</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>/ Ulf</div><div><br></div></font></span></div></div></div>
</blockquote></div><br></div>