[Agda] Does Agda2 have fast arrays?
Don Stewart
dons at galois.com
Tue Oct 5 00:10:58 CEST 2010
ajeffrey:
> On 10/04/2010 12:15 PM, Daniel Peebles wrote:
>> Things like stream fusion, that vector (and lots of high-performance
>> Haskell) relies on, rely on rewrite rules firing, which depend on the
>> syntactic structure of terms. So throwing unsafeCoerces in, in the wrong
>> places, will prevent the rules from firing, and would thus prevent
>> deforestation.
>
> Fair enough, Haskell's rules mechanism is just based on syntactic
> pattern-matching, so anything that introduces additional syntax is going
> to block the rewrites. For some applications (e.g. the kinds of
> high-performance computing way back in the original post) this will be a
> deal-killer.
>
> I've been trying to replicate this experimentally, just to see how bad
> the problem is, and what can be done about it, but I've hit a newbie
> roadblock.
>
> A simple example is the one in the Stream Fusion paper:
>
> module Main ( main ) where
>
> import Prelude ( Int, (<=), (*), putStr, show )
> import Data.Stream ( Stream, sum, concatMap, map, unfoldr,
> enumFromToInt )
>
> f :: Int -> Int
> f n = sum (concatMap (\ k -> map (\m -> k * m) (enumFromToInt 1 k))
> (enumFromToInt 1 n))
>
> main = putStr (show (f 10000))
>
Well, you usually don't write in Streams directly, they're used at a concrete type.
Vector is a good example of a fusible library.
Also, concatMap doesn't fuse. (It is the hardest thing to fuse (Stream (Stream ..)))
-- NOTE: We can't fuse concatMap anyway so don't pretend we do.
-- concatMap f = unstream . Stream.concatMap (stream . f) . stream
So, let's start with something simpler.
import qualified Data.Vector as V
f :: Int -> Int
f n = V.sum $ V.map (\m -> n * m) (V.enumFromN 1 n)
main = putStr (show (f 10000))
Fuses to:
main_$s$wfoldlM'_loop :: Int# -> Int# -> Int# -> Int#
main_$s$wfoldlM'_loop =
\ (sc_s1io :: Int#)
(sc1_s1ip :: Int#)
(sc2_s1iq :: Int#) ->
case ># sc_s1io 0 of _ {
False -> sc2_s1iq;
True ->
main_$s$wfoldlM'_loop
(-# sc_s1io 1)
(+# sc1_s1ip 1)
(+# sc2_s1iq (*# 10000 sc1_s1ip))
}
The following optimizations happen:
203 PreInlineUnconditionally
94 PostInlineUnconditionally
95 UnfoldingDone
5 RuleFired
3 SC:$wfoldlM'_loop0
2 stream/unstream [Vector]
13 LetFloatFromLet
1 EtaReduction
313 BetaReduction
3 CaseOfCase
65 KnownBranch
1 CaseIdentity
1 FillInCaseDefault
21 SimplifierDone
And, with the -fasm backend, it becomes:
Main_mainzuzdszdwfoldlMzqzuloop_info:
.Lc1j9:
testq %r14,%r14
jg .Lc1jc
movq %rdi,%rbx
jmp *(%rbp)
.Lc1jc:
movq %rsi,%rax
imulq $10000,%rax
movq %rdi,%rcx
addq %rax,%rcx
leaq 1(%rsi),%rax
decq %r14
movq %rax,%rsi
movq %rcx,%rdi
jmp Main_mainzuzdszdwfoldlMzqzuloop_info
With the -fvia-C backend, it becomes:
Main_mainzuzdszdwfoldlMzqzuloop_info:
testq %r14, %r14
jle .L5
.L2:
imulq $10000, %rsi, %rcx
leaq -1(%r14), %r14
leaq 1(%rsi), %rsi
leaq (%rcx,%rdi), %rdi
jmp Main_mainzuzdszdwfoldlMzqzuloop_info
And something similar under -fllvm.
Using ghc-core tool to inspect asm and core.
-- Don
More information about the Agda
mailing list