[Agda] Allow certain whitespace characters in identifiers

David Raymond Christiansen david at davidchristiansen.dk
Fri Apr 1 14:38:25 CEST 2016


Dear Agda community,

We are writing to announce the availability of a new extension to Agda: 
full support for reprogrammable Unicode whitespace. One of the best 
features of Agda is that we are able to flexibly adapt its syntax to the 
domains that we work in - for instance, instead of:

     data Term : Context -> Type -> Set where
       ...

one can write:

     data _⊢_ : Γ → τ → Set where
       ...

This makes Agda code far more accessible!

Unfortunately, it was previously impossible to change the meaning of 
whitespace characters, or to use whitespace in the names of 
constructors. In our extension to Agda, we enable the following Unicode 
characters to be used like any other identifier or operator character:

* The non-breaking space character ' '
* The ideographic space character ' '
* The medium mathematical space character ' '
* The zero width non-breaking space character ''
* The zero width joiner character '‍'

Also, we relax a limitation in Agda's lexical syntax that disallowed the 
zero-width space character from being used as whitespace, which is 
essential when using custom notation (e.g. in our embedding of Edwin 
Brady's whitespace language, we can write " ​​ ​​ ​​ ​​ ​​ ​​ ​​ 
  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​‍" in our Agda buffer for 
a part of the Hello World program).

As evidence for the utility of this extension, we present two 
demonstrations: a representation of untyped lambda terms using de Bruijn 
indices, and an embedding of the syntax of Whitespace.

We represent our lambda terms as follows:

     module LambdaTerms where
       data Nat : Set where
         z : Nat
         s : Nat -> Nat

       infixl 99999 _ _
       infix 99 

       data Tm : Set where
         _ _ : Tm -> Tm -> Tm
         ƛ‍_  : Tm -> Tm
          : Nat -> Tm

       self apply : Tm
       self apply = ƛ‍​​z​ ​​z​

The constructor for application is the infix use of the medium 
mathematical space, to reflect the mathematical nature of function 
application. Lambda terms are, as one would expect from the name, joined 
to their bodies via a zero-width joiner character, and their scopes 
space out with a non-breaking space. Because the de Bruijn indices 
should naturally appear to a part of the term, we inject them with a 
constructor whose name is the zero-width non-breaking space. Agda's need 
for spacing between operators can be satisfied with '​', preserving the 
readability of the embedding.

The whitespace language is embedded as a list of commands:

     module   where
       data WS : Set where
               : WS

       infixr 100 __

       data Program : Set where
         ‍ : Program
         __ : WS -> Program -> Program


       helloworld : Program
       helloworld =  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​  
​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ 
  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​ 
​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​  
​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​  
​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​  
​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​ 
​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​ 
​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​  
​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ 
  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​ 
​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​‍

We have had to make some compromises for practicality - overridable 
newlines would be absolutely unreadable, after all. So we have chosen to 
use the non-breaking space, the ideographic space, and the mathematical 
medium space to represent space, tab, and newline, respectively. The 
commands are built in a list-like structure, using zero-width 
non-breaking spaces as cons and the zero-width joiner as nil.

Recall that, in Emacs, one can enter arbitrary Unicode using C-x 8 RET 
CHARACTER NAME RET, where the CHARACTER NAME is the official all-caps 
Unicode designator for the character in question. Future work includes 
adapting Agda-mode to make it even easier to work with these encodings.


Happily, this change did not even require major changes to Agda. We 
attach the patch to this email in hopes that others will find this 
extension as useful as we have when adopting syntax from the domains 
that we work in.

Best,

David Christiansen & Cameron Swords
-------------- next part --------------
module Spaces where


module LambdaTerms where
  data Nat : Set where
    z : Nat
    s : Nat -> Nat

  infixl 99999 _ _
  infix 99 

  data Tm : Set where
    _ _ : Tm -> Tm -> Tm
    ƛ‍_  : Tm -> Tm
     : Nat -> Tm

  self apply : Tm
  self apply = ƛ‍​​z​ ​​z​ 


module   where
  data WS : Set where
          : WS

  infixr 100 __

  data Program : Set where
    ‍ : Program
    __ : WS -> Program -> Program



  test : Program
  test =  ​​ ​​ ​​ ​​ ​​‍

  helloworld : Program
  helloworld =  ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​ ​​‍
-------------- next part --------------
A non-text attachment was scrubbed...
Name: relax-whitespace-restrictions.patch
Type: text/x-patch
Size: 1930 bytes
Desc: not available
Url : http://lists.chalmers.se/pipermail/agda/attachments/20160401/0868538e/relax-whitespace-restrictions.bin


More information about the Agda mailing list