# Terminating Unfolds (2)

After seeing our code, Nils Anders Danielsson suggested two improvements. Firstly, to wrap the bound in the seed. The terminating unfoldr↓ would thus have a simpler type as well as a simpler implemenation:

unfoldr↓ : {a : Set}(b : ℕ -> Set){n : ℕ} ->
(f : forall {i} -> b (suc i) -> ⊤ ⊎ (a × b i)) ->
b n -> [ a ]
unfoldr↓ b {zero} f y = []
unfoldr↓ b {suc i} f y with f y
... | inj₁ _ = []
... | inj₂ (x , y') = unfoldr↓ b {i} f y'

The definition passes the termination check, apparently, because unfoldr↓ is defined inductively on n.

To generate a descending a list, one may invent a datatype Wrap that wraps the seed, whose bound is simply the seed itself:

data Wrap : ℕ -> Set where
W : (n : ℕ) -> Wrap n

The list may then be generated by an unfold:

dec : forall {i} -> Wrap (suc i) -> ⊤ ⊎ (ℕ × Wrap i)
dec {i} (W .(suc i)) = inj₂ (i , W i)

down↓ : ℕ -> [ ℕ ]
down↓ n = unfoldr↓ Wrap dec (W n)

Of course, this would defeat my original goal of reusing the non-dependently typed dec, which is probably a bad idea anyway.

To show that the bound need not be exact, let’s try to generate a descending list whose elements are decremented by 2 in each step. We may use this slightly generalised wrapper:

data Wrap2 : ℕ -> Set where
W2 : (x : ℕ) -> (bnd : ℕ) -> x ≤ bnd -> Wrap2 bnd

and a function dec2 that decrements a suc i-bounded seed by 2 but showing that the new seed is bounded by i:

dec2 : forall {i} -> Wrap2 (suc i) -> ⊤ ⊎ (ℕ × WAlt i)
dec2 {i} (W2 0 .(1 + i) _) = inj₁ tt
dec2 {i} (W2 1 .(1 + i) _) = inj₁ tt
dec2 {i} (W2 (suc (suc n)) .(1 + i) 2+n≤1+i) =
inj₂ (n , W2 n i (suc-≤-weaken-l (≤-pred 2+n≤1+i)))

The list can then be unfolded by:

down↓2 : ℕ -> [ ℕ ]
down↓2 n = unfoldr↓ Wrap2 dec2 (W2 n n ≤-refl)

where suc-≤-weaken-l is a proof of forall {m n} -> suc m ≤ n -> m ≤ n.

### Unfolding a Tree

It is an easy exercise to come up with a tree version of the unfold above:

unfoldT↓ : {a : Set}(b : ℕ -> Set){n : ℕ} ->
(f : forall {i} -> b (suc i) -> a ⊎ (b i × b i)) ->
B n -> Tree a
unfoldT↓ b {0} f y = Nul
unfoldT↓ b {suc i} f y with f y
... | inj₁ x = Tip x
... | inj₂ (y₁ , y₂) =
Bin (unfoldT↓ b {i} f y₁) (unfoldT↓ b {i} f y₂)

To deal with the second task of building a roughly balanced binary tree, one may try this wrapper:

data Split (a : Set): ℕ -> Set where
Sp : (xs : [ a ]) -> (bnd : ℕ) ->
length xs < bnd -> Split a bnd

and try to code up a generator function split↓ having this type:

split↓ : forall {a i} -> Split a (suc i) -> a ⊎ (Split a i × Split a i)

The function split↓ I eventually come up with, however, is much more complicated than I had wished. Even worse, it is now split↓ that fails to pass the termination check.

Ulf Norell suggested some possible fixes. The difficulties, however, is probably a hint that there is something wrong in my approach in the first place. Rather than trying to fix it, Nils Anders showed me how he would tackle the problem from the beginning.

### Using Well-Founded Recursion

Nils Anders showed me how to define unfoldT↓ using well-founded recursion. For a simplified explanation, the Standard Library provides a function <-rec having type (after normalisation):

<-rec : (P : ℕ -> Set) ->
(f : (i : ℕ) -> (rec : (j : ℕ) -> j <′ i -> P j) -> P i) ->
(x : ℕ) -> P x

With <-rec one can define functions on natural numbers by recursion, provided that the argument strictly decreases in each recursive call. P is the type of the result, parameterised by the input. The function f is a template that specifies the body of the recursion which, given i, computes some result of type P i. The functional argument rec is supposed to be the recursive call. The constraint j <′ i, however, guarantees that it accepts only inputs strictly smaller than i (the ordering <′ is a variation of < that is more suitable for this purpose). One may perhaps think of <-rec as a fixed-point operator computing the fixed-point of f, only that f has to take i before rec because the latter depends on the former.

Let us try to define an unfold on trees using <-rec. The "base-functor" of the datatype Tree⁺ a is F b = a ⊎ b × b. One of the lessons we have learnt is that it would be more convenient for the generating function to return the bound information. We could use a type like this:

F a b k = a ⊎ ∃ (ℕ × ℕ) (\(i , j) -> b i × b j × i <′ k × j <′ k)

But it is perhaps more reader-friendly to define the base functor as a datatype:

data Tree⁺F (a : Set) (b : ℕ -> Set) : ℕ -> Set where
tip : forall {k} -> a -> Tree⁺F a b k
bin : forall {i j k} ->
b i -> b j -> i <′ k -> j <′ k -> Tree⁺F a b k

The generating function for the unfold should thus have type forall {a b i} -> b i -> Tree⁺F a b i.

The function unfoldT↓F is the template for unfoldT↓:

unfoldT↓F : {a : Set} {b : ℕ -> Set} ->
(f : forall {i} -> b i -> Tree⁺F a b i) ->
(n : ℕ) -> ((i : ℕ) -> i <′ n -> b i -> Tree⁺ a) ->
b n -> Tree⁺ a
unfoldT↓F f n rec y with f y
... | tip a = Tip⁺ a
... | bin {i} {j} y₁ y₂ i<n j<n =
Bin⁺ (rec i i<n y₁) (rec j j<n y₂)

Now unfoldT↓ can be defined by:

unfoldT↓ : {a : Set} {b : ℕ -> Set} ->
(f : forall {i} -> b i -> Tree⁺F a b i) ->
forall {n} -> b n -> Tree⁺ a
unfoldT↓ {a}{b} f {n} y =
<-rec (\n -> b n -> Tree⁺ a) (unfoldT↓F f) n y

All the definition above makes no recursive calls. All the tricks getting us through the termination check are hidden in <-rec. How is it defined?

### Well-Founded Recursion Defined on <′

Currently, well-founded recursion are defined in Induction and its sub-modules. They are very interesting modules to study. The definitions there, however, are very abstract. To understand how <-rec works, let's try to implement our own.

This is the definition of <′ from Data.Nat:

data _≤′_ : Rel ℕ where
≤′-refl : forall {n} -> n ≤′ n
≤′-step : forall {m n} -> m ≤′ n -> m ≤′ suc n

_<′_ : Rel ℕ
m <′ n = suc m ≤′ n

Recall that the recursion template f has type forall i -> (forall j -> j <′ i -> P j) -> P i. That is, given i, f computes P i, provided that we know of a method to compute P j for all j <′ i. Let us try to construct such a method using f. The function guard f i computes f, provided that the input is strictly smaller than i:

guard : {P : ℕ -> Set} ->
(forall i -> (forall j -> j <′ i -> P j) -> P i) ->
forall i -> forall j -> j <′ i -> P j
guard f zero _ ()
guard f .(suc j) j ≤′-refl = f j (guard f j)
guard f (suc i) j (≤′-step j<i) = guard f i j j<i

Observe that guard terminates because it is defined inductively on i. If we discard the type information, all what guard f i j is to make a call to f j. Before doing so, however, it checks through the proof to make sure that j is strictly smaller than i. In f j (guard f j), the call to guard f j ensures that subsequent calls to f are given arguments smaller than j.

Now <-rec can be defined by:

<-rec : (P : ℕ -> Set) ->
(forall i -> (forall j -> j <′ i -> P j) -> P i) ->
forall n -> P n
<-rec P f n = f n (guard f n)

In Induction.Nat, <-rec is an instance of well-founded recursion defined using the concept of accessibility, defined in Induction.WellFounded. I find them very interesting modules about which I hope to understand more.

To be continued...

# Terminating Unfolds (1)

The AoPA(Algebra of Programming in Agda) library allows one to derive programs in a style resembling that in the book Algebra of Programming in Agda. For example, the following is a derivation of insertion sort:

sort-der : ∃ ([ Val ] -> [ Val ]) (\f -> ordered? ○ permute ⊒ fun f )
sort-der = exists (
⊒-begin
ordered? ○ permute
⊒⟨ (\vs -> ·-monotonic ordered? (permute-is-fold vs)) ⟩
ordered? ○ foldR combine nil
⊒⟨ foldR-fusion ordered? ins-step ins-base ⟩
foldR (fun (uncurry insert)) nil
⊒⟨ foldR-to-foldr insert [] ⟩
fun (foldr insert [])
⊒∎)

isort : [ Val ] -> [ Val ]
isort = witness sort-der

The type of sort-der is a proposition that there exists a program of type [ Val ] → [ Val ] that is contained in ordered ? ◦ permute, a relation mapping a list to one of its ordered permutations. The combinator fun embeds a function in a relation. Relations on each line are related by relational inclusion (), while the proof of inclusion are given in angle brackets. In the final step, the relation foldR (fun (uncurry insert)) nil is refined to a function using the rule:

foldR-to-foldr : {A B : Set} -> (f : A -> B -> B) -> (e : B) ->
foldR (fun (uncurry f)) (singleton e) ⊒ fun (foldr f e)

which says that a relational fold is a functional fold if the algebra (f) is a function. When the proof is completed, an algorithm isort is obtained by extracting the witness of the proposition. It is an executable program that is backed by the type system to meet the speciﬁcation.

One may wonder: why insertion sort, rather than something more interesting such as mergesort or quicksort? Because we have not yet figured out how to properly deal with hylomorphism yet. One of the things still missing in AoPA is a treatment of unfolds. AoPA attempts to model the category of sets and relations. In this context one can talk about unfolds because they are simply the converses of folds. We perform the derivations in terms of folds, and refine the converse of a relational fold to a functional unfold provided that the coalgebra is a function:

foldR˘-to-unfoldr : {A B : Set} -> (f : B -> ⊤ ⊎ (A × B)) ->
(foldR ((fun f) ˘ ○ (fun inj₂)) (\b -> isInj₁ (f b))) ˘ ⊒ fun (unfoldr f)

where ˘ denotes relational converse and unfoldr is given by the usual definition (one that a Haskell programmer would expect, modulo syntax):

unfoldr : {A B : Set} -> (B -> ⊤ ⊎ (A × B)) -> B -> [ A ]
unfoldr f b with f b
... | inj₁ _ = []
... | inj₂ (a , b') = a ∷ unfoldr f b'

The only problem is that this unfoldr cannot pass the termination check of Agda.

My first thought was to wait until Agda supports codata and coinduction. However, the reason I wanted an unfold in AoPA in the first place was to compose it with a fold to form a hylomorphism, which is not allowed if the unfold returns codata. I then attempted to model unfolds in a way similar to what we did in relational program derivation. We know that the recursive equation:

$R = S ○$FR ○ T


has a unique solution R = ⦇S⦈ ○ ⦇T⦈˘ if and only if T is well-founded. In program derivation, the proof of well-foundedness is usually given separately. Therefore I hope to come up with a variation of unfoldr which, given a proof of well-foundedness, somehow passes the termination check.

During DTP 2008, I consulted a number of people and was suggested at least three solutions. I am going to summarise them in turn.

### Canonical Examples

With each approach I would like to tackle two tasks. Given the following function dec:

dec : ℕ -> ⊤ ⊎ (ℕ × ℕ)
dec zero = inj₁ tt
dec (suc n) = inj₂ (n , n)

One may attempt to define a function down using unfoldr such that down n produces a list from n to 0. As mentioned above, unfoldr cannot pass the termination check. In fact, even the following definition fails the termination check:

down : ℕ -> [ ℕ ]
down n with dec n
... | inj₁ _ = []
... | inj₂ (a , n') with down n'
...    | x = a ∷ x

The first task is to define down using some kind of unfold.

In the second tasks, given a non-empty list, we would like to distribute its elements to a roughly balanced binary tree. The list and the tree are define by:

data List⁺ (a : Set) : Set where
[_]⁺ : a -> List⁺ a
_∷⁺_ : a -> List⁺ a -> List⁺ a
data Tree⁺ (a : Set) : Set where
Tip⁺ : a -> Tree⁺ a
Bin⁺ : Tree⁺ a -> Tree⁺ a -> Tree⁺ a

If we apply, for example, foldT merge wrap, we get merge sort (I use non-empty lists to save the constructor for empty trees). In Haskell, one might first define a function that splits a list to roughly two halfs, something equivalent to this:

split⁺ : {a : Set} -> List⁺ a -> a ⊎ (List⁺ a × List⁺ a)
split⁺ [ x ]⁺ = inj₁ x
split⁺ (x ∷⁺ xs) with split⁺ xs
... | inj₁ y = inj₂ ([ y ]⁺ , [ x ]⁺)
... | inj₂ (ys , zs) = inj₂ (zs , x ∷⁺ ys)

and use an unfold on Tree⁺. The second task is to construct a function expand : {a : Set} -> List⁺ a -> Tree⁺ a that builds a roughly balanced tree, using some kind of unfold that makes use of split⁺, or at least some variation of it.

### Josh Ko’s Approach

Josh and I tried to extend unfolds with extra arguments representing a bound and proofs that the bound decreases with each call to f:

unfoldr↓ : {a b : Set} {_≾_ : b -> ℕ -> Set} ->
(f : b -> ⊤ ⊎ (a × b)) -> (y : b) -> (t : ℕ) ->
y ≾ t -> (forall {y} -> y ≾ zero -> f y ≡ inj₁ tt) ->
(forall {y} t' {x y'} ->
y ≾ suc t' -> f y ≡ inj₂ (x , y') -> y' ≾ t')
-> [ a ]

Apart from the generating function f and the seed b, the function unfoldr↓ takes a bound t, a natural number. The seed and the bound is related by , to be defined for each specific problem. Before producing the list, unfoldr↓ demands that the current seed is bounded by the bound (y ≾ t), a proof that f must yield inj₁ when the bound is zero, and a proof that if f returns inj₂ (x , y'), the new seed y' is bounded by a strictly smaller bound.

To define unfoldr↓ we need the inspect idiom (see the explanation in Relation.Binary.PropositionalEquality):

unfoldr↓ f y t y≾t p1 p2 with inspect (f y)
... | (inj₁ tt) with-≡ inj₁≡fy = []
... | (inj₂ (x , y')) with-≡ inj₂≡fy with t
...  | zero = contradiction (p1 y≾t)
(\fy≡inj₁ -> inj₁≢inj₂ (≡-sym (≡-trans inj₂≡fy fy≡inj₁)))
...  | suc t' = x ∷ unfoldr↓ f y' t'
(p2 t' y≾t (≡-sym inj₂≡fy)) p1 p2

Notice that unfoldr↓ passes the termination check because it pattern matches on t.

#### Descending List

For the first task, the seed is a natural number, and we may simply define to be :

_≾_ : ℕ -> ℕ -> Set
n ≾ c = n ≤ c

Then down↓ can be defined in terms of unfoldr↓ and dec by:

stop : forall {n} -> n ≾ zero -> dec n ≡ inj₁ tt
stop {.zero} z≤n = ≡-refl

proceed : forall {n} c' {x n'} -> n ≾ suc c' -> dec n ≡ inj₂ (x , n') -> n' ≾ c'
proceed {.zero} c' {x} {n'} z≤n dec-zero≡inj₂ = contradiction dec-zero≡inj₂ inj₁≢inj₂
proceed {suc m} c' {.m} {.m} (s≤s m≤c') ≡-refl = m≤c'

down↓ : ℕ -> [ ℕ ]
down↓ n = unfoldr↓ dec n n ≤-refl stop proceed

where ≤-refl : forall x -> x ≤ x.

#### Expanding a Binary Tree

The unfold above can be generalised to trees in the obvious way:

unfoldT↓ : {a b : Set} {_≼_ : b -> ℕ -> Set} ->
(f : b -> a ⊎ b × b) -> (y : b) -> (n : ℕ) ->
y ≼ n -> (forall y -> y ≼ zero -> ∃ a (\x -> f y ≡ inj₁ x)) ->
(forall y i y₁ y₂ ->
y ≼ suc i -> f y ≡ inj₂ (y₁ , y₂) -> (y₁ ≼ i × y₂ ≼ i))
-> Tree⁺ a
unfoldT↓ f y n y≼n p₁ p₂ with inspect (f y)
... | inj₁ x with-≡ _ = Tip⁺ x
... | inj₂ (y₁ , y₂) with-≡ inj₂≡fy with n
...  | zero = contradiction (proof (p₁ y y≼n))
(\fy≡inj₁ -> inj₁≢inj₂ (≡-sym (≡-trans inj₂≡fy fy≡inj₁)))
...  | suc m with p₂ y m y₁ y₂ y≼n (≡-sym inj₂≡fy)
...    | (y₁≼m , y₂≼m) = Bin⁺ (unfoldT↓ f y₁ m y₁≼m p₁ p₂)
(unfoldT↓ f y₂ m y₂≼m p₁ p₂)

One would wonder whether the condition (y₁ ≼ i × y₂ ≼ i) is too restrictive. When the proposition has to be proved inductively, we may need a stronger inductive hypothesis, for example, that both y₁ and y₂ are bounded by half of suc i. The current definition luckily works for our second task. We may need some generalisation later.

Repeated calls to split⁺ eventually terminates because the lengths of the lists are strictly decreasing. Therefore we define:

_#≤_ : {A : Set} -> List⁺ A -> ℕ -> Set
xs #≤ n = pred (length⁺ xs) ≤ n

With properly defined lemma1 and lemma2, we can expand the tree by unfoldT↓:

expand↓ : {a : Set} -> List⁺ a -> Tree⁺ a
expand↓ {a} xs = unfoldT↓ {a} {List⁺ a} {_#≤_} split⁺ xs
(pred (length⁺ xs)) ≤-refl lemma1 lemma2

where the two lemmas respectively have types:

lemma1 : {a : Set}(xs : List⁺ a) ->
xs #≤ zero -> ∃ a (\x -> split⁺ xs ≡ inj₁ x)
lemma2 : {a : Set} (xs : List⁺ a) (i : ℕ)
(ys : List⁺ a) (zs : List⁺ a) ->
xs #≤ suc i -> split⁺ xs ≡ inj₂ (ys , zs) -> ys #≤ i × zs #≤ i

Definition of lemma1 is a simple application of contradiction. On the other hand, lemma2 is very tedious to define. It appears to me that the definition inevitably becomes cumbersome because split⁺ and lemma2 are defined separately. It could be much easier if split⁺ is defined to return the computed results as well as some proofs about them. For my application in AoPA, however, I would prefer to be able to reuse the old split⁺. I don’t know a good solution yet.

Nils Anders Danielsson suggested that it would be nicer to integrate the bound into the type of the seed, while Conor McBride suggested yet another approach. I will try to summarise them next time.

To be continued…

# AoPA — Algebra of Programming in Agda

2015/03/29: AoPA is on GitHub now, updated to work with Agda 2.4.2.2 and Standard Library 0.9.

An Agda library accompanying the paper Algebra of Programming in Agda: Dependent Types for Relational Program Derivation, developed in co-operation with Hsiang-Shang Ko and Patrik Jansson.

Dependent type theory is rich enough to express that a program satisfies an input/output relational specification, but it could be hard to construct the proof term. On the other hand, squiggolists know very well how to show that one relation is included in another by algebraic reasoning. The AoPA library allows one to encode Algebra of Programming style program derivation, both functional and relational, in Agda.

### Example

The following is a derivation of insertion sort in progress:

isort-der : ∃ (\f → ordered? ○ permute ⊒ fun f )
isort-der = (_ , (
⊒-begin
ordered? ○ permute
⊒⟨ (\vs -> ·-monotonic ordered? (permute-is-fold vs)) ⟩
ordered? ○ foldR combine nil
⊒⟨ foldR-fusion ordered? ins-step ins-base ⟩
foldR (fun (uncurry insert)) nil
⊒⟨ { foldR-to-foldr insert []}0
{ fun (foldr insert [])
⊒∎ }1))

isort : [ Val ] -> [ Val ]
isort = proj₁ isort-der

The type of isort-der is a proposition that there exists a function f that is contained in ordered ? ◦ permute , a relation mapping a list to one of its ordered permutations. The proof proceeds by derivation from the speciﬁcation towards the algorithm. The ﬁrst step exploits monotonicity of and that permute can be expressed as a fold. The second step makes use of relational fold fusion. The shaded areas denote interaction points — fragments of (proof ) code to be completed. The programmer can query Agda for the expected type and the context of the shaded expression. When the proof is completed, an algorithm isort is obtained by extracting the witness of the proposition. It is an executable program that is backed by the type system to meet the speciﬁcation.

The complete program is in the Example directory of the code.

### The Code

The code consists of the following files and folders:

• AlgebraicReasoning: a number of modules supporting algebraic reasoning. At present we implement our own because the PreorderReasoning module in earlier versions of the Standard Library was not expressive enough for our need. We may adapt to the new Standard Library later.
• Data: defining relational fold, unfold, hylomorphism (using well-founded recursion), the greedy theorem, and the converse-of-a-function theorem, etc, for list and binary tree.
• Examples: currently we have prepared four examples: a functional derivation of the maximum segment sum problem, a relational derivation of insertion sort and quicksort (following the paper Functional Algorithm Design by Richard Bird), and solving an optimisation problem using the greedy theorem.
• Relations: modules defining various properties of relations.
• Sets: a simple encoding of sets, upon with Relations are built.

# Agda Exercise: Proving that Mergesort Returns Ordered Lists

Continuing with my Adga programming practice. This time I am implementing the example in another section of Why Dependent Types Matter by Altenkirch, McBride, and McKinna in Agda: proving that mergesort returns an ordered list.

> module Main where

> open import Prelude
> open import Data.Nat hiding (_≤_)
> open import Data.List

### Order

To begin with, a term of type m ≤ n is a proof that a natural number m is at most n:

> data _≤_ : Nat -> Nat -> Set where
>  le0 : {n : Nat} -> zero ≤ n
>  le1 : {m n : Nat} -> m ≤ n -> (1 + m) ≤ (1 + n)

An auxiliary function weaken, which I will need later, weakens a proof of (1+m) ≤ n to a proof of m ≤ n:

> weaken : {y x : Nat} -> (1 + y) ≤ x -> y ≤ x
> weaken (le1 le0) = le0
> weaken (le1 (le1 p)) = le1 (weaken (le1 p))

Given two natural numbers m and n, the function order, to be defined later, determines their order by returning a proof. The proof has to be wrapped in a datatype Order:

> data Order : Nat -> Nat -> Set where
>   leq : {m n : Nat} -> m ≤ n -> Order m n
>   gt : {m n : Nat} -> (1 + n) ≤ m -> Order m n

Agda does not provide a generic case expression, so I have to define my own for Order:

> caseord : {m n : Nat} {a : Set} -> Order m n
>             -> ((m ≤ n) -> a) -> (((1 + n) ≤ m) -> a) -> a
> caseord (leq p) f g = f p
> caseord (gt p) f g = g p

The function order can thus be defined by:

> order : (m : Nat) -> (n : Nat) -> Order m n
> order zero n = leq le0
> order (suc m) zero = gt (le1 le0)
> order (suc m) (suc n) =
>   caseord (order m n) (leq ∘ le1) (gt ∘ le1)

### Ordered List

There are many ways one can express the property that a list is sorted. The choice depends on what one needs for the proof. To prove that merge sorts the list, Altenkirch, McBride, and McKinna suggests the following “external bound” representation of sortedness:

> data OList : Nat -> Set where
>   Nil : {b : Nat} -> OList b
>   Cons : {b : Nat} -> (x : Nat) -> (b ≤ x) -> OList x -> OList b

The datatype OList b represents a sorted list of natural numbers whose elements are all greater than or equal to the bound b. The bound, however, is supplied externally and need not be precise. Every sorted list of natural numbers has type OList 0, for instance. The Cons constructor demands that the newly added element x is bounded by the externally supplied bound b, and that the tail can be bounded by x.

### Merging Sorted Lists

With OList, merge can be verified very easily:

> merge : {b : Nat} -> OList b -> OList b -> OList b
> merge Nil ys = ys
> merge xs Nil = xs
> merge (Cons x blex xs) (Cons y bley ys) =
>   caseord (order x y)
>     (\\ xley -> Cons x blex (merge {x} xs (Cons y xley ys)))
>     (\\ pylex -> Cons y bley (merge {y} (Cons x (weaken pylex) xs) ys))

Note, for example, that Cons y bley ys : OList b is updated to Cons y xley ys : OList x simply by using a different proof.

### Mergesort

The rest are just some routine code: dealing elements to a tree, and perform successive merging by a fold on the tree. Mergesort is defined by mergeT ∘ dealT. The code below is similar to that in my previous blog entry, but without the size constraint.

> data Parity : Set where
>   Even : Parity
>   Odd : Parity

> data Tree (a : Set) : Set where
>   Nul : Tree a
>   Tip : a -> Tree a
>   Bin : Parity -> Tree a -> Tree a -> Tree a

> foldT : {a b : Set}
>           -> (Parity -> b -> b -> b)
>           -> (a -> b)
>           -> b
>           -> Tree a -> b
> foldT f g e Nul = e
> foldT f g e (Tip x) = g x
> foldT f g e (Bin p t u) = f p (foldT f g e t) (foldT f g e u)

> insert : {a : Set} -> a -> Tree a -> Tree a
> insert x Nul = Tip x
> insert x (Tip y) = Bin Even (Tip x) (Tip y)
> insert x (Bin Even t u) =
>    Bin Odd (insert x t) u
> insert x (Bin Odd t u) =
>    Bin Even t (insert x u)

> dealT : List Nat -> Tree Nat
> dealT = foldr insert Nul

> mergeT : Tree Nat -> OList 0
> mergeT = foldT (\p xs ys -> merge xs ys)
>                (\x -> Cons x le0 Nil)
>                Nil

> msort : List Nat -> OList 0
> msort = mergeT ∘ dealT

# Agda Exercise: Sized Mergesort

Continuing with my Adga programming practice. Part of the mergesort example became easy once Max showed me how I should perform the pattern matching in insert.

> open import Prelude
> open import Logic.Identity
> open import Data.Bool
> open import Data.Nat
> open import Data.Nat.Properties

> module Main where

### Length-Indexed Lists

Firstly, define length-indexed lists and its fold function.

> data List (a : Set) : Nat -> Set where
>   [] : List a zero
>   _::_ : {n : Nat} -> a -> List a n -> List a (suc n)

> foldr : {a : Set} -> {n : Nat} -> (b : {m : Nat} -> List a m -> Set)
>           -> ({m : Nat} -> {xs : List a m}
>                   -> (x : a) -> b xs -> b (x :: xs))
>           -> b []
>           -> (xs : List a n) -> b xs
> foldr b f e [] = e
> foldr b f e (x :: xs) = f x (foldr b f e xs)

Apart from the extra parameter b, the body of foldr is not too different from the ordinary fold on lists. For genericity, however, its type is more complicated. I wonder whether it is still appropriate to call it a “fold”, now that xs appears as a parameter. Perhaps that is why people talk about “eliminators” rather than folds or catamorphisms.

### Size-Indexed Balanced Trees

This is one of the possible ways to define sized-indexed balanced trees. The parity bit keep notes of whether the number of the elements is even or odd.

> data Parity : Set where
>   Even : Parity
>   Odd : Parity

> parity : Parity -> Nat
> parity Even = 0
> parity Odd = 1

The Bin constructor enforces the constraint that if the size is even, the two subtrees must have the same size. Otherwise the left subtree contains exactly one more element than the right.

> data Tree (a : Set) : Nat -> Set where
>   Nul : Tree a 0
>   Tip : a -> Tree a 1
>   Bin : {n : Nat} -> (p : Parity) ->
>           Tree a (parity p + n) -> Tree a n -> Tree a (parity p + (n + n))

Fold on trees is defined in a way similar to fold on lists.

> foldT : {a : Set} -> {n : Nat} -> (b : {m : Nat} -> Tree a m -> Set)
>           -> ({m : Nat} -> (p : Parity)
>                 -> {t : Tree a (parity p + m)} -> {u : Tree a m}
>                 -> b t -> b u -> b (Bin p t u))
>           -> ((x : a) -> b (Tip x))
>           -> b Nul
>           -> (xs : Tree a n) -> b xs
> foldT b f g e Nul = e
> foldT b f g e (Tip x) = g x
> foldT b f g e (Bin p t u) = f p (foldT b f g e t) (foldT b f g e u)

### Mergesort

The function insert adds one element to a balanced tree. I did not know how to get the case for Bin Odd t u type-check until Max pointed out that I could just pattern match the implicit argument {n} of Bin.

> insert : {a : Set} -> {n : Nat} ->
>            a -> Tree a n -> Tree a (suc n)
> insert x Nul = Tip x
> insert x (Tip y) = Bin Even (Tip x) (Tip y)
> insert x (Bin Even t u) =
>    Bin Odd (insert x t) u
> insert {a} x (Bin {n} Odd t u) =
>    subst (\\ i -> Tree a (suc i)) (sym (+suc n n))
>      (Bin Even t (insert x u))

The function dealT, dealing elements of the input list to a balanced tree, can thus be defined as a fold:

> dealT : {a : Set} -> {n : Nat} -> List a n -> Tree a n
> dealT {a} = foldr (\{m} xs -> Tree a m) insert Nul

The function merge has been discussed before:

> merge : {m n : Nat} ->
>           List Nat m -> List Nat n -> List Nat (m + n)
> merge [] ys = ys
> merge {suc m} {zero} (x :: xs) Nil =
>   subst (List Nat) (+zero (suc m)) (x :: xs)
> merge {suc m} {suc n} (x :: xs) (y :: ys) =
>    if x < y then
>        x :: merge xs (y :: ys)
>      else (y ::
>            subst (List Nat) (+suc m n) (merge (x :: xs) ys))

The merging phase of mergesort can be seen as a fold on a binary tree:

> mergeT : {n : Nat} -> Tree Nat n -> List Nat n
> mergeT  = foldT (\\ {m} t -> List Nat m)
>                 (\\ {m} p xs ys ->
>                   subst (List Nat) (+assoc (parity p) m m) (merge xs ys))
>                 (\\ x -> x :: [])
>                 []

I needed +assoc to type-cast (parity p + m) + m into parity p + (m + m).

Finally, here comes mergesort.

> msort : {n : Nat} -> List Nat n -> List Nat n
> msort = mergeT  ∘ dealT

Of course, when it comes to sorting, one would expect that we prove that it actually sorts the list than merely that it preserves the length. It will perhaps be left to the next exercise.

# My First Agda Program: Append, Reverse, and Merge

I have been looking for a programming language to practice dependent type with. During the WG 2.1 meeting, my attention was brought to Agda (in fact, Adga 2). The word Agda may refer to both the proof assistant and its input language. It has been under long development, is relatively matured, has good library support. It supports real dependent type, and, pardon me for my personal preference, it is Haskell-like. So far it seems like a perfect choice.

So, here is my first Agda program. I am merely repeating some simple exercises on lists, which I tried using Haskell in a previous entry Developing Programs and Proofs Spontaneously using GADT.

> open import Logic.Identity
> open import Data.Bool
> open import Data.Nat
> open import Data.Nat.Properties
> module Main where

### List Concatenation

Adga has natural numbers defined in Data.Nat. With Nat, one can define lists indexed by their lengths as:

> data List (a : Set) : Nat -> Set where
>  [] : List a zero
>  _::_ : {n : Nat} -> a -> List a n -> List a (suc n)

The definition of append naturally follows:

> append : {a : Set} -> {m , n : Nat} ->
>             List a m -> List a n -> List a (m + n)
> append [] ys = ys
> append (x :: xs) ys = x :: append xs ys

### List Reversal

Unlike in Haskell, identifiers in Adga have to be defined earlier in the file before they can be referred to. In the usual definition of linear-time list reversal using an accumulating parameter, the recursive call yields a list of length m + suc n. I have got to somehow provide a proof that it equals suc m + n.

How does one provide such a proof? In Omega, one would make a list of all lemmas needed in a where clause and the compiler would mysteriously figure out a way to use them. (Well, I am sure it is well documented. I just have not understood its algorithm.) I would like to have more control over what is happening, and therefore would prefer to present the proof as a type cast from f (m + suc n) to f (suc m + n) for arbitrary type functor f. This is what one would do in Epigram and Adga.

psuc : {m : Nat} -> {n : Nat} -> {f : Nat -> Set} ->
f (m + suc n) -> f (suc m + n)
psuc {zero} {n} {f} xs = xs
psuc {suc m} {n} {f} xs = psuc {m} {n} {f'} xs
where f' : Nat -> Set
f' n = f (suc n)

At this point I did not yet know that I could use the lemmas defined in Data.Nat.Properties instead of rolling my own. The function revcat can then be defined as below:

revcat : {a : Set} -> {m n : Nat} ->
List a m -> List a n -> List a (m + n)
revcat {a} {zero} {n} [] ys = ys
revcat {a} {suc m} {n} (x :: xs) ys =
psuc {m} {n} {List a} (revcat xs (x :: ys))

One thing I have yet to figure out is how much of the type information can be inferred. The first clause can be simplified to revcat [] ys = ys. If I omit the type information in the right-hand side of the second clause:

revcat {a} {suc m} {n} (x :: xs) ys =
psuc (revcat xs (x :: ys))

I would expect Agda to tell me that it cannot infer the types. However, the program still appears to typecheck. When I try to execute revcat (1 :: (3 :: [])) (2 :: (4 :: [])) in the interpreter, I get _86 1 1 (3 :: []) (2 :: (4 :: [])), which seems to be an unevaluated expression. Does it not violate the slogan that “well-typed programs don’t go wrong?”

Using the predefined lemma in Data.Nat.Properties:

+suc : (n m : Nat) -> n + suc m ≡ suc (n + m)
+suc  zero   m = refl
+suc (suc n) m = cong suc (+suc n m)

The type cast in revcat can be performed using subst:

>  revcat : {a : Set} -> {m n : Nat} ->
>              List a m -> List a n -> List a (m + n)
>  revcat {a} {zero} {n} [] ys = ys
>  revcat {a} {suc m} {n} (x :: xs) ys =
>     subst (List a) (sym (+suc m n)) (revcat xs (x :: ys))

### Merging

Once I got revcat working, merge is not too difficult.

> merge : {m n : Nat} ->
>           List Nat m -> List Nat n -> List Nat (m + n)
> merge [] ys = ys
> merge {suc m} {zero} (x :: xs) Nil =
>   subst (List Nat) (+zero (suc m)) (x :: xs)
> merge {suc m} {suc n} (x :: xs) (y :: ys) =
>    if x < y then
>        x :: merge xs (y :: ys)
>      else (y ::
>            subst (List Nat) (+suc m n) (merge (x :: xs) ys))

Curiously, the parentheses around the else branch can not be omitted. Without them the expression would be parsed as (if ... then .. else y) :: subst.... The precedence for _::_, which is not declared, is assumed to be higher than if_then_else_.

### Balanced Trees

I tried to go on to define balanced trees, like in Altenkirch, McBride, and McKinna’s introduction to Epigram:

data Parity : Set where
Even : Parity
Odd : Parity

parity : Parity -> Nat
parity Even = 0
parity Odd = 1

data Tree (a : Set) : Nat -> Set where
Nul : Tree a zero
Tip : a -> Tree a (suc zero)
Bin : {m n : Nat} -> (p : Parity) ->
Tree a (parity p + n) -> Tree a n -> Tree a (parity p + (n + n))

The Bin constructor needs a parity bit. If the parity bit is Even, the two subtrees have the same size. Otherwise they differ by one.

However, I have not figured out how to pattern match the size parameter: when defining a function f : {a :Set} -> {n : Nat} -> Tree a n-> ..., Agda complained when I tried this pattern f {a} {n} (Bin Odd t u) = ... because parity p + (n' + n') != n. Perhaps I should start reading some more documents.

# Developing Programs and Proofs Spontaneously using GADT

This entry was also posted to the Haskell-Cafe mailing list.

I am curious about the possibility of developing Haskell programs spontaneously with proofs about their properties and have the type checker verify the proofs for us, in a way one would do in a dependently typed language.

In the exercise below, I tried to redo part of the merge-sort example in Altenkirch, McBride, and McKinna’s introduction to Epigram: deal the input list into a binary tree, and fold the tree by the function merging two sorted lists into one. The property I am going to show is merely that the length of the input list is preserved.

Given that dependent types and GADTs are such popular topics, I believe the same must have been done before, and there may be better ways to do it. If so, please give me some comments or references. Any comments are welcomed.

> {-# OPTIONS_GHC -fglasgow-exts #-}

To begin with, we define the usual type-level representation of natural numbers and lists indexed by their lengths.

> data Z = Z       deriving Show
> data S a = S a   deriving Show

> data List a n where
>   Nil :: List a Z
>   Cons :: a -> List a n -> List a (S n)

### Append

To warm up, let us see the familiar “append” example. Unfortunately, unlike Omega, Haskell does not provide type functions. I am not sure which is the best way to emulate type functions. One possibility is to introduce the following GADT:

> data Plus m n k where    --- m + n = k
>   PlusZ :: Plus Z n n
>   PlusS :: Plus m n k -> Plus (S m) n (S k)

such that Plus m n k represents a proof that m + n = k.

Not having type functions, one of the possible ways to do append is to have the function, given two lists of lengths m and n, return a list of length k and a proof that m + n = k. Thus, the type of append would be:

append :: List a m -> List a n
-> exists k. (List a k, Plus m n k)

In Haskell, the existential quantifier is mimicked by forall. We define:

> data DepSum a p = forall i . DepSum (a i) (p i)

The term “dependent sum” is borrowed from the Omega tutorial of Sheard, Hook, and Linger. Derek Elkins and Dan Licata explained to me why they are called dependent sums, rather than products.

The function append can thus be defined as:

> append :: List a m -> List a n -> DepSum (List a) (Plus m n)
> append Nil ys = DepSum ys PlusZ
> append (Cons x xs) ys =
>    case (append xs ys) of
>      DepSum zs p -> DepSum (Cons x zs) (PlusS p)

Another possibility is to provide append a proof that m + n = k. The type and definition of append would be:

< append :: Plus m n k -> List a m -> List a n -> List a k
< append PlusZ Nil ys = ys
< append (PlusS pf) (Cons x xs) ys = Cons x (append pf xs ys)

I thought the second append would be more difficult to use: to append two lists, I have to provide a proof about their lengths! It turns out that this append actually composes easier with other parts of the program. We will come to this later.

### Some Lemmas

Here are some lemmas represented as functions on terms. The function incAssocL, for example, converts a proof of m + (1+n) = k to a proof of (1+m) + n = k.

> incAssocL :: Plus m (S n) k -> Plus (S m) n k
> incAssocL PlusZ = PlusS PlusZ
> incAssocL (PlusS p) = PlusS (incAssocL p)

> incAssocR :: Plus (S m) n k -> Plus m (S n) k
> incAssocR (PlusS p) = plusMono p

> plusMono :: Plus m n k -> Plus m (S n) (S k)
> plusMono PlusZ = PlusZ
> plusMono (PlusS p) = PlusS (plusMono p)

For example, the following function revcat performs list reversal by an accumulating parameter. The invariant we maintain is m + n = k. To prove that the invariant holds, we have to use incAssocL.

> revcat :: List a m -> List a n -> DepSum (List a) (Plus m n)
> revcat Nil ys = DepSum ys PlusZ
> revcat (Cons x xs) ys =
>     case revcat xs (Cons x ys) of
>        DepSum zs p -> DepSum zs (incAssocL p)

### Merge

Apart from the proof manipulations, the function merge is not very different from what one would expect:

> merge :: Ord a => List a m -> List a n -> DepSum (List a) (Plus m n)
> merge Nil ys = DepSum ys PlusZ
> merge (Cons x xs) Nil = append (Cons x xs) Nil
> merge (Cons x xs) (Cons y ys)
>   | x <= y = case merge xs (Cons y ys) of
>                  DepSum zs p -> DepSum (Cons x zs) (PlusS p)
>   | otherwise = case merge (Cons x xs) ys of
>                   DepSum zs p -> DepSum (Cons y zs) (plusMono p)

The lemma plusMono is used to convert a proof of m + n = k to a proof of m + (1+n) = 1+k.

### Sized Trees

We also index binary trees by their sizes:

> data Tree a n where
>   Nul :: Tree a Z
>   Tip :: a -> Tree a (S Z)
>   Bin :: Tree a n1 -> Tree a n ->
>             (Plus p n n1, Plus n1 n k) -> Tree a k

The two trees given to the constructor Bin have sizes n1 and n respectively. The resulting tree, of size k, comes with a proof that n1 + n = k. Furthermore, we want to maintain an invariant that n1 either equals n, or is bigger than n by one. This is represented by the proof Plus p n n1. In the definition of insertT later, p is either PlusZ or PlusS PlusZ.

### Lists to Trees

The function insertT inserts an element into a tree:

> insertT :: a -> Tree a n -> Tree a (S n)
> insertT x Nul = Tip x
> insertT x (Tip y) = Bin (Tip x) (Tip y) (PlusZ, PlusS PlusZ)
> insertT x (Bin t u (PlusZ, p)) =
>      Bin (insertT x t) u (PlusS PlusZ, PlusS p)
> insertT x (Bin t u (PlusS PlusZ, p)) =
>      Bin t (insertT x u) (PlusZ, PlusS (incAssocR p))

Note that whenever we construct a tree using Bin, the first proof, corresponding to the difference in size of the two subtrees, is either PlusZ or PlusS PlusZ.

The counterpart of foldr on indexed list is defined by:

> foldrd :: (forall k . (a -> b k -> b (S k))) -> b Z
>               -> List a n -> b n
> foldrd f e Nil = e
> foldrd f e (Cons x xs) = f x (foldrd f e xs)

The result is also an indexed type (b n).

The function deal :: List a n -> Tree a n, building a tree out of a list, can be defined as a fold:

> deal :: List a n -> Tree a n
> deal = foldrd insertT Nul

### Trees to Lists, and Merge Sort

The next step is to fold through the tree by the function merge. The first two clauses are simple:

> mergeT :: Ord a => Tree a n -> List a n
> mergeT Nul = Nil
> mergeT (Tip x) = Cons x Nil

For the third clause, one would wish that we could write something as simple as:

mergeT (Bin t u (_,p1)) =
case merge (mergeT t) (mergeT u) of
DepSum xs p -> xs

However, this does not type check. Assume that t has size n1, and u has size n. The DepSum returned by merge consists of a list of size i, and a proof p of type Plus m n i, for some i. The proof p1, on the other hand, is of type P m n k for some k. Haskell does not know that Plus m n is actually a function and cannot conclude that i=k.

To explicitly state the equality, we assume that there is a function plusFn which, given a proof of m + n = i and a proof of m + n = k, yields a function converting an i in any context to a k. That is:

plusFn :: Plus m n i -> Plus m n k
-> (forall f . f i -> f k)

The last clause of mergeT can be written as:

mergeT (Bin t u (_,p1)) =
case merge (mergeT t) (mergeT u) of
DepSum xs p -> plusFn p p1 xs

How do I define plusFn? On Haskell-Cafe, Jim Apple suggested this implementation (by the way, he maintains a very interesting blog on typed programming):

plusFn :: Plus m n h -> Plus m n k -> f h -> f k
plusFn PlusZ PlusZ x = x
plusFn (PlusS p1) (PlusS p2) x =
case plusFn p1 p2 Equal of
Equal -> x

data Equal a b where
Equal :: Equal a a

Another implementation, which looks closer to the previous work on type equality [3, 4, 5], was suggested by apfelmus:

> newtype Equal a b = Proof { coerce :: forall f . f a -> f b }

> newtype Succ f a  = InSucc { outSucc :: f (S a) }

> equalZ :: Equal Z Z
> equalZ = Proof id

> equalS :: Equal m n -> Equal (S m) (S n)
> equalS (Proof eq) = Proof (outSucc . eq . InSucc)

> plusFnEq :: Plus m n i -> Plus m n k -> Equal i k
> plusFnEq PlusZ     PlusZ     = Proof id
> plusFnEq (PlusS x) (PlusS y) = equalS (plusFn x y)

> plusFn :: Plus m n i -> Plus m n k -> f i -> f k
> plusFn p1 p2 = coerce (plusFnEq p1 p2)

Now that we have both deal and mergeT, merge sort is simply their composition:

> msort :: Ord a => List a n -> List a n
> msort = mergeT . deal

The function mergeT can be defined using a fold on trees as well. Such a fold might probably look like this:

> foldTd :: (forall m n k . Plus m n k -> b m -> b n -> b k)
>           -> (a -> b (S Z)) -> b Z
>           -> Tree a n -> b n
> foldTd f g e Nul = e
> foldTd f g e (Tip x) = g x
> foldTd f g e (Bin t u (_,p)) =
>    f p (foldTd f g e t) (foldTd f g e u)

mergeT :: Ord a => Tree a n -> List a n
mergeT = foldTd merge' (\x -> Cons x Nil) Nil
where merge' p1 xs ys =
case merge xs ys of
DepSum xs p -> plusFn p p1 xs

I am not sure whether this is a "reasonable" type for foldTd.

### Passing in the Proof as an Argument

Previously I thought the second definition of append would be more difficult to use, because we will have to construct a proof about the lengths before calling append. In the context above, however, it may actually be more appropriate to use this style of definitions.

An alternative definition of merge taking a proof as an argument can be defined by:

< merge :: Ord a => Plus m n k -> List a m ->
<                       List a n -> List a k
< merge PlusZ Nil ys = ys
< merge pf (Cons x xs) Nil = append pf (Cons x xs) Nil
< merge (PlusS p) (Cons x xs) (Cons y ys)
<   | x <= y    = Cons x (merge p xs (Cons y ys))
<   | otherwise = Cons y (merge (incAssocL p) (Cons x xs) ys)

A definition of mergeT using this definition of merge follows immediately because we can simply use the proof coming with the tree:

< mergeT :: Ord a => Tree a n -> List a n
< mergeT Nul = Nil
< mergeT (Tip x) = Cons x Nil
< mergeT (Bin t u (_,p1)) =
<   merge p1 (mergeT t) (mergeT u)

I don't know which approach can be called more "natural". However, both Jim Apple and apfelmus pointed out that type families, a planned new feature of Haskell, may serve as a kind of type functions. With type families, the append example would look like ( code from apfelmus):

type family   Plus :: * -> * -> *
type instance Plus Z n     = n
type instance Plus (S m) n = S (Plus m n)

append :: (Plus m n ~ k) => List a m -> List a n -> List a k
append Nil         ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

apfelmus commented that "viewed with the dictionary translation for type classes in mind, this is probably exactly the alternative type of append you propose: append :: Plus m n k -> List a m -> List a n -> List a k".

### References

• [1] Thorsten Altenkirch, Conor McBride, and James McKinna. Why Dependent Types Matter.
• [2] Tim Sheard, James Hook, and Nathan Linger. GADTs + Extensible Kinds = Dependent Programming.
• [3] James Cheney, Ralf Hinze. A lightweight implementation of generics and dynamics.
• [4] Stephanie Weirich, Type-safe cast: (functional pearl), ICFP 2000.
• [5]Arthur I. Baars , S. Doaitse Swierstra, Typing dynamictyping, ACM SIGPLAN Notices, v.37 n.9, p.157-166, September 2002

### Appendix. Auxiliary Functions

> instance Show a => Show (List a n) where
>  showsPrec _ Nil = ("[]"++)
>  showsPrec _ (Cons x xs) = shows x . (':':) . shows xs

> instance Show (Plus m n k) where
>   showsPrec _ PlusZ = ("pZ"++)
>   showsPrec p (PlusS pf) = showParen (p>=10) (("pS " ++) .
>                              showsPrec 10 pf)

> instance Show a => Show (Tree a n) where
>   showsPrec _ Nul = ("Nul"++)
>   showsPrec p (Tip x) = showParen (p >= 10) (("Tip " ++) . shows x)
>   showsPrec p (Bin t u pf) =
>     showParen (p>=10)
>      (("Bin "++) . showsPrec 10 t . (' ':) . showsPrec 10 u .
>       (' ':) . showsPrec 10 pf)