
Well-Foundedness and Reductivity

Before learning dependent types, all I knew about program termination was from the work of Henk Doornbos and Roland Backhouse [DB95, DB96]. I am curious about the connection between their formulation of well-foundedness by relational calculation and the type-theoretic view.

It is generally accepted that a relation _<_ is well-founded if, for all x, there is no infinite chain x > x₁ > x₂ > x₃ > …. A related concept is inductivity: we say that a relation admits induction if we can use it to perform induction. For example, the less-than ordering on natural numbers is inductive.

Most of the type-theoretical papers I have seen equate well-foundedness with the concept of accessibility, which I talked about in a previous post. For example, Bengt Nordström [Nor88] introduced the datatype Acc as a “constructively more useful description” of well-foundedness.

Doornbos and Backhouse generalised well-foundedness and other related notions to arbitrary F-algebras. Well-foundedness, however, was generalised to “having a unique solution”: a relation R : FA ← A is F-well-founded if, for each S, the equation X = S . FX . R has a unique solution in X.

Reductivity

Inductivity, on the other hand, is generalised to F-reductivity. While F-reductivity is a generalisation of strong induction, Doornbos and Backhouse called it reductivity “in order to avoid confusion with existing notions of inductivity.” [DB95] (Nevertheless, an equivalent concept in Algebra of Programming, modelled using the membership relation rather than monotype factors, is called “inductivity”. Different usages of similar terminology can be rather confusing sometimes.)

A relation R : FA ← A is F-reductive if

μ(λP . R⧷FP) = id

where ⧷ is the monotype factor defined by ran(R . A) ⊆ B ≡ A ⊆ R⧷B. The function λP . R⧷FP takes a coreflexive P (a relation that is a subset of id) to the coreflexive R⧷FP, and μ takes its least fixed-point.

Well, it is a definition that is very concise but also very hard to understand, unless you are one of the few who really like this sort of thing. If we take P to be a predicate, R⧷FP expands to a predicate on x that is true if

∀y . y R x ⇒ FP y

Taking the fixed point means looking for the minimum P such that

∀x . (∀y . y R x ⇒ FP y) ⇒ P x

The expression now resembles the strong induction we all know very well. Computationally, how do we take the fixed point? Of course, by defining a recursive datatype:

data Acc {a : Set}(R : a -> a -> Set) : a -> Set where
  acc : (x : a) -> (forall y -> R y x -> f (Acc R) y) -> Acc R x

where f : (a -> Set) -> (F a -> Set) is the arrow-operation of functor F. When we take F to be the identity functor, it is exactly the definition of accessibility in my previous post! Finally, the requirement that μ(λP . R⧷FP) = id means every element in a is accessible.

So, it seems that the reductivity of Doornbos and Backhouse is in fact accessibility, which is often taken by type theorists to be an alternative definition of well-foundedness. Doornbos and Backhouse, however, showed that reductivity guarantees unique solutions of hylo equations (which is their definition of F-well-foundedness), while the converse is not true in general.

New Reductive Relations from Old

Doornbos and Backhouse also developed some principles to build new reductive relations from existing ones. Let us try to see their Agda counterparts. For simplicity, we take F to be the identity functor in the discussion below.

All initial algebras are reductive. For example, given Pred relating n and suc n, one can show that Pred is reductive.

data Pred : ℕ -> ℕ -> Set where
  pre : (n : ℕ) -> Pred n (suc n)

ℕPred-wf : well-found Pred
ℕPred-wf n = acc n (access n)
  where
    access : (n : ℕ) -> (m : ℕ) -> Pred m n -> Acc Pred m
    access .(suc n) .n (pre n) = acc n (access n)

We have seen in the previous post that the less-than relation is reductive.

If R is reductive and S ⊆ R, then S is reductive too:

acc-⊑ : {a : Set}{S R : a -> a -> Set} ->
      S ⊑ R -> (x : a) -> Acc R x -> Acc S x
acc-⊑ {a}{S} S⊑R x (acc .x h) = acc x access
  where
    access : (y : a) -> S y x -> Acc S y
    access y ySx = acc-⊑ S⊑R y (h y (S⊑R y x ySx))

If f . R . f˘ is reductive for some function f, so is R. This is how we often prove termination of loops using f as the variant.

acc-fRf˘ : {a b : Set}{R : a -> a -> Set}{f : a -> b} ->
    (x : a) -> Acc (fun f ○ R ○ ((fun f) ˘)) (f x) -> Acc R x
acc-fRf˘ {a}{b}{R}{f} x (acc ._ h) = acc x access
  where
    access : (y : a) -> R y x -> Acc R y
    access y yRx = acc-fRf˘ y
      (h (f y) (exists x (exists y (≡-refl , yRx) , ≡-refl)))

where the operators for relational composition (_○_) and converse (_˘) are those defined in AoPA.

Finally, for the special case where F is the identity functor, R is reductive if and only if its transitive closure R⁺ is. Let us first define transitive closure:

data _⁺ {a : Set}(R : a -> a -> Set) : a -> a -> Set where
  ⁺-base : forall {x y} -> R x y -> (R ⁺) x y
  ⁺-step : forall {x z} -> ∃ a (\y -> R y z × (R ⁺) x y) -> (R ⁺) x z

The “if” direction is easy. Let us prove the “only-if” direction.

acc-tc : {a : Set}{R : a -> a -> Set} ->
    (x : a) -> Acc R x -> Acc (R ⁺) x
acc-tc {a}{R} x ac = acc x (access x ac)
  where
    access : (x : a) -> Acc R x ->
        (y : a) -> (R ⁺) y x -> Acc (R ⁺) y
    access x (acc .x h) y (⁺-base yRx) = acc-tc y (h y yRx)
    access x (acc .x h) y (⁺-step (exists z (zRx , yR⁺z))) =
      access z (h z zRx) y yR⁺z

Combined with the proof that Pred is reductive, this gives us another way to show that the less-than relation is reductive. One may compare it with the proof of well-found _<′_ in the previous post.
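
For instance, putting ℕPred-wf and acc-tc together immediately yields well-foundedness of Pred ⁺ (a sketch of my own; relating Pred ⁺ to _<′_ still takes a separate conversion proof):

ℕPred⁺-wf : well-found (Pred ⁺)
ℕPred⁺-wf n = acc-tc n (ℕPred-wf n)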

References

[DB95] Henk Doornbos and Roland Backhouse. Induction and recursion on datatypes. In B. Möller, editor, Mathematics of Program Construction, 3rd International Conference, volume 947 of LNCS, pages 242–256. Springer-Verlag, July 1995.

[DB96] Henk Doornbos and Roland Backhouse. Reductivity. Science of Computer Programming, 26:217–236, 1996.

[Nor88] Bengt Nordström. Terminating general recursion. BIT, 28(3):605–619, 1988.

Well-Founded Recursion and Accessibility

Since Nils Anders Danielsson showed me how to use Induction.WellFounded in Agda’s Standard Library, I have been studying issues related to well-founded recursion.

In dependently typed programming, we would like to ensure that proof terms are terminating. Apart from the type checker, Agda employs a separate termination checker. How exactly it works is still at times mysterious to me, but basically it recognises a program as terminating if some of its arguments become “structurally” smaller after each recursive call. This is already far more expressive than I thought: quite a number of programs can be written by structural recursion.
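
For instance, a definition like the following sketch passes the checker, since the first argument is structurally smaller in the recursive call:

plus : ℕ -> ℕ -> ℕ
plus zero n = n
plus (suc m) n = suc (plus m n)

Still, we occasionally need some more flexibility. One such occasion was when I wished to express terminating unfolds in AoPA. For another example, consider this function mentioned in the paper Terminating General Recursion by Bengt Nordström: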

f (n, k) = if k ≤ n then n else f(f (n + 1, k), k)

It is a variation of McCarthy’s 91 function, used as a popular example due to its non-trivial recursion pattern. Can we express this function in Agda while informing the termination checker that it does terminate?

Given a relation < that has been shown to be well-founded, we know that a recursively defined function is terminating if its arguments become “smaller” with respect to < at each recursive call. We can encode such well-founded recursion in Agda's style of structural recursion. The earliest paper I could find about such an encoding is Terminating General Recursion by Bengt Nordström. It may have appeared in other forms, however.

Accessibility

Given a binary relation _<_ : a -> a -> Set, an element x is in the well-founded subset of < if there is no infinite chain of x₁, x₂, x₃ ... such that x > x₁ > x₂ > x₃ > .... The relation is well-founded if all elements of a are in its well-founded subset. The less-than ordering on natural numbers is well-founded, while the greater-than ordering is not. A more constructive way to talk about well-foundedness is via accessibility. Given a relation _<_ : a -> a -> Set, the proposition Acc _<_ x states that x is accessible under relation <. An element x : a is accessible if all y : a "smaller" than x are accessible:

data Acc {a : Set}(_<_ : a -> a -> Set) : a -> Set where
  acc : (x : a) -> (forall y -> y < x -> Acc _<_ y) -> Acc _<_ x

A relation _<_ : a -> a -> Set is well-founded if all elements in a are accessible:

well-found : {a : Set} -> (a -> a -> Set) -> Set
well-found _<_ = forall x -> Acc _<_ x

For an example, consider the less-than-or-equal-to ordering on natural numbers. It is usually modelled by a datatype stating that 0 ≤ n for all n and that 1+m ≤ 1+n follows from m ≤ n; the strictly-less-than ordering is then defined by m < n = 1+m ≤ n.
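
Spelt out, the usual definition would look like the following sketch (essentially the _≤_ in Agda's standard library, whose constructors z≤n and s≤s also appear elsewhere in these posts):

data _≤_ : ℕ -> ℕ -> Set where
  z≤n : forall {n} -> zero ≤ n
  s≤s : forall {m n} -> m ≤ n -> suc m ≤ suc n

For well-foundedness proofs, however, the following definitions turn out to be more convenient: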

data _≤′_ : ℕ -> ℕ -> Set where
  ≤′-refl : forall {n} -> n ≤′ n
  ≤′-step : forall {m n} -> m ≤′ n -> m ≤′ suc n

_<′_ : ℕ -> ℕ -> Set
m <′ n = suc m ≤′ n

Let's prove that <′ is well-founded:

ℕ<-wf : well-found _<′_
ℕ<-wf n = acc n (access n)
  where
    access : (n : ℕ) -> (m : ℕ) -> m <′ n -> Acc _<′_ m
    access zero m ()
    access (suc n) .n ≤′-refl = acc n (access n)
    access (suc n) m (≤′-step m<n) = access n m m<n

An operational intuition for access: knowing that m <′ n, access demonstrates that it is possible to reach m from n in a finite number of steps.

Well-Founded Recursion

As an inductively defined datatype, Acc naturally induces a fold operator:

acc-fold : {a : Set} (_<_ : a -> a -> Set) {P : a -> Set} ->
  ((x : a) -> (forall y -> y < x -> P y) -> P x) ->
    (z : a) -> Acc _<_ z -> P z
acc-fold _<_ Φ z (acc .z h) =
  Φ z (\y y<z -> acc-fold _<_ Φ y (h y y<z))

Its type says that if we can provide a function Φ that proves some property P x for all x provided that P y has been proved for all y < x, then P holds for all accessible elements in a. Recall strong induction.

When < is well-founded, all elements are accessible. Therefore we can define:

rec-wf : {a : Set} {_<_ : a -> a -> Set} {P : a -> Set} ->
  well-found _<_ ->
    ((x : a) -> ((y : a) -> y < x -> P y) -> P x) ->
      (x : a) -> P x
rec-wf {a}{_<_} wf f x = acc-fold _<_ f x (wf x)

For an example, consider the function div : ℕ -> ℕ -> ℕ performing integral division (but letting div zero zero be zero):

div : ℕ -> ℕ -> ℕ
div zero n = zero
div (suc m) zero = suc m
div (suc m) (suc n) = suc (div (suc m ∸ suc n) (suc n))

If we define div by well-founded recursion with explicit fixed-point, it looks like:

div-wf : ℕ -> ℕ -> ℕ
div-wf m n = rec-wf ℕ<-wf (body n) m
  where
    body : (n : ℕ) -> (m : ℕ) -> ((k : ℕ) -> k <′ m -> ℕ) -> ℕ
    body n zero rec = zero
    body zero (suc m) rec = suc m
    body (suc n) (suc m) rec =
        suc (rec (suc m ∸ suc n) (sm-sn<sm m n))

where sm-sn<sm : forall m n -> (m ∸ n) <′ suc m. Notice how rec-wf ℕ<-wf behaves like a fixed-point operator taking the "fixed-point" of body. The recursive call is replaced by a function rec, having type (k : ℕ) -> k <′ m -> ℕ. To call rec, the programmer has to provide a proof that the argument is indeed getting smaller with respect to the relation _<′_.
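
For completeness, one way to prove sm-sn<sm is the following sketch (assuming the usual definition of ∸, for which m ∸ zero = m, zero ∸ suc n = zero, and suc m ∸ suc n = m ∸ n):

sm-sn<sm : forall m n -> (m ∸ n) <′ suc m
sm-sn<sm m zero = ≤′-refl
sm-sn<sm zero (suc n) = ≤′-refl
sm-sn<sm (suc m) (suc n) = ≤′-step (sm-sn<sm m n)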

The function could also be written without the explicit fixed-point, by pattern matching on the accessibility proof instead:

div-wf' : forall m -> ℕ -> Acc _<′_ m -> ℕ
div-wf' zero n _ = zero
div-wf' (suc m) zero _ = suc m
div-wf' (suc m) (suc n) (acc .(suc m) h) =
  suc (div-wf' (suc m ∸ suc n) (suc n)
        (h (suc m ∸ suc n) (sm-sn<sm m n)))

Upper-Bounded Reverse Ordering

As a practice, let us consider another ordering. For a given natural number k, define n ◁ m = m < n ≤ k. That is, for m and n upper-bounded by k, the one closer to k is considered smaller. Apparently ◁ is well-founded. I tried to encode it in Agda by:

_<_≤_ : ℕ -> ℕ -> ℕ -> Set
m < n ≤ k = (m <′ n) × (n ≤′ k)

◁ : ℕ -> ℕ -> ℕ -> Set
(◁ k) m n = n < m ≤ k

However, the accessibility proof became very difficult to construct. I ended up defining yet another variation of the less-than-or-equal-to ordering:

data _≤′′_ : ℕ -> ℕ -> Set where
  ≤′′-refl : forall {n} -> n ≤′′ n
  ≤′′-step : forall {m n} -> suc m ≤′′ n -> m ≤′′ n

and defined m < n ≤ k = (m <′′ n) × (n ≤′′ k), where m <′′ n = suc m ≤′′ n.

A number of properties can be proved about ≤′′:

m≤n⇒m≤1+n : forall {m n} -> m ≤′′ n -> m ≤′′ (suc n)
m<n⇒¬n≤m : forall {m n} -> m <′′ n -> ¬ (n ≤′′ m)
¬[m<n∧n≤m] : forall {m n} -> ¬ (m <′′ n × n ≤′′ m)
≤′′-trans : forall {m n k} -> m ≤′′ n -> n ≤′′ k -> m ≤′′ k
≤′′-antisym : forall {m n} -> m ≤′′ n -> n ≤′′ m -> m ≡ n
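
As a taster, the first of these can be proved as follows (a sketch; the auxiliary suc-mono is a name of my own):

suc-mono : forall {m n} -> m ≤′′ n -> suc m ≤′′ suc n
suc-mono ≤′′-refl = ≤′′-refl
suc-mono (≤′′-step 1+m≤n) = ≤′′-step (suc-mono 1+m≤n)

m≤n⇒m≤1+n : forall {m n} -> m ≤′′ n -> m ≤′′ (suc n)
m≤n⇒m≤1+n m≤n = ≤′′-step (suc-mono m≤n)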

Some of the properties above depend on the following:

¬1+n≤n : forall {n} -> ¬ (suc n ≤′′ n)

which I have not yet proved.

Now let us show that ◁ k is well-founded.

ℕ◁-wf : forall k -> well-found (◁ k)
ℕ◁-wf k m = acc m access
  where
   access : forall n -> (m < n ≤ k) -> Acc (◁ k) n
   access n (m<n , n≤k) =
    ◁-access k m n (≤′′-trans m<n n≤k) (m<n , n≤k)

The function ◁-access needs, apart from proofs of m < n and n ≤ k, a proof of m < k, which can be established by transitivity of ≤′′. The definition is shown below:

◁-access : forall k m n ->
    (m <′′ k) -> (m < n ≤ k) -> Acc (◁ k) n
◁-access .(suc m) m .(suc m) ≤′′-refl (≤′′-refl , 1+m≤1+m) =
  acc (suc m)
    (\y 1+m<y∧y≤1+m -> ⊥-elim (¬[m<n∧n≤m] 1+m<y∧y≤1+m))
◁-access k m .(suc m) (≤′′-step 1+m<k) (≤′′-refl , 1+m≤k) =
  acc (suc m) (\y -> ◁-access k (suc m) y 1+m<k)
◁-access .(suc m) m n ≤′′-refl (≤′′-step 1+m<n , n≤1+m) =
  acc n (⊥-elim (m<n⇒¬n≤m 1+m<n n≤1+m))
◁-access k m n (≤′′-step 1+m<k) (≤′′-step 1+m<n , n≤k) =
  ◁-access k (suc m) n 1+m<k (1+m<n , n≤k)

It is perhaps quite hard to understand unless you try to construct it yourself. The reader only needs to notice that in the two recursive calls to ◁-access (in the second and the last clause), the argument m is incremented to suc m. That is why we need to include a proof term of m <′′ k as an argument, which becomes structurally smaller in the recursive calls. This is how we inform the termination checker that, although m increases, the distance between m and k shrinks. The proof of n ≤ k, on the other hand, is used to establish the contradictions in the first and the third clause.

A Variation of McCarthy's 91 Function

Back to the function f mentioned at the beginning, whose naive Agda translation:

f k n with k ≤? n
... | yes _ = n
... | no _ = f k (f k (suc n))

certainly does not pass the termination check.

In fact, the second argument (n) to f becomes smaller with respect to ◁ k. This is obvious for the inner recursive call (f k (suc n)) since k ≰ n implies n < 1 + n ≤ k. To prove the termination of the outer recursive call, we have to show that f k n actually computes the maximum of k and n. Since k ≰ n, we have f k (suc n) = k. Therefore, f k (suc n) is smaller than n with respect to ◁ k.

To prove the termination of f we have to prove something about its result. One of the possible ways to do so is to define:

P : ℕ -> ℕ -> Set
P k n = ∃ ℕ (\fn -> n ≤′′ k -> fn ≡ k)

and extend the type of f to:

f : (k : ℕ) -> (n : ℕ) -> P k n

Now that f returns an existential type, we have to pattern match on its result. A definition of f may look like:

f k = rec-wf (ℕ◁-wf k) body
 where
  body : (i : ℕ) -> ((j : ℕ) -> i < j ≤ k -> P k j) -> P k i
  body n rec with (k ≤′′? n)
  ... | inj₁ k≤n = exists n (\n≤k -> ≤′′-antisym n≤k k≤n)
  ... | inj₂ n<k with rec (1 + n) (≤′′-refl , n<k)
  ...  | exists m 1+n≤k⇒m≡k with rec m (n<m , m≤k)
  ...    | exists h m≤k⇒h≡k = exists h (\_ -> h≡k)
   where
    m≡k = 1+n≤k⇒m≡k n<k
    n<m = ≡-subst (\i -> n <′′ i) (≡-sym m≡k) n<k
    m≤k = ≡-subst (\i -> m ≤′′ i) m≡k ≤′′-refl
    h≡k = m≤k⇒h≡k m≤k

However, Agda's scoping rules in the presence of both with and where still confuse me. Eventually I had to inline the subordinate proofs, which gives me:

f : (k : ℕ) -> (n : ℕ) -> P k n
f k = rec-wf (ℕ◁-wf k) body
 where
  body : (i : ℕ) -> ((j : ℕ) -> i < j ≤ k -> P k j) -> P k i
  body n rec with (k ≤′′? n)
  ... | inj₁ k≤n = exists n (\n≤k -> ≤′′-antisym n≤k k≤n)
  ... | inj₂ n<k with rec (1 + n) (≤′′-refl , n<k)
  ...  | exists m 1+n≤k⇒m≡k with 1+n≤k⇒m≡k n<k
  ...   | m≡k with rec m (≡-subst (\i -> n <′′ i) (≡-sym m≡k) n<k ,
        ≡-subst (\i -> m ≤′′ i) m≡k ≤′′-refl)
  ...    | exists h m≤k⇒h≡k =
          exists h (\_ -> m≤k⇒h≡k (≡-subst (\i -> m ≤′′ i) m≡k ≤′′-refl))

Unfortunately it looks quite remote from the original f. Nevertheless, it does pass the termination checker of Agda.


Terminating Unfolds (2)

After seeing our code, Nils Anders Danielsson suggested two improvements. Firstly, to wrap the bound in the seed. The terminating unfoldr↓ would thus have a simpler type as well as a simpler implementation:

unfoldr↓ : {a : Set}(b : ℕ -> Set){n : ℕ} ->
    (f : forall {i} -> b (suc i) -> ⊤ ⊎ (a × b i)) ->
      b n -> [ a ]
unfoldr↓ b {zero} f y = []
unfoldr↓ b {suc i} f y with f y
... | inj₁ _ = []
... | inj₂ (x , y') = x ∷ unfoldr↓ b {i} f y'

The definition passes the termination check, apparently, because unfoldr↓ is defined inductively on n.

To generate a descending list, one may invent a datatype Wrap that wraps the seed, whose bound is simply the seed itself:

data Wrap : ℕ -> Set where
      W : (n : ℕ) -> Wrap n

The list may then be generated by an unfold:

dec : forall {i} -> Wrap (suc i) -> ⊤ ⊎ (ℕ × Wrap i)
dec {i} (W .(suc i)) = inj₂ (i , W i)

down↓ : ℕ -> [ ℕ ]
down↓ n = unfoldr↓ Wrap dec (W n)

Of course, this would defeat my original goal of reusing the non-dependently typed dec, which is probably a bad idea anyway.

To show that the bound need not be exact, let’s try to generate a descending list whose elements are decremented by 2 in each step. We may use this slightly generalised wrapper:

data Wrap2 : ℕ -> Set where
      W2 : (x : ℕ) -> (bnd : ℕ) -> x ≤ bnd -> Wrap2 bnd

and a function dec2 that decrements a suc i-bounded seed by 2, while showing that the new seed is bounded by i:

dec2 : forall {i} -> Wrap2 (suc i) -> ⊤ ⊎ (ℕ × Wrap2 i)
dec2 {i} (W2 0 .(1 + i) _) = inj₁ tt
dec2 {i} (W2 1 .(1 + i) _) = inj₁ tt
dec2 {i} (W2 (suc (suc n)) .(1 + i) 2+n≤1+i) =
    inj₂ (n , W2 n i (suc-≤-weaken-l (≤-pred 2+n≤1+i)))

The list can then be unfolded by:

down↓2 : ℕ -> [ ℕ ]
down↓2 n = unfoldr↓ Wrap2 dec2 (W2 n n ≤-refl)

where suc-≤-weaken-l is a proof of forall {m n} -> suc m ≤ n -> m ≤ n.
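
For the record, suc-≤-weaken-l can be proved by a short induction (a sketch; the auxiliary ≤-step is a name of my own):

≤-step : forall {m n} -> m ≤ n -> m ≤ suc n
≤-step z≤n = z≤n
≤-step (s≤s m≤n) = s≤s (≤-step m≤n)

suc-≤-weaken-l : forall {m n} -> suc m ≤ n -> m ≤ n
suc-≤-weaken-l (s≤s m≤n) = ≤-step m≤n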

Unfolding a Tree

It is an easy exercise to come up with a tree version of the unfold above:

unfoldT↓ : {a : Set}(b : ℕ -> Set){n : ℕ} ->
  (f : forall {i} -> b (suc i) -> a ⊎ (b i × b i)) ->
    b n -> Tree a
unfoldT↓ b {0} f y = Nul
unfoldT↓ b {suc i} f y with f y
... | inj₁ x = Tip x
... | inj₂ (y₁ , y₂) =
    Bin (unfoldT↓ b {i} f y₁) (unfoldT↓ b {i} f y₂)

To deal with the second task of building a roughly balanced binary tree, one may try this wrapper:

data Split (a : Set): ℕ -> Set where
  Sp : (xs : [ a ]) -> (bnd : ℕ) ->
    length xs < bnd -> Split a bnd

and try to code up a generator function split↓ having this type:

split↓ : forall {a i} -> Split a (suc i) -> a ⊎ (Split a i × Split a i)

The function split↓ I eventually came up with, however, is much more complicated than I had wished. Even worse, it is now split↓ that fails to pass the termination check.

Ulf Norell suggested some possible fixes. The difficulties, however, are probably a hint that something was wrong with my approach in the first place. Rather than trying to fix it, Nils Anders showed me how he would tackle the problem from the beginning.

Using Well-Founded Recursion

Nils Anders showed me how to define unfoldT↓ using well-founded recursion. For a simplified explanation, the Standard Library provides a function <-rec having type (after normalisation):

<-rec : (P : ℕ -> Set) ->
  (f : (i : ℕ) -> (rec : (j : ℕ) -> j <′ i -> P j) -> P i) ->
    (x : ℕ) -> P x

With <-rec one can define functions on natural numbers by recursion, provided that the argument strictly decreases in each recursive call. P is the type of the result, parameterised by the input. The function f is a template that specifies the body of the recursion which, given i, computes some result of type P i. The functional argument rec is supposed to be the recursive call. The constraint j <′ i, however, guarantees that it accepts only inputs strictly smaller than i (the ordering <′ is a variation of < that is more suitable for this purpose). One may perhaps think of <-rec as a fixed-point operator computing the fixed-point of f, only that f has to take i before rec because the latter depends on the former.

Let us try to define an unfold on trees using <-rec. The "base-functor" of the datatype Tree⁺ a is F b = a ⊎ b × b. One of the lessons we have learnt is that it would be more convenient for the generating function to return the bound information. We could use a type like this:

F a b k = a ⊎ ∃ (ℕ × ℕ) (\(i , j) -> b i × b j × i <′ k × j <′ k)

But it is perhaps more reader-friendly to define the base functor as a datatype:

data Tree⁺F (a : Set) (b : ℕ -> Set) : ℕ -> Set where
  tip : forall {k} -> a -> Tree⁺F a b k
  bin : forall {i j k} ->
      b i -> b j -> i <′ k -> j <′ k -> Tree⁺F a b k

The generating function for the unfold should thus have type forall {a b i} -> b i -> Tree⁺F a b i.
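
As a toy instance (an example of my own, reusing Wrap from above), a generating function that unfolds a full binary tree of the given depth could be:

gen : forall {i} -> Wrap i -> Tree⁺F ℕ Wrap i
gen {zero} (W .zero) = tip zero
gen {suc i} (W .(suc i)) = bin (W i) (W i) ≤′-refl ≤′-refl

Applying the unfold defined below, unfoldT↓ gen (W n) then builds a full tree of depth n.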

The function unfoldT↓F is the template for unfoldT↓:

unfoldT↓F : {a : Set} {b : ℕ -> Set} ->
  (f : forall {i} -> b i -> Tree⁺F a b i) ->
   (n : ℕ) -> ((i : ℕ) -> i <′ n -> b i -> Tree⁺ a) ->
    b n -> Tree⁺ a
unfoldT↓F f n rec y with f y
... | tip a = Tip⁺ a
... | bin {i} {j} y₁ y₂ i<n j<n =
    Bin⁺ (rec i i<n y₁) (rec j j<n y₂)

Now unfoldT↓ can be defined by:

unfoldT↓ : {a : Set} {b : ℕ -> Set} ->
  (f : forall {i} -> b i -> Tree⁺F a b i) ->
    forall {n} -> b n -> Tree⁺ a
unfoldT↓ {a}{b} f {n} y =
    <-rec (\n -> b n -> Tree⁺ a) (unfoldT↓F f) n y

None of the definitions above makes a recursive call. All the tricks getting us through the termination check are hidden in <-rec. How is it defined?

Well-Founded Recursion Defined on <′

Currently, well-founded recursion is defined in Induction and its sub-modules. They are very interesting modules to study. The definitions there, however, are very abstract. To understand how <-rec works, let's try to implement our own.

This is the definition of <′ from Data.Nat:

data _≤′_ : Rel ℕ where
  ≤′-refl : forall {n} -> n ≤′ n
  ≤′-step : forall {m n} -> m ≤′ n -> m ≤′ suc n

_<′_ : Rel ℕ
m <′ n = suc m ≤′ n

Recall that the recursion template f has type forall i -> (forall j -> j <′ i -> P j) -> P i. That is, given i, f computes P i, provided that we know of a method to compute P j for all j <′ i. Let us try to construct such a method using f. The call guard f i behaves like f, provided that the input is strictly smaller than i:

guard : {P : ℕ -> Set} ->
   (forall i -> (forall j -> j <′ i -> P j) -> P i) ->
      forall i -> forall j -> j <′ i -> P j
guard f zero _ ()
guard f .(suc j) j ≤′-refl = f j (guard f j)
guard f (suc i) j (≤′-step j<i) = guard f i j j<i

Observe that guard terminates because it is defined inductively on i. If we discard the type information, all guard f i j does is make a call to f j. Before doing so, however, it checks through the proof to make sure that j is strictly smaller than i. In f j (guard f j), the call to guard f j ensures that subsequent calls to f are given arguments smaller than j.

Now <-rec can be defined by:

<-rec : (P : ℕ -> Set) ->
   (forall i -> (forall j -> j <′ i -> P j) -> P i) ->
      forall n -> P n
<-rec P f n = f n (guard f n)

In Induction.Nat, <-rec is an instance of well-founded recursion defined using the concept of accessibility from Induction.WellFounded. They are very interesting modules, about which I hope to understand more.

To be continued...

Terminating Unfolds (1)

The AoPA (Algebra of Programming in Agda) library allows one to derive programs in Agda, in a style resembling that of the book Algebra of Programming. For example, the following is a derivation of insertion sort:

sort-der : ∃ ([ Val ] -> [ Val ]) (\f -> ordered? ○ permute ⊒ fun f )
sort-der = exists (
  ⊒-begin
      ordered? ○ permute
  ⊒⟨ (\vs -> ·-monotonic ordered? (permute-is-fold vs)) ⟩
      ordered? ○ foldR combine nil
  ⊒⟨ foldR-fusion ordered? ins-step ins-base ⟩
      foldR (fun (uncurry insert)) nil
  ⊒⟨ foldR-to-foldr insert [] ⟩
      fun (foldr insert [])
  ⊒∎)

isort : [ Val ] -> [ Val ]
isort = witness sort-der

The type of sort-der is a proposition that there exists a program of type [ Val ] → [ Val ] that is contained in ordered? ○ permute, a relation mapping a list to one of its ordered permutations. The combinator fun embeds a function in a relation. Relations on each line are related by relational inclusion (⊒), while the proofs of inclusion are given in the angle brackets. In the final step, the relation foldR (fun (uncurry insert)) nil is refined to a function using the rule:

foldR-to-foldr : {A B : Set} -> (f : A -> B -> B) -> (e : B) ->
    foldR (fun (uncurry f)) (singleton e) ⊒ fun (foldr f e)

which says that a relational fold is a functional fold if the algebra (f) is a function. When the proof is completed, an algorithm isort is obtained by extracting the witness of the proposition. It is an executable program that is backed by the type system to meet the specification.

One may wonder: why insertion sort, rather than something more interesting such as mergesort or quicksort? Because we have not yet figured out how to properly deal with hylomorphisms. One of the things still missing in AoPA is a treatment of unfolds. AoPA attempts to model the category of sets and relations. In this context one can talk about unfolds, because they are simply the converses of folds. We perform the derivations in terms of folds, and refine the converse of a relational fold to a functional unfold provided that the coalgebra is a function:

foldR˘-to-unfoldr : {A B : Set} -> (f : B -> ⊤ ⊎ (A × B)) ->
    (foldR ((fun f) ˘ ○ (fun inj₂)) (\b -> isInj₁ (f b))) ˘ ⊒ fun (unfoldr f)

where ˘ denotes relational converse and unfoldr is given by the usual definition (one that a Haskell programmer would expect, modulo syntax):

unfoldr : {A B : Set} -> (B -> ⊤ ⊎ (A × B)) -> B -> [ A ]
unfoldr f b with f b
... | inj₁ _ = []
... | inj₂ (a , b') = a ∷ unfoldr f b'

The only problem is that this unfoldr cannot pass the termination check of Agda.

My first thought was to wait until Agda supports codata and coinduction. However, the reason I wanted an unfold in AoPA in the first place was to compose it with a fold to form a hylomorphism, which is not allowed if the unfold returns codata. I then attempted to model unfolds in a way similar to what we did in relational program derivation. We know that the recursive equation:

R = S ○ FR ○ T

has a unique solution R = ⦇S⦈ ○ ⦇T⦈˘ if and only if T is well-founded. In program derivation, the proof of well-foundedness is usually given separately. Therefore I hoped to come up with a variation of unfoldr which, given a proof of well-foundedness, somehow passes the termination check.

During DTP 2008, I consulted a number of people and was offered at least three solutions. I am going to summarise them in turn.

Canonical Examples

With each approach I would like to tackle two tasks. Given the following function dec:

dec : ℕ -> ⊤ ⊎ (ℕ × ℕ)
dec zero = inj₁ tt
dec (suc n) = inj₂ (n , n)

One may attempt to define a function down using unfoldr such that down n produces a list from n to 0. As mentioned above, unfoldr cannot pass the termination check. In fact, even the following definition fails the termination check:

down : ℕ -> [ ℕ ]
down n with dec n
... | inj₁ _ = []
... | inj₂ (a , n') with down n'
...    | x = a ∷ x

The first task is to define down using some kind of unfold.

In the second task, given a non-empty list, we would like to distribute its elements to a roughly balanced binary tree. The list and the tree are defined by:

data List⁺ (a : Set) : Set where
    [_]⁺ : a -> List⁺ a
    _∷⁺_ : a -> List⁺ a -> List⁺ a
data Tree⁺ (a : Set) : Set where
    Tip⁺ : a -> Tree⁺ a
    Bin⁺ : Tree⁺ a -> Tree⁺ a -> Tree⁺ a

If we apply, for example, foldT merge wrap, we get merge sort (I use non-empty lists to save the constructor for empty trees). In Haskell, one might first define a function that splits a list into roughly two halves, something equivalent to this:

split⁺ : {a : Set} -> List⁺ a -> a ⊎ (List⁺ a × List⁺ a)
split⁺ [ x ]⁺ = inj₁ x
split⁺ (x ∷⁺ xs) with split⁺ xs
... | inj₁ y = inj₂ ([ y ]⁺ , [ x ]⁺)
... | inj₂ (ys , zs) = inj₂ (zs , x ∷⁺ ys)

and use an unfold on Tree⁺. The second task is to construct a function expand : {a : Set} -> List⁺ a -> Tree⁺ a that builds a roughly balanced tree, using some kind of unfold that makes use of split⁺, or at least some variation of it.

Josh Ko’s Approach

Josh and I tried to extend unfolds with extra arguments representing a bound and proofs that the bound decreases with each call to f:

unfoldr↓ : {a b : Set} {_≾_ : b -> ℕ -> Set} ->
  (f : b -> ⊤ ⊎ (a × b)) -> (y : b) -> (t : ℕ) ->
  y ≾ t -> (forall {y} -> y ≾ zero -> f y ≡ inj₁ tt) ->
    (forall {y} t' {x y'} ->
         y ≾ suc t' -> f y ≡ inj₂ (x , y') -> y' ≾ t')
      -> [ a ]

Apart from the generating function f and the seed y, the function unfoldr↓ takes a bound t, a natural number. The seed and the bound are related by _≾_, to be defined for each specific problem. Before producing the list, unfoldr↓ demands that the current seed is bounded by the bound (y ≾ t), a proof that f must yield inj₁ when the bound is zero, and a proof that if f returns inj₂ (x , y'), the new seed y' is bounded by a strictly smaller bound.

To define unfoldr↓ we need the inspect idiom (see the explanation in Relation.Binary.PropositionalEquality):

unfoldr↓ f y t y≾t p1 p2 with inspect (f y)
... | (inj₁ tt) with-≡ inj₁≡fy = []
... | (inj₂ (x , y')) with-≡ inj₂≡fy with t
...  | zero = contradiction (p1 y≾t)
        (\fy≡inj₁ -> inj₁≢inj₂ (≡-sym (≡-trans inj₂≡fy fy≡inj₁)))
...  | suc t' = x ∷ unfoldr↓ f y' t'
            (p2 t' y≾t (≡-sym inj₂≡fy)) p1 p2

Notice that unfoldr↓ passes the termination check because it pattern matches on t.

Descending List

For the first task, the seed is a natural number, and we may simply define _≾_ to be _≤_:

_≾_ : ℕ -> ℕ -> Set
n ≾ c = n ≤ c

Then down↓ can be defined in terms of unfoldr↓ and dec by:

stop : forall {n} -> n ≾ zero -> dec n ≡ inj₁ tt
stop {.zero} z≤n = ≡-refl

proceed : forall {n} c' {x n'} -> n ≾ suc c' -> dec n ≡ inj₂ (x , n') -> n' ≾ c'
proceed {.zero} c' {x} {n'} z≤n dec-zero≡inj₂ = contradiction dec-zero≡inj₂ inj₁≢inj₂
proceed {suc m} c' {.m} {.m} (s≤s m≤c') ≡-refl = m≤c'

down↓ : ℕ -> [ ℕ ]
down↓ n = unfoldr↓ dec n n ≤-refl stop proceed

where ≤-refl : forall x -> x ≤ x.
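
For completeness, ≤-refl goes by a routine induction, using the constructors z≤n and s≤s of _≤_ (a sketch):

≤-refl : forall x -> x ≤ x
≤-refl zero = z≤n
≤-refl (suc n) = s≤s (≤-refl n)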

Expanding a Binary Tree

The unfold above can be generalised to trees in the obvious way:

unfoldT↓ : {a b : Set} {_≼_ : b -> ℕ -> Set} ->
  (f : b -> a ⊎ b × b) -> (y : b) -> (n : ℕ) ->
   y ≼ n -> (forall y -> y ≼ zero -> ∃ a (\x -> f y ≡ inj₁ x)) ->
    (forall y i y₁ y₂ ->
        y ≼ suc i -> f y ≡ inj₂ (y₁ , y₂) -> (y₁ ≼ i × y₂ ≼ i))
    -> Tree⁺ a
unfoldT↓ f y n y≼n p₁ p₂ with inspect (f y)
... | inj₁ x with-≡ _ = Tip⁺ x
... | inj₂ (y₁ , y₂) with-≡ inj₂≡fy with n
...  | zero = contradiction (proof (p₁ y y≼n))
      (\fy≡inj₁ -> inj₁≢inj₂ (≡-sym (≡-trans inj₂≡fy fy≡inj₁)))
...  | suc m with p₂ y m y₁ y₂ y≼n (≡-sym inj₂≡fy)
...    | (y₁≼m , y₂≼m) = Bin⁺ (unfoldT↓ f y₁ m y₁≼m p₁ p₂)
      (unfoldT↓ f y₂ m y₂≼m p₁ p₂)

One might wonder whether the condition (y₁ ≼ i × y₂ ≼ i) is too restrictive. When the proposition has to be proved inductively, we may need a stronger inductive hypothesis, for example, that both y₁ and y₂ are bounded by half of suc i. The current definition luckily works for our second task. We may need some generalisation later.

Repeated calls to split⁺ eventually terminate because the lengths of the lists are strictly decreasing. Therefore we define:

_#≤_ : {A : Set} -> List⁺ A -> ℕ -> Set
xs #≤ n = pred (length⁺ xs) ≤ n

With properly defined lemma1 and lemma2, we can expand the tree by unfoldT↓:

expand↓ : {a : Set} -> List⁺ a -> Tree⁺ a
expand↓ {a} xs = unfoldT↓ {a} {List⁺ a} {_#≤_} split⁺ xs
      (pred (length⁺ xs)) ≤-refl lemma1 lemma2

where the two lemmas respectively have types:

lemma1 : {a : Set}(xs : List⁺ a) ->
   xs #≤ zero -> ∃ a (\x -> split⁺ xs ≡ inj₁ x)
lemma2 : {a : Set} (xs : List⁺ a) (i : ℕ)
  (ys : List⁺ a) (zs : List⁺ a) ->
  xs #≤ suc i -> split⁺ xs ≡ inj₂ (ys , zs) -> ys #≤ i × zs #≤ i

The definition of lemma1 is a simple application of contradiction. On the other hand, lemma2 is very tedious to define. It appears to me that the definition inevitably becomes cumbersome because split⁺ and lemma2 are defined separately. It could be much easier if split⁺ were defined to return the computed results together with some proofs about them. For my application in AoPA, however, I would prefer to be able to reuse the old split⁺. I do not know a good solution yet.

Nils Anders Danielsson suggested that it would be nicer to integrate the bound into the type of the seed, while Conor McBride suggested yet another approach. I will try to summarise them next time.

To be continued…

Constructing List Homomorphism from Left and Right Folds

Since Josh has already mentioned it, I had better give it a full account.

Back in 2003, when I had just started my postdoc in the PSD Lab at the University of Tokyo, my colleagues there were discussing the third homomorphism theorem. The theorem says that if a function f can be expressed both as a foldr and as a foldl:

f = foldr (≪) e
f = foldl (≫) e

there exists some associative binary operator ⊚ such that

f [] = e
f [a] = a ≪ e = e ≫ a
f (xs ⧺ ys) = f xs ⊚ f ys

Being a list homomorphism implies the potential for parallel computation.

The paper we studied was, of course, The Third Homomorphism Theorem by Jeremy. The aim then was to automatically, or semi-automatically, derive ⊚, given ≪, ≫, and e. The motivating examples include:

  • f = sum, where ⊚ can simply be +.
  • f = sort, where ⊚ can be the function merge, merging two sorted lists.
  • f = scanr ⊛ e. While scanr appears to be an inherently sequential function, it is actually possible to compute f “from the middle”, provided that ⊛ is associative, if we take xs ⊚ (y : ys) = map (⊛y) xs ⧺ ys.

Can we come up with a method to derive ⊚ for all these examples?

I myself was not particularly interested in automation. I was interested in the challenge for two reasons. Firstly, it appears that some kind of inverse function is needed. Secondly, when I looked at Jeremy's proof, I felt there was a relational proof inside waiting to be discovered. So I tried.

Setting the Stage

For the ease of point-free calculation, we define alternatives of folds where the input is paired with the base case:

foldrp (≪) ([],a) = a
foldrp (≪) (x:xs,a) = x ≪ foldrp (≪) (xs,a)
foldlp (≫) (a,[]) = a
foldlp (≫) (a, xs⧺[x]) = foldlp (≫) (a,xs) ≫ x

The advantage is that we can shift a suffix or a prefix of the input list to the base case. That is:

foldr (≪) e (xs⧺ys) = foldrp (≪) (xs, foldr (≪) e ys)
foldl (≫) e (xs⧺ys) = foldlp (≫) (foldl (≫) e xs, ys)

or, in point-free style:

foldr (≪) e . cat = foldrp (≪) . (id × foldr (≪) e) (1)
foldl (≫) e . cat = foldlp (≫) . (foldl (≫) e × id) (2)

where cat (xs,ys) = xs ⧺ ys and (f × g) (a,b) = (f a, g b).

The key property, however, is one that was shown in (possibly among other literature) Algebra of Programming: for a simple relation (i.e. a partial function) S, we have:

S . S° . S = S

where ° stands for relational converse (the relational concept of an “inverse function”).

Recall our aim: given f, look for ⊚ such that f xs ⊚ f ys = f (xs⧺ys). It translates to point-free style as ⊚ . (f × f) = f . cat.

Proving the Theorem

The third homomorphism theorem is a direct corollary of the following lemma:

Lemma 1: f = foldr (≪) e = foldl (≫) e implies that f is prefix- and postfix-insensitive. That is:

f xs = f xs’ ∧ f ys = f ys’ ⇒ f (xs⧺ys) = f (xs’⧺ys’).

In point-free style: f . cat . (f° . f × f° . f) = f . cat.

Once we have the lemma proved, the theorem follows by taking ⊚ = f . cat . (f° × f°). It has to be a (partial) function because ⊚ . (f × f) = f . cat is a function. Furthermore, ⊚ is associative because cat is.

The proof of Lemma 1 is simply some applications of (1), (2), and S . S° . S = S:

     f . cat
=      { (1) }
     foldrp (≪) . (id × f)
=      { since f . f° . f = f }
     foldrp (≪) . (id × f) . (id × f° . f)
=      { (1) }
     f . cat . (id × f° . f)
=      { (2) }
     foldlp (≫) . (f × id) . (id × f° . f)
=      { since f . f° . f = f }
     foldlp (≫) . (f × id) . (f° . f × f° . f)
=      { (2) }
     f . cat . (f° . f × f° . f)

The proof is still the same as Jeremy's, but in a relational disguise.

Refining to Functions

To derive actual algorithms, we have yet to refine ⊚ = f . cat . (f° × f°) so that it uses functional components only. We shall pick some g ⊆ f° whose domain equals the range of f, such that ⊚ = f . cat . (g × g). (An equality, rather than an inclusion, holds because both sides are partial functions having the same domain.)

For example, consider the special case sum = foldr (+) 0 = foldl (+) 0. Here ⊚ = sum . cat . (sum° × sum°). One simple choice is to pick wrap ⊆ sum°, where wrap a = [a]. In this case a ⊚ b = sum [a, b] = a + b.

For sort, define sorted = ran sort. It is a partial function such that sorted xs = xs iff xs is sorted. Notice that sorted ⊆ sort°. Therefore, a possible choice for ⊚ would be sort . cat . (sorted × sorted): concatenate two sorted lists, and sort them again. Jeremy then went on with this definition and proved that ⊚ = merge, taking advantage of the fact that both input lists are sorted.

Some more Properties

Some more properties are needed to deal with scanr. The following properties allow foldrp and foldlp to proceed by shifting elements of the list to the base case:

foldrp (≪) (xs⧺[x],a) = foldrp (≪) (xs, x ≪ a)
foldlp (≫) (a, x:xs) = foldlp (≫) (a ≫ x, xs) (3)

When simple approaches to refining ⊚ = f . cat . (f° × f°) do not work, the following approach sometimes does. Since f is a fold, one may attempt to take one of the f° as an unfold, thus forming an “unfold, then fold” pattern:

    ⊚ = f . cat . (f° × f°)
=      { (1) }
    foldrp (≪) . (id × f) . (f° × f°)
⊇      { since f . f° ⊇ ran f }
    foldrp (≪) . (f° × ran f)
=      { since f = foldl (≫) e }
    foldrp (≪) . ((foldl (≫) e)° × ran f)

Symmetrically,

⊚ = foldlp (≫) . (ran f × (foldr (≪) e)°) (4)

Scanr “from the Middle”

Finally, consider f = scanr ⊛ e for some associative ⊛ with identity e (that is, e ⊛ x = x ⊛ e = x). A scan can be defined both as a foldr and as a foldl, as below:

scanr ⊛ e = foldr ≪ [e]
    x ≪ (y : xs) = x ⊛ y : y : xs
scanr ⊛ e = foldl ≫ [e]
    xs ≫ x = map (⊛x) xs ⧺ [e]

From ⊚ = f . cat . (f° × f°), we can prove that xs ⊚ (y : ys) = map (⊛ y) xs ⧺ ys. Here is the inductive case:

    xs ⊚ (x : y : ys)
=      { (4) }
    foldlp (≫) (xs, (foldr (≪) [e])° (x : y : ys))
=      { definition of ≪, let z ⊛ y = x }
    foldlp (≫) (xs, z : (foldr (≪) [e])° (y : ys))
=      { (3) }
    foldlp (≫) (xs ≫ z, (foldr (≪) [e])° (y : ys))
=      { (4) }
    (xs ≫ z) ⊚ (y : ys)
=      { induction }
    map (⊛ y) (xs ≫ z) ⧺ ys
=      { definition of ≫ }
    map (⊛ y) (map (⊛ z) xs ⧺ [e]) ⧺ ys
=      { map, e identity }
    map (λv → (v ⊛ z) ⊛ y) xs ⧺ [y] ⧺ ys
=      { associativity of ⊛ }
    map (λv → v ⊛ (z ⊛ y)) xs ⧺ [y] ⧺ ys
=      { z ⊛ y = x }
    map (⊛ x) xs ⧺ [y] ⧺ ys

Prologue

Given the complexity of the proof above, I did not think there was any hope of automatically constructing ⊚ for a reasonably useful set of list homomorphisms. My colleagues were talking about “weak inverses”, their attempt to look for a refinement of f°, which I considered too ad hoc.

Having just graduated from AoP, I was perhaps too arrogant and too proud of all the clever derivations we did to care about automatic program construction. (Like Jeremy, at the end of my thesis I quoted “善數不用籌策 (Those who are good at calculation need no mechanical aids)” from 老子 Lao Tzu. And didn't Dijkstra say “I hardly used the computer they gave me. I was perfectly happy without it.”?) The relational method, which seemed to cover everything, gave me a false feeling that I knew the problem inside out.

Last year, however, my colleagues and I met again and I was told that they eventually published a paper on this subject:

Kazutaka Morita, Akimasa Morihata, Kiminori Matsuzaki, Zhenjiang Hu, Masato Takeichi. Automatic Inversion Generates Divide-and-Conquer Parallel Programs, PLDI 2007.

They focused on weak inverses that return only lists of a fixed length. The advantage is that calculations like the one above are no longer necessary: since the lists are of fixed length, ⊚ always takes constant time. On the other hand, scanr and scanl are special cases dealt with on a different level. Such a distinction is enforced by the language in which f is allowed to be constructed. No, they do not seem to have handled sort. But their approach still covers a reasonably useful collection of functions.

Well, I think the moral is that we cannot just stop when it appears that there is no elegant solution that suits our taste. It sometimes pays to get our hands dirty, through which we may eventually discover the beauty within.

S Combinator is Injective, with Proofs

By chance, I came upon a blog entry by Masahiro Sakai (酒井政裕) in which he, after reading my short comment on my previous homepage “Do you know that the S combinator is injective? I have a simple algebraic proof!”, tried to construct the inverse of S and showed that S⁻¹ ○ S = id in, guess what, Agda!

Wow, people were actually reading what I wrote! This made me feel that I have to be a bit more responsible and, at least, provide the proof when I claim I have one. So, here it is.

S is injective

Recall the definition of S:


S : (A -> B -> C) -> (A -> B) -> A -> C
S = λ x y a -> x a (y a)

I am assuming a simple semantics of sets and functions, and by S being injective I mean that S x = S x' ⇒ x = x', which can be trivially shown below:


    S x = S x'
≡       { η expansion, for all y : A -> B, a : A }
    (∀ a, y : S x y a = S x' y a)
≡       { definition of S }
    (∀ a, y : x a (y a) = x' a (y a))
⇒       { choose y = K b for some b }
    (∀ a, b : x a (K b a) = x' a (K b a))
≡       { definition of K: K b a = b }
    (∀ a, b : x a b = x' a b)
≡       { pointwise equality }
    x = x'

I would usually leave out the ∀'s in derivations, but in this case they are explicitly written to emphasise that b is indeed universally quantified.

So, what is the inverse of S?

Now that we know S is injective, there ought to exist some function S⁻¹ such that S⁻¹ ○ S = id. Nakano-san claimed that one such definition would be:


S⁻¹ : ((A -> B) -> A -> C) -> A -> B -> C
S⁻¹ = λ x a b -> x (K b) a

That S⁻¹ ○ S = id is easy to see:


   S⁻¹ (S x) a b
=  (S x) (K b) a
=  x a (K b a)
=  x a b

From the other direction, we have only S ○ S⁻¹ ⊆ id, since S is not a surjective function. What does the range of S look like? Inspect the definition of S: since y is applied only to a, the value of y on other inputs does not matter. That is, the range of S consists of only those functions e such that:


e y a = e y' a     for all y, y' such that y a = y' a      ......(1)

We will show that S (S⁻¹ e) = e for all e satisfying (1):


    S (S⁻¹ e) y a
=      { definition of S }
    S⁻¹ e a (y a)
=      { definition of S⁻¹ }
    e (K (y a)) a
=      { by (1), since K (y a) a = y a }
    e y a

Inverting higher-order functions?

Some endnotes. Once, Nakano and I thought we had discovered how to invert higher-order functions (and even derive the necessary side conditions, like the one on e above) by drawing λ expressions as trees and manipulating them. It turned out that I had overlooked something obvious at the very beginning, and the result was a disaster.

Does anyone know of related work on inverting higher-order functions? The only one I know of is Samson Abramsky's paper A Structural Approach to Reversible Computation, but I am not sure whether it is about reversibility/invertibility in the same sense.

Deriving a Virtual Machine

After reading Atsushi Igarashi's paper Deriving Compilers and Virtual Machines for a Multi-Level Language, I gave myself the following exercise: given an interpreter of a small language, derive a virtual machine and a corresponding compiler. Such techniques probably date back to Ager et al., who built connections between not one but five pairs of denotational semantics and existing abstract machines (Krivine's, CEK, CLS, SECD, and the Categorical Abstract Machine). They continued to explore a number of variations in subsequent papers. Igarashi's paper also deals with a more difficult topic: deriving an abstract machine for a language with quotes/unquotes. As a starting point to appreciating their work, I tried this small example one could do in one afternoon.

Let us start with this simple language with numerical values, a binary operator for addition, lambda abstraction and variables represented by de Bruijn notation, and application:

data Exp = Val Int
         | Var Int
         | Plus Exp Exp
         | Lam Exp 
         | App Exp Exp deriving Show

The expression (λf a b. f a + f b) ((λc d. c + d) 1) 2 3, for example, would be represented by:

  (App (App (App 
      (Lam (Lam (Lam (Plus (App (Var 2) (Var 1)) 
                           (App (Var 2) (Var 0))))))
      (App (Lam (Lam (Plus (Var 0) (Var 1))))
           (Val 1)))
    (Val 2)) (Val 3))

An expression evaluates to either a number or a closure:

data Val = Num Int
         | Clos Env Exp deriving Show

And the evaluator can be defined, in an almost standard manner, in the continuation passing style:

type Cont = Val -> Val

type Env = [Val]

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = k (Num n)
eval (Var n) env k = k (env !! n)
eval (Plus e1 e2) env k = 
   eval e1 env (\(Num i) -> 
        eval e2 env (\(Num j) -> 
              k (Num (i + j))))
eval (Lam e) env k =
   k (Clos env e)
eval (App e1 e2) env k =
   eval e1 env (\(Clos env' body) ->
      eval e2 env (\v -> eval body (v : env') k))

Like in many derivations, we have cheated a little from the beginning. The use of de Bruijn indices makes env act like a stack, while CPS abstracts the control flow of the virtual machine to be derived.

Defunctionalising the Continuation

The derivation relies on Reynolds' defunctionalisation: enumerate the instances where a higher-order function is used, and represent them by first-order structures. As the first step, we aim to replace the continuation Cont, a synonym of Val -> Val, by some datatype:

data Cont = Cont0 | Cont1 ... | Cont2 ... | ...

and to define a function appK :: Cont -> Val -> Val pattern-matching Cont and performing the actions that were to be done in eval. Where there used to be a continuation application in eval, we replace it with an application of appK:

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = appK k (Num n)
eval (Var n) env k = appK k (env !! n)

The case for Plus is more complicated. In the original definition:

eval (Plus e1 e2) env k = 
   eval e1 env (\(Num i) -> 
        eval e2 env (\(Num j) -> 
              k (Num (i + j))))

the continuation passed to eval e1 env (the lambda above) should be replaced by a first-order data structure. We call it Cont1, leaving the free variables e2, env, and k as its parameters:

eval (Plus e1 e2) env k = 
   eval e1 env (Cont1 e2 env k)

the function appK, upon matching Cont1, carries out what was supposed to be computed in eval:

appK (Cont1 e2 env k) (Num i) =
    eval e2 env (\(Num j) -> k (Num (i + j)))

However, the continuation passed to eval e2 env should be replaced by a first-order construct too. Introducing a new constructor Cont2 and abstracting over the free variables i and k, we get:

appK (Cont1 e2 env k) (Num i) =
    eval e2 env (Cont2 i k)
appK (Cont2 i k) (Num j) = appK k (Num (i + j))

The next few cases are dealt with in the same way. Eventually we get the code below:

data Cont = Cont0
          | Cont1 Exp Env Cont
          | Cont2 Int Cont
          | Cont3 Exp Env Cont     
          | Cont4 Exp Env Cont  deriving Show

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = appK k (Num n)
eval (Var n) env k = appK k (env !! n)
eval (Plus e1 e2) env k = eval e1 env (Cont1 e2 env k)
eval (Lam e) env k = appK k (Clos env e)
eval (App e1 e2) env k =
   eval e1 env (Cont3 e2 env k)

appK :: Cont -> Val -> Val
appK Cont0 v = v
appK (Cont1 e2 env k) (Num i) =
    eval e2 env (Cont2 i k)
appK (Cont2 i k) (Num j) = appK k (Num (i + j))
appK (Cont3 e2 env k) (Clos env' body) =
    eval e2 env (Cont4 body env' k)
appK (Cont4 body env' k) v = 
    eval body (v : env') k

Separating the Evaluator from the Virtual Machine

The functions eval and appK are mutually recursive. Later, however, appK is to evolve into a part of the virtual machine, and eval into the compiler. We do not want to mix the two stages. The next task is therefore to separate them.

In the definition of appK, the function eval is called in three places. In all these cases, eval is applied to an expression captured in Cont. Rather than performing the application in appK, we can also make it happen earlier in eval and store the partially applied result in Cont:

eval :: Exp -> Env -> Cont -> Val
eval (Val n) _ k = appK k (Num n)
eval (Var n) env k = appK k (env !! n)
eval (Plus e1 e2) env k = eval e1 env (Cont1 (eval e2) env k)
eval (Lam e) env k = appK k (Clos env (eval e))
eval (App e1 e2) env k =
   eval e1 env (Cont3 (eval e2) env k)

appK :: Cont -> Val -> Val
appK Cont0 v = v
appK (Cont1 c2 env k) (Num i) = 
    c2 env (Cont2 i k)
appK (Cont2 i k) (Num j) = appK k (Num (i + j))
appK (Cont3 c2 env k) (Clos env' cb) =
    c2 env (Cont4 cb env' k)
appK (Cont4 cb env' k) v = 
    cb (v : env') k 

The datatypes Cont and Val no longer store expressions, but partially applied instances of eval. Define:

type Compt = Env -> Cont -> Val

The datatypes are updated to:

data Val = Num Int
         | Clos Env Compt

data Cont = Cont0
          | Cont1 Compt Env Cont
          | Cont2 Int Cont
          | Cont3 Compt Env Cont     
          | Cont4 Compt Env Cont

Notice that while Env is a data stack, Cont, having a list-like structure, acts as a control stack. Constructors Cont1, Cont3, and Cont4 store a pointer (Env) into the data stack. What about Compt? As we will soon see, by applying another defunctionalisation, Compt becomes a list of instructions. The pointer in Cont with type Compt can therefore be seen as an instruction pointer.

Defunctionalising the Computation

By defunctionalising, we replace Compt by a first-order data structure, and create a function appC :: Compt -> Env -> Cont -> Val. This time, however, we try to choose more telling names for the introduced instructions. The first two clauses of eval, for example, are split into:

data Compt = Lit Int | Access Int | ...

eval :: Exp -> Compt
eval (Val m) = Lit m
eval (Var n) = Access n

appC :: Compt -> Env -> Cont -> Val
appC (Lit m) env k = appK k (Num m)
appC (Access n) env k = appK k (env !! n)

Examining the more interesting case eval (Plus e1 e2) env k = eval e1 env (Cont1 (eval e2) env k), we realise that we only have to abstract the two calls to eval as parameters:

data Compt = Lit Int | Access Int | Push1 Compt Compt | ...

eval (Plus e1 e2) = Push1 (eval e1) (eval e2)

appC (Push1 is1 is2) env k = appC is1 env (Cont1 is2 env k)

The transformed program is given below. We also rename the constructors of Cont to more telling names, roughly following the naming by Igarashi:

data Cont = Halt
          | EvOpnd Compt Env Cont
          | EvSum Int Cont
          | EvArg Compt Env Cont     
          | EvBody Compt Env Cont  deriving Show

data Compt = Lit Int 
          | Access Int
          | Close Compt
          | Push1 Compt Compt
          | Push2 Compt Compt

eval :: Exp -> Compt
eval (Val m) = Lit m
eval (Var n) = Access n
eval (Plus e1 e2) = Push1 (eval e1) (eval e2)
eval (Lam e) = Close (eval e)
eval (App e1 e2) = Push2 (eval e1) (eval e2)

appC :: Compt -> Env -> Cont -> Val
appC (Lit m) env k = appK k (Num m)
appC (Access n) env k = appK k (env !! n)
appC (Close is) env k = appK k (Clos env is)
appC (Push1 is1 is2) env k = appC is1 env (EvOpnd is2 env k)
appC (Push2 is1 is2) env k = appC is1 env (EvArg is2 env k)

appK :: Cont -> Val -> Val
appK Halt v = v
appK (EvOpnd c2 env k) (Num i) =
    appC c2 env (EvSum i k)
appK (EvSum i k) (Num j) = appK k (Num (i + j))
appK (EvArg c2 env k) (Clos env' cb) =
    appC c2 env (EvBody cb env' k)
appK (EvBody cb env' k) v = 
    appC cb (v : env') k

Now, functions appC and appK together constitute a virtual machine, while eval compiles a program to its machine code Compt. The sample expression (λf a b. f a + f b) ((λc d. c + d) 1) 2 3, for example, is compiled to:


Push2 
  (Push2 
    (Push2 
      (Close 
        (Close
          (Close 
            (Push1 
              (Push2 
                (Access 2)
                (Access 1))
              (Push2 
                (Access 2)
                (Access 0))))))
      (Push2 
        (Close 
          (Close 
            (Push1 
              (Access 0)
              (Access 1))))
        (Lit 1)))
    (Lit 2))
  (Lit 3)
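
Running the machine on this compiled code, that is, evaluating appC (eval example) [] Halt (where example is assumed to be bound to the representation of the sample expression shown earlier), should yield Num 7, matching (1 + 2) + (1 + 3).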

Once you understand how it works, the derived result is not so surprising. Igarashi sort of mentioned that it is not too difficult to add recursion. See his paper for more interesting results, such as extension to multi-level languages — programs that generate programs.

Maximum Segment Sum, Agda Approved

To practise using the Logic.ChainReasoning module in Agda, Josh proved the foldr-fusion theorem, which he learnt in the program derivation lecture at FLOLAC, where we used the maximum segment sum (MSS) problem as the main example. Seeing his proof, I was curious to know how much program derivation I could do in Agda, and tried coding the entire derivation of MSS. I thought it would be a small exercise I could do over the weekend, but ended up spending the entire week.

As a reminder: given a list of (possibly negative) numbers, the MSS problem asks for the maximum sum among all consecutive segments of the list. Typically, the specification is:


  mss = max ○ map⁺ sum ○ segs

where segs = concat⁺ ○ map⁺ inits ○ tails computes all segments of a list.
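
For instance, with the usual inits and tails, the segments of [1, 2] computed this way are [], [1], [1, 2], [], [2], and [] (the empty segment shows up repeatedly).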

A dependent pair is defined by:


  data _∣_ (A : Set) (P : A -> Set) : Set where
    sub : (x : A) -> P x -> A ∣ P

such that sub x p is a pair where the type of the second component p depends on the value of the first component x. The idea is to use a dependent pair to represent a derivation:


mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ?
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ?)

It says that mss-der is a function taking a list x and returning a value of type Val, with the constraint that the value returned must be equal to (max ○ map⁺ sum ○ segs) x.

My wish was to use the interactive mechanism of the Agda Emacs mode to “derive” the parts marked by ?, until we come to an implementation:


mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub RESULT
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ...
           === ...
           === RESULT)

If it works well, we can probably use Agda as a tool for program derivation!

Currently, however, I find it harder to use than expected, perhaps due to my being unfamiliar with the way Agda reports type errors. Nevertheless, Agda does force me to get every detail right. For example, the usual definition of max I would use in a paper would be:


   max = foldr _↑_ -∞

But then I would have to define numbers with lower bound -∞. A sloppy alternative definition:


   max = foldr _↑_ 0

led me to prove a base case 0 ↑ max x ≡ max x, which is not true. That the definition does work in practice relies on the fact that segs always returns the empty list as one of the possible segments. Alternatively, I could define max on non-empty lists only:


  max : List⁺ Val -> Val
  max = foldr⁺ _↑_ id

where List⁺ A is defined by:


  data List⁺ (A : Set) : Set where
    [_]⁺ : A -> List⁺ A
    _::⁺_ : A -> List⁺ A -> List⁺ A

and refine the types of inits, tails, etc, to return non-empty lists. This is the approach I took.

Eventually, I was able to give a derivation of mss this way:


mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ((max ○ scanr _⊗_ ø) x)
       (chain> (max ○ map⁺ sum ○ segs) x 
           === (max ○ map⁺ sum ○ concat⁺ ○ map⁺ inits ○ tails) x 
                   by refl
           === (max ○ concat⁺ ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by cong max (sym (concat⁺-map⁺ ((map⁺ inits ○ tails) x)))
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by max-concat⁺ ((map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x)
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum ○ inits) ○ tails) x 
                   by cong (\xs -> max (map⁺ max xs)) (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (max ○ map⁺ sum ○ inits) ○ tails) x
                   by cong max (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (foldr _⊗_ ø) ○ tails) x
                   by cong max (map⁺-eq max-sum-prefix (tails x)) 
           === (max ○ scanr _⊗_ ø) x  
                   by cong max (scanr-pf x) )
  where _⊕_ : Val -> List⁺ Val -> List⁺ Val 
        a ⊕ y = ø ::⁺ map⁺ (_+_ a) y

        _⊗_ : Val -> Val -> Val
        a ⊗ b = ø ↑ (a + b)

where max-sum-prefix consists of two applications of fold fusion:


max-sum-prefix : (x : List Val) -> max (map⁺ sum (inits x)) ≡ foldr _⊗_ ø x
max-sum-prefix x =
 chain> max (map⁺ sum (inits x))
        === max (foldr _⊕_ [ ø ]⁺ x)
               by cong max (foldr-fusion (map⁺ sum) lemma1 x)
        === foldr _⊗_ ø x
               by foldr-fusion max lemma2 x
 where lemma1 : (a : Val) -> (xs : List⁺ (List Val)) -> 
                          map⁺ sum (ini a xs) ≡ (a ⊕ (map⁺ sum xs))
       lemma1 a xs = ?
       lemma2 : (a : Val) -> (x : List⁺ Val) ->
                               max (a ⊕ x) ≡ (a ⊗ max x)
       lemma2 a x = ?

The two lemmas are given in the files attached below. Having the derivation, mss is given by:


mss : List Val -> Val
mss x = depfst (mss-der x)

It is an executable program that is proved correct.
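Here depfst is the first projection of the dependent pair, whose definition is the evident one (shown here for completeness):

  depfst : {A : Set} {P : A -> Set} -> A ∣ P -> A
  depfst (sub x p) = x

The second component, the proof, is simply discarded when we run the program.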

The complete Agda program is split into five files:

  • MSS.agda: the main program importing all the sub-modules.
  • ListProperties.agda: some properties I need about lists, such as fold fusion, concat ○ map (map f) = map f ○ concat, etc. Later in the development I realised that I should switch to non-empty lists, so not all of the properties here are used.
  • List+.agda: non-empty lists and some of its properties.
  • Derivation.agda: the main derivation of MSS. The derivation is parameterised over any numerical data with operators + and ↑ such that + is associative and distributes over ↑, that is, a + (b ↑ c) = (a + b) ↑ (a + c); a sketch of such a module header follows this list. The reason for this parameterisation, however, was that I did not know how to prove the properties above, until Josh showed me the proof.
  • IntRNG.agda: proofs from Josh that Data.Int actually satisfies the properties above. (Not quite complete yet.)
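
The parameterisation might look like the following module header, sketched as an inner parameterised module so that _≡_ is in scope. The names are my guesses (the actual header is in Derivation.agda); I also include a constant ø, the identity of +, which the derivation uses as the sum of the empty segment:

  module Derivation
      (Val : Set)
      (_+_ _↑_ : Val -> Val -> Val)
      (ø : Val)
      (+-assoc : (a b c : Val) -> ((a + b) + c) ≡ (a + (b + c)))
      (+-dist-↑ : (a b c : Val) -> (a + (b ↑ c)) ≡ ((a + b) ↑ (a + c)))
    where
    -- the derivation is then carried out in terms of Val, _+_, _↑_, and ø only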

The Pruning Theorem: Thinning Based on a Loose Notion of Monotonicity

The reason I studied the thinning theorem again is that I needed a slightly generalised variation, and the following seems to be what I need. The general idea and the term “pruning” emerged from discussions with Sharon Curtis. The term “lax preorder” is my own invention; I am not good at naming, and welcome suggestions for better names.

The notation below is mostly taken from the book Algebra of Programming. Not many people, even among functional programmers, are familiar with these notations involving relational intersection, division, and the like, but one starts to appreciate their benefits once one gets used to their calculation rules. Most of the time when I was doing the proof, I was merely manipulating the symbols; I could not have managed the complexity if I had to fall back on the semantics and think about what everything “means” all the time.

A relation Q :: PA ← A, relating sets of A's to elements of A, is called a lax preorder if it is

  1. reflexive in the sense that ∋ ⊆ Q, and
  2. transitive in the sense that (Q/∋) . Q ⊆ Q.

A relation S :: A ← FA is monotonic on a lax preorder Q if S . FQ˘ ⊆ Q˘ . Λ(S . F∈).

Given a lax preorder, we define:

prune Q = ∈\∈ ∩ Q/∋

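Pointwise, xs (prune Q) ys holds precisely when xs ⊆ ys (the ∈\∈ conjunct) and xs Q b for every b in ys (the Q/∋ conjunct): prune Q shrinks a set while remaining Q-related to everything in the original set.
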
The definition induces the universal property:

X ⊆ prune Q . ΛS   ≡    ∈ . X ⊆ S   ⋀   X . S˘ ⊆ Q

Any preorder R induces a lax preorder ∋ . R. If a relation S is monotonic on a preorder R, it is monotonic on the lax preorder ∋ . R. Furthermore, prune (∋ . R) = thin R. Therefore, pruning is a generalisation of thinning. We need the notion of lax preorders because, for some problems, the generating relation S is monotonic on a lax preorder, but not on a preorder.
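To check that ∋ . R is indeed a lax preorder: reflexivity ∋ ⊆ ∋ . R follows from id ⊆ R, and transitivity follows from ((∋ . R)/∋) . ∋ . R ⊆ ∋ . R . R ⊆ ∋ . R, by division and transitivity of R. The equation prune (∋ . R) = thin R is immediate, since both sides unfold to ∈\∈ ∩ (∋ . R)/∋.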

Theorem: if S is monotonic on lax preorder Q, we have:

fold (prune Q . Λ(S . F∈)) ⊆ prune Q . Λ(fold S)

Proof. Since Λ(fold S) = fold (Λ(S . F∈)), by fold fusion, the theorem holds if

prune Q . Λ(S . F∈) . F(prune Q) ⊆ prune Q . Λ(S . F∈)

By the universal property of prune, the above is equivalent to:

∈ . prune Q . Λ(S . F∈) . F(prune Q) ⊆ S . F∈   ⋀
prune Q . Λ(S . F∈) . F(prune Q) . (S . F∈)˘ ⊆ Q

The first inclusion is proved by:

     ∈ . prune Q . Λ(S . F∈) . F(prune Q)
⊆     { since prune Q ⊆ ∈\∈ }
     ∈ . ∈\∈ . Λ(S . F∈) . F(prune Q)
⊆     { division }
     ∈ . Λ(S . F∈) . F(prune Q)
=     { power transpose }
     S . F∈ . F(prune Q)
⊆     { since prune Q ⊆ ∈\∈ }
     S . F∈ . F(∈\∈)
⊆     { division }
     S . F∈

And the second by:

     prune Q . Λ(S . F∈) . F(prune Q) . (S . F∈)˘
⊆     { since prune Q ⊆ Q/∋, converse }
     prune Q . Λ(S . F∈) . F(Q/∋) . F∋ . S˘
⊆     { functor, division }
     prune Q . Λ(S . F∈) . FQ . S˘
⊆     { monotonicity: FQ . S˘ ⊆ Λ(S . F∈)˘ . Q }
     prune Q . Λ(S . F∈) . Λ(S . F∈)˘ . Q
⊆     { since Λ(S . F∈) is a function, that is, f . f˘ ⊆ id }
     prune Q . Q
⊆     { since prune Q ⊆ Q/∋, division }
     Q/∋ . Q
⊆     { since Q transitive }
     Q

Endproof.

The proof above uses transitivity of Q but not reflexivity. I need reflexivity to construct base cases, for example, to come up with this specialised Pruning Theorem for lists:

foldr (prune Q . Λ(S . (id × ∈))) {e} ⊆ prune Q . Λ(foldr S e)

if S . (id × Q˘) ⊆ Q˘ . Λ(S . (id × ∈)).
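
Roughly speaking, the base case requires {e} (prune Q) {e}: the ∈\∈ conjunct holds trivially, while the Q/∋ conjunct demands {e} Q e, an instance of reflexivity ∋ ⊆ Q.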

Proving the Thinning Theorem by Fold Fusion

Algebra of Programming records proofs of the greedy and the thinning theorems that are slightly shorter than proofs using fold fusion. Of course, one can still use fold fusion: in fact, proving them by fold fusion is among the exercises in Chapter 8 (PDF) of Algebraic and Coalgebraic Methods in the Mathematics of Program Construction, of which I am listed among the authors.

A while ago, when I needed to consider some variations of the thinning theorem, I tried to do the proof again. And, horrifyingly, I could not do it anymore! Had my skills become rusty from lack of practice in the past few years? In a panic, I spent an entire afternoon fighting with it, until I realised that it was just a typical copying error from the very beginning: when I copied a property from the book, I put in an extra Λ. Then I trapped myself in the maze of expanding ΛR into ∈\R ∩ (R\∈) and using the modular law and ….

Having fixed the error, I got my trivial and easy proof back. Anyway, I am going to record it below, in case I run into the same panic again.

Given a preorder Q, the relation thin Q is defined by:

thin Q = ∈\∈ ∩ (∋ . Q)/∋

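Pointwise, xs (thin Q) ys holds when xs ⊆ ys and every b in ys is dominated by some a in xs with a Q b. Compare with prune Q defined earlier: there the whole of xs had to be Q-related to each b; here it suffices that some member of xs accounts for b.
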
The definition induces the universal property:

X ⊆ thin Q . ΛS   ≡    ∈ . X ⊆ S   ⋀   X . S˘ ⊆ ∋ . Q

And here are some basic properties we will make use of later:

(R . S)˘ = S˘ . R˘,    (R\S)˘ = S˘/R˘      (converse)
∈ . ΛS = S       (power transpose)
ΛR . R˘ ⊆ ∋
R . R\S ⊆ S,       R/S . S ⊆ R       (division)

The Thinning Theorem

The thinning theorem says:
Theorem: if S is monotonic on a preorder Q, that is, S . FQ˘ ⊆ Q˘ . S, we have:

fold (thin Q . Λ(S . F∈)) ⊆ thin Q . Λ(fold S)

Proof. By fold fusion, the theorem holds if

thin Q . Λ(S . F∈) . F(thin Q) ⊆ thin Q . Λ(S . F∈)

By the universal property of thin, the above inclusion is equivalent to

∈ . thin Q . Λ(S . F∈) . F(thin Q) ⊆ S . F∈  ⋀
thin Q . Λ(S . F∈) . F(thin Q) . (S . F∈)˘ ⊆ ∋ . Q

The first inclusion is proved by:

    ∈ . thin Q . Λ(S . F∈) . F(thin Q)
⊆ { since thin Q ⊆ ∈\∈ }
    ∈ . ∈\∈ . Λ(S . F∈) . F(thin Q)
⊆ { division }
    ∈ . Λ(S . F∈) . F(thin Q)
= { power transpose }
    S . F∈ . F(thin Q)
⊆ { since thin Q ⊆ ∈\∈ }
    S . F∈ . F(∈\∈)
⊆ { division }
    S . F∈

And the second by:

    thin Q . Λ(S . F∈) . F(thin Q) . (S . F∈)˘
⊆ { since thin Q ⊆ (∋ . Q)/∋, converse }
    thin Q . Λ(S . F∈) . F((∋ . Q)/∋) . F∋ . S˘
⊆ { functor, division }
    thin Q . Λ(S . F∈) . F(∋ . Q) . S˘
⊆ { monotonicity: FQ . S˘ ⊆ S˘ . Q }
    thin Q . Λ(S . F∈) . F∋ . S˘ . Q
⊆ { since F∋ . S˘ = (S . F∈)˘ and ΛR . R˘ ⊆ ∋ }
    thin Q . ∋ . Q
⊆ { since thin Q ⊆ (∋ . Q)/∋, division }
    ∋ . Q . Q
⊆ { since Q transitive }
    ∋ . Q

Endproof.

By the way, the variation of the thinning theorem I need is “fold (thin Q . Λ(S . F∈)) ⊆ thin Q . Λ(fold S) if S . F(Q˘ . ∈) ⊆ Q˘ . S . F∈”, whose proof is, luckily, a trivial adaptation once the original proof is written down.