# The Windowing Technique for Longest Segment Problems

In the previous post we reviewed Hans Zantema’s algorithm for solving longest segment problems with suffix and overlap-closed predicates. For predicates that are not overlap-closed, Zantema derived a so-called “windowing” technique, which will be the topic of this post.

A brief review: the longest segment problem takes the form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is an abbreviation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

A predicate p is suffix-closed if p (xs ⧺ ys) ⇒ p ys. For suffix-closed p, Zantema proposed a technique that, from a high-level point of view, looks just like the right solution to such problems. We scan through the input list using a foldr from the right to the left, during which we try to maintain the longest segment satisfying p so far. Also, we keep a prefix of the list that is as long as the currently longest segment, which we call the window.

If, when we move one element to the right, the window (now one element longer than the currently longest segment) happens to satisfy p, it becomes the new optimal solution. Otherwise we drop the right-most element of the window so that it slides leftwards, retaining the length. Notice that it implies that we’d better represent the window using a queue, so we can efficiently add elements from the left and drop from the right.

Derivation of the algorithm is a typical case of tupling.

### Tupling

Given a function h, we attempt to compute it efficiently by turning it into a foldr. It would be possible if the value of the inductive case h (x : xs) were determined solely by x and h xs, that is:

h (x : xs) = f x (h xs)

for some f. With some investigation, however, it would turn out that h (x : xs) also depends on some g:

h (x : xs) = f x (g (x : xs)) (h xs)

Therefore, we instead try to construct their split ⟨ h , g ⟩ as a fold, where the split is defined by:

⟨ h , g ⟩ xs = (h xs, g xs)

and h = fst . ⟨ h , g ⟩.

If ⟨ h , g ⟩ is indeed a fold, it should scan through the list and construct a pair of a h-value and a g-value. To make it feasible, it is then hoped that g (x : xs) can be determined by g xs and h xs. Otherwise, we may have to repeat the process again, making the fold return a triple.

### Segment/Prefix Decomposition

Let us look into the longest segment problem. For suffix-closed p it is reasonable to assume that p [] is true — otherwise p would be false everywhere. Therefore, for the base case we have max# ∘ p ◁ ∘ segs ▪ [] = []. We denote function application by to avoid too many parentheses.

Now the inductive case. It is not hard to derive an alternative definition of segs:

segs [] = [[]]
segs (x : xs) = inits (x:xs) ⧺ segs xs

therefore, we derive:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)

where xs ↑# ys returns the longer one between xs and ys.

It suggests that we maintain, by a foldr, a pair containing the longest segment and the longest prefix satisfying p (that is, max# ∘ p ◁ ∘ inits). It is then hoped that max# ∘ p ◁ ∘ inits ▪ (x : xs) can be computed using max# ∘ p ◁ ∘ inits ▪ xs. And luckily, it is indeed the case, implied by the following proposition proved in an earlier post:

Proposition 1: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) = finits (max# ∘ p ◁ ∘ inits ▪ xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys).

Proposition 1 says that the list (or set) of all the prefixes of x : xs that satisfies p can be computed by the longest prefix of xs (call it ys) satisfying p, provided that p is suffix-closed. A naive way to do so is simply by computing all the prefixes of x : ys and do the filtering again, as is done in finits.

This was the route taken in the previous post. It would turn out, however, to come up with an efficient implementation of f we need some more properties from p, such as that it is also overlap-closed.

### The “Window”

Proposition 1 can be strengthened: to compute all the prefixes of x : xs that satisfies p using finits we do not strictly have to start with ys. Any prefix of xs longer than ys will do.

Proposition 2: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) = finits (take i xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys), and i ≥ length ∘ max# ∘ p ◁ ∘ inits ▪ xs.

In particular, we may choose i to be the length of the longest segment:

Lemma 1:

length ∘ max# ∘ p ◁ ∘ segs ▪ xs ≥
length ∘ max# ∘ p ◁ ∘ inits ▪ xs

Appealing to intuition, Lemma 1 is true because segs xs is a superset of inits xs.

Remark: Zantema proved Proposition 1 by contradiction. The purpose of an earlier post was to give a constructive proof of Proposition 1, which was considerably harder than I expected. I’d be interested to see a constructive proof of Proposition 2.

Now we resume the reasoning:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)
=   { Proposition 2 and Lemma 1 }
let s = max# ∘ p ◁ ∘ segs ▪ xs
in (max# ∘ finits ▪ (x : take (length s) xs)) ↑# s

Define window xs = take (length ∘ max# ∘ p ◁ ∘ segs ▪ xs) xs, the reasoning above suggest that we may try the following tupling:

max# ∘ p ◁ ∘ segs
= fst ∘ ⟨ max# ∘ p ◁ ∘ segs , window ⟩

### Maintaining the Longest Segment and the Window

The task now is to express ⟨ max# ∘ p ◁ ∘ segs , window ⟩ as a foldr. We can do so only if both max# ∘ p ◁ ∘ segs ▪ (x : xs) and window (x : xs) can be determined by max# ∘ p ◁ ∘ segs ▪ xs and window xs. Let us see whether it is the case.

#### Maintaining the Longest Segment

Regarding max# ∘ p ◁ ∘ segs ▪ (x : xs), we have derived:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
=   { as shown above, let s = max# ∘ p ◁ ∘ segs ▪ xs }
(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s

Let s = max# ∘ p ◁ ∘ segs ▪ xs. We consider two cases:

1. Case p (x : take (length s) xs). We reason:
(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s
=   { see below }
(x : take (length s) xs) ↑# s
=   { since the LHS is one element longer than the RHS }
x : take (length s) xs
=   { definition of window }
x : window xs

The first step is correct because, for all zs, p zs implies that max# ∘ p ◁ ∘ inits ▪ zs = zs.

2. Case ¬ p (x : take (length s) xs). In this case (max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s must be s, since ¬ p zs implies that length∘ max# ∘ p ◁ ∘ inits ▪ zs < length zs.

#### Maintaining the Window

Now consider the window. Also, we do a case analysis:

1. Case p (x : take (length s) xs). We reason:
window (x : xs)
= take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
=   { by the reasoning above }
take (length (x : take (length s) xs)) (x : xs)
=   { take and length }
x : take (length (take (length s)) xs) xs
=   { take and length }
x : take (length s) xs
= x : window xs

2. Case ¬ p (x : take (length s) xs). We reason:
window (x : xs)
= take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
=   { by the reasoning above }
take (length s) (x : xs)
=   { take and length }
x : take (length (s-1)) xs
= x: init (window xs)

#### The Algorithm

In summary, the reasoning above shows that

⟨ max# ∘ p ◁ ∘ segs , window ⟩ = foldr step ([], [])

where step is given by

step x (s, w) | p (x : w) = (x : w, x : w)
| otherwise = (s, x : init w)

As is typical of many program derivations, after much work we deliver an algorithm that appears to be rather simple. The key invariants that made the algorithm correct, such as that s is the optimal segment and that w is as long as s, are all left implicit. It would be hard to prove the correctness of the algorithm without these clues.

We are not quite done yet. The window w had better be implemented using a queue, so that init w can be performed efficiently. The algorithm then runs in time linear to the length of the input list, provided that p (x : w) can be performed in constant time -- which is usually not the case for interesting predicates. We may then again tuple the fold with some information that helps to compute p efficiently. But I shall stop here.

# Longest Segment Satisfying Suffix and Overlap-Closed Predicates

I spent most of the week preparing for the lecture on Monday, in which we plan to talk about segment problems. One of the things we would like to do is to translate the derivations in Hans Zantema’s Longest Segment Problems to Bird-Meertens style. Here is a summary of the part I managed to do this week.

Zantema’s paper considered problems of this form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is a shorter notation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

Of course, we have to assume certain properties from the predicate p. A predicate p is:

• suffix-closed, if p (xs ⧺ ys) ⇒ p ys;
• overlap-closed, if p (xs ⧺ ys) ∧ p (ys ⧺ zs) ∧ ys ≠ [] ⇒ p~(xs ⧺ ys ⧺ zs).

For example, ascending is suffix and overlapping-closed, while p xs = (all (0 ≤) xs) ∧ (sum xs ≤ C) for some constant C is suffix-closed but not overlap-closed. Note that for suffix-closed p, it is reasonable to assume that p [] is true, otherwise p would be false everywhere. It also saves us some trouble being sure that max# is always applied to a non-empty set.

I denote function application by an infix operator that binds looser than function composition but tighter than other binary operators. Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

### Prefix/Suffix Decomposition

Let us begin with the usual prefix/suffix decomposition:

max# ∘ p ◁ ∘ segs
= max# ∘ p ◁ ∘ concat ∘ map inits ∘ tails
= max# ∘ concat ∘ map (p ◁) ∘ map inits ∘ tails
= max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Like what we do with the classical maximum segment sum, if we can somehow turn max# ∘ p ◁ ∘ inits into a fold, we can then implement map (foldr ...) ∘ tails by a scanr. Let us denote max# ∘ p ◁ ∘ inits by mpi.

If you believe in structural recursion, you may attempt to fuse map# ∘ p ◁ into inits by fold-fusion. Unfortunately, it does not work this way! In the fold-fusion theorem:

h ∘ foldr f e = foldr g (h e)   ⇐   h (f x y) = g x (h y)

notice that x and y are universally quantified, which is too strong for this case. Many properties we have, to be shown later, do need information from the context — e.g. some properties are true only if y is the result of inits.

### Trimming Unneeded Prefixes

One of the crucial properties we need is the following:

Proposition 1: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) =
p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)

For some intuition, let x = 1 and xs = [2,3,4]. The left-hand side first compute all prefixes of xs:

[] [2] [2,3] [2,3,4]

before filtering them. Let us assume that only [] and [2,3] pass the check p. We then pick the longest one, [2,3], cons it with 1, and compute all its prefixes:

[] [1] [1,2] [1,2,3]

before filtering them with p again.

The right-hand side, on the other hand, performs filtering on all prefixes of [1,2,3,4]. However, the proposition says that it is the same as the left-hand side — filtering on the prefixes of [1,2,3] only. We lose nothing if we drop [1,2,3,4]. Indeed, since p is suffix-closed, if p [1,2,3,4] were true, p [2,3,4] would have been true on the right-hand side.

Proof of Proposition 1 was the topic of a previous post. The proposition is useful for us because:

mpi (x : xs)
= max# ∘ p ◁ ∘ inits ▪ (x : xs)
=    { Proposition 1 }
max# ∘ p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)
= mpi (x : mpi xs)

Therefore mpi is a fold!

mpi = foldr (λx ys → mpi (x : ys)) []

### Refining the Step Function

We still have to refine the step function λx ys → mpi (x : ys) to something more efficient. Luckily, for overlap-closed p, mpi (x : ys) is either [], [x], or x : ys — if ys is the result of mpi.

Proposition 2: If p is overlap-closed, mpi (x : mpi xs) = x ⊙ xs, where is defined by:

x ⊙ ys | p (x : xs) = x : ys
| p [x] = [x]
| otherwise = []

To see why Proposition 2 is true, consider mpi (x : mpi xs).

• If mpi (x : mpi xs) = [], we are done.
• Otherwise it must be x : zs for some zs ∈ inits (mpi xs). And we have p~(x : zs) because it is a result of mpi. Again consider two cases:
• If zs = [], both sides reduce to [x], otherwise…
• Let us not forget that p (mpi xs) must be true. Also, since zs ∈ inits (mpi xs), we have mpi xs = zs ⧺ ws for some ws. Together that implies p (x : zs ⧺ ws) = p (x : mpi xs) must be true.

Notice that the reasoning above (from Zantema) is a proof-by-contradiction. I do not yet know how hard it is to build a constructive proof.

With Proposition 1 and 2 we have turned mpi to a fold. That leads to the derivation:

max# ∘ p ◁ ∘ segs
=   { derivation above }
max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (foldr (⊙) []) ∘ tails
= max# ∘ scanr (⊙) []

with the definition of given above. It turns out to be a rather simple algorithm: we scan through the list, and in each step we choose among three outcomes: [], [x], and x : ys. Like the maximum segment sum problem, it is a simple algorithm whose correctness is that that easy to justify.

The algorithm would be linear-time if can be computed in constant-time. With the presence of p in , however, it is unlikely the case.

### Efficient Testing

So let us compute, during the fold, something that allows p to be determined efficiently. Assume that there exists some φ :: [A] → B that is a fold (φ = foldr ⊗ ι for some and ι), such that p (x : xs) = p xs ∧ f x (φ xs) for some f. Some example choices of φ and f:

• p = ascending. We may pick:

φ xs = if null xs then Nothing else Just (head xs)
f x Nothing = true
f x (Just y) = x ≤ y
• p xs = all elements in xs equal modolu 3. We may pick:
φ xs = if null xs then Nothing else Just (head xs `mod` 3)
f x Nothing = true
f x (Just y) = x `mod`3 == y

Let us tuple mpi with φ, and turn them into one fold. Let ⟨ f , g ⟩ x = (f x, g x), we derive:

max# ∘ p ◁ ∘ inits
=   { f = fst ∘  ⟨ f , g ⟩, see below}
fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩
= fst ∘ foldr step ([], ι)

where step is given by

step x (xs, b)
| f x b = (x : xs , x ⊗ b)
| f x ι = ([x], x ⊗ ι)
| otherwise = ([], ι)

Notice that the property f = fst ∘ ⟨ f , g ⟩ is true when the domain of f is in the domain of g, in particular, when they are both total, which again shows why we prefer to work in a semantics with total functions only.

Let us restart the main derivation again, this time use the tupling:

max# ∘ p ◁ ∘ segs
= max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩) ∘ tails
=   { since map# ∘ map fst = fst ∘ map#', see below}
fst ∘ max#' ∘ map ⟨ max# ∘ p ◁ ∘ inits , φ ⟩ ∘ tails
=   { derivation above }
fst ∘ max#' ∘ map (foldr step ([], ι) ∘ tails
= fst ∘ max#' ∘ scanr step ([], ι)

where map#' compares the lengths of the first components. This is a linear-time algorithm.

### Next… Windowing?

What if p is not overlap-closed? Zantema used a technique called windowing, which I will defer to next time…

# On a Basic Property for the Longest Prefix Problem

In the Software Construction course next week we will, inevitably, talk about maximum segment sum. A natural next step is to continue with the theme of segment problems, which doesn’t feel complete without mentioning Hans Zantema’s Longest Segment Problems.

The paper deals with problem of this form:

ls = max# ∘ p ◁ ∘ segs

That is, computing the longest consecutive segment of the input list that satisfies predicate p. When writing on paper I found it much easier denoting filter p by the Bird-Meertens style p ◁ and I will use the latter for this post too. The function segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list, and max# :: [[a]] → [a] returns the longest list from the input list of lists. To avoid long nesting parenthesis, I denote function application by an infix operator that binds looser than function composition . Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

Standard transformation turns the specification to the form

ls = max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Therefore we may solve ls if we manage to solve its sub-problem on prefixes:

lp = max# ∘ p ◁ ∘ inits

that is, computing the longest prefix of the input list satisfying predicate p. One of the key propositions in the paper says:

Proposition 1: If p is suffix-closed (that is, p (x ⧺ y) ⇒ p y), we have:

p ◁ ∘ inits ▪ (a : x) =
p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

It is useful because, by composing max# on both sides we get

lp (a : x) = max# ∘ p ◁ ∘ inits ▪ (a : lp x)

that is, lp can be computed by a foldr.

Of course, we are not quite done yet. We then have to somehow simplify p ◁ ∘ inits ▪ (a : lp x) to something more efficient. Before we move on, however, proving Proposition 1 turns out to be an interesting challenge in itself.

### Intuition

What does Proposition 1 actually say? Let x = [1,2,3] and a = 0. On the left-hand side, we are performing p ◁ on

[] [0] [0,1] [0,1,2] [0,1,2,3]

The right hand side says that we may first filter the tails of [1,2,3]:

[] [1] [1,2] [1,2,3]

Assuming that only [] and [1,2] get chosen. We may then keep the longest prefix [1,2] only, generate all its prefixes (which would be [] [1] [1,2]), and filter the latter again. In other words, we lost no information dropping [1,2,3] if it fails predicate p, since by suffix-closure, p ([0] ⧺ [1,2,3]) ⇒ p [1,2,3]. If [1,2,3] doesn’t pass p, p [0,1,2,3] cannot be true either.

Zantema has a nice and brief proof of Proposition 1 by contradiction. However, the theme of this course has mainly focused on proof by induction and, to keep the possibility of someday encoding our derivations in tools like Coq or Agda, we would like to have a constructive proof.

So, is it possible to prove Proposition 1 in a constructive manner?

### The Proof

I managed to come up with a proof. I’d be happy to know if there is a better way, however.

For brevity, I denote if p then x else y by p → x; y. Also, define

a ⊕p x = p a → a : x ; x

Therefore p ◁ is defined by

p ◁ = foldr ⊕p []

Here comes the the main proof:

Proposition 1

p ◁ ∘ inits ▪ (a : x) = p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

if p is suffix-closed.
Proof.

=      { definition of inits }
p ◁ ([] : map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { definition of p ◁ }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
[] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 2 }
[] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
=      { definition of p ◁ }
p ◁ ([] : map (a :) ∘ inits ▪ x)
=      { definition of inits }
p ◁ ∘ inits ▪ (a : x)

The main proof refers to two “decomposition” lemmas, both are of the form f ∘ g = f ∘ g ∘ f:

• Lemma 1: p ◁ ∘ map (a:) = p ◁ ∘ map (a:) ∘ p ◁ if p suffix-closed.
• Lemma 2: p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits for all predicate p.

Both are proved by structural induction. For Lemma 1 we need the conditional distribution rule:

f (p →  x; y) = (p →  f x; f y)

If we are working in CPO we need the side condition that f is strict, which is true for the cases below anyway:
Lemma 1

p ◁ ∘ map (a:) =  p ◁ ∘ map (a:) ∘ p ◁

if p is suffix-closed.
Proof. Structural induction on the input.
Case []: trivial.
Case (x : xs):

p ◁ ∘ map (a:) ∘ p ◁ ▪ (x : xs)
=   { definition of p ◁ }
p ◁ ∘ map (a:) ▪ (p x →  x : p ◁ xs ; p ◁ xs)
=   { map distributes into conditionals }
p ◁ ▪ (p x → (a : x) : map (a :) ∘ p ◁ ▪ xs ; map (a :) ∘ p ◁ ▪ xs)
=   { p ◁ distributes into conditionals }
p x → p ◁ ((a : x) : map (a :) ∘ p ◁ ▪ xs) ;
p ◁ ∘ map (a :) .p ◁ ▪ xs
=   { definition of p ◁ }
p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ∘ p ◁ ▪ xs) ;
p ◁ ∘ map (a :) ∘ p ◁ ▪ xs) ;
p ◁ ∘ map (a :) ∘ p ◁ ▪ xs
=   { induction }
p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs) ;
p ◁ ∘ map (a :) ▪ xs) ;
p ◁ ∘ map (a :) ▪ xs
=   { since p (a : x) ⇒ p x by suffix closure }
p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs) ;
p ◁ ∘ map (a :) ▪ xs
=   { definition of p ◁ }
p ◁ ((a : x) : map (a :) xs)
=   { definition of map }
p ◁ ∘ map (a :) ▪ (x : xs)

For Lemma 2, it is important that p is universally quantified. We need the following map-filter exchange rule:

p ◁ ∘ map (a :) =  map (a :) ∘ (p ∘ (a:)) ◁

The proof goes:
Lemma 2 For all predicate p we have

p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits

Proof. Structural induction on the input.
Case []: trivial.
Case (a : x):

p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ (a : x)
= p ◁ ∘ inits ∘ max# ∘ p ◁ ▪ ([] : map (a :) (inits x))

Consider two cases:
1. Case p [] ∧ null (p ◁ ∘ map (a :) ∘ inits ▪ x):
If ¬ p [], both sides are undefined. Otherwise:

...
= p ◁ ∘ inits ∘ max# ▪ []
= []
= p ◁ ▪ ([] : p ◁ ∘ map (a : ) ∘ inits ▪ x)
= p ◁ ∘ inits ▪ (a : x)

2. Case ¬ (null (p ◁ ∘ map (a :) ∘ inits ▪ x)):

...
= p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ map (a :) ∘ inits ▪ x
=   { map-filter exchange }
p ◁ ∘ inits ∘ max# ∘ map (a :) ∘ (p ∘ (a:)) ◁ ∘ inits ▪ x
=   { since  max# ∘ map (a :) =  (a :) ∘ max# }
p ◁ ∘ inits ∘ (a :) ∘ max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x
=   { definition of inits }
p ◁ ([] : map (a :) ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { definition of p ◁ }
p ⊕p (p ◁ ∘ map (a :) ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { map-filter exchange }
p ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { induction }
p ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { map-filter exchange }
p ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
=   { definition of p ◁ }
p ◁ ( [] :  map (a :) ∘ inits ▪ x)
=   { definition of inits }
p ◁ ∘ inits ▪ (a : x)

# No Inverses for Injective but Non-Surjective Functions?

“I cannot prove that if a function is injective, it has an inverse,” Hideki Hashimoto posed this question to me a while ago. It turned out that this was not the property he really wanted, but it got me into thinking: is it possible at all?

### Preliminaries

Let us start with some basic definitions. A relation from A to B is denoted by the wavy arrow:

_↝_ : Set → Set → Set1
A ↝ B = A → B → Set

Given a relation R : A ↝ B we can always take its converse R ˘ : B ↝ A, and a function can be lifted to a relation by fun:

_˘ : ∀ {A B} → (A ↝ B) → (B ↝ A)
(R ˘) b a = R a b

fun : ∀ {A B} → (A → B) → (A ↝ B)
fun f a b = f a ≡ b

A relation R : A ↝ B is simple if it does not map one input to multiple outputs. It is entire if everything in A is mapped to something in B — a more familiar word may be “total”.

simple : ∀ {A B} → (A ↝ B) → Set
simple R = ∀ {a b₁ b₂} →  R a b₁ → R a b₂ → b₁ ≡ b₂

entire : ∀ {A B} → (A ↝ B) → Set
entire R = ∀ a → ∃ (λ b → R a b)

A function is a relation that is simple and entire. Indeed, one can show that fun f is simple and entire for every f. Injectivity and surjectivity are similar notions defined for converse of R:

injective : ∀ {A B} → (A ↝ B) → Set
injective R = ∀ {a₁ a₂ b} → R a₁ b → R a₂ b → a₁ ≡ a₂

surjective : ∀ {A B} → (A ↝ B) → Set
surjective R = ∀ b → ∃ (λ a → R a b)

The (constructive variant of the) axiom of choice states that an entire relation A ↝ B can be refined to a function A → B:

ac : ∀ {A B} → (R : A ↝ B) →
(∀ a → ∃ (λ b → R a b)) → ∃ {A → B} (λ f → ∀ a → R a (f a))
ac R R-entire = ((λ a → proj₁ (R-entire a)) , λ a → proj₂ (R-entire a))

See the axiom of choice homepage, or the Stanford Encyclopedia of Philosophy for more information on this axiom.

### Inverting Injective and Surjective Functions

Now, let us restate Hashimoto san’s challenge:

Let fun f : A → B be injective. Prove that f has a left inverse. That is, some f⁻¹ such that f⁻¹ (f a) = a forall a.

It turned out that he forgot a condition: f is also surjective. If f is also (provably) surjective, one can pick some g ⊆ f˘ using the axiom of choice (since f is surjective if and only if is total) and further prove that g ∘ f = id using injectivity:

inv-sur : ∀ {A B} → (f : A → B) →
injective (fun f) → surjective (fun f) →
∃ {B → A} (λ f⁻¹ → (∀ a → f⁻¹ (f a) ≡ a))
inv-sur f f-inj f-sur with ac ((fun f) ˘) f-sur
... | (g , fgb≡b) =  (g , λ a → f-inj {g (f a)} {a} {f a} (fgb≡b (f a)) refl)

Like the proof of the constructive axiom of choice, the proof above does not really do much. The proof of surjectivity of f has already provided, for every b : B, an a : A such that f a ≡ b. So we simply let f⁻¹ return that a.

Can we lift the restriction that f must be surjective? That is, can this be proved?

inv : ∀ {A B} → (f : A → B) → injective (fun f) →
∃ {B → A} (λ f⁻¹ → (∀ a → f⁻¹ (f a) ≡ a))

To make the scenario clear: we have a (total) function f : A → B that is injective but not necessarily surjective. The set B could be “larger” than A in the sense that there could be some elements b : B for which no f a equals b — that is, B may not be “fully covered.” Can we construct f⁻¹ : B → A such that f⁻¹ (f a) ≡ a for all a : A?

At the first glance it did not look like something terribly difficult to do. Given b, if it equals some f a, let f⁻¹ simply return that a. Otherwise f⁻¹ could just map b to any element in A, since this b is not used in any invocation of f⁻¹ (f a) anyway. It should be possible as long as A is not empty, right?

I tried to construct this f⁻¹ but could not help noticing something funny going on. It turns out that had this function existed, we could, again, prove the law of excluded middle. That is, for any predicate P : B → Set and any b : B, there would be a decision procedure telling us whether P b is true or not.

Provided that we assume proof irrelevance, that is.

### Excluded Middle, Again

Here we go. Let B be some type having decidable equality. That is, there exists some eqB:

eqB : (b₁ b₂ : B) → (b₁ ≡ b₂) ⊎ (¬ (b₁ ≡ b₂))

where _⊎_ is disjoint sum.

Now take some predicate P : B → Set. Let A be defined by:

A : (B → Set) → Set
A P = Σ B (λ b → P b)

That is, A P is the subset of B for which P holds. Each element of A P is a pair (b, Pb) where Pb is a proof of P b.

Finally, take

f : ∀ {P} → A P → B
f = proj₁

Thus f (b, Pb) = b.

The function f is injective if we assume proof irrelevance. That is, if f (b, Pb) = b and f (b', Pb') = b, we must have b = b' and (due to proof irrelevance) Pb = Pb', and therefore (b, Pb) = (b', Pb'). Indeed, if we postulate proof irrelevance:

postulate irr : (P : B → Set) → ∀ {b} → (p₁ : P b) → (p₂ : P b) → p₁ ≡ p₂

We can construct a proof that f is injective:

f-inj : ∀ {P} → injective (fun (f {P}))
f-inj {P} {(.b , Pb₁)} {(.b , Pb₂)} {b} refl refl = cong (λ p → (b , p)) (irr P Pb₁ Pb₂)

Assume that we have proved inv. We can now apply inv and obtain some f⁻¹, the left inverse of f.

However, with this particular choice of A, f, and f⁻¹, we can construct a deciding procedure for P. That is, for any P and b, we can determine P b holds or not:

em : {P : B → Set} → ∀ b → P b ⊎ ¬ (P b)

This is how em works. Given some b, let’s apply f⁻¹ to b. The result is a pair (b', Pb'). Let’s compare b and b' using eqB:

em {P} b with inv f (f-inj {P})
...      | (f⁻¹ , f⁻¹fa≡a) with inspect (f⁻¹ b)
...                        | (b' , Pb') with-≡ _          with eqB b b'

If b ≡ b', Pb' is a proof of P b' and also a proof of P b. Let us just return it (after some type casting):

em {P} b | (f⁻¹ , f⁻¹fa≡a) | (b' , Pb') with-≡ _          | inj₁ b≡b'  =
inj₁ (subst P (sym b≡b') Pb')

Consider the case that b does not equal b'. We want to show that P b is not true. That is, a proof of P b leads to contradiction. Assume we have a proof Pb of P b. Since f (b , P b) ≡ b, we have f⁻¹ b ≡ (b , Pb):

em {P} b | (f⁻¹ , f⁻¹fa≡a) | (b' , Pb') with-≡ b'Pb'≡f⁻¹b | inj₂ ¬b≡b' =
inj₂ (λ Pb →
let f⁻¹b≡bPb : f⁻¹ b ≡ (b , Pb)
f⁻¹b≡bPb = f⁻¹fa≡a (b , Pb)

The assumption says that f⁻¹ b = (b' , Pb'). By transitivity we have (b , Pb) ≡ (b' , Pb').

bPb≡b'Pb' : (b , Pb) ≡ (b' , Pb')
bPb≡b'Pb' = sym (trans b'Pb'≡f⁻¹b f⁻¹b≡bPb)

But if we take the first component of both sides, we get b ≡ b'. That contradicts our assumption that b does not equal b':

b≡b' : b ≡ b'
b≡b' = cong proj₁ bPb≡b'Pb'
in ¬b≡b' b≡b')

which completes the proof.

In retrospect, f⁻¹ cannot exist because for it to work, it has to magically know whether b is in the range of f, that is, whether P b is true or not.

### Nakano’s Challenge

When I talked about this to Keisuke Nakano he posed me another related challenge. Set-theoretically, we understand that if there exists an injective function f : A → B and another injective function g : B → A, the sets A and B are of the same cardinality and there ought to be a bijection A → B. Can we construct this bijection? That is, can we prove this theorem?

nakano : {A B : Set} →
(f : A → B) → injective (fun f) →
(g : B → A) → injective (fun g) →
∃ {A → B} (λ h → injective (fun h) × surjective (fun h))

I do not yet have a solution. Any one wanna give it a try?

# Proof Irrelevance, Extensional Equality, and the Excluded Middle

It was perhaps in our first lesson in constructive logic when we learnt about the absence of the law of excluded middle, which in a constructive interpretation would imply a decision procedure for every proposition. Therefore I was puzzled by the fact, stated in a number of places including the Stanford Encyclopedia of Philosophy (SEP), that axiom of choice, proof irrelevance, and extensional equality (definitions to be given later) together entail the law of excluded middle. Since a constructive variant of axiom of choice is provable in intuitionistic logic, and both proof irrelevance and extensional equality are properties we would like to have in a type system, it is worrying that they lead to implausible consequences. Thus I was curious to find out what happened.

The observational type theory promises us the best of everything — extensional equality without losing canonicity (please do implement it in Agda soon!), and it does rely on proof irrelevance. There is even an Agda embedding (or, with the information given it is not hard to reconstruct one), so I conducted some experiments in the embedding. For this post, however, it is sufficient to do everything in plain Agda and postulate the missing properties.

### Decidable Equality for All Types?

The construction in SEP makes use of functions on propositions, which is not available in the Agda embedding of observational type theory. Instead I experimented with another construction from Benjamin Werner‘s On the Strength of Proof-Irrelevant Type Theories: with (a particular interpretations of) axiom of choice and proof irrelevance, one can construct a function that, given a and b of any type A, decides whether a equals b.

This could also lead to horrifying consequences — we could even compare whether two infinite structure, or two functions are equal, in a finite amount of time.

#### Axiom of Choice

The axiom of choice, as described on the very informative homepage for the axiom maintained by Eric Schechter, is considered by some the “last great controversy of mathematics.” The axiom comes in many forms, and one of the simplest could be:

Given a collection of non-empty sets, we can choose a member from each set in that collection.

A set of B is usually represented by a characteristic function B → Set. Let the collection of non-empty sets of B‘s be indexed by A, the collection can be modelled by a function mapping indexes in A to sets of Bs, that is, a relation A → B → Set. The action of “choosing” is modelled by the existence of a function returning the choice, that is, a function taking an index in A and returning a element in B that is chosen. One possible formulation of the axiom of choice would thus be:

ac : {A B : Set} → (R : A → B → Set) →
(∀ x → ∃ (λ y → R x y)) →
(∃ {A → B} (λ f → ∀ x → R x (f x)))

In words: given a collection of sets represented by A → B → Set, if R x is non-empty for every x : A, there exists a function f : A → B, such that f x is in R x for every x : A. Another way to see it is that the axiom simply says we can refine a relation R : A → B → Set to a function A → B provided that the former is total.

It is surprising to some that in constructive logic, the axiom of choice is in fact provable:

ac R g = ((λ x → proj₁ (g x)) , λ x → proj₂ (g x))

The technical reason is that a proof of ∀ x → ∃ (λ y → R x y) contains a witness, which we can just pick as the result of the choice function. I will leave the mathematical and philosophical explanation and implications to better sources, such as the axiom of choice homepage, and the two pages in SEP on the axiom and the axiom in type theory.

#### Proof Irrelevance

We define a small language of propositions about equalities: † A is the language of propositions whose atoms are equalities between elements of type A (_≐_), and between Boolean values (_≘_):

data † (A : Set) : Set where
TT  : † A
FF  : † A
_≐_ : A → A → † A
_≘_ : Bool → Bool → † A
_∧_ : † A → † A → † A
_⇒_ : † A → † A → † A
_∨_ : † A → † A → † A
￢ : ∀ {A} → † A → † A
￢ P = P ⇒ FF

The semantics is more or less what one would expect: TT is the unit type, FF the empty type, conjunction encoded by products, and implication by functions. In particular, disjunction is encoded by the sum type:

⌈_⌉ : ∀ {A} → † A → Set
⌈ TT ⌉ = ⊤
⌈ FF ⌉ = ⊥
⌈ a ≐ b ⌉ = a ≡ b
⌈ a ≘ b ⌉ = a ≡ b
⌈ P ∧ Q ⌉ = ⌈ P ⌉ × ⌈ Q ⌉
⌈ P ⇒ Q ⌉ = ⌈ P ⌉ → ⌈ Q ⌉
⌈ P ∨ Q ⌉ = ⌈ P ⌉ ⊎ ⌈ Q ⌉

We shall define the if-and-only-if relation _⇔_ between terms of † A

_⇔_ : ∀ {A} → † A → † A → Set
P ⇔ Q = ⌈ (P ⇒ Q) ∧ (Q ⇒ P) ⌉

The current example, however, could have worked too if we simply take _⇔_ to be _≡_. Its role will be more significant for the next example.

Proof irrelevance asserts that we do not distinguish between proofs of the same proposition. Let p and q be proofs of P and Q respectively. If P and Q turn out to be equivalent propositions, then p and q must be equal:

postulate irr : ∀ {A} (P Q : † A) → (p : ⌈ P ⌉)(q : ⌈ Q ⌉) → P ⇔ Q → p ≅ q

where _≅_ stands for heterogenous equality, needed here because p and q appear to have different types. Note that in Agda, if we assume uniqueness of identity proof (i.e. refl being the only proof of a ≅ b), the axiom irr holds for TT, FF, _≐_, _≘_, and _∧_, and would be true for _⇒_ if we had extensional equality for functions, but not for disjunction _∨_.

#### Decidable Equality

For a b : A, let oneof a b be the type of things that are either a or b, paired with a proof of equality:

oneof : {A : Set} → (a b : A) → Set
oneof {A} a b = Σ A (λ x → ⌈ (x ≐ a) ∨ (x ≐ b) ⌉)

The relation Ψ a b : oneof a b → Bool → Set relates a z : oneof a b value and a Boolean e if e tells us which value z actually is:

Ψ : {A : Set} → (a b : A) → oneof a b → Bool → Set
Ψ {A} a b z e = ⌈ ((e ≘ false) ∧ (proj₁ z ≐ a)) ∨
((e ≘ true)  ∧ (proj₁ z ≐ b)) ⌉

Certainly Ψ is total: for any z : oneof a b, either true or false satisfies Ψ z. We can prove the totality:

Ψ-total : {A : Set} → (a b : A) →
(z : oneof a b) → ∃ (λ e → Ψ a b z e)
Ψ-total a b (x , inj₁ x≡a) = (false , inj₁ (refl , x≡a))
Ψ-total a b (x , inj₂ x≡b) = (true  , inj₂ (refl , x≡b))

and therefore extract, using the axiom of choice, a function that actually computes the Boolean:

Ψ-fun : {A : Set} → (a b : A) →
∃ {oneof a b → Bool}
(λ f → (z : oneof a b) → Ψ a b z (f z))
Ψ-fun a b = ac (Ψ a b) (Ψ-total a b)

Now we show how to construct a decision procedure for a ≐ b. Given a and b, we inject them to oneof a b and call the result a' and b' respectively:

eq_dec : {A : Set} → (a b : A) → ⌈ (a ≐ b) ∨ (￢ (a ≐ b)) ⌉
eq_dec {A} a b = body
where

a' : oneof a b
a' = (a , inj₁ refl)

b' : oneof a b
b' = (b , inj₂ refl)

In the function body, we extract a choice function f (and also get a proof that f satisfies the relation Ψ) and apply it to both a' and b'. The results could come in four combinations. If f a' is true, whatever f b' is, from the proof of Ψ a b a' (f a') we have a proof of a ≡ b. If f a' and f b' are both false, we also get a proof of b ≡ a:

body : (a ≡ b) ⊎ (¬ (a ≡ b))
body with Ψ-fun a b
... | (f , f⊆Ψ) with f⊆Ψ a' | f⊆Ψ b'
...  | inj₂ (fa'≡tt , a≡b) | _ = inj₁ a≡b
...  | inj₁ (fa'≡ff , a≡a) | inj₁ (fb'≡ff , b≡a) = inj₁ (sym b≡a)
...  | inj₁ (fa'≡ff , a≡a) | inj₂ (fb'≡tt , b≡b) = inj₂ (cont fa'≡ff fb'≡tt)

For the case f a' is false but f b' is true, we call cont to create a contradiction if a ≡ b.

How do we get the contradition? If we can somehow conclude a' ≡ b', by congruence we have f a' ≡ f b'. However, that gives us false ≡ f a' ≡ f b' ≡ true, which contradicts the assumption that false and true are two distinct elements in Bool:

where cont : f a' ≡ false → f b' ≡ true → a ≢ b
cont fa'≡ff fb'≡tt a≡b =
let ....
fa'≡fb' : f a' ≡ f b'
fa'≡fb' = cong f a'≡b'
in let open EqR (PropEq.setoid Bool)
in ff≢tt (begin
false ≈⟨ sym fa'≡ff  ⟩
f a'  ≈⟨ fa'≡fb' ⟩
f b'  ≈⟨ fb'≡tt ⟩
true ∎)

So the task narrows down to showing that a' ≡ b' given a ≡ b, which is where proof irrelevance comes in. Recall that a' is a paired with a proof P : (a ≐ a) ∨ (a ≐ b), and b' is b paired with a proof Q : (b ≐ a) ∨ (b ≐ b). Now that a ≡ b, the two propositions are the same, which by proof irrelevance means that their proofs must be equal too. Here is the actual proof in Agda:

P≡Q : ((a ≐ a) ∨ (a ≐ b)) ≡ ((b ≐ a) ∨ (b ≐ b))
P≡Q = let open EqR (PropEq.setoid († A))
in begin
(a ≐ a) ∨ (a ≐ b)
≈⟨ cong (λ x → (a ≐ a) ∨ (a ≐ x)) (sym a≡b) ⟩
(a ≐ a) ∨ (a ≐ a)
≈⟨ cong (λ x → (x ≐ a) ∨ (x ≐ x)) a≡b ⟩
(b ≐ a) ∨ (b ≐ b)
∎

a'≡b' : a' ≡ b'
a'≡b' = cong-× a≡b
(irr ((a ≐ a) ∨ (a ≐ b)) ((b ≐ a) ∨ (b ≐ b))
(proj₂ a') (proj₂ b') P≡Q)

So, what happened here? When I posed the question to Conor McBride, his first question was “Which variant of the axiom of choice did you use?” (“The one that has a trivial proof,” I answered). The second question was “How did you encode disjunction?” Indeed, disjunction is where we “smuggled” something that should not be there. The two proofs P : (a ≐ a) ∨ (a ≐ b) and Q : (b ≐ a) ∨ (b ≐ b) can be casted to (a ≐ a) ∨ (a ≐ a) when a ≡ b. In Agda, we should have two proofs for (a ≐ a) ∨ (a ≐ a). But proof irrelevance deploys some magic to unify P and Q to one proof only when a ≡ b, which reveals some information we can exploit.

Will we see the same phenomena in observational equality? “That’s why we don’t have disjunction in the propositions of observational type theory!” Conor replied. Or, when we do need disjunction in propositions, it must be done in a safe manner. Perhaps Conor or Thorsten Altenkirch knows how to do that?

### Excluded Middle

The construction in SEP deriving the law of excluded middle works in a similar way but in a different level — we will be reasoning about propositions and predicates. We need one more ingredient: extensional equality. Equality of propositions, as mentioned in the previous section, are based on the if-and-only-if relation _⇔_. Equality of functions, on the other hand, is defined pointwise — two functions are equal if they map (extensionally) equal values to (extensionally) equal results. In particular, equality of predicates (functions that return propositions) is given by:

postulate ext : {A B : Set} → (f g : A → † B) → (∀ a → f a ⇔ g a) → f ≡ g

Let X be the proposition for whose validity we are constructing a decision procedure. We define two predicate constants U X and V X:

U : ∀ {A} → † A → Bool → † A
U X b = X ∨ (b ≘ false)

V : ∀ {A} → † A → Bool → † A
V X b = X ∨ (b ≘ true)

And a type for predicates (on Booleans) that are known to be either U X or V X:

UorV : ∀ A → † A → Set
UorV A X = Σ (Bool → † A) (λ P → ⌈ (P ≐ U X) ∨ (P ≐ V X) ⌉)

Given a predicate P : UorV A X that is either U X or V X, the relation Φ relates P and Boolean b if P (precisely, proj₁ P) is true at b:

Φ :  ∀ {A X} → (P : UorV A X) → Bool → Set
Φ P b = ⌈ proj₁ P b ⌉

Again Φ can be shown to be total, and we may extract a choice function which, given a proposition P : UorV A X, returns a Boolean for which P is true.

Φ-total : ∀ {A X} → (P : UorV A X) → ∃ (λ b → Φ P b)
Φ-fun : ∀ {A X} → ∃ {UorV A X → Bool} (λ f → ∀ P → Φ P (f P))

Now, like in the previous example, we inject U X and V X to UorV A X:

U' : ∀ {A} X → UorV A X
U' X = (U X , inj₁ refl)

V' : ∀ {A} X → UorV A X
V' X = (V X , inj₂ refl)

To determine the validity of X, we apply f to U' X and V' X:

ex_mid : ∀ {A} → (X : † A) → ⌈ X ∨ (￢ X) ⌉
ex_mid {A} X with Φ-fun {X = X}
... | (f , f⊆Φ) with f⊆Φ (U' X) | f⊆Φ (V' X)
...   | inj₁ pX      | _ =  inj₁ pX
...   | _            | inj₁ pX = inj₁ pX
...   | inj₂ fU'≡ff  | inj₂ fV'≡tt = inj₂ negX

If either yields inj₁, we have a proof of X. For the last case, on the other hand, we have that f (U' X) is false and f (V' X) is true, and we try to create a contradiction given a proof of X. Like in the previous example, we do so by claiming U' X ≡ V' X, therefore f (U' X) ≡ f (V' X), and therefore false ≡ true:

where negX : ⌈ (￢ X) ⌉
negX pX = let ...

in ff≢tt (begin
false     ≈⟨ sym fU'≡ff   ⟩
f (U' X)  ≈⟨ fU'≡fV' ⟩
f (V' X)  ≈⟨ fV'≡tt ⟩
true ∎)

But this time we must first show that U X ≡ V X. Expending the definitions, we have U X b = X ∨ (b ≘ false) and V X b = X ∨ (b ≘ true), and recall that we have a proof of X. We may prove a lemma that:

lemma : ∀ {A}{P Q R : † A} → ⌈ P ⌉ → (P ∨ Q) ⇔ (P ∨ R)

from which we may conclude U X b ⇔ V X b and, by extensional equality of functions, U X ≡ V X:

negX pX = let U≡V : U X ≡ V X
U≡V = let UXb⇔VXb : ∀ b → U X b ⇔ V X b
UXb⇔VXb b = lemma {_}{X}{b ≘ false}{b ≘ true} pX
in ext (U X) (V X) UXb⇔VXb

The rest is the same as in the previous example.

# “General Recursion using Coinductive Monad” Got Right

In the previous post I talked about encoding general recursion in Agda using a coinductive monad. One oddity was that Agda does not automatically unfold coinductive definitions, and a solution was to manually unfold them when we need. Therefore, instead of proving, for example, for coinductively defined f:

n<k⇒fkn↓=k : forall k n -> n<′′ k -> f k n ↓= k

we prove a pair of (possibly mutually recursive) properties:

n<k⇒⟨fkn⟩↓=k : forall k n -> n<′′ k -> unfold (f k n) ↓= k
n<k⇒fkn↓=k : forall k n -> n <′′ k -> f k n ↓= k

where unfold (a user-defined function) performs the unfolding, and the second property follows from the first by ≡-subst and the fact that unfold (f k n) ≡ f k n.

While the use of unfold appears to be a direct work around, one may argue that this is merely treating the symptom. To resolve this and some other oddities, Anton Setzer argued in his letter “Getting codata right” (hence the title of this post) to the Agda mailing list that we should go back to a category theoretical view of codata, and Dan Doel soon successfully experimented the ideas. The following, however, are based on my current understanding of their work, which may not be matured yet.

### What codata Really Is

Inductive datatypes are defined by describing how a value is constructed. The following inductive definition:

data List (a : Set) : Set where
[] : List a
_∷_ : a -> List a -> List a

states that [] is a list and, given an element x and a list xs, one can construct a list x ∷ xs.

Coinductive datatypes, on the other hand, are specified by how a covalue can be observed. The following definition of stream:

codata Stream (a : Set) : Set where
[] : Stream a
_∷_ : a -> Stream a -> Stream a

while looking almost identical to List apart from the use of keyword codata, is in fact a definition very different from that of List in nature. Setzer suggests seeing it as a short-hand for the following collection of definitions (in Setzer’s proposed notation but my naming scheme):

mutual

coalg Stream (a : Set) : Set where
_* : Stream a -> StObserve a

data StObserve (a : Set) : Set where
empty : StObserve a
nonempty : a -> Stream a -> StObserve a

Notice that Stream a appears in the domain, rather than the range of _* because we are defining a coalgebra and _* is a deconstructor. My interpretation is that _* triggers an obserivation. Given a stream xs, an observation xs * yields two possible results. Either it is empty, or nonempty x xs', in the latter case we get its head x and tail xs'. The stream “constructors”, on the other hand, can be defined by:

[] : {a : Set} -> Stream a
[] * = empty

_∷_ : {a : Set} -> a -> Stream a -> Stream a
(x ∷ xs) * = nonempty x xs

which states that [] is the stream which, when observed, yields empty and x ∷ xs is the stream whose observation is nonempty x xs.

A coinductive definition:

f x ~ e

is a shorthand for

(f x)* = e *

That is, f x defines an coinductive object whose observation shall be the same as that of e. Setzer proposes explicit notation (e.g. ~ (x ∷ xs)) for pattern matching covalue. Back to current Agda, we may seen pattern matching covalue as implicitly applying _* and pattern-match the result. For example, the Agda program:

f : {a : Set} -> Stream a -> ...
f [] = ...
f (x ∷ xs) = ...

can be seen as syntax sugar for:

f : {a : Set} -> Stream a -> ...
f ys with ys *
... | empty = ...
... | nonempty x xs = ...

### Covalue-Indexed Types

What about (co)datatypes that are indexed by covalues? Take, for example, the codatatype Comp a for possibly non-terminating computation, define in my previous post:

codata Comp (a : Set) : Set where
return : a -> Comp a
_⟩⟩=_ : Comp a -> (a -> Comp a) -> Comp a

and the datatype _↓=_ claiming that certain computation terminates and yields a value:

data _↓=_ {a : Set} : Comp a -> a -> Set where
↓=-return : forall {x} -> (return x) ↓= x
↓=-bind : forall {m f x y} ->
m ↓= x -> (f x) ↓= y -> (m ⟩⟩= f) ↓= y

What is the definition actually trying to say? Given m : Comp a and x, m ↓= x is a type. In the case that m is return x, there is an immediate proof, denoted by ↓=-return, of (return x) ↓= x (that is, return x terminates with value x). For the case m ⟩⟩= f, we can construct its termination proof, denoted by constructor ↓=-bind, from termination proofs of m and f x. Setzer suggests the following definition, which corresponds more literally to the verbal description above:

mutual
_↓=_ : {a : Set} -> Comp a -> a -> Set
return x ↓= y = ↓=-Return x y
(m ⟩⟩= f) ↓= x = ↓=-Bind m f x

data ↓=-Return {a : Set} : a -> a -> Set where
↓=-return : forall {x} -> ↓=-Return x x

data ↓=-Bind {a : Set} : Comp a -> (a -> Comp a) -> a -> Set where
↓=-bind : forall {m f x y} ->
m ↓= x -> (f x) ↓= y -> ↓=-Bind m f y

Now ↓=-return and ↓=-bind are the only constructors of their own types, and _↓=_ maps Comp a typed covalues to their termination proofs according to their observations. Notice that ↓=-Return is exactly the built-in identity type _≡_: return x terminates with y if and only if x equals y.

For more examples, compare the valueless termination type _↓ in the previous post:

data _↓ {a : Set} : Comp a -> Set where
↓-return : forall {x} -> (return x) ↓
↓-bind : forall {m f} ->
m ↓ -> (forall x -> m ↓= x -> f x ↓) -> (m ⟩⟩= f) ↓

and its “correct” representation:

mutual

data _↓ : {a : Set} -> Comp a -> Set
return x ↓ = ↓-Return x
(m ⟩⟩= f) ↓ = ↓-Bind m f

data ↓-Return {a : Set} : a -> Set where
↓-return : forall {x} -> ↓-Return x

data ↓-Bind {a : Set} : Comp a -> (a -> Comp a) -> Set where
↓-bind : forall {m f} ->
m ↓ -> (forall x -> m ↓= x -> f x ↓) -> ↓-Bind m f

Here is a non-recursive example. The datatype Invoked-By characterising sub-computation relation is written:

data Invoked-By {a : Set} : Comp a -> Comp a -> Set where
invokes-prev : forall {m f} -> Invoked-By m (m ⟩⟩= f)
invokes-fun : forall {m f x} -> m ↓= x -> Invoked-By (f x) (m ⟩⟩= f)

Like above, we define a function for dispatching Comp a covalues according to their observation:

data Invoked-By-Bind {a : Set} : Comp a -> Comp a -> (a -> Comp a) -> Set where
invokes-prev : forall {m f} -> Invoked-By-Bind m m f
invokes-fun : forall {m f x} -> m ↓= x -> Invoked-By-Bind (f x) m f

invoked-by : {a : Set} -> Comp a -> Comp a -> Set
invoked-by m (return x) = ⊥
invoked-by m (n ⟩⟩= f) = Invoked-By-Bind m n f

Indeed, return x consists of no sub-computations, therefore invoked-by m (return x) is an empty relation. On the other hand, there are two possibilities invoked-by m (n ⟩⟩= f), represented by the two cases of Invoked-By.

### Termination Proof

The lengthy discussion above is of more than pedantic interest. While the mutually exclusive indexed type definitions may look rather confusing to some, the effort pays off when we prove properties about them. Redoing the exercises in the previous post using the new definitions, the first good news is that the definitions of Safe, ↓-safe, eval-safe, and eval stay almost the same. Second, we no longer have to split the termination proofs into pairs.

Take for example the McCarthian Maximum function:

f : ℕ -> ℕ -> Comp ℕ
f k n with k ≤′′? n
... | inj₁ k≤n ~ return n
... | inj₂ n<k ~ f k (suc n) ⟩⟩= \x -> f k x

Its termination proof is based on two lemmas, together they state that f k n terminates with the maximum of k and n:

k≤n⇒fkn↓=n : forall k n -> k ≤′′ n -> f k n ↓= n
k≤n⇒fkn↓=n k n k≤n with k ≤′′? n
... | inj₁ _ = ↓=-return
... | inj₂ n<k = ⊥-elim (¬[m<n∧n≤m] (n<k , k≤n))

n<k⇒fkn↓=k : forall k n -> n <′′ k -> f k n ↓= k
n<k⇒fkn↓=k k n n<k with k ≤′′? n
n<k⇒fkn↓=k k n n<k | inj₁ k≤n = ⊥-elim (¬[m<n∧n≤m] (n<k , k≤n))
n<k⇒fkn↓=k ._ n _ | inj₂ ≤′′-refl =
↓=-bind {_}{f (suc n) (suc n)}{\x -> f (suc n) x}{suc n}
(k≤n⇒fkn↓=n (suc n) (suc n) ≤′′-refl)
(k≤n⇒fkn↓=n (suc n) (suc n) ≤′′-refl)
n<k⇒fkn↓=k ._ n ≤′′-refl | inj₂ (≤′′-step 2+n≤n) =
⊥-elim (¬1+n≤n 2+n≤n)
n<k⇒fkn↓=k k n (≤′′-step 2+n≤k) | inj₂ (≤′′-step _) =
↓=-bind {_}{f k (suc n)}{\x -> f k x}{k}
(n<k⇒fkn↓=k k (suc n) 2+n≤k)
(k≤n⇒fkn↓=n k k ≤′′-refl)

### Divergence Proof

For some reason (lack of space, perhaps?), Megacz did not talk about divergence in his PLPV paper. For completeness, let us give it a try.

A return command always terminates. On the other hand, m ⟩⟩= f diverges if either m does, or m ↓= x and f x diverges. This is expressed as:

mutual
_↑ : {a : Set} -> Comp a -> Set
return x ↑ = ⊥
(m ⟩⟩= f) ↑ = ↑-Bind m f

codata ↑-Bind {a : Set} : Comp a -> (a -> Comp a) -> Set where
↑-prev : forall {m f} -> m ↑ -> ↑-Bind m f
↑-fun : forall {m f} ->
(forall x -> m ↓= x -> f x ↑) -> ↑-Bind m f

For an example, the following is a proof that length diverges when applied to an infinite stream:

length : {a : Set} -> Stream a -> Comp ℕ
length [] ~ return zero
length (x ∷ xs) ~ length xs ⟩⟩= \n -> return (suc n)

ones : Stream ℕ
ones ~ 1 ∷ ones

length-ones↑ : length ones ↑
length-ones↑ ~ ↑-prev {_}{length ones} length-ones↑

The implicit argument length ones must be given. Otherwise length-ones↑ fails to typecheck when the file is loaded, although it does typecheck when length-ones↑ was stepwisely constructed using the “Give” command. I do not know whether it is a bug in Agda.

# General Recursion using Coindutive Monad

Agda, like many dependently typed functional languages, recognises a variation of structural recursion. We need some tricks to encode general recursion. With well-founded recursion, you add an extra argument to the general-recursive function you would like to define. The extra argument is a proof that one input of the function and the successive value passed to the recursive call are related by a well-founded ordering. Therefore, the recursion cannot go on forever. The function passes the termination check because it is structural recursive on the proof term.

I like well-founded recursion because it relates well with how, in my understanding, termination proofs are done in relational program derivation. For real programming, however, the extra argument clutters the code. For more complex cases, such as the McCarthian maximum function where the termination proof depends on the result of the function, the function has to be altered in ways making it rather incomprehensible. It would be nice to if we could write the program and prove its termination elsewhere.

Adam Megacz published an interesting paper: A Coinductive Monad for Prop-Bounded Recursion, in which he summarised a number of approaches to perform general recursion in a dependently typed language before presenting his own, an extension to Venanzio Capretta‘s General Recursion via Coinductive Types. As a practice, I tried to port his code to Agda.

### A Monad for Non-Terminating Computation

The codatatype Comp, representing a possibly non-terminating computation, is a monad-like structure:

codata Comp (a : Set) : Set where
return : a -> Comp a
_⟩⟩=_ : Comp a -> (a -> Comp a) -> Comp a

While the monadic bind operator has type M a -> (a -> M b) -> M b, the _⟩⟩=_ operator for Comp has a more specialised type. This suffices when we use it only at the point of recursive calls. I suspect that we may need to generalise the type when we construct mutually recursive functions.

General-recursive programs are written in monadic style. For example, the function div, for integral division, can be written as:

div : ℕ -> ℕ -> Comp ℕ
div zero n ~ return zero
div (suc m) zero ~ return (suc m)
div (suc m) (suc n) ~
div (suc m ∸ suc n) (suc n) ⟩⟩= \h ->
return (suc h)

An example harder to tackle would be McCarthy’s 91 function. To be consistent with my earlier blog entry (well, the real reason is that I am lazy. And Megacz has proved the termination of the 91 function anyway), consider the following simplification:

f : ℕ -> ℕ -> Comp ℕ
f k n with k ≤′′? n
... | inj₁ k≤n ~ return n
... | inj₂ n<k ~ f k (suc n) ⟩⟩= \x -> f k x

The function, seen in Bengt Nordström‘s Terminating General Recursion, returns the maximum of k and n, and will be referred to as the McCarthian Maximum function.

### Terminating Computation

In the definitions of div and f, we have merely built some (potentially infinite) codata objects representing the computations. Now we will build an interpreter that would actually carry out a computation, provided that it is terminating. The idea here is similar to what we did in well-founded recursion: construct a proof that the computation terminates, and the interpreter would be structurally recursive on the proof term.

The datatype _↓ characterises terminating computations:

data _↓ {a : Set} : Comp a -> Set where
↓-return : forall {x} -> (return x) ↓
↓-bind : forall {m f} ->
m ↓ -> (forall x -> m ↓= x -> f x ↓) -> (m ⟩⟩= f) ↓

Certainly, return x for all x always terminates. For m ⟩⟩= f to terminate, both m and f, given the input from m, must terminate. It would be too strong to demand that f x terminates for all x. We only need f x to terminate for those x that could actually be returned by m. For that we define another datatype _↓=_ such that m ↓= x states that a computation m terminates and yields a value x:

data _↓=_ {a : Set} : Comp a -> a -> Set where
↓=-return : forall {x} -> (return x) ↓= x
↓=-bind : forall {m f x y} ->
m ↓= x -> (f x) ↓= y -> (m ⟩⟩= f) ↓= y

I also needed the following lemmas. The lemma ↓=-unique states that if a computation yields only one value, if it terminates at all:

↓=-unique : {a : Set} {m : Comp a} {x y : a} ->
m ↓= x -> m ↓= y -> x ≡ y
↓=-unique ↓=-return ↓=-return = ≡-refl
↓=-unique {x = x} (↓=-bind m↓=x fx↓=y) (↓=-bind m↓=x' fx'↓=y')
with ↓=-unique m↓=x m↓=x'
... | ≡-refl with ↓=-unique fx↓=y fx'↓=y'
... | ≡-refl = ≡-refl

The next lemma says that a computation that yields a value terminates:

↓=⇒↓ : {a : Set} {m : Comp a} {x : a} -> m ↓= x -> m ↓
↓=⇒↓ ↓=-return = ↓-return
↓=⇒↓ (↓=-bind {m}{f} m↓=x fx↓=y) =
↓-bind (↓=⇒↓ m↓=x)
(\z m↓=z -> ≡-subst (\w -> f w ↓) (↓=-unique m↓=x m↓=z) (↓=⇒↓ fx↓=y))

### Safe Computation

Now we construct a function eval evaluating terminating computations. Recall, however, that an important thing about well-founded recursion is that the datatype Acc consists of only one constructor. Thus, like the newtype construct in Haskell, the constructor is superfluous and can be optimised away by the compiler. For eval, we would like the proof of termination to be a one-constructor type too. We define the following datatype Safe. A computation is safe, if all its sub-computations are:

data Safe {a : Set} : Comp a -> Set where
safe-intro : forall m ->
(forall m' -> InvokedBy m' m -> Safe m') -> Safe m

Given computations m and m', the relation InvokedBy m' m holds if m' is a sub-computation of m. The relation InvokedBy can be defind by:

data InvokedBy {a : Set} : Comp a -> Comp a -> Set where
invokes-prev : forall {m f} -> InvokedBy m (m ⟩⟩= f)
invokes-fun : forall {m f x} ->
m ↓= x -> InvokedBy (f x) (m ⟩⟩= f)

Notice that Safe is exactly a specialised version of Acc. Safe m states that m is accessible under relation InvokedBy. That is, there is no infinite chain of computations starting from m.

Programmers prove termination of programs by constructing terms of type _↓. The following lemma states that terminating computations are safe:

↓-safe : {a : Set} -> (m : Comp a) -> m ↓ -> Safe m
↓-safe (return x) ↓-return = safe-intro _ fn
where fn : (m' : Comp _) -> InvokedBy m' (return x) -> Safe m'
fn _ ()
↓-safe (m ⟩⟩= f) (↓-bind m↓ m↓=x⇒fx↓) = safe-intro _ fn
where fn : (m' : Comp _) -> InvokedBy m' (m ⟩⟩= f) -> Safe m'
fn .m invokes-prev = ↓-safe m m↓
fn ._ (invokes-fun m↓=x) = ↓-safe _ (m↓=x⇒fx↓ _ m↓=x)

We can then use the lemma to convert m ↓ to Safe m, which will be used by eval-safe:

eval-safe : {a : Set} -> (m : Comp a) -> Safe m -> ∃ (\x -> m ↓= x)
eval-safe (return x) (safe-intro ._ safe) = (x , ↓=-return)
eval-safe (m ⟩⟩= f) (safe-intro ._ safe)
with eval-safe m (safe m invokes-prev)
... | (x , m↓=x) with eval-safe (f x) (safe (f x) (invokes-fun m↓=x))
...   | (y , fx↓y) = (y , ↓=-bind m↓=x fx↓y)

It is structurally recursive because safe is a sub-component of safe-intro ._ safe. Our eval function simply calls eval-safe:

eval : {a : Set} (m : Comp a) -> m ↓ -> a
eval m m↓ with eval-safe m (↓-safe m m↓)
... | (x , m↓=x) = x

### Example: Div

Recall the function div:

div : ℕ -> ℕ -> Comp ℕ
div zero n ~ return zero
div (suc m) zero ~ return (suc m)
div (suc m) (suc n) ~
div (suc m ∸ suc n) (suc n) ⟩⟩= \h ->
return (suc h)

To prove its termination we use strong induction: if div k n terminates for all k < m, we have div m n terminates as well. The strong induction principle is given by:

strong-induction : (P : ℕ -> Set) ->
(forall n -> (forall m -> m <′ n -> P m) -> P n) ->
(forall n -> P n)
strong-induction P Pind n = Pind n (ind n)
where ind : forall n -> forall m -> m <′ n -> P m
ind zero _ ()
ind (suc n) ._ ≤′-refl = strong-induction P Pind n
ind (suc n) m (≤′-step m<n) = ind n m m<n

Notice how it resembles the accessibility proof ℕ<-wf. Indeed, accessibility is strong induction. We are doing the same proof, but organised differently.

There is a little extra complication, however. In Agda, co-recursively defined functions do not automatically unfold. We have to unfold them manually, a trick mentioned byUlf Norell and Dan Doel on the Agda mailing list. The following function unfold, through pattern matching, expands the definition of a coinductively defined function:

unfold : {a : Set} -> Comp a -> Comp a
unfold (return x) = return x
unfold (m ⟩⟩= f) = m ⟩⟩= f

To prove a property about m, we could prove the same property for unfold m, and then use the fact that unfold m ≡ m, proved below:

≡-unfold : {a : Set} {m : Comp a} -> unfold m ≡ m
≡-unfold {_}{return x} = ≡-refl
≡-unfold {_}{m ⟩⟩= f} = ≡-refl

Here is the proof that div m n terminates for all m, using strong induction:

div↓ : forall m n -> div m n ↓
div↓ m n = strong-induction (\m -> div m n ↓)
(\m f -> ≡-subst _↓ ≡-unfold (ind m f)) m
where
ind : forall m -> (forall k -> k <′ m -> div k n ↓) ->
unfold (div m n) ↓
ind zero f = ↓-return
ind (suc m) f with n
... | zero = ↓-return
... | (suc n') = ↓-bind (f (suc m ∸ suc n') (sm-sn<sm m n'))
(\h _ -> ↓-return)

where sm-sn<sm : forall m n -> (m ∸ n) <′ suc m.

### Example: McCarthian Maximum

Now, let us consider the McCarthian Maximum:

f : ℕ -> ℕ -> Comp ℕ
f k n with k ≤′′? n
... | inj₁ k≤n ~ return n
... | inj₂ n<k ~ f k (suc n) ⟩⟩= \x -> f k x

Two peculiarities made this function special. Firstly, for n < k, the value of n gets closer to k at each recursive call. The function terminates immediately for n ≥ k. Therefore, to reason about its termination, we need a downward induction, where n = k is the base case and termination for n = i is inducted from n = i+1, i+2, .. , k. To make this induction easier, I created another datatype expressing "less than or equal to":

data _≤′′_ : ℕ -> ℕ -> Set where
≤′′-refl : forall {n} -> n ≤′′ n
≤′′-step : forall {m n} -> suc m ≤′′ n -> m ≤′′ n

_<′′_ : ℕ -> ℕ -> Set
m <′′ n = suc m ≤′′ n

Secondly, to reason about the termination of f we need information about the value it returns. In the branch for inj₂ n<k, termination of the first recursive call follows from downward induction: suc n is closer to k than n. On the other hand, we know that the second call terminates only after we notice that f returns the maximum of its arguments. Therefore, f k (suc n) yields k, and f k k terminates.

I proved the termination of f in two parts, respectively for the case when k ≤ n, and n < k:

k≤n⇒fkn↓=n : forall k n -> k ≤′′ n -> f k n ↓= n
n<k⇒fkn↓=k : forall k n -> n <′′ k -> f k n ↓= k

It then follows immediately that:

f↓ : forall k n -> f k n ↓
f↓ k n with k ≤′′? n
... | inj₁ k≤n = ↓=⇒↓ (k≤n⇒fkn↓=n k n k≤n)
... | inj₂ n<k = ↓=⇒↓ (n<k⇒fkn↓=k k n n<k)

Both k≤n⇒fkn↓=n and n<k⇒fkn↓=k, on the other hand, were proved in two stages. I needed to prove an unfolded version of the lemma, which may or may not be mutually recursive on the lemma to be proved. Take n<k⇒fkn↓=k for example:

mutual
n<k⇒⟨fkn⟩↓=k : forall k n -> n<′′ k -> unfold (f k n) ↓= k
n<k⇒⟨fkn⟩↓=k k n n<k = ...{- tedious proof, I don't think you want to read it -}

n<k⇒fkn↓=k : forall k n -> n <′′ k -> f k n ↓= k
n<k⇒fkn↓=k k n n<k =
≡-subst (\m -> m ↓= k) (≡-unfold {_}{f k n}) (n<k⇒⟨fkn⟩↓=k k n n

As mentioned above, this two-stage pattern was discussed on the Agda mailing list. I do not know whether this will be a recommended programming pattern. Some follow up discussions seemed to show that this is not always necessary. I have got to read more.

# Typed λ Calculus Interpreter in Agda

Update (Oct. 5, 2010): updated the code to use the current version of Standard Library (with universe polymorphism); index the environment by typing context.

It has been a typical exercise for beginners to write an evaluator for some variation of λ calculus. In a recent meeting I talked to my assistants about using dependent type to guarantee that, in an evaluator for λ calculus using de Bruin indices, that variable look-up always succeeds. The basic idea is to represent the environment using the Vec datatype, and label each λ terms with the number of enclosing λ’s, which must match the length of the environment.

I thought it could also be an easy exercsie for them to write up such an evaluator. Yet again, things turned out to be just a little bit more harder than I expected, thus I am recording the code here for reference. I do not remember where I have seen similar code before, but I believe the following is more or less standard.

### Evaluating Simply-Typed λ Calculus

Take simply-typed λ calculus with natural numbers as the only base type. It is common in Haskell to use such a tagged value type:

data Val = Num Int | Fun (Val -> Val)

However, Val is not a legistimate type in Agda, since recursion occurs in negative position in the case of Fun. Indeed, Val -> Val is a “larger” type than Val, and we would need a more complicated semantics to accommodate the former into the latter.

Oleg Kiselyov suggested defunctionalising Val. For example, represent the value as either

data Val = Num Int | Closure [Val] Term

or

data Val n = Num n Int | Closure [Val m] (Val (n+1))

I may try these alternatives later. At the time when I wrote the code, the right thing to do seemed to be switching to a tagless representation for values and integrate type checking into the term type.

The type Term Γ a represents a term that, in the typing context Γ, evaluates to a value having type a. Since we are about to use the de Bruin notation, the typing context is simply a vector of types:

Cxt : ℕ → Set1
Cxt n = Vec Set n

Note that it is a vector storing types (in Set1) rather than values (in Set). Before Agda introduced universe polymorphism, one would have to use Vec₁, a Set1 variation of Vec.

The type Term Γ a can then be defined by:

data Term {n : ℕ} (Γ : Cxt n) : Set → Set1 where
lit : ℕ → Term Γ ℕ
var : (x : Fin n) → Term Γ (lookup x Γ)
_·_ : ∀ {a b} → Term Γ (a → b) → Term Γ a → Term Γ b
add : Term Γ ℕ → Term Γ ℕ → Term Γ ℕ
ƛ : ∀ {a b} → Term (a ∷ Γ) b → Term Γ (a → b)

The type Fin n denotes natural numbers between 0 and n-1. Notice that in a (de Bruin) variable var x, the value of x is bounded by n, the length of the typing context. The lookup function is thus guaranteed to success. The typing context is extended with one more element within each ƛ. The type Vec, its look-up function, and the type Fin for bounded natural numbers, are all defined in the standard library:

data Vec {a} (A : Set a) : ℕ → Set a where
[] : Vec A zero
_∷_ : ∀ {n} (x : A) (xs : Vec A n) → Vec A (suc n)

data Fin : ℕ → Set where
zero : {n : ℕ} → Fin (suc n)
suc : {n : ℕ} (i : Fin n) → Fin (suc n)

lookup : ∀ {a n} {A : Set a} → Fin n → Vec A n → A
lookup zero (x ∷ xs) = x
lookup (suc i) (x ∷ xs) = lookup i xs

Now the evaluator. We need an environment allowing us to store heterogeneous values. One possibility is to index the environment by typing contexts: given a typing context Γ of length n, Env Γ is essentially a vector of n values whose types match those in Γ:

data Env : {n : ℕ} → Cxt n → Set1 where
[] : Env []
_∷_ : ∀ {a n} {Γ : Cxt n} → a → Env Γ → Env (a ∷ Γ)

Agda allows us to reuse the constructors [] and _∷_. For an example, 0 ∷ 1 ∷ (λ v → suc v) ∷ [] may have type Env (ℕ ∷ ℕ ∷ (ℕ → ℕ) ∷ []).

We also need a function that looks up an environment:

envLookup : ∀ {n}{Γ : Cxt n} → (i : Fin n) → Env Γ → lookup i Γ
envLookup zero (v ∷ ρ) = v
envLookup (suc n) (v ∷ ρ) = envLookup n ρ

Notice the resulting type lookup i Γ — the type of the ith element in an environment is the ith type in the typing context.

With the setting up, the evaluator looks rather standard:

eval : ∀ {a n} {Γ : Cxt n} → (ρ : Env Γ) → Term Γ a → a
eval ρ (lit m) = m
eval ρ (var x) = envLookup x ρ
eval ρ (e₁ · e₂) = (eval ρ e₁) (eval ρ e₂)
eval ρ (add e₁ e₂) = (eval ρ e₁) + (eval ρ e₂)
eval ρ (ƛ e) = λ v → eval (v ∷ ρ) e

### An Environment as a Vector of Type-Value Pairs

In my earlier attempt I tried to reuse the Vec type for the environment as well. After all, an environment can be seen as a list of (type, value) pairs:

Env : ℕ → Set1
Env n = Vec (Σ Set (λ a → a)) n

where Σ is a pair where the type of its second component may depend on the first. The two projection functions of a pair are respectively proj₁ and proj₂. In Σ Set (λ a → a), the type of the second component is exactly the first, that is, we may have environments like ρ = (ℕ , 0) ∷ (ℕ , 1) ∷ (ℕ → ℕ , (λ v → suc v)) ∷ []. To extract the ith value, we simply apply proj₂ (lookup i ρ), which by the definition of Σ has type proj₁ (lookup i ρ). Applying map proj₁ to ρ gives us the “typing context” Env (ℕ ∷ ℕ ∷ (ℕ → ℕ) ∷ []). The eval function now has type:

eval : ∀ {a n} → (ρ : Env n) → Term (map proj₁ ρ) a → a

In dependently typed programming, however, it is not always easy to reuse general purpose datatypes. In this case, the price for reusing Vec and lookup is an additional proof obligation in the var x case of eval:

eval : ∀ {a n} → (ρ : Env n) → Term (map proj₁ ρ) a → a
eval ρ (lit m) = m
eval ρ (var x) rewrite lookup-proj x ρ = proj₂ (lookup x ρ)
eval ρ (e₁ · e₂) = eval ρ e₁ (eval ρ e₂)
eval ρ (add e₁ e₂) = (eval ρ e₁) + (eval ρ e₂)
eval ρ (ƛ e) = λ v → eval ((_ , v) ∷ ρ) e

The reason is that the right-hand side, proj₂ (lookup x ρ), has type proj₁ (lookup x ρ), while the type checker, having just matched the var constructor, expects something having type lookup x (map proj₁ ρ). We thus have to convince Agda that they are actually the same type:

lookup-proj : ∀ {n} (x : Fin n) (ρ : Env n) →
lookup x (map proj₁ ρ) ≡ proj₁ (lookup x ρ)
lookup-proj zero (_ ∷ _) = refl
lookup-proj (suc i) (_ ∷ xs) = lookup-proj i xs

### Higher-Order Abstract Syntax

My original purpose was to demonstrate the use of vectors to guarantee static properties. If we decide not to use de Bruin notation and switch to higher-order abstract syntax instead, we could have a much simpler evaluator:

data Term : Set → Set1 where
lit : ℕ → Term ℕ
var : ∀ {a} → a → Term a
_·_ : ∀ {a b} → Term (a → b) → Term a → Term b
add : Term ℕ → Term ℕ → Term ℕ
ƛ : ∀ {a b} → (a → Term b) → Term (a → b)

eval : ∀ {a} → Term a → a
eval (lit n) = n
eval (var v) = v
eval (e₁ · e₂) = eval e₁ (eval e₂)
eval (add e₁ e₂) = eval e₁ + eval e₂
eval (ƛ e) = λ x → eval (e x)

It is a bit odd to me that var and lit are actually the same thing. Nothing forbids you from writing λ(\x -> lit x) or λ(\x -> var 3).

Oleg Kiselyov has written a lot on typed tagless interpreters. Besides the initial representation above, there is a dual final representation. The purpose is to experiment with staged interpreters. Their paper Finally Tagless, Partial Evaluated: Tagless Staged Interpreters for Simpler Typed Languages is a good read which I would recommend to people.

### Code

• Eval.agda: an early version, using Vec for the environment.
• Eval2.agda: using higher-order abstract syntax.
• Eval3.agda: the code presented in the main text of this post.

# Tail-Recursive, Linear-Time Fibonacci

For two consecutive years I have been responsible for teaching the program derivation course in the FLOLAC summer school. It was an exam question last year to derive a linear-time solution computing the Fibonacci function, whose definition we all know very well:

$fib 0 = 0$
fib 1 = 1
fib (n+2) = fib (n+1) + fib n

I would like the students to see the derivation as an instance of tupling: to generalise the targeted function by having it return more results, hoping that the generalisation can actually be performed faster. For this problem, one may consider the generalisation:

$fib2 n = \left(fib n, fib \left(n+1\right)\right)$

from which we may easily construct a recursive definition for fib2 by simply unfolding and folding the definitions. Once we finish, take fib = fst . fib2.

### Tupling and Fold Fusion

As a practice, however, I instructed the students to see this as an instance of fold fusion with the fold on natural numbers:

$foldN f e 0 = e$
foldN f e (n+1) = f (foldN f e n)

The derivation goes like

=    { since id = foldN (1+) 0 }
fib2 . foldN (1+) 0
=    { fold fusion, see below }
foldN step (0,1)

To derive step through the fusion condition, we reason:

=    { def. of fib2 }
(fib (n+1), fib (n+2))
=    { def. of fib }
(fib (n+1), fib (n+1) + fib n)
=    { let step (x,y) = (y,x+y) }
step (fib n, fib (n+1))
=    { def. of fib2 }
step (fib2 n)

Therefore, we have fib2 = foldN step (0,1) where step (x,y) = (y,x+y).

Oleg Kiselyov also used a similar definition of fib as a fold (on a list of length n) as an example in his library of fold transformers.

### Accumulating Parameters

I was perhaps so happy with seeing fib as a fold-fusion that I forgot one may want a tail recursive definition. This is one of the exercises of the functional programming course (taught by Kung Chen) this year, where the students may just write down the (obvious) tail recursive, linear-time function computing Fibonacci. The hint was to “introduce two accumulating parameters.” Can I derive this definition?

When I taught the students about accumulating parameters, I also told them to “look for some generalisation by adding more parameters.” The canonical example is perhaps generalising quadratic time reverse to fastrev xs ys = reverse xs ++ ys. For fib, it appeared reasonable to generalise the base cases. I was later told by Oleg that this function is called gib:

$gib 0 x y = x$
gib 1 x y = y
gib (n+2) x y = gib (n+1) x y + gib n x y

From this definition, however, a linear-time implementation of gib does not follow immediately. We will need this lemma:

$gib n \left(fib k\right) \left(fib \left(k+1\right)\right) = fib \left(n+k\right)$(1)

which can be proved by induction on n. For an intuitive understanding, it says that gib allows us to compute Fibonacci not only from the base cases 0 and 1, but also from the middle: if we take fib k and fib (k+1) as base cases, we get fib (n+k).

The property (1) came from some slides Kung Chen showed me, originated from John Hughes. John Hughes actually suggested to start with (1) as the specification. Either way, I wonder how one could come up with (1) in the first place. Given the definition of gib, (1) can be proved by induction. The rest of the derivation is not too hard. This is the derived program. Details omitted.

$gib 0 x y = x$
gib (n+1) x y = gib n y (x+y)

### Tail-Recursive Fibonacci

Meanwhile, Josh Ko gave me this generalisation:

$ffib n x y = x × fib n + y × fib \left(n+1\right)$

The motivation came from, perhaps, the observation that every fib n can always be expressed as an linear combination of some smaller fib i and fib j:

$fib 2 = fib 1 + fib 0$
fib 3 = fib 2 + fib 1 = 2 × fib 1 + fib 0
fib 4 = fib 3 + fib 2 = 2 × fib 2 + fib 1 = 3 × fib1 + 2 × fib 0
:

Given the definition, it is immediate that ffib 0 x y = y and:

$ffib \left(n+1\right) x y$
= x × fib (n+1) + y × fib (n+2)
= x × fib (n+1) + y × fib (n+1) + y × fib n
= (x+y) × fib (n+1) + y × fib n
= ffib n y (x+y)

Therefore we have derived:

$ffib 0 x y = y$
ffib (n+1) x y = ffib n y (x+y)

which is more or less what one would expect. It differs from the derived linear-time implementation of gib only on the base case.

Finally, since gib and ffib define similar functions, ffib satisfies a property similar to (1):

$ffib n \left(fib k\right) \left(fib \left(k+1\right)\right) = fib \left(n+k+1\right)$(2)

The proof proceeds by induction on n. The base case is trivial. For the inductive case:

= fib k × fib (n+1) + fib (k+1) ×fib (n+2)
= fib k × fib (n+1) + fib (k+1) × fib (n+1) + fib (k+1) × fib n
= (fib k + fib (k+1)) × fib (n+1) + fib (k+1) × fib n
= ffib n (fib (k+1)) (fib (k+2))
=   { induction }
fib (n+k+2)

thus completes the proof.

# A Simple Exercise using the Modular Law

Josh needs a property for which there is an obvious pointwise proof. He wonders whether it is possible to prove it in point-free style. I haven’t practised point-free proofs for a while, so I’d give it a try.

The actual property he needs is that FT . C ⊆ C . FT . C where C is a coreflexive, given T ⊆ ⦇S⦈ and F⦇S⦈ . C ⊆ C . F⦇S⦈ . C. It is sufficient to prove this generalised property:

Intuitively speaking, if C covers the range of R and S ⊆ R, then C covers the range of S too.

Well, the proof goes:

= { since C . C = C }
C . C . S
{ since X ⊇ X ∩ Y }
C . ((C . S) ∩ R)
{ modular law X ∩ (Y.Z) ⊆ Y . ((Y˘.X) ∩ Z), and C˘ = C }
S ∩ (C . R)
{ given: R ⊆ C . R }
S ∩ R
= { given: S ⊆ R }
S

It is hard to explain the intuition behind the use of modular law. If we take C as a predicate, the property can be written pointwisely as:

where a and b are both universally quantified. We may simplify b S a ⇒ C b ∧ b S a to b S a ⇒ C b, but that makes it more difficult to see the connection. Essentially, I was mimicking the following obvious proof:

{ strengthening }
C b ∧ b S a ∧ b R a
{ since b R a ⇒ C b ∧ b R a }
b S a ∧ b R a
{ since b S a ⇒ b R a }
b S a

The strengthening step corresponds to the use of X ⊇ X ∩ Y, while the next two steps respectively correspond to the last two steps of the point-free proof. The modular law, implicit in the pointwise proof, was used to group C and R to the right places.