# A “Side-Swapping” Lemma Regarding Minimum, Using Enriched Indirect Equality

Yu-Han Lyu and I were studying some paper from the algorithm community, and we noticed a peculiar kind of argument. For a much simplified version, let X and D be two relations of type A → B, denoting two alternative approaches to non-deterministically compute possible solution candidates to a problem. Also let ≤ be a transitive relation on B, and ≥ its converse. The relation min ≤ : {B} → B, given a set, returns one of its elements that is no larger (under ≤) than any element in the set, if such a minimum exists.
We would like to find a solution that is as small as possible under ≤.

When arguing for the correctness of its algorithm, the paper we are studying claims that the method X is no worse than D in the following sense: if every solution returned by D is no better than some solution returned by X, which we translate to:

D ⊆ ≥ . X

then the best (smallest) solution by X must be no worse than (one of the) best solutions returned by D:

min ≤ . ΛX ⊆ ≤ . min ≤ . ΛD

where Λ converts a relation A → B to a function A → {B} by collecting its results to a set. Note that, awkwardly, X and D are swapped to different sides of relational inclusion.

“What? How could this be true?” was my first reaction. I bombarded Yu-Han with lots of emails, making sure that we didn’t misinterpret the paper. An informal way to see it is that since every result of D is outperformed by something returned by X, collectively, the optimal result of D must be “lower-bounded” by the best result among the latter. But this sounds unconvincing to me. Something is missing.

### Totality and Well-Boundedness

It turns out that the reasoning can be correct, but we need some more constraints on D and ≤. Firstly, D must yield some result whenever X does. Otherwise it could be that D ⊆ ≥ . X is true but ΛD returns an empty set, while ΛX still returns something. This is bad because X is no longer a safe alternative to D — it could sometimes do too much. One way to prevent this from happening is to demand that ΛD = dom ∈ . ΛD, where ∈ is the membership relation, and dom ∈, the domain of ∈, consists only of non-empty sets. It will be proved later that this is equivalent to demanding that D be total.
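To make the failure concrete, here is a tiny Python model (entirely hypothetical, not from the paper), with relations represented as dictionaries mapping inputs to sets of outputs, and ≤ taken to be the usual order on integers:

```python
# Relations as dicts from inputs to sets of outputs (a toy model).
X = {'a': {1, 2}}
D = {'a': set()}   # D is not total: it yields nothing on 'a'

# D ⊆ ≥ . X holds vacuously: D relates 'a' to no output at all.
d_refines = all(any(d >= x for x in X[k]) for k in D for d in D[k])

# Yet min ≤ . ΛX returns 1 on 'a', while min ≤ . ΛD returns nothing,
# so min ≤ . ΛX ⊆ ≤ . min ≤ . ΛD fails.
min_X = min(X['a'])
min_D = min(D['a']) if D['a'] else None

assert d_refines and min_X == 1 and min_D is None
```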

Secondly, we need to be sure that every non-empty set has a minimum, that is, min ≤ always yields something for non-empty sets, so that min ≤ . ΛD does not degenerate to the empty relation. Formally, this can be expressed as dom ∈ = dom (min ≤). Bird and de Moor called this property well-boundedness of ≤.

Recall that min ≤ = ∈ ∩ ≤/∋. The ∈ part guarantees that min ≤ returns something that is in the given set, while the ≤/∋ part guarantees that the returned value is a lower bound of the given set. Since ΛD (as well as ΛX) is a function, we also have min ≤ . ΛD = D ∩ ≤/D°, following from the laws of division.

Later we will prove an auxiliary lemma stating that if ≤ is well-bounded, we have:

≤/∋ . dom ∈   ⊆   ≤ . min ≤ . dom ∈

The right-hand side, given a non-empty set, takes its minimum and returns something possibly smaller. The left-hand side merely returns some lower bound of the given set. The latter sounds weaker because it does not demand that the set have a minimum. Nevertheless, the inclusion holds if ≤ is well-bounded.

An algebraic proof of the auxiliary lemma was given by Akimasa Morihata. The proof, to be discussed later, is quite interesting to me because it makes an unusual use of indirect equality. With the lemma, the proof of the main result becomes rather routine:

min ≤ . ΛX   ⊆   ≤ . min ≤ . ΛD
≣   { since ΛD = dom ∈ . ΛD }
min ≤ . ΛX   ⊆   ≤ . min ≤ . dom ∈ . ΛD
⇐  { ≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈, see below }
min ≤ . ΛX   ⊆   ≤/∋ . dom ∈ . ΛD
≣   { since ΛD = dom ∈ .  ΛD }
min ≤ . ΛX   ⊆   ≤/∋ . ΛD
≣   { since ΛD is a function, R/S . f = R/(f° . S) }
min ≤ . ΛX   ⊆   ≤/D°
≣   { Galois connection }
min ≤ . ΛX . D°   ⊆   ≤
⇐   { min ≤ . ΛX ⊆ ≤/X° }
≤/X°. D°   ⊆   ≤
⇐   { since  D ⊆ ≥ . X }
≤/X°. X° . ≤   ⊆   ≤
⇐   { division }
≤ . ≤   ⊆   ≤
≣   { ≤ transitive }
true
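As a sanity check, the whole statement can be tested by brute force on a small model — below in Python, with B the integers under the usual (transitive, well-bounded) order, and D built to be total and to satisfy D ⊆ ≥ . X. This is only an illustration, not part of the proof:

```python
import random

# Brute-force check: whenever every output of D dominates some output
# of X (that is, D ⊆ ≥ . X), and both yield something, the best
# (smallest) output of X is no worse than the best output of D.
def check(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        Xs = set(rng.sample(range(10), rng.randint(1, 5)))
        # candidates for D: outputs dominating some output of X
        cand = sorted(d for d in range(10) if any(x <= d for x in Xs))
        Ds = set(rng.sample(cand, rng.randint(1, len(cand))))
        if not min(Xs) <= min(Ds):
            return False
    return True

assert check()
```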

### Proof Using Enriched Indirect Equality

Now we have to prove that ≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈, provided that ≤ is well-bounded. To prove this lemma I had to resort to first-order logic. I passed the problem to Akimasa Morihata and he quickly came up with a proof. We start with some preparation:

≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈
⇐   { since min ≤ ⊆ ∈ }
≤/(min ≤)° . dom ∈ ⊆ ≤ . min ≤ . dom ∈

And then we use proof by indirect (in)equality. The proof, however, is unusual in two ways. Firstly, we need the enriched indirect equality proposed by Dijkstra in
EWD 1315: Indirect equality enriched (and a proof by Netty). Typically, proof by indirect equality exploits the property:

x = y    ≡   (∀u. u ⊆ x ≡ u ⊆ y)

and also:

x ⊆ y   ≡   (∀u. u ⊆ x ⇒ u ⊆ y)

When we know that both x and y satisfy some predicate P, enriched indirect equality allows us to prove x = y (or x ⊆ y) by proving a weaker premise:

x = y   ≡   (∀u. P u ⇒ u ⊆ x ≡ u ⊆ y)

Note that both ≤/(min ≤)° . dom ∈ and ≤ . min ≤ . dom ∈ satisfy X = X . dom ∈. Later we will try to prove:

X ⊆ ≤/(min ≤)° . dom ∈    ⇒    X ⊆ ≤ . min ≤ . dom ∈

for X such that X = X . dom ∈.
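For a concrete feel of (plain, unenriched) indirect inequality, one can check it by brute force over the powerset of a small universe — a Python sketch, purely illustrative:

```python
from itertools import combinations

# x ⊆ y exactly when every u below x is also below y.
def indirect_matches_direct(universe=(0, 1, 2)):
    subsets = [frozenset(c) for r in range(len(universe) + 1)
               for c in combinations(universe, r)]
    return all(
        (x <= y) == all((not (u <= x)) or (u <= y) for u in subsets)
        for x in subsets for y in subsets)

assert indirect_matches_direct()
```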

The second unusual aspect is that rather than starting from one of X ⊆ ≤/(min ≤)° . dom ∈ or X ⊆ ≤ . min ≤ . dom ∈ and ending at another, Morihata’s proof took the goal as a whole and used rules like (P ⇒ Q) ⇒ (P ⇒ P ∧ Q). The proof goes:

(X ⊆ ≤/(min ≤)° . dom ∈  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
⇐   { dom ∈  ⊆ id }
(X  ⊆ ≤/(min ≤)°  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
≣    { Galois connection }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
⇐   { (P ⇒ Q) ⇒ (P ⇒ P ∧ Q) }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . (min ≤)° . min ≤ . dom ∈)
⇐   { R ∩ S ⊆ R  }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . (((min ≤)° . min ≤) ∩ id) . dom ∈)
≣   { dom R = (R° . R) ∩ id }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom (min ≤) . dom ∈)
≣   { ≤ well-bounded: dom ∈ = dom (min ≤) }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom ∈ . dom ∈)
≣   { dom ∈ . dom ∈ = dom ∈ }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom ∈)
≣   { X = X . dom ∈ }
(X . (min ≤)° ⊆ ≤  ⇒  true)
≣ true

### Auxiliary Proofs

Finally, here is a proof that the constraint ΛD = dom ∈ . ΛD is equivalent to D being total, that is, id ⊆ D° . D. Recall that dom ∈ = (∋ . ∈) ∩ id. We simplify dom ∈ . ΛD a bit:

dom ∈ . ΛD
= ((∋ . ∈) ∩ id) . ΛD
=   { ΛD a function }
(∋ . ∈ . ΛD) ∩ ΛD
=   { ∈ . ΛD = D }
(∋ . D) ∩ ΛD

We reason:

dom ∈ . ΛD = ΛD
≡   { R ∩ S = S iff S ⊆ R }
ΛD ⊆ ∋ . D
≡   { ΛD function, shunting }
id ⊆ (ΛD)° . ∋ . D
≡   { since ∈ . ΛD = D }
id ⊆ D° . D

which is the definition of totality.

# The Windowing Technique for Longest Segment Problems

In the previous post we reviewed Hans Zantema’s algorithm for solving longest segment problems with suffix and overlap-closed predicates. For predicates that are not overlap-closed, Zantema derived a so-called “windowing” technique, which will be the topic of this post.

A brief review: the longest segment problem takes the form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is an abbreviation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

A predicate p is suffix-closed if p (xs ⧺ ys) ⇒ p ys. For suffix-closed p, Zantema proposed a technique that, from a high-level point of view, looks just like the right solution to such problems. We scan through the input list using a foldr from the right to the left, during which we try to maintain the longest segment satisfying p so far. Also, we keep a prefix of the list that is as long as the currently longest segment, which we call the window.

If, when we move one element to the right, the window (now one element longer than the currently longest segment) happens to satisfy p, it becomes the new optimal solution. Otherwise we drop the right-most element of the window so that it slides leftwards, retaining the length. Notice that it implies that we’d better represent the window using a queue, so we can efficiently add elements from the left and drop from the right.

Derivation of the algorithm is a typical case of tupling.

### Tupling

Given a function h, we attempt to compute it efficiently by turning it into a foldr. It would be possible if the value of the inductive case h (x : xs) were determined solely by x and h xs, that is:

h (x : xs) = f x (h xs)

for some f. With some investigation, however, it would turn out that h (x : xs) also depends on some g:

h (x : xs) = f x (g (x : xs)) (h xs)

Therefore, we instead try to construct their split ⟨ h , g ⟩ as a fold, where the split is defined by:

⟨ h , g ⟩ xs = (h xs, g xs)

and h = fst ∘ ⟨ h , g ⟩.

If ⟨ h , g ⟩ is indeed a fold, it should scan through the list and construct a pair of an h-value and a g-value. To make this feasible, it is then hoped that g (x : xs) can be determined by g xs and h xs. Otherwise, we may have to repeat the process, making the fold return a triple.
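A standard small instance of this (not from this post) is the “steep list” problem: steep (x : xs) needs both steep xs and sum xs, so we tuple ⟨ steep , sum ⟩ into one pass. A Python sketch:

```python
# steep xs: every element is larger than the sum of the elements
# after it. steep (x : xs) depends on steep xs AND sum xs, so we
# fold with the pair.
def steep_sum(xs):            # ⟨ steep , sum ⟩ as one fold
    st, s = True, 0
    for x in reversed(xs):    # mimicking foldr: scan from the right
        st, s = (x > s) and st, x + s
    return st, s

def steep(xs):                # steep = fst ∘ ⟨ steep , sum ⟩
    return steep_sum(xs)[0]

assert steep([16, 8, 4, 2, 1])
assert not steep([1, 2, 3])
```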

### Segment/Prefix Decomposition

Let us look into the longest segment problem. For suffix-closed p it is reasonable to assume that p [] is true — otherwise p would be false everywhere. Therefore, for the base case we have max# ∘ p ◁ ∘ segs ▪ [] = []. We denote function application by ▪ to avoid too many parentheses.

Now the inductive case. It is not hard to derive an alternative definition of segs:

segs [] = [[]]
segs (x : xs) = inits (x:xs) ⧺ segs xs

therefore, we derive:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)

where xs ↑# ys returns the longer one between xs and ys.

It suggests that we maintain, by a foldr, a pair containing the longest segment and the longest prefix satisfying p (that is, max# ∘ p ◁ ∘ inits). It is then hoped that max# ∘ p ◁ ∘ inits ▪ (x : xs) can be computed using max# ∘ p ◁ ∘ inits ▪ xs. And luckily, it is indeed the case, implied by the following proposition proved in an earlier post:

Proposition 1: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) = finits (max# ∘ p ◁ ∘ inits ▪ xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys).

Proposition 1 says that the list (or set) of all the prefixes of x : xs that satisfy p can be computed from the longest prefix of xs satisfying p (call it ys), provided that p is suffix-closed. A naive way to do so is simply to compute all the prefixes of x : ys and do the filtering again, as is done in finits.

This was the route taken in the previous post. It would turn out, however, that to come up with an efficient implementation of f we need some more properties of p, such as that it is also overlap-closed.

### The “Window”

Proposition 1 can be strengthened: to compute all the prefixes of x : xs that satisfy p using finits we do not strictly have to start with ys. Any prefix of xs longer than ys will do.

Proposition 2: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) = finits (take i xs)

where finits ys = p ◁ ∘ inits ▪ (x : ys), and i ≥ length ∘ max# ∘ p ◁ ∘ inits ▪ xs.

In particular, we may choose i to be the length of the longest segment:

Lemma 1:

length ∘ max# ∘ p ◁ ∘ segs ▪ xs ≥
length ∘ max# ∘ p ◁ ∘ inits ▪ xs

Appealing to intuition, Lemma 1 is true because segs xs is a superset of inits xs.

Remark: Zantema proved Proposition 1 by contradiction. The purpose of an earlier post was to give a constructive proof of Proposition 1, which was considerably harder than I expected. I’d be interested to see a constructive proof of Proposition 2.

Now we resume the reasoning:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
= max# ∘ p ◁ ▪ (inits (x : xs) ⧺ segs xs)
= (max# ∘ p ◁ ∘ inits ▪ (x : xs)) ↑#
(max# ∘ p ◁ ∘ segs ▪ xs)
=   { Proposition 2 and Lemma 1 }
let s = max# ∘ p ◁ ∘ segs ▪ xs
in (max# ∘ finits ▪ (x : take (length s) xs)) ↑# s

Define window xs = take (length ∘ max# ∘ p ◁ ∘ segs ▪ xs) xs. The reasoning above suggests that we may try the following tupling:

max# ∘ p ◁ ∘ segs
= fst ∘ ⟨ max# ∘ p ◁ ∘ segs , window ⟩

### Maintaining the Longest Segment and the Window

The task now is to express ⟨ max# ∘ p ◁ ∘ segs , window ⟩ as a foldr. We can do so only if both max# ∘ p ◁ ∘ segs ▪ (x : xs) and window (x : xs) can be determined by max# ∘ p ◁ ∘ segs ▪ xs and window xs. Let us see whether it is the case.

#### Maintaining the Longest Segment

Regarding max# ∘ p ◁ ∘ segs ▪ (x : xs), we have derived:

max# ∘ p ◁ ∘ segs ▪ (x : xs)
=   { as shown above, let s = max# ∘ p ◁ ∘ segs ▪ xs }
(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s

Let s = max# ∘ p ◁ ∘ segs ▪ xs. We consider two cases:

1. Case p (x : take (length s) xs). We reason:
(max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s
=   { see below }
(x : take (length s) xs) ↑# s
=   { since the LHS is one element longer than the RHS }
x : take (length s) xs
=   { definition of window }
x : window xs

The first step is correct because, for all zs, p zs implies that max# ∘ p ◁ ∘ inits ▪ zs = zs.

2. Case ¬ p (x : take (length s) xs). In this case (max# ∘ p ◁ ∘ inits ▪ (x : take (length s) xs)) ↑# s must be s, since ¬ p zs implies that length ∘ max# ∘ p ◁ ∘ inits ▪ zs < length zs.

#### Maintaining the Window

Now consider the window. Also, we do a case analysis:

1. Case p (x : take (length s) xs). We reason:
window (x : xs)
= take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
=   { by the reasoning above }
take (length (x : take (length s) xs)) (x : xs)
=   { take and length }
x : take (length (take (length s) xs)) xs
=   { take and length }
x : take (length s) xs
= x : window xs

2. Case ¬ p (x : take (length s) xs). We reason:
window (x : xs)
= take (length ∘ max# ∘ p ◁ ∘ segs ▪ (x : xs)) (x : xs)
=   { by the reasoning above }
take (length s) (x : xs)
=   { take and length }
x : take (length s - 1) xs
= x : init (window xs)

#### The Algorithm

In summary, the reasoning above shows that

⟨ max# ∘ p ◁ ∘ segs , window ⟩ = foldr step ([], [])

where step is given by

step x (s, w) | p (x : w) = (x : w, x : w)
              | otherwise = (s, x : init w)

As is typical of many program derivations, after much work we deliver an algorithm that appears to be rather simple. The key invariants that made the algorithm correct, such as that s is the optimal segment and that w is as long as s, are all left implicit. It would be hard to prove the correctness of the algorithm without these clues.

We are not quite done yet. The window w had better be implemented using a queue, so that init w can be performed efficiently. The algorithm then runs in time linear to the length of the input list, provided that p (x : w) can be performed in constant time -- which is usually not the case for interesting predicates. We may then again tuple the fold with some information that helps to compute p efficiently. But I shall stop here.
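For concreteness, here is a direct Python transcription of the fold above (illustrative only: it keeps the window as a plain list and re-evaluates p from scratch, so it is not the linear-time version). The predicate p xs = (sum xs ≤ bound) is suffix-closed on non-negative inputs, but not overlap-closed:

```python
# p xs = (sum xs <= bound) is suffix-closed when all elements are
# non-negative, but not overlap-closed.
def longest_segment(xs, bound):
    p = lambda zs: sum(zs) <= bound
    s, w = [], []               # longest segment so far, and the window
    for x in reversed(xs):      # foldr: scan from right to left
        if p([x] + w):
            s = w = [x] + w     # the extended window is the new optimum
        else:
            w = [x] + w[:-1]    # window slides left, keeping its length
    return s

# Brute-force reference: try all segments.
def brute(xs, bound):
    segs = [xs[i:j] for i in range(len(xs) + 1)
                    for j in range(i, len(xs) + 1)]
    return max((z for z in segs if sum(z) <= bound), key=len)

xs = [3, 1, 2, 5, 1, 1, 2, 4]
assert len(longest_segment(xs, 6)) == len(brute(xs, 6))
```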

# Longest Segment Satisfying Suffix and Overlap-Closed Predicates

I spent most of the week preparing for the lecture on Monday, in which we plan to talk about segment problems. One of the things we would like to do is to translate the derivations in Hans Zantema’s Longest Segment Problems to Bird-Meertens style. Here is a summary of the part I managed to do this week.

Zantema’s paper considered problems of this form:

max# ∘ p ◁ ∘ segs

where segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails returns all consecutive segments of the input list; p ◁ is a shorter notation for filter p, and max# :: [[a]] → [a] returns the longest list from the input list of lists. In words, the task is to compute the longest consecutive segment of the input that satisfies predicate p.

Of course, we have to assume certain properties from the predicate p. A predicate p is:

• suffix-closed, if p (xs ⧺ ys) ⇒ p ys;
• overlap-closed, if p (xs ⧺ ys) ∧ p (ys ⧺ zs) ∧ ys ≠ [] ⇒ p (xs ⧺ ys ⧺ zs).

For example, ascending is suffix- and overlap-closed, while p xs = (all (0 ≤) xs) ∧ (sum xs ≤ C) for some constant C is suffix-closed but not overlap-closed. Note that for suffix-closed p, it is reasonable to assume that p [] is true, otherwise p would be false everywhere. It also saves us some trouble making sure that max# is always applied to a non-empty list.

I denote function application by an infix operator ▪ that binds looser than function composition but tighter than other binary operators. Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

### Prefix/Suffix Decomposition

Let us begin with the usual prefix/suffix decomposition:

max# ∘ p ◁ ∘ segs
= max# ∘ p ◁ ∘ concat ∘ map inits ∘ tails
= max# ∘ concat ∘ map (p ◁) ∘ map inits ∘ tails
= max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Like what we do with the classical maximum segment sum, if we can somehow turn max# ∘ p ◁ ∘ inits into a fold, we can then implement map (foldr ...) ∘ tails by a scanr. Let us denote max# ∘ p ◁ ∘ inits by mpi.
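The map (foldr …) ∘ tails = scanr … step can be checked quickly in Python (an illustrative transcription, with foldr and scanr spelled out):

```python
from functools import reduce

def foldr(f, e, xs):
    return reduce(lambda acc, x: f(x, acc), reversed(xs), e)

def scanr(f, e, xs):            # foldr over every tail, in one pass
    out = [e]
    for x in reversed(xs):
        out.append(f(x, out[-1]))
    return out[::-1]

def tails(xs):
    return [xs[i:] for i in range(len(xs) + 1)]

f = lambda x, ys: [x] + ys      # any step function will do here
xs = [1, 2, 3, 4]
assert [foldr(f, [], t) for t in tails(xs)] == scanr(f, [], xs)
```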

If you believe in structural recursion, you may attempt to fuse max# ∘ p ◁ into inits by fold-fusion. Unfortunately, it does not work this way! In the fold-fusion theorem:

h ∘ foldr f e = foldr g (h e)   ⇐   h (f x y) = g x (h y)

notice that x and y are universally quantified, which is too strong for this case. Many properties we have, to be shown later, do need information from the context — e.g. some properties are true only if y is the result of inits.

### Trimming Unneeded Prefixes

One of the crucial properties we need is the following:

Proposition 1: If p is suffix-closed, we have:

p ◁ ∘ inits ▪ (x : xs) =
p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)

For some intuition, let x = 1 and xs = [2,3,4]. The right-hand side first computes all prefixes of xs:

[] [2] [2,3] [2,3,4]

before filtering them. Let us assume that only [] and [2,3] pass the check p. We then pick the longest one, [2,3], cons it with 1, and compute all its prefixes:

[] [1] [1,2] [1,2,3]

before filtering them with p again.

The left-hand side, on the other hand, performs filtering on all prefixes of [1,2,3,4]. However, the proposition says that it is the same as the right-hand side — filtering on the prefixes of [1,2,3] only. We lose nothing if we drop [1,2,3,4]. Indeed, since p is suffix-closed, if p [1,2,3,4] were true, p [2,3,4] would have been true on the right-hand side.

Proof of Proposition 1 was the topic of a previous post. The proposition is useful for us because:

mpi (x : xs)
= max# ∘ p ◁ ∘ inits ▪ (x : xs)
=    { Proposition 1 }
max# ∘ p ◁ ∘ inits ▪ (x : max# ∘ p ◁ ∘ inits ▪ xs)
= mpi (x : mpi xs)

Therefore mpi is a fold!

mpi = foldr (λx ys → mpi (x : ys)) []
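This fold property can be tested by brute force — a Python sketch (not part of the derivation) with the suffix-closed predicate ascending:

```python
from itertools import product

def inits(xs):
    return [xs[:i] for i in range(len(xs) + 1)]

def ascending(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def mpi(xs):                    # max# ∘ p ◁ ∘ inits, for p = ascending
    return max((ys for ys in inits(xs) if ascending(ys)), key=len)

# Check mpi (x : xs) = mpi (x : mpi xs) on small inputs.
for xs in product(range(3), repeat=4):
    for x in range(3):
        assert mpi([x] + list(xs)) == mpi([x] + mpi(list(xs)))
```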

### Refining the Step Function

We still have to refine the step function λx ys → mpi (x : ys) to something more efficient. Luckily, for overlap-closed p, mpi (x : ys) is either [], [x], or x : ys — if ys is the result of mpi.

Proposition 2: If p is overlap-closed, mpi (x : mpi xs) = x ⊙ mpi xs, where ⊙ is defined by:

x ⊙ ys | p (x : ys) = x : ys
       | p [x] = [x]
       | otherwise = []

To see why Proposition 2 is true, consider mpi (x : mpi xs).

• If mpi (x : mpi xs) = [], we are done.
• Otherwise it must be x : zs for some zs ∈ inits (mpi xs), and we have p (x : zs) because it is a result of mpi. Again consider two cases:
  • If zs = [], both sides reduce to [x], otherwise…
  • Let us not forget that p (mpi xs) must be true. Also, since zs ∈ inits (mpi xs), we have mpi xs = zs ⧺ ws for some ws. Together these imply that p (x : zs ⧺ ws) = p (x : mpi xs) must be true.

Notice that the reasoning above (from Zantema) is a proof-by-contradiction. I do not yet know how hard it is to build a constructive proof.

With Proposition 1 and 2 we have turned mpi to a fold. That leads to the derivation:

max# ∘ p ◁ ∘ segs
=   { derivation above }
max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (foldr (⊙) []) ∘ tails
= max# ∘ scanr (⊙) []

with the definition of ⊙ given above. It turns out to be a rather simple algorithm: we scan through the list, and in each step we choose among three outcomes: [], [x], and x : ys. Like the maximum segment sum problem, it is a simple algorithm whose correctness is not that easy to justify.

The algorithm would be linear-time if ⊙ could be computed in constant time. With the presence of p in ⊙, however, this is unlikely to be the case.

### Efficient Testing

So let us compute, during the fold, something that allows p to be determined efficiently. Assume that there exists some φ :: [A] → B that is a fold (φ = foldr ⊗ ι for some ⊗ and ι), such that p (x : xs) = p xs ∧ f x (φ xs) for some f. Some example choices of φ and f:

• p = ascending. We may pick:

φ xs = if null xs then Nothing else Just (head xs)
f x Nothing = true
f x (Just y) = x ≤ y

• p xs = all elements in xs are equal modulo 3. We may pick:

φ xs = if null xs then Nothing else Just (head xs `mod` 3)
f x Nothing = true
f x (Just y) = x `mod` 3 == y

Let us tuple mpi with φ, and turn them into one fold. Let ⟨ f , g ⟩ x = (f x, g x), we derive:

max# ∘ p ◁ ∘ inits
=   { f = fst ∘  ⟨ f , g ⟩, see below}
fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩
= fst ∘ foldr step ([], ι)

where step is given by

step x (xs, b)
  | f x b     = (x : xs, x ⊗ b)
  | f x ι     = ([x], x ⊗ ι)
  | otherwise = ([], ι)

Notice that the property f = fst ∘ ⟨ f , g ⟩ is true when the domain of f is included in the domain of g — in particular, when both are total — which again shows why we prefer to work in a semantics with total functions only.

Let us restart the main derivation again, this time using the tupling:

max# ∘ p ◁ ∘ segs
= max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails
= max# ∘ map (fst ∘ ⟨ max# ∘ p ◁ ∘ inits , φ ⟩) ∘ tails
=   { since max# ∘ map fst = fst ∘ max#', see below }
fst ∘ max#' ∘ map ⟨ max# ∘ p ◁ ∘ inits , φ ⟩ ∘ tails
=   { derivation above }
fst ∘ max#' ∘ map (foldr step ([], ι)) ∘ tails
= fst ∘ max#' ∘ scanr step ([], ι)

where max#' compares the lengths of the first components. This is a linear-time algorithm.
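Putting the pieces together for p = ascending (so φ returns the head of the list, and f x (φ xs) tests x ≤ head xs) gives the following Python transcription. It is illustrative only: the list copying makes it quadratic in Python, whereas the Haskell version shares tails.

```python
def longest_ascending_segment(xs):
    f = lambda x, b: b is None or x <= b   # p (x : xs) tested via φ xs
    best, seg, phi = [], [], None          # max#' folded into the scan
    for x in reversed(xs):                 # scanr: right to left
        if f(x, phi):
            seg = [x] + seg                # case x : ys
        else:
            seg = [x]                      # case [x]; p [x] always holds
        phi = x                            # φ of the current tail
        if len(seg) > len(best):
            best = seg
    return best

assert longest_ascending_segment([3, 1, 2, 2, 5, 0, 4]) == [1, 2, 2, 5]
```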

### Next… Windowing?

What if p is not overlap-closed? Zantema used a technique called windowing, which I will defer to next time…

# On a Basic Property for the Longest Prefix Problem

In the Software Construction course next week we will, inevitably, talk about maximum segment sum. A natural next step is to continue with the theme of segment problems, which doesn’t feel complete without mentioning Hans Zantema’s Longest Segment Problems.

The paper deals with problem of this form:

ls = max# ∘ p ◁ ∘ segs

That is, computing the longest consecutive segment of the input list that satisfies predicate p. When writing on paper I found it much easier denoting filter p by the Bird-Meertens style p ◁ and I will use the latter for this post too. The function segs :: [a] → [[a]], defined by segs = concat ∘ map inits ∘ tails, returns all consecutive segments of the input list, and max# :: [[a]] → [a] returns the longest list from the input list of lists. To avoid long nesting parentheses, I denote function application by an infix operator ▪ that binds looser than function composition ∘. Therefore f ∘ g ∘ h ▪ x means f (g (h x)).

Standard transformation turns the specification to the form

ls = max# ∘ map (max# ∘ p ◁ ∘ inits) ∘ tails

Therefore we may solve ls if we manage to solve its sub-problem on prefixes:

lp = max# ∘ p ◁ ∘ inits

that is, computing the longest prefix of the input list satisfying predicate p. One of the key propositions in the paper says:

Proposition 1: If p is suffix-closed (that is, p (x ⧺ y) ⇒ p y), we have:

p ◁ ∘ inits ▪ (a : x) =
p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

It is useful because, by composing max# on both sides we get

lp (a : x) = max# ∘ p ◁ ∘ inits ▪ (a : lp x)

that is, lp can be computed by a foldr.

Of course, we are not quite done yet. We then have to somehow simplify p ◁ ∘ inits ▪ (a : lp x) to something more efficient. Before we move on, however, proving Proposition 1 turns out to be an interesting challenge in itself.
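Before diving into the proof, Proposition 1 can at least be tested by brute force — a Python sketch (illustrative) using the suffix-closed predicate p ys = (sum ys ≤ 4) over non-negative elements:

```python
from itertools import product

def inits(xs):
    return [xs[:i] for i in range(len(xs) + 1)]

def p(ys):                       # suffix-closed on non-negative lists
    return sum(ys) <= 4

def filt_inits(xs):              # p ◁ ∘ inits
    return [ys for ys in inits(xs) if p(ys)]

def lp(xs):                      # max# ∘ p ◁ ∘ inits
    return max(filt_inits(xs), key=len)

# Check the proposition on all small inputs over {0, 1, 2}.
for x in product(range(3), repeat=4):
    for a in range(3):
        assert filt_inits([a] + list(x)) == filt_inits([a] + lp(list(x)))
```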

### Intuition

What does Proposition 1 actually say? Let x = [1,2,3] and a = 0. On the left-hand side, we are performing p ◁ on

[] [0] [0,1] [0,1,2] [0,1,2,3]

The right hand side says that we may first filter the tails of [1,2,3]:

[] [1] [1,2] [1,2,3]

Assuming that only [] and [1,2] get chosen. We may then keep the longest prefix [1,2] only, generate all its prefixes (which would be [] [1] [1,2]), and filter the latter again. In other words, we lost no information dropping [1,2,3] if it fails predicate p, since by suffix-closure, p ([0] ⧺ [1,2,3]) ⇒ p [1,2,3]. If [1,2,3] doesn’t pass p, p [0,1,2,3] cannot be true either.

Zantema has a nice and brief proof of Proposition 1 by contradiction. However, the theme of this course has mainly focused on proof by induction and, to keep the possibility of someday encoding our derivations in tools like Coq or Agda, we would like to have a constructive proof.

So, is it possible to prove Proposition 1 in a constructive manner?

### The Proof

I managed to come up with a proof. I’d be happy to know if there is a better way, however.

For brevity, I denote if p then x else y by p → x; y. Also, define

a ⊕p x = p a → a : x ; x

Therefore p ◁ is defined by

p ◁ = foldr ⊕p []

Here comes the main proof:

Proposition 1

p ◁ ∘ inits ▪ (a : x) = p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)

if p is suffix-closed.
Proof.

p ◁ ∘ inits ▪ (a : max# ∘ p ◁ ∘ inits ▪ x)
=      { definition of inits }
p ◁ ([] : map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { definition of p ◁ }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
[] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 2 }
[] ⊕p (p ◁ ∘ map (a :) ∘ p ◁ ∘ inits ▪ x)
=      { Lemma 1 }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
=      { definition of p ◁ }
p ◁ ([] : map (a :) ∘ inits ▪ x)
=      { definition of inits }
p ◁ ∘ inits ▪ (a : x)

The main proof refers to two “decomposition” lemmas, both are of the form f ∘ g = f ∘ g ∘ f:

• Lemma 1: p ◁ ∘ map (a:) = p ◁ ∘ map (a:) ∘ p ◁ if p suffix-closed.
• Lemma 2: p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits for all predicate p.

Both are proved by structural induction. For Lemma 1 we need the conditional distribution rule:

f (p →  x; y) = (p →  f x; f y)

If we are working in CPO we need the side condition that f is strict, which is true for the cases below anyway:
Lemma 1

p ◁ ∘ map (a:) =  p ◁ ∘ map (a:) ∘ p ◁

if p is suffix-closed.
Proof. Structural induction on the input.
Case []: trivial.
Case (x : xs):

p ◁ ∘ map (a:) ∘ p ◁ ▪ (x : xs)
=   { definition of p ◁ }
p ◁ ∘ map (a:) ▪ (p x →  x : p ◁ xs ; p ◁ xs)
=   { map distributes into conditionals }
p ◁ ▪ (p x → (a : x) : map (a :) ∘ p ◁ ▪ xs ; map (a :) ∘ p ◁ ▪ xs)
=   { p ◁ distributes into conditionals }
p x → p ◁ ((a : x) : map (a :) ∘ p ◁ ▪ xs) ;
p ◁ ∘ map (a :) ∘ p ◁ ▪ xs
=   { definition of p ◁ }
p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ∘ p ◁ ▪ xs ;
                    p ◁ ∘ map (a :) ∘ p ◁ ▪ xs) ;
p ◁ ∘ map (a :) ∘ p ◁ ▪ xs
=   { induction }
p x → (p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs ;
                    p ◁ ∘ map (a :) ▪ xs) ;
p ◁ ∘ map (a :) ▪ xs
=   { since p (a : x) ⇒ p x by suffix closure }
p (a : x) → (a : x) : p ◁ ∘ map (a :) ▪ xs ;
p ◁ ∘ map (a :) ▪ xs
=   { definition of p ◁ }
p ◁ ((a : x) : map (a :) xs)
=   { definition of map }
p ◁ ∘ map (a :) ▪ (x : xs)

For Lemma 2, it is important that p is universally quantified. We need the following map-filter exchange rule:

p ◁ ∘ map (a :) =  map (a :) ∘ (p ∘ (a:)) ◁

The proof goes:
Lemma 2 For all predicate p we have

p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits = p ◁ ∘ inits

Proof. Structural induction on the input.
Case []: trivial.
Case (a : x):

p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ inits ▪ (a : x)
= p ◁ ∘ inits ∘ max# ∘ p ◁ ▪ ([] : map (a :) (inits x))

Consider two cases:
1. Case null (p ◁ ∘ map (a :) ∘ inits ▪ x):
If ¬ p [], both sides are undefined. Otherwise:

...
= p ◁ ∘ inits ∘ max# ▪ [[]]
= [[]]
= p ◁ ▪ ([] : p ◁ ∘ map (a : ) ∘ inits ▪ x)
= p ◁ ∘ inits ▪ (a : x)

2. Case ¬ (null (p ◁ ∘ map (a :) ∘ inits ▪ x)):

...
= p ◁ ∘ inits ∘ max# ∘ p ◁ ∘ map (a :) ∘ inits ▪ x
=   { map-filter exchange }
p ◁ ∘ inits ∘ max# ∘ map (a :) ∘ (p ∘ (a:)) ◁ ∘ inits ▪ x
=   { since  max# ∘ map (a :) =  (a :) ∘ max# }
p ◁ ∘ inits ∘ (a :) ∘ max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x
=   { definition of inits }
p ◁ ([] : map (a :) ∘ inits ∘  max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { definition of p ◁ }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ∘ max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { map-filter exchange }
[] ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ∘ max# ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { induction }
[] ⊕p (map (a :) ∘ (p ∘ (a :)) ◁ ∘ inits ▪ x)
=   { map-filter exchange }
[] ⊕p (p ◁ ∘ map (a :) ∘ inits ▪ x)
=   { definition of p ◁ }
p ◁ ( [] :  map (a :) ∘ inits ▪ x)
=   { definition of inits }
p ◁ ∘ inits ▪ (a : x)

# Algebra of programming in Agda: dependent types for relational program derivation

S-C. Mu, H-S. Ko, and P. Jansson. In Journal of Functional Programming, Vol. 19(5), pp. 545-579. Sep. 2009 [PDF]

Relational program derivation is the technique of stepwise refining a relational specification to a program by algebraic rules. The program thus obtained is correct by construction. Meanwhile, dependent type theory is rich enough to express various correctness properties to be verified by the type checker.

We have developed a library, AoPA, to encode relational derivations in the dependently typed programming language Agda. A program is coupled with an algebraic derivation whose correctness is guaranteed by the type system.

Two non-trivial examples are presented: an optimisation problem, and a derivation of quicksort where well-founded recursion is used to model terminating hylomorphisms in a language with inductive types.

This article extends the paper we published in Mathematics of Program Construction 2008. Code accompanying the paper has been developed into an Agda library AoPA.

# AoPA — Algebra of Programming in Agda

2015/03/29: AoPA is on GitHub now, updated to work with Agda 2.4.2.2 and Standard Library 0.9.

An Agda library accompanying the paper Algebra of Programming in Agda: Dependent Types for Relational Program Derivation, developed in co-operation with Hsiang-Shang Ko and Patrik Jansson.

Dependent type theory is rich enough to express that a program satisfies an input/output relational specification, but it could be hard to construct the proof term. On the other hand, squiggolists know very well how to show that one relation is included in another by algebraic reasoning. The AoPA library allows one to encode Algebra of Programming style program derivation, both functional and relational, in Agda.

### Example

The following is a derivation of insertion sort in progress:

isort-der : ∃ (\f → ordered? ○ permute ⊒ fun f )
isort-der = (_ , (
⊒-begin
ordered? ○ permute
⊒⟨ (\vs -> ·-monotonic ordered? (permute-is-fold vs)) ⟩
ordered? ○ foldR combine nil
⊒⟨ foldR-fusion ordered? ins-step ins-base ⟩
foldR (fun (uncurry insert)) nil
⊒⟨ { foldR-to-foldr insert []}0
{ fun (foldr insert [])
⊒∎ }1))

isort : [ Val ] -> [ Val ]
isort = proj₁ isort-der

The type of isort-der is a proposition that there exists a function f that is contained in ordered? ○ permute, a relation mapping a list to one of its ordered permutations. The proof proceeds by derivation from the specification towards the algorithm. The first step exploits monotonicity of ○ and the fact that permute can be expressed as a fold. The second step makes use of relational fold fusion. The shaded areas denote interaction points — fragments of (proof) code to be completed. The programmer can query Agda for the expected type and the context of the shaded expression. When the proof is completed, an algorithm isort is obtained by extracting the witness of the proposition. It is an executable program that is backed by the type system to meet the specification.

The complete program is in the Example directory of the code.

### The Code

The code consists of the following files and folders:

• AlgebraicReasoning: a number of modules supporting algebraic reasoning. At present we implement our own because the PreorderReasoning module in earlier versions of the Standard Library was not expressive enough for our needs. We may adapt to the new Standard Library later.
• Data: defining relational fold, unfold, hylomorphism (using well-founded recursion), the greedy theorem, the converse-of-a-function theorem, etc., for lists and binary trees.
• Examples: currently we have prepared four examples: a functional derivation of the maximum segment sum problem, relational derivations of insertion sort and quicksort (following the paper Functional Algorithm Design by Richard Bird), and the solution of an optimisation problem using the greedy theorem.
• Relations: modules defining various properties of relations.
• Sets: a simple encoding of sets, upon which Relations are built.

# Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths

S-C. Mu. In Partial Evaluation and Program Manipulation (PEPM ’08), pp 31-39. January 2008. (20/74) [PDF] [GZipped Postscript]

It may be surprising that variations of the maximum segment sum (MSS) problem, a textbook example for squiggolists, are still active topics for algorithm designers. In this paper we examine the new developments from the view of relational program calculation. It turns out that, while the classical MSS problem is solved by the Greedy Theorem, applying the Thinning Theorem yields a linear-time algorithm for MSS with an upper bound on length.

To derive a linear-time algorithm for the maximum segment density problem, on the other hand, we propose a variation of thinning based on an extended notion of monotonicity. The concepts of left-negative and right-screw segments emerge from the search for monotonicity conditions. The efficiency of the resulting algorithms crucially relies on exploiting properties of the set of partial solutions and on designing efficient data structures for them.

# Maximum Segment Sum, Agda Approved

To practise using the Logic.ChainReasoning module in Agda, Josh proved the foldr-fusion theorem, which he learnt in the program derivation lecture in FLOLAC where we used the maximum segment sum (MSS) as the main example. Seeing his proof, I was curious to know how much program derivation I can do in Agda and tried coding the entire derivation of MSS. I thought it would be a small exercise I could do over the weekend, but ended up spending the entire week.

As a reminder, given a list of (possibly negative) numbers, the MSS is about computing the maximum sum among all its consecutive segments. Typically, the specification is:

mss = max ○ map⁺ sum ○ segs

where segs = concat⁺ ○ map⁺ inits ○ tails computes all segments of a list.
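As a sanity check, the specification can be transcribed directly into Python (taking Val to be Int; the function names mirror those in the text, but the transcription itself is mine):

```python
def inits(xs):
    """All prefixes of xs, including the empty one."""
    return [xs[:i] for i in range(len(xs) + 1)]

def tails(xs):
    """All suffixes of xs, including the empty one."""
    return [xs[i:] for i in range(len(xs) + 1)]

def segs(xs):
    """All consecutive segments: concat . map inits . tails."""
    return [ys for t in tails(xs) for ys in inits(t)]

def mss(xs):
    """max . map sum . segs -- the executable (quadratic-ish) specification."""
    return max(sum(ys) for ys in segs(xs))

# Bentley's classic test case: the best segment is 59+26-53+58+97 = 187.
print(mss([31, -41, 59, 26, -53, 58, 97, -93, -23, 84]))
```

Note that segs yields the empty segment (several times, in fact), whose sum is 0; this detail matters for the base case discussed below.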

A dependent pair is defined by:

data _∣_ (A : Set) (P : A -> Set) : Set where
  sub : (x : A) -> P x -> A ∣ P

such that sub x p is a pair where the type of the second component p depends on the value of the first component x. The idea is to use a dependent pair to represent a derivation:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x =
sub ?
(chain> (max ○ map⁺ sum ○ segs) x
=== ?)

It says that mss-der is a function taking a list x and returning a value of type Val, with the constraint that the value returned must be equal to (max ○ map⁺ sum ○ segs) x.

My wish was to use the interactive mechanism of the Agda Emacs mode to “derive” the parts marked by ?, until we come to an implementation:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x =
sub RESULT
(chain> (max ○ map⁺ sum ○ segs) x
=== ...
=== ...
=== RESULT)

If it works well, we can probably use Agda as a tool for program derivation!

Currently, however, I find it harder to use than expected, perhaps because I am unfamiliar with the way Agda reports type errors. Nevertheless, Agda does force me to get every detail right. For example, the usual definition of max I would use in a paper would be:

max = foldr _↑_ -∞

But then I would have to define numbers with lower bound -∞. A sloppy alternative definition:

max = foldr _↑_ 0

led me to prove a base case 0 ↑ max x ≡ max x, which is not true. That the definition does work in practice relies on the fact that segs always returns the empty list as one of the possible segments. Alternatively, I could define max on non-empty lists only:

max : List⁺ Val -> Val
max = foldr⁺ _↑_ id

where List⁺ A is defined by:

data List⁺ (A : Set) : Set where
  [_]⁺ : A -> List⁺ A
  _::⁺_ : A -> List⁺ A -> List⁺ A

and refine the types of inits, tails, etc, to return non-empty lists. This is the approach I took.
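The failing base case can be checked concretely. A quick Python sketch (Val taken as Int; up and max_with_zero are my names for _↑_ and the sloppy foldr _↑_ 0):

```python
from functools import reduce

def up(a, b):
    """Binary maximum, written _↑_ in the text."""
    return a if a >= b else b

def max_with_zero(xs):
    """foldr _↑_ 0; since ↑ is associative and commutative, reduce suffices."""
    return reduce(up, xs, 0)

xs = [-3, -1, -4]                  # an all-negative list
print(max_with_zero(xs))           # 0, yet the genuine maximum of xs is -1
print(up(0, max(xs)) == max(xs))   # False: the base-case law fails
```

The sloppy definition is only saved, in the MSS context, by segs always producing the empty segment whose sum is 0.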

Eventually, I was able to give a derivation of mss this way:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x =
sub ((max ○ scanr _⊗_ ø) x)
(chain> (max ○ map⁺ sum ○ segs) x
=== (max ○ map⁺ sum ○ concat⁺ ○ map⁺ inits ○ tails) x
by refl
=== (max ○ concat⁺ ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x
by cong max (sym (concat⁺-map⁺ ((map⁺ inits ○ tails) x)))
=== (max ○ map⁺ max ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x
by max-concat⁺ ((map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x)
=== (max ○ map⁺ max ○ map⁺ (map⁺ sum ○ inits) ○ tails) x
by cong (\xs -> max (map⁺ max xs)) (sym (map⁺-compose (tails x)))
=== (max ○ map⁺ (max ○ map⁺ sum ○ inits) ○ tails) x
by cong max (sym (map⁺-compose (tails x)))
=== (max ○ map⁺ (foldr _⊗_ ø) ○ tails) x
by cong max (map⁺-eq max-sum-prefix (tails x))
=== (max ○ scanr _⊗_ ø) x
by cong max (scanr-pf x) )
where _⊕_ : Val -> List⁺ Val -> List⁺ Val
a ⊕ y = ø ::⁺ map⁺ (_+_ a) y

_⊗_ : Val -> Val -> Val
a ⊗ b = ø ↑ (a + b)

where max-sum-prefix consists of two fold-fusion steps:

max-sum-prefix : (x : List Val) -> max (map⁺ sum (inits x)) ≡ foldr _⊗_ ø x
max-sum-prefix x =
chain> max (map⁺ sum (inits x))
=== max (foldr _⊕_ [ ø ]⁺ x)
by cong max (foldr-fusion (map⁺ sum) lemma1 x)
=== foldr _⊗_ ø x
by foldr-fusion max lemma2 x
where lemma1 : (a : Val) -> (xs : List⁺ (List Val)) ->
map⁺ sum (ini a xs) ≡ (a ⊕ (map⁺ sum xs))
lemma1 a xs = ?
lemma2 : (a : Val) -> (x : List⁺ Val) ->
max (a ⊕ x) ≡ (a ⊗ max x)
lemma2 a x = ?

The two lemmas are given in the files attached below. Having the derivation, mss is given by:

mss : List Val -> Val
mss x = depfst (mss-der x)

It is an executable program that is proved to be correct.
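For a concrete check, the end result of the derivation, max ○ scanr _⊗_ ø with a ⊗ b = ø ↑ (a + b), can be transcribed into Python (assuming Val is Int and reading ø as 0, the sum of the empty segment; scanr and mss_linear are my names):

```python
def scanr(f, e, xs):
    """scanr f e [x1,..,xn] = [foldr f e [x1..xn], .., foldr f e [xn], e]."""
    out = [e]
    for x in reversed(xs):
        out.append(f(x, out[-1]))
    return list(reversed(out))

def mss_linear(xs):
    """max . scanr _⊗_ 0, with a ⊗ b = 0 ↑ (a + b): a linear-time MSS."""
    otimes = lambda a, b: max(0, a + b)
    return max(scanr(otimes, 0, xs))

print(mss_linear([31, -41, 59, 26, -53, 58, 97, -93, -23, 84]))  # 187
```

Each entry of the scan is the best (clipped-at-zero) sum of a prefix of the corresponding suffix, so the outer max ranges over all segments — the same one-pass scheme as Kadane's algorithm.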

The complete Agda program is split into five files:

• MSS.agda: the main program importing all the sub-modules.
• ListProperties.agda: some properties I need about lists, such as fold fusion, concat ○ map (map f) = map f ○ concat, etc. Later in the development I realised that I should switch to non-empty lists, so not all of the properties here are used.
• List+.agda: non-empty lists and some of its properties.
• Derivation.agda: the main derivation of MSS. The derivation is parameterised over any numerical data with operators + and ↑ such that + is associative and distributes over ↑, that is, a + (b ↑ c) = (a + b) ↑ (a + c). The reason for this parameterisation, however, was that I did not know how to prove these properties, until Josh showed me the proof.
• IntRNG.agda: proofs from Josh that Data.Int actually satisfies the properties above. (Not quite complete yet.)

# The Pruning Theorem: Thinning Based on a Loose Notion of Monotonicity

The reason I studied the thinning theorem again is that I needed a slightly generalised variation. The following seems to be what I need. The general idea and the term “pruning” emerged from discussion with Sharon Curtis. The term “lax preorder” I invented myself. I am not good at naming, and welcome suggestions for better names.

The notations below are mostly taken from the book Algebra of Programming. Not many people, even among functional programmers, are familiar with these notations involving relational intersection, division, etc. One starts to appreciate their benefits once one gets used to their calculation rules. Most of the time when I was doing the proof, I was merely manipulating the symbols. I could not have managed the complexity if I had had to fall back to the semantics and think about what they “mean” all the time.

A relation Q :: PA ← A between sets of A's and elements of A is called a lax preorder if it is

1. reflexive in the sense that ∋ ⊆ Q, and
2. transitive in the sense that (Q/∋) . Q ⊆ Q.

A relation S :: A ← FA is monotonic on lax preorder Q if S . FQ˘ ⊆ Q˘. Λ(S . F∈).

Given a lax preorder, we define:

$prune Q = \in \backslash \in \cap Q/\ni$

The definition induces the universal property:

X ⊆ prune Q . ΛT   ≡   ∈ . X ⊆ T  ∧  X . T˘ ⊆ Q

Any preorder R induces a lax preorder ∋ . R. If a relation S is monotonic on R, it is monotonic on the lax preorder ∋ . R. Furthermore, prune (∋ . R) = thin R. Therefore, pruning is a generalisation of thinning. We need the notion of lax preorders because, for some problems, the generating relation S is monotonic on a lax preorder, but not on a preorder.

Theorem: if S is monotonic on lax preorder Q, we have:

$fold \left(prune Q . \Lambda \left(S . F\in \right)\right) \subseteq prune Q . \Lambda \left(fold S\right)$

Proof. Since Λ(fold S) = fold (Λ(S . F∈)), by fold fusion, the theorem holds if

$prune Q . \Lambda \left(S . F\in \right) . F\left(prune Q\right) \subseteq prune Q . \Lambda \left(S . F\in \right)$

By the universal property of prune, the above is equivalent to the conjunction of the two inclusions:

∈ . prune Q . Λ(S . F∈) . F(prune Q) ⊆ S . F∈
prune Q . Λ(S . F∈) . F(prune Q) . (S . F∈)˘ ⊆ Q

The first inclusion is proved by:

∈ . prune Q . Λ(S . F∈) . F(prune Q)
⊆     { since prune Q ⊆ ∈\∈ }
∈ . ∈\∈ . Λ(S . F∈) . F(prune Q)
⊆     { division }
∈ . Λ(S . F∈) . F(prune Q)
=     { power transpose }
S . F∈ . F(prune Q)
⊆     { since prune Q ⊆ ∈\∈ }
S . F∈ . F(∈\∈)
⊆     { functor, division }
S . F∈

And the second by:

prune Q . Λ(S . F∈) . F(prune Q) . (S . F∈)˘
⊆     { since prune Q ⊆ Q/∋, converse }
prune Q . Λ(S . F∈) . F(Q/∋) . F∋ . S˘
⊆     { functor, division }
prune Q . Λ(S . F∈) . FQ . S˘
⊆     { monotonicity: FQ . S˘ ⊆ Λ(S . F∈)˘ . Q }
prune Q . Λ(S . F∈) . Λ(S . F∈)˘ . Q
⊆     { since Λ(S . F∈) is a function, that is, f . f˘ ⊆ id }
prune Q . Q
⊆     { since prune Q ⊆ Q/∋ }
Q/∋ . Q
⊆     { since Q transitive }
Q

Endproof.

The proof above uses transitivity of Q but not reflexivity. I need reflexivity to construct base cases, for example, to come up with this specialised Pruning Theorem for lists:

$foldr \left(prune Q . \Lambda \left(S . \left(id \times \in \right)\right)\right) \left\{e\right\} \subseteq prune Q . \Lambda \left(foldr S e\right)$

if S . (id × Q˘) ⊆ Q˘. Λ(S . (id × ∈)).
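This specialised theorem is the shape of many thinning-style algorithms: a foldr whose step generates new partial solutions and immediately discards dominated ones. As an illustration only (not an example from the text), here is a Python sketch for a 0/1 knapsack, where a partial solution (weight, value) is pruned when another kept one is no heavier and no less valuable; the names and the capacity CAP are my own:

```python
CAP = 10  # knapsack capacity (an assumption of this toy example)

def thin(ps):
    """Keep only pairs not dominated by another kept pair."""
    kept, best = [], float("-inf")
    # sort by weight ascending, value descending; every kept pair is
    # then no heavier than the candidates that follow it
    for w, v in sorted(set(ps), key=lambda p: (p[0], -p[1])):
        if v > best:
            kept.append((w, v))
            best = v
    return kept

def step(item, ps):
    """One foldr step: skip the item, or add it whenever it still fits."""
    w0, v0 = item
    return thin(ps + [(w + w0, v + v0) for w, v in ps if w + w0 <= CAP])

def knapsack(items):
    ps = [(0, 0)]                  # the base: only the empty solution
    for item in reversed(items):   # foldr over the item list
        ps = step(item, ps)
    return max(v for _, v in ps)

print(knapsack([(5, 10), (4, 40), (6, 30), (3, 50)]))  # best value: 90
```

Thinning keeps the set of partial solutions small (here, the Pareto front), yet an optimal solution always survives, because every discarded pair is dominated by a kept one.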

# Proving the Thinning Theorem by Fold Fusion

Algebra of Programming records proofs of the greedy and the thinning theorems that are slightly shorter than proofs using fold fusion. Of course, one can still use fold fusion. In fact, proving them by fold fusion is among the exercises in Chapter 8 (PDF) of Algebraic and Coalgebraic Methods in the Mathematics of Program Construction, of which I myself am listed among the authors.

A while ago when I needed to consider some variations of the thinning theorem I tried to do the proof again. And, horrifyingly, I could not do it anymore! Have my skills become rusty due to lack of practice in the past few years? In panic, I spent an entire afternoon fighting with it, until I realised that it was just a typical copying error from the very beginning: when I copied a property from the book I put in an extra Λ. Then I trapped myself in the maze of expanding ΛR into ∈\R ∩ (R\∈) and using modular law and ….

Having fixed the error, I get my trivial and easy proof back again. Anyway, I am going to record it below, in case I run into the same panic again.

Given a preorder Q, the relation thin Q is defined by:

$thin Q = \in \backslash \in \cap \left(\ni . Q\right)/\ni$

The definition induces the universal property:

X ⊆ thin Q . ΛT   ≡   ∈ . X ⊆ T  ∧  X . T˘ ⊆ ∋ . Q

And here are some basic properties we will make use of later:

∈ . ΛS = S       (power transpose)
ΛR . R˘ ⊆ ∋
R . R\S ⊆ S,       R/S . S ⊆ R       (division)
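On finite sets the two conjuncts of the definition can be read pointwise: ∈\∈ says the output is a subset of the input, and (∋ . Q)/∋ says every element of the input is related by Q to some retained element. A small Python model (my own encoding, reading y Q x as “y is at least as good as x”):

```python
def is_thinning(q, xs, ys):
    """ys (thin Q) xs: ys ⊆ xs, and each x in xs is q-covered by some y in ys."""
    return set(ys) <= set(xs) and all(any(q(y, x) for y in ys) for x in xs)

# Q: componentwise dominance on (weight, value) pairs --
# y Q x means y is no heavier and no less valuable than x.
q = lambda y, x: y[0] <= x[0] and y[1] >= x[1]

xs = [(1, 5), (2, 3), (2, 7), (4, 7)]
ys = [(1, 5), (2, 7)]          # the Pareto-optimal pairs of xs
print(is_thinning(q, xs, ys))  # every pair in xs is dominated by one in ys
```

Note that thin Q is a relation, not a function: many subsets of xs qualify, including xs itself (by reflexivity of Q).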

### The Thinning Theorem

The thinning theorem says:
Theorem: if S is monotonic on preorder Q, that is, S . FQ˘ ⊆ Q˘ . S, we have:

$fold \left(thin Q . \Lambda \left(S . F\in \right)\right) \subseteq thin Q . \Lambda \left(fold S\right)$

Proof. By fold fusion, the theorem holds if

$thin Q . \Lambda \left(S . F\in \right) . F\left(thin Q\right) \subseteq thin Q . \Lambda \left(S . F\in \right)$

By the universal property of thin, the above inclusion is equivalent to the conjunction of the two inclusions:

∈ . thin Q . Λ(S . F∈) . F(thin Q) ⊆ S . F∈
thin Q . Λ(S . F∈) . F(thin Q) . (S . F∈)˘ ⊆ ∋ . Q

The first inclusion is proved by:

∈ . thin Q . Λ(S . F∈) . F(thin Q)
⊆     { since thin Q ⊆ ∈\∈ }
∈ . ∈\∈ . Λ(S . F∈) . F(thin Q)
⊆     { division }
∈ . Λ(S . F∈) . F(thin Q)
=     { power transpose }
S . F∈ . F(thin Q)
⊆     { since thin Q ⊆ ∈\∈ }
S . F∈ . F(∈\∈)
⊆     { functor, division }
S . F∈

And the second by:

thin Q . Λ(S . F∈) . F(thin Q) . (S . F∈)˘
⊆     { since thin Q ⊆ (∋ . Q)/∋, converse }
thin Q . Λ(S . F∈) . F((∋ . Q)/∋) . F∋ . S˘
⊆     { functor, division }
thin Q . Λ(S . F∈) . F(∋ . Q) . S˘
⊆     { monotonicity: FQ . S˘ ⊆ S˘ . Q }
thin Q . Λ(S . F∈) . F∋ . S˘ . Q
⊆     { since ΛR . R˘ ⊆ ∋ }
thin Q . ∋ . Q
⊆     { since thin Q ⊆ (∋ . Q)/∋, division }
∋ . Q . Q
⊆     { since Q transitive }
∋ . Q

Endproof.

By the way, the variation of the thinning theorem I need is “fold (thin Q . Λ(S . F∈)) ⊆ thin Q . Λ(fold S) if S . F(Q˘ . ∈) ⊆ Q˘ . S . F∈”, whose proof is, luckily, trivial once you write down the original proof.