# Author Archives: Shin

As of Feb 2006 I have started working as an assistant research fellow in Academia Sinica, Taiwan.

# Evaluating Simple Polynomials

At the end of FLOLAC ’10 I had a chance to show the students, in 25 minutes, what functional program calculation is about. The students had been exposed to functional programming only a week earlier, in a three-hour course, after which they had written some simple programs handling concrete data but might still have had problems grasping more abstract concepts such as folds. I had talked to them about maximum segment sum far too many times (in the context of imperative program derivation, though), and it is perhaps too complex to cover in 25 minutes. The steep list problem, on the other hand, can be dealt with in 5 minutes. Thus I needed another example.

This is what I eventually came up with: given a list of numbers `a₀, a₁, a₂ ... aₙ` and a constant `X`, compute `a₀ + a₁X + a₂X² + ... + aₙXⁿ`. In Haskell it can be specified as a one-liner:
```
poly as = sum (zipWith (×) as (iterate (×X) 1))
```

One problem of this example is that the specification is already good enough: it is a nice linear time algorithm. To save some multiplications, perhaps, we may try to further simplify it.

It is immediate that `poly [] = 0`. For the non-empty case, we reason:
```
   poly (a : as)
 =   { definition of poly }
   sum (zipWith (×) (a:as) (iterate (×X) 1))
 =   { definition of iterate }
   sum (zipWith (×) (a:as) (1 : iterate (×X) X))
 =   { definition of zipWith }
   sum (a : zipWith (×) as (iterate (×X) X))
 =   { definition of sum }
   a + sum (zipWith (×) as (iterate (×X) X))
```

The expression to the right of `a +` is unfortunately not `poly as` — the last argument to `iterate` is `X` rather than `1`. One possibility is to generalise `poly` to take another argument. For this problem, however, we can do slightly better:
```
   a + sum (zipWith (×) as (iterate (×X) X))
 =   { since iterate f (f b) = map f (iterate f b) }
   a + sum (zipWith (×) as (map (×X) (iterate (×X) 1)))
 =   { zipWith (⊗) as . map (⊗X) = map (⊗X) . zipWith (⊗) as
       if ⊗ associative }
   a + sum (map (×X) (zipWith (×) as (iterate (×X) 1)))
 =   { sum . map (×X) = (×X) . sum }
   a + (sum (zipWith (×) as (iterate (×X) 1))) × X
 =   { definition of poly }
   a + (poly as) × X
```

We have thus come up with the program
```
poly []       = 0
poly (a : as) = a + (poly as) × X
```

Besides the definitions of `sum`, `zipWith`, `iterate`, etc., the rules used include:

1. `map f (iterate f x) = iterate f (f x)`
2. `zipWith (⊗) as . map (⊗X) = map (⊗X) . zipWith (⊗) as` if `⊗` associative
3. `sum . map (×X) = (×X) . sum`, a special case of `foldr ⊕ e . map (⊗X) = (⊗X) . foldr ⊕ e` if `(a ⊕ b) ⊗ X = (a ⊗ X) ⊕ (b ⊗ X)` and `e ⊗ X = e`.
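The specification and the derived program can be transcribed into plain Haskell and compared on sample inputs. This is only a sketch: the constant `X` is passed as an explicit parameter `x`, the elements are assumed to be `Integer`s, and the names `polySpec` and `polyHorner` are introduced here for illustration.

```haskell
-- Specification: pair each coefficient with the matching power of x
-- and add up the products.
polySpec :: Integer -> [Integer] -> Integer
polySpec x as = sum (zipWith (*) as (iterate (* x) 1))

-- Derived program: one multiplication and one addition per element.
polyHorner :: Integer -> [Integer] -> Integer
polyHorner _ []       = 0
polyHorner x (a : as) = a + polyHorner x as * x
```

For instance, both `polySpec 2 [1,2,3]` and `polyHorner 2 [1,2,3]` evaluate `1 + 2×2 + 3×2²`. The derived program is a `foldr`, and could equally be written `foldr (\a r -> a + r * x) 0`.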

Well, this is not a very convincing example. Ideally I’d like to have a derivation, like the steep list, where we gain some improvement in complexity by calculation.

What is your favourite example for functional program calculation?

# Sum of Squares of Differences

In the final exam of the Program Construction course in FLOLAC ’10, I gave the students this problem (from Kaldewaij’s book):
```
|[ con N {N ≥ 2}; a : array [0..N) of int;
   var r : int;
   S
   { r = (Σ i,j : 0 ≤ i < j < N : (a.i - a.j)²) }
]|
```

In words, given an array of integers having at least two elements, compute the sum of squares of the difference between all pairs of elements. (Following the convention of the guarded command language, function application is written `f.x`, and an array is seen as a function from indices to values.)

It is not hard to quickly write up an `O(N²)` program using nested loops, which, I have to confess, is what I would have done before I read Kaldewaij’s book and realised that it is possible to do the task in linear time using one loop. Unfortunately, not many students managed to come up with this solution, so I think it is worth some discussion.

### Quantifiers

Before we solve the problem, let us review the “Dutch style” quantifier syntax and rules. Given a commutative, associative binary operator `⊕` with unit element `e`, if we informally denote the (integral) values in the interval `[A .. B)` by `i₀, i₁, i₂ ... iₙ`, the quantified expression:
```
(⊕ i : A ≤ i < B : F.i)
```

informally denotes `F.i₀ ⊕ F.i₁ ⊕ F.i₂ ⊕ ... ⊕ F.iₙ`. More generally, if all values satisfying predicate `R` can be listed as `i₀, i₁, i₂ ... iₙ`, the expression
```
(⊕ i : R.i : F.i)
```

denotes `F.i₀ ⊕ F.i₁ ⊕ F.i₂ ⊕ ... ⊕ F.iₙ`. We omit the `i` in `R.i` and `F.i` when there can be no confusion.

A more formal characterisation of the quantified expression is given by the following rules:

1. `(⊕ i : false : F.i) = e`
2. `(⊕ i : i = x : F.i) = F.x`
3. `(⊕ i : R : F) ⊕ (⊕ i : S : F) = (⊕ i : R ∨ S : F) ⊕ (⊕ i : R ∧ S : F)`
4. `(⊕ i : R : F) ⊕ (⊕ i : R : G) = (⊕ i : R : F ⊕ G)`
5. `(⊕ i : R.i : (⊕ j : S.j : F.i.j)) = (⊕ j : S.j : (⊕ i : R.i : F.i.j))`

Rules 1, 2 and 3 give rise to a useful rule, "split off `n`": consider `i` such that `0 ≤ i < n + 1`. If `n > 0`, the set of possible values of `i` can be split into two disjoint subsets: `0 ≤ i < n` and `i = n`. By rule 3 (reversed), rule 1 (the conjunction of the two ranges is `false`), and rule 2 (for the range `i = n`) we get:
```
(⊕ i : 0 ≤ i < n + 1 : F.i) = (⊕ i : 0 ≤ i < n : F.i) ⊕ F.n
```

Expressions quantifying over more than one variable can be expressed in terms of quantifiers over single variables:
```
(⊕ i,j : R.i ∧ S.i.j : F.i.j) = (⊕ i : R.i : (⊕ j : S.i.j : F.i.j))
```

If `⊗` distributes into `⊕`, we have an additional property:
```
x ⊗ (⊕ i : R : F) = (⊕ i : R : x ⊗ F)
```

As a convention, `(+ i : R : F)` is often written `(Σ i : R : F)`.

### Computing the Sum of Squares of Differences

The first step is to turn the constant `N` to a variable `n`. The main worker of the program is going to be a loop, in whose invariant we try to maintain:
```
P  ≣  r = (Σ i,j : 0 ≤ i < j < n : (a.i - a.j)²)
```

At the end of the loop body we increment `n`, and the loop terminates when `n` coincides with `N`:
```
{ Inv: P ∧ 2 ≤ n ≤ N , Bound: N - n }
do n ≠ N → ... ; n := n + 1 od
```

We shall then find out how to update `r` before `n := n + 1` in a way that preserves `P`.

Assume that `P` and `2 ≤ n ≤ N` hold. To find out how to update `r`, we substitute `n + 1` for `n` in the desired value of `r`:
```
   (Σ i,j : 0 ≤ i < j < n : (a.i - a.j)²)[n+1 / n]
 = (Σ i,j : 0 ≤ i < j < n + 1 : (a.i - a.j)²)
 =   { split off j = n }
   (Σ i,j : 0 ≤ i < j < n : (a.i - a.j)²) +
   (Σ i : 0 ≤ i < n : (a.i - a.n)²)
 =   { P }
   r + (Σ i : 0 ≤ i < n : (a.i - a.n)²)
```

This is where most people stop the calculation and start constructing a loop computing `(Σ i : 0 ≤ i < n : (a.i - a.n)²)`. One might later realise, however, that most computations are repeated. Indeed, the expression above can be expanded further:
```
   r + (Σ i : 0 ≤ i < n : (a.i - a.n)²)
 =   { (x - y)² = x² - 2xy + y² }
   r + (Σ i : 0 ≤ i < n : a.i² - 2 × a.i × a.n + a.n²)
 =   { Rule 4 }
   r + (Σ i : 0 ≤ i < n : a.i²)
     - (Σ i : 0 ≤ i < n : 2 × a.i × a.n)
     + (Σ i : 0 ≤ i < n : a.n²)
 =   { a.n is a constant, multiplication distributes into addition }
   r + (Σ i : 0 ≤ i < n : a.i²)
     - 2 × (Σ i : 0 ≤ i < n : a.i) × a.n
     + (Σ i : 0 ≤ i < n : a.n²)
 =   { simplifying the last term }
   r + (Σ i : 0 ≤ i < n : a.i²)
     - 2 × (Σ i : 0 ≤ i < n : a.i) × a.n + n × a.n²
```

which suggests that we can store the values of `(Σ i : 0 ≤ i < n : a.i²)` and `(Σ i : 0 ≤ i < n : a.i)` in two additional variables:
```
Q₀  ≣  s = (Σ i : 0 ≤ i < n : a.i²)
Q₁  ≣  t = (Σ i : 0 ≤ i < n : a.i)
```

It merely takes some routine calculation to find out how to update `s` and `t`. The resulting code is:
```
|[ con N {N ≥ 2}; a : array [0..N) of int;
   var r, s, t, n : int;

   r, s, t, n := (a.0 - a.1)², a.0² + a.1², a.0 + a.1, 2
   { Inv: P ∧ Q₀ ∧ Q₁ ∧ 2 ≤ n ≤ N , Bound: N - n }
 ; do n ≠ N →
      r := r + s - 2 × t × a.n + n × a.n²;
      s := s + a.n²;
      t := t + a.n;
      n := n + 1
   od
   { r = (Σ i,j : 0 ≤ i < j < N : (a.i - a.j)²) }
]|
```
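For readers who prefer to experiment, the loop can be mimicked by a left fold in Haskell and compared against a direct quadratic transcription of the postcondition. This is a sketch under some assumptions: the array is modelled as a list of `Integer`s, the names `spec` and `ssd` are made up for illustration, and the fold starts from `n = 0` (where `P`, `Q₀` and `Q₁` hold trivially with all sums empty) rather than from `n = 2` as in the program above.

```haskell
import Data.List (foldl')

-- Quadratic specification: sum over all index pairs i < j.
spec :: [Integer] -> Integer
spec a = sum [ (x - y) ^ (2 :: Int) | (i, x) <- ia, (j, y) <- ia, i < j ]
  where ia = zip [0 :: Int ..] a

-- Linear version: the state (r, s, t, n) mirrors the loop variables,
-- and x plays the role of a.n in each step.
ssd :: [Integer] -> Integer
ssd a = result
  where
    (result, _, _, _) = foldl' step (0, 0, 0, 0) a
    step (r, s, t, n) x =
      ( r + s - 2 * t * x + n * x * x   -- r := r + s - 2 × t × a.n + n × a.n²
      , s + x * x                       -- s := s + a.n²
      , t + x                           -- t := t + a.n
      , n + 1 )                         -- n := n + 1
```

On the input `[3,1,4,1,5]` both functions yield `64`, which also agrees with the closed form `N × Σ a.i² - (Σ a.i)² = 5 × 52 - 14²`.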

### Another “One Loop” Solution

Among those students who did come up with a program, most of them resorted to a typical two-loop, `O(N²)` solution. Given that this 9-hour course is, for almost all of them, their first exposure to program derivation, I shall perhaps be happy enough that around 3 to 4 out of 38 students came up with something like the program above.

One student, however, delivered a program I did not expect to see:
```
|[ con N {N ≥ 2}; a : array [0..N) of int;
   var r, i, j : int;

   r, i, j := 0, 0, 0
   { Inv: ... ∧ 0 ≤ i ≤ j ∧ 0 ≤ j ≤ N, Bound: ? }
 ; do j ≠ N →
      if i < j → r := r + (a.i - a.j)²;  i := i + 1
      |  i = j → i, j := 0, j + 1
      fi
   od
]|
```

The program uses only one loop, but is still `O(N²)`; on closer inspection one realises that it is actually simulating the inner loop manually. Still, I’d be happy if the student could show me a correctness proof, with a correct loop invariant and a bound, since both of them are more complex than what I expected them to learn. Unfortunately, in the answer handed in, the program, the invariant, and the bound all contained some bugs. Does anyone want to give it a try?
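One way to start experimenting is to transliterate the program as displayed above into Haskell and test it against the quadratic specification. The sketch below does so, and also records whether a candidate bound function decreases at every iteration while staying non-negative. The candidate `(N - j) × (N + 1) + (j - i)` is an assumption made here for illustration, not the student's bound, and testing is of course no substitute for a proof. The names `spec`, `bound` and `oneLoop` are hypothetical.

```haskell
-- Quadratic specification: sum over all index pairs i < j.
spec :: [Integer] -> Integer
spec a = sum [ (x - y) ^ (2 :: Int) | (i, x) <- ia, (j, y) <- ia, i < j ]
  where ia = zip [0 :: Int ..] a

-- Candidate bound function (an assumption, see the text above).
bound :: Int -> Int -> Int -> Int
bound n i j = (n - j) * (n + 1) + (j - i)

-- Transliteration of the one-loop program. Returns the final r, and
-- whether the candidate bound stayed non-negative and strictly
-- decreased on every iteration.
oneLoop :: [Integer] -> (Integer, Bool)
oneLoop a = go 0 0 0 True
  where
    n = length a
    go :: Integer -> Int -> Int -> Bool -> (Integer, Bool)
    go r i j ok
      | j == n    = (r, ok)
      | i < j     = next (r + (a !! i - a !! j) ^ (2 :: Int)) (i + 1) j
      | otherwise = next r 0 (j + 1)          -- the branch i = j
      where
        next r' i' j' = go r' i' j' (ok && decreases)
          where decreases = 0 <= bound n i' j' && bound n i' j' < bound n i j
```

List indexing with `!!` keeps the code close to the array notation `a.i` at the cost of efficiency, which does not matter for small test inputs.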

# Fully Polynomial-Time Approximation by Thinning

Last update: August 26, 2013

Code and supplementary proofs accompanying the paper: Constructing Datatype-Generic Fully Polynomial-Time Approximation Schemes Using Generalised Thinning, by Yu-Han Lyu, Akimasa Morihata, and me.

The supplementary proofs mainly consist of proofs regarding the individual problems in the paper.

The file fptas.zip consists of the following Haskell modules:

• `KnapsackSpec`: specification of the 0-1 knapsack problem.
• `Knapsack`: a thinning algorithm solving knapsack (`knapsack`), and an approximation algorithm (`knapsack_apx`).
• `KnapsackTest`: some QuickCheck properties to test the code.
• `PartTreesSpec`: specification of the maximal tree partition problem.
• `PartTrees`: a thinning algorithm solving the tree partition problem (`mtp`), and an approximation algorithm (`mtp_apx`).
• `PartTreesTest`: some QuickCheck properties to test the code.
• `Utilities`: some utilities used by `PartTreesSpec`.
• `Merging`: generalised merging and bumping functions for both programs.

The problem instances generated by QuickCheck very rapidly get too large in size. The function `smallCheck` defined in both `*Test` modules restricts the sizes of instances generated.

# An Exercise Utilising Galois Connections

Given two partial orders `(A, ⊑)`, `(B, ≼)`, two functions `f : A → B`, `g : B → A` form a Galois connection between them if for all `a : A`, `b : B` we have

``  f a ≼ b ≣ a ⊑ g b``

We will refer to this defining property as “GC” later. The function `f` is called the lower adjoint and `g` the upper adjoint of the Galois connection. Galois connections are interesting because once two functions are identified as such, they immediately satisfy a rich collection of useful properties:

• letting `a := g b` in GC, we get `f (g b) ≼ b`;
• letting `b := f a`, we get `a ⊑ g (f a)`;
• `f` is monotonic, since:
`````` f a₁ ≼ f a₂
≣   { GC }
a₁ ⊑ g (f a₂)
⇐  {  since a ⊑ g (f a) }
a₁ ⊑ a₂``````
• similarly, `g` is monotonic: `b₁ ≼ b₂ ⇒ g b₁ ⊑ g b₂`,

and many more.

In the recent work of Sharon and me on maximally dense segments we needed quite a number of functions to be monotonic, idempotent, etc. It only occurred to me after submitting the paper: could they be defined as Galois connections? The number of properties we needed in the paper is huge and it would be nice to establish them on fewer basic properties. And it looks prettier.

### Longest Prefix Up to a Certain Sum

One such function is `trim` in the paper, but it is sufficient to consider a simplification: let `sam : [Int] → [Int]` (for “sum at most”) return the longest prefix of the input list whose sum is no larger than a constant `U`. Denote “`x` is a prefix of `y`” by `x ⊑ y`. We want to show that `sam` satisfies

• monotonicity: `x ⊑ y ⇒ sam x ⊑ sam y`, and
• idempotence: `sam (sam x) = sam x`.

Can they be derived by defining `sam` as a Galois connection?

I learned from José N. Oliveira’s talk A Look at Program “G”alculation at the IFIP WG 2.1 #65 Meeting how (the uncurried version of) `take` can be defined as a Galois connection. It turns out that `sam` is just the same. We consider a slight generalisation `sam' : (Int, [Int]) → [Int]` that takes an upper bound as a parameter. It can be characterised by:

``sum y ≤ b  ∧  y ⊑ x   ≣   y ⊑ sam' (b, x)``

There is in fact a Galois connection hidden already! To see that, define `⟨f, g⟩ a = (f a, g a)` (in the Haskell Hierarchical Libraries it is defined in Control.Arrow as `&&&`), and denote the product of binary relations by `×`, that is, if `a ≤ b` and `x ⊑ y` then `(a,x)` is related to `(b,y)` by `≤×⊑`. We write a composed relation as an infix operator by surrounding it in square brackets: `(a,x) [≤×⊑] (b,y)`.

Using these notations, the defining equation of `sam'` can be rewritten as:

``⟨sum, id⟩ y [≤×⊑] (b,x)   ≣   y ⊑ sam' (b,x)``

Thus `sam'` is the upper adjoint in a Galois connection between `((Int, [Int]), ≤×⊑)` and `([Int], ⊑)`!

Now that `⟨sum, id⟩` and `sam'` form a Galois connection, we have:

• `f (g b) ≼ b` instantiates to `⟨sum, id⟩ (sam' (b,x)) [≤×⊑] (b,x)`, that is, `sum (sam' (b,x)) ≤ b` and `sam' (b,x) ⊑ x`;
• `a ⊑ g (f a)` instantiates to `x ⊑ sam' (sum x, x)`. Together with the previous property we have `x = sam' (sum x, x)`;
• monotonicity of the lower adjoint instantiates to `y₁ ⊑ y₂ ⇒ sum y₁ ≤ sum y₂ ∧ y₁ ⊑ y₂`;
• monotonicity of the upper adjoint instantiates to
``(b₁,x₁) [≤×⊑] (b₂,x₂)   ⇒   sam' (b₁,x₁) ⊑ sam' (b₂,x₂) ``

that is

``b₁ ≤ b₂   ∧  x₁ ⊑ x₂   ⇒   sam' (b₁,x₁) ⊑ sam' (b₂,x₂) ``

a generalisation of the monotonicity we want.

Finally, to show idempotence, we reason

``````   sam' (b₁, x) ⊑ sam' (b₁, sam' (b₂, x))
≣   { GC }
⟨sum, id⟩ (sam' (b₁, x)) [≤×⊑]  (b₁, sam' (b₂, x))
≣   { definitions }
sum (sam' (b₁, x)) ≤ b₁   ∧   sam' (b₁, x) ⊑ sam' (b₂, x)
⇐  { properties above }
b₁ ≤ b₂``````

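Instantiating both `b₁` and `b₂` to the same `b` gives `sam' (b, x) ⊑ sam' (b, sam' (b, x))`. The other direction, `sam' (b, sam' (b, x)) ⊑ sam' (b, x)`, follows from monotonicity of `sam'` and `sam' (b, x) ⊑ x`, so the two sides are equal by antisymmetry of `⊑`.

These are all nice and pretty. There is another function, however, that is much harder to deal with, which I will write about next time.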
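The properties of `sam'` discussed in this section can be spot-checked with a small executable model. The sketch below makes two assumptions that are not stated above: all elements are non-negative and `b ≥ 0`, so that prefix sums never decrease and scanning with `takeWhile` finds the longest prefix. The helper `gc`, which tests the defining equivalence for one choice of `y`, `b` and `x`, is a name introduced here for illustration.

```haskell
import Data.List (inits, isPrefixOf)

-- Longest prefix of x whose sum is at most b.
-- Assumes non-negative elements and b >= 0, so the empty prefix
-- always qualifies and 'last' is applied to a non-empty list.
sam' :: (Int, [Int]) -> [Int]
sam' (b, x) = last (takeWhile ((<= b) . sum) (inits x))

-- The defining property: sum y <= b && y ⊑ x  ≣  y ⊑ sam' (b, x).
gc :: [Int] -> (Int, [Int]) -> Bool
gc y (b, x) =
  (sum y <= b && y `isPrefixOf` x) == (y `isPrefixOf` sam' (b, x))
```

For example, `sam' (5, [1,2,3,4])` is `[1,2]`, and applying `sam'` with the same bound to that result returns `[1,2]` again, as idempotence predicts.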

# Finding Maximally Dense Segments

Sharon and I have finally concluded, for now, our work on the maximally dense segment problem (draft, with an errata already!), on which we have been working on and off for the past two years. Considering the algorithm itself and its derivation/proofs, I am quite happy with what we have achieved. The algorithm is rather complex, however, and it is a challenge presenting it in an accessible way. Sharon has done a great job polishing the paper, and I do hope more people would be interested in reading it and it would, eventually, inspire more work on interesting program derivations.

The basic form of the problem looks like a natural variation of the classical maximum segment sum problem: given a list of numbers, find a consecutive segment whose average, that is, sum divided by length, is maximum. The problem would be trivial without more constraints, since one could simply return the largest element, thus we usually impose a lower bound `L` on the length of feasible segments.

It was noticed by Huang [3] that a segment having maximum average need not be longer than `2L - 1`: given a feasible segment of `2L` elements or more, we cut it in the middle. If the two halves have different averages, we keep the one with the larger average. Otherwise the two halves have the same average, and we may keep either. Either way, we get a shorter, feasible segment whose average is not lower. The fact hints at a trivial `O(nL)` algorithm: for each suffix of the list, find its best prefix up to `2L - 1` elements long.
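Huang's observation can be checked on small inputs with a direct transcription. The sketch below assumes elements of width 1, a lower bound `l ≥ 1`, and an input of at least `l` elements; the names `average`, `bruteForce` and `bounded` are hypothetical. It recomputes every sum from scratch, so it illustrates the restriction to lengths `l .. 2l - 1` rather than the `O(nL)` running time.

```haskell
import Data.List (inits, tails)
import Data.Ratio ((%))

-- Average of a non-empty list, kept exact by using Rational.
average :: [Integer] -> Rational
average xs = sum xs % fromIntegral (length xs)

-- Best average over all segments of length at least l.
bruteForce :: Int -> [Integer] -> Rational
bruteForce l xs =
  maximum [ average s | t <- tails xs, s <- inits t, length s >= l ]

-- Best average over prefixes of suffixes, with lengths limited
-- to l .. 2l - 1 as Huang's observation allows.
bounded :: Int -> [Integer] -> Rational
bounded l xs =
  maximum [ average s | t <- tails xs
                      , s <- inits (take (2 * l - 1) t)
                      , length s >= l ]
```

With `l = 2` and the input `[1,5,-3,4,6,-2]`, both functions return `5`, the average of the segment `[4,6]`.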

A difficult challenge, however, is to come up with an algorithm that is `O(n)`, independently of `L`. The problem can be generalised to the case where the elements do not have length 1, but each has a width, and the goal is to maximise the density, that is, the sum of the elements divided by the sum of their widths. It makes the problem slightly more complicated, but does not change its nature. If we go on to impose an upper bound `U` on the length as well, however, the problem becomes much more difficult. There was a published algorithm that claimed to be linear, only to be found not so. We discovered that two later algorithms, which appeared to have concluded the problem, also fail for a boundary case. The bug is easy to fix for one of the algorithms, but might not be so for the other.

Our algorithm closely relates to that of Chung and Lu [1] and that of Goldwasser et al [2]. The algorithm is perhaps too complex to present in detail in a blog post (that’s why we need a paper!), but I will try to give an outline using pictures from the paper, my slides and poster.

One of the ways to visualise the problem is to see each element as a block, the number being the area of the block, and the density would be its height. The input is a list of (area, width) pairs, and the goal is to find a consecutive segment maximising the height. Shown below is the input list `[(9,6),(6,2),(14,7),(20,4),(-10,5),(20,8),(-2,2),(27,6)]`, and the dashed line is their average height:

Notice that an area can be negative. In the paper, since the letter `w` is used for “window” (to be explained below), we instead refer to the width as “breadth”.

### Prefixes of Suffixes, and the Window

Many optimal segment problems (finding some optimal segment of a given list) are solved by finding, for each suffix, its optimal prefix, as shown below. Each bar is a suffix of the input, and the blue part is its optimal prefix:

It is preferable that an optimal prefix of `a : x` can be computed from the optimal prefix of `x`, that is, the function computing the optimal prefix is a `foldr`. If that is true, the algorithm may keep a pair of (optimal segment, optimal prefix). Each time a new element is read, it computes the new optimal prefix using the previous optimal prefix, and updates the optimal segment if the new prefix is better. If you like structured recursion (or the so-called “origami programming”), this form of computation is an instance of a zygomorphism.

For each optimal prefix to be computable from the previous optimal prefix, it may not extend further than the latter. We do not want the following to happen:

However, it appears to be possible for the maximally dense prefix! Imagine adding a very small, or even negative, area. We might get a denser prefix by extending further to the right since the denominator is larger.

The first theorem we had to prove aimed to show that it does not matter — if a maximally dense prefix extends further than the previous one, it is going to be suboptimal anyway. Thus it is safe if we always start from the right end of the previous prefix. That is, we do not compute the maximally dense prefix of the entire input, but merely the maximally dense prefix of the previous prefix.

This is an instance of the sliding window scheme proposed by Zantema [4]. The blue part is like a “window” of the list, containing enough information to guarantee the correctness of the algorithm. As the algorithm progresses, the two ends of the window keep sliding to the left, hence the name.

To formally show that the window contains enough information to compute the maximally dense segment, we have to clearly state what window is, and what invariant it satisfies. It turned out to be quite tricky to formally state the intuition that “the window does not always give you the optimal prefix, but it does when it matters,” and was the first challenge we met.

Since we aim at computing a segment at least `L` units in breadth, it might be handy to split the window into a “compulsory part” (the shortest prefix that is at least `L` units wide) and the rest, the “optional part”. The algorithm thus looks like this:

where the yellow bars are the compulsory parts and blue bars the optional parts. Each time we read an element into the compulsory part, zero or more elements (since the elements have non-uniform breadths) may be shifted from the compulsory part to the optional part. Then we compute a maximally dense prefix (the yellow and the blue parts together) that does not extend further than the previous one. The best among all these prefixes is the maximally dense segment.

We want a linear time algorithm, which means that all the computation from a pair of yellow-blue bars to the next pair has to be done in (amortised) constant time — how is that possible at all? To do so we will need to exploit some structure in the optional part, based on properties of density and segments.

### Right-Skew Segments, and the DRSP

A non-empty list of elements `x` is called right-skew if, for every non-empty `x₁` and `x₂` such that `x₁ ⧺ x₂ = x`, we have `density x₁ ≤ density x₂`. Informally, a right-skew list is drawn as the blue wavy block below:

The rising wavy slope informally hints that the right half has a higher density than the left half wherever you make the cut. However, the drawing risks giving the impression that a right-skew segment is a list of elements with ascending areas or densities. Note that neither the areas nor the densities of individual elements have to be ascending. For example, the list `[(9,6),(6,2),(14,7)]`, with densities `[1.5, 3, 2]`, is right-skew.
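The definition of right-skewness can be turned into an executable check, which is handy for testing small examples such as the one just given. This is a sketch that tries every split point, so it is quadratic; it assumes elements are (area, breadth) pairs with positive breadth, and the names `Elem`, `density` and `rightSkew` are introduced here for illustration.

```haskell
import Data.List (inits, tails)
import Data.Ratio ((%))

type Elem = (Integer, Integer)   -- (area, breadth), breadth > 0

-- Density of a non-empty list: total area over total breadth.
density :: [Elem] -> Rational
density xs = sum (map fst xs) % sum (map snd xs)

-- A non-empty list is right-skew if, for every split into two
-- non-empty parts, the left part is no denser than the right part.
rightSkew :: [Elem] -> Bool
rightSkew [] = False
rightSkew x  = and [ density x1 <= density x2 | (x1, x2) <- splits ]
  where
    -- all splits, minus the two with an empty side
    splits = init (tail (zip (inits x) (tails x)))
```

Here `rightSkew [(9,6),(6,2),(14,7)]` holds, since the two possible cuts compare `1.5` with `20/9` and `15/8` with `2`, whereas `rightSkew [(6,2),(9,6)]` does not.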

Right-skew lists are useful because of the following property. Imagine placing a list `z` next to `x`, as depicted above. To find a maximally dense prefix of `z ⧺ x` starting with `z`, it is sufficient to consider only `z` and `z ⧺ x`: nothing in the middle, such as `z ⧺ x₁`, can be denser than the two ends!

Given a window with compulsory part `c` and optional part `x`, if we can partition `x` into `x₁ ⧺ x₂ ⧺ ... ⧺ xₙ`, such that `x₁`, `x₂`, … `xₙ` are all right-skew, then to compute the maximally dense prefix of `c ⧺ x`, we only need to consider `c`, `c ⧺ x₁`, `c ⧺ x₁ ⧺ x₂`, … and `c ⧺ x₁ ⧺ x₂ ⧺ ... ⧺ xₙ`.

Such a partition is always possible for any list `x`: after all, each element itself constitutes a singleton right-skew list. However, there is one unique right-skew partition such that the densities of `x₁`, `x₂`, … `xₙ` are strictly decreasing. This is called the decreasing right-skew partition (DRSP) of `x`. We will partition the optional part of the window into its DRSP. A window now looks like the picture below:

Sharon summarised many nice properties of DRSP in the paper, for which we unfortunately do not have space here. We will only look at some properties that matter for this blog post. Firstly, consider the diagram below:

In the bottom row, the leftmost block is the density of `c`, and the second block is the density of `c ⧺ x₁`, etc. If segments `x₁`, `x₂`, … `xₙ` have decreasing densities, the densities of `c`, `c ⧺ x₁`, `c ⧺ x₁ ⧺ x₂`, … and `c ⧺ x₁ ⧺ x₂ ⧺ ... ⧺ xₙ` must be bitonic, that is, first ascending, then descending. It helps to efficiently locate the maximally dense prefix.

Secondly, the DRSP can be built and maintained in a `foldr`. The following diagram depicts how the DRSP for the list of areas `[1,4,2,5,3]` (all with breadth `1`) can be built by adding elements from the left one by one (which eventually results in one big segment):

The rule is that a block newly added from the left keeps merging with the blocks to its right until it encounters a block shorter than itself. The top-left of the diagram indicates that the DRSP of `[3]` is itself. Since `5 > 3`, adding `5` results in a partition containing two segments. When `2` is added, it is merged with `5` to form a new segment with density `3.5`. No merging is triggered by the addition of `4`, since `4 > 3.5` and thus `[4,3.5,3]` form a decreasing sequence. The newly added `1` first merges with `4`, forming a block having density `2.5`. Since `2.5 < 3.5`, it again merges with the block `[2,5]`, giving a block of density `3`, which is not higher than that of the last block `[3]`, so these merge too. Eventually all elements are grouped into one segment with density `3`. One important thing here is that adding a new element only involves merging some initial parts of the DRSP.
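The merging rule can be sketched as a `foldr` in a few lines of Haskell. This is an illustration under stated assumptions, not the data structure from the paper: segments are plain lists tagged with their total area and breadth, so a merge costs time proportional to the left segment rather than `O(1)`; breadths are assumed positive; densities are compared by cross-multiplication to stay within integers. The names `Seg`, `leqDensity`, `merge` and `drsp` are hypothetical.

```haskell
type Elem = (Integer, Integer)   -- (area, breadth), breadth > 0

-- A segment together with its cached total area and breadth.
data Seg = Seg { elems :: [Elem], area :: Integer, breadth :: Integer }

-- density s <= density t, compared without division.
leqDensity :: Seg -> Seg -> Bool
leqDensity s t = area s * breadth t <= area t * breadth s

merge :: Seg -> Seg -> Seg
merge (Seg xs a w) (Seg ys b v) = Seg (xs ++ ys) (a + b) (w + v)

-- Build the partition from the right; a new block keeps absorbing
-- its right neighbour until the neighbour is strictly less dense.
drsp :: [Elem] -> [Seg]
drsp = foldr add []
  where
    add e = absorb (Seg [e] (fst e) (snd e))
    absorb s (t : ts) | s `leqDensity` t = absorb (merge s t) ts
    absorb s ts = s : ts
```

On the areas `[4,2,5,3]` with breadth 1 this produces the three segments `[4]`, `[2,5]`, `[3]`, and adding `1` at the front collapses everything into a single segment, matching the walk-through above.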

### Algorithm Overview

Recall that our algorithm computes, for each suffix, a prefix (a window) that is possibly optimal and contains enough information to compute all future optimal solutions. Since a feasible prefix is at least `L` wide, we split it into a (yellow) compulsory part and a (blue) optional part. To obtain a linear time algorithm, we have to compute one row from the previous row in amortised constant time (the corresponding diagram is duplicated here):

The diagram below depicts how to go from one row to the next. The blue part is partitioned into DRSP. Each time an element is added to the yellow part, some elements may be shifted to the blue part, and that may trigger some right-skew segments in the blue part to be merged (second row). Then we look for a maximally dense prefix by going from right to left, chopping away segments, until we find the peak (third row):

Note that the operation shown on the third row (chopping to find the maximum) always chops away a right-skew segment in its entirety. It is important that the merging happens at the left end of the optional part, while the chopping happens at the right end. By using a tree-like data structure, each merging can be an `O(1)` operation. With the data structure, we may argue that, since each element can be merged at most once, throughout the algorithm only `O(n)` mergings could happen. Similarly, each element can be chopped away at most once, so the chopping could happen at most `O(n)` times as well. Therefore the operations in the second and third rows above are both amortised `O(1)`.

### Problem with Having an Upper Bound

The discussion so far already allows us to develop an algorithm for the maximally dense segment problem without an upper bound on the breadth of feasible segments. Having the upper bound makes the problem much harder because, different from the chopping depicted above, an upper bound may cut through a right-skew segment in the middle:

And a right-skew segment, with some elements removed, might not be right-skew anymore!

Our solution is to develop another data structure that allows efficient removal from the right end of a DRSP, while maintaining the DRSP structure. The final configuration of a window looks like the diagram below, where the new data structure is represented by the green blocks:

Unfortunately, it is inefficient to add new elements from the left into the green blocks. Therefore we have to maintain the window in a way similar to how a queue is implemented using two lists. New elements are added from the left into the blue blocks; when we need to remove elements from the right of a block, it is converted to a green block in large chunks.

For more details, see the paper!

### References

1. Chung, Kai-Min and Lu, Hsueh-I. An Optimal Algorithm for the Maximum-Density Segment Problem. SIAM Journal on Computing 34(2):373-387, 2004.
2. Goldwasser, Michael H. and Kao, Ming-Yang and Lu, Hsueh-I. Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications. Journal of Computer and System Sciences, 70(2):128-144, 2005.
3. Huang, Xiaoqiu. An Algorithm for Identifying Regions of a DNA Sequence that Satisfy a Content Requirement. Computer Applications in the Biosciences, 10(3):219-225, 1994.
4. Zantema, Hans. Longest segment problems. Science of Computer Programming, 18(1):39-66, 1992.

# Functional pearl: maximally dense segments

The following is superseded by our later work Functional pearl: finding a densest segment.

Sharon Curtis and Shin-Cheng Mu. Submitted.
[PDF]

errata:
• Page 3: “This input sequence does not have a solution…” what we meant was “This input does not have a prefix that is within bounds.” We used another example, where the input does not have a feasible segment at all, before changing to the current example, but I forgot to change the text accordingly.
• Page 4, Proof of Theorem 3.2: the first `mdsM x ⇑d win (a:x)` should be `mdsM x ⇑d wp (trim (a:x))`; `a : x <b L` and `a : x ≥b L` should respectively be `trim (a : x) <b L` and `trim (a : x) ≥b L`.
• Thanks to Josh Ko for pointing out both errors.

The problem of finding a maximally dense segment (MDS) of a list is a generalisation of the well-known maximum segment sum (MSS) problem, but its solution is more challenging. We extend and illuminate some recent work on this problem with a formal development of a linear-time online algorithm, in the form of a sliding window zygomorphism. The development highlights some elegant properties of densities, involving partitions which are decreasing and all right-skew.

Code and supplementary proofs are available online.

keywords: program derivation, segment problem, maximum density, sliding window, zygomorphism, right-skew.

# The Maximum Segment Sum Problem: Its Origin, and a Derivation

In a previous paper of mine, regrettably, I wrongly attributed the origin of the maximum segment sum problem to Dijkstra and Feijen’s Een methode van programmeren. In fact, the story behind the problem was told very well in Jon Bentley’s Programming Pearls.

### The Problem, and the Linear-Time Algorithm

Given a list of numbers, the task is to compute the largest possible sum of a consecutive segment. In a functional language the problem can be specified by:

```
mss = max . map sum . segments
```

where `segments = concat . map inits . tails` lists all segments of the input list, `map sum` computes the sum of each of the segments, before `max :: Ord a ⇒ [a] → a` picks the maximum. The specification, if executed, is a cubic time algorithm. Yet there is a linear time algorithm scanning through the list only once:

```
mss = snd . foldr step (0,0)
  where step x (p,s) = (0 ↑ (x+p), (0 ↑ (x+p)) ↑ s)
```

where `a ↑ b` yields the maximum of `a` and `b`.

Both the specification and the linear time program are short. The program is merely a `foldr` that can be implemented as a simple for-loop in an imperative language. Without some reasoning, however, it is not that trivial to see why the program is correct (hint: the `foldr` computes a pair of numbers, the first one being the maximum sum of all prefixes of the given list, while the second is the maximum sum of all segments). Derivation of the program (given below) is mostly mechanical, once you learn the basic principles of program calculation. Thus the problem has become a popular choice as the first non-trivial example of program derivation.
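Both programs can be run as they stand after two small changes of notation: Haskell's list maximum is called `maximum`, and the binary `↑` is written `max`. The sketch below fixes the element type to `Integer` and renames the specification to `mssSpec` so that the two can be compared; these choices are made here for illustration.

```haskell
import Data.List (inits, tails)

-- All segments: every prefix of every suffix (includes empty segments).
segments :: [a] -> [[a]]
segments = concatMap inits . tails

-- The cubic-time specification.
mssSpec :: [Integer] -> Integer
mssSpec = maximum . map sum . segments

-- The linear-time program: the pair holds (maximum prefix sum,
-- maximum segment sum) of the part of the list processed so far.
mss :: [Integer] -> Integer
mss = snd . foldr step (0, 0)
  where
    step x (p, s) = (p', max p' s)
      where p' = max 0 (x + p)
```

On Bentley's example input `[31,-41,59,26,-53,58,97,-93,-23,84]` both return `187`, the sum of the segment `[59,26,-53,58,97]`. Since the empty segment is allowed, the result is `0` when all elements are negative.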

### Origin

Jon Bentley recorded in Programming Pearls that the problem was proposed by Ulf Grenander of Brown University. In a pattern-matching procedure he designed, a subarray having maximum sum is the most likely to yield a certain pattern in a digitised image. The two-dimensional problem took too much time to solve, so he simplified it to one dimension in order to understand its structure.

In 1977 [Grenander] described the problem to Michael Shamos of UNILOGIC, Ltd. (then of Carnegie-Mellon University) who overnight designed Algorithm 3. When Shamos showed me the problem shortly thereafter, we thought that it was probably the best possible; … A few days later Shamos described the problem and its history at a Carnegie-Mellon seminar attended by statistician Jay Kadane, who designed Algorithm 4 within a minute.

Jon Bentley, Programming Pearls (1st edition), page 76.

Jay Kadane’s Algorithm 4 is the now well-known linear time algorithm, the imperative version of the functional program above:

```
maxpre, maxseg = 0, 0
for i in range(0, N):
    maxpre = max(0, maxpre + a[i])   # best sum of a (possibly empty) segment ending at i
    maxseg = max(maxpre, maxseg)     # best sum of any segment seen so far
```

Algorithm 3, on the other hand, is a divide and conquer algorithm. An array `a` is split into two halves `a₁ ⧺ a₂`, and the algorithm recursively computes the maximum segment sums of `a₁` and `a₂`. However, there could be some segment across `a₁` and `a₂` that yields a good sum, therefore the algorithm performs two additional loops respectively computing the maximum suffix sum of `a₁` and the maximum prefix sum of `a₂`, whose sum is the maximum sum of segment crossing the edge. The algorithm runs in `O(N log N)` time. (My pseudo Python translation of the algorithm is given below.)

In retrospect, Shamos did not have to compute the maximum prefix and suffix sums in two loops each time. The recursive function could have computed a quadruple of (maximum prefix sum, maximum segment sum, maximum suffix sum, and sum of the whole array) for each array. The prefix and suffix sums could thus be computed bottom-up. I believe that would result in an `O(N)` algorithm. This linear time complexity might suggest that the “divide” is superficial: we do not have to divide the array in the middle. It is actually easier to divide the array into a head and a tail, which was perhaps how Kadane quickly came up with Algorithm 4!
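
The quadruple idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `summarise` and `mss_quad` are hypothetical, the array is a Python list, and empty prefixes, segments and suffixes are taken to have sum 0. Each call does constant work besides its two recursive calls, which is consistent with the linear bound conjectured above.

```python
def summarise(a, lo, hi):
    """Return (max prefix sum, max segment sum, max suffix sum, total)
    for the slice a[lo:hi]. Empty prefixes, segments and suffixes count as 0."""
    if hi - lo == 0:
        return (0, 0, 0, 0)
    if hi - lo == 1:
        x = a[lo]
        best = max(0, x)
        return (best, best, best, x)
    mid = (lo + hi) // 2
    p1, s1, q1, t1 = summarise(a, lo, mid)   # left half
    p2, s2, q2, t2 = summarise(a, mid, hi)   # right half
    return (
        max(p1, t1 + p2),      # a prefix stays in the left half or covers all of it
        max(s1, s2, q1 + p2),  # a segment lies in one half or crosses the middle
        max(q2, q1 + t2),      # a suffix stays in the right half or covers all of it
        t1 + t2,
    )


def mss_quad(a):
    """Maximum segment sum: the second component of the quadruple."""
    return summarise(a, 0, len(a))[1]
```

For `[-1, 3, 3, -4]` the quadruple is `(5, 6, 2, 1)`, so `mss_quad` returns 6.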

### A Functional Derivation

I learned the functional derivation of the maximum segment sum problem from one of Jeremy’s papers [3] and was amazed. It was perhaps one of the early incidents that inspired my interest in program calculation. The derivation does not appear to be very well known outside the program derivation circle, not even among functional programmers, so I would like to redo it here.

The first few steps of the derivation go:

``````   max . map sum . segs
=    { definition of segs }
max . map sum . concat . map inits . tails
=    { since map f . concat = concat . map (map f) }
max . concat . map (map sum) . map inits . tails
=    { since max . concat = max . map max }
max . map max .  map (map sum) . map inits . tails
=    { since map f . map g = map (f.g) }
max . map (max . map sum . inits) . tails
``````

The purpose of the book-keeping transformation above is to push `max . map sum` closer to `inits`. The fragment `max . map sum . inits` is a function which, given a list of numbers, computes the maximum sum among all its prefixes. We denote it by `mps`, for maximum prefix sum. The specification has been transformed to:

``   mss = max . map mps . tails ``

This is a common strategy for segment problems: to solve a problem looking for an optimal segment, proceed by looking for an optimal prefix of each suffix. (Symmetrically we could process the list the other way round, looking for an optimal suffix of each prefix.)
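
To see the two specifications side by side, here is a literal (and deliberately inefficient) Python transcription. The function names mirror the Haskell ones, but the code is only a sketch for experimentation, assuming lists of numbers and taking the sum of an empty list to be 0.

```python
def inits(xs):
    """All prefixes of xs, shortest first (including the empty prefix)."""
    return [xs[:i] for i in range(len(xs) + 1)]


def tails(xs):
    """All suffixes of xs, longest first (including the empty suffix)."""
    return [xs[i:] for i in range(len(xs) + 1)]


def segs(xs):
    """All segments: every prefix of every suffix."""
    return [p for t in tails(xs) for p in inits(t)]


def mss_spec(xs):
    """The specification max . map sum . segs, evaluated literally."""
    return max(sum(s) for s in segs(xs))


def mps(xs):
    """Maximum prefix sum: max . map sum . inits."""
    return max(sum(p) for p in inits(xs))


def mss_via_mps(xs):
    """The transformed specification max . map mps . tails."""
    return max(mps(t) for t in tails(xs))
```

Both `mss_spec` and `mss_via_mps` give 6 on `[-1, 3, 3, -4]`; they enumerate the same segments, only grouped differently.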

We wish that `mps` for each of the suffixes can be efficiently computed in an incremental manner. For example, to compute `mps [-1,3,3,-4]`, rather than actually enumerating all prefixes, we wish that it can be computed from `-1` and `mps [3,3,-4] = 6`, which can in turn be computed from `3` and `mps [3,-4] = 3`, all in constant time. In other words, we wish that `mps` is a `foldr` using a constant time step function. If this is true, one can imagine that we could efficiently implement `map mps . tails` in linear time. Indeed, `scanr f e = map (foldr f e) . tails`!
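
The `scanr` property can be checked on small inputs with the Python sketch below. The helper names `foldr`, `scanr` and `map_foldr_tails` are illustrative stand-ins for the Haskell functions, operating on Python lists; the step function is passed as a two-argument callable.

```python
from functools import reduce


def foldr(f, e, xs):
    """foldr f e [x1, ..., xn] = f(x1, f(x2, ... f(xn, e)))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), e)


def scanr(f, e, xs):
    """Results of foldr f e on every suffix of xs, longest suffix first,
    computed in a single right-to-left pass."""
    out = [e]
    for x in reversed(xs):
        out.append(f(x, out[-1]))
    out.reverse()
    return out


def map_foldr_tails(f, e, xs):
    """The quadratic formulation: map (foldr f e) . tails."""
    return [foldr(f, e, xs[i:]) for i in range(len(xs) + 1)]
```

With the non-commutative step `lambda x, y: x - y`, both `scanr` and `map_foldr_tails` return `[2, -1, 3, 0]` on `[1, 2, 3]`, but `scanr` calls the step function only once per element.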

The aim now is to turn `mps = max . map sum . inits` into a `foldr`. Luckily, `inits` is actually a `foldr`. In the following we will perform `foldr`-fusion twice, respectively fusing `map sum` and `max` into `inits`, thus turning the entire expression into a `foldr`.

The first fusion goes:

``````   max . map sum . inits
=    { definition of inits }
max . map sum . foldr (\x xss -> [] : map (x:) xss) [[]]
=    { fold fusion, see below }
max . foldr zplus [0]
``````

The fusion condition can be established below, through which we also construct the definition of `zplus`:

``````   map sum ([] : map (x:) xss)
=  0 : map (sum . (x:)) xss
=    { by definition, sum (x : xs) = x + sum xs }
0 : map (x+) (map sum xss)
=    { define zplus x xss = 0 : map (x+) xss }
zplus x (map sum xss)
``````

We continue with the derivation and perform another fusion:

``````   max . foldr zplus [0]
=    { fold fusion, let zmax x y = 0 ↑ (x+y) }
foldr zmax 0
``````

For the second fold fusion to work, we have to prove the following fusion condition:

``````   max (0 : map (x+) xs)
=  0 ↑ max (map (x+) xs)
=    { since  max (map (x +) xs) = x + max xs }
0 ↑ (x + max xs)
``````

The property `max (map (x +) xs) = x + max xs` in the last step follows from the fact that `(+)` distributes over `(↑)`, that is, `(x + y) ↑ (x + z) = x + (y ↑ z)`. This is the key property that allows the whole derivation to work.

By performing `foldr`-fusion twice we have established that

``mps = foldr zmax 0``

In words, `mps (x : xs)`, the best prefix sum of `x : xs`, can be computed by `zmax x (mps xs)`. The definition of `zmax` says that if `x + mps xs` is positive, it is the maximum prefix sum; otherwise we return `0`, the sum of the empty prefix.

Therefore, `mss` can be computed by a `scanr`:

``````   mss
=    { reasoning so far }
max . map (foldr zmax 0) . tails
=    { introducing scanr }
max . scanr zmax 0
``````

We have derived `mss = max . scanr zmax 0`, where `zmax x y = 0 ↑ (x+y)`.

Many functional derivations stop here. This gives us an algorithm that runs in linear time, but takes linear space. A tupling transformation eliminates the need for linear space:

``  mss = snd . (head &&& max) . scanr zmax 0 ``

where `(f &&& g) a = (f a, g a)`. The part `(head &&& max) . scanr zmax 0` returns a pair, the first component being the result of `mps`, the second `mss`. By some mechanical simplification we get the final algorithm:

``````mss = snd . foldr step (0,0)
  where step x (p,s) = (0 ↑ (x+p), (0 ↑ (x+p)) ↑ s)
``````
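
For comparison, the last two stages of the derivation can be rendered in Python roughly as below. This is a sketch rather than a translation the derivation itself produces: `mss_scanr` and `mss_fold` are illustrative names, and the right fold is expressed as a loop over the reversed list.

```python
def zmax(x, y):
    """zmax x y = 0 ↑ (x + y)."""
    return max(0, x + y)


def mss_scanr(xs):
    """mss = max . scanr zmax 0: linear time, but keeps every mps value."""
    acc = [0]
    for x in reversed(xs):
        acc.append(zmax(x, acc[-1]))
    return max(acc)


def mss_fold(xs):
    """mss = snd . foldr step (0, 0): same result, constant extra space."""
    p, s = 0, 0                 # (mps, mss) of the suffix processed so far
    for x in reversed(xs):      # foldr consumes the list from the right
        p = zmax(x, p)
        s = max(p, s)
    return s
```

The only difference between the two is that `mss_fold` keeps the running maximum in `s` instead of materialising the list built by the scan.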

### A Relational Derivation?

The maximum segment sum problem later turned out to be an example of Richard and Oege’s Greedy Theorem [2]. It is an exercise in the Algebra of Programming book, but I have not seen the solution given anywhere. For completeness, I recorded a relational derivation in a paper of mine about some other variations of the maximum segment sum problem [4].

### References

1. Bentley, Jon. Programming Pearls. Addison-Wesley, Inc, 1987.
2. Bird, Richard and de Moor, Oege. Algebra of Programming. Prentice-Hall, 1997.
3. Gibbons, Jeremy. Calculating Functional Programs. Proceedings of ISRG/SERG Research Colloquium, Oxford Brookes University, November 1997.
4. Mu, Shin-Cheng. Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths. Partial Evaluation and Program Manipulation (PEPM ’08), pp 31-39. January 2008.

### Appendix: Algorithm 3

``````
def mss(l,u):
    if l > u:
        return 0          # empty array
    elif l == u:
        return (0 ↑ a[l])  # singleton array
    else:
        m = (l + u) / 2

        # compute maximum suffix sum of a[l..m]
        sum, maxToLeft = 0, 0
        for i in range (m, l-1, -1):
            sum = sum + a[i]
            maxToLeft = maxToLeft ↑ sum
        # compute maximum prefix sum of a[m+1..u]
        sum, maxToRight = 0, 0
        for i in range (m+1, u+1):
            sum = sum + a[i]
            maxToRight = maxToRight ↑ sum
        maxCrossing = maxToLeft + maxToRight

        # recursively compute mss of a[l..m] and a[m+1..u]
        maxInL = mss(l,m)
        maxInR = mss(m+1,u)
        return (maxInL ↑ maxCrossing ↑ maxInR)
``````
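
A runnable Python version of the same algorithm might look like the sketch below. It assumes `↑` means `max`, integer division for the midpoint, and inclusive bounds `l` and `u`; the name `mss_dc` and the default arguments are illustrative additions.

```python
def mss_dc(a, l=0, u=None):
    """Maximum segment sum of a[l..u] (inclusive bounds) by divide and conquer."""
    if u is None:
        u = len(a) - 1
    if l > u:
        return 0                      # empty array
    if l == u:
        return max(0, a[l])           # singleton array
    m = (l + u) // 2

    # maximum suffix sum of a[l..m]
    total, max_to_left = 0, 0
    for i in range(m, l - 1, -1):
        total += a[i]
        max_to_left = max(max_to_left, total)

    # maximum prefix sum of a[m+1..u]
    total, max_to_right = 0, 0
    for i in range(m + 1, u + 1):
        total += a[i]
        max_to_right = max(max_to_right, total)

    max_crossing = max_to_left + max_to_right
    return max(mss_dc(a, l, m), max_crossing, mss_dc(a, m + 1, u))
```

Since `l ≤ m < u` whenever `l < u`, both recursive calls work on strictly shorter ranges, so the recursion terminates.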

# Maximally Dense Segments — Code and Proof

The following is superseded by our later work Functional pearl: finding a densest segment.

Code and proof accompanying a forthcoming paper of Sharon Curtis and me: Functional Pearl: Maximally Dense Segments.

Quick links: [expository program | linear time program | proofs (last update: 2010.04.17)].

errata:
• Page 3: “This input sequence does not have a solution…” what we meant was “This input does not have a prefix that is within bounds.” We had used another example, in which the input does not have a feasible segment at all, before changing to the current example, but I forgot to change the text accordingly.
• Page 4, Proof of Theorem 3.2: the first `mdsM x ⇑d win (a:x)` should be `mdsM x ⇑d wp (trim (a:x))`; `a : x <b L` and `a : x ≥b L` should respectively be `trim (a : x) <b L` and `trim (a : x) ≥b L`.
• Thanks to Josh Ko for pointing out both errors.

### Expository Code

The expository program [download here] intends to be an executable implementation of the code in the paper. For clarity we use Haskell lists for all sequences, and do not implement optimisations such as storing the areas and breadths of segments, thus the program is not linear time yet. A linear time implementation will follow soon.

The code is mostly consistent with the paper, with some minor differences: many functions take an additional argument of type `BreadthSpec = (Breadth, Maybe Breadth)`. The first component is the lower bound, while the second component is an optional upper bound. The main function is:

``mds :: BreadthSpec ->  [Elem] -> Maybe [Elem]``

which takes a `BreadthSpec` and switches between the modes with or without an upper bound.

To try the code, you may either load the module `Main` into `ghci` and invoke the function `mds`:

``mds (lb, Just ub) [x1, x2, x3, ... ]``

or load the module `Test` and run some QuickCheck properties:

``Test.QuickCheck.quickCheck (prop_mds_correct bb (lb, Just ub))``

where `lb` and `ub` are the lower and upper bounds, and `bb` is the bound on breadths of generated test elements. The property `prop_mds_correct` asserts that `mds (lb,ub) x =d mds_spec (lb,ub) x` for all `x`.

The gzipped file consists of the following Haskell modules:

• `Main`: containing the main program `mds`, our variant of zygomorphism `zh`, `wp2`, `smsp2`, `maxChop`, `trim`, etc.
• `Spec`: containing a specification of the maximally dense segment problem:
``````mds_spec :: BreadthSpec -> [Elem] -> Maybe [Elem]
mds_spec bs = maxlistM d . filter (bounds bs) . nonEmptySegs``````

Many types having `area`, `breadth`, and `density` defined are collected into a type class `Block`, and functions like `maxChop` are defined on the type class.

• `DRSP`: specification of right-skew segments and DRSP, with functions including `rightSkew`, `sdars`, `lrsp`, and `drsp`.
• `DPTrees`: defining `DTree`s and `PTree`s, and functions like `addD` and `prependD` allowing one to construct DRSPs in a fold.
• `Utilities`: some utility functions processing lists.
• `Test`: a module defining some QuickCheck properties to test the code.

### Linear Time Implementation

A linear time implementation can be downloaded here. The program uses Data.Sequence to represent the compulsory part and the first level of the `DForest` and the `PForest` of the window, as well as annotating them with areas and breadths. The subtrees of a `DTree` and a `PTree`, however, can be represented simply by snoc-lists and cons-lists respectively.

Organisation of the code is the same as the first program.

### Proofs

Proofs accompanying the paper [PDF]. Theorems and lemmas are labelled with both their own numbers as well as the numbers in the paper, if any. For example, Lemma A.1 (3.1) is Lemma 3.1 in the paper.

# A Survey of Binary Search

If you think you know everything you need to know about binary search, but have not read Netty van Gasteren and Wim Feijen’s note The Binary Search Revisited, you should.

In the FLOLAC summer school this year we plan to teach the students some basics of Hoare logic and Dijkstra’s weakest precondition calculus — a topic surprisingly little known among Taiwanese students. Binary search seems to be a topic I should cover. A precise specification will be given later, but basically the requirement is like this: given a sorted array of `N` numbers and a value to search for, either locate the position where the value resides in the array, or report that the value is not present in the array, in `O(log N)` time.

Given that everybody should have learned about binary search in their algorithm class, you would not expect it to be a hard programming task. Yet in his popular book Programming Pearls, Jon Bentley noted that surprisingly few professional programmers managed to implement the algorithm without bugs on their first attempt. I tried it myself and, being very careful (and having learned something about program construction already), I produced a program which I believe to be correct, but rather inelegant. I tried it on my students and they did produce code with some typical bugs. So it seems to be a good example for a class about correct program construction.

### The Van Gasteren-Feijen Approach

The surprising fact that van Gasteren and Feijen pointed out was that binary search does not apply only to sorted lists! In fact, the usual practice comparing binary search to searching for a word in a dictionary is, according to van Gasteren and Feijen, a major educational blunder.

Van Gasteren and Feijen considered solving a more general problem: let `M` and `N` be integral numbers with `M < N`, and let `Φ` be a relation such that `M Φ N` (where `Φ` is written infix), with some additional constraints to be given later. The task is to find `l` such that

``M ≤ l < N   ∧   l Φ (l+1)``

This is the program:

``````  { M < N ∧ M Φ N }
  l, r := M, N
  { Inv: M ≤ l < r ≤ N  ∧  l Φ r,   Bound: r - l }
; do l+1 ≠ r →
    { l + 2 ≤ r }
    m := (l + r) / 2
  ; if m Φ r → l := m
    [] l Φ m → r := m
    fi
  od
  { M ≤ l < N  ∧  l Φ (l+1) }
``````

Notice first that the loop guard `l+1 ≠ r`, if satisfied, guarantees that `l` and `r` are not adjacent numbers. Assigning `m := (l + r) / 2` therefore establishes `l < m < r`, and thus the bound `r - l` is guaranteed to decrease. The `if` statement clearly maintains the invariant, provided that at least one of the guards is always satisfied:

`` l Φ r  ∧  l < m < r   ⇒   l Φ m  ∨  m Φ r     (*)``

For `Φ` satisfying the condition above, at the end of the loop we will find some `l` such that `l Φ (l+1)`.

What relations satisfy `(*)`? Examples given by van Gasteren and Feijen include:

• `i Φ j = a[i] ≠ a[j]` for some array `a`. Van Gasteren and Feijen suggested using this as the example when introducing binary search.
• `i Φ j = a[i] < a[j]`,
• `i Φ j = a[i] × a[j] ≤ 0`,
• `i Φ j = a[i] ∨ a[j]`, etc.
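
The general program can be tried out with a small Python sketch. The function name `vgf_search` is hypothetical, `phi` is any two-argument predicate assumed to satisfy `(*)`, and the non-deterministic `if` is resolved by testing `m Φ r` first.

```python
def vgf_search(phi, M, N):
    """Given integers M < N with phi(M, N), return l such that
    M <= l < N and phi(l, l + 1).

    Assumes phi satisfies (*): phi(l, r) and l < m < r imply
    phi(l, m) or phi(m, r).
    """
    assert M < N and phi(M, N)
    l, r = M, N
    while l + 1 != r:
        m = (l + r) // 2        # l < m < r because l + 2 <= r
        if phi(m, r):
            l = m
        else:
            r = m               # by (*), phi(l, m) must hold here
    return l


# An unsorted array: the first example relation still works.
a = [5, 1, 4, 4, 2, 9]
l = vgf_search(lambda i, j: a[i] != a[j], 0, len(a) - 1)
```

Here `a[0] ≠ a[5]` establishes the precondition, and the search ends with `l = 4`, a position where `a[l] ≠ a[l+1]`, even though `a` is not sorted.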

### Searching for a Key in a Sorted Array

To search for a key `K` in an ascending-sorted array `a`, it seems that we could just pick this `Φ`:

``i Φ j = a[i] ≤ K < a[j]``

and check whether `a[i] = K` after the loop. There is only one problem, however -- we are not sure we can establish the precondition `a[l] ≤ K < a[r]`!

Van Gasteren and Feijen's solution is to add two imaginary elements to the two ends of the array. That is, for a possibly empty array `a[0..N)`, we imagine two elements `a[-1]` such that `a[-1] ≤ x` and `a[N]` such that `x < a[N]` for all `x`. I believe this is equivalent to using this `Φ`:

``i Φ j  =  (i = -1  ∨  a[i] ≤ K)  ∧  (K < a[j]  ∨  j = N)``

which still satisfies `(*)` if `a` is sorted. And here is the program

``````  { 0 ≤ N ∧ -1 Φ N }
  l, r := -1, N
  { Inv: -1 ≤ l < r ≤ N  ∧  l Φ r,   Bound: r - l }
; do l+1 ≠ r →
    { l + 2 ≤ r }
    m := (l + r) / 2
  ; if a[m] ≤ K → l := m
    [] K < a[m] → r := m
    fi
  od
  { -1 ≤ l < N  ∧  l Φ (l+1) }
; if l > -1 → found := a[l] = K
  [] l = -1 → found := false
  fi
``````

Do not worry about the idea of "adding" elements to `a`. The invariant implies that `-1 < m < N`, thus `a[-1]` and `a[N]` are never accessed, and the array `a` need not actually be altered. They are just there to justify the correctness of the program. It also enables us to handle possibly empty arrays, while the loop body seems to be designed for the case when the range `[l..r)` is non-empty.
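
As a sanity check, the program can be transcribed into Python almost line by line. The sketch below assumes `a` is a Python list sorted in ascending order; the name `search_sorted` and the choice to return `l` alongside `found` are illustrative.

```python
def search_sorted(a, K):
    """Return (found, l), where l is the largest index with a[l] <= K,
    or -1 if there is no such index. Assumes a is sorted ascending."""
    l, r = -1, len(a)           # imaginary a[-1] and a[N] are never read
    while l + 1 != r:
        m = (l + r) // 2        # -1 < m < N, so a[m] is a real element
        if a[m] <= K:
            l = m
        else:
            r = m
    found = l > -1 and a[l] == K
    return found, l
```

On `[1, 3, 3, 5]` with `K = 3` this returns `(True, 2)`, and on the empty list the loop body is never entered and the result is `(False, -1)`.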

### Bentley's Program

Bentley's program for binary search in Programming Pearls can be rephrased as below:

``````  l, r, found := 0, N-1, false
; do l ≤ r →
    m := (l + r) / 2
  ; if a[m] < K → l := m + 1
    [] a[m] = K → found := true; break
    [] K < a[m] → r := m - 1
    fi
  od
``````

I would like to be able to derive this program in class, since this appears to be the more popular version. Apart from the presence of `break`, for which I do not yet know of an easy variation of Hoare logic that helps to derive it, to relate the test `a[m] < K` to `l := m + 1` I will have to bring in the fact that `a` is sorted at an earlier stage of the development. Thus it is harder to put it in a more general picture.

For several reasons I used to believe that Bentley's program could be preferred. For example, it seems to shrink the range more effectively, assigning `l` and `r` to `m + 1` and `m - 1`, rather than `m`. On second thought I realised that it might not be true. Variable `l` can be assigned `m + 1` because the possibility of `a[m] = K` is covered in another case with an early exit, and `r` is assigned `m - 1` because this algorithm represents an array segment with an inclusive right bound, as opposed to the previous algorithm.

The two algorithms do not solve exactly the same problem. With multiple occurrences of `K` in the array, Bentley's algorithm is non-deterministic about which index it returns, while the van Gasteren-Feijen algorithm, enforced by the specification, always returns the largest index. When `K` does not appear in the array, van Gasteren and Feijen's program could be more efficient because it needs only one comparison in the loop, rather than two as in Bentley's case (I am assuming that the last comparison is a catch-all case and need not be implemented). What if `K` is present in the array? An analysis by Timothy J. Rolfe concluded that a single-comparison approach is still preferable on average -- the benefit of the early exit does not outweigh the cost of the extra comparison in the loop.
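
To make the difference on duplicates concrete, here is Bentley's three-way loop as a Python sketch. The name `bentley_search` is illustrative, the array is assumed to be sorted in ascending order, and the function returns an index (or -1) rather than setting a `found` flag.

```python
def bentley_search(a, K):
    """Three-way binary search with early exit; returns an index of K or -1.
    Assumes a is sorted in ascending order."""
    l, r = 0, len(a) - 1        # inclusive bounds
    while l <= r:
        m = (l + r) // 2
        if a[m] < K:
            l = m + 1
        elif a[m] == K:
            return m            # early exit: whichever occurrence is hit first
        else:
            r = m - 1
    return -1
```

On `[1, 3, 3, 3, 5]` with `K = 3` this version stops at index 2, the first midpoint it inspects, whereas the postcondition `l Φ (l+1)` of the van Gasteren-Feijen program is satisfied only by index 3, the last occurrence.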

### On Computing the Middle Index

There are some other interesting stories regarding the assignment `m := (l + r) / 2`. Joshua Bloch from Google noted that for large arrays, adding `l` and `r` may cause an overflow, and Bloch was not picking on Bentley -- the bug was reported by Sun. Bloch suggests one of the following:

``````int m = l + ((r - l) / 2);                          /* for Java/C/C++ */
int m = (l + r) >>> 1;                              /* for Java */
m = ((unsigned int)l + (unsigned int)r) >> 1;       /* for C/C++ */
``````
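
Python integers do not overflow, so the effect has to be emulated to be observed. The sketch below wraps sums to signed 32 bits by hand; `to_int32` and the three `mid_*` functions are hypothetical helpers written for this illustration, not part of any library.

```python
def to_int32(n):
    """Wrap an unbounded Python int to a signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def mid_naive(l, r):
    """(l + r) / 2 with a 32-bit sum: the sum may wrap to a negative number.
    Python's // rounds down rather than towards zero, but the sign of the
    result is what matters here."""
    return to_int32(l + r) // 2


def mid_safe(l, r):
    """l + (r - l) / 2: r - l cannot overflow when 0 <= l <= r."""
    return l + (r - l) // 2


def mid_unsigned_shift(l, r):
    """Emulates (l + r) >>> 1: treat the 32-bit sum as unsigned, then shift."""
    return ((l + r) & 0xFFFFFFFF) >> 1
```

With `l = 2**30` and `r = 2**30 + 2`, `mid_naive` yields a negative index, while the other two both yield `2**30 + 1`.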

Since the publication of the blog post, there have been numerous discussions on whether it should be considered a bug in the binary search algorithm or in the integer datatype, and on some more machine-dependent issues, such as whether one may have an array so large that it cannot be indexed by an `int`.

To be more language-independent, on the other hand, Roland Backhouse in his book Program Construction: Calculating Implementations from Specifications suggested using `(l + r - 1) / 2`, such that the value of `m` will be `⌊(l + r)/2⌋`, regardless of whether the integral division operator actually rounds the number up or down.

### Exercise?

Among the exercises suggested by van Gasteren and Feijen, this one caught my eye: let array `a[0..N)`, with `0 < N`, be the concatenation of a strictly increasing and a strictly decreasing array. Use binary search to find the maximum element. (For this problem I think it is reasonable to assume that the two sub-arrays could be empty, while `a` is non-empty.) This happens to be a sub-routine I needed for an `O(N log N)` algorithm for the maximum segment density problem (there are linear-time algorithms for this problem, though), and I do remember I started off treating it as an unimportant sub-routine but had a hard time getting it right. I am glad that now I know more about it.

# A “Side-Swapping” Lemma Regarding Minimum, Using Enriched Indirect Equality

Yu-Han Lyu and I were studying a paper from the algorithm community, and we noticed a peculiar kind of argument. For a much simplified version, let `X` and `D` be two relations of type `A → B`, denoting two alternative approaches to non-deterministically compute possible solution candidates to a problem. Also let `≤` be a transitive relation on `B`, and `≥` its converse. The relation `min ≤ : {B} → B`, given a set, returns one of its elements that is no larger (under `≤`) than any element in the set, if such a minimum exists. We would like to find a solution as small as possible under `≤`.

When arguing for the correctness of its algorithm, the paper we are studying claims that the method `X` is no worse than `D` in the following sense: if every solution returned by `D` is no better than some solution returned by `X`, which we translate to:

``D ⊆ ≥ . X``

then the best (smallest) solution by `X` must be no worse than (one of the) best solutions returned by `D`:

``min ≤ . ΛX ⊆ ≤ . min ≤ . ΛD``

where `Λ` converts a relation `A → B` to a function `A → {B}` by collecting its results to a set. Note that, awkwardly, `X` and `D` are swapped to different sides of relational inclusion.

“What? How could this be true?” was my first reaction. I bombarded Yu-Han with lots of emails, making sure that we didn’t misinterpret the paper. An informal way to see it is that since every result of `D` is outperformed by something returned by `X`, collectively, the best result among the latter must be “lower-bounded” by the optimal result of `D`. But this sounds unconvincing to me. Something is missing.

### Totality and Well-Boundedness

It turns out that the reasoning can be correct, but we need some more constraints on `D` and `≤`. Firstly, `D` must yield some result whenever `X` does. Otherwise it could be that `D ⊆ ≥ . X` is true but `ΛD` returns an empty set, while `ΛX` still returns something. This is bad because `X` is no longer a safe alternative to `D`: it could sometimes do too much. One way to prevent this is to demand that `ΛD = dom ∈ . ΛD`, where `∈` is the membership relation, and `dom ∈`, the domain of `∈`, consists only of non-empty sets. It will be proved later that this is equivalent to demanding that `D` be total.

Secondly, we need to be sure that every non-empty set has a minimum, or `min ≤` always yields something for non-empty sets. Therefore `min ≤ . ΛD` would not fall back to the empty relation. Formally, it can be expressed as `dom ∈ = dom (min ≤)`. Bird and de Moor called this property well-boundedness of `≤`.
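
A pointwise reading of the claim, and of why totality matters, can be tried out on a small finite model. The Python sketch below is only an illustration under simplifying assumptions: solutions are integers ordered by the usual `≤`, `X` and `D` are modelled as functions from an input to a finite set of candidates, and all names (`minima`, `premise`, `conclusion`, `D_total`, `D_partial`) are hypothetical.

```python
def minima(s):
    """min ≤ on a finite set of numbers: its least element, if any."""
    return {min(s)} if s else set()


def premise(X, D, inputs):
    """D ⊆ ≥ . X: every result of D is no better than some result of X."""
    return all(any(d >= x for x in X(a)) for a in inputs for d in D(a))


def conclusion(X, D, inputs):
    """min ≤ . ΛX ⊆ ≤ . min ≤ . ΛD: every best result of X is no worse
    than some best result of D."""
    return all(any(mx <= md for md in minima(D(a)))
               for a in inputs for mx in minima(X(a)))


X = lambda a: {a, a + 1}
D_total = lambda a: {a + 1, a + 2}                    # yields something for every input
D_partial = lambda a: set() if a == 0 else {a + 1}    # yields nothing for input 0
```

With `D_total` both `premise` and `conclusion` hold on the inputs `0, 1, 2`. With `D_partial` the premise still holds, vacuously at input 0, but the conclusion fails there because `X` returns a minimum that `D` has nothing to compare with.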

Recall that `min ≤ = ∈ ∩ ≤/∋`. The part `∈` guarantees that `min ≤` returns something that is in the given set, while `≤/∋` guarantees that the returned value is a lower-bound of the given set. Since `ΛD` (as well as `ΛX`) is a function, we also have `min ≤ . ΛD = D ∩ ≤/D°`, following from the laws of division.

Later we will prove an auxiliary lemma stating that if `≤` is well-bounded, we have:

``≤/∋ . dom ∈   ⊆   ≤ . min ≤ . dom ∈``

The right-hand side, given a non-empty set, takes its minimum and returns something possibly smaller. The left-hand side merely returns some lower-bound of the given set. It sounds weaker because it does not demand that the set has a minimum. Nevertheless, the inclusion holds if `≤` is well-bounded.

An algebraic proof of the auxiliary lemma was given by Akimasa Morihata. The proof, to be discussed later, is quite interesting to me because it makes an unusual use of indirect equality. With the lemma, proof of the main result becomes rather routine:

``````  min ≤ . ΛX   ⊆   ≤ . min ≤ . ΛD
≣   { since ΛD = dom ∈ . ΛD }
min ≤ . ΛX   ⊆   ≤ . min ≤ . dom ∈ . ΛD
⇐  { ≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈, see below }
min ≤ . ΛX   ⊆   ≤/∋ . dom ∈ . ΛD
≣   { since ΛD = dom ∈ .  ΛD }
min ≤ . ΛX   ⊆   ≤/∋ . ΛD
≣   { since ΛD is a function, R/S . f = R/(f° . S) }
min ≤ . ΛX   ⊆   ≤/D°
≣   { Galois connection }
min ≤ . ΛX . D°   ⊆   ≤
⇐   { min ≤ . ΛX ⊆ ≤/X° }
≤/X°. D°   ⊆   ≤
⇐   { since  D ⊆ ≥ . X }
≤/X°. X° . ≤   ⊆   ≤
⇐   { division }
≤ . ≤   ⊆   ≤
≣   { ≤ transitive }
true
``````

### Proof Using Enriched Indirect Equality

Now we have got to prove that `≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈` provided that `≤` is well-bounded. To prove this lemma I had to resort to first-order logic. I passed the problem to Akimasa Morihata and he quickly came up with a proof. We start with some preparation:

``````  ≤/∋ . dom ∈ ⊆ ≤ . min ≤ . dom ∈
⇐   { since min ≤ ⊆ ∈ }
≤/(min ≤)° . dom ∈ ⊆ ≤ . min ≤ . dom ∈
``````

And then we use proof by indirect (in)equality. The proof, however, is unusual in two ways. Firstly, we need the enriched indirect equality proposed by Dijkstra in EWD 1315: Indirect equality enriched (and a proof by Netty). Typically, proof by indirect equality exploits the property:

``x = y    ≡   (∀u. u ⊆ x ≡ u ⊆ y)``

and also:

``x ⊆ y   ≡   (∀u. u ⊆ x ⇒ u ⊆ y)``

When we know that both `x` and `y` satisfy some predicate `P`, enriched indirect equality allows us to prove `x = y` (or `x ⊆ y`) by proving a weaker premise:

``x = y   ≡   (∀u. P u ⇒ u ⊆ x ≡ u ⊆ y)``

Note that both `≤/(min ≤)° . dom ∈` and `≤ . min ≤ . dom ∈` satisfy `X = X . dom ∈`. Later we will try to prove:

``````X ⊆ ≤/(min ≤)° . dom ∈    ⇒    X ⊆ ≤ . min ≤ . dom ∈
``````

for `X` such that `X = X . dom ∈`.

The second unusual aspect is that rather than starting from one of `X ⊆ ≤/(min ≤)° . dom ∈` or `X ⊆ ≤ . min ≤ . dom ∈` and ending at the other, Morihata’s proof took the goal as a whole and used rules like `(P ⇒ Q) ⇒ (P ⇒ P ∧ Q)`. The proof goes:

``````  (X ⊆ ≤/(min ≤)° . dom ∈  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
⇐   { dom ∈  ⊆ id }
(X  ⊆ ≤/(min ≤)°  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
≣    { Galois connection }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ ≤ . min ≤ . dom ∈)
⇐   { (P ⇒ Q) ⇒ (P ⇒ P ∧ Q) }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . (min ≤)° . min ≤ . dom ∈)
⇐   { R ∩ S ⊆ R  }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . (((min ≤)° . min ≤) ∩ id) . dom ∈)
≣   { dom R = (R° . R) ∩ id }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom (min ≤) . dom ∈)
≣   { ≤ well-bounded: dom ∈ = dom (min ≤) }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom ∈ . dom ∈)
≣   { dom ∈ . dom ∈ = dom ∈ }
(X . (min ≤)° ⊆ ≤  ⇒  X ⊆ X . dom ∈)
≣   { X = X . dom ∈ }
(X . (min ≤)° ⊆ ≤  ⇒  true)
≣ true
``````

### Auxiliary Proofs

Finally, this is a proof that the constraint `ΛD = dom ∈ . ΛD` is equivalent to `D` being total, that is `id ⊆ D° . D`. Recall that `dom ∈ = ((∋ . ∈) ∩ id)`. We simplify `dom ∈ . ΛD` a bit:

``````  dom ∈ . ΛD
= ((∋ . ∈) ∩ id) . ΛD
=   { ΛD a function }
(∋ . ∈ . ΛD) ∩ ΛD
=   { ∈ . ΛD = D }
(∋ . D) ∩ ΛD
``````

We reason:

``````  dom ∈ . ΛD = ΛD
≡   { R ∩ S = S iff S ⊆ R }
ΛD ⊆ ∋ . D
≡   { ΛD function, shunting }
id ⊆ (ΛD)° . ∋ . D
≡ id ⊆ D° . D
``````

which is the definition of totality.