Maximum segment sum is back: deriving algorithms for two segment problems with bounded lengths

S-C. Mu. In Partial Evaluation and Program Manipulation (PEPM ’08), pp 31-39. January 2008. (20/74) [PDF] [GZipped Postscript]

It may be surprising that variations of the maximum segment sum (MSS) problem, a textbook example for the squiggolists, are still active topics for algorithm designers. In this paper we examine the new developments from the view of relational program calculation. It turns out that, while the classical MSS problem is solved by the Greedy Theorem, by applying the Thinning Theorem, we get a linear-time algorithm for MSS with upper bound on length.

To derive a linear-time algorithm for the maximum segment density problem, on the other hand, we purpose a variation of thinning based on an extended notion of monotonicity. The concepts of left-negative and right-screw segments emerge from the search for monotonicity conditions. The efficiency of the resulting algorithms crucially relies on exploiting properties of the set of partial solutions and design efficient data structures for them.

Maximum Segment Sum, Agda Approved

To practise using the Logic.ChainReasoning module in Agda, Josh proved the foldr-fusion theorem, which he learnt in the program derivation lecture in FLOLAC where we used the maximum segment sum (MSS) as the main example. Seeing his proof, I was curious to know how much program derivation I can do in Agda and tried coding the entire derivation of MSS. I thought it would be a small exercise I could do over the weekend, but ended up spending the entire week.

As a reminder, given a list of (possibly negative) numbers, the MSS is about computing the maximum sum among all its consecutive segments. Typically, the specification is:

``````
mss = max ○ map⁺ sum ○ segs
``````

where `segs = concat⁺ ○ map⁺ inits ○ tails` computes all segments of a list.

A dependent pair is defined by:

``````
data _∣_ (A : Set) (P : A -> Set) : Set where
sub : (x : A) -> P x -> A ∣ P
``````

such that `sub x p` is a pair where the type of the second component `p` depends on the value of the first component `x`. The idea is to use a dependent pair to represent a derivation:

``````
mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x =
sub ?
(chain> (max ○ map⁺ sum ○ segs) x
=== ?)
``````

It says that `mss-der` is a function taking a list `x` and returns a value of type `Val`, with the constraint that the value returned must be equal to `(max ○ map⁺ sum ○ segs) x`.

My wish was to use the interactive mechanism of the Agda Emacs mode to “derive” the parts marked by `?`, until we come to an implementation:

``````
mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x =
sub RESULT
(chain> (max ○ map⁺ sum ○ segs) x
=== ...
=== ...
=== RESULT)
``````

If it works well, we can probably use Agda as a tool for program derivation!

Currently, however, I find it harder to use than expected, perhaps due to my being unfamiliar with the way Agda reports type errors. Nevertheless, Agda does force me to make every details right. For example, the usual definition of `max` I would use in a paper would be:

``````
max = foldr _↑_ -∞
``````

But then I would have to define numbers with lower bound -∞. A sloppy alternative definition:

``````
max = foldr _↑_ 0
``````

led me to prove a base case `0 ↑ max x ≡ max x`, which is not true. That the definition does work in practice relies on the fact that `segs` always returns empty list as one of the possible segment. Alternatively, I could define `max` on non-empty lists only:

``````
max : List⁺ Val -> Val
max = foldr⁺ _↑_ id
``````

where `List⁺ A` is defined by:

``````
data List⁺ (A : Set) : Set where
[_]⁺ : A -> List⁺ A
_::⁺_ : A -> List⁺ A -> List⁺ A
``````

and refine the types of `inits`, `tails`, etc, to return non-empty lists. This is the approach I took.

Eventually, I was able to give a derivation of `mss` this way:

``````
mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x =
sub ((max ○ scanr _⊗_ ø) x)
(chain> (max ○ map⁺ sum ○ segs) x
=== (max ○ map⁺ sum ○ concat⁺ ○ map⁺ inits ○ tails) x
by refl
=== (max ○ concat⁺ ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x
by cong max (sym (concat⁺-map⁺ ((map⁺ inits ○ tails) x)))
=== (max ○ map⁺ max ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x
by max-concat⁺ ((map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x)
=== (max ○ map⁺ max ○ map⁺ (map⁺ sum ○ inits) ○ tails) x
by cong (\xs -> max (map⁺ max xs)) (sym (map⁺-compose (tails x)))
=== (max ○ map⁺ (max ○ map⁺ sum ○ inits) ○ tails) x
by cong max (sym (map⁺-compose (tails x)))
=== (max ○ map⁺ (foldr _⊗_ ø) ○ tails) x
by cong max (map⁺-eq max-sum-prefix (tails x))
=== (max ○ scanr _⊗_ ø) x
by cong max (scanr-pf x) )
where _⊕_ : Val -> List⁺ Val -> List⁺ Val
a ⊕ y = ø ::⁺ map⁺ (_+_ a) y

_⊗_ : Val -> Val -> Val
a ⊗ b = ø ↑ (a + b)
``````

where `max-sum-prefix` consists of two fold fusion:

``````
max-sum-prefix : (x : List Val) -> max (map⁺ sum (inits x)) ≡ foldr _⊗_ ø
max-sum-prefix x =
chain> max (map⁺ sum (inits x))
=== max (foldr _⊕_ [ ø ]⁺ x)
by cong max (foldr-fusion (map⁺ sum) lemma1 x)
=== foldr _⊗_ ø x
by foldr-fusion max lemma2 x
where lemma1 : (a : Val) -> (xs : List⁺ (List Val)) ->
map⁺ sum (ini a xs) ≡ (a ⊕ (map⁺ sum xs))
lemma1 a xs = ?
lemma2 : (a : Val) -> (x : List⁺ Val) ->
max (a ⊕ x) ≡ (a ⊗ max x)
lemma2 a x = ?
``````

The two lemmas are given in the files attached below. Having the derivation, `mss` is given by:

``````
mss : List Val -> Val
mss x = depfst (mss-der x)
``````

it is an executable program that is proved to be correct.

The complete Agda program is split into five files:

• MSS.agda: the main program importing all the sub-modules.
• ListProperties.agda: some properties I need about lists, such as fold fusion, `concat ○ map (map f) = map f ○ concat`, etc. Later in the development I realised that I should switch to non-empty lists, so not all of the properties here are used.
• List+.agda: non-empty lists and some of its properties.
• Derivation.agda: the main derivation of MSS. The derivation is parameterised over any numerical data and operators `+` and `↑` such that `+` is associative, and `a + (b ↑ c) = (a ↑ b) + (a ↑ c)`. The reason of this parameterisation, however, was that I did not know how to prove the properties above, until Josh showed me the proof.
• IntRNG.agda: proofs from Josh that Data.Int actually satisfy the properties above. (Not quite complete yet.)

Maximum Segment Sum and Density with Bounded Lengths

It may be surprising that variations of the maximum segment sum (MSS) problem, a textbook example for the squiggolists, are still active topics for algorithm designers. These literate Haskell scripts presents a program solving two recently studied variations:

1. mssu.lhs: an amortised linear-time algorithm computing the maximum sum of segments not longer than an upper-bound;
2. msdlb.lhs: an O(n log L) algorithm computing the maximum density (average) of segments not shorter than a lower-bound;
3. msdll.lhs: computing the maximum density (average) of segments not shorter than a lower-bound. With the discovery of Glodwasser et al. we are able to refine the algorithm to amortised linear time again.

2007/06/26 Update: fixed binary search.
2007/11/04 Update: linear time algorithm for MSDL.