Tag Archives: Segment Problems

Maximum Segment Sum, Agda Approved

To practise using the Logic.ChainReasoning module in Agda, Josh proved the foldr-fusion theorem, which he learnt in the program derivation lecture in FLOLAC where we used the maximum segment sum (MSS) as the main example. Seeing his proof, I was curious to know how much program derivation I can do in Agda and tried coding the entire derivation of MSS. I thought it would be a small exercise I could do over the weekend, but ended up spending the entire week.

As a reminder, given a list of (possibly negative) numbers, the MSS is about computing the maximum sum among all its consecutive segments. Typically, the specification is:

  mss = max ○ map⁺ sum ○ segs

where segs = concat⁺ ○ map⁺ inits ○ tails computes all segments of a list.

A dependent pair is defined by:

  data _∣_ (A : Set) (P : A -> Set) : Set where
    sub : (x : A) -> P x -> A ∣ P

such that sub x p is a pair where the type of the second component p depends on the value of the first component x. The idea is to use a dependent pair to represent a derivation:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ?
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ?)

It says that mss-der is a function taking a list x and returns a value of type Val, with the constraint that the value returned must be equal to (max ○ map⁺ sum ○ segs) x.

My wish was to use the interactive mechanism of the Agda Emacs mode to “derive” the parts marked by ?, until we come to an implementation:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub RESULT
       (chain> (max ○ map⁺ sum ○ segs) x 
           === ...
           === ...
           === RESULT)

If it works well, we can probably use Agda as a tool for program derivation!

Currently, however, I find it harder to use than expected, perhaps due to my being unfamiliar with the way Agda reports type errors. Nevertheless, Agda does force me to make every details right. For example, the usual definition of max I would use in a paper would be:

   max = foldr _↑_ -∞

But then I would have to define numbers with lower bound -∞. A sloppy alternative definition:

   max = foldr _↑_ 0

led me to prove a base case 0 ↑ max x ≡ max x, which is not true. That the definition does work in practice relies on the fact that segs always returns empty list as one of the possible segment. Alternatively, I could define max on non-empty lists only:

  max : List⁺ Val -> Val
  max = foldr⁺ _↑_ id

where List⁺ A is defined by:

  data List⁺ (A : Set) : Set where
    [_]⁺ : A -> List⁺ A
    _::⁺_ : A -> List⁺ A -> List⁺ A

and refine the types of inits, tails, etc, to return non-empty lists. This is the approach I took.

Eventually, I was able to give a derivation of mss this way:

mss-der : (x : List Val) -> (Val ∣ \m -> (max ○ map⁺ sum ○ segs) x ≡ m)
mss-der x = 
   sub ((max ○ scanr _⊗_ ø) x)
       (chain> (max ○ map⁺ sum ○ segs) x 
           === (max ○ map⁺ sum ○ concat⁺ ○ map⁺ inits ○ tails) x 
                   by refl
           === (max ○ concat⁺ ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by cong max (sym (concat⁺-map⁺ ((map⁺ inits ○ tails) x)))
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x 
                   by max-concat⁺ ((map⁺ (map⁺ sum) ○ map⁺ inits ○ tails) x)
           === (max ○ map⁺ max ○ map⁺ (map⁺ sum ○ inits) ○ tails) x 
                   by cong (\xs -> max (map⁺ max xs)) (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (max ○ map⁺ sum ○ inits) ○ tails) x
                   by cong max (sym (map⁺-compose (tails x)))
           === (max ○ map⁺ (foldr _⊗_ ø) ○ tails) x
                   by cong max (map⁺-eq max-sum-prefix (tails x)) 
           === (max ○ scanr _⊗_ ø) x  
                   by cong max (scanr-pf x) )
  where _⊕_ : Val -> List⁺ Val -> List⁺ Val 
        a ⊕ y = ø ::⁺ map⁺ (_+_ a) y

        _⊗_ : Val -> Val -> Val
        a ⊗ b = ø ↑ (a + b)

where max-sum-prefix consists of two fold fusion:

max-sum-prefix : (x : List Val) -> max (map⁺ sum (inits x)) ≡ foldr _⊗_ ø 
max-sum-prefix x =
 chain> max (map⁺ sum (inits x))
        === max (foldr _⊕_ [ ø ]⁺ x)
               by cong max (foldr-fusion (map⁺ sum) lemma1 x)
        === foldr _⊗_ ø x
               by foldr-fusion max lemma2 x
 where lemma1 : (a : Val) -> (xs : List⁺ (List Val)) -> 
                          map⁺ sum (ini a xs) ≡ (a ⊕ (map⁺ sum xs))
       lemma1 a xs = ?
       lemma2 : (a : Val) -> (x : List⁺ Val) ->
                               max (a ⊕ x) ≡ (a ⊗ max x)
       lemma2 a x = ?

The two lemmas are given in the files attached below. Having the derivation, mss is given by:

mss : List Val -> Val
mss x = depfst (mss-der x)

it is an executable program that is proved to be correct.

The complete Agda program is split into five files:

  • MSS.agda: the main program importing all the sub-modules.
  • ListProperties.agda: some properties I need about lists, such as fold fusion, concat ○ map (map f) = map f ○ concat, etc. Later in the development I realised that I should switch to non-empty lists, so not all of the properties here are used.
  • List+.agda: non-empty lists and some of its properties.
  • Derivation.agda: the main derivation of MSS. The derivation is parameterised over any numerical data and operators + and such that + is associative, and a + (b ↑ c) = (a ↑ b) + (a ↑ c). The reason of this parameterisation, however, was that I did not know how to prove the properties above, until Josh showed me the proof.
  • IntRNG.agda: proofs from Josh that Data.Int actually satisfy the properties above. (Not quite complete yet.)

Maximum Segment Sum and Density with Bounded Lengths

It may be surprising that variations of the maximum segment sum (MSS) problem, a textbook example for the squiggolists, are still active topics for algorithm designers. These literate Haskell scripts presents a program solving two recently studied variations:

  1. mssu.lhs: an amortised linear-time algorithm computing the maximum sum of segments not longer than an upper-bound;
  2. msdlb.lhs: an O(n log L) algorithm computing the maximum density (average) of segments not shorter than a lower-bound;
  3. msdll.lhs: computing the maximum density (average) of segments not shorter than a lower-bound. With the discovery of Glodwasser et al. we are able to refine the algorithm to amortised linear time again.

2007/06/26 Update: fixed binary search.
2007/11/04 Update: linear time algorithm for MSDL.