Thursday
13th August
2020

Quantitative Typing with Non-idempotent Intersection Types

This post contains a development of an Agda proof that, for the Call-by-Name (CBN) λ-calculus, the number of steps needed to reduce a term to weak head normal form (WHNF) in the Krivine Abstract Machine (KAM) is equal to the size of the term's typing derivation in a non-idempotent intersection type system.

The inverse also holds: if a term reduces to WHNF in the KAM, then it has a typing derivation, the size of which is equal to the number of steps it takes to reduce the term. Therefore, typability characterises termination, quantitatively.

The main result manifests as a “quantitative” subject reduction lemma: not only is typing preserved by each reduction step, but the size of the typing derivation goes down by exactly one in each step.

Non-idempotent intersection types can also be seen as a syntactic presentation of a denotational semantics of untyped λ-calculus in the category Rel of sets and relations for a particular reflexive object. The fact that subject reduction (and expansion) have a quantitative aspectyields a proof of adequacy for this semantics that doesn't rely on logical relations (I've haven't proved this in this file though).

The quantitative nature of non-idempotent intersection types was originally observed for head normalisation by Daniel de Carvalho in Execution Time of lambda-Terms via Denotational Semantics and Intersection Types.

module non-idempotent-intersection-types where

Imports

First we need some things from the Agda standard library:

open import Data.Nat using (; zero; suc; _+_)
open import Data.Nat.Properties using (+-assoc; +-comm; +-identityʳ; suc-injective)
open import Data.Fin using (Fin; zero; suc; _≟_)
open import Data.Product using (Σ-syntax; _×_; _,_; proj₁; proj₂)
open import Data.List using (List; []; _∷_; _++_; [_])
import Relation.Binary.PropositionalEquality as Eq
open Eq using (_≡_; refl; trans; cong; subst; sym; _≢_; cong₂)
open Eq.≡-Reasoning using (begin_; _≡⟨⟩_; step-≡; _∎)

Untyped λ-calculus

The syntax of the programming language we will look at is the same as the usual untyped λ-calculus. I've opted for an intrinsically well-scoped de Bruijn representation:

data term :   Set where
  `_  :  {n}  Fin n  term n
  ƛ   :  {n}  term (suc n)  term n
  _·_ :  {n}  term n  term n  term n

infixl 20 _·_
infix 40 `_

Krivine Abstract Machine

The standard way to give an operational semantics to the untyped λ-calculus is to give a small step reduction relation where the main rule is β-reduction:

  (ƛ t) · t' ---> t [ t' ]

However, our goal here is to measure the complexity of programs via typing and counting the number of β reductions is not a very convincing way of doing this. Substitution does a non-constant amount of work, depending on how big t is and how often the 0th variable is used in it.

Instead, we want to use a semantics that performs a constant amount of work at each step. One way to do this is to use an abstract machine to interpret programs. Abstract machines break the complex process of β-reduction down into smaller steps, so they are more convincing ways to measure complexity.

The abstract machine we shall use here is the Krivine Abstract Machine (KAM), which implements reduction of λ terms to weak head normal form.

Rather than using substitution, the KAM uses closures: pairs of a term and an environment that assigns a closure to each variable in the term. As this description suggests, closures and environments are mutually defined:

mutual
  data env :   Set where
    nil  : env zero
    _,-_ :  {n}  env n  clo  env (suc n)

  clo = Σ[ n   ] env n × term n

Note that environments are built by extending to the right.

The second ingredient for the KAM are stacks of closures. Stacks are used to remember the context within a program we are currently executing. Compare to the congruence rule used in the definition of CBN small step operational semantics:

       M  -->  M'
 ---------------------
  M · N  -->  M' · N

In the rule, the context is the N term, and in general we might have to go under several Ns to get to the part we are currently executing. The KAM uses a stack of closures to represent this:

data stk : Set where
  emp  : stk
  _-,_ : clo  stk  stk

infixr 10 _-,_

Stacks are built by extending to the left.

A configuration of the KAM is a pair of a closure, the current focus of the computation, and a stack representing the evaluation context. We use the special notation ⟨_⋆_⟩ for configurations.

record configuration : Set where
  constructor ⟨_⋆_⟩
  field
    closure : clo
    stack   : stk

infix  20 ⟨_⋆_⟩

The reduction semantics of the KAM is given by the following four rules, which are driven by the shape of the term in the first component of the configuration:

data _⇒_ : configuration  configuration  Set where
  grab :  {n η c π}     suc n , η ,- c , ` zero     π         c                       π 
  skip :  {n η c i π}   suc n , η ,- c , ` (suc i)  π         n ,     η        , ` i  π 
  push :  {n η t s π}       n , η      , t · s      π         n ,     η        , t    (n , η , s) -, π 
  pop  :  {n η t c π}       n , η      , ƛ t        c -, π    suc n , (η ,- c) , t    π 

infixr 10 _⇒_

The rules grab and skip handle the execution of variables by incrementally looking them up in the current environment. One could also define a KAM that has a single rule for variable lookup (looking arbitrarily deep into the environment in a single step), but we chose to break it down into individual steps here.

The rule push executes an application t · s by pushing the argument s on to the stack (accompanied by its environment). Execution will then carry on in the function position (compare to the congruence rule given above for a small step operational semantics).

Finally, the rule pop executes a λ abstraction ƛ t by popping the current argument from the stack and adding it to the current environment. Combined with the first two rules, this performs a delayed β-reduction.

To start the KAM, we need to take programs that we want to execute and turn them into configurations:

init : term 0  configuration
init t =  0 , nil , t  emp 

Due to the fact that we have defined our KAM using intrinsically scoped syntax, it is not possible for the KAM to get stuck by not having a variable in scope. The only way the machine can get stuck is by ending up with a λ-abstraction paired with an empty stack. In this case we can say that the original term has been reduced to its weak head normal form (WHNF). It also possible that a term never reduces to a final configuration at all -- not all terms finish reducing under the Call-by-Name semantics. To identify which terms reduce, we make an inductive definition that also records how many steps it takes for a term to reduce:

data reduces :   configuration  Set where
  stop :  {n} η t  reduces 0  n , η , ƛ t  emp 
  step :  {n p q}  p  q  reduces n q  reduces (suc n) p

So if we have a value of type reduces n p, then we know that a configuration p reduces to a WHNF in n steps.

Since the selection of rules the KAM is driven by the shape of the term being executed, execution in the KAM is deterministic:

deterministic :  {p q q'}  p  q  p  q'  q  q'
deterministic grab grab = refl
deterministic skip skip = refl
deterministic push push = refl
deterministic pop  pop  = refl

Consequently, any configuration that reduces does so in a unique number of steps:

reduces-det :  {p m n}  reduces m p  reduces n p  m  n
reduces-det (stop η t)   (stop .η .t) = refl
reduces-det (step s1 r1) (step s2 r2) with deterministic s1 s2
... | refl = cong suc (reduces-det r1 r2)

The Non-Idempotent Type Assignment System

Non-idempotent intersection types assign types to terms according to their precise runtime behaviour. Usually type systems assign types that are over approximations of the possible behaviours of a program. Intersection types (whether idempotent or not) assign types that track precisely the behaviour a program will have (in a particular context). Indeed, as we will see below, the system we present here precisely tracks the termination behaviour of programs. This precision does make typechecking undecidable.

Types, Observations, Behaviours

The types of our system are defined as:

data type : Set where
      : type
  _↦_ : List type  type  type

infixr 30 _↦_

We can interpret the types of the system as kinds of “observations” or “behaviour descriptions” that we can make of a program. Per the definition of type, we have two kinds of observation in our system:

  1. The observation means that we can observe that the term will reduce to a value (a λ abstraction), but gives us no further information. To observe that a complete program reduces to WHNF, we will seek typings that type the whole program with the observation .

  2. The observation σ ↦ τ pairs a list of observations σ with a single observation τ. Making this observation means that the term will reduce to a function that when applied to an argument that exhibits all the behaviours in σ, exhibits the observation τ.

For example, the observation [⋆] ↦ ⋆ is one that we can assign to the identity function: we can observe that the output is a value, as long as it can make the observation that the input is a value.

The lists of types σ are the intersection types of our system. They indicate that a particular term must have all of the behaviours listed, so it must be in the intersection of all the possible behaviours: σ = τ₁ ∧ τ₂ ∧ ... ∧ τₙ. Idempotent intersection type systems disregard multiple occurrences of the same behaviour (i.e. τ ∧ τ = τ) in an intersection: they only track whether a behaviour is present or not. Non-idempotent intersection type systems, in contrast, track the number of occurrences of each behaviour. In turn, this tracks the number of times a function uses its arguments. It is this quantitative aspect that will allow us to use non-idempotent intersection types to measure the reduction complexity of programs.

To see how the rejection of idempotency allows tracking of number of uses, consider the following two terms (written using explicit names):

  λf. λx. f x

and

  λf. λx. f (f x)

When applied to the identity function λx .x and any value λz.t, these terms both reduce to a value. We can see this by the types that the system we define below will assign to them. The first one gets the following type:

  ⊢ λf. λx. f x ⦂ [[⋆] ↦ ⋆] ↦ [⋆] ↦ ⋆

In words, this says "if the first argument yields values when given values, and the second argument is a value, then the result is a value". The second term is assigned a similar type, but here we can see the quantitative nature of the system coming in to play:

  ⊢ λf. λx. f (f x) ⦂ [[⋆] ↦ ⋆, [⋆] ↦ ⋆] ↦ [⋆] ↦ ⋆

Notice that the type now states that the first argument will be used twice, because two behaviours appear in the first intersection. This precisely tracks the structure of the term, which does indeed use f twice. If we assumed idempotency, then the two types would be equivalent, and we would be tracking the qualitative behaviour, but not the quantitative behaviour.

Typing contexts

To assign types to open terms, we need typing contexts. Typing contexts assign to each free variable position a list of possible behaviours. Typing contexts grow to the right.

data ctx :   Set where
  nil : ctx zero
  _,-_ :  {n}  ctx n  List type  ctx (suc n)

We need the following two combinators for typing contexts. The first, empty, is the typing context where every variable is assumed to have no behaviours. This is used to type terms that do not use any of the variables in scope.

empty : ∀{n}  ctx n
empty {zero}  = nil
empty {suc n} = empty {n} ,- []

The _+++_ combinator “zips” two typing contexts to combine their possible behaviours. We will use this to create contexts that combine the observations on contexts required by two the two subterms in an application term.

_+++_ : ∀{n}  ctx n  ctx n  ctx n
nil        +++ nil       = nil
(Γ₁ ,- σ₁) +++ (Γ₂ ,- σ₂) = (Γ₁ +++ Γ₂) ,- (σ₁ ++ σ₂)

Non-idempotent intersection type systems are sensitive to the number of occurrences of each behaviour in an intersection, but not to their order. We will model this by requiring equivalence only up to permutation at various places. We define a permutation between two lists via sequences of swappings:

module permutations {X : Set} where
  data _⋈_ : List X -> List X -> Set where
    nil     : []  []
    skip    :  {x l₁ l₂}  -> l₁  l₂ -> (x  l₁)  (x  l₂)
    swap    :  {x y l}    ->           (x  y  l)  (y  x  l)
    ⋈-trans :  {l₁ l₂ l₃} -> l₁  l₂ -> l₂  l₃ -> l₁  l₃

  ⋈-refl :  {l} -> l  l
  ⋈-refl {[]}    = nil
  ⋈-refl {x  l} = skip ⋈-refl
open permutations

We use permutations to define when two typing contexts are pointwise permutations of each other. I.e., when two contexts have the same number of variables, but the observations at each variable are permuted.

data _⋈ctx_ :  {n}  ctx n  ctx n  Set where
  nil : nil ⋈ctx nil
  _,-_ :  {n}{Γ₁ Γ₂ : ctx n}{σ₁ σ₂}  Γ₁ ⋈ctx Γ₂  σ₁  σ₂  (Γ₁ ,- σ₁) ⋈ctx (Γ₂ ,- σ₂)

⋈ctx-refl :  {n}{Γ : ctx n}  Γ ⋈ctx Γ
⋈ctx-refl {Γ = nil}    = nil
⋈ctx-refl {Γ = Γ ,- σ} = ⋈ctx-refl ,- ⋈-refl

Type Assignment

We can now define the rules of our system. First, we define what it means for a variable (represented as an Agda value of type Fin n) to be well typed in a context:

data _⊢v_⦂_ :  {n}  ctx n  Fin n  type  Set where
  zero :  {n τ}  (empty {n} ,- [ τ ]) ⊢v zero  τ
  suc  :  {n τ i}{Γ : ctx n}  Γ ⊢v i  τ  (Γ ,- []) ⊢v (suc i)  τ

The key point here is that the context for a variable must require no behaviours for any variable that is not the one being selected, and must require exactly one behaviour for the one being selected. Thus one use a variable corresponds to one observation.

The typing rules for terms are mutually defined with a rule for typing a single term against an intersection type. I will explain the rules after the definition.

mutual
  data _⊢_⦂_ :  {n}  ctx n  term n  type  Set where
    var :  {n τ i}{Γ : ctx n} 
          Γ ⊢v i  τ 
          Γ  ` i  τ
    lam :  {n σ τ t}{Γ : ctx n} 
          (Γ ,- σ)  t  τ 
          Γ  ƛ t  (σ  τ)
    app :  {n σ τ s t}{Γ Γ₁ Γ₂ : ctx n} 
          Γ₁  s  (σ  τ) 
          Γ₂  t ⦂' σ 
          (Γ₁ +++ Γ₂) ⋈ctx Γ 
          Γ  (s · t)  τ
    lam⋆ :  {n t} 
          empty {n}  ƛ t  

  data _⊢_⦂'_ :  {n}  ctx n  term n  List type  Set where
    nil :  {n}{t : term n} 
          empty  t ⦂' []
    _,~_∷_ :  {n t σ τ}{Γ₁ Γ₂ Γ : ctx n} 
          Γ₁  t  τ 
          (Γ₁ +++ Γ₂) ⋈ctx Γ 
          Γ₂  t ⦂' σ 
          Γ  t ⦂' (τ  σ)

The rule var uses the variable typing judgement to assign types to terms that are variables. The rule lam states that a λ abstraction can be assigned the type σ ↦ τ if the body can be assigned the type τ under the assumption that the 0th variable has the intersection type σ. The app rule states that if the term s can be observed to act like a function mapping intersections of behaviours σ to behaviours τ, and the term t exhibits all the behaviours in σ (using the auxiliary judgement _⊢_⦂'_), then the whole application term exhibits the behaviour τ. Note that the two terms are typed in separate contexts that are combined -- so the uses of variables in s and t are tracked separately. The rule lam⋆ is used to assign the observation that λ abstractions are values ( means that the term is a value) to any λ abstraction.

The auxiliary judgement Γ ⊢ t ⦂' σ is used to ensure that a single term t can exhibit all the possible behaviours in σ. Note that each individual behaviour in σ must be observable under a separate typing context. All the typing contexts are summed up (up to permutation), to give the complete requirements on the context.

This completes the definition of the type assignment system. We demonstrate it using the two terms from above. These are defined as follows:

--- λf. λx. f x
example₁ : term 0
example₁ = ƛ (ƛ (` suc zero · ` zero))

-- λf. λx. f (f x)
example₂ : term 0
example₂ = ƛ (ƛ (` suc zero · (` suc zero · ` zero)))

To give typing derivations for these, we will save a bit of typing by using the following derived rule for application when we apply a function that will only use its argument once:

app₁ :  {n Γ₁ Γ₂ τ₁ τ₂}{s t : term n} 
      Γ₁  s  [ τ₁ ]  τ₂ 
      Γ₂  t  τ₁ 
      (Γ₁ +++ (Γ₂ +++ empty))  s · t  τ₂
app₁ s-ok t-ok = app s-ok (t-ok ,~ ⋈ctx-refl  nil) ⋈ctx-refl

With this derived rule to help us, the first example can be assigned the type we gave above like so:

typing₁ : nil  example₁  [ [  ]   ]  [  ]  
typing₁ = lam (lam (app₁ (var (suc zero)) (var zero)))

And the second one like so:

typing₂ : nil  example₂  ([  ]    [  ]    [])  [  ]  
typing₂ = lam (lam (app₁ (var (suc zero)) (app₁ (var (suc zero)) (var zero))))

In the second one we see how the type reveals that the term uses its first argument twice.

Measuring the sizes of derivations

Our main result below is that typing is preserved by execution quantitatively. Therefore, we need to be able to compute the sizes of typing derivations. This is done by the following three functions:

v-size :  {n}{Γ : ctx n}{t τ}  Γ ⊢v t  τ  
v-size zero    = 1
v-size (suc d) = 1 + v-size d

mutual
  size :  {n}{Γ : ctx n}{t τ}  Γ  t  τ  
  size (var x)      = v-size x
  size (lam d)      = 1 + size d
  size (app d₁ d₂ _) = 1 + size d₁ + size' d₂
  size lam⋆         = 0

  size' :  {n}{Γ : ctx n}{t σ}  Γ  t ⦂' σ  
  size' nil           = 0
  size' (d ,~ _  ds) = size d + size' ds

We have been a little selective in what we have counted. We have added 1 for every construct that can cause the KAM to make a step: whether a variable lookup (zero or suc), an application, or a λ abstraction that we expect to have some interesting behaviour. In particular a use of lam⋆ is counted as 0 because it represents values that will not be inspected any further, and so will not cause the KAM to do any work.

The size' function adds up all the derivations used in checking an intersection type. This means that we accurately account for all the possible ways that a single term will be used.

Typing KAM configurations.

We now extend the non-idempotent intersection typing discipline from terms to configurations of the KAM. This is required so that we can state our main lemmas.

Closures are typed by ensuring that the term contained in them is well typed for some context, and the associated environment is contains closures well typed according to that context. Because contexts contain intersection types, we need an auxiliary judgement that checks that a closure is well typed for all types in an intersection. These three judgements (closures, environments, closures with intersection types) need to be mutually defined:

mutual
  data ⊢c_⦂_ : (c : clo) (τ : type)  Set where
    t-clo :  {n t τ}{η : env n}{Γ : ctx n} 
       Γ  t  τ 
       ⊢e η  Γ 
       ⊢c (n , η , t)  τ

  data ⊢e_⦂_ :  {n}  env n  ctx n  Set where
    nil : ⊢e nil  nil
    _,-_ : ∀{n}{η : env n}{Γ : ctx n}{c}{σ} 
           ⊢e η  Γ 
           ⊢c c ⦂' σ 
           ⊢e (η ,- c)  (Γ ,- σ)

  data ⊢c_⦂'_ : (c : clo) (σ : List type)  Set where
    nil :  {c} 
           ⊢c c ⦂' []
    _∷_ :  {c τ σ} 
           ⊢c c  τ 
           ⊢c c ⦂' σ 
           ⊢c c ⦂' (τ  σ)

We also extend the measurement of sizes of derivations to well typed closures and environments. Notice that these derivations do not contribute themselves to the size: we only sum up the sizes of the contained term typing derivations.

mutual
  c-size :  {c τ}  ⊢c c  τ  
  c-size (t-clo t e) = size t + e-size e

  e-size :  {n}{η : env n}{Γ}  ⊢e η  Γ  
  e-size nil = 0
  e-size (d ,- c) = e-size d + c'-size c

  c'-size :  {c σ}  ⊢c c ⦂' σ  
  c'-size nil = 0
  c'-size (c  c') = c-size c + c'-size c'

Stacks are typed with two types τ₁ ⊸ τ₂, meaning that if the observation τ₁ can be made of the computation happening under the stack, then the observation τ₂ can be observed at the "top level":

data ⊢s_⦂_⊸_ : (π : stk) (τ₁ τ₂ : type)  Set where
  emp  :  {τ} 
         ⊢s emp  τ  τ
  _-,_ :  {c π σ τ₁ τ₂} 
         ⊢c c ⦂' σ 
         ⊢s π  τ₁  τ₂ 
         ⊢s (c -, π)  (σ  τ₁)  τ₂

Again, we measure the size of a stack typing derivation by adding up the sizes of the typing derivations contained within:

s-size :  {π τ₁ τ₂}  ⊢s π  τ₁  τ₂  
s-size emp = 0
s-size (c -, π) = c'-size c + s-size π

Well typed KAM configurations are constructed by pairing closures that can generate a behaviour τ₁ with stacks that can turn that behaviour into a behaviour τ at the top level. The size of a configuration is measured by summing up the sizes of the typing derivations included in the closure and the stack.

data ⊢p_⦂_ : configuration  type  Set where
  cfg :  {c π τ₁ τ} 
         ⊢c c  τ₁ 
         ⊢s π  τ₁  τ 
         ⊢p  c  π   τ

cfg-size :  {p τ}  ⊢p p  τ  
cfg-size (cfg c π) = c-size c + s-size π

We can now prove that the initial configuration of the KAM for a given program is well typed if the program is:

init-typed : ∀{t τ}  nil  t  τ  ⊢p init t  τ
init-typed d = cfg (t-clo d nil) emp

The size of this initial typing derivation is also equal to the size of the original typing derivation:

init-typed-size : ∀{t τ}  (d : nil  t  τ)  cfg-size (init-typed d)  size d
init-typed-size{τ = τ} d =
  begin
     size d + e-size nil + s-size{τ₁ = τ} emp
   ≡⟨⟩
     size d + 0 + 0
   ≡⟨ +-identityʳ (size d + 0) 
     size d + 0
   ≡⟨ +-identityʳ (size d) 
     size d
   

Quantitative Subject Reduction

We now connect the KAM semantics to the non-idempotent intersection type system. This will take the form of three key lemmas:

  1. Subject Reduction (a.k.a preservation), which says that if a configuration of the KAM is well typed, and steps to another configuration, then that configuration is also well-typed. The twist here is that this lemma is also quantitative: the size of the typing derivation after one step is exactly one smaller than it was before. This will allow us to prove that all typable terms reduce.

  2. Subject Expansion, the converse of subject reduction. If a configuration is well typed and was the result of a step of the KAM, then the predecessor was also well typed. Subject expansion does not usually hold for type systems, but is characteristic of intersection type systems due to the way they describe what a program must do, instead of what it may do. Subject expansion allows us to prove a completeness result: if a program reduces to WHNF, then we can step it backwards to recover a typing derivation for the initial state.

  3. Progress: if a configuration is well typed, and its derivation has non-zero size, then the KAM can take a step. Thus we can interpret the size of a derivation as the amount of computation that a program has left to perform.

Some lemmas

Before we can prove subject reduction, we will need some auxiliary lemmas about the type system. These mostly say that typing is preserved under permutation, splitting, or joining of intersection types. We also prove that these preservations are size preserving, anticipating the quantitative subject reduction lemma in the next section.

First, we need that if a closure c can be well typed for (i.e., exhibits the behaviour of) all the types in σ₁ then it is well typed for any permutation of σ₁. Also, this does not affect the size of the typing derivation. We prove this by induction on the permutation applied:

clo-permute :  {c σ₁ σ₂}  ⊢c c ⦂' σ₁  σ₂  σ₁  ⊢c c ⦂' σ₂
clo-permute d            nil            = d
clo-permute (x  d)      (skip p)       = x  clo-permute d p
clo-permute (x  x₁  d) swap           = x₁  x  d
clo-permute d            (⋈-trans p p₁) = clo-permute (clo-permute d p₁) p

clo-permute-size :  {c σ₁ σ₂}  (d : ⊢c c ⦂' σ₁)  (p : σ₂  σ₁)  c'-size d  c'-size (clo-permute d p)
clo-permute-size d              nil            = refl
clo-permute-size (c  d)        (skip p)       = cong    c-size c + ) (clo-permute-size d p)
clo-permute-size (x  (x₁  d)) swap           = trans (sym (+-assoc (c-size x) (c-size x₁) (c'-size d))) (trans (cong     + c'-size d) (+-comm (c-size x) (c-size x₁))) (+-assoc (c-size x₁) (c-size x) (c'-size d)))
clo-permute-size d              (⋈-trans p p₁) = trans (clo-permute-size d p₁) (clo-permute-size (clo-permute d p₁) p)

Preservation of closure typing under permutation extends to preservation of environment typing under pointwise permutation of typing contexts, and this is also size preserving:

env-permute :  {n}{η : env n}{Γ₁ Γ₂ : ctx n} 
             ⊢e η  Γ₁ 
             Γ₂ ⋈ctx Γ₁ 
             ⊢e η  Γ₂
env-permute nil      nil       = nil
env-permute (d ,- x) (p ,- x₁) = env-permute d p ,- clo-permute x x₁

env-permute-size :  {n}{η : env n}{Γ₁ Γ₂ : ctx n} 
             (d : ⊢e η  Γ₁)  (p : Γ₂ ⋈ctx Γ₁) 
             e-size d  e-size (env-permute d p)
env-permute-size nil      nil       = refl
env-permute-size (d ,- x) (p ,- x₁) = cong₂ _+_ (env-permute-size d p) (clo-permute-size x x₁)

If a closure c exhibits all the behaviours in the big intersection σ₁ ++ σ₂, then it exhibits them separately. Moreover, the sizes of the two separate typing derivations sums to the original:

clo-split :  {c σ₁ σ₂}  ⊢c c ⦂' (σ₁ ++ σ₂)  ⊢c c ⦂' σ₁ × ⊢c c ⦂' σ₂
clo-split {σ₁ = []}     d        = nil , d
clo-split {σ₁ = x  σ₁} (x₁  d) = let d₁ , d₂ = clo-split d in x₁  d₁ , d₂

clo-split-size :  {c σ₁ σ₂}  (d : ⊢c c ⦂' (σ₁ ++ σ₂))  c'-size d  c'-size (clo-split {σ₁ = σ₁} d .proj₁) + c'-size (clo-split {σ₁ = σ₁} d .proj₂)
clo-split-size {σ₁ = []}     d        = refl
clo-split-size {σ₁ = x  σ₁} (x₁  d) = trans (cong    c-size x₁ + ) (clo-split-size {σ₁ = σ₁} d)) (sym (+-assoc (c-size x₁) _ _))

As a consequence of clo-split, we can prove that if an environment exhibits behaviours in two contexts combined, then it exhibits them separately. We will use this to distribute welltypedness of environments between the two parts of an application term in the push rule of the KAM.

env-split :  {n} {η : env n} {Γ₁ Γ₂}  ⊢e η  (Γ₁ +++ Γ₂)  ⊢e η  Γ₁ × ⊢e η  Γ₂
env-split {n = zero}  {nil}    {nil}      {nil}     nil       = nil , nil
env-split {n = suc n} {η ,- x} {Γ₁ ,- x₁} {Γ₂ ,- x₂} (d ,- x₃) = (env-split d .proj₁ ,- clo-split x₃ .proj₁) , (env-split d .proj₂ ,- clo-split x₃ .proj₂)

The splitting of typing derivations by env-split preserves sizes, but to prove this we first need a lemma about addition:

interchange :  a b c d  (a + b) + (c + d)  (a + c) + (b + d)
interchange a b c d =
  begin
    (a + b) + (c + d)
  ≡⟨ +-assoc a b (c + d) 
    a + (b + (c + d))
  ≡⟨ cong    a + ) (sym (+-assoc b c d)) 
    a + ((b + c) + d)
  ≡⟨ cong    a + ( + d)) (+-comm b c) 
    a + ((c + b) + d)
  ≡⟨ cong    a + ) (+-assoc c b d) 
    a + (c + (b + d))
  ≡⟨ sym (+-assoc a c (b + d)) 
    (a + c) + (b + d)
  

With this lemma, we can prove that splitting an environment typing derivation is size preserving:

env-split-size :  {n} {η : env n} {Γ₁ Γ₂}  (d : ⊢e η  (Γ₁ +++ Γ₂))  e-size d  e-size (env-split d .proj₁) + e-size (env-split d .proj₂)
env-split-size {n = zero}  {nil}    {nil}      {nil}     nil      = refl
env-split-size {n = suc n} {η ,- x} {Γ₁ ,- σ₁} {Γ₂ ,- σ₂} (d ,- c) =
  begin
     e-size d + c'-size c
   ≡⟨ cong₂ _+_ (env-split-size d) (clo-split-size {x}{σ₁} c) 
     (e-size (env-split d .proj₁) + e-size (env-split d .proj₂)) + (c'-size (clo-split {x}{σ₁} c .proj₁) + c'-size (clo-split {x}{σ₁} c .proj₂))
   ≡⟨ interchange (e-size (env-split d .proj₁)) (e-size (env-split d .proj₂)) (c'-size (clo-split {x}{σ₁} c .proj₁)) (c'-size (clo-split {x}{σ₁} c .proj₂)) 
     e-size (env-split d .proj₁) + c'-size (clo-split {x}{σ₁} c .proj₁) + (e-size (env-split d .proj₂) + c'-size (clo-split {x}{σ₁} c .proj₂))
   

We also have nullary version of clo-split and env-split: whenever a closure has is typed to have no behaviours, or an environment consists of closures that have no required behaviours, then the sizes of their derivations is zero:

c'-empty :  {c}  (d : ⊢c c ⦂' [])  c'-size d  0
c'-empty nil = refl

e-empty :  {n}{η : env n}  (d : ⊢e η  empty)  e-size d  0
e-empty nil = refl
e-empty (d ,- c) = begin
                      e-size d + c'-size c
                   ≡⟨ cong₂ _+_ (e-empty d) (c'-empty c) 
                      0
                   

The final piece we need before proving subject reduction is the following derived typing rule for constructing typings of closures against intersections from typings of terms against intersections, assuming an appropriately typed environment. This uses the env-split and env-permute lemmas.

mk-clo :  {n}{η : env n}{Γ : ctx n}{s : term n}{σ} 
         ⊢e η  Γ 
         Γ  s ⦂' σ 
         ⊢c n , η , s ⦂' σ
mk-clo {σ = []} η-ok nil  = nil
mk-clo {σ = τ  σ} η-ok (x ,~ p  s-ok) =
  t-clo x (env-split (env-permute η-ok p) .proj₁)  mk-clo (env-split (env-permute η-ok p) .proj₂) s-ok

This derived rule is again size preserving, using the fact that env-split and env-permute are size preserving:

mk-clo-size :  {n}{η : env n}{Γ : ctx n}{s : term n}{σ} 
              (η-ok : ⊢e η  Γ) 
              (s-ok : Γ  s ⦂' σ) 
              size' s-ok + e-size η-ok  c'-size (mk-clo η-ok s-ok)
mk-clo-size {σ = []} η-ok nil = e-empty η-ok
mk-clo-size {σ = τ  σ} η-ok (x ,~ p  s-ok) =
  begin
     size x + size' s-ok + e-size η-ok
   ≡⟨ cong    size x + size' s-ok + ) (env-permute-size η-ok p) 
     size x + size' s-ok + e-size (env-permute η-ok p)
   ≡⟨ cong    size x + size' s-ok + ) (env-split-size (env-permute η-ok p)) 
     size x + size' s-ok + (e-size (env-split (env-permute η-ok p) .proj₁) + e-size (env-split (env-permute η-ok p) .proj₂))
   ≡⟨ interchange (size x) (size' s-ok) (e-size (env-split (env-permute η-ok p) .proj₁)) (e-size (env-split (env-permute η-ok p) .proj₂)) 
     size x + e-size (env-split (env-permute η-ok p) .proj₁) + (size' s-ok + e-size (env-split (env-permute η-ok p) .proj₂))
   ≡⟨ cong    size x + e-size (env-split (env-permute η-ok p) .proj₁) + ) (mk-clo-size (env-split (env-permute η-ok p) .proj₂) s-ok) 
     size x + e-size (env-split (env-permute η-ok p) .proj₁) + c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s-ok)
   

Subject Reduction

Using the lemmas in the previous section, we can now prove the subject reduction property for the non-idempotent intersection type system. The proof consists of taking each possible KAM step in turn, pattern matching on the possible starting configurations' typing derivations, and constructing the corresponding typing derivation for the ending configuration:

subject-reduction :  {p q τ}  p  q  ⊢p p  τ  ⊢p q  τ
subject-reduction grab (cfg (t-clo (var zero)    (_    ,- (c  nil))) π-ok)        = cfg c π-ok
subject-reduction skip (cfg (t-clo (var (suc x)) (η-ok ,- _))         π-ok)        = cfg (t-clo (var x) η-ok) π-ok
subject-reduction push (cfg (t-clo (app t s p)   η-ok)                π-ok)        =
  cfg (t-clo t (env-split (env-permute η-ok p) .proj₁)) (mk-clo (env-split (env-permute η-ok p) .proj₂) s -, π-ok)
subject-reduction pop  (cfg (t-clo (lam t)       η-ok)                (c -, π-ok)) = cfg (t-clo t (η-ok ,- c)) π-ok

In most of the cases, the final typing derivation is constructed from parts of the initial one. In the application case, we need to use env-split to distribute the typing derivation of the environment (η-ok) between the closures for the function and its argument.

We now prove that subject reduction is quantitative: the typing derivation generated by subject reduction is always of size one smaller than the initial one. This is a consequence of the fact that most of the operations we perform on typing derivations in he subject reduction proof are size preserving, except for the fact that we remove the top level term former, which accounts for the decrementing by one.

sr-size :  {p q τ} 
          (step : p  q) 
          (d : ⊢p p  τ) 
          cfg-size d  suc (cfg-size (subject-reduction step d))
sr-size grab (cfg (t-clo (var zero)    (η-emp ,- (c  nil))) π-ok)        =
  begin
     suc (e-size η-emp + (c-size c + 0) + s-size π-ok)
   ≡⟨ cong    suc ( + (c-size c + 0) + s-size π-ok)) (e-empty η-emp) 
     suc (0 + (c-size c + 0) + s-size π-ok)
   ≡⟨⟩
     suc ((c-size c + 0) + s-size π-ok)
   ≡⟨ cong    suc ( + s-size π-ok)) (+-identityʳ (c-size c)) 
     suc (c-size c + s-size π-ok)
   
sr-size skip (cfg (t-clo (var (suc x)) (η-ok  ,- c'))        π-ok)        =
  begin
     suc (v-size x + (e-size η-ok + c'-size c') + s-size π-ok)
   ≡⟨ cong    suc (v-size x + (e-size η-ok + ) + s-size π-ok)) (c'-empty c') 
     suc (v-size x + (e-size η-ok + 0) + s-size π-ok)
   ≡⟨ cong    suc (v-size x +  + s-size π-ok)) (+-identityʳ (e-size η-ok)) 
     suc (v-size x + e-size η-ok + s-size π-ok)
   
sr-size push (cfg (t-clo (app t s p)   η-ok)                 π-ok)        =
  begin
     suc (size t + size' s + e-size η-ok + s-size π-ok)
  ≡⟨ cong    suc (size t + size' s +  + s-size π-ok)) (env-permute-size η-ok p) 
     suc (size t + size' s + e-size (env-permute η-ok p) + s-size π-ok)
  ≡⟨ cong    suc (size t + size' s +  + s-size π-ok)) (env-split-size (env-permute η-ok p)) 
     suc (size t + size' s + (e-size (env-split (env-permute η-ok p) .proj₁) + e-size (env-split (env-permute η-ok p) .proj₂)) + s-size π-ok)
  ≡⟨ cong    suc ( + s-size π-ok)) (interchange (size t) (size' s) (e-size (env-split (env-permute η-ok p) .proj₁)) (e-size (env-split (env-permute η-ok p) .proj₂))) 
     suc (size t + e-size (env-split (env-permute η-ok p) .proj₁) + (size' s + e-size (env-split (env-permute η-ok p) .proj₂)) + s-size π-ok)
  ≡⟨ cong    suc (size t + e-size (env-split (env-permute η-ok p) .proj₁) +  + s-size π-ok)) (mk-clo-size (env-split (env-permute η-ok p) .proj₂) s) 
     suc (size t + e-size (env-split (env-permute η-ok p) .proj₁) + c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s) + s-size π-ok)
  ≡⟨ cong    suc ) (+-assoc (size t + e-size (env-split (env-permute η-ok p) .proj₁)) (c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s)) (s-size π-ok)) 
     suc (size t + e-size (env-split (env-permute η-ok p) .proj₁) + (c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s) + s-size π-ok))
  
sr-size pop  (cfg (t-clo (lam t)       η-ok)                 (c -, π-ok)) =
  begin
     suc (size t + e-size η-ok + (c'-size c + s-size π-ok))
  ≡⟨ cong    suc ) (+-assoc (size t) (e-size η-ok) (c'-size c + s-size π-ok)) 
     suc (size t + (e-size η-ok + (c'-size c + s-size π-ok)))
  ≡⟨ cong    suc (size t + )) (sym (+-assoc (e-size η-ok) (c'-size c) (s-size π-ok))) 
     suc (size t + ((e-size η-ok + c'-size c) + s-size π-ok))
  ≡⟨ cong suc (sym (+-assoc (size t) (e-size η-ok + c'-size c) (s-size π-ok))) 
     suc (size t + (e-size η-ok + c'-size c) + s-size π-ok)
  

When we combine these two lemmas with the progress lemma we prove below, we will be able to prove by induction on the size of the typing derivation that the execution of the KAM terminates. This is the content of the soundness lemma below.

Subject expansion

A feature of intersection type systems is that they often enjoy subject expansion as well as subject reduction. The type information is a precise enough description of a term's behaviour that we are always able to reconstruct typing derivations backwards through execution.

To prove subject expansion we will need the inverse to the closure and environment splitting lemmas we proved for subject reduction:

clo-join :  {c σ₁ σ₂}  ⊢c c ⦂' σ₁  ⊢c c ⦂' σ₂  ⊢c c ⦂' (σ₁ ++ σ₂)
clo-join {σ₁ = []}     nil      c₂ = c₂
clo-join {σ₁ = τ  σ₁} (x  c₁) c₂ = x  clo-join c₁ c₂

env-join :  {n} {η : env n} {Γ₁ Γ₂}  ⊢e η  Γ₁  ⊢e η  Γ₂  ⊢e η  (Γ₁ +++ Γ₂)
env-join {n = zero}  {nil}    {nil}      {nil}     nil       nil        = nil
env-join {n = suc n} {η ,- c} {Γ₁ ,- σ₁} {Γ₂ ,- σ₂} (e₁ ,- c₁) (e₂ ,- c₂) = env-join e₁ e₂ ,- clo-join c₁ c₂

We also need the fact that every environment is well typed if the types we use make no demands:

env-empty :  {n}{η : env n}  ⊢e η  empty
env-empty {η = nil}    = nil
env-empty {η = η ,- x} = env-empty ,- nil

Together, these allow us to prove that the derived rule for building closures typed with intersection types (mkclo) is invertible:

unmk-clo :  {n}{η : env n}{s : term n}{σ} 
         ⊢c n , η , s ⦂' σ 
         Σ[ Γ  ctx n ] (⊢e η  Γ × Γ  s ⦂' σ)
unmk-clo {σ = []} nil = empty , env-empty , nil
unmk-clo {σ = τ  σ} (t-clo {Γ = Γ} s-τ η-ok  d) with unmk-clo d
... | Γ' , η-ok' , s-σ = (Γ +++ Γ') , env-join η-ok η-ok' , (s-τ ,~ ⋈ctx-refl  s-σ)

The proof of subject expansion is now has a similar structure to the proof of subject reduction:

subject-expansion :  {p q τ}  p  q  ⊢p q  τ  ⊢p p  τ
subject-expansion grab (cfg c-ok π-ok) = cfg (t-clo (var zero) (env-empty ,- (c-ok  nil))) π-ok
subject-expansion skip (cfg (t-clo (var x) x₂) x₁) = cfg (t-clo (var (suc x)) (x₂ ,- nil)) x₁
subject-expansion push (cfg (t-clo {Γ = Γ} t-ok η-ok) (c-ok -, π-ok)) with unmk-clo c-ok
... | Γ' , η-ok' , s-ok = cfg (t-clo (app t-ok s-ok ⋈ctx-refl) (env-join η-ok η-ok')) π-ok
subject-expansion pop  (cfg (t-clo t (x₂ ,- x₃)) x₁) = cfg (t-clo (lam t) x₂) (x₃ -, x₁)

Progress

As is usual progress only holds for complete programs. In our case, that means configurations that are typed with the observation , meaning that we expect the configuration to terminate with a value. Our formulation of progress is quantitative, so rather than examining the configuration p to see whether it is a value or not, we rely on measuring the amount of “potential” left in the configuration by measuring its size. If this is non-zero, then the KAM will make progress.

progress :  {p}  (d : ⊢p p  )  cfg-size d  0  Σ[ q  configuration ] p  q
progress (cfg (t-clo (var zero)    (η-emp ,- (c  _))) π-ok)        _ = _ , grab
progress (cfg (t-clo (var (suc x)) (η-ok ,- nil))      π-ok)        _ = _ , skip
progress (cfg (t-clo (lam t)       η-ok)               (c -, π-ok)) _ = _ , pop
progress (cfg (t-clo (app s t p)   η-ok)               π-ok)        _ = _ , push
progress (cfg (t-clo lam⋆          η-ok)               emp)         r rewrite e-empty η-ok with r refl
... | ()

Quantitative Soundness and Completeness

Putting all our results together, we can now prove that non-idempotent intersection types are a sound and complete characterisation of termination in the CBN λ-calculus, and moreover the size of the typing derivation is exactly the number of steps taken to terminate.

We first do soundness: if a closed term t has a typing derivation d that shows that is a possible observation of t, then t will reduce in the KAM to WHNF in size d steps:

soundness :  {t}  (d : nil  t  )  reduces (size d)  0 , nil , t  emp 
soundness {t} d = subst    reduces   0 , nil , t  emp )
                        (init-typed-size d)
                        (run (init-typed d) (cfg-size (init-typed d)) refl)
  where
    helper :  {k n}  n  suc k  n  0
    helper refl ()

    final :  {p}  (d : ⊢p p  )  cfg-size d  0  reduces 0 p
    final (cfg (t-clo (var zero) _)    π-ok) ()
    final (cfg (t-clo (var (suc _)) _) π-ok) ()
    final { n , η , ƛ t  emp } (cfg (t-clo lam⋆ x₁) emp) e = stop η t

    run :  {p}  (d : ⊢p p  )   k  cfg-size d  k  reduces k p
    run d zero    e = final d e
    run d (suc k) e with progress d (helper e)
    ... | q , s = step s (run (subject-reduction s d) k (suc-injective (trans (sym (sr-size s d)) e)))

The bulk of the work in this lemma is performed by the run function which uses induction on the size of the typing derivation to repeatedly call progress to get the next step, subject-reduction to preserve typability, and sr-size to show that the size of the derivation goes down.

The converse of soundness is completeness: every term that reduces to WHNF has a typing derivation. This is proved by induction on the reduction sequence: in the base case all WHNFs are typable by the lam⋆ rule, and in the step case by using subject expansion. We also know by soundness and the determinism of length of reduction sequences that the computed typing derivation has the same size as the length of the reduction.

completeness :  {t n}  reduces n  0 , nil , t  emp   Σ[ d  nil  t   ] (size d  n)
completeness {t} r = d , reduces-det (soundness d) r
  where
   config-typed :  {p k}  reduces k p  ⊢p p  
   config-typed (stop η t) = cfg (t-clo lam⋆ env-empty) emp
   config-typed (step s r) = subject-expansion s (config-typed r)

   extract :  {t}  ⊢p  0 , nil , t  emp     nil  t  
   extract (cfg (t-clo x nil) emp) = x

   d : nil  t  
   d = extract (config-typed r)

Inspecting the proof of subject expansion, we can also see that the computed typing derivation never uses a non-identity permutation, indicating that we might be able to use a variant of non-idempotent intersection types to predict the ordering of usage, as well as quantity of usage.

A possible application of this type system would be to proving that optimisations actually optimise. If we have an optimisation that replaces a term M with a term N, then if for all typings Γ ⊢ M ⦂ τ we can get a typing Γ ⊢ N ⦂ τ that is smaller, then any context which uses M can replace it with N, preserving meaning, and take fewer steps to terminate.