Quantitative Typing with Non-idempotent Intersection Types
This post contains a development of an Agda proof that, for the Call-by-Name (CBN) λ-calculus, the number of steps needed to reduce a term to weak head normal form (WHNF) in the Krivine Abstract Machine (KAM) is equal to the size of the term's typing derivation in a non-idempotent intersection type system.
The converse also holds: if a term reduces to WHNF in the KAM, then it has a typing derivation, the size of which is equal to the number of steps it takes to reduce the term. Therefore, typability characterises termination, quantitatively.
The main result manifests as a “quantitative” subject reduction lemma: not only is typing preserved by each reduction step, but the size of the typing derivation goes down by exactly one in each step.
Non-idempotent intersection types can also be seen as a syntactic presentation of a denotational semantics of the untyped λ-calculus in the category Rel of sets and relations, for a particular reflexive object. The fact that subject reduction (and expansion) have a quantitative aspect yields a proof of adequacy for this semantics that doesn't rely on logical relations (I haven't proved this in this file, though).
The quantitative nature of non-idempotent intersection types was originally observed for head normalisation by Daniel de Carvalho in Execution Time of lambda-Terms via Denotational Semantics and Intersection Types.
module non-idempotent-intersection-types where
Imports
First we need some things from the Agda standard library:
open import Data.Nat using (ℕ; zero; suc; _+_)
open import Data.Nat.Properties using (+-assoc; +-comm; +-identityʳ; suc-injective)
open import Data.Fin using (Fin; zero; suc; _≟_)
open import Data.Product using (Σ-syntax; _×_; _,_; proj₁; proj₂)
open import Data.List using (List; []; _∷_; _++_; [_])
import Relation.Binary.PropositionalEquality as Eq
open Eq using (_≡_; refl; trans; cong; subst; sym; _≢_; cong₂)
open Eq.≡-Reasoning using (begin_; _≡⟨⟩_; step-≡; _∎)
Untyped λ-calculus
The syntax of the programming language we will look at is the same as the usual untyped λ-calculus. I've opted for an intrinsically well-scoped de Bruijn representation:
data term : ℕ → Set where
  `_  : ∀ {n} → Fin n → term n
  ƛ   : ∀ {n} → term (suc n) → term n
  _·_ : ∀ {n} → term n → term n → term n

infixl 20 _·_
infix 40 `_
Krivine Abstract Machine
The standard way to give an operational semantics to the untyped λ-calculus is to give a small step reduction relation where the main rule is β-reduction:
(ƛ t) · t' ---> t [ t' ]
However, our goal here is to measure the complexity of programs via typing, and counting the number of β-reductions is not a very convincing way of doing this. Substitution does a non-constant amount of work, depending on how big t is and how often the 0th variable is used in it.
Instead, we want to use a semantics that performs a constant amount of work at each step. One way to do this is to use an abstract machine to interpret programs. Abstract machines break the complex process of β-reduction down into smaller steps, so they are more convincing ways to measure complexity.
The abstract machine we shall use here is the Krivine Abstract Machine (KAM), which implements reduction of λ terms to weak head normal form.
Rather than using substitution, the KAM uses closures: pairs of a term and an environment that assigns a closure to each variable in the term. As this description suggests, closures and environments are mutually defined:
mutual
  data env : ℕ → Set where
    nil  : env zero
    _,-_ : ∀ {n} → env n → clo → env (suc n)

  clo = Σ[ n ∈ ℕ ] env n × term n
Note that environments are built by extending to the right.
The second ingredient for the KAM is a stack of closures. Stacks are used to remember the context of the part of the program we are currently executing. Compare this to the congruence rule used in the definition of the CBN small-step operational semantics:
M --> M'
---------------------
M · N --> M' · N
In the rule, the context is the N term, and in general we might have to go under several Ns to get to the part we are currently executing. The KAM uses a stack of closures to represent this:
data stk : Set where
  emp  : stk
  _-,_ : clo → stk → stk

infixr 10 _-,_
Stacks are built by extending to the left.
A configuration of the KAM is a pair of a closure, the current focus of the computation, and a stack representing the evaluation context. We use the special notation ⟨_⋆_⟩ for configurations.
record configuration : Set where
  constructor ⟨_⋆_⟩
  field
    closure : clo
    stack   : stk

infix 20 ⟨_⋆_⟩
The reduction semantics of the KAM is given by the following four rules, which are driven by the shape of the term in the first component of the configuration:
data _⇒_ : configuration → configuration → Set where
  grab : ∀ {n η c π} →
         ⟨ suc n , η ,- c , ` zero ⋆ π ⟩ ⇒ ⟨ c ⋆ π ⟩
  skip : ∀ {n η c i π} →
         ⟨ suc n , η ,- c , ` (suc i) ⋆ π ⟩ ⇒ ⟨ n , η , ` i ⋆ π ⟩
  push : ∀ {n η t s π} →
         ⟨ n , η , t · s ⋆ π ⟩ ⇒ ⟨ n , η , t ⋆ (n , η , s) -, π ⟩
  pop  : ∀ {n η t c π} →
         ⟨ n , η , ƛ t ⋆ c -, π ⟩ ⇒ ⟨ suc n , (η ,- c) , t ⋆ π ⟩

infixr 10 _⇒_
The rules grab and skip handle the execution of variables by incrementally looking them up in the current environment. One could also define a KAM that has a single rule for variable lookup (looking arbitrarily deep into the environment in a single step), but we chose to break it down into individual steps here.
The rule push executes an application t · s by pushing the argument s on to the stack (accompanied by its environment). Execution will then carry on in the function position (compare to the congruence rule given above for a small step operational semantics).
Finally, the rule pop executes a λ abstraction ƛ t by popping the current argument from the stack and adding it to the current environment. Combined with the first two rules, this performs a delayed β-reduction.
To start the KAM, we need to take programs that we want to execute and turn them into configurations:
init : term 0 → configuration
init t = ⟨ 0 , nil , t ⋆ emp ⟩
Due to the fact that we have defined our KAM using intrinsically scoped syntax, it is not possible for the KAM to get stuck by not having a variable in scope. The only way the machine can get stuck is by ending up with a λ-abstraction paired with an empty stack. In this case we can say that the original term has been reduced to its weak head normal form (WHNF). It is also possible that a term never reduces to a final configuration at all -- not all terms finish reducing under the Call-by-Name semantics. To identify which terms reduce, we make an inductive definition that also records how many steps it takes for a term to reduce:
data reduces : ℕ → configuration → Set where
  stop : ∀ {n} η t → reduces 0 ⟨ n , η , ƛ t ⋆ emp ⟩
  step : ∀ {n p q} → p ⇒ q → reduces n q → reduces (suc n) p
So if we have a value of type reduces n p, then we know that the configuration p reduces to a WHNF in n steps.
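As a small sanity check (an example of my own, not part of the original development; the names idid and idid-reduces are hypothetical), the identity applied to the identity reaches WHNF in exactly three KAM steps, one per construct visited:

-- Hypothetical example: (λx. x) (λy. y) reduces to WHNF in three steps.
idid : term 0
idid = (ƛ (` zero)) · ƛ (` zero)

idid-reduces : reduces 3 (init idid)
idid-reduces =
  step push                      -- push the argument closure onto the stack
    (step pop                    -- pop it off and bind it in the environment
      (step grab                 -- look up variable 0, jumping to that closure
        (stop nil (` zero))))    -- a λ facing an empty stack: WHNF reached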
Since the selection of rules in the KAM is driven by the shape of the term being executed, execution in the KAM is deterministic:
deterministic : ∀ {p q q'} → p ⇒ q → p ⇒ q' → q ≡ q'
deterministic grab grab = refl
deterministic skip skip = refl
deterministic push push = refl
deterministic pop pop = refl
Consequently, any configuration that reduces does so in a unique number of steps:
reduces-det : ∀ {p m n} → reduces m p → reduces n p → m ≡ n
reduces-det (stop η t) (stop .η .t) = refl
reduces-det (step s1 r1) (step s2 r2) with deterministic s1 s2
... | refl = cong suc (reduces-det r1 r2)
The Non-Idempotent Type Assignment System
Non-idempotent intersection types assign types to terms according to their precise runtime behaviour. Usually, type systems assign types that are over-approximations of the possible behaviours of a program. Intersection types (whether idempotent or not) assign types that track precisely the behaviour a program will have (in a particular context). Indeed, as we will see below, the system we present here precisely tracks the termination behaviour of programs. This precision does make typechecking undecidable.
Types, Observations, Behaviours
The types of our system are defined as:
data type : Set where
  ⋆   : type
  _↦_ : List type → type → type

infixr 30 _↦_
We can interpret the types of the system as kinds of “observations” or “behaviour descriptions” that we can make of a program. Per the definition of type, we have two kinds of observation in our system:

The observation ⋆ means that we can observe that the term will reduce to a value (a λ abstraction), but gives us no further information. To observe that a complete program reduces to WHNF, we will seek typings that type the whole program with the observation ⋆.

The observation σ ↦ τ pairs a list of observations σ with a single observation τ. Making this observation means that the term will reduce to a function that, when applied to an argument that exhibits all the behaviours in σ, exhibits the observation τ.
For example, the observation [⋆] ↦ ⋆ is one that we can assign to the identity function: we can observe that the output is a value, as long as we can observe that the input is a value.
The lists of types σ are the intersection types of our system. They indicate that a particular term must have all of the behaviours listed, so it must be in the intersection of all the possible behaviours: σ = τ₁ ∧ τ₂ ∧ ... ∧ τₙ. Idempotent intersection type systems disregard multiple occurrences of the same behaviour (i.e. τ ∧ τ = τ) in an intersection: they only track whether a behaviour is present or not. Non-idempotent intersection type systems, in contrast, track the number of occurrences of each behaviour. In turn, this tracks the number of times a function uses its arguments. It is this quantitative aspect that will allow us to use non-idempotent intersection types to measure the reduction complexity of programs.
To see how the rejection of idempotency allows tracking of the number of uses, consider the following two terms (written using explicit names):
λf. λx. f x
and
λf. λx. f (f x)
When applied to the identity function λx. x and any value λz. t, these terms both reduce to a value. We can see this by the types that the system we define below will assign to them. The first one gets the following type:
⊢ λf. λx. f x ⦂ [[⋆] ↦ ⋆] ↦ [⋆] ↦ ⋆
In words, this says "if the first argument yields values when given values, and the second argument is a value, then the result is a value". The second term is assigned a similar type, but here we can see the quantitative nature of the system coming into play:
⊢ λf. λx. f (f x) ⦂ [[⋆] ↦ ⋆, [⋆] ↦ ⋆] ↦ [⋆] ↦ ⋆
Notice that the type now states that the first argument will be used twice, because two behaviours appear in the first intersection. This precisely tracks the structure of the term, which does indeed use f twice. If we assumed idempotency, then the two types would be equivalent, and we would be tracking the qualitative behaviour, but not the quantitative behaviour.
Typing contexts
To assign types to open terms, we need typing contexts. Typing contexts assign to each free variable position a list of possible behaviours. Typing contexts grow to the right.
data ctx : ℕ → Set where
  nil  : ctx zero
  _,-_ : ∀ {n} → ctx n → List type → ctx (suc n)
We need the following two combinators for typing contexts. The first, empty, is the typing context where every variable is assumed to have no behaviours. This is used to type terms that do not use any of the variables in scope.
empty : ∀{n} → ctx n
empty {zero}  = nil
empty {suc n} = empty {n} ,- []
The _+++_ combinator “zips” two typing contexts to combine their possible behaviours. We will use this to create contexts that combine the observations on contexts required by the two subterms of an application term.
_+++_ : ∀{n} → ctx n → ctx n → ctx n
nil +++ nil = nil
(Γ₁ ,- σ₁) +++ (Γ₂ ,- σ₂) = (Γ₁ +++ Γ₂) ,- (σ₁ ++ σ₂)
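For instance (a check of my own, not in the original text), zipping two single-variable contexts concatenates the behaviour lists, recording one required use from each side:

-- Hypothetical example: combining two one-variable contexts
-- concatenates their lists of required behaviours.
_ : (nil ,- [ ⋆ ]) +++ (nil ,- [ ⋆ ]) ≡ nil ,- (⋆ ∷ ⋆ ∷ [])
_ = refl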
Non-idempotent intersection type systems are sensitive to the number of occurrences of each behaviour in an intersection, but not to their order. We will model this by requiring equivalence only up to permutation at various places. We define a permutation between two lists via sequences of swappings:
module permutations {X : Set} where

  data _⋈_ : List X -> List X -> Set where
    nil     : [] ⋈ []
    skip    : ∀ {x l₁ l₂} -> l₁ ⋈ l₂ -> (x ∷ l₁) ⋈ (x ∷ l₂)
    swap    : ∀ {x y l} -> (x ∷ y ∷ l) ⋈ (y ∷ x ∷ l)
    ⋈-trans : ∀ {l₁ l₂ l₃} -> l₁ ⋈ l₂ -> l₂ ⋈ l₃ -> l₁ ⋈ l₃

  ⋈-refl : ∀ {l} -> l ⋈ l
  ⋈-refl {[]} = nil
  ⋈-refl {x ∷ l} = skip ⋈-refl

open permutations
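As a quick illustration (an example of my own, not from the original), the two orderings of a two-element list are related by a single swap:

-- Hypothetical example: one swap relates the two orderings of a pair.
_ : ∀ {X : Set} {x y : X} → (x ∷ y ∷ []) ⋈ (y ∷ x ∷ [])
_ = swap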
We use permutations to define when two typing contexts are pointwise permutations of each other. I.e., when two contexts have the same number of variables, but the observations at each variable are permuted.
data _⋈ctx_ : ∀ {n} → ctx n → ctx n → Set where
  nil  : nil ⋈ctx nil
  _,-_ : ∀ {n}{Γ₁ Γ₂ : ctx n}{σ₁ σ₂} →
         Γ₁ ⋈ctx Γ₂ → σ₁ ⋈ σ₂ → (Γ₁ ,- σ₁) ⋈ctx (Γ₂ ,- σ₂)

⋈ctx-refl : ∀ {n}{Γ : ctx n} → Γ ⋈ctx Γ
⋈ctx-refl {Γ = nil} = nil
⋈ctx-refl {Γ = Γ ,- σ} = ⋈ctx-refl ,- ⋈-refl
Type Assignment
We can now define the rules of our system. First, we define what it means for a variable (represented as an Agda value of type Fin n) to be well typed in a context:
data _⊢v_⦂_ : ∀ {n} → ctx n → Fin n → type → Set where
  zero : ∀ {n τ} → (empty {n} ,- [ τ ]) ⊢v zero ⦂ τ
  suc  : ∀ {n τ i}{Γ : ctx n} → Γ ⊢v i ⦂ τ → (Γ ,- []) ⊢v (suc i) ⦂ τ
The key point here is that the context for a variable must require no behaviours for any variable that is not the one being selected, and must require exactly one behaviour for the one being selected. Thus one use of a variable corresponds to one observation.
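To illustrate (an example of my own), the derivation below types variable 1 in a two-variable context: the selected variable carries exactly one observation and the other carries none:

-- Hypothetical example: the outer of two variables, used once at type ⋆.
_ : ((nil ,- [ ⋆ ]) ,- []) ⊢v suc zero ⦂ ⋆
_ = suc zero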
The typing rules for terms are mutually defined with a rule for typing a single term against an intersection type. I will explain the rules after the definition.
mutual
  data _⊢_⦂_ : ∀ {n} → ctx n → term n → type → Set where
    var  : ∀ {n τ i}{Γ : ctx n} →
           Γ ⊢v i ⦂ τ →
           Γ ⊢ ` i ⦂ τ
    lam  : ∀ {n σ τ t}{Γ : ctx n} →
           (Γ ,- σ) ⊢ t ⦂ τ →
           Γ ⊢ ƛ t ⦂ (σ ↦ τ)
    app  : ∀ {n σ τ s t}{Γ Γ₁ Γ₂ : ctx n} →
           Γ₁ ⊢ s ⦂ (σ ↦ τ) →
           Γ₂ ⊢ t ⦂' σ →
           (Γ₁ +++ Γ₂) ⋈ctx Γ →
           Γ ⊢ (s · t) ⦂ τ
    lam⋆ : ∀ {n t} → empty {n} ⊢ ƛ t ⦂ ⋆

  data _⊢_⦂'_ : ∀ {n} → ctx n → term n → List type → Set where
    nil    : ∀ {n}{t : term n} → empty ⊢ t ⦂' []
    _,~_∷_ : ∀ {n t σ τ}{Γ₁ Γ₂ Γ : ctx n} →
             Γ₁ ⊢ t ⦂ τ →
             (Γ₁ +++ Γ₂) ⋈ctx Γ →
             Γ₂ ⊢ t ⦂' σ →
             Γ ⊢ t ⦂' (τ ∷ σ)
The rule var uses the variable typing judgement to assign types to terms that are variables. The rule lam states that a λ abstraction can be assigned the type σ ↦ τ if the body can be assigned the type τ under the assumption that the 0th variable has the intersection type σ. The app rule states that if the term s can be observed to act like a function mapping intersections of behaviours σ to behaviours τ, and the term t exhibits all the behaviours in σ (using the auxiliary judgement _⊢_⦂'_), then the whole application term exhibits the behaviour τ. Note that the two terms are typed in separate contexts that are combined -- so the uses of variables in s and t are tracked separately. The rule lam⋆ is used to assign the observation that λ abstractions are values (⋆ means that the term is a value) to any λ abstraction.
The auxiliary judgement Γ ⊢ t ⦂' σ is used to ensure that a single term t can exhibit all the possible behaviours in σ. Note that each individual behaviour in σ must be observable under a separate typing context. All the typing contexts are summed up (up to permutation) to give the complete requirements on the context.
This completes the definition of the type assignment system. We demonstrate it using the two terms from above. These are defined as follows:
-- λf. λx. f x
example₁ : term 0
example₁ = ƛ (ƛ (` suc zero · ` zero))

-- λf. λx. f (f x)
example₂ : term 0
example₂ = ƛ (ƛ (` suc zero · (` suc zero · ` zero)))
To give typing derivations for these, we will save a bit of typing by using the following derived rule for application when we apply a function that will only use its argument once:
app₁ : ∀ {n Γ₁ Γ₂ τ₁ τ₂}{s t : term n} →
       Γ₁ ⊢ s ⦂ [ τ₁ ] ↦ τ₂ →
       Γ₂ ⊢ t ⦂ τ₁ →
       (Γ₁ +++ (Γ₂ +++ empty)) ⊢ s · t ⦂ τ₂
app₁ s-ok t-ok = app s-ok (t-ok ,~ ⋈ctx-refl ∷ nil) ⋈ctx-refl
With this derived rule to help us, the first example can be assigned the type we gave above like so:
typing₁ : nil ⊢ example₁ ⦂ [ [ ⋆ ] ↦ ⋆ ] ↦ [ ⋆ ] ↦ ⋆
typing₁ = lam (lam (app₁ (var (suc zero)) (var zero)))
And the second one like so:
typing₂ : nil ⊢ example₂ ⦂ ([ ⋆ ] ↦ ⋆ ∷ [ ⋆ ] ↦ ⋆ ∷ []) ↦ [ ⋆ ] ↦ ⋆
typing₂ = lam (lam (app₁ (var (suc zero)) (app₁ (var (suc zero)) (var zero))))
In the second one we see how the type reveals that the term uses its first argument twice.
Measuring the sizes of derivations
Our main result below is that typing is preserved by execution quantitatively. Therefore, we need to be able to compute the sizes of typing derivations. This is done by the following three functions:
v-size : ∀ {n}{Γ : ctx n}{t τ} → Γ ⊢v t ⦂ τ → ℕ
v-size zero    = 1
v-size (suc d) = 1 + v-size d

mutual
  size : ∀ {n}{Γ : ctx n}{t τ} → Γ ⊢ t ⦂ τ → ℕ
  size (var x)       = v-size x
  size (lam d)       = 1 + size d
  size (app d₁ d₂ _) = 1 + size d₁ + size' d₂
  size lam⋆          = 0

  size' : ∀ {n}{Γ : ctx n}{t σ} → Γ ⊢ t ⦂' σ → ℕ
  size' nil           = 0
  size' (d ,~ _ ∷ ds) = size d + size' ds
We have been a little selective in what we have counted. We have added 1 for every construct that can cause the KAM to make a step: whether a variable lookup (zero or suc), an application, or a λ abstraction that we expect to have some interesting behaviour. In particular, a use of lam⋆ is counted as 0 because it represents values that will not be inspected any further, and so will not cause the KAM to do any work.
The size' function adds up all the derivations used in checking an intersection type. This means that we accurately account for all the possible ways that a single term will be used.
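As a quick check of my own (not in the original text), the sizes of the two example derivations compute to 6 and 9; the second is larger precisely because it contains a separate derivation for each of the two uses of f:

-- Hypothetical checks: sizes of the example derivations given above.
_ : size typing₁ ≡ 6
_ = refl

_ : size typing₂ ≡ 9
_ = refl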
Typing KAM configurations.
We now extend the non-idempotent intersection typing discipline from terms to configurations of the KAM. This is required so that we can state our main lemmas.
Closures are typed by ensuring that the term contained in them is well typed in some context, and that the associated environment contains closures that are well typed according to that context. Because contexts contain intersection types, we need an auxiliary judgement that checks that a closure is well typed for all types in an intersection. These three judgements (closures, environments, closures with intersection types) need to be mutually defined:
mutual
  data ⊢c_⦂_ : (c : clo) (τ : type) → Set where
    t-clo : ∀ {n t τ}{η : env n}{Γ : ctx n} →
            Γ ⊢ t ⦂ τ →
            ⊢e η ⦂ Γ →
            ⊢c (n , η , t) ⦂ τ

  data ⊢e_⦂_ : ∀ {n} → env n → ctx n → Set where
    nil  : ⊢e nil ⦂ nil
    _,-_ : ∀{n}{η : env n}{Γ : ctx n}{c}{σ} →
           ⊢e η ⦂ Γ →
           ⊢c c ⦂' σ →
           ⊢e (η ,- c) ⦂ (Γ ,- σ)

  data ⊢c_⦂'_ : (c : clo) (σ : List type) → Set where
    nil : ∀ {c} → ⊢c c ⦂' []
    _∷_ : ∀ {c τ σ} → ⊢c c ⦂ τ → ⊢c c ⦂' σ → ⊢c c ⦂' (τ ∷ σ)
We also extend the measurement of sizes of derivations to well typed closures and environments. Notice that these derivations do not contribute themselves to the size: we only sum up the sizes of the contained term typing derivations.
mutual
  c-size : ∀ {c τ} → ⊢c c ⦂ τ → ℕ
  c-size (t-clo t e) = size t + e-size e

  e-size : ∀ {n}{η : env n}{Γ} → ⊢e η ⦂ Γ → ℕ
  e-size nil      = 0
  e-size (d ,- c) = e-size d + c'-size c

  c'-size : ∀ {c σ} → ⊢c c ⦂' σ → ℕ
  c'-size nil      = 0
  c'-size (c ∷ c') = c-size c + c'-size c'
Stacks are typed with two types τ₁ ⊸ τ₂, meaning that if the observation τ₁ can be made of the computation happening under the stack, then the observation τ₂ can be observed at the "top level":
data ⊢s_⦂_⊸_ : (π : stk) (τ₁ τ₂ : type) → Set where
  emp  : ∀ {τ} → ⊢s emp ⦂ τ ⊸ τ
  _-,_ : ∀ {c π σ τ₁ τ₂} →
         ⊢c c ⦂' σ →
         ⊢s π ⦂ τ₁ ⊸ τ₂ →
         ⊢s (c -, π) ⦂ (σ ↦ τ₁) ⊸ τ₂
Again, we measure the size of a stack typing derivation by adding up the sizes of the typing derivations contained within:
s-size : ∀ {π τ₁ τ₂} → ⊢s π ⦂ τ₁ ⊸ τ₂ → ℕ
s-size emp      = 0
s-size (c -, π) = c'-size c + s-size π
Well typed KAM configurations are constructed by pairing closures that can generate a behaviour τ₁ with stacks that can turn that behaviour into a behaviour τ at the top level. The size of a configuration is measured by summing up the sizes of the typing derivations included in the closure and the stack.
data ⊢p_⦂_ : configuration → type → Set where
  cfg : ∀ {c π τ₁ τ} →
        ⊢c c ⦂ τ₁ →
        ⊢s π ⦂ τ₁ ⊸ τ →
        ⊢p ⟨ c ⋆ π ⟩ ⦂ τ

cfg-size : ∀ {p τ} → ⊢p p ⦂ τ → ℕ
cfg-size (cfg c π) = c-size c + s-size π
We can now prove that the initial configuration of the KAM for a given program is well typed if the program is:
init-typed : ∀{t τ} → nil ⊢ t ⦂ τ → ⊢p init t ⦂ τ
init-typed d = cfg (t-clo d nil) emp
The size of this initial typing derivation is also equal to the size of the original typing derivation:
init-typed-size : ∀{t τ} → (d : nil ⊢ t ⦂ τ) → cfg-size (init-typed d) ≡ size d
init-typed-size{τ = τ} d =
  begin
    size d + e-size nil + s-size{τ₁ = τ} emp
  ≡⟨⟩
    size d + 0 + 0
  ≡⟨ +-identityʳ (size d + 0) ⟩
    size d + 0
  ≡⟨ +-identityʳ (size d) ⟩
    size d
  ∎
Quantitative Subject Reduction
We now connect the KAM semantics to the non-idempotent intersection type system. This will take the form of three key lemmas:
Subject Reduction (a.k.a. preservation), which says that if a configuration of the KAM is well typed and steps to another configuration, then that configuration is also well typed. The twist here is that this lemma is also quantitative: the size of the typing derivation after one step is exactly one smaller than it was before. This will allow us to prove that all typable terms reduce.
Subject Expansion, the converse of subject reduction. If a configuration is well typed and was the result of a step of the KAM, then the predecessor was also well typed. Subject expansion does not usually hold for type systems, but is characteristic of intersection type systems due to the way they describe what a program must do, instead of what it may do. Subject expansion allows us to prove a completeness result: if a program reduces to WHNF, then we can step it backwards to recover a typing derivation for the initial state.
Progress: if a configuration is well typed, and its derivation has non-zero size, then the KAM can take a step. Thus we can interpret the size of a derivation as the amount of computation that a program has left to perform.
Some lemmas
Before we can prove subject reduction, we will need some auxiliary lemmas about the type system. These mostly say that typing is preserved under permutation, splitting, or joining of intersection types. We also prove that these preservations are size preserving, anticipating the quantitative subject reduction lemma in the next section.
First, we need that if a closure c can be well typed for (i.e., exhibits the behaviour of) all the types in σ₁, then it is well typed for any permutation of σ₁. Also, this does not affect the size of the typing derivation. We prove this by induction on the permutation applied:
clo-permute : ∀ {c σ₁ σ₂} → ⊢c c ⦂' σ₁ → σ₂ ⋈ σ₁ → ⊢c c ⦂' σ₂
clo-permute d nil = d
clo-permute (x ∷ d) (skip p) = x ∷ clo-permute d p
clo-permute (x ∷ x₁ ∷ d) swap = x₁ ∷ x ∷ d
clo-permute d (⋈-trans p p₁) = clo-permute (clo-permute d p₁) p

clo-permute-size : ∀ {c σ₁ σ₂} → (d : ⊢c c ⦂' σ₁) → (p : σ₂ ⋈ σ₁) →
                   c'-size d ≡ c'-size (clo-permute d p)
clo-permute-size d nil = refl
clo-permute-size (c ∷ d) (skip p) = cong (λ □ → c-size c + □) (clo-permute-size d p)
clo-permute-size (x ∷ (x₁ ∷ d)) swap =
  trans (sym (+-assoc (c-size x) (c-size x₁) (c'-size d)))
        (trans (cong (λ □ → □ + c'-size d) (+-comm (c-size x) (c-size x₁)))
               (+-assoc (c-size x₁) (c-size x) (c'-size d)))
clo-permute-size d (⋈-trans p p₁) =
  trans (clo-permute-size d p₁) (clo-permute-size (clo-permute d p₁) p)
Preservation of closure typing under permutation extends to preservation of environment typing under pointwise permutation of typing contexts, and this is also size preserving:
env-permute : ∀ {n}{η : env n}{Γ₁ Γ₂ : ctx n} → ⊢e η ⦂ Γ₁ → Γ₂ ⋈ctx Γ₁ → ⊢e η ⦂ Γ₂
env-permute nil nil = nil
env-permute (d ,- x) (p ,- x₁) = env-permute d p ,- clo-permute x x₁

env-permute-size : ∀ {n}{η : env n}{Γ₁ Γ₂ : ctx n} →
                   (d : ⊢e η ⦂ Γ₁) → (p : Γ₂ ⋈ctx Γ₁) →
                   e-size d ≡ e-size (env-permute d p)
env-permute-size nil nil = refl
env-permute-size (d ,- x) (p ,- x₁) = cong₂ _+_ (env-permute-size d p) (clo-permute-size x x₁)
If a closure c exhibits all the behaviours in the big intersection σ₁ ++ σ₂, then it exhibits them separately. Moreover, the sizes of the two separate typing derivations sum to the original:
clo-split : ∀ {c σ₁ σ₂} → ⊢c c ⦂' (σ₁ ++ σ₂) → ⊢c c ⦂' σ₁ × ⊢c c ⦂' σ₂
clo-split {σ₁ = []} d = nil , d
clo-split {σ₁ = x ∷ σ₁} (x₁ ∷ d) = let d₁ , d₂ = clo-split d in x₁ ∷ d₁ , d₂

clo-split-size : ∀ {c σ₁ σ₂} → (d : ⊢c c ⦂' (σ₁ ++ σ₂)) →
                 c'-size d ≡ c'-size (clo-split {σ₁ = σ₁} d .proj₁)
                           + c'-size (clo-split {σ₁ = σ₁} d .proj₂)
clo-split-size {σ₁ = []} d = refl
clo-split-size {σ₁ = x ∷ σ₁} (x₁ ∷ d) =
  trans (cong (λ □ → c-size x₁ + □) (clo-split-size {σ₁ = σ₁} d))
        (sym (+-assoc (c-size x₁) _ _))
As a consequence of clo-split, we can prove that if an environment exhibits the behaviours in two contexts combined, then it exhibits them separately. We will use this to distribute well-typedness of environments between the two parts of an application term in the push rule of the KAM.
env-split : ∀ {n} {η : env n} {Γ₁ Γ₂} → ⊢e η ⦂ (Γ₁ +++ Γ₂) → ⊢e η ⦂ Γ₁ × ⊢e η ⦂ Γ₂
env-split {n = zero} {nil} {nil} {nil} nil = nil , nil
env-split {n = suc n} {η ,- x} {Γ₁ ,- x₁} {Γ₂ ,- x₂} (d ,- x₃) =
  (env-split d .proj₁ ,- clo-split x₃ .proj₁) ,
  (env-split d .proj₂ ,- clo-split x₃ .proj₂)
The splitting of typing derivations by env-split preserves sizes, but to prove this we first need a lemma about addition:
interchange : ∀ a b c d → (a + b) + (c + d) ≡ (a + c) + (b + d)
interchange a b c d =
  begin
    (a + b) + (c + d)
  ≡⟨ +-assoc a b (c + d) ⟩
    a + (b + (c + d))
  ≡⟨ cong (λ □ → a + □) (sym (+-assoc b c d)) ⟩
    a + ((b + c) + d)
  ≡⟨ cong (λ □ → a + (□ + d)) (+-comm b c) ⟩
    a + ((c + b) + d)
  ≡⟨ cong (λ □ → a + □) (+-assoc c b d) ⟩
    a + (c + (b + d))
  ≡⟨ sym (+-assoc a c (b + d)) ⟩
    (a + c) + (b + d)
  ∎
With this lemma, we can prove that splitting an environment typing derivation is size preserving:
env-split-size : ∀ {n} {η : env n} {Γ₁ Γ₂} → (d : ⊢e η ⦂ (Γ₁ +++ Γ₂)) →
                 e-size d ≡ e-size (env-split d .proj₁) + e-size (env-split d .proj₂)
env-split-size {n = zero} {nil} {nil} {nil} nil = refl
env-split-size {n = suc n} {η ,- x} {Γ₁ ,- σ₁} {Γ₂ ,- σ₂} (d ,- c) =
  begin
    e-size d + c'-size c
  ≡⟨ cong₂ _+_ (env-split-size d) (clo-split-size {x}{σ₁} c) ⟩
    (e-size (env-split d .proj₁) + e-size (env-split d .proj₂))
      + (c'-size (clo-split {x}{σ₁} c .proj₁) + c'-size (clo-split {x}{σ₁} c .proj₂))
  ≡⟨ interchange (e-size (env-split d .proj₁)) (e-size (env-split d .proj₂))
                 (c'-size (clo-split {x}{σ₁} c .proj₁)) (c'-size (clo-split {x}{σ₁} c .proj₂)) ⟩
    e-size (env-split d .proj₁) + c'-size (clo-split {x}{σ₁} c .proj₁)
      + (e-size (env-split d .proj₂) + c'-size (clo-split {x}{σ₁} c .proj₂))
  ∎
We also have nullary versions of clo-split and env-split: whenever a closure is typed to have no behaviours, or an environment consists of closures that have no required behaviours, the sizes of their derivations are zero:
c'-empty : ∀ {c} → (d : ⊢c c ⦂' []) → c'-size d ≡ 0
c'-empty nil = refl

e-empty : ∀ {n}{η : env n} → (d : ⊢e η ⦂ empty) → e-size d ≡ 0
e-empty nil = refl
e-empty (d ,- c) =
  begin
    e-size d + c'-size c
  ≡⟨ cong₂ _+_ (e-empty d) (c'-empty c) ⟩
    0
  ∎
The final piece we need before proving subject reduction is the following derived typing rule for constructing typings of closures against intersections from typings of terms against intersections, assuming an appropriately typed environment. This uses the env-split and env-permute lemmas.
mk-clo : ∀ {n}{η : env n}{Γ : ctx n}{s : term n}{σ} →
         ⊢e η ⦂ Γ → Γ ⊢ s ⦂' σ → ⊢c n , η , s ⦂' σ
mk-clo {σ = []} η-ok nil = nil
mk-clo {σ = τ ∷ σ} η-ok (x ,~ p ∷ s-ok) =
  t-clo x (env-split (env-permute η-ok p) .proj₁)
    ∷ mk-clo (env-split (env-permute η-ok p) .proj₂) s-ok
This derived rule is again size preserving, using the fact that env-split and env-permute are size preserving:
mk-clo-size : ∀ {n}{η : env n}{Γ : ctx n}{s : term n}{σ} →
              (η-ok : ⊢e η ⦂ Γ) → (s-ok : Γ ⊢ s ⦂' σ) →
              size' s-ok + e-size η-ok ≡ c'-size (mk-clo η-ok s-ok)
mk-clo-size {σ = []} η-ok nil = e-empty η-ok
mk-clo-size {σ = τ ∷ σ} η-ok (x ,~ p ∷ s-ok) =
  begin
    size x + size' s-ok + e-size η-ok
  ≡⟨ cong (λ □ → size x + size' s-ok + □) (env-permute-size η-ok p) ⟩
    size x + size' s-ok + e-size (env-permute η-ok p)
  ≡⟨ cong (λ □ → size x + size' s-ok + □) (env-split-size (env-permute η-ok p)) ⟩
    size x + size' s-ok + (e-size (env-split (env-permute η-ok p) .proj₁)
                           + e-size (env-split (env-permute η-ok p) .proj₂))
  ≡⟨ interchange (size x) (size' s-ok)
                 (e-size (env-split (env-permute η-ok p) .proj₁))
                 (e-size (env-split (env-permute η-ok p) .proj₂)) ⟩
    size x + e-size (env-split (env-permute η-ok p) .proj₁)
      + (size' s-ok + e-size (env-split (env-permute η-ok p) .proj₂))
  ≡⟨ cong (λ □ → size x + e-size (env-split (env-permute η-ok p) .proj₁) + □)
          (mk-clo-size (env-split (env-permute η-ok p) .proj₂) s-ok) ⟩
    size x + e-size (env-split (env-permute η-ok p) .proj₁)
      + c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s-ok)
  ∎
Subject Reduction
Using the lemmas in the previous section, we can now prove the subject reduction property for the non-idempotent intersection type system. The proof consists of taking each possible KAM step in turn, pattern matching on the possible starting configurations' typing derivations, and constructing the corresponding typing derivation for the ending configuration:
subject-reduction : ∀ {p q τ} → p ⇒ q → ⊢p p ⦂ τ → ⊢p q ⦂ τ
subject-reduction grab (cfg (t-clo (var zero) (_ ,- (c ∷ nil))) π-ok) =
  cfg c π-ok
subject-reduction skip (cfg (t-clo (var (suc x)) (η-ok ,- _)) π-ok) =
  cfg (t-clo (var x) η-ok) π-ok
subject-reduction push (cfg (t-clo (app t s p) η-ok) π-ok) =
  cfg (t-clo t (env-split (env-permute η-ok p) .proj₁))
      (mk-clo (env-split (env-permute η-ok p) .proj₂) s -, π-ok)
subject-reduction pop (cfg (t-clo (lam t) η-ok) (c -, π-ok)) =
  cfg (t-clo t (η-ok ,- c)) π-ok
In most of the cases, the final typing derivation is constructed from parts of the initial one. In the application case, we need to use env-split to distribute the typing derivation of the environment (η-ok) between the closures for the function and its argument.
We now prove that subject reduction is quantitative: the typing derivation generated by subject reduction is always of size one smaller than the initial one. This is a consequence of the fact that most of the operations we perform on typing derivations in the subject reduction proof are size preserving, except that we remove the top-level term former, which accounts for the decrement by one.
sr-size : ∀ {p q τ} → (step : p ⇒ q) → (d : ⊢p p ⦂ τ) →
          cfg-size d ≡ suc (cfg-size (subject-reduction step d))
sr-size grab (cfg (t-clo (var zero) (η-emp ,- (c ∷ nil))) π-ok) =
  begin
    suc (e-size η-emp + (c-size c + 0) + s-size π-ok)
  ≡⟨ cong (λ □ → suc (□ + (c-size c + 0) + s-size π-ok)) (e-empty η-emp) ⟩
    suc (0 + (c-size c + 0) + s-size π-ok)
  ≡⟨⟩
    suc ((c-size c + 0) + s-size π-ok)
  ≡⟨ cong (λ □ → suc (□ + s-size π-ok)) (+-identityʳ (c-size c)) ⟩
    suc (c-size c + s-size π-ok)
  ∎
sr-size skip (cfg (t-clo (var (suc x)) (η-ok ,- c')) π-ok) =
  begin
    suc (v-size x + (e-size η-ok + c'-size c') + s-size π-ok)
  ≡⟨ cong (λ □ → suc (v-size x + (e-size η-ok + □) + s-size π-ok)) (c'-empty c') ⟩
    suc (v-size x + (e-size η-ok + 0) + s-size π-ok)
  ≡⟨ cong (λ □ → suc (v-size x + □ + s-size π-ok)) (+-identityʳ (e-size η-ok)) ⟩
    suc (v-size x + e-size η-ok + s-size π-ok)
  ∎
sr-size push (cfg (t-clo (app t s p) η-ok) π-ok) =
  begin
    suc (size t + size' s + e-size η-ok + s-size π-ok)
  ≡⟨ cong (λ □ → suc (size t + size' s + □ + s-size π-ok)) (env-permute-size η-ok p) ⟩
    suc (size t + size' s + e-size (env-permute η-ok p) + s-size π-ok)
  ≡⟨ cong (λ □ → suc (size t + size' s + □ + s-size π-ok)) (env-split-size (env-permute η-ok p)) ⟩
    suc (size t + size' s
           + (e-size (env-split (env-permute η-ok p) .proj₁)
              + e-size (env-split (env-permute η-ok p) .proj₂))
           + s-size π-ok)
  ≡⟨ cong (λ □ → suc (□ + s-size π-ok))
          (interchange (size t) (size' s)
                       (e-size (env-split (env-permute η-ok p) .proj₁))
                       (e-size (env-split (env-permute η-ok p) .proj₂))) ⟩
    suc (size t + e-size (env-split (env-permute η-ok p) .proj₁)
           + (size' s + e-size (env-split (env-permute η-ok p) .proj₂))
           + s-size π-ok)
  ≡⟨ cong (λ □ → suc (size t + e-size (env-split (env-permute η-ok p) .proj₁) + □ + s-size π-ok))
          (mk-clo-size (env-split (env-permute η-ok p) .proj₂) s) ⟩
    suc (size t + e-size (env-split (env-permute η-ok p) .proj₁)
           + c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s)
           + s-size π-ok)
  ≡⟨ cong (λ □ → suc □)
          (+-assoc (size t + e-size (env-split (env-permute η-ok p) .proj₁))
                   (c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s))
                   (s-size π-ok)) ⟩
    suc (size t + e-size (env-split (env-permute η-ok p) .proj₁)
           + (c'-size (mk-clo (env-split (env-permute η-ok p) .proj₂) s) + s-size π-ok))
  ∎
sr-size pop (cfg (t-clo (lam t) η-ok) (c -, π-ok)) =
  begin
    suc (size t + e-size η-ok + (c'-size c + s-size π-ok))
  ≡⟨ cong (λ □ → suc □) (+-assoc (size t) (e-size η-ok) (c'-size c + s-size π-ok)) ⟩
    suc (size t + (e-size η-ok + (c'-size c + s-size π-ok)))
  ≡⟨ cong (λ □ → suc (size t + □)) (sym (+-assoc (e-size η-ok) (c'-size c) (s-size π-ok))) ⟩
    suc (size t + ((e-size η-ok + c'-size c) + s-size π-ok))
  ≡⟨ cong suc (sym (+-assoc (size t) (e-size η-ok + c'-size c) (s-size π-ok))) ⟩
    suc (size t + (e-size η-ok + c'-size c) + s-size π-ok)
  ∎
When we combine these two lemmas with the progress lemma we prove below, we will be able to prove by induction on the size of the typing derivation that the execution of the KAM terminates. This is the content of the soundness lemma below.
Subject expansion
A feature of intersection type systems is that they often enjoy subject expansion as well as subject reduction. The type information is a precise enough description of a term's behaviour that we are always able to reconstruct typing derivations backwards through execution.
To prove subject expansion we will need the inverse to the closure and environment splitting lemmas we proved for subject reduction:
clo-join : ∀ {c σ₁ σ₂} → ⊢c c ⦂' σ₁ → ⊢c c ⦂' σ₂ → ⊢c c ⦂' (σ₁ ++ σ₂)
clo-join {σ₁ = []} nil c₂ = c₂
clo-join {σ₁ = τ ∷ σ₁} (x ∷ c₁) c₂ = x ∷ clo-join c₁ c₂

env-join : ∀ {n} {η : env n} {Γ₁ Γ₂} → ⊢e η ⦂ Γ₁ → ⊢e η ⦂ Γ₂ → ⊢e η ⦂ (Γ₁ +++ Γ₂)
env-join {n = zero} {nil} {nil} {nil} nil nil = nil
env-join {n = suc n} {η ,- c} {Γ₁ ,- σ₁} {Γ₂ ,- σ₂} (e₁ ,- c₁) (e₂ ,- c₂) =
  env-join e₁ e₂ ,- clo-join c₁ c₂
We also need the fact that every environment is well typed if the types we use make no demands:
env-empty : ∀ {n}{η : env n} → ⊢e η ⦂ empty
env-empty {η = nil} = nil
env-empty {η = η ,- x} = env-empty ,- nil
Together, these allow us to prove that the derived rule for building closures typed with intersection types (mk-clo) is invertible:
unmk-clo : ∀ {n}{η : env n}{s : term n}{σ} →
           ⊢c n , η , s ⦂' σ →
           Σ[ Γ ∈ ctx n ] (⊢e η ⦂ Γ × Γ ⊢ s ⦂' σ)
unmk-clo {σ = []} nil = empty , env-empty , nil
unmk-clo {σ = τ ∷ σ} (t-clo {Γ = Γ} s-τ η-ok ∷ d) with unmk-clo d
... | Γ' , η-ok' , s-σ = (Γ +++ Γ') , env-join η-ok η-ok' , (s-τ ,~ ⋈ctx-refl ∷ s-σ)
The proof of subject expansion now has a similar structure to the proof of subject reduction:
subject-expansion : ∀ {p q τ} → p ⇒ q → ⊢p q ⦂ τ → ⊢p p ⦂ τ
subject-expansion grab (cfg c-ok π-ok) =
  cfg (t-clo (var zero) (env-empty ,- (c-ok ∷ nil))) π-ok
subject-expansion skip (cfg (t-clo (var x) x₂) x₁) =
  cfg (t-clo (var (suc x)) (x₂ ,- nil)) x₁
subject-expansion push (cfg (t-clo {Γ = Γ} t-ok η-ok) (c-ok -, π-ok)) with unmk-clo c-ok
... | Γ' , η-ok' , s-ok =
  cfg (t-clo (app t-ok s-ok ⋈ctx-refl) (env-join η-ok η-ok')) π-ok
subject-expansion pop (cfg (t-clo t (x₂ ,- x₃)) x₁) =
  cfg (t-clo (lam t) x₂) (x₃ -, x₁)
Progress
As is usual, progress only holds for complete programs. In our case, that means configurations that are typed with the observation ⋆, meaning that we expect the configuration to terminate with a value. Our formulation of progress is quantitative, so rather than examining the configuration p to see whether it is a value or not, we rely on measuring the amount of “potential” left in the configuration by measuring its size. If this is non-zero, then the KAM will make progress.
progress : ∀ {p} → (d : ⊢p p ⦂ ⋆) → cfg-size d ≢ 0 → Σ[ q ∈ configuration ] p ⇒ q
progress (cfg (t-clo (var zero) (η-emp ,- (c ∷ _))) π-ok) _ = _ , grab
progress (cfg (t-clo (var (suc x)) (η-ok ,- nil)) π-ok) _ = _ , skip
progress (cfg (t-clo (lam t) η-ok) (c -, π-ok)) _ = _ , pop
progress (cfg (t-clo (app s t p) η-ok) π-ok) _ = _ , push
progress (cfg (t-clo lam⋆ η-ok) emp) r rewrite e-empty η-ok with r refl
... | ()
Quantitative Soundness and Completeness
Putting all our results together, we can now prove that non-idempotent intersection types are a sound and complete characterisation of termination in the CBN λ-calculus, and moreover the size of the typing derivation is exactly the number of steps taken to terminate.
We first do soundness: if a closed term t has a typing derivation d that shows that ⋆ is a possible observation of t, then t will reduce in the KAM to WHNF in size d steps:
soundness : ∀ {t} → (d : nil ⊢ t ⦂ ⋆) → reduces (size d) ⟨ 0 , nil , t ⋆ emp ⟩
soundness {t} d =
  subst (λ □ → reduces □ ⟨ 0 , nil , t ⋆ emp ⟩)
        (init-typed-size d)
        (run (init-typed d) (cfg-size (init-typed d)) refl)
  where
    helper : ∀ {k n} → n ≡ suc k → n ≢ 0
    helper refl ()

    final : ∀ {p} → (d : ⊢p p ⦂ ⋆) → cfg-size d ≡ 0 → reduces 0 p
    final (cfg (t-clo (var zero) _) π-ok) ()
    final (cfg (t-clo (var (suc _)) _) π-ok) ()
    final {⟨ n , η , ƛ t ⋆ emp ⟩} (cfg (t-clo lam⋆ x₁) emp) e = stop η t

    run : ∀ {p} → (d : ⊢p p ⦂ ⋆) → ∀ k → cfg-size d ≡ k → reduces k p
    run d zero e = final d e
    run d (suc k) e with progress d (helper e)
    ... | q , s = step s (run (subject-reduction s d) k
                              (suc-injective (trans (sym (sr-size s d)) e)))
The bulk of the work in this lemma is performed by the run function, which uses induction on the size of the typing derivation to repeatedly call progress to get the next step, subject-reduction to preserve typability, and sr-size to show that the size of the derivation goes down.
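For instance (reusing the hypothetical idid example from the KAM section above; the names here are my own), soundness turns a size-3 typing derivation directly into the three-step reduction we exhibited by hand earlier:

-- Hypothetical example: a size-3 derivation for idid, and the 3-step
-- reduction that soundness computes from it.
idid-typed : nil ⊢ idid ⦂ ⋆
idid-typed = app₁ (lam (var zero)) lam⋆

idid-reduces' : reduces 3 (init idid)
idid-reduces' = soundness idid-typed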
The converse of soundness is completeness: every term that reduces to WHNF has a typing derivation. This is proved by induction on the reduction sequence: in the base case all WHNFs are typable by the lam⋆
rule, and in the step case by using subject expansion. We also know by soundness and the determinism of length of reduction sequences that the computed typing derivation has the same size as the length of the reduction.
completeness : ∀ {t n} → reduces n ⟨ 0 , nil , t ⋆ emp ⟩ → Σ[ d ∈ nil ⊢ t ⦂ ⋆ ] (size d ≡ n)
completeness {t} r = d , reduces-det (soundness d) r
  where
    config-typed : ∀ {p k} → reduces k p → ⊢p p ⦂ ⋆
    config-typed (stop η t) = cfg (t-clo lam⋆ env-empty) emp
    config-typed (step s r) = subject-expansion s (config-typed r)

    extract : ∀ {t} → ⊢p ⟨ 0 , nil , t ⋆ emp ⟩ ⦂ ⋆ → nil ⊢ t ⦂ ⋆
    extract (cfg (t-clo x nil) emp) = x

    d : nil ⊢ t ⦂ ⋆
    d = extract (config-typed r)
Inspecting the proof of subject expansion, we can also see that the computed typing derivation never uses a non-identity permutation, indicating that we might be able to use a variant of non-idempotent intersection types to predict the ordering of usage, as well as quantity of usage.
A possible application of this type system would be to proving that optimisations actually optimise. If we have an optimisation that replaces a term M with a term N, then if for all typings Γ ⊢ M ⦂ τ we can get a typing Γ ⊢ N ⦂ τ that is smaller, then any context which uses M can replace it with N, preserving meaning, and take fewer steps to terminate.