Thursday
23rd April
2015

# The Incremental λ-Calculus and Relational Parametricity

Back in February, the paper A theory of changes for higher-order languages — incrementalizing λ-calculi by static differentiation by Cai, Giarusso, Rendel, and Ostermann, was posted to Lambda-the-Ultimate. The poster, gasche, made the following parenthetical comment at the end of the L-t-U post:

(The program transformation seems related to the program-level parametricity transformation. Parametricity abstract over equality justifications, differentiation on small differences.)

In this post, I’m going to substantiate gasche’s conjecture by showing how (a very slight simplification of) the theory of changes presented in the paper is formally the same thing as relational parametricity. Where Reynolds’ relational parametricity for System F is a theory of how programs act under change of data representation — i.e., change of types; Cai et al.’s theory of changes is a theory of how programs act under change of values.

## A Theory of Changes, and Change Structures

Cai et al. are interested in translating purely functional programs that map values of type `A` to values of type `B` into programs that map changes in values of type `A` to changes in values of type `B`. The intention is that the translated version of the program that maps changes to changes will more efficiently handle incremental updates to the input than doing a full recomputation.

In any attempt to define such a translation, you are going to need to define what you mean by “changes in the values of type `A`”. To this end, Cai et al. define the following notion of change structure. A change structure (for a particular type) consists of:

1. A set `A` of the possible values of this type;
2. For each value `a ∈ A`, a set `∂A(a)` of possible changes that can be applied to the value `a`.
3. An operation `⊕` that takes a value `a ∈ A` and a change ```δa ∈ ∂A(a)``` and produces a value `a ⊕ δa ∈ A` that represents the result of applying the change `δa` to the value `a`.
4. An operation `⊖` that takes an `a ∈ A` and a `b ∈ A` and produces a change `b ⊖ a ∈ ∂A(a)` that represents the change required to get from `a` to `b`.

These operations are subject to several laws that are laid out in Section 2 of the paper.

Requiring the existence of the `⊖` operation for every change structure seems to be quite strong to me. By requiring `⊖`, we are requiring that for any two values `a` and `b`, there must be at least one change that connects them, and for some reason we have chosen this change out of possibly many choices as “special”. Just by requiring the existence of a change that translates us from `a` to `b`, we are making a commitment that the “space” of `A` values is connected in some way, which may not be justified in some applications. What if there are states that we can get into that are not get-out-able? What if we only want to consider changes that are appending of new data?

As far as I can see, the `⊖` operation is only used in the paper to define the zero change `0 ∈ ∂A(a)` for every value `a ∈ A`, and then is never mentioned again. To me, the existence of a zero change for every value seems much easier to justify than a general `⊖` operation. Therefore, I propose the following mild generalisation of change structure, which only differs from the Cai et al.’s definition in the final point:

1. A set `A` of possible values;
2. For each value `a ∈ A`, a set `∂A(a)` of possible changes that can be applied to the value `a`.
3. An operation `⊕` that takes a value `a ∈ A` and a change ```δa ∈ ∂A(a)``` and produces a value `a ⊕ δa ∈ A` that represents the result of applying the change `δa` to the value `a`.
4. For every `a ∈ A`, a zero change `0 ∈ ∂A(a)`.

Change structures allow Cai et al. to define the notion of a derivative of a function f. The idea is that if `f` is the original function, then `f'` is a function that can compute changes in the output, given the original input value and the change in the input value.

Formally, given a pair of change structures, `(A, ∂A, ⊕, 0)` and ```(B, ∂B, ⊕, 0)```, and a function `f : A → B` Cai et al. define a derivative of `f` to be a function `f' : (a : A) → ∂A(a) → ∂B(f a)` such that: i) `f(a ⊕ δa) = (f a) ⊕ (f' a δa)`; and ii) ```f' a 0 = 0```. (Notes: point ii is actually a lemma in the paper, which I think is provable if you have `⊖`, but I have to state explicitly; also, the paper calls `f'` the derivative, but in general it isn’t unique for a given `f`.)

Given these definitions, we can define a category of change structures: the objects are change structures, and the morphisms are pairs `(f,f')` of functions `f` between the sets of values, and derivatives `f'`. Looking at change structures as a category, we can see that the middle four pages of Cai et al.’s paper, from Section 2.2 to the end of Section 3, essentially boils down to proving that this category is cartesian closed, and so can model the simply typed λ-calculus.

## Relational Parametricity via Reflexive Graphs

Relational parametricity is usually introduced in terms of logical relations. A logical relation is a relation between two interpretations of the same syntactic type `A` that is defined by looking at the “logical” structure of the type `A` (that is: looking at the “logical” constructors `→`, `+`, `×` and so on used to construct `A`). The general idea is that we relate two interpretations of the same type `A` to capture the idea of “change of representation” for that type.

An alternative approach to relational parametricity, that ends up being equivalent when we are just talking about types uses reflexive graphs. This approach is originally due to Robinson and Rosolini in their “Reflexive Graphs and Parametric Polymorphism” paper from 1994. The idea is to abstract logical relations away from types and relations, and to look at just the bare structure of what we are defining. A reflexive graph consists of:

1. A set `O`, which we will call abstract objects.
2. A set `R`, which we will call abstract relations or edges.
3. A function `src : R → O` (“source of an edge”) that maps each edge to its source object.
4. A function `tgt : R → O` (“target of an edge”) that maps each edge to its target object.
5. A function `refl : O → R` (“reflexive edge”) that maps each object `o` to a reflexive edge with source `o` and target `o`.

Conceptually, we can think of the objects in `O` as the actual data we compute with, and the edges in `R` as telling us about abstract relationships between pairs of data items. A reflexive graph morphism will tell us how to map data to data and relationships to relationships, in such a way that the two mappings agree.

Formally, a morphism between two reflexive graphs ```(O₁, R₁, src₁, tgt₁, refl₁)``` and `(O₂, R₂, src₂, tgt₂, refl₂)` consists of a pair of functions `f : O₁ → O₂`, mapping objects to objects, and ```r : R₁ → R₂```, mapping relations to relations, such that the three operations `src`, `tgt` and `refl` are preserved.

As with change structures, we can make reflexive graphs and reflexive graph morphisms into a category, which is cartesian closed for general reasons (specifically, the category of reflexive graphs is a presheaf category, which are always cartesian closed).

I think it is useful to consider reflexive graphs because they are a larger world where normal logical relations live, alongside many other interesting things. We can define a particular reflexive graph that captures the idea of “relation between two interpretations of a type”, but we can also go on to define other reflexive graphs to represent other things, like higher kinds, or geometric transformation groups, and, as I’ll demonstrate below, Cai et al.’s change structures.

In more detail, the reflexive graph that captures the idea of “relation between two interpretations of a type” is defined as having objects that are (small) sets, edges that are triples `(A,B,R)`, where `A` and `B` are small sets, and `R` is binary relation between `A` and `B`. We define `src(A,B,R) = A`, `tgt(A,B,R) = B`. The “reflexive edge” for each small set is defined to be the equality relation: `refl(A) = (A,A,{(a,a) | a ∈ A})`. Making the reflexive edge from every set to itself be equality captures the Identity Extension property identified by Reynolds.

For interpreting System F, and the higher kinded variant System Fω, we interpret kinds (i.e., `*`, `* → *`, `(* → *) → *` and so on) as reflexive graphs, types (possibly with free variables) as morphisms of reflexive graphs, and terms as transformations between morphisms of reflexive graphs. See my paper Relational Parametricity for Higher Kinds for more details.

When interpreting dependent types in a relationally parametric model, we get to make a simplification: we unify the interpretations of types and kinds. Now all types are interpreted as (families of) reflexive graphs, and terms are interpreted as (families of) reflexive graph morphisms. The type of small types, `U`, is interpreted using the reflexive graph `Type` I gave the definition of above. See the paper A Relationally Parametric Model of Dependent Type Theory, that I wrote with Neil and Patty, for more details.

So, the category of reflexive graphs is the natural place for interpreting relational parametricity for type systems that are more expressive than plain System F.

## Change Structures and Reflexive Graphs are equivalent

I’m now going to show that Cai et al.’s change structures (or at least my slight variant) and reflexive graphs are the same thing. More precisely, the categories involved are equivalent, but here I’ll just stick to showing how to convert a change structure to reflexive graph and back.

Given a change structure `(A, ∂A, ⊕, 0)`, we define a reflexive graph as follows:

1. The set of objects is the set `A`.
2. The set of relations is the set consisting of pairs `(a, δa)`, where `a ∈ A` and `δa ∈ ∂A(a)`.
3. The source map is defined as `src(a, δa) = a`.
4. The target map is defined as `tgt(a, δa) = a ⊕ δa`.
5. The reflexive maps is defined as `refl(a) = (a,0)`.

Conversely, given a reflexive graph `(O, R, src, tgt, refl)`, we define a change structure as follows:

1. The set of values is `O`.
2. For each value `o ∈ O`, the set of changes at `o` is ```∂O(o) = { r ∈ R | src(r) = o }```i.e., the set of relations with source `o`.
3. The `⊕` operation is defined as `o ⊕ r = tgt(r)`i.e., we take the target of a relation whose source is `o`.
4. For each `o ∈ O`, the `0` change in `∂O(o)` is `refl(o)`.

If we start with either a change structure or a reflexive graph then translating to the other and back again will always give us something that is isomorphic to what we started with. After a bit more work, we can see that this will give us an equivalence of categories.

## So what?

So we’ve got an equivalence of categories: one intended for interpreting incremental recomputation, and one intended for interpreting relational parametricity. What does this give us?

1. Since the category of reflexive graphs can interpret dependent type theory (see here), the fact that the categories of change structures and reflexive graphs are equivalent means that we automatically have a notion of static differentiation of programs in the sense of Cai et al. for dependently typed languages.

2. I think this equivalence gives us a way to think about relational parametricity. Relational parametricity (and “theorems for free”) are sometimes characterised as holding because programs “cannot inspect their type parameters”, or “don’t know what type is being used”. But the equivalence with change structures indicates to me that thinking of parametricity as a theory of how programs act under changes of their parameters. This viewpoint is explicit in Kennedy’s Relational Parametricity and Units of Measure from 1997, where he treats polymorphism over dimensions as invariance under scaling. However, the “theory of change” viewpoint does not seem to be a common way to think about parametricity or polymorphism in the literature in general.

3. I hope that a unified viewpoint on parametricity and change structures may lead to interesting advances in both. As one example: what about having additional structure on the set of changes? Could we arrange changes into a partial order to say that one change was “bigger” than another? This might give us a way of stating that an incremental recomputation gives us the least change in the output for a given change in the input. Then what does that mean for parametricity? Another example: if the derivative function maps changes in the input to changes in the output, then can we formulate what is required of a function that maps changes back? (Fun exercise: show that “well-behaved lenses” are equivalent to change structure morphisms between certain change structures, where the derivative function partially applied to a value — `f' a : ∂A(a) → ∂B(f a)` — always has an inverse.)