Back in February, the paper A theory of changes for higher-order languages — incrementalizing λ-calculi by static differentiation by Cai, Giarrusso, Rendel, and Ostermann was posted to Lambda the Ultimate. The poster, gasche, made the following parenthetical comment at the end of the L-t-U post:

(The program transformation seems related to the program-level parametricity transformation. Parametricity abstract over equality justifications, differentiation on small differences.)

In this post, I’m going to substantiate gasche’s conjecture by showing
how (a very slight simplification of) the theory of changes presented
in the paper is formally *the same thing as* relational
parametricity. Whereas Reynolds’ relational parametricity for System F
is a theory of how programs act under change of data representation —
*i.e.*, change of types — Cai *et al.*’s theory of changes is a theory
of how programs act under change of values.

Cai *et al.* are interested in translating purely functional programs
that map values of type `A` to values of type `B` into programs that
map changes in values of type `A` to changes in values of type
`B`. The intention is that the translated version of the program,
which maps changes to changes, will handle incremental updates to the
input more efficiently than a full recomputation.

In any attempt to define such a translation, you are going to need to
define what you mean by “changes in the values of type `A`”. To this
end, Cai *et al.* define the following notion of *change structure*. A
change structure (for a particular type) consists of:

- A set `A` of the possible values of this type;
- For each value `a ∈ A`, a set `∂A(a)` of possible changes that can be applied to the value `a`;
- An operation `⊕` that takes a value `a ∈ A` and a change `δa ∈ ∂A(a)` and produces a value `a ⊕ δa ∈ A` that represents the result of applying the change `δa` to the value `a`;
- An operation `⊖` that takes an `a ∈ A` and a `b ∈ A` and produces a change `b ⊖ a ∈ ∂A(a)` that represents the change required to get from `a` to `b`.

These operations are subject to several laws that are laid out in Section 2 of the paper.
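To make the definition concrete, here is a small sketch in Python (my own illustration, not an example from the paper) of the standard change structure on integers, where a change is itself an integer, `⊕` is addition, and `⊖` is subtraction:

```python
# Sketch of a change structure on integers (an assumption for
# illustration): ∂A(a) is the set of all integers, ⊕ is addition,
# and ⊖ is subtraction.

def apply_change(a, da):
    """a ⊕ δa: apply the change δa to the value a."""
    return a + da

def subtract(b, a):
    """b ⊖ a: the change required to get from a to b."""
    return b - a

# One of the laws from Section 2 of the paper: a ⊕ (b ⊖ a) = b.
assert all(apply_change(a, subtract(b, a)) == b
           for a in [-3, 0, 7] for b in [-5, 2, 11])
```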

Requiring the existence of the `⊖` operation for every change
structure seems quite strong to me. By requiring `⊖`, we are
requiring that for *any* two values `a` and `b`, there must be at
least one change that connects them, and for some reason we have
chosen this change out of possibly many choices as “special”. Just by
requiring the existence of a change that translates us from `a` to
`b`, we are making a commitment that the “space” of `A` values is
connected in some way, which may not be justified in some
applications. What if there are states that we can get into that are
not get-out-able? What if we only want to consider changes that
append new data?

As far as I can see, the `⊖` operation is only used in the paper to
define the zero change `0 ∈ ∂A(a)` for every value `a ∈ A`, and then
is never mentioned again. To me, the existence of a zero change for
every value seems much easier to justify than a general `⊖`
operation. Therefore, I propose the following mild generalisation of
change structure, which differs from Cai *et al.*’s definition only
in the final point:

- A set `A` of possible values;
- For each value `a ∈ A`, a set `∂A(a)` of possible changes that can be applied to the value `a`;
- An operation `⊕` that takes a value `a ∈ A` and a change `δa ∈ ∂A(a)` and produces a value `a ⊕ δa ∈ A` that represents the result of applying the change `δa` to the value `a`;
- For every `a ∈ A`, a *zero change* `0 ∈ ∂A(a)`.
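To illustrate why dropping `⊖` can be useful, here is a sketch (my own example, not the paper’s) of an append-only change structure on lists: every value has a zero change (append nothing), but there is no `⊖`, since no change can ever make a list shorter.

```python
# An append-only change structure on lists (my own illustrative
# example): a change δxs is a list of elements to append. There is a
# zero change for every value, but no ⊖: no change takes [1, 2, 3]
# back to [].

def apply_change(xs, dxs):
    """xs ⊕ δxs: append the new elements to xs."""
    return xs + dxs

def zero(xs):
    """The zero change at xs: append nothing."""
    return []

assert apply_change([1, 2], [3]) == [1, 2, 3]
assert apply_change([1, 2], zero([1, 2])) == [1, 2]  # a ⊕ 0 = a
```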

Change structures allow Cai *et al.* to define the notion of a
*derivative* of a function `f`. The idea is that if `f` is the
original function, then `f'` is a function that can compute changes in
the output, given the original input value and the change in the input
value.

Formally, given a pair of change structures `(A, ∂A, ⊕, 0)` and
`(B, ∂B, ⊕, 0)`, and a function `f : A → B`, Cai *et al.* define a
derivative of `f` to be a function `f' : (a : A) → ∂A(a) → ∂B(f a)`
such that: *i)* `f(a ⊕ δa) = (f a) ⊕ (f' a δa)`; and *ii)*
`f' a 0 = 0`. (Notes: point *ii* is actually a lemma in the paper,
which I think is provable if you have `⊖`, but here I have to state it
explicitly; also, the paper calls `f'` *the* derivative, but in
general it isn’t unique for a given `f`.)
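For instance, over the integer change structure (changes are integers, `⊕` is addition), the squaring function has a derivative in this sense. A sketch of my own, not from the paper:

```python
# Derivative of f(x) = x² over the integer change structure,
# where ⊕ is addition (my own illustrative example).

def f(a):
    """The original function."""
    return a * a

def f_deriv(a, da):
    """f' : (a : A) → ∂A(a) → ∂B(f a).
    Since (a + δa)² = a² + 2·a·δa + δa², the change in the output is:"""
    return 2 * a * da + da * da

# Law (i): f(a ⊕ δa) = (f a) ⊕ (f' a δa)
assert all(f(a + da) == f(a) + f_deriv(a, da)
           for a in range(-5, 6) for da in range(-5, 6))

# Law (ii): f' a 0 = 0
assert all(f_deriv(a, 0) == 0 for a in range(-5, 6))
```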

Given these definitions, we can define a category of change
structures: the objects are change structures, and the morphisms are
pairs `(f, f')` of a function `f` between the sets of values, and a
derivative `f'`. Looking at change structures as a category, we can
see that the middle four pages of Cai *et al.*’s paper, from
Section 2.2 to the end of Section 3, essentially boil down to proving
that this category is cartesian closed, and so can model the simply
typed λ-calculus.
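Part of checking that this really is a category is checking that morphisms compose: the derivative of a composite is given by a chain rule, `(g ∘ f)' a δa = g' (f a) (f' a δa)`. A quick sketch over the integer change structure (my own example, not the paper’s):

```python
# Composing two change-structure morphisms over the integers
# (my own illustrative example): the derivative of g ∘ f is the
# chain rule g' (f a) (f' a δa).

def f(a):           return a * a
def f_deriv(a, da): return 2 * a * da + da * da

def g(b):           return b + 1
def g_deriv(b, db): return db          # g translates changes unchanged

def gf(a):                              # the composite g ∘ f
    return g(f(a))

def gf_deriv(a, da):                    # chain rule
    return g_deriv(f(a), f_deriv(a, da))

# The composite again satisfies derivative law (i):
assert all(gf(a + da) == gf(a) + gf_deriv(a, da)
           for a in range(-5, 6) for da in range(-5, 6))
```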

Relational parametricity is usually introduced in terms of *logical
relations*. A logical relation is a relation between two
interpretations of the same syntactic type `A` that is defined by
looking at the “logical” structure of the type `A` (that is: looking
at the “logical” constructors `→`, `+`, `×` and so on used to
construct `A`). The general idea is that we relate two interpretations
of the same type `A` to capture the idea of “change of representation”
for that type.

An alternative approach to relational parametricity, which ends up
being equivalent when we are just talking about types, uses *reflexive
graphs*. This approach is originally due to Robinson and Rosolini in
their
“Reflexive Graphs and Parametric Polymorphism” paper from 1994. The
idea is to abstract logical relations away from types and relations,
and to look at just the bare structure of what we are defining. A
reflexive graph consists of:

- A set `O`, which we will call abstract *objects*;
- A set `R`, which we will call abstract *relations* or *edges*;
- A function `src : R → O` (“source of an edge”) that maps each edge to its source object;
- A function `tgt : R → O` (“target of an edge”) that maps each edge to its target object;
- A function `refl : O → R` (“reflexive edge”) that maps each object `o` to a reflexive edge with source `o` and target `o`.

Conceptually, we can think of the objects in `O` as the actual data we
compute with, and the edges in `R` as telling us about abstract
relationships between pairs of data items. A reflexive graph morphism
will tell us how to map data to data and relationships to
relationships, in such a way that the two mappings agree.

Formally, a *morphism* between two reflexive graphs
`(O₁, R₁, src₁, tgt₁, refl₁)` and `(O₂, R₂, src₂, tgt₂, refl₂)`
consists of a pair of functions `f : O₁ → O₂`, mapping objects to
objects, and `r : R₁ → R₂`, mapping relations to relations, such that
the three operations `src`, `tgt` and `refl` are preserved.
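As a concrete sketch (my own finite encoding, with illustrative names), a pair of tiny reflexive graphs and the three preservation conditions for a morphism between them can be written out directly:

```python
# Two tiny finite reflexive graphs and a morphism between them
# (my own illustrative encoding). Edges are represented as pairs
# (source, target).

O1 = {0, 1}
R1 = {(i, j) for i in O1 for j in O1}   # an edge between every pair
src1 = lambda e: e[0]
tgt1 = lambda e: e[1]
refl1 = lambda o: (o, o)                # the reflexive loop at o

O2 = {"*"}
R2 = {("*", "*")}                       # one object, one edge
src2 = lambda e: e[0]
tgt2 = lambda e: e[1]
refl2 = lambda o: (o, o)

# A morphism: collapse everything onto the single object and edge.
f = lambda o: "*"
r = lambda e: ("*", "*")

# The three preservation conditions:
assert all(src2(r(e)) == f(src1(e)) for e in R1)    # src preserved
assert all(tgt2(r(e)) == f(tgt1(e)) for e in R1)    # tgt preserved
assert all(r(refl1(o)) == refl2(f(o)) for o in O1)  # refl preserved
```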

As with change structures, we can make reflexive graphs and reflexive graph morphisms into a category, which is cartesian closed for general reasons (specifically, the category of reflexive graphs is a presheaf category, and presheaf categories are always cartesian closed).

I think it is useful to consider reflexive graphs because they are a
larger world where normal logical relations live, alongside many other
interesting things. We can define a particular reflexive graph that
captures the idea of “relation between two interpretations of a type”,
but we can also go on to define other reflexive graphs to represent
other things, like higher kinds, or
geometric transformation groups, and, as
I’ll demonstrate below, Cai *et al.*’s change structures.

In more detail, the reflexive graph that captures the idea of
“relation between two interpretations of a type” — call it `Type` — is
defined as having objects that are (small) sets, and edges that are
triples `(A, B, R)`, where `A` and `B` are small sets and `R` is a
binary relation between `A` and `B`. We define `src(A,B,R) = A` and
`tgt(A,B,R) = B`. The “reflexive edge” for each small set is defined
to be the equality relation: `refl(A) = (A, A, {(a,a) | a ∈ A})`.
Making the reflexive edge from every set to itself be equality
captures the *Identity Extension* property identified by Reynolds.
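A small finite sketch of this construction (restricted to two tiny sets; my own illustration, with relations as explicit sets of pairs):

```python
# A finite fragment of the "Type" reflexive graph (my own sketch):
# objects are sets, edges are triples (A, B, R) with R ⊆ A × B,
# and refl(A) is the equality relation on A.

def src(edge): return edge[0]
def tgt(edge): return edge[1]

def refl(A):
    """The reflexive edge on A: the diagonal (equality) relation."""
    return (A, A, frozenset((a, a) for a in A))

A = frozenset({1, 2})
B = frozenset({"x"})

# Any relation between A and B gives an edge:
R = (A, B, frozenset({(1, "x")}))

assert src(R) == A and tgt(R) == B
# The reflexive edge is equality, capturing Identity Extension:
assert refl(A) == (A, A, frozenset({(1, 1), (2, 2)}))
```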

For interpreting System F, and the higher-kinded variant System Fω, we
interpret *kinds* (i.e., `*`, `* → *`, `(* → *) → *` and so on) as
reflexive graphs, *types* (possibly with free variables) as morphisms
of reflexive graphs, and *terms* as transformations between morphisms
of reflexive graphs. See my paper
Relational Parametricity for Higher Kinds
for more details.

When interpreting *dependent types* in a relationally parametric
model, we get to make a simplification: we unify the interpretations
of types and kinds. Now all types are interpreted as (families of)
reflexive graphs, and terms are interpreted as (families of) reflexive
graph morphisms. The type of small types, `U`, is interpreted using
the reflexive graph `Type` whose definition I gave above. See the
paper
A Relationally Parametric Model of Dependent Type Theory,
which I wrote with Neil and Patty, for more details.

So, the category of reflexive graphs is the natural place for interpreting relational parametricity for type systems that are more expressive than plain System F.

I’m now going to show that Cai *et al.*’s change structures (or
at least my slight variant) and reflexive graphs are the same
thing. More precisely, the categories involved are equivalent, but
here I’ll just stick to showing how to convert a change structure to
a reflexive graph and back.

Given a change structure `(A, ∂A, ⊕, 0)`, we define a reflexive graph
as follows:

- The set of objects is the set `A`;
- The set of relations is the set of pairs `(a, δa)`, where `a ∈ A` and `δa ∈ ∂A(a)`;
- The source map is defined as `src(a, δa) = a`;
- The target map is defined as `tgt(a, δa) = a ⊕ δa`;
- The reflexive map is defined as `refl(a) = (a, 0)`.
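Instantiated for the integer change structure, this translation can be sketched as follows (my own code, not the paper’s):

```python
# From a change structure to a reflexive graph, instantiated for the
# integer change structure (my own illustrative sketch): edges are
# pairs (a, δa), with ⊕ giving the target and 0 giving refl.

def oplus(a, da):
    """a ⊕ δa in the integer change structure."""
    return a + da

def zero(a):
    """The zero change at a."""
    return 0

def src(edge):
    a, da = edge
    return a

def tgt(edge):
    a, da = edge
    return oplus(a, da)

def refl(a):
    return (a, zero(a))

e = (3, 4)                    # the change +4 applied at 3
assert src(e) == 3
assert tgt(e) == 7            # 3 ⊕ 4
assert src(refl(3)) == tgt(refl(3)) == 3  # refl(a) is a loop at a
```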

Conversely, given a reflexive graph `(O, R, src, tgt, refl)`, we
define a change structure as follows:

- The set of values is `O`;
- For each value `o ∈ O`, the set of changes at `o` is `∂O(o) = { r ∈ R | src(r) = o }` — *i.e.*, the set of relations with source `o`;
- The `⊕` operation is defined as `o ⊕ r = tgt(r)` — *i.e.*, we take the target of a relation whose source is `o`;
- For each `o ∈ O`, the `0` change in `∂O(o)` is `refl(o)`.

If we start with either a change structure or a reflexive graph, then translating to the other and back again will always give us something that is isomorphic to what we started with. After a bit more work, we can see that this gives us an equivalence of categories.
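We can see the round trip concretely on the integer example (a sketch of my own): translate the change structure to a reflexive graph and back, and check that the recovered `⊕` and `0` agree with the originals.

```python
# Round trip for the integer change structure (my own illustrative
# sketch): change structure → reflexive graph → change structure.

def oplus(a, da): return a + da        # the original ⊕
def zero(a):      return 0             # the original 0

# Forward: the induced reflexive graph, with edges (a, δa).
def src(e):  return e[0]
def tgt(e):  return oplus(e[0], e[1])
def refl(a): return (a, zero(a))

# Back again: changes at o are the edges with source o,
# ⊕ takes the target, and 0 is the reflexive edge.
def changes_at(o, edges): return [e for e in edges if src(e) == o]
def oplus2(o, e):         return tgt(e)
def zero2(o):             return refl(o)

edges = [(a, da) for a in range(-3, 4) for da in range(-3, 4)]

# The recovered operations agree with the originals on every edge:
assert all(oplus2(a, (a, da)) == oplus(a, da) for (a, da) in edges)
assert all(oplus2(a, zero2(a)) == a for a in range(-3, 4))
```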

So we’ve got an equivalence of categories: one intended for interpreting incremental recomputation, and one intended for interpreting relational parametricity. What does this give us?

- Since the category of reflexive graphs can interpret dependent type theory (see here), the fact that the categories of change structures and reflexive graphs are equivalent means that we automatically have a notion of static differentiation of programs, in the sense of Cai *et al.*, for dependently typed languages.
- I think this equivalence gives us a way to *think* about relational parametricity. Relational parametricity (and “theorems for free”) is sometimes characterised as holding because programs “cannot inspect their type parameters”, or “don’t know what type is being used”. But the equivalence with change structures suggests to me that we should think of parametricity as a theory of how programs act under changes of their parameters. This viewpoint is explicit in Kennedy’s Relational Parametricity and Units of Measure from 1997, where he treats polymorphism over dimensions as invariance under scaling. However, the “theory of change” viewpoint does not seem to be a common way to think about parametricity or polymorphism in the literature in general.
- I hope that a unified viewpoint on parametricity *and* change structures may lead to interesting advances in both. As one example: what about having additional structure on the set of changes? Could we arrange changes into a partial order to say that one change is “bigger” than another? This might give us a way of stating that an incremental recomputation gives us the *least* change in the output for a given change in the input. Then what does that mean for parametricity? Another example: if the derivative function maps changes in the input to changes in the output, then can we formulate what is required of a function that maps changes back? (Fun exercise: show that “well-behaved lenses” are equivalent to change structure morphisms between certain change structures, where the derivative function partially applied to a value — `f' a : ∂A(a) → ∂B(f a)` — always has an inverse.)