Let's assume that you're querying to some database stored in the cloud (i.e., on someone else’s computer). Being of a sceptical mind, you worry whether or not the answers you get back are from the database you expect. Or is the cloud lying to you?

Authenticated Data Structures (ADSs) are a proposed solution to this problem. When the server sends back its answers, it also sends back a “proof” that the answer came from the database it claims. You, the client, verify this proof. If the proof doesn't verify, then you’ve got evidence that the server was lying. If the proof does verify, then there is a guarantee that the server’s response is legitimate (usually up to the possibility of a hash collision).

This all seems great, but doesn’t address the question of how anyone might build an ADS, and, crucially, prove that it has the security of unforgable (up to hash collisions) verification. A brute-force way to implement an ADS is for the client to retain a copy of the database, and to check the server’s results against the copy. But then what would be the point of the server?

Merkle Trees are the
original example of an ADS. Merkle trees solve the problem of needing
a complete copy of the database by having the client store a *hash* of
the database. The server’s proof is then verified against the
hash. But what if we want a data structure that isn’t trees? Will we
have to implement our own ADS? How will we know that we have done it
correctly? Implementing a completely new ADS has three problems. We
need: to invent a way to do the authentication, a proof that
authentication has been done correctly, and a proof that we have
correctly implemented it.

Andrew Miller, Michael Hicks, Jonathan Katz, and Elaine Shi have a solution to this meta-problem in their POPL2014 paper Authenticated Data Structures, Generically, by describing a new language “Lambda-Auth” for implementing correct-by-construction ADSs.

Miller et al. describe a programming language with special constructs
for writing ADSs. Once you’ve written an implementation of your data
structure, you annotate the implementation with ‘authentication’
markers that indicate points in the structure that act as
authentication checkpoints. On the client (verifier) side, each
checkpoint becomes a hash code representing the real data, which will
be checked against the proof sent by the server. On the server
(prover) side, each checkpoint is an indicator of where to generate a
piece of proof. The key insight of Miller et al. is that the client
and server run *the same code*, just with two different
interpretations of what the authentication checkpoints mean. They are
then able to prove, *for all programs*, that the server and client
sides will always agree, and that proofs of authentication are
unforgable (up to hash collisions). The authors wrote a
PL Enthusiast blog post
with a gentle introduction, and
a second part
with more detail.

In this post, I'll show that in a language with sufficiently powerful
abstraction facilities (in this case OCaml), it is
possible to implement Miller et al.’s solution as a *library* within
the language, with no need to create a new language, or to alter an
existing language’s implementation.

Moreover, although I won’t give any details in this blog post, I claim that the correctness proof given by Miller et al. will be an instance of parametricity for higher-kinded types. I’ll give a few sketchy details at the end.

To get started, I need to isolate a description of the bits and pieces
we need to be able to write authenticated data structures, taking
Miller et al.’s paper as a guide. OCaml provides module signatures as
a handy way of collecting together requirements for the existence of
certain types and functions. I'll call the types and functions we need
for making ADSs an `AUTHENTIKIT`

: a kit for implementing authenticated
data structures:

`module type AUTHENTIKIT = sig`

The first thing is to postulate the existence of the type constructor
`auth`

that represents the type of authenticated values. This is the
OCaml rendering of the “•” type constructor used in Miller et al.’s
language for representing authenticated values. Note that I haven’t
committed to any implementation of this type: it is left completely
abstract in the `AUTHENTIKIT`

interface:

` type 'a auth`

We’ll need is a way of describing abstract authenticated computations: computations that are either generating proofs or verifying proofs. In Miller et al.’s setup, authenticated computations are built in to the language. However, OCaml comes with with its own fixed notion of computation, so we have to use a monad to layer our own notions on top:

```
type 'a authenticated_computation
val return : 'a -> 'a authenticated_computation
val (>>=) : 'a authenticated_computation ->
('a -> 'b authenticated_computation) ->
'b authenticated_computation
```

The third thing I’ll need is a way of proving that the types of data
that we want to authenticate are “authenticatable”. This will
essentially mean that they are serialisable to a string representation
in way that is suitable for proof construction and verification. The
requirement to be serialisable is hidden in Miller et al.’s formalism
because they assume that all data in the language is
serialisable. This is a somewhat fishy assumption (how do you
serialise and deserialise a function without breaking abstraction
boundaries?), so here I’m being a bit more formal, and requiring the
programmer to build evidence of serialisability using the following
combinators tucked away in a submodule of any `AUTHENTIKIT`

implementation:

```
module Authenticatable : sig
type 'a evidence
val auth : 'a auth evidence
val pair : 'a evidence -> 'b evidence -> ('a * 'b) evidence
val sum : 'a evidence -> 'b evidence ->
[`left of 'a | `right of 'b] evidence
val string : string evidence
val int : int evidence
end
```

Finally, I postulate the existence of the `auth`

and `unauth`

functions as Miller et al. do. The difference here is that I have
explicitly requested evidence that the data type involved is
authenticatable. Also, the `unauth`

function returns a computation in
our authenticated computation monad, indicating that this is where
some of the work to get an authenticated data structure to work will
happen.

```
val auth : 'a Authenticatable.evidence -> 'a -> 'a auth
val unauth : 'a Authenticatable.evidence -> 'a auth -> 'a authenticated_computation
end
```

Before I describe the prover and verifier implementations of the
`AUTHENTIKIT`

module type, I’ll give an example of an authenticated
data structure implementation. I’ll do a basic Merkle tree, with the
following interface:

```
module type MERKLE =
functor (A : AUTHENTIKIT) -> sig
open A
type path = [`L | `R] list
type tree = [`left of string | `right of tree * tree] auth
val make_leaf : string -> tree
val make_branch : tree -> tree -> tree
val retrieve : path -> tree -> string option authenticated_computation
val update : path -> string -> tree -> tree option authenticated_computation
end
```

The first thing to note about this module signature is that its
implementations are parameterised by implementations of
`AUTHENTIKIT`

. This will be characteristic of any authenticated data
structure implementation we implement in this style: we need the
freedom to instantiate the implementation with alternative
`AUTHENTIKIT`

s to get the prover and verifier sides of the system.

The second thing to note is that the data structure interface is a
very lightly annotated version of a completely boring binary tree
interface. We have types for trees and paths, and operations to
construct trees, query trees, and update trees. The only differences
from a normal interface is that the tree type is annotated with an
extra `auth`

type constructor, indicating how the tree is augmented
with authentication information, and the `authenticated_computation`

type constructors on the `retrieve`

and `update`

operations,
indicating that they are authenticated operations.

The implementation is likewise a straightforward implementation of a
binary tree in OCaml, with a couple of extra annotations for the
authentication. The `Merkle`

module signature states that
implementations are parameterised by `AUTHENTIKITs`

, so we declare the
`Merkle`

implementation like so:

```
module Merkle : MERKLE =
functor (A : AUTHENTIKIT) -> struct
open A
```

The `open A`

introduces all the members of the `AUTHENTIKIT`

into
scope, so we don't have to qualify any names. The first thing to do
is to fulfil the promise of definitions for the types `path`

and
`tree`

that we gave in the interface:

```
type path = [`L|`R] list
type tree = [`left of string | `right of tree * tree ] auth
```

Next, we construct some evidence that the body of the tree type is
authenticatable, so we will be able to use `auth`

and `unauth`

on
trees.

```
let tree : [`left of string | `right of tree * tree] Authenticatable.evidence =
Authenticatable.(sum string (pair auth auth))
```

The `make_leaf`

and `make_branch`

functions build leaves and branches
using the appropriate constructors.

```
let make_leaf s =
auth tree (`left s)
let make_branch l r =
auth tree (`right (l,r))
```

To query our authenticated data structure, we have two functions
`retrieve`

and `update`

. First, the implementation of `retrieve`

which
takes a path and a tree and returns the data identified by that path,
if it exists.

```
let rec retrieve path t =
unauth tree t >>= fun t ->
match path, t with
| [], `left s -> return (Some s)
| `L::path, `right (l,r) -> retrieve path l
| `R::path, `right (l,r) -> retrieve path r
| _, _ -> return None
```

As I mentioned above, this is really nothing more than a standard
binary tree search implementation, obfuscated by the need to unwrap
the authenticated input, and the additional `>>=`

and `return`

s needed
to track the authenticated computation. The `update`

implemention is
similar in its similarlity to a standard binary tree update function:

```
let rec update path v t =
unauth tree t >>= fun t ->
match path, t with
| [], `left _ ->
return (Some (make_leaf v))
| `L::path, `right (l,r) ->
(update path v l >>= function
| None -> return None
| Some l' -> return (Some (make_branch l' r)))
| `R::path, `right (l,r) ->
(update path v r >>= function
| None -> return None
| Some r' -> return (Some (make_branch l r')))
| _ ->
return None
```

Finally, an `end`

to complete the definition of `Merkle`

:

` end`

Just to emphasise again, this implementation is essentially the same
as a standard binary tree implementation written in OCaml. In the
code, we didn’t have to mention anything to do with authenticated data
structures, except for placing `auth`

and `unauth`

in the correct
places. And even if we get in a muddle doing that, the OCaml type
checker will helpfully inform us where we’ve made a mistake. One
wrinkle is that I've had to give up writing in “direct style” and had
to write in monadic style.

The `Merkle`

module is not yet ready for use. To use it we need an
implementation of the `AUTHENTIKIT`

interface. The different
implementations of `AUTHENTIKIT`

will correspond to the different
semantics in Miller et al.’s presentation: the prover and the
verifier.

The first implementation of `AUTHENTIKIT`

I'll give is the prover.

The prover constructs proofs, which I'll represent as lists of JSON values, using the Ezjsonm library.

`type proof = Ezjsonm.value list`

We will also need to compute cryptographic hashes of JSON values, which I'll do using the Cryptokit library. I'm using the SHA1 algorithm here, but obviously any secure hashing algorithm would work.

```
let hash_json =
let hash_algo = Cryptokit.Hash.sha1 () in
fun json_value ->
Cryptokit.hash_string hash_algo (Ezjsonm.to_string (`A json_value))
```

The module `Prover`

implements the `AUTHENTIKIT`

signature, plus the
additional knowledge that a) authenticated computations generate
proofs as well as values, and b) that we have a `get_hash`

function
that returns the hashed representation of any authenticated value. The
prover’s interface is captured by its module signature:

```
module Prover : sig
include AUTHENTIKIT with type 'a authenticated_computation = proof * 'a
val get_hash : 'a auth -> string
end = struct
```

First, we implement the prover’s view of authenticated values by
defining its implementation of the `auth`

type. The prover sees an
authenticated value as a pair of an underlying value `x`

and a hash of
`x`

’s representation:

```
type 'a auth = 'a * string
let get_hash (a,h) = h
```

It will be up to the rest of the functions in this module to maintain the invariant that the second half of the pair will always be the hash code of the serialised representation of the first half.

Authenticated computations on the prover side are represented using a specialisation of the standard Writer monad, which collects a proof (list of JSON values) as a side-effect. This implementation is quite slow, because I have used lists, but could be speeded up by using a better data structure.

```
type 'a authenticated_computation =
proof * 'a
let return a =
([], a)
let (>>=) (prf,a) f =
let (prf',b) = f a in
(prf @ prf',b)
```

The prover's view of authenticatable values are ones for which it is possible to serialise them to JSON. We represent the evidence that such a thing is possible as the existence of a function from values to JSON values.

```
module Authenticatable = struct
type 'a evidence = 'a -> Ezjsonm.value
let auth (a,h) =
`String h
let pair a_serialiser b_serialiser (a,b) =
`A [a_serialiser a; b_serialiser b]
let sum a_serialiser b_serialiser = function
| `left a -> `A [`String "left"; a_serialiser a]
| `right b -> `A [`String "right"; b_serialiser b]
let string s = `String s
let int i = `String (string_of_int i)
end
```

In the `auth`

case, we only serialise the hash code, not the
underlying value. This ensures that the prover does not end up sending
whole data structures back to the client. In the `int`

case, I'm
serialising OCaml’s `int`

values as strings, to avoid complications
arising from JSON’s use of floating point representations for numbers.

Now the `auth`

and `unauth`

functions. Creation of authenticated
values means pairing the value with its hashed serialised
representation:

```
let auth serialiser a =
(a, hash_json (serialiser a)))
```

Extracting the underlying value from an authenticated value has the “side effect” of producing a step in the proof, which is the JSON representation of the value we expect to see:

```
let unauth serialiser (a, h) =
([serialiser a], a)
```

Finally, we complete the definition of `Prover`

with an `end`

:

`end`

We get a prover-side Merkle tree implementation by instantiating the
`Merkle`

module with the `Prover`

implementation of `AUTHENTIKIT`

:

`module Merkle_Prover = Merkle (Prover)`

Let’s make a little prover-side tree, in the OCaml REPL:

```
# let tree =
Merkle_Prover.(make_branch
(make_branch (make_leaf "a") (make_leaf "b"))
(make_branch (make_leaf "c") (make_leaf "d")));;
val tree : Merkle_Prover.tree = <abstr>
```

The returned value is abstract, because it is really a pair of the
underlying data and it hash code. Using the prover’s `get_hash`

function, we can get the hash code of the tree, which is what the
verifier will use to authenticate the prover’s actions:

```
# let code = Prover.get_hash tree;;
val code : string = ".z\129w\199J\224\\\254\220\bo\246W\158\243S\029\177\190"
```

We can also ask the prover to do queries on the tree. For example, if we ask for the value at position “left, left”, it returns the result “a”, and a proof that the verifier will be able to use to check that this result actually came from the tree with the hash code above.

```
# let proof, result = Merkle_Prover.retrieve [`L;`L] tree;;
val proof : proof =
[`A [ `String "right"
; `A [ `String "?\250m&,\251\129\031\r\252QJ\001\141|d}\242\016l"
; `String "i?B\230p\158D\201\248\145\000\1400p\224\018\023\219\1935"
]
]
; `A [ `String "right"
; `A [ `String "X\140\005\028\146\1891L\224\246\224\229\201\018o\b\187\163\240\160"
; `String "\223\231\194\230\1362=\157\187\226;?\143>\127\248\014;\201\254"
]
]
; `A [`String "left"; `String "a"]
]
val result : string option = Some "a"
```

This proof looks quite long, and in this case is several times the size of the original database. So it might look like Merkle trees might not save us anything. In general, however, the proof size is logarithmic in the size of the tree and so will be much smaller than sending the entire tree.

We’ve got a prover generating query responses and proofs, but this is a bit useless if we don’t have a verifier to check them.

Our verifier is another implementation of the `AUTHENTIKIT`

signature,
with the additional information that authenticated computations are
now functions that consume proofs and return either a value (and
possibly some left-over proof) or return failure. Also, we specify
that, on the verifier side, an authenticated value is represented as
just its hash code (represented as an OCaml string).

```
module Verifier : sig
include AuthentiKit
with type 'a authenticated_computation =
proof -> [ `Ok of proof * 'a | `ProofFailure ]
and type 'a auth = string
end = struct
```

The basic idea behind the verifier is that, given a hash code and a proof, it checks the hash code against the proof, and then uses the hash code to rebuild the parts of the data structure that will be explored by the program.

We start by fulfilling our statement that the verifier’s view of authenticated values is as OCaml strings:

` type 'a auth = string`

And that the verifier’s version of authenticated computations is as a “parser” of proofs. Note how the definitions here are very similar to the definitions used for parser combinators. In some sense, verification is a process of “parsing” the proof sequence supplied by the prover.

```
type 'a authenticated_computation =
json list -> [`Ok of proof * 'a | `ProofFailure]
let return a =
fun proof -> `Ok (proof, a)
let (>>=) c f =
fun prfs ->
match c prfs with
| `ProofFailure -> `ProofFailure
| `Ok (prfs',a) -> f a prfs'
```

The verifier’s view of authenticable values is slightly more involved
than the prover’s, because it needs to be able to serialise and
deserialise values to and from JSON. There isn’t anything particularly
special going on here, but we do have to be careful to ensure that
this implementation uses the same format as the prover’s. (An obvious
improvement is to make the `Prover`

and `Verifier`

implementations
share an implementation of this sub-module.)

```
module Authenticatable = struct
type 'a evidence =
{ serialise : 'a -> Ezjsonm.value
; deserialise : Ezjsonm.value -> 'a option
}
let auth =
let serialise h = `String h
and deserialise = function
| `String s -> Some s
| _ -> None
in
{ serialise; deserialise }
let pair a_s b_s =
let serialise (a,b) =
`A [a_s.serialise a; b_s.serialise b]
and deserialise = function
| `A [x;y] ->
(match a_s.deserialise x, b_s.deserialise y with
| Some a, Some b -> Some (a,b)
| _ -> None)
| _ ->
None
in
{ serialise; deserialise }
let sum a_s b_s =
let serialise = function
| `left a -> `A [`String "left"; a_s.serialise a]
| `right b -> `A [`String "right"; b_s.serialise b]
and deserialise = function
| `A [`String "left"; x] ->
(match a_s.deserialise x with
| Some a -> Some (`left a)
| _ -> None)
| `A [`String "right"; y] ->
(match b_s.deserialise y with
| Some b -> Some (`right b)
| _ -> None)
| _ ->
None
in
{ serialise; deserialise }
let string =
let serialise s = `String s
and deserialise = function
| `String s -> Some s
| _ -> None
in
{ serialise; deserialise }
let int =
let serialise i = `String (string_of_int i)
and deserialise = function
| `String i -> (try Some (int_of_string i) with Failure _ -> None)
| _ -> None
in
{ serialise; deserialise }
end
```

Now we get to the crucial `auth`

and `unauth`

functions. Creation of
authenticated values on the verifier side is nothing more than
serialising them and computing their hash code:

```
open Authenticatable
let auth auth_evidence a =
hash_json (auth_evidence.serialise a))
```

Finally, we get to the actual proof checking. The `unauth`

function is
supplied with *a)* a (de)serialiser for the type of value to be
produced; *b)* a hash code for the expected value; and *c)* a proof
which will be used to reconstitute the actual value. If we have run
out of proof elements, then we fail immediately. Otherwise, we check
that the hash code of the first item in the proof is the same as our
required code. If so, we deserialise the proof item and return it with
the remainder of the proof. If the hash codes do not match, then the
verifier reports that proof checking has failed.

```
let unauth auth_evidence h proof = match proof with
| [] -> `ProofFailure
| p::ps when hash_json p = h ->
(match auth_evidence.deserialise p with
| None -> `ProofFailure
| Some a -> `Ok (ps, a))
| _ -> `ProofFailure
```

Finally, the `Verifier`

ends with an `end`

:

`end`

Above, we used the `Merkle_Prover`

module to generate a small Merkle
tree and run a query on it. We can now verify the prover’s execution
of `retrieve`

. Recall that we extracted a hash code representing the
tree from the prover’s representation:

`val code : string = ".z\129w\199J\224\\\254\220\bo\246W\158\243S\029\177\190"`

Let’s assume that this code was conveyed to the client somehow, and the client trusts that it is an accurate representation of the tree it wants to query.

Now the client/verifier asks the server/prover to perform a query. The server sends back the result and a proof, which we computed above:

```
val proof : proof =
[`A [ `String "right"
; `A [ `String "?\250m&,\251\129\031\r\252QJ\001\141|d}\242\016l"
; `String "i?B\230p\158D\201\248\145\000\1400p\224\018\023\219\1935"
]
]
; `A [ `String "right"
; `A [ `String "X\140\005\028\146\1891L\224\246\224\229\201\018o\b\187\163\240\160"
; `String "\223\231\194\230\1362=\157\187\226;?\143>\127\248\014;\201\254"
]
]
; `A [`String "left"; `String "a"]
]
val result : string option = Some "a"
```

We can now use `Merkle_Verifier`

to verify the prover’s proof against
`code`

:

```
# Merkle_Verifier.retrieve [`L;`L] code proof;;
- : [ `Ok of proof * string option | `ProofFailure ] = `Ok ([], Some "a")
```

Let’s try simulating an attempt by the prover to trick the verifier by running the query against a different tree. First we create a tree with the same shape, but with different values in it:

```
let other_tree =
Merkle_Prover.(make_branch
(make_branch (make_leaf "A") (make_leaf "B"))
(make_branch (make_leaf "C") (make_leaf "D")));;
```

Now we run the “left, left” query on this alternative tree, and we get back a result and a proof:

```
# let proof, result = Merkle_Prover.retrieve [`L;`L] other_tree;;
val proof : proof =
[ `A [`String "right"
; `A [`String "Q\217G\000\246\238!\248\212\127\194\184\179>\017zW\0182("
; `String "0\239\238\002b4\172\145\127\143\002@-=g\179\197\022\154|"
]
]
; `A [`String "right"
; `A [`String "\157\134\234N\234QS\136\165\196\0038\133j\018uY\133\030\005"
; `String "\2272|\223M\230\241\132\167\029-\016\141\221\005nx\239\190\184"
]
]
; `A [`String "left"; `String "A"]
]
val result : bytes option = Some "A"
```

We send the result and proof back to the sceptical client, who verifies the proof against their hash code:

```
# Merkle_Verifier.retrieve [`L;`L] code proof;;
- : [ `Ok of proof * bytes option | `ProofFailure ] = `ProofFailure
```

It fails! The cheating server has been thwarted.

Of course, the examples above don’t prove that the server/prover can never cheat. For their calculus, Miller et al. provide a proof that the prover and verifier implementations satisfy the following property, which I’ve stated informally here:

If the verifier accepts a proof (i.e., returns

`Ok`

) then either it came from the prover, or is the result of a hash collision.

If we assume that hash collisions are unlikely (Miller et al. make this assumption more formal), then we get the security property we want: up to the possibility of hash collision, we can spot cheating clouds.

I claim, but I’m not going to prove here, that this property is
provable as an instance of the
“Free Theorem”
we get from the fact that the `Merkle`

implementation is parameterised
by `AUTHENTIKIT`

implementations. The full proof involves
F-ing Modules
to translate the use of modules and functors into the higher-kinded
calculus System Fω, my
Relational Parametricity for Higher Kinds
for the higher-kinded version of free theorems, and Katsumata’s
A Semantic Formulation of TT-lifting and Logical Predicates for Computational Metalanguage
to handle the authenticated computations monads.