Let's assume that you're querying to some database stored in the cloud (i.e., on someone else’s computer). Being of a sceptical mind, you worry whether or not the answers you get back are from the database you expect. Or is the cloud lying to you?
Authenticated Data Structures (ADSs) are a proposed solution to this problem. When the server sends back its answers, it also sends back a “proof” that the answer came from the database it claims. You, the client, verify this proof. If the proof doesn't verify, then you’ve got evidence that the server was lying. If the proof does verify, then there is a guarantee that the server’s response is legitimate (usually up to the possibility of a hash collision).
This all seems great, but doesn’t address the question of how anyone might build an ADS, and, crucially, prove that it has the security of unforgable (up to hash collisions) verification. A brute-force way to implement an ADS is for the client to retain a copy of the database, and to check the server’s results against the copy. But then what would be the point of the server?
Merkle Trees are the original example of an ADS. Merkle trees solve the problem of needing a complete copy of the database by having the client store a hash of the database. The server’s proof is then verified against the hash. But what if we want a data structure that isn’t trees? Will we have to implement our own ADS? How will we know that we have done it correctly? Implementing a completely new ADS has three problems. We need: to invent a way to do the authentication, a proof that authentication has been done correctly, and a proof that we have correctly implemented it.
Andrew Miller, Michael Hicks, Jonathan Katz, and Elaine Shi have a solution to this meta-problem in their POPL2014 paper Authenticated Data Structures, Generically, by describing a new language “Lambda-Auth” for implementing correct-by-construction ADSs.
Miller et al. describe a programming language with special constructs for writing ADSs. Once you’ve written an implementation of your data structure, you annotate the implementation with ‘authentication’ markers that indicate points in the structure that act as authentication checkpoints. On the client (verifier) side, each checkpoint becomes a hash code representing the real data, which will be checked against the proof sent by the server. On the server (prover) side, each checkpoint is an indicator of where to generate a piece of proof. The key insight of Miller et al. is that the client and server run the same code, just with two different interpretations of what the authentication checkpoints mean. They are then able to prove, for all programs, that the server and client sides will always agree, and that proofs of authentication are unforgable (up to hash collisions). The authors wrote a PL Enthusiast blog post with a gentle introduction, and a second part with more detail.
In this post, I'll show that in a language with sufficiently powerful abstraction facilities (in this case OCaml), it is possible to implement Miller et al.’s solution as a library within the language, with no need to create a new language, or to alter an existing language’s implementation.
Moreover, although I won’t give any details in this blog post, I claim that the correctness proof given by Miller et al. will be an instance of parametricity for higher-kinded types. I’ll give a few sketchy details at the end.
To get started, I need to isolate a description of the bits and pieces
we need to be able to write authenticated data structures, taking
Miller et al.’s paper as a guide. OCaml provides module signatures as
a handy way of collecting together requirements for the existence of
certain types and functions. I'll call the types and functions we need
for making ADSs an AUTHENTIKIT: a kit for implementing authenticated
data structures:
module type AUTHENTIKIT = sig
The first thing is to postulate the existence of the type constructor
auth that represents the type of authenticated values. This is the
OCaml rendering of the “•” type constructor used in Miller et al.’s
language for representing authenticated values. Note that I haven’t
committed to any implementation of this type: it is left completely
abstract in the AUTHENTIKIT interface:
type 'a auth
We’ll need is a way of describing abstract authenticated computations: computations that are either generating proofs or verifying proofs. In Miller et al.’s setup, authenticated computations are built in to the language. However, OCaml comes with with its own fixed notion of computation, so we have to use a monad to layer our own notions on top:
type 'a authenticated_computation
val return : 'a -> 'a authenticated_computation
val (>>=) : 'a authenticated_computation ->
('a -> 'b authenticated_computation) ->
'b authenticated_computation
The third thing I’ll need is a way of proving that the types of data
that we want to authenticate are “authenticatable”. This will
essentially mean that they are serialisable to a string representation
in way that is suitable for proof construction and verification. The
requirement to be serialisable is hidden in Miller et al.’s formalism
because they assume that all data in the language is
serialisable. This is a somewhat fishy assumption (how do you
serialise and deserialise a function without breaking abstraction
boundaries?), so here I’m being a bit more formal, and requiring the
programmer to build evidence of serialisability using the following
combinators tucked away in a submodule of any AUTHENTIKIT
implementation:
module Authenticatable : sig
type 'a evidence
val auth : 'a auth evidence
val pair : 'a evidence -> 'b evidence -> ('a * 'b) evidence
val sum : 'a evidence -> 'b evidence ->
[`left of 'a | `right of 'b] evidence
val string : string evidence
val int : int evidence
end
Finally, I postulate the existence of the auth and unauth
functions as Miller et al. do. The difference here is that I have
explicitly requested evidence that the data type involved is
authenticatable. Also, the unauth function returns a computation in
our authenticated computation monad, indicating that this is where
some of the work to get an authenticated data structure to work will
happen.
val auth : 'a Authenticatable.evidence -> 'a -> 'a auth
val unauth : 'a Authenticatable.evidence -> 'a auth -> 'a authenticated_computation
end
Before I describe the prover and verifier implementations of the
AUTHENTIKIT module type, I’ll give an example of an authenticated
data structure implementation. I’ll do a basic Merkle tree, with the
following interface:
module type MERKLE =
functor (A : AUTHENTIKIT) -> sig
open A
type path = [`L | `R] list
type tree = [`left of string | `right of tree * tree] auth
val make_leaf : string -> tree
val make_branch : tree -> tree -> tree
val retrieve : path -> tree -> string option authenticated_computation
val update : path -> string -> tree -> tree option authenticated_computation
end
The first thing to note about this module signature is that its
implementations are parameterised by implementations of
AUTHENTIKIT. This will be characteristic of any authenticated data
structure implementation we implement in this style: we need the
freedom to instantiate the implementation with alternative
AUTHENTIKITs to get the prover and verifier sides of the system.
The second thing to note is that the data structure interface is a
very lightly annotated version of a completely boring binary tree
interface. We have types for trees and paths, and operations to
construct trees, query trees, and update trees. The only differences
from a normal interface is that the tree type is annotated with an
extra auth type constructor, indicating how the tree is augmented
with authentication information, and the authenticated_computation
type constructors on the retrieve and update operations,
indicating that they are authenticated operations.
The implementation is likewise a straightforward implementation of a
binary tree in OCaml, with a couple of extra annotations for the
authentication. The Merkle module signature states that
implementations are parameterised by AUTHENTIKITs, so we declare the
Merkle implementation like so:
module Merkle : MERKLE =
functor (A : AUTHENTIKIT) -> struct
open A
The open A introduces all the members of the AUTHENTIKIT into
scope, so we don't have to qualify any names. The first thing to do
is to fulfil the promise of definitions for the types path and
tree that we gave in the interface:
type path = [`L|`R] list
type tree = [`left of string | `right of tree * tree ] auth
Next, we construct some evidence that the body of the tree type is
authenticatable, so we will be able to use auth and unauth on
trees.
let tree : [`left of string | `right of tree * tree] Authenticatable.evidence =
Authenticatable.(sum string (pair auth auth))
The make_leaf and make_branch functions build leaves and branches
using the appropriate constructors.
let make_leaf s =
auth tree (`left s)
let make_branch l r =
auth tree (`right (l,r))
To query our authenticated data structure, we have two functions
retrieve and update. First, the implementation of retrieve which
takes a path and a tree and returns the data identified by that path,
if it exists.
let rec retrieve path t =
unauth tree t >>= fun t ->
match path, t with
| [], `left s -> return (Some s)
| `L::path, `right (l,r) -> retrieve path l
| `R::path, `right (l,r) -> retrieve path r
| _, _ -> return None
As I mentioned above, this is really nothing more than a standard
binary tree search implementation, obfuscated by the need to unwrap
the authenticated input, and the additional >>= and returns needed
to track the authenticated computation. The update implemention is
similar in its similarlity to a standard binary tree update function:
let rec update path v t =
unauth tree t >>= fun t ->
match path, t with
| [], `left _ ->
return (Some (make_leaf v))
| `L::path, `right (l,r) ->
(update path v l >>= function
| None -> return None
| Some l' -> return (Some (make_branch l' r)))
| `R::path, `right (l,r) ->
(update path v r >>= function
| None -> return None
| Some r' -> return (Some (make_branch l r')))
| _ ->
return None
Finally, an end to complete the definition of Merkle:
end
Just to emphasise again, this implementation is essentially the same
as a standard binary tree implementation written in OCaml. In the
code, we didn’t have to mention anything to do with authenticated data
structures, except for placing auth and unauth in the correct
places. And even if we get in a muddle doing that, the OCaml type
checker will helpfully inform us where we’ve made a mistake. One
wrinkle is that I've had to give up writing in “direct style” and had
to write in monadic style.
The Merkle module is not yet ready for use. To use it we need an
implementation of the AUTHENTIKIT interface. The different
implementations of AUTHENTIKIT will correspond to the different
semantics in Miller et al.’s presentation: the prover and the
verifier.
The first implementation of AUTHENTIKIT I'll give is the prover.
The prover constructs proofs, which I'll represent as lists of JSON values, using the Ezjsonm library.
type proof = Ezjsonm.value list
We will also need to compute cryptographic hashes of JSON values, which I'll do using the Cryptokit library. I'm using the SHA1 algorithm here, but obviously any secure hashing algorithm would work.
let hash_json =
let hash_algo = Cryptokit.Hash.sha1 () in
fun json_value ->
Cryptokit.hash_string hash_algo (Ezjsonm.to_string (`A json_value))
The module Prover implements the AUTHENTIKIT signature, plus the
additional knowledge that a) authenticated computations generate
proofs as well as values, and b) that we have a get_hash function
that returns the hashed representation of any authenticated value. The
prover’s interface is captured by its module signature:
module Prover : sig
include AUTHENTIKIT with type 'a authenticated_computation = proof * 'a
val get_hash : 'a auth -> string
end = struct
First, we implement the prover’s view of authenticated values by
defining its implementation of the auth type. The prover sees an
authenticated value as a pair of an underlying value x and a hash of
x’s representation:
type 'a auth = 'a * string
let get_hash (a,h) = h
It will be up to the rest of the functions in this module to maintain the invariant that the second half of the pair will always be the hash code of the serialised representation of the first half.
Authenticated computations on the prover side are represented using a specialisation of the standard Writer monad, which collects a proof (list of JSON values) as a side-effect. This implementation is quite slow, because I have used lists, but could be speeded up by using a better data structure.
type 'a authenticated_computation =
proof * 'a
let return a =
([], a)
let (>>=) (prf,a) f =
let (prf',b) = f a in
(prf @ prf',b)
The prover's view of authenticatable values are ones for which it is possible to serialise them to JSON. We represent the evidence that such a thing is possible as the existence of a function from values to JSON values.
module Authenticatable = struct
type 'a evidence = 'a -> Ezjsonm.value
let auth (a,h) =
`String h
let pair a_serialiser b_serialiser (a,b) =
`A [a_serialiser a; b_serialiser b]
let sum a_serialiser b_serialiser = function
| `left a -> `A [`String "left"; a_serialiser a]
| `right b -> `A [`String "right"; b_serialiser b]
let string s = `String s
let int i = `String (string_of_int i)
end
In the auth case, we only serialise the hash code, not the
underlying value. This ensures that the prover does not end up sending
whole data structures back to the client. In the int case, I'm
serialising OCaml’s int values as strings, to avoid complications
arising from JSON’s use of floating point representations for numbers.
Now the auth and unauth functions. Creation of authenticated
values means pairing the value with its hashed serialised
representation:
let auth serialiser a =
(a, hash_json (serialiser a)))
Extracting the underlying value from an authenticated value has the “side effect” of producing a step in the proof, which is the JSON representation of the value we expect to see:
let unauth serialiser (a, h) =
([serialiser a], a)
Finally, we complete the definition of Prover with an end:
end
We get a prover-side Merkle tree implementation by instantiating the
Merkle module with the Prover implementation of AUTHENTIKIT:
module Merkle_Prover = Merkle (Prover)
Let’s make a little prover-side tree, in the OCaml REPL:
# let tree =
Merkle_Prover.(make_branch
(make_branch (make_leaf "a") (make_leaf "b"))
(make_branch (make_leaf "c") (make_leaf "d")));;
val tree : Merkle_Prover.tree = <abstr>
The returned value is abstract, because it is really a pair of the
underlying data and it hash code. Using the prover’s get_hash
function, we can get the hash code of the tree, which is what the
verifier will use to authenticate the prover’s actions:
# let code = Prover.get_hash tree;;
val code : string = ".z\129w\199J\224\\\254\220\bo\246W\158\243S\029\177\190"
We can also ask the prover to do queries on the tree. For example, if we ask for the value at position “left, left”, it returns the result “a”, and a proof that the verifier will be able to use to check that this result actually came from the tree with the hash code above.
# let proof, result = Merkle_Prover.retrieve [`L;`L] tree;;
val proof : proof =
[`A [ `String "right"
; `A [ `String "?\250m&,\251\129\031\r\252QJ\001\141|d}\242\016l"
; `String "i?B\230p\158D\201\248\145\000\1400p\224\018\023\219\1935"
]
]
; `A [ `String "right"
; `A [ `String "X\140\005\028\146\1891L\224\246\224\229\201\018o\b\187\163\240\160"
; `String "\223\231\194\230\1362=\157\187\226;?\143>\127\248\014;\201\254"
]
]
; `A [`String "left"; `String "a"]
]
val result : string option = Some "a"
This proof looks quite long, and in this case is several times the size of the original database. So it might look like Merkle trees might not save us anything. In general, however, the proof size is logarithmic in the size of the tree and so will be much smaller than sending the entire tree.
We’ve got a prover generating query responses and proofs, but this is a bit useless if we don’t have a verifier to check them.
Our verifier is another implementation of the AUTHENTIKIT signature,
with the additional information that authenticated computations are
now functions that consume proofs and return either a value (and
possibly some left-over proof) or return failure. Also, we specify
that, on the verifier side, an authenticated value is represented as
just its hash code (represented as an OCaml string).
module Verifier : sig
include AuthentiKit
with type 'a authenticated_computation =
proof -> [ `Ok of proof * 'a | `ProofFailure ]
and type 'a auth = string
end = struct
The basic idea behind the verifier is that, given a hash code and a proof, it checks the hash code against the proof, and then uses the hash code to rebuild the parts of the data structure that will be explored by the program.
We start by fulfilling our statement that the verifier’s view of authenticated values is as OCaml strings:
type 'a auth = string
And that the verifier’s version of authenticated computations is as a “parser” of proofs. Note how the definitions here are very similar to the definitions used for parser combinators. In some sense, verification is a process of “parsing” the proof sequence supplied by the prover.
type 'a authenticated_computation =
json list -> [`Ok of proof * 'a | `ProofFailure]
let return a =
fun proof -> `Ok (proof, a)
let (>>=) c f =
fun prfs ->
match c prfs with
| `ProofFailure -> `ProofFailure
| `Ok (prfs',a) -> f a prfs'
The verifier’s view of authenticable values is slightly more involved
than the prover’s, because it needs to be able to serialise and
deserialise values to and from JSON. There isn’t anything particularly
special going on here, but we do have to be careful to ensure that
this implementation uses the same format as the prover’s. (An obvious
improvement is to make the Prover and Verifier implementations
share an implementation of this sub-module.)
module Authenticatable = struct
type 'a evidence =
{ serialise : 'a -> Ezjsonm.value
; deserialise : Ezjsonm.value -> 'a option
}
let auth =
let serialise h = `String h
and deserialise = function
| `String s -> Some s
| _ -> None
in
{ serialise; deserialise }
let pair a_s b_s =
let serialise (a,b) =
`A [a_s.serialise a; b_s.serialise b]
and deserialise = function
| `A [x;y] ->
(match a_s.deserialise x, b_s.deserialise y with
| Some a, Some b -> Some (a,b)
| _ -> None)
| _ ->
None
in
{ serialise; deserialise }
let sum a_s b_s =
let serialise = function
| `left a -> `A [`String "left"; a_s.serialise a]
| `right b -> `A [`String "right"; b_s.serialise b]
and deserialise = function
| `A [`String "left"; x] ->
(match a_s.deserialise x with
| Some a -> Some (`left a)
| _ -> None)
| `A [`String "right"; y] ->
(match b_s.deserialise y with
| Some b -> Some (`right b)
| _ -> None)
| _ ->
None
in
{ serialise; deserialise }
let string =
let serialise s = `String s
and deserialise = function
| `String s -> Some s
| _ -> None
in
{ serialise; deserialise }
let int =
let serialise i = `String (string_of_int i)
and deserialise = function
| `String i -> (try Some (int_of_string i) with Failure _ -> None)
| _ -> None
in
{ serialise; deserialise }
end
Now we get to the crucial auth and unauth functions. Creation of
authenticated values on the verifier side is nothing more than
serialising them and computing their hash code:
open Authenticatable
let auth auth_evidence a =
hash_json (auth_evidence.serialise a))
Finally, we get to the actual proof checking. The unauth function is
supplied with a) a (de)serialiser for the type of value to be
produced; b) a hash code for the expected value; and c) a proof
which will be used to reconstitute the actual value. If we have run
out of proof elements, then we fail immediately. Otherwise, we check
that the hash code of the first item in the proof is the same as our
required code. If so, we deserialise the proof item and return it with
the remainder of the proof. If the hash codes do not match, then the
verifier reports that proof checking has failed.
let unauth auth_evidence h proof = match proof with
| [] -> `ProofFailure
| p::ps when hash_json p = h ->
(match auth_evidence.deserialise p with
| None -> `ProofFailure
| Some a -> `Ok (ps, a))
| _ -> `ProofFailure
Finally, the Verifier ends with an end:
end
Above, we used the Merkle_Prover module to generate a small Merkle
tree and run a query on it. We can now verify the prover’s execution
of retrieve. Recall that we extracted a hash code representing the
tree from the prover’s representation:
val code : string = ".z\129w\199J\224\\\254\220\bo\246W\158\243S\029\177\190"
Let’s assume that this code was conveyed to the client somehow, and the client trusts that it is an accurate representation of the tree it wants to query.
Now the client/verifier asks the server/prover to perform a query. The server sends back the result and a proof, which we computed above:
val proof : proof =
[`A [ `String "right"
; `A [ `String "?\250m&,\251\129\031\r\252QJ\001\141|d}\242\016l"
; `String "i?B\230p\158D\201\248\145\000\1400p\224\018\023\219\1935"
]
]
; `A [ `String "right"
; `A [ `String "X\140\005\028\146\1891L\224\246\224\229\201\018o\b\187\163\240\160"
; `String "\223\231\194\230\1362=\157\187\226;?\143>\127\248\014;\201\254"
]
]
; `A [`String "left"; `String "a"]
]
val result : string option = Some "a"
We can now use Merkle_Verifier to verify the prover’s proof against
code:
# Merkle_Verifier.retrieve [`L;`L] code proof;;
- : [ `Ok of proof * string option | `ProofFailure ] = `Ok ([], Some "a")
Let’s try simulating an attempt by the prover to trick the verifier by running the query against a different tree. First we create a tree with the same shape, but with different values in it:
let other_tree =
Merkle_Prover.(make_branch
(make_branch (make_leaf "A") (make_leaf "B"))
(make_branch (make_leaf "C") (make_leaf "D")));;
Now we run the “left, left” query on this alternative tree, and we get back a result and a proof:
# let proof, result = Merkle_Prover.retrieve [`L;`L] other_tree;;
val proof : proof =
[ `A [`String "right"
; `A [`String "Q\217G\000\246\238!\248\212\127\194\184\179>\017zW\0182("
; `String "0\239\238\002b4\172\145\127\143\002@-=g\179\197\022\154|"
]
]
; `A [`String "right"
; `A [`String "\157\134\234N\234QS\136\165\196\0038\133j\018uY\133\030\005"
; `String "\2272|\223M\230\241\132\167\029-\016\141\221\005nx\239\190\184"
]
]
; `A [`String "left"; `String "A"]
]
val result : bytes option = Some "A"
We send the result and proof back to the sceptical client, who verifies the proof against their hash code:
# Merkle_Verifier.retrieve [`L;`L] code proof;;
- : [ `Ok of proof * bytes option | `ProofFailure ] = `ProofFailure
It fails! The cheating server has been thwarted.
Of course, the examples above don’t prove that the server/prover can never cheat. For their calculus, Miller et al. provide a proof that the prover and verifier implementations satisfy the following property, which I’ve stated informally here:
If the verifier accepts a proof (i.e., returns
Ok) then either it came from the prover, or is the result of a hash collision.
If we assume that hash collisions are unlikely (Miller et al. make this assumption more formal), then we get the security property we want: up to the possibility of hash collision, we can spot cheating clouds.
I claim, but I’m not going to prove here, that this property is
provable as an instance of the
“Free Theorem”
we get from the fact that the Merkle implementation is parameterised
by AUTHENTIKIT implementations. The full proof involves
F-ing Modules
to translate the use of modules and functors into the higher-kinded
calculus System Fω, my
Relational Parametricity for Higher Kinds
for the higher-kinded version of free theorems, and Katsumata’s
A Semantic Formulation of TT-lifting and Logical Predicates for Computational Metalanguage
to handle the authenticated computations monads.