Computing Cryptographic Hashes for Cyclic Dependencies

 

"Much more than encryption algorithms, one-way hash functions are the workhorses of modern cryptography." —Bruce Schneier

 

Cryptographic hashes (or one-way hash functions) allow us to compute a digest that uniquely identifies a resource. If we make a small change anywhere in a resource, its digest also changes—drastically, because of the Avalanche effect.

 

crypto-hashes-001.png
Figure 1. Notice the small, single letter change in the input resource in the third row and the corresponding, drastic changes to its digest. Cf. citation.

 

This characteristic makes the hashes very practical for detecting changes in applications that deal with dependency trees. If we include the cryptographic hashes of the dependencies of a resource in the resource's own cryptographic hash, we have a cheap way to check if a resource or any of its dependencies were changed. This pattern is commonly used in the software world. For example, think of git hashes, blockchains, and the nix build system.

 

Implementation is theoretically not that hard, given a good cryptographic library. However, it gets trickier when we want to allow dependency cycles. At that point, we get a bit of a chicken-and-egg problem: we need resource A’s hash to compute resource B’s hash, but that in turn depends on B’s hash again!

 

Cyclic Cloud Resources

Let's start by giving a more concrete example of how we encountered this problem. In Fugue, we are dealing with constructing complicated cloud infrastructure. Cyclic dependencies between resources inevitably pop up once in awhile.

 

For example, it’s not uncommon to have two Security Groups that refer to each other. Imagine you have a default security group, which contains the core of your application, and a monitoring security group which performs health checks. Two considerations are key:

 

  • You want to allow traffic from monitoring to default in order to perform external sanity checks;
  • You also want to allow traffic from default to monitoring in order to perform heartbeat-like checks or to have your application send detailed information.

 

Configuring these in mainstream configuration languages often leads to annoying cyclic dependency issues, which in turn require annoying and verbose workarounds. In Fugue, we use a statically typed, declarative programming language called Ludwig to configure infrastructure. This allows us to solve the problem in a neat way.

 

In this blog post, we’ll study how we can solve part of the problem—detecting changes in these possibly cyclic dependency graphs.

 

Literate Haskell

It’s useful to step back for a quick moment and make sure you’ll have easy access to the logic and code involved in our study. This blogpost is written in Literate Haskell, which is an implementation of literate programming for the Haskell programming language. The term, literate programming, was coined in the early 1980s by Dr. Donald Knuth, who cited CWEB as his favorite programming language.

 

In literate programming, we communicate an idea by presenting a human-readable text interspersed with snippets of code. This is the reverse of "normal" programming, where we intersperse our code with comment snippets.

 

It’s perfect for technical blog posts … and, of course, we really like Haskell.

 

As a result, you can compile this blog post easily with a Haskell compiler, or even better, load it into the GHCi REPL and play around with it. In order to do that, download the code to this blog post and load it in GHCi:

<code>$ ghci cyclic-hash.lhs<span class="hljs-typename">GHCi, <span class="hljs-variable">version</span> 7.10.3:</span> <span class="hljs-record">http://<span class="hljs-variable">www</span>.<span class="hljs-variable">haskell</span>.<span class="hljs-variable">org</span>/<span class="hljs-variable">ghc</span>/ :? <span class="hljs-variable">for</span> <span class="hljs-variable">help</span></span><span class="hljs-array">[<span class="hljs-number">1</span> <span class="hljs-keyword">of</span> <span class="hljs-number">1</span>]</span> Compiling Data.CyclicHash  ( cyclic-hash.lhs, interpreted )<span class="hljs-typename">Ok, <span class="hljs-variable">modules</span> <span class="hljs-variable">loaded</span>:</span> Data.CyclicHash.*Data.CyclicHash&gt;</code>

In order for the blog post to work as a standalone Haskell module, we need to do some setup first. We start with a reasonably standard Haskell module header and declare the language pragmas that we need:

<code><span class="xml"></span><span class="hljs-expression">{<span class="hljs-variable">-</span><span class="hljs-begin-block"># LANGUAGE BangPatterns </span><span class="hljs-begin-block">#-</span>}</span><span class="xml"> </span><span class="hljs-expression">{<span class="hljs-variable">-</span><span class="hljs-begin-block"># LANGUAGE FlexibleContexts </span><span class="hljs-begin-block">#-</span>}</span><span class="xml"> </span><span class="hljs-expression">{<span class="hljs-variable">-</span><span class="hljs-begin-block"># LANGUAGE GADTs </span><span class="hljs-begin-block">#-</span>}</span><span class="xml"> </span><span class="hljs-expression">{<span class="hljs-variable">-</span><span class="hljs-begin-block"># LANGUAGE MultiParamTypeClasses </span><span class="hljs-begin-block">#-</span>}</span><span class="xml"> </span><span class="hljs-expression">{<span class="hljs-variable">-</span><span class="hljs-begin-block"># LANGUAGE TypeFamilies </span><span class="hljs-begin-block">#-</span>}</span><span class="xml"></span></code>

Then, we describe the exported interface of this module. Following good technical design standards, we try to keep this to a minimum:

<code class="language-haskell"><span class="hljs-module"><span class="hljs-keyword">module</span> Data.CyclicHash <span class="hljs-container">( <span class="hljs-type">CyclicHash</span> (..)</span> , HashField <span class="hljs-container">(..)</span> , cyclicHashes ) <span class="hljs-keyword">where</span></span></code>

Lastly, we have to import a number of other modules we rely on. Most of these modules ship with the base library provided by the compiler, and the others are provided by three very commonly used libraries: containers, hashable, and unordered-containers.

<code><span class="hljs-typename">import C<span class="hljs-variable">ontrol</span>.M<span class="hljs-variable">onad</span> (<span class="hljs-variable">forM_</span>) <span class="hljs-variable">import</span> D<span class="hljs-variable">ata</span>.H<span class="hljs-variable">ashable</span> (H<span class="hljs-variable">ashable</span>, <span class="hljs-variable">hashWithSalt</span>) <span class="hljs-variable">import</span> D<span class="hljs-variable">ata</span>.L<span class="hljs-variable">ist</span> (<span class="hljs-variable">foldl</span>') <span class="hljs-variable">import</span> D<span class="hljs-variable">ata</span>.M<span class="hljs-variable">aybe</span> (<span class="hljs-variable">fromMaybe</span>, <span class="hljs-variable">maybeToList</span>) <span class="hljs-variable">import</span> T<span class="hljs-variable">ext</span>.P<span class="hljs-variable">rintf</span> (<span class="hljs-variable">printf</span>) <span class="hljs-variable">import</span> <span class="hljs-variable">qualified</span> D<span class="hljs-variable">ata</span>.G<span class="hljs-variable">raph</span> <span class="hljs-variable">as</span> G <span class="hljs-variable">import</span> <span class="hljs-variable">qualified</span> D<span class="hljs-variable">ata</span>.H<span class="hljs-variable">ashMap</span>.S<span class="hljs-variable">trict</span> <span class="hljs-variable">as</span> HMS</span></code>

 

Problem Definition

The simple example of cyclic dependencies that we’ll explore can be modeled in a trust graph. We want to compute cryptographic hashes for all the people in this trust graph. Because trust can be mutual, we quickly encounter the problem of dependency cycles here.

 

Let's look at the data type we will use to model our problem:

<code class="language-haskell"><span class="hljs-typedef"><span class="hljs-keyword">type</span> <span class="hljs-type">Username</span> = <span class="hljs-type">String</span></span><span class="hljs-typedef"><span class="hljs-keyword">data</span> <span class="hljs-type">Person</span> = <span class="hljs-type">Person</span></span>    { pUserName :: <span class="hljs-type">Username</span>    , pFullName :: <span class="hljs-type">String</span>    , pTrusts   :: [<span class="hljs-type">Username</span>]    } <span class="hljs-keyword">deriving</span> (<span class="hljs-type">Show</span>)</code>

A Person has a Username, which we will use as a key, and a full name. A Person also has a list of other people he or she trusts. The cryptographic hash of a Person consists of the username, the full name, and the hashes of the people this person trusts.

 

The CyclicHash Class

We start out with a class that captures the essence of computing hashes for cyclic data structures.

 

We need an associated type synonym, Key, which gives us a unique key for the value. We also have a corresponding key function to obtain this key.

 

Usually the hash of a data type is a hash of all the fields in that data structure. This is why we have the second important function, fields. It returns a list of all the fields we want to hash:

<code class="language-haskell"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-container">(<span class="hljs-type">Hashable</span> (<span class="hljs-type">Key</span> <span class="hljs-title">a</span>)</span>, <span class="hljs-type">Ord</span> <span class="hljs-container">(<span class="hljs-type">Key</span> <span class="hljs-title">a</span>)</span>) =&gt; <span class="hljs-type">CyclicHash</span> a <span class="hljs-keyword">where</span></span>    <span class="hljs-typedef"><span class="hljs-keyword">type</span> <span class="hljs-type">Key</span> a</span>    key    :: a -&gt; <span class="hljs-type">Key</span> a    fields :: a -&gt; [<span class="hljs-type">HashField</span> a]</code>

The fields we want to hash are represented using an intermediate data type.

 

We make a distinction between two kinds of fields:

 

  • Primitive fields (Int, String, and other things that have a reasonable Hashable instance). These just contain the value that needs to be hashed.
  • Dependency fields which refer to other values. These are the ones we need to be careful about, since they can introduce dependency cycles.

 

Note:

<code class="language-haskell"><span class="hljs-typedef"><span class="hljs-keyword">data</span> <span class="hljs-type">HashField</span> a where</span>    <span class="hljs-type">PrimitiveField</span>  :: <span class="hljs-type">Hashable</span> b =&gt; b -&gt; <span class="hljs-type">HashField</span> a    <span class="hljs-type">DependencyField</span> :: <span class="hljs-type">Key</span> a -&gt; <span class="hljs-type">HashField</span> a</code>

We can now write down the CyclicHash instance for Person. It is fairly straightforward: pUserName and pFullName contribute to the hash as primitive fields, and we create a DependencyField for every person in pTrusts:

<code class="language-haskell"><span class="hljs-class"><span class="hljs-keyword">instance</span> <span class="hljs-type">CyclicHash</span> <span class="hljs-type">Person</span> <span class="hljs-keyword">where</span></span>    <span class="hljs-typedef"><span class="hljs-keyword">type</span> <span class="hljs-type">Key</span> <span class="hljs-type">Person</span> = <span class="hljs-type">Username</span></span>    key    p = pUserName p    fields p =        [ <span class="hljs-type">PrimitiveField</span> (pUserName p)        , <span class="hljs-type">PrimitiveField</span> (pFullName p)        ] ++        map <span class="hljs-type">DependencyField</span> (pTrusts p)</code>

Lastly, let's add an auxiliary method that gives us simply the dependencies of a node. We can do this by finding all the DependencyField fields and concatenating their keys in a list:

<code class="language-haskell"><span class="hljs-title">dependencies</span> :: <span class="hljs-type">CyclicHash</span> a =&gt; a -&gt; [<span class="hljs-type">Key</span> a]<span class="hljs-title">dependencies</span> node = [k | <span class="hljs-type">DependencyField</span> k &lt;- fields node]</code>

 

Aside: The Hashable Library

In this blogpost, we use the hashable library because of its easy interface and wide adoption. However, we must keep in mind that this library does not provide us with cryptographically strong hash functions.

 

In production code, it would be better to use an algorithm like SHA-2. The idea behind the cyclic hashing stays the same though, so updating this code to use such an algorithm is not a hard task.

 

A Primitive Hash

We can start by defining a "primitive" hash function, which does not look at the dependencies of a node:

<code class="language-haskell"><span class="hljs-title">primitiveHash</span> :: <span class="hljs-type">CyclicHash</span> a =&gt; <span class="hljs-type">Int</span> -&gt; a -&gt; <span class="hljs-type">Int</span><span class="hljs-title">primitiveHash</span> salt =    foldl' hashField salt . fields  <span class="hljs-keyword">where</span>    hashField s (<span class="hljs-type">PrimitiveField</span>  x) =        s `hashWithSalt` (<span class="hljs-number">0</span> :: <span class="hljs-type">Int</span>) `hashWithSalt` x    hashField s (<span class="hljs-type">DependencyField</span> k) =        s `hashWithSalt` (<span class="hljs-number">1</span> :: <span class="hljs-type">Int</span>) `hashWithSalt` k</code>

 

A Component Hash

Now, we want some way to take the hashes of the dependencies into account. But that is impossible to do if we have a trust graph like this:

crypto-hashes-003.png
Figure 2. A simple trust graph.

 

In Haskell, this looks like:

<code class="language-haskell"><span class="hljs-title">_people</span> :: [<span class="hljs-type">Person</span>]<span class="hljs-title">_people</span> =    [ <span class="hljs-type">Person</span> <span class="hljs-string">"rob"</span>    <span class="hljs-string">"Robert Smith"</span>    [<span class="hljs-string">"mary"</span>]    , <span class="hljs-type">Person</span> <span class="hljs-string">"mary"</span>   <span class="hljs-string">"Mary Johnson"</span>   [<span class="hljs-string">"rob"</span>]    , <span class="hljs-type">Person</span> <span class="hljs-string">"susan"</span>  <span class="hljs-string">"Susan Williams"</span>  []    , <span class="hljs-type">Person</span> <span class="hljs-string">"david"</span>  <span class="hljs-string">"David Brown"</span>     [<span class="hljs-string">"mary"</span>, <span class="hljs-string">"susan"</span>]    ]</code>

Mary's hash depends on Rob's hash and vice versa! An elegant way to solve this is provided to us by computing the strongly connected components of the graph.

 

We will be able to compute the strongly connected components using the vastly underrated Data.Graph module. The output for the example graph we saw above is:

crypto-hashes-002.png
Figure 3. Output for the simple trust graph.

 

A very important property of the resulting components is that, if we look at the graph of components, there cannot be any cycles. This is easy to see; if there were a cycle between two or more components, they would belong to the same cyclic component!

 

Now, we can compute a hash per component. The simplest strongly connected component is a single vertex (represented by the G.AcyclicSCC constructor). In this case, we can rely on the primitiveHash function we saw above. Otherwise, we have a list of vertices (G.CyclicSCC). We can hash these together by using a fold.

 

Note that this hash also contains the shape of the component:

<code class="language-haskell"><span class="hljs-title">componentHash</span> :: <span class="hljs-type">CyclicHash</span> a =&gt; <span class="hljs-type">Int</span> -&gt; <span class="hljs-type">G</span>.<span class="hljs-type">SCC</span> a -&gt; <span class="hljs-type">Int</span><span class="hljs-title">componentHash</span> salt0 (<span class="hljs-type">G</span>.<span class="hljs-type">AcyclicSCC</span> x) =    salt0 `hashWithSalt` (<span class="hljs-number">0</span> :: <span class="hljs-type">Int</span>) `primitiveHash` x<span class="hljs-title">componentHash</span> salt0 (<span class="hljs-type">G</span>.<span class="hljs-type">CyclicSCC</span> xs0) =    salt0 `hashWithSalt` (<span class="hljs-number">1</span> :: <span class="hljs-type">Int</span>) `hashPrimitives` xs0  <span class="hljs-keyword">where</span>    <span class="hljs-comment">-- | Chain of primitiveHash.</span>    hashPrimitives salt = foldl' primitiveHash salt</code>

This gives us something like the following for our example; we use <+> as an informal operator to denote a combination of hashes:

<code class="language-haskell">componentHash(A) = 0 &lt;+&gt; primitiveHash(susan)componentHash(B) = 1 &lt;+&gt; primitiveHash(rob) &lt;+&gt; primitiveHash(mary)componentHash(C) = 0 &lt;+&gt; primitiveHash(david)</code>

Note that the <+> operator is not commutative, so the order in which we do the fold is important. Fortunately, the Data.Graph library takes care of that for us. The nodes returned in a component are sorted in the order we need.

 

A Component Hash Per Node

With the help of the componentHash function, we can define another function which gives us the map containing the hash for each component, by node:

<code class="language-haskell"><span class="hljs-title">componentHashes</span> :: <span class="hljs-type">CyclicHash</span> a =&gt; <span class="hljs-type">Int</span> -&gt; [<span class="hljs-type">G</span>.<span class="hljs-type">SCC</span> a] -&gt; <span class="hljs-type">HMS</span>.<span class="hljs-type">HashMap</span> (<span class="hljs-type">Key</span> a) <span class="hljs-type">Int</span><span class="hljs-title">componentHashes</span> salt sccs = <span class="hljs-type">HMS</span>.fromList $ <span class="hljs-keyword">do</span>    scc  &lt;- sccs    <span class="hljs-keyword">let</span> !hash = componentHash salt scc    node &lt;- <span class="hljs-type">G</span>.flattenSCC scc    return (key node, hash)</code>

For the example graph, this gives us the following mapping:

<code class="language-haskell">rob   =&gt; componentHash(B)david =&gt; componentHash(C)susan =&gt; componentHash(A)mary  =&gt; componentHash(B)</code>

 

The Cyclic Hash

Now, we have almost all the ingredients to compute the full cyclic hash for a value. What we want to do is easy to informally define, but a bit harder to implement.

 

Since we know there are no cyclic dependencies between the strongly connected components of the graph, we can run through them in order now.

 

1. Let's start with component A which is the simplest. Since the nodes in this component have no dependencies whatsoever, we get:

<code class="language-haskell">cyclicHash(susan) = primitiveHash(susan)</code>

2. For component B, it's a bit more complicated. Ideally, we would have something like this, which is of course impossible:

<code class="language-haskell">cyclicHash(rob) = primitiveHash(rob) &lt;+&gt; cyclicHash(mary)cyclicHash(mary) = primitiveHash(mary) &lt;+&gt; cyclicHash(rob)</code>

But we can fix this by, instead of taking the cyclic hashes of nodes in the same component, taking the component hashes, since these hashes also contain the whole component! This gives us:

<code class="language-haskell">cyclicHash(rob) = primitiveHash(rob) &lt;+&gt; componentHash(B)cyclicHash(mary) = primitiveHash(mary) &lt;+&gt; componentHash(B)</code>

3. When we arrive at component C, we have already computed the cyclic hashes of its dependencies, so we simply get:

<code class="language-haskell">cyclicHash(david) =    primitiveHash(david) &lt;+&gt;    cyclicHash(mary)      &lt;+&gt;    cyclicHash(susan)</code>

The implementation of cyclicHash is similar to primitiveHash, except we now pass in an argument dependencyHashes. The idea is that this will contain the cyclicHashes of actual dependencies and the componentHashes of cyclic dependencies (which are in the same component). Fortunately, we don't really need to check which of these cases we are in—we can just grab the hash from the map:

<code class="language-haskell"><span class="hljs-title">cyclicHash</span> :: <span class="hljs-type">CyclicHash</span> a =&gt; <span class="hljs-type">HMS</span>.<span class="hljs-type">HashMap</span> (<span class="hljs-type">Key</span> a) <span class="hljs-type">Int</span> -&gt; <span class="hljs-type">Int</span> -&gt; a -&gt; <span class="hljs-type">Int</span><span class="hljs-title">cyclicHash</span> dependencyHashes salt0 = foldl' hashField salt0 . fields  <span class="hljs-keyword">where</span>    hashField !s (<span class="hljs-type">PrimitiveField</span>  x) =        s `hashWithSalt` (<span class="hljs-number">0</span> :: <span class="hljs-type">Int</span>) `hashWithSalt` x    hashField !s (<span class="hljs-type">DependencyField</span> k) =        <span class="hljs-keyword">let</span> !x = fromMaybe <span class="hljs-number">0</span> $ <span class="hljs-type">HMS</span>.lookup k dependencyHashes <span class="hljs-keyword">in</span>        s `hashWithSalt` (<span class="hljs-number">1</span> :: <span class="hljs-type">Int</span>) `hashWithSalt` x</code>

 

Tying It All Together

Finally, it's time to put everything together in a single function that users can call. In fact, this is the only top-level function exported from this module!

 

The function takes a number of values and returns the values paired with their hashes. Of course, the interesting bit is how we construct fullHashes:

<code class="language-haskell"><span class="hljs-title">cyclicHashes</span> :: <span class="hljs-type">CyclicHash</span> a =&gt; <span class="hljs-type">Int</span> -&gt; [a] -&gt; [(a, <span class="hljs-type">Int</span>)]<span class="hljs-title">cyclicHashes</span> salt0 nodes =    [ (n, h)    | n &lt;- nodes    , h &lt;- maybeToList $ <span class="hljs-type">HMS</span>.lookup (key n) fullHashes    ]</code>

We start off by calculating the strongly connected components using the Data.Graph library. Then, we calculate a hash for every component, using the componentHashes we defined before:

<code class="language-haskell">  <span class="hljs-keyword">where</span>    comps      = <span class="hljs-type">G</span>.stronglyConnComp [(n, key n, dependencies n) | n &lt;- nodes]    compHashes = componentHashes salt0 comps</code>

It is important to note that G.stronglyConnComp returns the components in topological order. For our use case, this means that the components are in the "right" order with regards to dependencies: if y depends on x, x will always be placed before y in the list.

 

As a result, we can just fold through the comps components. The accumulator we use is the map we calculated using componentHashes. Initially, this map contains the componentHash for every node, but as we fold through the list, we update it to contain the full hash for every node. This is exactly what cyclicHash expects!

<code class="language-haskell">    fullHashes = foldl'        (acc scc -&gt;            <span class="hljs-keyword">let</span> !newHashes = <span class="hljs-type">HMS</span>.fromList                      [ (key x, cyclicHash acc salt0 x)                      | x &lt;- <span class="hljs-type">G</span>.flattenSCC scc                      ]            <span class="hljs-keyword">in</span> <span class="hljs-type">HMS</span>.union newHashes acc) compHashes comps</code>

 

Full Example Code

Let's add a quick printing function to check the results:

<code class="language-haskell"><span class="hljs-title">_printCyclicHashes</span> :: (<span class="hljs-type">CyclicHash</span> a, <span class="hljs-type">Show</span> (<span class="hljs-type">Key</span> a)) =&gt; [a] -&gt; <span class="hljs-type">IO</span> ()<span class="hljs-title">_printCyclicHashes</span> values =    forM_ (cyclicHashes <span class="hljs-number">42</span> values) $ (x, hash) -&gt;        printf <span class="hljs-string">"%s: %xn"</span> (show (key x)) hash</code>
<code class="language-haskell">*<span class="hljs-type">Data</span>.<span class="hljs-type">CyclicHash</span>&gt; _printCyclicHashes _people<span class="hljs-string">"rob"</span>: <span class="hljs-number">9923</span>cc533bb3432<span class="hljs-string">"mary"</span>: e8c6608ddd959521<span class="hljs-string">"susan"</span>: <span class="hljs-number">6</span>dd70fb3a3e127ed<span class="hljs-string">"david"</span>: <span class="hljs-number">86</span>b49a12d999cc35</code>

Then, after we edit, for example, mary's name, we get:

<code class="language-haskell">*<span class="hljs-type">Data</span>.<span class="hljs-type">CyclicHash</span>&gt; _printCyclicHashes _people<span class="hljs-string">"rob"</span>: <span class="hljs-number">201403e3</span>ca292874<span class="hljs-string">"mary"</span>: <span class="hljs-number">8370448e0</span>b8598cf<span class="hljs-string">"susan"</span>: <span class="hljs-number">6</span>dd70fb3a3e127ed<span class="hljs-string">"david"</span>: <span class="hljs-number">853624228</span>d76fe7f</code>

Seems like it is behaving as expected!

 

It goes without saying that algorithms like this need more rigorous testing, but the concept holds up to initial scrutiny.

 

Secure Your Cloud

Find security and compliance violations in your cloud infrastructure and ensure they never happen again.