<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Tim Taubert]]></title>
  <link href="https://timtaubert.de/atom.xml" rel="self"/>
  <link href="https://timtaubert.de/"/>
  <updated>2020-05-18T09:59:29+02:00</updated>
  <id>https://timtaubert.de/</id>
  <author>
    <name><![CDATA[Tim Taubert]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Bitslicing With Quine-McCluskey]]></title>
    <link href="https://timtaubert.de/blog/2018/08/bitslicing-with-quine-mccluskey/"/>
    <updated>2018-08-27T15:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2018/08/bitslicing-with-quine-mccluskey</id>
    <content type="html"><![CDATA[<p>Part one gave a short introduction to bitslicing as a concept, talked about
its use cases, truth tables, software multiplexers, LUTs, and manual optimization.</p>

<p>The second covered <a href="https://en.wikipedia.org/wiki/Karnaugh_map">Karnaugh mapping</a>,
a visual method to simplify Boolean algebra expressions that takes advantage of
humans’ pattern-recognition capability, but is unfortunately limited to at most
four inputs in its original variant.</p>

<p>Part three will introduce the <a href="https://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm">Quine-McCluskey algorithm</a>,
a tabulation method that, in combination with <a href="https://en.wikipedia.org/wiki/Petrick%27s_method">Petrick&rsquo;s method</a>,
can minimize circuits with an arbitrary number of input values. Both are relatively simple to implement in software.</p>

<blockquote><p><a href="https://timtaubert.de/blog/2018/08/bitslicing-an-introduction/">Part 1: Bitslicing, An Introduction</a><br/>
<a href="https://timtaubert.de/blog/2018/08/bitslicing-with-karnaugh-maps/">Part 2: Bitslicing with Karnaugh maps</a><br/>
Part 3: Bitslicing with Quine-McCluskey</p></blockquote>

<h2>The Quine-McCluskey algorithm</h2>

<p>Here is the 3-to-2-bit <a href="https://en.wikipedia.org/wiki/S-box">S-box</a> from the
previous posts again:</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="n">SBOX</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">0</span> <span class="p">};</span>
</pre></div></figure>


<p>Without much ado, we&rsquo;ll jump right in and bitslice functions for both its
output bits in parallel. You&rsquo;ll probably recognize a few similarities to K-maps,
except that the steps are rather mechanical and don&rsquo;t require visual
pattern-recognition abilities.</p>

<h3>Step 1: Listing minterms</h3>

<p>The lookup table <code>SBOX[]</code> can be expressed as the Boolean functions
<em>f<sub>L</sub>(a,b,c)</em> and <em>f<sub>R</sub>(a,b,c)</em>. Here are their truth tables,
with each combination of inputs assigned a symbol <em>m<sub>i</sub></em>. Rows
<em>m<sub>0</sub>-m<sub>7</sub></em> will be called <em>minterms</em>.</p>

<div class="table-wrapper minterms">
  <table>
    <caption>f<sub>L</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th></th>
        <th>a</th>
        <th>b</th>
        <th>c</th>
        <th>f<sub>L</sub></th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>m<sub>0</sub></td><td>0</td><td>0</td><td>0</td><td>0</td>
      </tr>
      <tr>
        <td>m<sub>1</sub></td><td>0</td><td>0</td><td>1</td><td>0</td>
      </tr>
      <tr>
        <td>m<sub>2</sub></td><td>0</td><td>1</td><td>0</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>3</sub></td><td>0</td><td>1</td><td>1</td><td>0</td>
      </tr>
      <tr>
        <td>m<sub>4</sub></td><td>1</td><td>0</td><td>0</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>5</sub></td><td>1</td><td>0</td><td>1</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>6</sub></td><td>1</td><td>1</td><td>0</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>7</sub></td><td>1</td><td>1</td><td>1</td><td>0</td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>R</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th></th>
        <th>a</th>
        <th>b</th>
        <th>c</th>
        <th>f<sub>R</sub></th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>m<sub>0</sub></td><td>0</td><td>0</td><td>0</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>1</sub></td><td>0</td><td>0</td><td>1</td><td>0</td>
      </tr>
      <tr>
        <td>m<sub>2</sub></td><td>0</td><td>1</td><td>0</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>3</sub></td><td>0</td><td>1</td><td>1</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>4</sub></td><td>1</td><td>0</td><td>0</td><td>0</td>
      </tr>
      <tr>
        <td>m<sub>5</sub></td><td>1</td><td>0</td><td>1</td><td>0</td>
      </tr>
      <tr>
        <td>m<sub>6</sub></td><td>1</td><td>1</td><td>0</td><td>1</td>
      </tr>
      <tr>
        <td>m<sub>7</sub></td><td>1</td><td>1</td><td>1</td><td>0</td>
      </tr>
    </tbody>
  </table>
</div>


<p>We&rsquo;re interested only in the minterms where the function evaluates to <code>1</code> and
will ignore all others. Boolean functions can already be constructed with just
those tables. In <a href="https://en.wikipedia.org/wiki/Boolean_algebra">Boolean algebra</a>,
<em>OR</em> can be expressed as addition, <em>AND</em> as multiplication. The negation of <em>x</em>
is represented by <em><span style="text-decoration:overline">x</span></em>.</p>

<pre>
f<sub>L</sub>(a,b,c) = ∑ m(2,4,5,6)
          = m<sub>2</sub> + m<sub>4</sub> + m<sub>5</sub> + m<sub>6</sub>
          = <span style="text-decoration:overline">a</span>b<span style="text-decoration:overline">c</span> + a<span style="text-decoration:overline">b</span><span style="text-decoration:overline">c</span> + a<span style="text-decoration:overline">b</span>c + ab<span style="text-decoration:overline">c</span>

f<sub>R</sub>(a,b,c) = ∑ m(0,2,3,6)
          = m<sub>0</sub> + m<sub>2</sub> + m<sub>3</sub> + m<sub>6</sub>
          = <span style="text-decoration:overline">a</span><span style="text-decoration:overline">b</span><span style="text-decoration:overline">c</span> + <span style="text-decoration:overline">a</span>b<span style="text-decoration:overline">c</span> + <span style="text-decoration:overline">a</span>bc + ab<span style="text-decoration:overline">c</span>
</pre>


<p>Well, that&rsquo;s a start. Translated into C, these functions would be constant-time
but not even close to minimal.</p>
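<p>As a rough sketch of what that direct translation could look like: each
minterm becomes a three-way AND, and the minterms are ORed together. The names
<code>SBOXL_naive()</code>, <code>SBOXR_naive()</code>, and <code>check_naive()</code>
are made up for this illustration.</p>

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t SBOX[] = { 1, 0, 3, 1, 2, 2, 3, 0 };

/* f_L = m2 + m4 + m5 + m6, one three-way AND per minterm. */
uint8_t SBOXL_naive(uint8_t a, uint8_t b, uint8_t c) {
  return (uint8_t)((~a &  b & ~c) |  /* m2 = 010 */
                   ( a & ~b & ~c) |  /* m4 = 100 */
                   ( a & ~b &  c) |  /* m5 = 101 */
                   ( a &  b & ~c));  /* m6 = 110 */
}

/* f_R = m0 + m2 + m3 + m6. */
uint8_t SBOXR_naive(uint8_t a, uint8_t b, uint8_t c) {
  return (uint8_t)((~a & ~b & ~c) |  /* m0 = 000 */
                   (~a &  b & ~c) |  /* m2 = 010 */
                   (~a &  b &  c) |  /* m3 = 011 */
                   ( a &  b & ~c));  /* m6 = 110 */
}

/* Compares bit 0 of both outputs against the lookup table for all inputs. */
void check_naive(void) {
  for (uint8_t i = 0; i < 8; i++) {
    uint8_t a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
    uint8_t l = SBOXL_naive(a, b, c) & 1;
    uint8_t r = SBOXR_naive(a, b, c) & 1;
    assert(((l << 1) | r) == SBOX[i]);
  }
}
```

<p>That is eleven binary operators per output bit, not counting negations,
which is the baseline the following steps try to shrink.</p>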

<h3>Step 2: Bit Buckets</h3>

<p>Now that we have all these minterms, we&rsquo;ll put them in separate buckets based
on the number of <code>1</code>s in their inputs <em>a</em>, <em>b</em>, and <em>c</em>.</p>

<div class="table-wrapper buckets">
  <table>
    <caption>f<sub>L</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th># of 1s</th>
        <th>minterm</th>
        <th>binary</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>1</td><td>m<sub>2</sub></td><td >010</td>
      </tr>
      <tr>
        <td></td><td>m<sub>4</sub></td><td>100</td>
      </tr>
      <tr>
        <td>2</td><td>m<sub>5</sub></td><td>101</td>
      </tr>
      <tr>
        <td></td><td>m<sub>6</sub></td><td>110</td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>R</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th># of 1s</th><th>minterm</th><th>binary</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>0</td><td>m<sub>0</sub></td><td>000</td>
      </tr>
      <tr>
        <td>1</td><td>m<sub>2</sub></td><td>010</td>
      </tr>
      <tr>
        <td>2</td><td>m<sub>3</sub></td><td>011</td>
      </tr>
      <tr>
        <td></td><td>m<sub>6</sub></td><td>110</td>
      </tr>
    </tbody>
  </table>
</div>


<p>The reasoning here is the same as the <a href="https://en.wikipedia.org/wiki/Gray_code">Gray code</a>
ordering for Karnaugh maps. Two minterms that differ in a single variable always
sit in neighboring buckets. So for every minterm in bucket <em>n</em>, only bucket
<em>n+1</em> needs to be searched for matches; pairs with bucket <em>n-1</em> were
already found when that bucket was processed.</p>
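<p>In software, the bucket of a minterm is simply the population count of its
index. Here is a minimal sketch, assuming three input variables; the names
<code>bucket_minterms()</code> and <code>popcount8()</code> are invented for
this example.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VARS 3

/* Counts the 1 bits in x by repeatedly clearing the lowest set bit. */
static unsigned popcount8(uint8_t x) {
  unsigned n = 0;
  while (x) {
    x &= (uint8_t)(x - 1);
    n++;
  }
  return n;
}

/* Sorts minterm indices into buckets[k] by their number of 1 bits;
   counts[k] receives the number of entries in bucket k. */
void bucket_minterms(const uint8_t *minterms, size_t n,
                     uint8_t buckets[NUM_VARS + 1][1 << NUM_VARS],
                     size_t counts[NUM_VARS + 1]) {
  for (size_t k = 0; k <= NUM_VARS; k++) counts[k] = 0;
  for (size_t i = 0; i < n; i++) {
    assert(minterms[i] < (1 << NUM_VARS));
    unsigned k = popcount8(minterms[i]);
    buckets[k][counts[k]++] = minterms[i];
  }
}
```

<p>Fed with the minterms 2, 4, 5, and 6 of <em>f<sub>L</sub></em>, this fills
bucket 1 with <em>m<sub>2</sub></em> and <em>m<sub>4</sub></em>, and bucket 2 with
<em>m<sub>5</sub></em> and <em>m<sub>6</sub></em>, just like the table above.</p>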

<h3>Step 3: Merging minterms</h3>

<p>Why would you even look for pairs of minterms with a one-variable difference?
Because they can be merged to simplify our expression. These combinations are
called <em>minterms of size 2</em>.</p>

<p>All minterms have output <code>1</code>, so if two of them differ in exactly one input
variable, then the output is independent of it. For example, <code>(a &amp; ~b &amp; c) | (a &amp; b &amp; c)</code>
can be reduced to just <code>a &amp; c</code> because the expression&rsquo;s value is independent of <em>b</em>.</p>

<div class="table-wrapper buckets size2">
  <table>
    <caption>f<sub>L</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th># of 1s</th>
        <th>minterm</th>
        <th>binary</th>
        <th colspan="2">size-2</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>1</td><td>m<sub>2</sub></td><td >010</td><td>m<sub>2,6</sub></td><td>—10</td>
      </tr>
      <tr>
        <td></td><td>m<sub>4</sub></td><td>100</td><td>m<sub>4,5</sub></td><td>10—</td>
      </tr>
      <tr>
        <td></td><td></td><td></td><td>m<sub>4,6</sub></td><td>1—0</td>
      </tr>
      <tr>
        <td>2</td><td>m<sub>5</sub></td><td>101</td><td></td><td></td>
      </tr>
      <tr>
        <td></td><td>m<sub>6</sub></td><td>110</td><td></td><td></td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>R</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th># of 1s</th><th>minterm</th><th>binary</th><th colspan="2">size-2</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>0</td><td>m<sub>0</sub></td><td>000</td><td>m<sub>0,2</sub></td><td>0—0</td>
      </tr>
      <tr>
        <td>1</td><td>m<sub>2</sub></td><td>010</td><td>m<sub>2,3</sub></td><td>01—</td>
      </tr>
      <tr>
        <td></td><td></td><td></td><td>m<sub>2,6</sub></td><td>—10</td>
      </tr>
      <tr>
        <td>2</td><td>m<sub>3</sub></td><td>011</td><td></td><td></td>
      </tr>
      <tr>
        <td></td><td>m<sub>6</sub></td><td>110</td><td></td><td></td>
      </tr>
    </tbody>
  </table>
</div>


<p>Always start with the minterms in the very first bucket at the top of the table.
For every minterm in bucket <em>n</em>, we try to find a minterm in bucket <em>n+1</em> with a
one-bit difference in the <em>binary</em> column. Any matches will be recorded as pairs
and entered into the <em>size-2</em> column of bucket <em>n</em>.</p>

<p><em>m<sub>2</sub>=010</em> and <em>m<sub>6</sub>=110</em> for example differ in only the first
input variable, <em>a</em>. They merge into <em>m<sub>2,6</sub>=—10</em>, with a dash marking
the position of the irrelevant input bit.</p>

<p>Once all minterms have been combined (as far as possible), we&rsquo;ll continue with the
next size. Minterms of size bigger than 1 have dashes for irrelevant input bits
and it&rsquo;s important to treat those as a &ldquo;third bit value&rdquo;. In other words, their
dashes must be at the same positions, otherwise they can&rsquo;t be merged.</p>

<p>There&rsquo;s nothing left to merge for <em>f<sub>L</sub>(a,b,c)</em> as all
its size-2 minterms are in the first bucket. For <em>f<sub>R</sub>(a,b,c)</em>, none
of the size-2 minterms in the first bucket match any of those in the second
because their dashes are all in different positions.</p>
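<p>The merge rule is easy to express with two bit masks per minterm. The
following is only a sketch under my own assumptions: the <code>implicant</code>
struct and <code>merge()</code> are hypothetical names, with one mask for the
fixed input bits and one marking the dashes.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A (possibly merged) minterm over up to 8 inputs. A set bit in `dashes`
   marks an irrelevant input; `bits` holds the values of the remaining
   inputs and is kept 0 at dash positions. */
typedef struct {
  uint8_t bits;
  uint8_t dashes;
} implicant;

/* Tries to merge x and y. Returns true and stores the result in *out if
   the dashes line up and exactly one fixed position differs. */
bool merge(implicant x, implicant y, implicant *out) {
  assert(out != 0);
  if (x.dashes != y.dashes) return false;  /* dashes act as a third value */
  uint8_t diff = x.bits ^ y.bits;
  if (diff == 0 || (diff & (diff - 1)) != 0) return false;  /* not one bit */
  out->dashes = (uint8_t)(x.dashes | diff);  /* differing bit becomes a dash */
  out->bits = (uint8_t)(x.bits & ~diff);
  return true;
}
```

<p>Merging <em>m<sub>2</sub>=010</em> with <em>m<sub>6</sub>=110</em> this way
yields bits <code>010</code> with a dash at the position of <em>a</em>, whereas
<em>m<sub>0,2</sub></em> and <em>m<sub>2,3</sub></em> are rejected because their
dashes differ.</p>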

<h3>Step 4: Prime Implicants</h3>

<p>All minterms from the previous step that can&rsquo;t be combined any further are
called <em>prime implicants</em>. Entering them into a table lets us check how well
they cover the original minterms determined by step 1.</p>

<p>If any prime implicant is the only one to cover a minterm, it&rsquo;s called an
<em>essential prime implicant</em> (marked with an asterisk). It&rsquo;s essential because
it must be included in the resulting minimal form, otherwise we&rsquo;d miss one of
the input value combinations.</p>

<div class="table-wrapper prime">
  <table>
    <caption>f<sub>L</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th></th>
        <th>m<sub>2</sub></th>
        <th>m<sub>4</sub></th>
        <th>m<sub>5</sub></th>
        <th>m<sub>6</sub></th>
        <th>abc</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>m<sub>2,6</sub>*</td><td class="essential">x</td><td></td><td></td><td>x</td><td>-10</td>
      </tr>
      <tr>
        <td>m<sub>4,5</sub>*</td><td></td><td>x</td><td class="essential">x</td><td></td><td>10-</td>
      </tr>
      <tr>
        <td>m<sub>4,6</sub>&nbsp;</td><td></td><td>x</td><td></td><td>x</td><td>1-0</td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>R</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th></th>
        <th>m<sub>0</sub></th>
        <th>m<sub>2</sub></th>
        <th>m<sub>3</sub></th>
        <th>m<sub>6</sub></th>
        <th>abc</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>m<sub>0,2</sub>*</td><td class="essential">x</td><td>x</td><td></td><td></td><td>0-0</td>
      </tr>
      <tr>
        <td>m<sub>2,3</sub>*</td><td></td><td>x</td><td class="essential">x</td><td></td><td>01-</td>
      </tr>
      <tr>
        <td>m<sub>2,6</sub>*</td><td></td><td>x</td><td></td><td class="essential">x</td><td>-10</td>
      </tr>
    </tbody>
  </table>
</div>


<p>Prime implicant <em>m<sub>2,6</sub>*</em> on the left for example is the only one that
covers <em>m<sub>2</sub></em>. <em>m<sub>4,5</sub>*</em> is the only one that covers
<em>m<sub>5</sub></em>. Not only is <em>m<sub>4,6</sub></em> not essential, but we actually
don&rsquo;t need it at all: <em>m<sub>4</sub></em> and <em>m<sub>6</sub></em> are already covered
by the essential prime implicants. All prime implicants of f<sub>R</sub>(a,b,c)
are essential, so we need all of them.</p>
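<p>Finding the essential prime implicants boils down to looking for table
columns with a single <em>x</em>. A small sketch, assuming each prime
implicant is described by a mask of the minterms it covers (the function name
<code>essential_primes()</code> is made up for this post):</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* covers[i] has bit m set if prime implicant i covers minterm m.
   minterms has bit m set if the function outputs 1 for minterm m.
   Returns a mask with bit i set if prime implicant i is essential. */
uint32_t essential_primes(const uint8_t *covers, size_t n, uint8_t minterms) {
  assert(n <= 32);
  uint32_t essential = 0;
  for (unsigned m = 0; m < 8; m++) {
    if (!((minterms >> m) & 1)) continue;  /* output is 0, nothing to cover */
    size_t count = 0, last = 0;
    for (size_t i = 0; i < n; i++) {
      if ((covers[i] >> m) & 1) {
        count++;
        last = i;
      }
    }
    /* A column with a single x makes that row essential. */
    if (count == 1) essential |= (uint32_t)1 << last;
  }
  return essential;
}
```

<p>For <em>f<sub>L</sub></em>, the rows <em>m<sub>2,6</sub></em>, <em>m<sub>4,5</sub></em>,
and <em>m<sub>4,6</sub></em> correspond to the masks <code>0x44</code>, <code>0x30</code>,
and <code>0x50</code>, and only the first two are reported as essential.</p>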

<p>When bitslicing functions with many input variables it may happen that you are
left with a number of non-essential prime implicants that can be combined in
various ways to cover the missing minterms. <a href="https://en.wikipedia.org/wiki/Petrick%27s_method">Petrick&rsquo;s method</a>
helps find a minimum solution. It&rsquo;s tedious to do manually, but not hard to
automate.</p>

<h3>Step 5: Minimal Forms</h3>

<p>Finally, we derive minimal forms of our Boolean functions by looking at the <em>abc</em>
column of the essential prime implicants. Input variables marked with dashes
are ignored.</p>

<pre>
f<sub>L</sub>(a,b,c) = m<sub>2,6</sub> + m<sub>4,5</sub> = b<span style="text-decoration:overline">c</span> + a<span style="text-decoration:overline">b</span>
</pre>


<p>The code for <code>SBOXL()</code> with 8-bit inputs:</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">SBOXL</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">(</span><span class="n">b</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">c</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">a</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">b</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<p><em>f<sub>R</sub>(a,b,c)</em>, reduced to the combination of its three essential prime implicants:</p>

<pre>
f<sub>R</sub>(a,b,c) = m<sub>0,2</sub> + m<sub>2,3</sub> + m<sub>2,6</sub> = <span style="text-decoration:overline">a</span><span style="text-decoration:overline">c</span> + <span style="text-decoration:overline">a</span>b + b<span style="text-decoration:overline">c</span>
</pre>


<p>And <code>SBOXR()</code> as expected:</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">SBOXR</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">(</span><span class="o">~</span><span class="n">a</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">c</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="o">~</span><span class="n">a</span> <span class="o">&amp;</span> <span class="n">b</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">b</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>
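<p>Reading terms off the <em>abc</em> column can be done mechanically, too.
The sketch below evaluates a single prime implicant on bitsliced inputs;
<code>eval_implicant()</code> and its bit layout (<em>a</em> is bit 2,
<em>c</em> is bit 0) are assumptions made for this example.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Evaluates one prime implicant on the bitsliced inputs in[0..2] = a, b, c.
   A set bit in `dashes` marks an ignored input; `bits` holds the required
   values of all other inputs, in the order of the abc column. */
uint8_t eval_implicant(uint8_t bits, uint8_t dashes, const uint8_t in[3]) {
  assert((bits & dashes) == 0);
  uint8_t term = 0xFF;
  for (unsigned v = 0; v < 3; v++) {
    unsigned pos = 2 - v;               /* in[0] = a sits at bit 2 */
    if ((dashes >> pos) & 1) continue;  /* dash: input is ignored */
    term &= ((bits >> pos) & 1) ? in[v] : (uint8_t)~in[v];
  }
  return term;
}
```

<p>ORing the results for <code>-10</code> and <code>10-</code> gives the same
output as <code>SBOXL()</code>, and the three terms <code>0-0</code>,
<code>01-</code>, and <code>-10</code> reproduce <code>SBOXR()</code>.</p>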


<p>Combining <code>SBOXL()</code> and <code>SBOXR()</code> yields the familiar version of <code>SBOX()</code>, after
eliminating common subexpressions and taking out common factors.</p>

<figure class='code'><div class="highlight"><pre><span class="k">void</span> <span class="nf">SBOX</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">l</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">na</span> <span class="o">=</span> <span class="o">~</span><span class="n">a</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nb</span> <span class="o">=</span> <span class="o">~</span><span class="n">b</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nc</span> <span class="o">=</span> <span class="o">~</span><span class="n">c</span><span class="p">;</span>

  <span class="k">uint8_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">b</span> <span class="o">&amp;</span> <span class="n">nc</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">b</span> <span class="o">|</span> <span class="n">nc</span><span class="p">;</span>

  <span class="o">*</span><span class="n">l</span> <span class="o">=</span> <span class="p">(</span><span class="n">a</span> <span class="o">&amp;</span> <span class="n">nb</span><span class="p">)</span> <span class="o">|</span> <span class="n">t0</span><span class="p">;</span>
  <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">na</span> <span class="o">&amp;</span> <span class="n">t1</span><span class="p">)</span> <span class="o">|</span> <span class="n">t0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>
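<p>Since there are only eight possible inputs, the result is cheap to check
exhaustively. Below is one way to do that, as a sketch: all eight inputs are
packed into a single call, one per bit position, which is how bitsliced code is
meant to be used. <code>SBOX_TABLE</code> and <code>check_sbox()</code> are
names I made up here to avoid clashing with the function.</p>

```c
#include <assert.h>
#include <stdint.h>

static const uint8_t SBOX_TABLE[] = { 1, 0, 3, 1, 2, 2, 3, 0 };

/* The bitsliced S-box from above, repeated so this compiles on its own. */
void SBOX(uint8_t a, uint8_t b, uint8_t c, uint8_t* l, uint8_t* r) {
  uint8_t na = ~a;
  uint8_t nb = ~b;
  uint8_t nc = ~c;

  uint8_t t0 = b & nc;
  uint8_t t1 = b | nc;

  *l = (a & nb) | t0;
  *r = (na & t1) | t0;
}

/* Bit i of a, b, and c holds the respective input bit of table index i,
   so bit i of l and r must hold the two output bits of SBOX_TABLE[i]. */
void check_sbox(void) {
  uint8_t l, r;
  SBOX(0xF0, 0xCC, 0xAA, &l, &r);
  for (unsigned i = 0; i < 8; i++) {
    unsigned out = (((l >> i) & 1u) << 1) | ((r >> i) & 1u);
    assert(out == SBOX_TABLE[i]);
  }
}
```

<p>The constants <code>0xF0</code>, <code>0xCC</code>, and <code>0xAA</code> are
the three input columns of the truth table, read from the bottom row up.</p>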


<h2>Bitslicing a DES S-box</h2>

<p>When I started writing this blog post I thought it would be nice to ditch the
small S-box from the previous posts, and naively bitslice a &ldquo;real&rdquo; S-box, like
the ones used in <a href="https://en.wikipedia.org/wiki/Data_Encryption_Standard">DES</a>.</p>

<p>These are just 6-to-4-bit S-boxes, so how much more effort can it be? As it turns out,
humans are terrible at understanding exponential growth. Here are my intermediate
results after an hour of writing, trying to bitslice just one of the four output
bits:</p>

<p><a href="https://timtaubert.de/images/des-bitslice.jpg" title="Bitslicing one output bit of a DES S-box manually" class="img"><img src="https://timtaubert.de/images/des-bitslice.jpg" title="Bitslicing one output bit of a DES S-box manually" ></a></p>

<p>I gave up when I spotted a few mistakes that would likely lead to a non-minimal
solution. Bitslicing a function with that many input variables manually is
laborious and probably not worth it, except that it definitely helped me
understand the steps of the algorithm better.</p>

<p>As mentioned in the beginning, Quine-McCluskey and Petrick&rsquo;s method can be
implemented in software rather easily, so that&rsquo;s what I did instead. I&rsquo;ll
explain how, and what to consider, in the next post.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Bitslicing With Karnaugh Maps]]></title>
    <link href="https://timtaubert.de/blog/2018/08/bitslicing-with-karnaugh-maps/"/>
    <updated>2018-08-18T15:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2018/08/bitslicing-with-karnaugh-maps</id>
    <content type="html"><![CDATA[<p><em>Bitslicing</em>, in cryptography, is the technique of converting arbitrary
functions into logic circuits, thereby enabling fast, constant-time
implementations of cryptographic algorithms immune to cache and
timing-related side channel attacks.</p>

<p>My last post <a href="https://timtaubert.de/blog/2018/08/bitslicing-an-introduction/">Bitslicing, An Introduction</a>
showed how to convert an S-box function into truth tables, then into a tree of
multiplexers, and finally how to find the lowest possible gate count through
manual optimization.</p>

<p>Today&rsquo;s post will focus on a simpler and faster method. <a href="https://en.wikipedia.org/wiki/Karnaugh_map">Karnaugh maps</a>
help simplify Boolean algebra expressions by taking advantage of humans&#8217;
pattern-recognition capability. In short, we&rsquo;ll bitslice an S-box using K-maps.</p>

<blockquote><p><a href="https://timtaubert.de/blog/2018/08/bitslicing-an-introduction/">Part 1: Bitslicing, An Introduction</a><br/>
Part 2: Bitslicing with Karnaugh maps<br/>
<a href="https://timtaubert.de/blog/2018/08/bitslicing-with-quine-mccluskey/">Part 3: Bitslicing with Quine-McCluskey</a></p></blockquote>

<h2>A tiny S-box</h2>

<p>Here again is the 3-to-2-bit <a href="https://en.wikipedia.org/wiki/S-box">S-box</a>
function from the previous post.</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="n">SBOX</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">0</span> <span class="p">};</span>
</pre></div></figure>


<blockquote><p>An AES-inspired S-box that interprets three input bits as a polynomial in
<em>GF(2<sup>3</sup>)</em> and computes its inverse <em>mod P(x) = x<sup>3</sup> + x<sup>2</sup> + 1</em>, with
<em>0<sup>-1</sup> := 0</em>. The result plus <em>(x<sup>2</sup> + 1)</em> is converted back into bits
and the MSB is dropped.</p></blockquote>

<p>This S-box can be represented as a function of three Boolean variables, where
<em>f(0,0,0) = 0b01</em>, <em>f(0,0,1) = 0b00</em>, <em>f(0,1,0) = 0b11</em>, etc. Each output bit
can be represented by its own Boolean function where <em>f<sub>L</sub>(0,0,0) = 0</em>
and <em>f<sub>R</sub>(0,0,0) = 1</em>, <em>f<sub>L</sub>(0,0,1) = 0</em> and
<em>f<sub>R</sub>(0,0,1) = 0</em>, &hellip;</p>

<h3>A truth table per output bit</h3>

<p>Each output bit has its own Boolean function, and therefore also its own truth
table. Here are the truth tables for the Boolean functions <em>f<sub>L</sub>(a,b,c)</em>
and <em>f<sub>R</sub>(a,b,c)</em>:</p>

<div class="table-wrapper truth">
  <table>
    <caption>SBOX(a,b,c)</caption>
    <thead>
      <tr>
        <th>abc</th>
        <th>out</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>000</td><td>01</td>
      </tr>
      <tr>
        <td>001</td><td>00</td>
      </tr>
      <tr>
        <td>010</td><td>11</td>
      </tr>
      <tr>
        <td>011</td><td>01</td>
      </tr>
      <tr>
        <td>100</td><td>10</td>
      </tr>
      <tr>
        <td>101</td><td>10</td>
      </tr>
      <tr>
        <td>110</td><td>11</td>
      </tr>
      <tr>
        <td>111</td><td>00</td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>L</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th>abc</th>
        <th>out</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>000</td><td>0</td>
      </tr>
      <tr>
        <td>001</td><td>0</td>
      </tr>
      <tr>
        <td>010</td><td>1</td>
      </tr>
      <tr>
        <td>011</td><td>0</td>
      </tr>
      <tr>
        <td>100</td><td>1</td>
      </tr>
      <tr>
        <td>101</td><td>1</td>
      </tr>
      <tr>
        <td>110</td><td>1</td>
      </tr>
      <tr>
        <td>111</td><td>0</td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>R</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th>abc</th>
        <th>out</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>000</td><td>1</td>
      </tr>
      <tr>
        <td>001</td><td>0</td>
      </tr>
      <tr>
        <td>010</td><td>1</td>
      </tr>
      <tr>
        <td>011</td><td>1</td>
      </tr>
      <tr>
        <td>100</td><td>0</td>
      </tr>
      <tr>
        <td>101</td><td>0</td>
      </tr>
      <tr>
        <td>110</td><td>1</td>
      </tr>
      <tr>
        <td>111</td><td>0</td>
      </tr>
    </tbody>
  </table>
</div>


<p>Whereas previously at this point we built a tree of multiplexers out of each
truth table, we&rsquo;ll now build a Karnaugh map (K-map) per output bit.</p>

<h2>Karnaugh Maps</h2>

<p>The values of <em>f<sub>L</sub>(a,b,c)</em> and <em>f<sub>R</sub>(a,b,c)</em> are transferred
onto a two-dimensional grid with the cells ordered in <a href="https://en.wikipedia.org/wiki/Gray_code">Gray code</a>.
Each cell position represents one possible combination of input bits, while each
cell value represents the value of the output bit.</p>

<p><a href="https://timtaubert.de/images/kmaps.png" title="Two K-maps, one for each of the two Boolean functions" class="img"><img src="https://timtaubert.de/images/kmaps.png" title="Two K-maps, one for each of the two Boolean functions" ></a></p>

<p>The row and column indices <em>(a)</em> and <em>(b || c)</em> are ordered in Gray code rather
than binary numerical order to ensure only a single variable changes between
each pair of adjacent cells. Otherwise, products of predicates
(<code>a &amp; b</code>, <code>a &amp; c</code>, &hellip;) would scatter.</p>
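<p>If you want to generate the column order instead of memorizing it, the
reflected binary Gray code can be computed with a shift and an XOR. A minimal
sketch, where <code>gray()</code>, <code>one_bit_apart()</code>, and
<code>check_columns()</code> are hypothetical helper names:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Returns the i-th value of the reflected binary Gray code. */
uint8_t gray(uint8_t i) {
  return (uint8_t)(i ^ (i >> 1));
}

/* Returns 1 if x and y differ in exactly one bit, 0 otherwise. */
int one_bit_apart(uint8_t x, uint8_t y) {
  uint8_t d = x ^ y;
  return d != 0 && (d & (d - 1)) == 0;
}

/* Checks the four column labels (b || c) of a three-variable K-map.
   The modulo also compares the last column with the first one. */
void check_columns(void) {
  for (uint8_t i = 0; i < 4; i++) {
    assert(one_bit_apart(gray(i), gray((uint8_t)((i + 1) % 4))));
  }
}
```

<p>For <code>i = 0..3</code> this produces the column labels <code>00</code>,
<code>01</code>, <code>11</code>, <code>10</code>. In plain binary order,
<code>01</code> would be followed by <code>10</code>, which changes two
variables at once.</p>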

<p>These products are what you want to find to get a minimum length representation
of the truth function. If the output bit is the same at two adjacent cells,
then it&rsquo;s independent of one of the two input variables, because
<code>(a &amp; ~b) | (a &amp; b) = a</code>.</p>

<h3>Spotting patterns</h3>

<p>The heart of simplifying Boolean expressions via K-maps is finding groups of
adjacent cells with value <code>1</code>. <a href="http://www.ee.surrey.ac.uk/Projects/Labview/minimisation/karrules.html">The rules</a> are as follows:</p>

<ul>
<li>Groups are rectangles of <em>2<sup>n</sup></em> cells with value <code>1</code>.</li>
<li>Groups may not include cells with value <code>0</code>.</li>
<li>Each cell with value <code>1</code> must be in at least one group.</li>
<li>Groups may be horizontal or vertical, not diagonal.</li>
<li>Each group should be as large as possible.</li>
<li>There should be as few groups as possible.</li>
<li>Groups may overlap.</li>
</ul>


<p><a href="https://timtaubert.de/images/kmaps.gif" title="Animation: Building groups on the two K-maps" class="img"><img src="https://timtaubert.de/images/kmaps.gif" title="Animation: Building groups on the two K-maps" ></a></p>

<p>First, we mark all cells with value <code>1</code>. We then mark the two horizontal
groups of size <em>2<sup>1</sup></em> in <em><span style="color:#c62817">red</span></em>. The two vertical groups,
also of size <em>2<sup>1</sup></em>, are marked in <em><span style="color:#118730">green</span></em>.</p>

<p>On <em>f<sub>R</sub></em>&rsquo;s K-map on the right, the <em><span style="color:#c62817">red</span></em>
and <em><span style="color:#118730">green</span></em> group overlap. As per the rules
above, that&rsquo;s perfectly fine. The cell at <code>abc=110</code> can&rsquo;t be without a group
and we&rsquo;re instructed to form the largest groups possible, so they overlap.</p>

<p>But wait, you say, what&rsquo;s going on with the <em><span style="color:#1167bd">blue</span></em>
rectangle on the right?</p>

<h3>Wrapping around</h3>

<p>A somewhat unexpected property of K-maps is that they&rsquo;re not really grids, but
actually toruses. In plain English: they wrap around the top, bottom, and the
sides.</p>

<p>Look at this neat <a href="https://en.wikipedia.org/wiki/Karnaugh_map#/media/File:Torus_from_rectangle.gif">animation on Wikipedia</a>
that demonstrates how a rectangle can turn into a <del>donut</del>torus. <em>Adjacent</em>
thus has a special definition here: cells on the very right touch those on the
far left, as do those at the very top and bottom.</p>

<p><a href="https://timtaubert.de/images/kmaps-rotate.gif" title="Animation: Rotating a K-map to the left (and right)" class="img"><img src="https://timtaubert.de/images/kmaps-rotate.gif" title="Animation: Rotating a K-map to the left (and right)" ></a></p>

<p>Another way to understand this property is to imagine that the columns don&rsquo;t
start at <code>00</code> but rather at <code>01</code>, and so we rotate the whole K-map by one to
the left. Then the rectangles wouldn&rsquo;t need to wrap around and they would all
fit on the grid nicely.</p>

<p>Now that all cells with a <code>1</code> have been assigned to as few groups as possible,
let&rsquo;s get our hands dirty and write some code.</p>

<h2>A bitsliced SBOX() function</h2>

<p>K-maps are read groupwise: we look at each cell&rsquo;s position and focus on the
input values that do not change throughout the group. Values that do change
are ignored.</p>

<h3>One function for <em>f<sub>L</sub>(a,b,c)</em> .<span></span>..</h3>

<p>The <em><span style="color:#c62817">red</span></em> group covers the cells at position
<code>100</code> and <code>101</code>. The values <code>a=1</code> and <code>b=0</code> are constant, so they are included
in the group&rsquo;s term. The value of <code>c</code> changes and is therefore irrelevant.
The term is <code>(a &amp; ~b)</code>.</p>

<p>The <em><span style="color:#118730">green</span></em> group covers the cells at <code>010</code>
and <code>110</code>. We ignore <code>a</code>, and include <code>b=1</code> and <code>c=0</code>. The term is <code>(b &amp; ~c)</code>.</p>

<p><code>SBOXL()</code> is the disjunction of the group terms we collected from the K-map. It
lists all possible combinations of input values that lead to output value <code>1</code>.</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">SBOXL</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">(</span><span class="n">a</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">b</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">b</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<h3>..<span></span>. and another one for <em>f<sub>R(a,b,c)</sub></em></h3>

<p>The <em><span style="color:#c62817">red</span></em> group covers the cells at <code>011</code>
and <code>010</code>. The term is <code>(~a &amp; b)</code>.</p>

<p>The <em><span style="color:#118730">green</span></em> group covers the cells at <code>010</code>
and <code>110</code>. The term is <code>(b &amp; ~c)</code>.</p>

<p>The <em><span style="color:#1167bd">blue</span></em> group covers the cells at <code>000</code>
and <code>010</code>. The term is <code>(~a &amp; ~c)</code>.</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">SBOXR</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">(</span><span class="o">~</span><span class="n">a</span> <span class="o">&amp;</span> <span class="n">b</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">b</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">c</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="o">~</span><span class="n">a</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<p>Great, that&rsquo;s all we need! Now we can merge those two functions and compare
that to the result of the previous post.</p>

<h3>Putting it all together</h3>

<p>The first three variables ensure that we negate inputs only once. <code>t0</code> replaces
the common subexpression <code>b &amp; nc</code>. Any optimizing compiler would do the same.</p>

<figure class='code'><div class="highlight"><pre><span class="k">void</span> <span class="nf">SBOX</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">l</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">na</span> <span class="o">=</span> <span class="o">~</span><span class="n">a</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nb</span> <span class="o">=</span> <span class="o">~</span><span class="n">b</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nc</span> <span class="o">=</span> <span class="o">~</span><span class="n">c</span><span class="p">;</span>

  <span class="k">uint8_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">b</span> <span class="o">&amp;</span> <span class="n">nc</span><span class="p">;</span>

  <span class="o">*</span><span class="n">l</span> <span class="o">=</span> <span class="p">(</span><span class="n">a</span> <span class="o">&amp;</span> <span class="n">nb</span><span class="p">)</span> <span class="o">|</span> <span class="n">t0</span><span class="p">;</span>
  <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">na</span> <span class="o">&amp;</span> <span class="n">b</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">na</span> <span class="o">&amp;</span> <span class="n">nc</span><span class="p">)</span> <span class="o">|</span> <span class="n">t0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p><strong>Ten gates.</strong> That&rsquo;s one more than the manually optimized version from the last
post. What&rsquo;s missing? Turns out that K-maps sometimes don&rsquo;t yield the minimal
form and we have to simplify further by taking out common factors.</p>

<p>The conjunctions in the term <code>(na &amp; b) | (na &amp; nc)</code> have the common factor <code>na</code>
and, by the distributive law, can be rewritten as <code>na &amp; (b | nc)</code>. That
removes one of the two <em>AND</em> gates, leaving two gates where there were three.</p>

<figure class='code'><div class="highlight"><pre><span class="k">void</span> <span class="nf">SBOX</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">l</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">na</span> <span class="o">=</span> <span class="o">~</span><span class="n">a</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nb</span> <span class="o">=</span> <span class="o">~</span><span class="n">b</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nc</span> <span class="o">=</span> <span class="o">~</span><span class="n">c</span><span class="p">;</span>

  <span class="k">uint8_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">b</span> <span class="o">&amp;</span> <span class="n">nc</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">b</span> <span class="o">|</span> <span class="n">nc</span><span class="p">;</span>

  <span class="o">*</span><span class="n">l</span> <span class="o">=</span> <span class="p">(</span><span class="n">a</span> <span class="o">&amp;</span> <span class="n">nb</span><span class="p">)</span> <span class="o">|</span> <span class="n">t0</span><span class="p">;</span>
  <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">na</span> <span class="o">&amp;</span> <span class="n">t1</span><span class="p">)</span> <span class="o">|</span> <span class="n">t0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p><strong>Nine gates.</strong> That&rsquo;s exactly what we achieved by tedious artisanal optimization.</p>
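
<p>To convince ourselves that the factored version still computes the same S-box,
we can push all eight possible inputs through it in a single call. The following
is only a sketch: the helper name <code>check_sbox</code> and the lane layout (bit <em>i</em> of each
slice carries input value <em>i</em>) are assumptions made for this example.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

/* The nine-gate SBOX() from above, unchanged apart from `static`. */
static void SBOX(uint8_t a, uint8_t b, uint8_t c, uint8_t* l, uint8_t* r) {
  uint8_t na = ~a;
  uint8_t nb = ~b;
  uint8_t nc = ~c;

  uint8_t t0 = b &amp; nc;
  uint8_t t1 = b | nc;

  *l = (a &amp; nb) | t0;
  *r = (na &amp; t1) | t0;
}

/* Hypothetical self-check against the S-box lookup table. */
static void check_sbox(void) {
  static const uint8_t table[8] = { 1, 0, 3, 1, 2, 2, 3, 0 };
  uint8_t l, r;

  /* Lane i holds input i = abc in binary: a = 0xF0, b = 0xCC, c = 0xAA. */
  SBOX(0xF0, 0xCC, 0xAA, &amp;l, &amp;r);

  for (int i = 0; i &lt; 8; i++) {
    uint8_t out = (uint8_t)((((l &gt;&gt; i) &amp; 1) &lt;&lt; 1) | ((r &gt;&gt; i) &amp; 1));
    assert(out == table[i]);
  }
}
</pre></div></figure>

<p>Because every gate works on each bit position independently, one call with these
three constants exercises the whole truth table.</p>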

<h2>More than four inputs</h2>

<p>K-maps are neat and trivial to use once you&rsquo;ve worked through an example
yourself. They yield small, often minimal, circuits <em>fast</em>, compared to manual optimization
where the effort grows exponentially with the number of terms.</p>

<p>There is one downside though, and it&rsquo;s that the original variant of a K-map
can&rsquo;t be used with more than four input variables. There are variants that do
work with more than four variables but they actually make it harder to spot
groups visually.</p>

<p>The <a href="https://timtaubert.de/blog/2018/08/bitslicing-with-quine-mccluskey/">Quine–McCluskey algorithm</a>
is functionally identical to K-maps but can handle an arbitrary number of input
variables in its original variant &ndash; although the running time grows
exponentially with the number of variables. That is not too problematic for
us, as S-boxes usually don&rsquo;t have too many inputs anyway&hellip;</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Bitslicing, an Introduction]]></title>
    <link href="https://timtaubert.de/blog/2018/08/bitslicing-an-introduction/"/>
    <updated>2018-08-15T14:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2018/08/bitslicing-an-introduction</id>
    <content type="html"><![CDATA[<p><em>Bitslicing</em> (in software) is an implementation strategy enabling fast,
constant-time implementations of cryptographic algorithms immune to cache and
timing-related side channel attacks.</p>

<p>This post intends to give a brief overview of the general technique, not requiring
much of a cryptographic background. It will demonstrate bitslicing a small S-box,
talk about multiplexers, LUTs, Boolean functions, and minimal forms.</p>

<blockquote><p>Part 1: Bitslicing, An Introduction<br/>
<a href="https://timtaubert.de/blog/2018/08/bitslicing-with-karnaugh-maps/">Part 2: Bitslicing with Karnaugh maps</a><br/>
<a href="https://timtaubert.de/blog/2018/08/bitslicing-with-quine-mccluskey/">Part 3: Bitslicing with Quine-McCluskey</a></p></blockquote>

<h2>What is bitslicing?</h2>

<p>Matthew Kwan coined the term about 20 years ago after seeing Eli Biham present
his paper <a href="http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/1997/CS/CS0891.pdf">A Fast New DES Implementation in Software</a>.
He later published <a href="http://fgrieu.free.fr/Mattew%20Kwan%20-%20Reducing%20the%20Gate%20Count%20of%20Bitslice%20DES.pdf">Reducing the Gate Count of Bitslice DES</a>
showing an even faster DES building on Biham&rsquo;s ideas.</p>

<p>The basic concept is to express a function in terms of single-bit logical
operations &ndash; <em>AND</em>, <em>XOR</em>, <em>OR</em>, <em>NOT</em>, etc. &ndash; as if you were implementing a
logic circuit in hardware. These operations are then carried out for multiple
instances of the function in parallel, using bitwise operations on a CPU.</p>

<p>In a bitsliced implementation, instead of having a single variable storing,
say, an 8-bit number, you have eight variables (slices): the first stores the
left-most bit of the number, the next stores the second bit from the left,
and so on. The parallelism is bounded only by the target architecture&rsquo;s register
width.</p>
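
<p>To make the layout concrete, here is a small sketch that converts eight ordinary
bytes into eight slices and back. The function names <code>to_slices</code> and
<code>from_slices</code>, and the choice that value <em>i</em> lives in bit position <em>i</em> of every
slice, are assumptions of this example rather than a fixed convention.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

/* slices[j] collects bit j (counted from the left) of all eight values;
 * value i ends up in bit position i of every slice. */
static void to_slices(const uint8_t values[8], uint8_t slices[8]) {
  for (int j = 0; j &lt; 8; j++) {
    slices[j] = 0;
    for (int i = 0; i &lt; 8; i++) {
      uint8_t bit = (values[i] &gt;&gt; (7 - j)) &amp; 1;
      slices[j] |= (uint8_t)(bit &lt;&lt; i);
    }
  }
}

/* Inverse transformation: gather bit i of every slice into value i. */
static void from_slices(const uint8_t slices[8], uint8_t values[8]) {
  for (int i = 0; i &lt; 8; i++) {
    values[i] = 0;
    for (int j = 0; j &lt; 8; j++) {
      uint8_t bit = (slices[j] &gt;&gt; i) &amp; 1;
      values[i] |= (uint8_t)(bit &lt;&lt; (7 - j));
    }
  }
}

/* Usage: a round trip restores the original bytes. */
static void demo_slices(void) {
  const uint8_t in[8] = { 0x80, 0x01, 0xFF, 0x00, 0xA5, 0x5A, 0x3C, 0xC3 };
  uint8_t slices[8], out[8];

  to_slices(in, slices);
  /* Left-most bits of the inputs are 1,0,1,0,1,0,0,1 in lanes 0..7. */
  assert(slices[0] == 0x95);

  from_slices(slices, out);
  for (int i = 0; i &lt; 8; i++) assert(out[i] == in[i]);
}
</pre></div></figure>

<p>This conversion is plain bookkeeping and costs time, which is one reason why
bitslicing pays off mainly when many instances are processed together.</p>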

<h2>What&rsquo;s it good for?</h2>

<p>Biham applied bitslicing to <a href="https://en.wikipedia.org/wiki/Data_Encryption_Standard">DES</a>,
a cipher designed to be fast in hardware. It uses eight different S-boxes
that were usually implemented as lookup tables. Table lookups in DES, however, are
rather inefficient, since one has to collect six bits from different words,
combine them, and afterwards put each of the four resulting bits in a
different word.</p>

<h3>Speed</h3>

<p>In classical implementations, these bit permutations would be implemented with a
combination of shifts and masks. In a bitslice representation though, permuting
bits really just means using the &ldquo;right&rdquo; variables in the next step; this is
mere data routing, which is resolved at compile-time, with no cost at runtime.</p>
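
<p>As an illustration, rotating every 8-bit value left by one position touches no
bits at all in bitsliced form; it only renames the slices. The sketch assumes
that <code>in[0]</code> is the slice of left-most bits, and the function name
<code>rotl1_sliced</code> is made up for this example.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

/* Rotate all bitsliced values left by one bit: the new left-most slice is
 * the old second slice, and the old left-most slice wraps to the end. */
static void rotl1_sliced(const uint8_t in[8], uint8_t out[8]) {
  for (int j = 0; j &lt; 8; j++) out[j] = in[(j + 1) % 8];
}

/* Usage: lane 0 holds the value 0x81, which rotates to 0x03. */
static void demo_rotl(void) {
  const uint8_t in[8] = { 1, 0, 0, 0, 0, 0, 0, 1 };
  uint8_t out[8];

  rotl1_sliced(in, out);

  uint8_t value = 0;
  for (int j = 0; j &lt; 8; j++) {
    value |= (uint8_t)((out[j] &amp; 1) &lt;&lt; (7 - j));
  }
  assert(value == 0x03);
}
</pre></div></figure>

<p>The loop here copies bytes only to keep the example short. With eight named
variables, the same rotation amounts to passing them to the next step in a
different order.</p>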

<p>Additionally, the code is extremely linear so that it usually runs well on
heavily pipelined modern CPUs. It tends to have a low risk of pipeline stalls,
as it&rsquo;s unlikely to suffer from branch misprediction, and plenty of
opportunities for optimal instruction reordering for efficient scheduling of
data accesses.</p>

<h3>Parallelization</h3>

<p>With a register width of <em>n</em> bits, as long as the bitsliced implementation is no
more than <em>n</em> times slower to run a single instance of the cipher, you end up
with a net gain in throughput. This only applies to workloads that allow for
parallelization. CTR and ECB mode always benefit, CBC and CFB mode only when
decrypting.</p>

<h3>Constant execution time</h3>

<p>Constant-time, secret-independent computation is all the rage in modern applied
cryptography. Bitslicing is interesting because by using only single-bit logical
operations the resulting code is immune to cache and timing-related
<a href="https://en.wikipedia.org/wiki/Side-channel_attack">side channel attacks</a>.</p>

<h3>Fully Homomorphic Encryption</h3>

<p>The last decade brought great advances in the field of Fully Homomorphic
Encryption (FHE), i.e. computation on ciphertexts. If you have a secure crypto
scheme and an efficient <a href="https://en.wikipedia.org/wiki/NAND_gate">NAND gate</a>
you can use bitslicing to <a href="https://crypto.stanford.edu/craig/easy-fhe.pdf">compute arbitrary functions of encrypted data</a>.</p>

<h2>Bitslicing a small S-box</h2>

<p>Let&rsquo;s work through a small example to see how one could go about converting
arbitrary functions into a bunch of Boolean gates.</p>

<p>Imagine a 3-to-2-bit <a href="https://en.wikipedia.org/wiki/S-box">S-box</a> function, a
component found in many symmetric encryption algorithms. Naively, this would be
represented by a lookup table with eight entries, e.g.  <code>SBOX[0b000] = 0b01</code>,
<code>SBOX[0b001] = 0b00</code>, etc.</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="n">SBOX</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">0</span> <span class="p">};</span>
</pre></div></figure>


<blockquote><p>This AES-inspired S-box interprets three input bits as a polynomial in
<em>GF(2<sup>3</sup>)</em> and computes its inverse <em>mod P(x) = x<sup>3</sup> + x<sup>2</sup> + 1</em>, with
<em>0<sup>-1</sup> := 0</em>. The result plus <em>(x<sup>2</sup> + 1)</em> is converted back into bits
and the MSB is dropped.</p></blockquote>
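
<p>The construction described in the note can be reproduced with a few lines of C.
This is a sketch for illustration only: the helper names <code>gf8_mul</code>,
<code>gf8_inv</code> and <code>sbox_entry</code> are made up here, the inverse is found by
brute force, and the code is meant for generating the table, not for
constant-time use.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

/* Multiply in GF(2^3) modulo P(x) = x^3 + x^2 + 1, i.e. 0b1101. */
static uint8_t gf8_mul(uint8_t x, uint8_t y) {
  uint8_t acc = 0;
  for (int i = 0; i &lt; 3; i++) {
    if (y &amp; 1) acc ^= x;
    y &gt;&gt;= 1;
    x &lt;&lt;= 1;
    if (x &amp; 0x8) x ^= 0xD;  /* reduce as soon as degree 3 appears */
  }
  return acc;
}

/* Brute-force inverse, with the inverse of 0 defined as 0. */
static uint8_t gf8_inv(uint8_t x) {
  for (uint8_t y = 1; y &lt; 8; y++) {
    if (gf8_mul(x, y) == 1) return y;
  }
  return 0;
}

/* Add x^2 + 1 (0b101) to the inverse and drop the MSB. */
static uint8_t sbox_entry(uint8_t x) {
  return (gf8_inv(x) ^ 0x5) &amp; 0x3;
}

/* Usage: the generated entries match the lookup table above. */
static void demo_generate(void) {
  static const uint8_t SBOX[8] = { 1, 0, 3, 1, 2, 2, 3, 0 };
  for (uint8_t x = 0; x &lt; 8; x++) assert(sbox_entry(x) == SBOX[x]);
}
</pre></div></figure>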

<p>You can think of the above S-box&rsquo;s output as being a function of three Boolean
variables, where for instance <em>f(0,0,0) = 0b01</em>. Each output bit can be
represented by its own Boolean function, i.e. <em>f<sub>L</sub>(0,0,0) = 0</em> and
<em>f<sub>R</sub>(0,0,0) = 1</em>.</p>

<h3>LUTs and Multiplexers</h3>

<p>If you&rsquo;ve dealt with <a href="https://en.wikipedia.org/wiki/Field-programmable_gate_array">FPGAs</a>
before you probably know that these do not actually implement Boolean gates,
but allow Boolean algebra by programming Look-Up-Tables (LUTs). We&rsquo;re going
to do the reverse and convert our S-box into trees of multiplexers.</p>

<p><a href="https://en.wikipedia.org/wiki/Multiplexer">Multiplexer</a> is just a fancy word
for <em>data selector</em>. A 2-to-1 multiplexer selects one of two input bits. A
<em>selector</em> bit decides which of the two inputs will be passed through.</p>

<figure class='code'><div class="highlight"><pre><span class="k">bool</span> <span class="nf">mux</span><span class="p">(</span><span class="k">bool</span> <span class="n">a</span><span class="p">,</span> <span class="k">bool</span> <span class="n">b</span><span class="p">,</span> <span class="k">bool</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="n">s</span> <span class="o">?</span> <span class="nl">b</span> <span class="p">:</span> <span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p>Here are the LUTs, or rather truth tables, for the Boolean functions
<em>f<sub>L</sub>(a,b,c)</em> and <em>f<sub>R</sub>(a,b,c)</em>:</p>

<div class="table-wrapper truth">
  <table>
    <caption>SBOX(a,b,c)</caption>
    <thead>
      <tr>
        <th>abc</th>
        <th>out</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>000</td><td>01</td>
      </tr>
      <tr>
        <td>001</td><td>00</td>
      </tr>
      <tr>
        <td>010</td><td>11</td>
      </tr>
      <tr>
        <td>011</td><td>01</td>
      </tr>
      <tr>
        <td>100</td><td>10</td>
      </tr>
      <tr>
        <td>101</td><td>10</td>
      </tr>
      <tr>
        <td>110</td><td>11</td>
      </tr>
      <tr>
        <td>111</td><td>00</td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>L</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th>abc</th>
        <th>out</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>000</td><td>0</td>
      </tr>
      <tr>
        <td>001</td><td>0</td>
      </tr>
      <tr>
        <td>010</td><td>1</td>
      </tr>
      <tr>
        <td>011</td><td>0</td>
      </tr>
      <tr>
        <td>100</td><td>1</td>
      </tr>
      <tr>
        <td>101</td><td>1</td>
      </tr>
      <tr>
        <td>110</td><td>1</td>
      </tr>
      <tr>
        <td>111</td><td>0</td>
      </tr>
    </tbody>
  </table>

  <table>
    <caption>f<sub>R</sub>(a,b,c)</caption>
    <thead>
      <tr>
        <th>abc</th>
        <th>out</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>000</td><td>1</td>
      </tr>
      <tr>
        <td>001</td><td>0</td>
      </tr>
      <tr>
        <td>010</td><td>1</td>
      </tr>
      <tr>
        <td>011</td><td>1</td>
      </tr>
      <tr>
        <td>100</td><td>0</td>
      </tr>
      <tr>
        <td>101</td><td>0</td>
      </tr>
      <tr>
        <td>110</td><td>1</td>
      </tr>
      <tr>
        <td>111</td><td>0</td>
      </tr>
    </tbody>
  </table>
</div>


<p>The truth table for <em>f<sub>L</sub>(a,b,c)</em> is <em>(0, 0, 1, 0, 1, 1, 1, 0)</em> or
<em>2E<sub>h</sub></em>. We can also call this the LUT-mask in the context of an
FPGA. For each output bit of our S-box we need a multiplexer with three selector
bits (called a 3-to-1 multiplexer here), and that in turn can be built from
seven 2-to-1 multiplexers.</p>
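
<p>The LUT-masks can be derived mechanically from the lookup table. The following
sketch walks the table from input <code>000</code> to <code>111</code> and shifts the selected
output bit into the mask; the function name <code>lut_mask</code> is invented for this
example.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

/* Collect output bit `bit` of all eight table entries, entry 0 first,
 * so that entry 0 ends up in the most significant bit of the mask. */
static uint8_t lut_mask(const uint8_t table[8], int bit) {
  uint8_t mask = 0;
  for (int i = 0; i &lt; 8; i++) {
    mask = (uint8_t)((mask &lt;&lt; 1) | ((table[i] &gt;&gt; bit) &amp; 1));
  }
  return mask;
}

/* Usage with the S-box from above. */
static void demo_masks(void) {
  static const uint8_t SBOX[8] = { 1, 0, 3, 1, 2, 2, 3, 0 };
  assert(lut_mask(SBOX, 1) == 0x2E);  /* left output bit, f_L */
  assert(lut_mask(SBOX, 0) == 0xB2);  /* right output bit, f_R */
}
</pre></div></figure>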

<p><a href="https://timtaubert.de/images/mux.png" title="A 3-to-1 multiplexer with LUT-mask 0x2E" class="img"><img src="https://timtaubert.de/images/mux.png" title="A 3-to-1 multiplexer with LUT-mask 0x2E" ></a></p>

<h3>Multiplexers in Software</h3>

<p>Let&rsquo;s take the <code>mux()</code> function from above and make it constant-time. As stated
earlier, bitslicing is competitive only through parallelization, so, for
demonstration, we&rsquo;ll use <code>uint8_t</code> arguments to later compute eight
S-box lookups in parallel.</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">mux</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">(</span><span class="n">a</span> <span class="o">&amp;</span> <span class="o">~</span><span class="n">s</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">b</span> <span class="o">&amp;</span> <span class="n">s</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<p>If the <em>n</em>-th bit of <code>s</code> is zero it selects the <em>n</em>-th bit in <code>a</code>; otherwise it
forwards the <em>n</em>-th bit in <code>b</code>. The wider the target architecture&rsquo;s registers,
the bigger the theoretical throughput &ndash; but only if the workload can take
advantage of the level of parallelization.</p>
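
<p>A tiny worked example may help here. With the selector <code>0x0F</code>, the lower four
bit positions are taken from <code>b</code> and the upper four from <code>a</code>. The input values
are arbitrary and chosen only for this illustration.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

static uint8_t mux(uint8_t a, uint8_t b, uint8_t s) {
  return (a &amp; ~s) | (b &amp; s);
}

/* Usage: bits 0..3 (s = 1) come from b, bits 4..7 (s = 0) come from a. */
static void demo_mux(void) {
  uint8_t out = mux(0xAA, 0x55, 0x0F);
  assert(out == 0xA5);  /* 0xA0 from a, 0x05 from b */
}
</pre></div></figure>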

<h3>A first implementation</h3>

<p>The two output bits will be computed separately and then assembled into the
final value returned by <code>SBOX()</code>. Each multiplexer in the above diagram is
represented by a <code>mux()</code> call. In each function, the first four take the bits of
the LUT-mask as inputs: <em>2E<sub>h</sub></em> for <code>SBOXL()</code> and <em>B2<sub>h</sub></em> for <code>SBOXR()</code>.</p>

<p>The diagram shows Boolean functions that only work with single-bit parameters.
We use <code>uint8_t</code>, so instead of <code>1</code> we need to use <code>~0</code> to get <code>0b11111111</code>.</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">SBOXL</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">c0</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span> <span class="mi">0</span><span class="p">,</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">c1</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="o">~</span><span class="mi">0</span><span class="p">,</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">c2</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="o">~</span><span class="mi">0</span><span class="p">,</span> <span class="o">~</span><span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">c3</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="o">~</span><span class="mi">0</span><span class="p">,</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>

  <span class="k">uint8_t</span> <span class="n">b0</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="n">c0</span><span class="p">,</span> <span class="n">c1</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">b1</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="n">c2</span><span class="p">,</span> <span class="n">c3</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>

  <span class="k">return</span> <span class="n">mux</span><span class="p">(</span><span class="n">b0</span><span class="p">,</span> <span class="n">b1</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>




<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">SBOXR</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">c0</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="o">~</span><span class="mi">0</span><span class="p">,</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">c1</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="o">~</span><span class="mi">0</span><span class="p">,</span> <span class="o">~</span><span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">c2</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span> <span class="mi">0</span><span class="p">,</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">c3</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="o">~</span><span class="mi">0</span><span class="p">,</span>  <span class="mi">0</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>

  <span class="k">uint8_t</span> <span class="n">b0</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="n">c0</span><span class="p">,</span> <span class="n">c1</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
  <span class="k">uint8_t</span> <span class="n">b1</span> <span class="o">=</span> <span class="n">mux</span><span class="p">(</span><span class="n">c2</span><span class="p">,</span> <span class="n">c3</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>

  <span class="k">return</span> <span class="n">mux</span><span class="p">(</span><span class="n">b0</span><span class="p">,</span> <span class="n">b1</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>




<figure class='code'><div class="highlight"><pre><span class="k">void</span> <span class="nf">SBOX</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">l</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
  <span class="o">*</span><span class="n">l</span> <span class="o">=</span> <span class="n">SBOXL</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
  <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">SBOXR</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<p>That wasn&rsquo;t too hard. <code>SBOX()</code> is constant-time and immune to cache timing
attacks. Not counting the negation of constants (<code>~0</code>) we have 42 gates in total
and perform eight lookups in parallel.</p>

<p>Assuming, for simplicity, that a table lookup is just one operation, the
bitsliced version is about five times as slow. If we had a workload that
allowed for 64 parallel S-box lookups we could achieve eight times the
current throughput by using <code>uint64_t</code> variables.</p>

<h3>A better mux() function</h3>

<p><code>mux()</code> currently needs three operations. Here&rsquo;s another variant using <em>XOR</em>:</p>

<figure class='code'><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">mux</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">s</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">c</span> <span class="o">=</span> <span class="n">a</span> <span class="o">^</span> <span class="n">b</span><span class="p">;</span>
  <span class="k">return</span> <span class="p">(</span><span class="n">c</span> <span class="o">&amp;</span> <span class="n">s</span><span class="p">)</span> <span class="o">^</span> <span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p>There are still three gates, but the new version often lends itself to
easier optimization as we might be able to precompute <code>a ^ b</code> and reuse the
result.</p>
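
<p>Both variants compute the same function. Since every gate acts on each bit
position independently, the three constants <code>0xF0</code>, <code>0xCC</code> and <code>0xAA</code>
enumerate all eight combinations of single-bit <code>a</code>, <code>b</code> and <code>s</code>, so one call per
variant covers the whole truth table. The names <code>mux_or</code> and <code>mux_xor</code> exist only
in this sketch to tell the two versions apart.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

/* The original variant using AND, NOT and OR. */
static uint8_t mux_or(uint8_t a, uint8_t b, uint8_t s) {
  return (a &amp; ~s) | (b &amp; s);
}

/* The XOR-based variant. */
static uint8_t mux_xor(uint8_t a, uint8_t b, uint8_t s) {
  uint8_t c = a ^ b;
  return (c &amp; s) ^ a;
}

/* Usage: both variants agree on every single-bit input combination. */
static void demo_equiv(void) {
  assert(mux_or(0xF0, 0xCC, 0xAA) == 0xD8);
  assert(mux_xor(0xF0, 0xCC, 0xAA) == 0xD8);
}
</pre></div></figure>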

<h3>Simplifying the circuit</h3>

<p>Let&rsquo;s optimize our circuit manually by following these simple rules:</p>

<ul>
<li><code>mux(a, a, s)</code> reduces to <code>a</code>.</li>
<li>Any <code>X AND ~0</code> will always be <code>X</code>.</li>
<li>Anything <code>AND 0</code> will always be <code>0</code>.</li>
<li><code>mux()</code> with constant inputs can be reduced.</li>
</ul>


<p>With the new <code>mux()</code> variant there are a few <em>XOR</em> rules to follow as well:</p>

<ul>
<li>Any <code>X XOR X</code> reduces to <code>0</code>.</li>
<li>Any <code>X XOR 0</code>  reduces to <code>X</code>.</li>
<li>Any <code>X XOR ~0</code> reduces to <code>~X</code>.</li>
</ul>
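
<p>Applied to the first layer of multiplexers, these rules collapse every <code>mux()</code>
with constant inputs into a constant, the selector itself, or its negation. The
sketch below checks the four possible cases; the selector pattern <code>0xAA</code> is an
arbitrary choice for this example, and <code>0xFF</code> stands in for <code>~0</code>.</p>

<figure class='code'><div class="highlight"><pre>#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

static uint8_t mux(uint8_t a, uint8_t b, uint8_t s) {
  uint8_t c = a ^ b;
  return (c &amp; s) ^ a;
}

/* Usage: every mux() with constant data inputs needs at most one gate. */
static void demo_rules(void) {
  uint8_t s = 0xAA;

  assert(mux(0x00, 0x00, s) == 0x00);         /* mux(a, a, s) reduces to a */
  assert(mux(0xFF, 0xFF, s) == 0xFF);         /* same rule, a = ~0 */
  assert(mux(0x00, 0xFF, s) == s);            /* (~0 AND s) XOR 0  = s */
  assert(mux(0xFF, 0x00, s) == (uint8_t)~s);  /* (~0 AND s) XOR ~0 = ~s */
}
</pre></div></figure>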


<p>Inline the remaining <code>mux()</code> calls, eliminate common subexpressions, repeat.</p>

<figure class='code'><div class="highlight"><pre><span class="k">void</span> <span class="nf">SBOX</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">c</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">l</span><span class="p">,</span> <span class="k">uint8_t</span><span class="o">*</span> <span class="n">r</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">na</span> <span class="o">=</span> <span class="o">~</span><span class="n">a</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nb</span> <span class="o">=</span> <span class="o">~</span><span class="n">b</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">nc</span> <span class="o">=</span> <span class="o">~</span><span class="n">c</span><span class="p">;</span>

  <span class="k">uint8_t</span> <span class="n">t0</span> <span class="o">=</span> <span class="n">nb</span> <span class="o">&amp;</span> <span class="n">a</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">nc</span> <span class="o">&amp;</span> <span class="n">b</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">t2</span> <span class="o">=</span> <span class="n">b</span> <span class="o">|</span> <span class="n">nc</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">t3</span> <span class="o">=</span> <span class="n">na</span> <span class="o">&amp;</span> <span class="n">t2</span><span class="p">;</span>

  <span class="o">*</span><span class="n">l</span> <span class="o">=</span> <span class="n">t0</span> <span class="o">|</span> <span class="n">t1</span><span class="p">;</span>
  <span class="o">*</span><span class="n">r</span> <span class="o">=</span> <span class="n">t1</span> <span class="o">|</span> <span class="n">t3</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p>Using the <a href="https://en.wikipedia.org/wiki/Boolean_algebra#Laws">laws of Boolean algebra</a>
and the rules formulated above I&rsquo;ve reduced the circuit to nine gates (down from 42!).
We actually couldn&rsquo;t simplify it any further.</p>

<h2>Circuit Minimization</h2>

<p>Finding the <em>minimal form</em> of a Boolean function is an NP-complete problem. Manual
optimization is tedious but doable for a tiny S-box such as the example used in
this post. It will not be as easy for multiple 6-to-4-bit S-boxes (DES) or an
8-to-8-bit one (AES).</p>

<p>There are simpler and faster ways to build those circuits, and deterministic
algorithms to check whether we reached the minimal form. One of those is
covered in the next post <a href="https://timtaubert.de/blog/2018/08/bitslicing-with-karnaugh-maps/">Bitslicing with Karnaugh maps</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Verified Binary Multiplication for GHASH]]></title>
    <link href="https://timtaubert.de/blog/2017/06/verified-binary-multiplication-for-ghash/"/>
    <updated>2017-06-29T19:45:57+02:00</updated>
    <id>https://timtaubert.de/blog/2017/06/verified-binary-multiplication-for-ghash</id>
    <content type="html"><![CDATA[<p><a href="https://timtaubert.de/blog/2017/02/simple-cryptol-specifications/">Previously</a> I introduced some very basic Cryptol and SAWScript, and explained how to reason about the correctness of constant-time integer multiplication written in C/C++.</p>

<p>In this post I will touch on using formal verification as part of the code review process. In particular, I will show how using the <a href="http://saw.galois.com/">Software Analysis Workbench</a> saved us hours of debugging when rewriting the GHASH implementation for NSS.</p>

<h2>What&rsquo;s GHASH again?</h2>

<p>GHASH is part of the <a href="https://en.wikipedia.org/wiki/Galois/Counter_Mode">Galois/Counter Mode</a>, a mode of operation for block ciphers. AES-GCM for example uses <a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a> as the block cipher for encryption, and appends a tag generated by the GHASH function, thereby ensuring integrity and authenticity.</p>

<p>The core of GHASH is multiplication in GF(2<sup>128</sup>), a characteristic-two finite field with coefficients in GF(2); they&rsquo;re either zero or one. Polynomials in GF(2<sup>m</sup>) can be represented as m-bit numbers, with each bit corresponding to a term&rsquo;s coefficient. In GF(2<sup>3</sup>) for example, <code>x^2 + 1</code> may be represented as the binary number <code>0b101 = 5</code>.</p>

<p>Additions and subtractions in finite fields are &ldquo;carry-less&rdquo; because the coefficients must be in GF(p), for any GF(p<sup>m</sup>). As <code>x * y</code> is equivalent to adding <code>x</code> to itself <code>y</code> times, we can call multiplication in finite fields &ldquo;carry-less&rdquo; too. In GF(2) addition is simply XOR, so we can say that multiplication in GF(2<sup>m</sup>) is equal to binary multiplication without carries.</p>

<p>Note that the term carry-less only makes sense when talking about GF(2<sup>m</sup>) fields that are easily represented as binary numbers. Otherwise one would rather talk about multiplication in finite fields without comparing it to standard integer multiplication.</p>
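<p>To make &ldquo;binary multiplication without carries&rdquo; a bit more concrete, here is a minimal bitwise sketch. The name <code>clmul32_ref</code> is a hypothetical helper made up for illustration; it is not part of NSS and is meant as a readable reference, not as production code.</p>

```c
#include <stdint.h>

/*
 * Reference carry-less multiplication of two 32-bit polynomials over GF(2).
 * Bit i of an operand is the coefficient of x^i. For every set bit i of y
 * we XOR (instead of add) a copy of x shifted left by i into the result.
 * Hypothetical helper for illustration, not the NSS implementation.
 */
uint64_t
clmul32_ref(uint32_t x, uint32_t y)
{
    uint64_t r = 0;

    for (int i = 0; i < 32; i++) {
        /* All-ones mask if bit i of y is set, all-zeros otherwise. */
        uint64_t mask = (uint64_t)0 - ((y >> i) & 1);
        r ^= ((uint64_t)x << i) & mask;
    }

    return r;
}
```

<p>For example, <code>clmul32_ref(3, 3)</code> returns <code>5</code>, because <code>(x + 1)^2 = x^2 + 1</code> when coefficients are added modulo two, whereas the integer product <code>3 * 3</code> is <code>9</code>.</p>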

<p>Franziskus&#8217; post nicely describes <a href="https://www.franziskuskiefer.de/web/improving-aes-gcm-performance-in-nss/">why and how we updated our AES-GCM code in NSS</a>. In case a user&rsquo;s CPU is not equipped with the <a href="https://en.wikipedia.org/wiki/CLMUL_instruction_set">Carry-less Multiplication (CLMUL) instruction set</a>, we need to provide a fallback and implement carry-less, constant-time binary multiplication ourselves, using standard integer multiplication with carry.</p>

<h2>bmul() for 32-bit machines</h2>

<p>The basic implementation of our binary multiplication algorithm is taken straight from Thomas Pornin&rsquo;s excellent <a href="https://www.bearssl.org/constanttime.html#ghash-for-gcm">constant-time crypto post</a>. To support 32-bit machines the best we can do is multiply two <code>uint32_t</code> numbers and store the result in a <code>uint64_t</code>.</p>

<p>For the full GHASH, <a href="https://en.wikipedia.org/wiki/Karatsuba_algorithm">Karatsuba decomposition</a> is used: multiplication of two 128-bit integers is broken down into nine calls to <code>bmul32(x, y, ...)</code>. Let&rsquo;s take a look at the actual implementation:</p>

<figure class='code'><div class="highlight"><pre><span class="cm">/* Binary multiplication x * y = r_high &lt;&lt; 32 | r_low. */</span>
<span class="k">void</span>
<span class="nf">bmul32</span><span class="p">(</span><span class="k">uint32_t</span> <span class="n">x</span><span class="p">,</span> <span class="k">uint32_t</span> <span class="n">y</span><span class="p">,</span> <span class="k">uint32_t</span> <span class="o">*</span><span class="n">r_high</span><span class="p">,</span> <span class="k">uint32_t</span> <span class="o">*</span><span class="n">r_low</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">uint32_t</span> <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">x3</span><span class="p">;</span>
    <span class="k">uint32_t</span> <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span><span class="p">,</span> <span class="n">y2</span><span class="p">,</span> <span class="n">y3</span><span class="p">;</span>
    <span class="k">uint32_t</span> <span class="n">m1</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint32_t</span><span class="p">)</span><span class="mh">0x11111111</span><span class="p">;</span>
    <span class="k">uint32_t</span> <span class="n">m2</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint32_t</span><span class="p">)</span><span class="mh">0x22222222</span><span class="p">;</span>
    <span class="k">uint32_t</span> <span class="n">m4</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint32_t</span><span class="p">)</span><span class="mh">0x44444444</span><span class="p">;</span>
    <span class="k">uint32_t</span> <span class="n">m8</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint32_t</span><span class="p">)</span><span class="mh">0x88888888</span><span class="p">;</span>
    <span class="k">uint64_t</span> <span class="n">z0</span><span class="p">,</span> <span class="n">z1</span><span class="p">,</span> <span class="n">z2</span><span class="p">,</span> <span class="n">z3</span><span class="p">;</span>
    <span class="k">uint64_t</span> <span class="n">z</span><span class="p">;</span>

    <span class="cm">/* Apply bitmasks. */</span>
    <span class="n">x0</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">x1</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">x3</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m8</span><span class="p">;</span>
    <span class="n">y0</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">y1</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">y2</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">y3</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m8</span><span class="p">;</span>

    <span class="cm">/* Integer multiplication (16 times). */</span>
    <span class="n">z0</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y0</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y1</span><span class="p">);</span>
    <span class="n">z1</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y0</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y2</span><span class="p">);</span>
    <span class="n">z2</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y0</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y3</span><span class="p">);</span>
    <span class="n">z3</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y0</span><span class="p">);</span>

    <span class="cm">/* Merge results. */</span>
    <span class="n">z0</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">m1</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">z1</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">m2</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">z2</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">m4</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">z3</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">m8</span> <span class="o">&lt;&lt;</span> <span class="mi">32</span><span class="p">)</span> <span class="o">|</span> <span class="n">m8</span><span class="p">;</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">z0</span> <span class="o">|</span> <span class="n">z1</span> <span class="o">|</span> <span class="n">z2</span> <span class="o">|</span> <span class="n">z3</span><span class="p">;</span>
    <span class="o">*</span><span class="n">r_high</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint32_t</span><span class="p">)(</span><span class="n">z</span> <span class="o">&gt;&gt;</span> <span class="mi">32</span><span class="p">);</span>
    <span class="o">*</span><span class="n">r_low</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint32_t</span><span class="p">)</span><span class="n">z</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p>Thomas&#8217; explanation is not too hard to follow. The main idea behind the algorithm is the use of the bitmasks <code>m1 = 0b00010001...</code>, <code>m2 = 0b00100010...</code>, <code>m4 = 0b01000100...</code>, and <code>m8 = 0b10001000...</code>. They respectively have the first, second, third, and fourth bit of every nibble set. This leaves &ldquo;holes&rdquo; of three bits between neighboring &ldquo;data bits&rdquo;, so that with a mask applied at most a quarter of the 32 bits are equal to one.</p>

<p>Per standard integer multiplication, multiplying two operands with eight data bits each will add at most eight bits of value one together in any single column, thus every data bit needs a sufficiently sized hole so that its slot can hold the value <code>8 = 0b1000</code>. A data bit and its three-bit hole form a four-bit slot that can hold values up to <code>15 = 0b1111</code>, which is big enough to prevent carries from &ldquo;spilling&rdquo; over into the next data bit. The same masks could even handle up to 15 data bits in each of the two integer operands.</p>
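<p>The worst case is easy to check by hand or with a few lines of C. The sketch below squares <code>m1</code>, i.e. an operand with all eight data bits set, and looks at the four-bit slots of the product. The function names are made up for this illustration.</p>

```c
#include <stdint.h>

/* Returns the largest hexadecimal digit (four-bit slot) of v. */
unsigned
max_nibble(uint64_t v)
{
    unsigned max = 0;

    for (int i = 0; i < 16; i++) {
        unsigned digit = (unsigned)((v >> (4 * i)) & 0xf);
        if (digit > max) {
            max = digit;
        }
    }

    return max;
}

/* Worst case for bmul32(): all eight data bits set in both masked operands. */
uint64_t
worst_case_product32(void)
{
    uint32_t m1 = 0x11111111;
    return (uint64_t)m1 * m1;
}
```

<p>The product is <code>0x0123456787654321</code>. Every hexadecimal digit counts how many one-bits were added in that column; the largest count is eight, so nothing spills into a neighboring slot, and the lowest bit of each digit is exactly the carry-less result bit that the final mask keeps.</p>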

<h2>Review, tests, and verification</h2>

<p>The first version of the patch came with a bunch of new tests, with vectors taken from the <a href="http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/gcm/">GCM specification</a>. We previously had no such low-level coverage; all we had were a number of high-level AES-GCM tests.</p>

<p>When reviewing, after looking at the patch itself and applying it locally to check that it builds and the tests pass, the next thing I wanted to try was writing a Cryptol specification to prove the correctness of <code>bmul32()</code>. Thanks to the built-in <code>pmult</code> function, that took only a few minutes.</p>

<figure class='code'><div class="highlight"><pre><span class="err">m</span> <span class="err">&lt;-</span> <span class="k">llvm_load_module</span> <span class="s2">&quot;bmul.bc&quot;</span><span class="err">;</span>

<span class="k">let</span> <span class="err">{{</span>
  <span class="err">bmul32</span> <span class="err">:</span> <span class="err">[32]</span> <span class="err">-&gt;</span> <span class="err">[32]</span> <span class="err">-&gt;</span> <span class="err">([32],</span> <span class="err">[32])</span>
  <span class="err">bmul32</span> <span class="err">a</span> <span class="err">b</span> <span class="err">=</span> <span class="err">(</span><span class="k">take</span><span class="err">`{32}</span> <span class="err">prod,</span> <span class="k">drop</span><span class="err">`{32}</span> <span class="err">prod)</span>
      <span class="k">where</span> <span class="err">prod</span> <span class="err">=</span> <span class="err">pad</span> <span class="err">(</span><span class="k">pmult</span> <span class="err">a</span> <span class="err">b)</span>
            <span class="err">pad</span> <span class="err">x</span> <span class="err">=</span> <span class="k">zero</span> <span class="err">#</span> <span class="err">x</span>
<span class="err">}};</span>
</pre></div></figure>


<p>The SAWScript needed to properly parse the LLVM bitcode and formulate the equivalence proof is straightforward. It&rsquo;s basically the same as shown in the previous post.</p>

<figure class='code'><div class="highlight"><pre><span class="k">llvm_verify</span> <span class="err">m</span> <span class="s2">&quot;bmul32&quot;</span> <span class="err">[]</span> <span class="k">do</span> <span class="err">{</span>
  <span class="err">x</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;x&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">32);</span>
  <span class="err">y</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;y&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">32);</span>
  <span class="k">llvm_ptr</span> <span class="s2">&quot;r_high&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">32);</span>
  <span class="err">r_high</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;*r_high&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">32);</span>
  <span class="k">llvm_ptr</span> <span class="s2">&quot;r_low&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">32);</span>
  <span class="err">r_low</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;*r_low&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">32);</span>

  <span class="k">let</span> <span class="err">res</span> <span class="err">=</span> <span class="err">{{</span> <span class="err">bmul32</span> <span class="err">x</span> <span class="err">y</span> <span class="err">}};</span>
  <span class="k">llvm_ensure_eq</span> <span class="s2">&quot;*r_high&quot;</span> <span class="err">{{</span> <span class="err">res.0</span> <span class="err">}};</span>
  <span class="k">llvm_ensure_eq</span> <span class="s2">&quot;*r_low&quot;</span> <span class="err">{{</span> <span class="err">res.1</span> <span class="err">}};</span>

  <span class="k">llvm_verify_tactic</span> <span class="err">abc;</span>
<span class="err">};</span>
</pre></div></figure>


<p>Compile to bitcode and run SAW. After just a few seconds it will tell us it succeeded in proving the equivalence of both implementations.</p>

<figure class='code'><div class="highlight"><pre>$ saw bmul.saw
Loading module Cryptol
Loading file &quot;bmul.saw&quot;
Successfully verified @bmul32
</pre></div></figure>


<h2>bmul() for 64-bit machines</h2>

<p><code>bmul32()</code> is called nine times, each time performing 16 multiplications. That&rsquo;s 144 integer multiplications in total for a single 128-bit multiplication in GHASH. If we had a <code>bmul64()</code> that multiplies two <code>uint64_t</code> values into a <code>uint128_t</code> we&rsquo;d need to call it only three times.</p>
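<p>The factor of three per level comes from Karatsuba. As a rough sketch of a single decomposition step, the following multiplies two 64-bit polynomials using three 32-bit carry-less multiplications. <code>clmul32()</code> is a simple bitwise stand-in for <code>bmul32()</code>, and <code>clmul64_karatsuba()</code> is a made-up name; the actual NSS code may arrange this differently.</p>

```c
#include <stdint.h>

/* Bitwise stand-in for bmul32(): carry-less 32x32 -> 64 bit multiplication. */
static uint64_t
clmul32(uint32_t x, uint32_t y)
{
    uint64_t r = 0;

    for (int i = 0; i < 32; i++) {
        /* All-ones mask if bit i of y is set, all-zeros otherwise. */
        uint64_t mask = (uint64_t)0 - ((y >> i) & 1);
        r ^= ((uint64_t)x << i) & mask;
    }

    return r;
}

/*
 * One Karatsuba step: carry-less x * y = r_high << 64 | r_low, computed
 * with three half-size multiplications instead of four.
 */
void
clmul64_karatsuba(uint64_t x, uint64_t y, uint64_t *r_high, uint64_t *r_low)
{
    uint32_t x0 = (uint32_t)x, x1 = (uint32_t)(x >> 32);
    uint32_t y0 = (uint32_t)y, y1 = (uint32_t)(y >> 32);

    uint64_t lo = clmul32(x0, y0);
    uint64_t hi = clmul32(x1, y1);
    /* (x0 + x1) * (y0 + y1) + lo + hi = x0 * y1 + x1 * y0, addition being XOR. */
    uint64_t mid = clmul32(x0 ^ x1, y0 ^ y1) ^ lo ^ hi;

    /* The middle term is shifted by 32 bits and straddles both halves. */
    *r_low = lo ^ (mid << 32);
    *r_high = hi ^ (mid >> 32);
}
```

<p>Applying the same step once more to go from 128-bit to 64-bit operands gives three times three, i.e. nine, calls to the 32-bit multiplication. With a native 64-bit multiplication only the outer step remains.</p>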

<p>The naive approach taken in the first patch revision was to just double the bitsize of the arguments and variables, and also extend the bitmasks. If you paid close attention to the previous section you might notice a problem here already. If not, it will become clear in a few moments.</p>

<figure class='code'><div class="highlight"><pre><span class="k">typedef</span> <span class="k">unsigned</span> <span class="n">__int128</span> <span class="k">uint128_t</span><span class="p">;</span>

<span class="cm">/* Binary multiplication x * y = r_high &lt;&lt; 64 | r_low. */</span>
<span class="k">void</span>
<span class="nf">bmul64</span><span class="p">(</span><span class="k">uint64_t</span> <span class="n">x</span><span class="p">,</span> <span class="k">uint64_t</span> <span class="n">y</span><span class="p">,</span> <span class="k">uint64_t</span> <span class="o">*</span><span class="n">r_high</span><span class="p">,</span> <span class="k">uint64_t</span> <span class="o">*</span><span class="n">r_low</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">uint64_t</span> <span class="n">x0</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">x3</span><span class="p">;</span>
    <span class="k">uint64_t</span> <span class="n">y0</span><span class="p">,</span> <span class="n">y1</span><span class="p">,</span> <span class="n">y2</span><span class="p">,</span> <span class="n">y3</span><span class="p">;</span>
    <span class="k">uint64_t</span> <span class="n">m1</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)</span><span class="mh">0x1111111111111111</span><span class="p">;</span>
    <span class="k">uint64_t</span> <span class="n">m2</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)</span><span class="mh">0x2222222222222222</span><span class="p">;</span>
    <span class="k">uint64_t</span> <span class="n">m4</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)</span><span class="mh">0x4444444444444444</span><span class="p">;</span>
    <span class="k">uint64_t</span> <span class="n">m8</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)</span><span class="mh">0x8888888888888888</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">z0</span><span class="p">,</span> <span class="n">z1</span><span class="p">,</span> <span class="n">z2</span><span class="p">,</span> <span class="n">z3</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">z</span><span class="p">;</span>

    <span class="cm">/* Apply bitmasks. */</span>
    <span class="n">x0</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">x1</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">x3</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m8</span><span class="p">;</span>
    <span class="n">y0</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">y1</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">y2</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">y3</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m8</span><span class="p">;</span>

    <span class="cm">/* Integer multiplication (16 times). */</span>
    <span class="n">z0</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y0</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y1</span><span class="p">);</span>
    <span class="n">z1</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y0</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y2</span><span class="p">);</span>
    <span class="n">z2</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y0</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y3</span><span class="p">);</span>
    <span class="n">z3</span> <span class="o">=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x0</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span>
         <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y0</span><span class="p">);</span>

    <span class="cm">/* Merge results. */</span>
    <span class="n">z0</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">m1</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span><span class="p">)</span> <span class="o">|</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">z1</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">m2</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span><span class="p">)</span> <span class="o">|</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">z2</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">m4</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span><span class="p">)</span> <span class="o">|</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">z3</span> <span class="o">&amp;=</span> <span class="p">((</span><span class="k">uint128_t</span><span class="p">)</span><span class="n">m8</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span><span class="p">)</span> <span class="o">|</span> <span class="n">m8</span><span class="p">;</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">z0</span> <span class="o">|</span> <span class="n">z1</span> <span class="o">|</span> <span class="n">z2</span> <span class="o">|</span> <span class="n">z3</span><span class="p">;</span>
    <span class="o">*</span><span class="n">r_high</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)(</span><span class="n">z</span> <span class="o">&gt;&gt;</span> <span class="mi">64</span><span class="p">);</span>
    <span class="o">*</span><span class="n">r_low</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">z</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<h2>Tests and another equivalence proof</h2>

<p>The above version of <code>bmul64()</code> <em>passed</em> the GHASH test vectors with flying colors. That tricked reviewers into thinking it looked just fine, even though they had just learned about the basic idea of the algorithm. Fallible humans. Let&rsquo;s update the proofs and see what happens.</p>

<figure class='code'><div class="highlight"><pre><span class="err">bmul</span> <span class="err">:</span> <span class="err">{n,m}</span> <span class="err">(</span><span class="k">fin</span> <span class="err">n,</span> <span class="err">n</span> <span class="err">&gt;=</span> <span class="err">1,</span> <span class="err">m</span> <span class="err">==</span> <span class="err">n*2</span> <span class="err">-</span> <span class="err">1)</span> <span class="err">=&gt;</span> <span class="err">[n]</span> <span class="err">-&gt;</span> <span class="err">[n]</span> <span class="err">-&gt;</span> <span class="err">([n],</span> <span class="err">[n])</span>
<span class="err">bmul</span> <span class="err">a</span> <span class="err">b</span> <span class="err">=</span> <span class="err">(</span><span class="k">take</span><span class="err">`{n}</span> <span class="err">prod,</span> <span class="k">drop</span><span class="err">`{n}</span> <span class="err">prod)</span>
    <span class="k">where</span> <span class="err">prod</span> <span class="err">=</span> <span class="err">pad</span> <span class="err">(</span><span class="k">pmult</span> <span class="err">a</span> <span class="err">b</span> <span class="err">:</span> <span class="err">[m])</span>
          <span class="err">pad</span> <span class="err">x</span> <span class="err">=</span> <span class="k">zero</span> <span class="err">#</span> <span class="err">x</span>
</pre></div></figure>


<p>Instead of hardcoding <code>bmul</code> for 32-bit integers we use the type parameters <code>n</code> and <code>m</code> to denote sizes in bits. <code>m</code>, the width of the unpadded <code>pmult</code> result, is mostly a helper to make it a tad more readable. We can now reason about carry-less n-bit binary multiplication.</p>

<p>Duplicating the SAWScript spec and running <code>:s/32/64</code> is easy, but certainly nicer is adding a function that takes <code>n</code> as a parameter and returns a spec for n-bit arguments.</p>

<figure class='code'><div class="highlight"><pre><span class="k">let</span> <span class="err">SpecBinaryMul</span> <span class="err">n</span> <span class="err">=</span> <span class="k">do</span> <span class="err">{</span>
  <span class="err">x</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;x&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">n);</span>
  <span class="err">y</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;y&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">n);</span>
  <span class="k">llvm_ptr</span> <span class="s2">&quot;r_high&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">n);</span>
  <span class="err">r_high</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;*r_high&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">n);</span>
  <span class="k">llvm_ptr</span> <span class="s2">&quot;r_low&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">n);</span>
  <span class="err">r_low</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;*r_low&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">n);</span>

  <span class="k">let</span> <span class="err">res</span> <span class="err">=</span> <span class="err">{{</span> <span class="err">bmul</span> <span class="err">x</span> <span class="err">y</span> <span class="err">}};</span>
  <span class="k">llvm_ensure_eq</span> <span class="s2">&quot;*r_high&quot;</span> <span class="err">{{</span> <span class="err">res.0</span> <span class="err">}};</span>
  <span class="k">llvm_ensure_eq</span> <span class="s2">&quot;*r_low&quot;</span> <span class="err">{{</span> <span class="err">res.1</span> <span class="err">}};</span>

  <span class="k">llvm_verify_tactic</span> <span class="err">abc;</span>
<span class="err">};</span>

<span class="k">llvm_verify</span> <span class="err">m</span> <span class="s2">&quot;bmul32&quot;</span> <span class="err">[]</span> <span class="err">(SpecBinaryMul</span> <span class="err">32);</span>
<span class="k">llvm_verify</span> <span class="err">m</span> <span class="s2">&quot;bmul64&quot;</span> <span class="err">[]</span> <span class="err">(SpecBinaryMul</span> <span class="err">64);</span>
</pre></div></figure>


<p>We use two instances of the <code>bmul</code> spec to prove correctness of <code>bmul32()</code> and <code>bmul64()</code> sequentially. The second verification will take a lot longer before yielding results.</p>

<figure class='code'><div class="highlight"><pre>$ saw bmul.saw
Loading module Cryptol
Loading file &quot;bmul.saw&quot;
Successfully verified @bmul32
When verifying @bmul64:
Proof of Term *(Term Ident &quot;r_high&quot;) failed.
Counterexample:
  %x: 15554860936645695441
  %y: 17798150062858027007
  lss__alloc0: 262144
  lss__alloc1: 8
Term *(Term Ident &quot;r_high&quot;)
Encountered:  5413984507840984561
Expected:     5413984507840984531
saw: user error (&quot;llvm_verify&quot; (bmul.saw:31:1):
Proof failed.)
</pre></div></figure>


<p><em>Proof failed.</em> As you probably expected by now, the <code>bmul64()</code> implementation is erroneous and SAW gives us a specific counterexample to investigate further. It took us a while to understand the failure but it seems very obvious in hindsight.</p>

<h2>Fixing the bmul64() bitmasks</h2>

<p>As already shown above, bitmasks leaving three-bit holes between data bits avoid carry-spilling when multiplying two integers of up to 15 data bits each. Using every fourth bit of a 64-bit argument, however, yields 16 data bits each. Up to 16 partial products can then add up in a single column, which no longer fits into the four bits available, so carries can overwrite data bits. We need bitmasks with four-bit holes.</p>
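<p>To see the effect at a smaller scale, here is a minimal sketch that is not part of the original code. It uses 1-bit holes in 16-bit operands, and the names <code>clmul_ref</code>, <code>holes_mul</code>, and <code>check_holes</code> are made up for illustration. With three data bits per operand the masked integer product matches the carry-less product. With four data bits one column sums to 4, which no longer fits into two bits, and the carry corrupts neighboring data bits.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Reference carry-less multiplication, one bit of y at a time.
 * Branches on data, so it is a readable reference only. */
static uint32_t clmul_ref(uint16_t x, uint16_t y)
{
    uint32_t r = 0;
    for (int i = 0; i < 16; i++) {
        if ((y >> i) & 1) {
            r ^= (uint32_t)x << i;
        }
    }
    return r;
}

/* Integer multiplication of operands whose data bits sit at even
 * positions, masked back to the even positions of the product. */
static uint32_t holes_mul(uint16_t x, uint16_t y)
{
    return ((uint32_t)x * y) & 0x55555555u;
}

/* Hypothetical self-check comparing both approaches. */
void check_holes(void)
{
    /* Three data bits per operand: column sums stay below 4. */
    assert(holes_mul(0x15, 0x15) == clmul_ref(0x15, 0x15));
    /* Four data bits per operand: one column sums to 4 and carries. */
    assert(holes_mul(0x55, 0x55) != clmul_ref(0x55, 0x55));
}
```

<p>At full scale the same thing happens with three-bit holes once an operand carries 16 data bits: a column can sum to 16, one more than a four-bit slot can hold.</p>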

<figure class='code'><div class="highlight"><pre><span class="cm">/* Binary multiplication x * y = r_high &lt;&lt; 64 | r_low. */</span>
<span class="k">void</span>
<span class="nf">bmul64</span><span class="p">(</span><span class="k">uint64_t</span> <span class="n">x</span><span class="p">,</span> <span class="k">uint64_t</span> <span class="n">y</span><span class="p">,</span> <span class="k">uint64_t</span> <span class="o">*</span><span class="n">r_high</span><span class="p">,</span> <span class="k">uint64_t</span> <span class="o">*</span><span class="n">r_low</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">uint128_t</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">x3</span><span class="p">,</span> <span class="n">x4</span><span class="p">,</span> <span class="n">x5</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">y1</span><span class="p">,</span> <span class="n">y2</span><span class="p">,</span> <span class="n">y3</span><span class="p">,</span> <span class="n">y4</span><span class="p">,</span> <span class="n">y5</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">r</span><span class="p">,</span> <span class="n">z</span><span class="p">;</span>

    <span class="cm">/* Define bitmasks with 4-bit holes. */</span>
    <span class="k">uint128_t</span> <span class="n">m1</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint128_t</span><span class="p">)</span><span class="mh">0x2108421084210842</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span> <span class="o">|</span> <span class="mh">0x1084210842108421</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">m2</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint128_t</span><span class="p">)</span><span class="mh">0x4210842108421084</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span> <span class="o">|</span> <span class="mh">0x2108421084210842</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">m3</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint128_t</span><span class="p">)</span><span class="mh">0x8421084210842108</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span> <span class="o">|</span> <span class="mh">0x4210842108421084</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">m4</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint128_t</span><span class="p">)</span><span class="mh">0x0842108421084210</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span> <span class="o">|</span> <span class="mh">0x8421084210842108</span><span class="p">;</span>
    <span class="k">uint128_t</span> <span class="n">m5</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint128_t</span><span class="p">)</span><span class="mh">0x1084210842108421</span> <span class="o">&lt;&lt;</span> <span class="mi">64</span> <span class="o">|</span> <span class="mh">0x0842108421084210</span><span class="p">;</span>

    <span class="cm">/* Apply bitmasks. */</span>
    <span class="n">x1</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">y1</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">y2</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">x3</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m3</span><span class="p">;</span>
    <span class="n">y3</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m3</span><span class="p">;</span>
    <span class="n">x4</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">y4</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">x5</span> <span class="o">=</span> <span class="n">x</span> <span class="o">&amp;</span> <span class="n">m5</span><span class="p">;</span>
    <span class="n">y5</span> <span class="o">=</span> <span class="n">y</span> <span class="o">&amp;</span> <span class="n">m5</span><span class="p">;</span>

    <span class="cm">/* Integer multiplication (25 times) and merge results. */</span>
    <span class="n">z</span> <span class="o">=</span> <span class="p">(</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y5</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y4</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x4</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x5</span> <span class="o">*</span> <span class="n">y2</span><span class="p">);</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">z</span> <span class="o">&amp;</span> <span class="n">m1</span><span class="p">;</span>
    <span class="n">z</span> <span class="o">=</span> <span class="p">(</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y5</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x4</span> <span class="o">*</span> <span class="n">y4</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x5</span> <span class="o">*</span> <span class="n">y3</span><span class="p">);</span>
    <span class="n">r</span> <span class="o">|=</span> <span class="n">z</span> <span class="o">&amp;</span> <span class="n">m2</span><span class="p">;</span>
    <span class="n">z</span> <span class="o">=</span> <span class="p">(</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x4</span> <span class="o">*</span> <span class="n">y5</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x5</span> <span class="o">*</span> <span class="n">y4</span><span class="p">);</span>
    <span class="n">r</span> <span class="o">|=</span> <span class="n">z</span> <span class="o">&amp;</span> <span class="n">m3</span><span class="p">;</span>
    <span class="n">z</span> <span class="o">=</span> <span class="p">(</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y4</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x4</span> <span class="o">*</span> <span class="n">y1</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x5</span> <span class="o">*</span> <span class="n">y5</span><span class="p">);</span>
    <span class="n">r</span> <span class="o">|=</span> <span class="n">z</span> <span class="o">&amp;</span> <span class="n">m4</span><span class="p">;</span>
    <span class="n">z</span> <span class="o">=</span> <span class="p">(</span><span class="n">x1</span> <span class="o">*</span> <span class="n">y5</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x2</span> <span class="o">*</span> <span class="n">y4</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x3</span> <span class="o">*</span> <span class="n">y3</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x4</span> <span class="o">*</span> <span class="n">y2</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">x5</span> <span class="o">*</span> <span class="n">y1</span><span class="p">);</span>
    <span class="n">r</span> <span class="o">|=</span> <span class="n">z</span> <span class="o">&amp;</span> <span class="n">m5</span><span class="p">;</span>

    <span class="o">*</span><span class="n">r_high</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)(</span><span class="n">r</span> <span class="o">&gt;&gt;</span> <span class="mi">64</span><span class="p">);</span>
    <span class="o">*</span><span class="n">r_low</span> <span class="o">=</span> <span class="p">(</span><span class="k">uint64_t</span><span class="p">)</span><span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p><code>m1</code>, &hellip;, <code>m5</code> are the new bitmasks. <code>m1</code> equals <code>0b0010000100001...</code>, the others are each shifted by one. As the number of data bits per argument is now <code>64/5 &lt;= n &lt; 64/4</code> we need <code>5*5 = 25</code> multiplications. With three calls to <code>bmul64()</code> that&rsquo;s 75 in total.</p>
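<p>The mask constants are easy to mistype, so it can help to derive them instead. The following is a small sketch under the assumption that mask <code>m1</code> keeps bit positions divisible by five, <code>m2</code> those with remainder one, and so on. The helper names <code>hole_mask_half</code>, <code>data_bits</code>, and <code>check_masks</code> are hypothetical and not part of the original code.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Build one 64-bit half of a mask that keeps every fifth bit.
 * `residue` selects the kept bit positions p (p % 5 == residue);
 * `base` is 0 for the low half and 64 for the high half. */
static uint64_t hole_mask_half(unsigned residue, unsigned base)
{
    uint64_t m = 0;
    for (unsigned i = 0; i < 64; i++) {
        if ((base + i) % 5 == residue) {
            m |= (uint64_t)1 << i;
        }
    }
    return m;
}

/* Number of data bits a 64-bit argument keeps after masking. */
static unsigned data_bits(unsigned residue)
{
    unsigned n = 0;
    for (unsigned p = residue; p < 64; p += 5) {
        n++;
    }
    return n;
}

/* Hypothetical self-check against the constants used in bmul64(). */
void check_masks(void)
{
    assert(hole_mask_half(0, 0) == 0x1084210842108421ULL);  /* m1, low  */
    assert(hole_mask_half(0, 64) == 0x2108421084210842ULL); /* m1, high */
    assert(data_bits(0) == 13 && data_bits(4) == 12);
}
```

<p>Each masked argument keeps 12 or 13 data bits, comfortably below the 31 partial products a five-bit slot can absorb without carrying into the next data bit.</p>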

<p>Run SAW again and, after about an hour, it will tell us it <em>successfully verified @bmul64</em>.</p>

<figure class='code'><div class="highlight"><pre>$ saw bmul.saw
Loading module Cryptol
Loading file &quot;bmul.saw&quot;
Successfully verified @bmul32
Successfully verified @bmul64
</pre></div></figure>


<p>You might want to take a look at <a href="https://www.bearssl.org/gitweb/?p=BearSSL;a=blob;f=src/hash/ghash_ctmul64.c;h=a46f16fee977f6102abea7f7bcdf169a013c3e8e;hb=5f045c759957fdff8c85716e6af99e10901fdac0">Thomas Pornin&rsquo;s version</a> of <code>bmul64()</code>. This is basically the faulty version that SAW failed to verify. He works around the overflow, however, by calling it twice, passing bit-reversed arguments the second time. He invokes <code>bmul64()</code> six times, which results in a total of 96 multiplications.</p>

<h2>Some final thoughts</h2>

<p>One of the takeaways is that even an implementation passing all test vectors given by a spec isn&rsquo;t necessarily correct. That is not too surprising: spec authors can&rsquo;t possibly predict edge cases from implementation approaches they haven&rsquo;t thought about.</p>

<p>Using formal verification as part of the review process was definitely a wise decision. We likely saved hours of debugging intermittently failing connections, or random interoperability problems reported by early testers. I&rsquo;m confident this wouldn&rsquo;t have made it much further down the release line.</p>

<p>We of course added an extra test that covers that specific flaw but the next step definitely should be proper CI integration. The Cryptol code has already been written and there is no reason to not run it on every push. Verifying the full GHASH implementation would be ideal. The Cryptol code is almost trivial:</p>

<figure class='code'><div class="highlight"><pre><span class="err">ghash</span> <span class="err">:</span> <span class="err">[128]</span> <span class="err">-&gt;</span> <span class="err">[128]</span> <span class="err">-&gt;</span> <span class="err">[128]</span> <span class="err">-&gt;</span> <span class="err">([64],</span> <span class="err">[64])</span>
<span class="err">ghash</span> <span class="err">h</span> <span class="err">x</span> <span class="err">buf</span> <span class="err">=</span> <span class="err">(</span><span class="k">take</span><span class="err">`{64}</span> <span class="err">res,</span> <span class="k">drop</span><span class="err">`{64}</span> <span class="err">res)</span>
    <span class="k">where</span> <span class="err">prod</span> <span class="err">=</span> <span class="k">pmod</span> <span class="err">(</span><span class="k">pmult</span> <span class="err">(</span><span class="k">reverse</span> <span class="err">h)</span> <span class="err">xor)</span> <span class="err">&lt;|x^^128</span> <span class="err">+</span> <span class="err">x^^7</span> <span class="err">+</span> <span class="err">x^^2</span> <span class="err">+</span> <span class="err">x</span> <span class="err">+</span> <span class="err">1|&gt;</span>
          <span class="err">xor</span> <span class="err">=</span> <span class="err">(</span><span class="k">reverse</span> <span class="err">x)</span> <span class="err">^</span> <span class="err">(</span><span class="k">reverse</span> <span class="err">buf)</span>
          <span class="err">res</span> <span class="err">=</span> <span class="k">reverse</span> <span class="err">prod</span>
</pre></div></figure>


<p>Proving the multiplication of two 128-bit numbers for a 256-bit product will unfortunately take a very very long time, or maybe not finish at all. Even if it finished after a few days that&rsquo;s not something you want to automatically run on every push. Running it manually every time the code is touched might be an option though.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Future of Session Resumption]]></title>
    <link href="https://timtaubert.de/blog/2017/02/the-future-of-session-resumption/"/>
    <updated>2017-02-15T18:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2017/02/the-future-of-session-resumption</id>
    <content type="html"><![CDATA[<p>A while ago I wrote about the <a href="https://timtaubert.de/blog/2014/11/the-sad-state-of-server-side-tls-session-resumption-implementations/">state of server-side session resumption implementations</a> in popular web servers using OpenSSL. Neither Apache, Nginx, nor HAProxy purged stale entries from the session cache or rotated session ticket keys automatically, potentially harming forward secrecy of resumed TLS sessions.</p>

<p>Enabling session resumption is an important tool for speeding up HTTPS websites, especially in a pre-HTTP/2 world where a client may have to open concurrent connections to the same host to quickly render a page. Subresource requests would ideally resume the session that for example a <code>GET / HTTP/1.1</code> request started.</p>

<p>Let&rsquo;s take a look at what has changed in over two years, and whether configuring session resumption securely has gotten any easier. With the TLS 1.3 spec about to be finalized I will show what the future holds and how these issues were addressed by the WG.</p>

<h2>Did web servers react?</h2>

<p>No, not as far as I&rsquo;m aware. None of the three web servers mentioned above has taken steps to make it easier to properly configure session resumption. But to be fair, OpenSSL didn&rsquo;t add any new APIs or options to help them either.</p>

<p>All popular TLS 1.2 web servers still don&rsquo;t evict cache entries when they expire, keeping them around until a client tries to resume, whether for performance or ease of implementation. They generate a session ticket key at startup and never rotate it automatically, so admins have to manually reload server configs and provide new keys.</p>

<h2>The Caddy web server</h2>

<p>I want to seize the chance and positively highlight the <a href="https://caddyserver.com/">Caddy</a> web server, a relative newcomer with the advantage of not having any historical baggage, that enables and configures HTTPS by default, including <a href="https://caddyserver.com/docs/automatic-https">automatically acquiring and renewing certificates</a>.</p>

<p>Version 0.8.3 introduced <a href="https://github.com/wmark/caddy/commit/29235390dca843cb50a10bc104565cbeef981586">automatic session ticket key rotation</a>, thereby making session tickets mostly forward secure by replacing the key every ~10 hours. Session cache entries, though, aren&rsquo;t evicted until they are accessed, just like with the other web servers.</p>
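<p>What such a rotation could look like is sketched below. This is not Caddy&rsquo;s actual implementation. The type <code>ticket_keys</code>, the functions <code>maybe_rotate</code>, <code>fake_random</code>, and <code>check_rotation</code>, and the choice to keep exactly one previous key for decrypting recently issued tickets are all assumptions made for illustration.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TICKET_KEY_LEN 32
#define KEY_LIFETIME_SECS (10 * 60 * 60) /* roughly 10 hours */

/* Caller-supplied randomness source; must return 0 on success. */
typedef int (*random_fn)(uint8_t *out, size_t len);

typedef struct {
    uint8_t current[TICKET_KEY_LEN];  /* encrypts new tickets */
    uint8_t previous[TICKET_KEY_LEN]; /* still accepted for decryption */
    int64_t rotated_at;               /* seconds since some epoch */
} ticket_keys;

/* Rotate if the current key is older than KEY_LIFETIME_SECS.
 * Returns 1 if rotated, 0 if not yet due, -1 on RNG failure. */
static int maybe_rotate(ticket_keys *k, int64_t now, random_fn fill_random)
{
    uint8_t fresh[TICKET_KEY_LEN];

    if (now - k->rotated_at < KEY_LIFETIME_SECS) {
        return 0;
    }
    if (fill_random(fresh, sizeof fresh) != 0) {
        return -1; /* keep the old keys untouched */
    }
    /* Overwriting `previous` drops the oldest key for good. */
    memcpy(k->previous, k->current, TICKET_KEY_LEN);
    memcpy(k->current, fresh, TICKET_KEY_LEN);
    k->rotated_at = now;
    return 1;
}

/* Deterministic stand-in for a CSPRNG, for demonstration only. */
static int fake_random(uint8_t *out, size_t len)
{
    memset(out, 0xab, len);
    return 0;
}

/* Hypothetical self-check of the rotation schedule. */
void check_rotation(void)
{
    ticket_keys k = { {1}, {0}, 0 };
    assert(maybe_rotate(&k, KEY_LIFETIME_SECS - 1, fake_random) == 0);
    assert(maybe_rotate(&k, KEY_LIFETIME_SECS, fake_random) == 1);
    assert(k.previous[0] == 1 && k.current[0] == 0xab);
}
```

<p>Once a key has been overwritten, tickets encrypted under it can no longer be decrypted by anyone who later compromises the server, which is what limits the damage to a window of about two key lifetimes in this sketch.</p>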

<p>But even for &ldquo;traditional&rdquo; web servers all is not lost. The TLS working group has known about the shortcomings of session resumption for a while and addresses those with the next version of TLS.</p>

<h2>1-RTT handshakes by default</h2>

<p>One of the many great things about <a href="https://timtaubert.de/blog/2015/11/more-privacy-less-latency-improved-handshakes-in-tls-13/">TLS 1.3 handshakes</a> is that most connections should take only a single round-trip to establish. The client sends one or more <code>KeyShareEntry</code> values with the <code>ClientHello</code>, and the server responds with a single <code>KeyShareEntry</code> for a key exchange with ephemeral keys.</p>

<p>If the client sends no or only unsupported groups, the server will send a <code>HelloRetryRequest</code> message with a <code>NamedGroup</code> selected from the ones supported by the client. The connection will fall back to two round-trips.</p>

<p>That means you&rsquo;re automatically covered if you enable session resumption only to reduce network latency: a normal TLS 1.3 handshake is as fast as 1-RTT resumption in TLS 1.2. If you&rsquo;re worried about computational overhead from certificate authentication and key exchange, that still might be a good reason to abbreviate handshakes.</p>

<h2>Pre-shared keys in TLS 1.3</h2>

<p>Session IDs and session tickets are obsolete since TLS 1.3. They&rsquo;ve been replaced by a more generic <a href="https://tlswg.github.io/tls13-spec/#rfc.section.2.2">PSK mechanism</a> that allows resuming a session with a previously established shared secret key.</p>

<p>Instead of an ID or a ticket, the client will send an opaque blob it received from the server after a successful handshake in a prior session. That blob might either be an ID pointing to an entry in the server&rsquo;s session cache, or a session ticket encrypted with a key known only to the server.</p>

<figure class='code'><div class="highlight"><pre><span class="k">enum</span> <span class="p">{</span> <span class="n">psk_ke</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">psk_dhe_ke</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="mi">255</span><span class="p">)</span> <span class="p">}</span> <span class="n">PskKeyExchangeMode</span><span class="p">;</span>

<span class="k">struct</span> <span class="p">{</span>
   <span class="n">PskKeyExchangeMode</span> <span class="n">ke_modes</span><span class="o">&lt;</span><span class="mf">1..255</span><span class="o">&gt;</span><span class="p">;</span>
<span class="p">}</span> <span class="n">PskKeyExchangeModes</span><span class="p">;</span>
</pre></div></figure>


<p>Two PSK key exchange modes are defined, <code>psk_ke</code> and <code>psk_dhe_ke</code>. The first signals a key exchange using a previously shared key; it derives a new master secret from only the PSK and nonces. This is basically as (in)secure as session resumption in TLS 1.2 if the server never rotates keys, or discards cache entries only long after they expired.</p>

<p>The second <code>psk_dhe_ke</code> mode additionally incorporates a key agreed upon using ephemeral Diffie-Hellman, thereby making it forward secure. By mixing a shared (EC)DHE key into the derived master secret, an attacker can no longer pull an entry out of the cache, or steal ticket keys, to recover the plaintext of past resumed sessions.</p>
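<p>A server that cares about forward secrecy would therefore prefer <code>psk_dhe_ke</code> whenever the client offers it. The following is a rough sketch of such a policy, not code from any real TLS stack. The function names <code>select_psk_mode</code> and <code>check_select</code> and the <code>allow_psk_only</code> switch are hypothetical; only the two enum values are taken from the definition quoted above.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the PskKeyExchangeMode enum quoted above. */
enum { PSK_KE = 0, PSK_DHE_KE = 1 };

/* Pick a mode from the client's list. Returns the chosen mode, or -1
 * if resumption should be skipped in favor of a full handshake.
 * `allow_psk_only` is a hypothetical server policy switch. */
static int select_psk_mode(const uint8_t *modes, size_t n, int allow_psk_only)
{
    int saw_psk_ke = 0;

    for (size_t i = 0; i < n; i++) {
        if (modes[i] == PSK_DHE_KE) {
            return PSK_DHE_KE; /* forward secure, always preferred */
        }
        if (modes[i] == PSK_KE) {
            saw_psk_ke = 1;
        }
    }
    return (saw_psk_ke && allow_psk_only) ? PSK_KE : -1;
}

/* Hypothetical self-check of the selection policy. */
void check_select(void)
{
    const uint8_t both[] = { PSK_KE, PSK_DHE_KE };
    const uint8_t only[] = { PSK_KE };

    assert(select_psk_mode(both, 2, 1) == PSK_DHE_KE);
    assert(select_psk_mode(only, 1, 0) == -1);
    assert(select_psk_mode(only, 1, 1) == PSK_KE);
}
```

<p>With <code>allow_psk_only</code> set to zero, a client offering only <code>psk_ke</code> simply gets a full handshake instead of a resumed one.</p>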

<p>Note that 0-RTT data cannot be protected by the DHE secret: the early traffic secret is established without any input from the server and is thus derived from the PSK only.</p>

<h2>TLS 1.2 is surely here to stay</h2>

<p>In theory, there should be no valid reason for a web client to be able to complete a TLS 1.3 handshake but not support <code>psk_dhe_ke</code>, as ephemeral Diffie-Hellman key exchanges are mandatory. An internal application talking TLS between peers would likely be a legitimate case for not supporting DHE.</p>

<p>But even with TLS 1.3 it might make sense to properly configure session ticket key rotation and cache turnover in case the odd client supports only <code>psk_ke</code>. It makes sense especially for TLS 1.2, which will probably be around for longer than we wish and imagine.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Simple Cryptol Specifications]]></title>
    <link href="https://timtaubert.de/blog/2017/02/simple-cryptol-specifications/"/>
    <updated>2017-02-07T16:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2017/02/simple-cryptol-specifications</id>
    <content type="html"><![CDATA[<p>In the <a href="https://timtaubert.de/blog/2017/01/equivalence-proofs-with-saw/">previous post</a> I showed how to prove equivalence of two different implementations of the same algorithm. This post will cover writing an algorithm specification in <a href="http://cryptol.net/">Cryptol</a> to prove the correctness of a constant-time C/C++ implementation.</p>

<p>Apart from rather simple Cryptol I&rsquo;m also going to introduce <a href="http://saw.galois.com/">SAW</a>&rsquo;s <code>llvm_verify</code> function that allows much more complex verification. We need this as our function will not only take scalar inputs but also store the result of the computation using pointer arguments.</p>

<h2>Constant-time multiplication</h2>

<p>Part 1 dealt with addition; in part 2 we&rsquo;re going to look at multiplication. Let&rsquo;s implement a function <code>mul(a, b, *hi, *lo)</code> that multiplies <code>a</code> and <code>b</code>, and stores the eight most significant bits of the product in <code>*hi</code> and the eight least significant bits in <code>*lo</code>.</p>

<p>This time we&rsquo;ll make it run in constant time right away and won&rsquo;t bother implementing a trivial version first. Instead, we will write a Cryptol specification to verify LLVM bitcode afterwards &mdash; you will be amazed how simple that is.</p>

<h3>Some helper functions</h3>

<p>The first two functions of our C/C++ implementation will seem familiar if you&rsquo;ve read the previous part of the series. <code>msb</code> hasn&rsquo;t changed, and <code>ge</code> is the negated version of <code>lt</code>. <code>nz</code> returns <code>0xff</code> if the given argument <code>x</code> is non-zero, <code>0</code> otherwise.</p>

<figure class='code'><figcaption><span>cmul.c</span><a href='https://gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-c'>[gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-c] </a></figcaption><div class="highlight"><pre><span class="c1">// 0xff if MSB(x) = 1 else 0x00</span>
<span class="k">uint8_t</span> <span class="nf">msb</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="mi">0</span> <span class="o">-</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="mi">8</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// 0xff if a &gt;= b else 0x00</span>
<span class="k">uint8_t</span> <span class="nf">ge</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="o">~</span><span class="n">msb</span><span class="p">(</span><span class="n">a</span> <span class="o">^</span> <span class="p">((</span><span class="n">a</span> <span class="o">^</span> <span class="n">b</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">a</span> <span class="o">-</span> <span class="n">b</span><span class="p">)</span> <span class="o">^</span> <span class="n">b</span><span class="p">)));</span>
<span class="p">}</span>

<span class="c1">// 0xff if x &gt; 0 else 0x00</span>
<span class="k">uint8_t</span> <span class="nf">nz</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="o">~</span><span class="n">msb</span><span class="p">(</span><span class="o">~</span><span class="n">x</span> <span class="o">&amp;</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// a + b (mod 256); *carry = 1 on overflow else 0</span>
<span class="k">uint8_t</span> <span class="nf">add</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="o">*</span><span class="n">carry</span><span class="p">)</span> <span class="p">{</span>
  <span class="o">*</span><span class="n">carry</span> <span class="o">=</span> <span class="n">msb</span><span class="p">(</span><span class="n">ge</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">0</span> <span class="o">-</span> <span class="n">b</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">nz</span><span class="p">(</span><span class="n">b</span><span class="p">))</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">;</span>
  <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p>Our <code>add</code> function that previously dealt with overflows by capping at <code>UINT8_MAX</code> is a little more mature now and will set <code>*carry = 1</code> when an overflow occurs.</p>
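<p>Before bringing in SAW, a quick sanity check is cheap because the input space is tiny. The sketch below copies the helpers from <code>cmul.c</code> and compares <code>add</code> against plain wide arithmetic for all 65536 input pairs. The function <code>check_add</code> is a hypothetical addition and not part of the original file.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Helpers copied from cmul.c above. */
uint8_t msb(uint8_t x) {
  return 0 - (x >> (8 * sizeof(x) - 1));
}

uint8_t ge(uint8_t a, uint8_t b) {
  return ~msb(a ^ ((a ^ b) | ((a - b) ^ b)));
}

uint8_t nz(uint8_t x) {
  return ~msb(~x & (x - 1));
}

uint8_t add(uint8_t a, uint8_t b, uint8_t *carry) {
  *carry = msb(ge(a, 0 - b) & nz(b)) & 1;
  return a + b;
}

/* Compare add() against wide arithmetic for every pair of inputs. */
void check_add(void) {
  for (unsigned a = 0; a < 256; a++) {
    for (unsigned b = 0; b < 256; b++) {
      uint8_t carry;
      uint8_t sum = add((uint8_t)a, (uint8_t)b, &carry);
      unsigned wide = a + b;

      assert(sum == (wide & 0xff)); /* low eight bits */
      assert(carry == (wide >> 8)); /* ninth bit */
    }
  }
}
```

<p>An exhaustive loop like this only works for toy sizes. For 32-bit or 64-bit arguments it is no longer feasible, which is where a proof pays off.</p>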

<h3>The core of the algorithm</h3>

<p><code>mul(a, b, *hi, *lo)</code>, using all the helper functions we defined above, implements standard long multiplication, i.e. four multiplications per function call. We split the two 8-bit arguments into two 4-bit halves, multiply and add a few times, and then store two 8-bit results at the addresses pointed to by <code>hi</code> and <code>lo</code>.</p>

<figure class='code'><figcaption><span>cmul.c</span><a href='https://gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-c'>[gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-c] </a></figcaption><div class="highlight"><pre><span class="k">void</span> <span class="nf">mul</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="o">*</span><span class="n">hi</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="o">*</span><span class="n">lo</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">a1</span> <span class="o">=</span> <span class="n">a</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">,</span> <span class="n">a0</span> <span class="o">=</span> <span class="n">a</span> <span class="o">&amp;</span> <span class="mh">0xf</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">b1</span> <span class="o">=</span> <span class="n">b</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">,</span> <span class="n">b0</span> <span class="o">=</span> <span class="n">b</span> <span class="o">&amp;</span> <span class="mh">0xf</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">z0</span> <span class="o">=</span> <span class="n">a0</span> <span class="o">*</span> <span class="n">b0</span><span class="p">;</span>
  <span class="k">uint8_t</span> <span class="n">z2</span> <span class="o">=</span> <span class="n">a1</span> <span class="o">*</span> <span class="n">b1</span><span class="p">;</span>

  <span class="k">uint8_t</span> <span class="n">z1</span><span class="p">,</span> <span class="n">z1carry</span><span class="p">,</span> <span class="n">carry</span><span class="p">,</span> <span class="n">trash</span><span class="p">;</span>
  <span class="n">z1</span> <span class="o">=</span> <span class="n">add</span><span class="p">(</span><span class="n">a0</span> <span class="o">*</span> <span class="n">b1</span><span class="p">,</span> <span class="n">a1</span> <span class="o">*</span> <span class="n">b0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">z1carry</span><span class="p">);</span>
  <span class="o">*</span><span class="n">lo</span> <span class="o">=</span> <span class="n">add</span><span class="p">(</span><span class="n">z1</span> <span class="o">&lt;&lt;</span> <span class="mi">4</span><span class="p">,</span> <span class="n">z0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">carry</span><span class="p">);</span>
  <span class="o">*</span><span class="n">hi</span> <span class="o">=</span> <span class="n">add</span><span class="p">(</span><span class="n">z2</span><span class="p">,</span> <span class="p">(</span><span class="n">z1</span> <span class="o">&gt;&gt;</span> <span class="mi">4</span><span class="p">)</span> <span class="o">+</span> <span class="n">carry</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">trash</span><span class="p">);</span>
  <span class="o">*</span><span class="n">hi</span> <span class="o">=</span> <span class="n">add</span><span class="p">(</span><span class="o">*</span><span class="n">hi</span><span class="p">,</span> <span class="n">z1carry</span> <span class="o">&lt;&lt;</span> <span class="mi">4</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">trash</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<p>It&rsquo;s relatively easy to see that <code>a * b</code> can be rewritten as <code>(a1 * 2^4 + a0) * (b1 * 2^4 + b0)</code>, all four variables being 4-bit integers. After multiplying and rearranging you&rsquo;ll get an equation that&rsquo;s very similar to <code>mul</code> above. Here&rsquo;s a <a href="http://people.mpi-inf.mpg.de/~mehlhorn/ftp/chapter2A-en.pdf">good introduction</a> to computing with long integers if you want to know more.</p>
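<p>The rearrangement can also be checked numerically. What follows is a minimal sketch and not part of the original <code>cmul.c</code>: the names <code>mul_by_halves</code> and <code>halves_identity_holds</code> are made up for illustration, and the code uses plain 16-bit arithmetic rather than the constant-time <code>add</code> helper. Here <code>z1</code> holds the full sum of the two cross products, so no separate carry is needed. It confirms that <code>z2 * 2^8 + z1 * 2^4 + z0</code> equals <code>a * b</code> for every pair of 8-bit inputs.</p>

```c
#include <stdint.h>

/* Schoolbook product of two 8-bit values, built from their 4-bit halves.
 * Hypothetical helper for illustration only; it is not constant-time. */
uint16_t mul_by_halves(uint8_t a, uint8_t b) {
  uint16_t a1 = a >> 4, a0 = a & 0xf;
  uint16_t b1 = b >> 4, b0 = b & 0xf;

  uint16_t z2 = a1 * b1;           /* coefficient of 2^8 */
  uint16_t z1 = a0 * b1 + a1 * b0; /* coefficient of 2^4, at most 450 */
  uint16_t z0 = a0 * b0;           /* coefficient of 2^0 */

  /* The largest possible sum is 255 * 255 = 65025, which fits in 16 bits. */
  return (uint16_t)((z2 << 8) + (z1 << 4) + z0);
}

/* Returns 1 if the decomposition matches a * b for all 65536 input pairs. */
int halves_identity_holds(void) {
  for (unsigned a = 0; a < 256; a++) {
    for (unsigned b = 0; b < 256; b++) {
      if (mul_by_halves((uint8_t)a, (uint8_t)b) != a * b) {
        return 0;
      }
    }
  }
  return 1;
}
```

<p>The <code>mul</code> function above computes the same three terms, but has to track <code>z1carry</code> and <code>carry</code> explicitly because every intermediate value is limited to 8 bits.</p>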

<figure class='code'><div class="highlight"><pre>$ clang -c -emit-llvm -o cmul.bc cmul.c
</pre></div></figure>


<p>Compile the code to LLVM bitcode as before so that we can load it into SAW later.</p>

<h2>The Cryptol specification</h2>

<p>To automate verification we&rsquo;ll again write a SAW script. It will contain the necessary verification commands and details, as well as a Cryptol specification.</p>

<p>The specification doesn&rsquo;t need to be constant-time; it only needs to be correct and as simple as possible. We declare a function <code>mul</code> taking two 8-bit integers and returning a tuple containing two 8-bit integers. Read the notation <code>[8]</code> as &ldquo;sequence of 8 bits&rdquo;.</p>

<figure class='code'><figcaption><span>cmul.saw</span><a href='https://gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-saw'>[gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-saw] </a></figcaption><div class="highlight"><pre><span class="err">m</span> <span class="err">&lt;-</span> <span class="k">llvm_load_module</span> <span class="s2">&quot;cmul.bc&quot;</span><span class="err">;</span>

<span class="k">let</span> <span class="err">{{</span>
  <span class="err">mul</span> <span class="err">:</span> <span class="err">[8]</span> <span class="err">-&gt;</span> <span class="err">[8]</span> <span class="err">-&gt;</span> <span class="err">([8],</span> <span class="err">[8])</span>
  <span class="err">mul</span> <span class="err">a</span> <span class="err">b</span> <span class="err">=</span> <span class="err">(</span><span class="k">take</span><span class="err">`{8}</span> <span class="err">prod,</span> <span class="k">drop</span><span class="err">`{8}</span> <span class="err">prod)</span>
      <span class="k">where</span> <span class="err">prod</span> <span class="err">=</span> <span class="err">(pad</span> <span class="err">a)</span> <span class="err">*</span> <span class="err">(pad</span> <span class="err">b)</span>
            <span class="err">pad</span> <span class="err">x</span> <span class="err">=</span> <span class="k">zero</span> <span class="err">#</span> <span class="err">x</span>
<span class="err">}};</span>
</pre></div></figure>


<p>The built-in function <code>take`{n} x</code> returns a sequence with only the first <code>n</code> items of <code>x</code>. <code>drop`{n} x</code> returns sequence <code>x</code> without the first <code>n</code> items. <code>zero</code> is a special value that has a number of use cases, here it represents a flexible sequence of all zero bits. <code>#</code> is the append operator for sequences.</p>

<p>The first line of the definition gives the return value, a tuple with the first and the last 8 bits of <code>prod</code>. The Cryptol type system can automatically infer that the variable <code>prod</code> must hold a 16-bit sequence if the result of the <code>take`{8}</code> and <code>drop`{8}</code> function calls is a sequence of 8 bits each.</p>

<p><code>prod</code> is the result of multiplying the zero-padded arguments <code>a</code> and <code>b</code>. <code>zero # x</code> appends <code>x</code> to 8 zero bits, and that number is again determined by the type system. If you want to learn more about the language, take a look at <a href="http://www.cryptol.net/files/ProgrammingCryptol.pdf">Programming Cryptol</a>.</p>

<p>That&rsquo;s about as simple as it gets. We multiply two 8-bit integers and out comes a 16-bit integer, split into two halves. Now let&rsquo;s use the specification to verify our constant-time implementation.</p>
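<p>If you are more at home in C than in Cryptol, the specification corresponds roughly to the sketch below. <code>mul_ref</code> is a hypothetical name that appears nowhere in the SAW script, and the mapping assumes Cryptol&rsquo;s convention that the first bits of a word are the most significant ones, so <code>take`{8}</code> yields the high byte and <code>drop`{8}</code> the low byte.</p>

```c
#include <stdint.h>

/* Plain C counterpart of the Cryptol specification (illustrative only).
 * Like the specification, it makes no attempt to be constant-time. */
void mul_ref(uint8_t a, uint8_t b, uint8_t *hi, uint8_t *lo) {
  /* "pad": zero-extend both arguments to 16 bits before multiplying. */
  uint16_t prod = (uint16_t)((uint16_t)a * (uint16_t)b);

  *hi = (uint8_t)(prod >> 8);   /* take`{8} prod: the first 8 bits */
  *lo = (uint8_t)(prod & 0xff); /* drop`{8} prod: the remaining 8 bits */
}
```

<p>For example, <code>255 * 255 = 65025 = 0xfe01</code>, so <code>*hi</code> becomes <code>0xfe</code> and <code>*lo</code> becomes <code>0x01</code>.</p>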

<h2>SAW&rsquo;s llvm_verify function</h2>

<p>We will add LLVM SAW instructions to the same file that contains the Cryptol code from above. The <code>llvm_verify</code> call here takes module <code>m</code>, extracts the symbol <code>"mul"</code>, and uses the body given after <code>do</code> for verification.</p>

<p>We need to declare all symbolic inputs as given by our C/C++ implementation. With <code>llvm_var</code> we tell SAW that <code>"a"</code> and <code>"b"</code> are 8-bit integer arguments, and map those to the SAW variables <code>a</code> and <code>b</code>.</p>

<p>The arguments <code>"hi"</code> and <code>"lo"</code> are declared as pointers to 8-bit integers using <code>llvm_ptr</code>. And because we want to dereference the pointers and refer to their values later we declare <code>"*hi"</code> and <code>"*lo"</code> as 8-bit integers too.</p>

<figure class='code'><figcaption><span>cmul.saw</span><a href='https://gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-saw'>[gist.github.com/ttaubert/c742ba7adf040e14ff21e111a929f5b8#file-cmul-saw] </a></figcaption><div class="highlight"><pre><span class="k">llvm_verify</span> <span class="err">m</span> <span class="s2">&quot;mul&quot;</span> <span class="err">[]</span> <span class="k">do</span> <span class="err">{</span>
  <span class="err">a</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;a&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">8);</span>
  <span class="err">b</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;b&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">8);</span>

  <span class="k">llvm_ptr</span> <span class="s2">&quot;hi&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">8);</span>
  <span class="err">hi</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;*hi&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">8);</span>
  <span class="k">llvm_ptr</span> <span class="s2">&quot;lo&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">8);</span>
  <span class="err">lo</span> <span class="err">&lt;-</span> <span class="k">llvm_var</span> <span class="s2">&quot;*lo&quot;</span> <span class="err">(</span><span class="k">llvm_int</span> <span class="err">8);</span>

  <span class="k">let</span> <span class="err">res</span> <span class="err">=</span> <span class="err">{{</span> <span class="err">mul</span> <span class="err">a</span> <span class="err">b</span> <span class="err">}};</span>
  <span class="k">llvm_ensure_eq</span> <span class="s2">&quot;*hi&quot;</span> <span class="err">{{</span> <span class="err">res.0</span> <span class="err">}};</span>
  <span class="k">llvm_ensure_eq</span> <span class="s2">&quot;*lo&quot;</span> <span class="err">{{</span> <span class="err">res.1</span> <span class="err">}};</span>

  <span class="k">llvm_verify_tactic</span> <span class="err">abc;</span>
<span class="err">};</span>
</pre></div></figure>


<p>We specify no constraints for any of the arguments and expect the verification to consider all possible inputs. I will talk a bit more about such constraints and how these are useful in a later post.</p>

<p>With <code>llvm_ensure_eq</code> we tell SAW what values we expect <em>after</em> symbolic execution. We expect <code>"*hi"</code> to be equal to the first 8-bit integer element of the tuple returned by <code>mul</code>, and <code>"*lo"</code> to be equal to the second 8-bit integer.</p>

<p><code>llvm_verify_tactic</code> chooses UC Berkeley&rsquo;s ABC tool again and off we go.</p>

<h2>Verification with SAW</h2>

<p>Again, make sure you have <code>saw</code> and <code>z3</code> in your <code>$PATH</code>. If you haven&rsquo;t downloaded the binaries yet, take a look at the early sections of the <a href="https://timtaubert.de/blog/2017/01/equivalence-proofs-with-saw/">previous post</a>.</p>

<figure class='code'><div class="highlight"><pre>$ saw cmul.saw
Loading module Cryptol
Loading file &quot;cmul.saw&quot;
Successfully verified @mul
</pre></div></figure>


<p><em>Successfully verified @mul.</em> SAW tells us that for all possible inputs <code>a</code> and <code>b</code>, and actually <code>hi</code> and <code>lo</code> too, our constant-time C/C++ implementation behaves as stated by the SAW verification script and is thereby equivalent to our Cryptol specification.</p>

<h2>Next: Finding bugs and more LLVM commands</h2>

<p>In <a href="https://timtaubert.de/blog/2017/06/verified-binary-multiplication-for-ghash/">the next post</a> I&rsquo;m going to introduce and write more Cryptol, talk about specifying constraints on LLVM arguments and return values, and provide an example for finding bugs in a real-world codebase.</p>

<p>And while you wait, why not try your hand at optimizing <code>mul</code> to use only three instead of four multiplications with the <a href="https://en.wikipedia.org/wiki/Karatsuba_algorithm">Karatsuba algorithm</a>? You can reuse the above Cryptol specification to verify you got it right.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Equivalence Proofs With SAW]]></title>
    <link href="https://timtaubert.de/blog/2017/01/equivalence-proofs-with-saw/"/>
    <updated>2017-01-26T16:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2017/01/equivalence-proofs-with-saw</id>
    <content type="html"><![CDATA[<p>This is the first of a small series of posts that will scratch the surface of the world of formal verification. I will mainly use <a href="http://saw.galois.com/">SAW</a>, the Software Analysis Workbench, and <a href="http://cryptol.net/">Cryptol</a>, a DSL for specifying crypto algorithms. Both are powerful tools for verifying C, C++, and even Rust code, i.e. almost anything that compiles to LLVM bitcode.</p>

<p>Verifying the implementation of a specific algorithm not only helps you weed out bugs early, it lets you <em>prove</em> that your code is correct and contains no further bugs - assuming you made no mistakes writing your algorithm specification in the first place.</p>

<p>Even if you don&rsquo;t know a lot about formal verification, or anything at all, it&rsquo;s easy to get started experimenting with Cryptol and SAW, and get a glimpse of what&rsquo;s possible.</p>

<p>In this first post I&rsquo;ll show how you can use SAW to prove equality of multiple implementations of the same algorithm, potentially written in different languages.</p>

<h2>Setting up your workspace</h2>

<p>To get started, download the latest SAW and Z3, as well as clang 3.8:</p>

<ul>
<li>SAW: <a href="http://saw.galois.com/builds/nightly/">http://saw.galois.com/builds/nightly/</a></li>
<li>Z3: <a href="https://github.com/Z3Prover/z3/releases">https://github.com/Z3Prover/z3/releases</a></li>
<li>LLVM 3.8: <a href="http://releases.llvm.org/download.html">http://releases.llvm.org/download.html</a></li>
</ul>


<p>You need clang 3.8; later versions don&rsquo;t seem to be supported at the moment. Xcode&rsquo;s latest clang would (probably) work for this small example but give you headaches with more advanced verification later on.</p>

<p>Unzip and copy the tools someplace you like, just don&rsquo;t forget to update your <code>$PATH</code> environment variable. Especially if you already have clang on your system.</p>

<p>Let&rsquo;s start with a simple example.</p>

<h2>Unsigned addition without overflow</h2>

<p>We define an addition function <code>add(a, b)</code> that takes two <code>uint8_t</code> arguments and returns a <code>uint8_t</code>. It deals with overflows so that <code>123 + 200 = 255</code>, that is, it caps the number at <code>UINT8_MAX</code> instead of wrapping around.</p>

<figure class='code'><figcaption><span>add.c</span><a href='https://gist.github.com/ttaubert/ecf5b710e849ddfefa81c14a70631eec#file-add-c'>[gist.github.com/ttaubert/ecf5b710e849ddfefa81c14a70631eec#file-add-c] </a></figcaption><div class="highlight"><pre><span class="k">uint8_t</span> <span class="nf">add</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">uint8_t</span> <span class="n">sum</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">;</span>
  <span class="k">return</span> <span class="n">sum</span> <span class="o">&lt;</span> <span class="n">a</span> <span class="o">?</span> <span class="k">UINT8_MAX</span> <span class="o">:</span> <span class="n">sum</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>


<p>That&rsquo;s such a trivial function that we probably wouldn&rsquo;t write a test for it. If it compiles we&rsquo;re somewhat confident it&rsquo;ll work just fine:</p>

<figure class='code'><div class="highlight"><pre>$ clang -c -emit-llvm -o add.bc add.c
</pre></div></figure>


<p>Note that the above command will not produce a binary or shared library, but instead instruct clang to emit LLVM bitcode and store it in <code>add.bc</code>. We&rsquo;ll feed this into SAW in a minute.</p>

<h2>Constant-time addition</h2>

<p>Now imagine that we actually want to use <code>add</code> as part of a bignum library to implement cryptographic algorithms, and thus want it to have a <a href="https://cryptocoding.net/index.php/Coding_rules#Avoid_branchings_controlled_by_secret_data">constant runtime</a>, independent of the arguments given. Here&rsquo;s how you could do this:</p>

<figure class='code'><figcaption><span>cadd.c</span><a href='https://gist.github.com/ttaubert/ecf5b710e849ddfefa81c14a70631eec#file-cadd-c'>[gist.github.com/ttaubert/ecf5b710e849ddfefa81c14a70631eec#file-cadd-c] </a></figcaption><div class="highlight"><pre><span class="c1">// 0xff if MSB(x) = 1 else 0x00</span>
<span class="k">uint8_t</span> <span class="nf">msb</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="mi">0</span> <span class="o">-</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;&gt;</span> <span class="p">(</span><span class="mi">8</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">// 0xff if a &lt; b else 0x00</span>
<span class="k">uint8_t</span> <span class="nf">lt</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="n">msb</span><span class="p">(</span><span class="n">a</span> <span class="o">^</span> <span class="p">((</span><span class="n">a</span> <span class="o">^</span> <span class="n">b</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">a</span> <span class="o">-</span> <span class="n">b</span><span class="p">)</span> <span class="o">^</span> <span class="n">b</span><span class="p">)));</span>
<span class="p">}</span>

<span class="k">uint8_t</span> <span class="nf">add</span><span class="p">(</span><span class="k">uint8_t</span> <span class="n">a</span><span class="p">,</span> <span class="k">uint8_t</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">return</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span> <span class="o">|</span> <span class="n">lt</span><span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<p>If <code>a + b &lt; a</code>, i.e. the addition overflows, <code>lt(a + b, a)</code> will return <code>0xff</code> and change the return value into <code>UINT8_MAX = 0xff</code>. Otherwise it returns <code>0</code> and the return value will simply be <code>a + b</code>. That&rsquo;s easy enough, but did we get <code>msb</code> and <code>lt</code> right?</p>

<figure class='code'><div class="highlight"><pre>$ clang -c -emit-llvm -o cadd.bc cadd.c
</pre></div></figure>


<p>Let&rsquo;s compile the constant-time <code>add</code> function to LLVM bitcode too and use SAW to prove that both our addition functions are equivalent to each other.</p>

<h2>Writing the SAW script</h2>

<p>SAW executes scripts to automate theorem proving, and we need to write one in order to check that our two implementations are equivalent. The first thing our script does is load the LLVM bitcode from the files we created earlier, <code>add.bc</code> and <code>cadd.bc</code>, as modules into the variables <code>m1</code> and <code>m2</code>, respectively.</p>

<figure class='code'><figcaption><span>add.saw</span><a href='https://gist.github.com/ttaubert/ecf5b710e849ddfefa81c14a70631eec#file-add-saw'>[gist.github.com/ttaubert/ecf5b710e849ddfefa81c14a70631eec#file-add-saw] </a></figcaption><div class="highlight"><pre><span class="err">m1</span> <span class="err">&lt;-</span> <span class="k">llvm_load_module</span> <span class="s2">&quot;add.bc&quot;</span><span class="err">;</span>
<span class="err">m2</span> <span class="err">&lt;-</span> <span class="k">llvm_load_module</span> <span class="s2">&quot;cadd.bc&quot;</span><span class="err">;</span>

<span class="err">add</span> <span class="err">&lt;-</span> <span class="k">llvm_extract</span> <span class="err">m1</span> <span class="s2">&quot;add&quot;</span> <span class="k">llvm_pure</span><span class="err">;</span>
<span class="err">cadd</span> <span class="err">&lt;-</span> <span class="k">llvm_extract</span> <span class="err">m2</span> <span class="s2">&quot;add&quot;</span> <span class="k">llvm_pure</span><span class="err">;</span>

<span class="k">let</span> <span class="err">thm</span> <span class="err">=</span> <span class="err">{{</span> <span class="err">\x</span> <span class="err">y</span> <span class="err">-&gt;</span> <span class="err">add</span> <span class="err">x</span> <span class="err">y</span> <span class="err">==</span> <span class="err">cadd</span> <span class="err">x</span> <span class="err">y</span> <span class="err">}};</span>
<span class="k">prove_print</span> <span class="err">abc</span> <span class="err">thm;</span>
</pre></div></figure>


<p>Next, we&rsquo;ll extract the <code>add</code> functions defined in each of these modules and store them in <code>add</code> and <code>cadd</code>, the latter being our constant-time implementation. <code>llvm_pure</code> indicates that a function always returns the same result given the same arguments, and thus has no side-effects.</p>

<p>Last, we define a theorem <code>thm</code> stating that for all arguments <code>x</code> and <code>y</code> both functions have the same return value, that they are equivalent to each other. We choose to prove this theorem with the ABC tool from UC Berkeley.</p>

<p>We&rsquo;re all set now, time to actually prove something.</p>

<h2>Proving equivalence</h2>

<p>Make sure you have <code>saw</code> and <code>z3</code> in your <code>$PATH</code>. Run SAW and pass it the file we created in the previous section &mdash; it will execute the script and automatically prove our theorem.</p>

<figure class='code'><div class="highlight"><pre>$ saw add.saw
Loading module Cryptol
Loading file &quot;add.saw&quot;
Valid
</pre></div></figure>


<p><em>Valid</em>, that was easy. Maybe too easy. Would SAW even detect if we sneak a minor mistake into the program? Let&rsquo;s find out&hellip;</p>

<figure class='code'><div class="highlight"><pre> uint8_t lt(uint8_t a, uint8_t b) {
<span class="gd">-  return msb(a ^ ((a ^ b) | ((a - b) ^ b)));</span>
<span class="gi">+  return msb(a ^ ((a ^ b) | ((a + b) ^ b)));</span>
 }
</pre></div></figure>


<p>The diff above changes the behavior of <code>lt</code> just slightly, a bug that we could have introduced by accident. Let&rsquo;s run SAW again and see whether it spots it:</p>

<figure class='code'><div class="highlight"><pre>$ saw add.saw
Loading module Cryptol
Loading file &quot;add.saw&quot;
saw: user error (&quot;prove_print&quot; (add.saw:8:1):
prove: 1 unsolved subgoal(s)
Invalid: [x = 240, y = 0])
</pre></div></figure>


<p><em>Invalid</em>! The two functions disagree on the return value at <code>[x = 240, y = 0]</code>. SAW of course doesn&rsquo;t know which function is at fault, but we are confident enough in our reference implementation to know where to look.</p>
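<p>With only two 8-bit inputs, the same disagreement can also be reproduced by brute force. The sketch below is an illustration under assumptions, not something SAW produces: it copies both implementations into one file under the made-up names <code>add_ref</code>, <code>add_ct</code>, and <code>add_ct_buggy</code>, and <code>count_mismatches</code> is a hypothetical helper that tries all 65536 input pairs.</p>

```c
#include <stdint.h>

/* Reference implementation, as in add.c. */
uint8_t add_ref(uint8_t a, uint8_t b) {
  uint8_t sum = a + b;
  return sum < a ? UINT8_MAX : sum;
}

/* 0xff if MSB(x) = 1 else 0x00 */
uint8_t msb(uint8_t x) {
  return 0 - (x >> (8 * sizeof(x) - 1));
}

/* 0xff if a < b else 0x00 */
uint8_t lt(uint8_t a, uint8_t b) {
  return msb(a ^ ((a ^ b) | ((a - b) ^ b)));
}

/* Same as lt, but with the bug from the diff above (a + b instead of a - b). */
uint8_t lt_buggy(uint8_t a, uint8_t b) {
  return msb(a ^ ((a ^ b) | ((a + b) ^ b)));
}

/* Constant-time implementation, as in cadd.c. */
uint8_t add_ct(uint8_t a, uint8_t b) {
  return (a + b) | lt(a + b, a);
}

/* Constant-time implementation using the buggy comparison. */
uint8_t add_ct_buggy(uint8_t a, uint8_t b) {
  return (a + b) | lt_buggy(a + b, a);
}

/* Counts the input pairs on which f and g return different values. */
unsigned count_mismatches(uint8_t (*f)(uint8_t, uint8_t),
                          uint8_t (*g)(uint8_t, uint8_t)) {
  unsigned mismatches = 0;
  for (unsigned a = 0; a < 256; a++) {
    for (unsigned b = 0; b < 256; b++) {
      if (f((uint8_t)a, (uint8_t)b) != g((uint8_t)a, (uint8_t)b)) {
        mismatches++;
      }
    }
  }
  return mismatches;
}
```

<p>For <code>x = 240, y = 0</code> the reference returns <code>240</code> while the buggy variant returns <code>255</code>. Exhaustive testing like this stops being practical once the inputs grow beyond a few bytes, which is where a proof over symbolic inputs pays off.</p>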

<p>I can&rsquo;t possibly explain how this all works in detail, but I can hopefully give you a rough idea. What SAW does is parse the LLVM bitcode and <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolically execute</a> it on symbolic inputs to translate it into a circuit representation.</p>

<p>This circuit is then, together with our theorems, fed into a theorem prover. Z3 is an <a href="https://en.wikipedia.org/wiki/Automated_theorem_proving">automated theorem prover</a>, and ABC a tool for logic synthesis and verification; both are able to prove equality using automated reasoning.</p>

<h2>Next: Some Cryptol and more SAW</h2>

<p>In <a href="https://timtaubert.de/blog/2017/02/simple-cryptol-specifications/">the second post</a> I talk about verifying the implementation of a slightly more complex function, also written in C/C++, and show how you can use Cryptol to write a simple specification, as well as introduce more advanced SAW commands for verification.</p>

<p>If you found this interesting, play around with the examples above and come up with your own. Write a straightforward implementation of an algorithm that you can be certain to get right and then optimize it, make it constant-time, or change it in any other way and see how SAW behaves.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Notes on HACS 2017]]></title>
    <link href="https://timtaubert.de/blog/2017/01/notes-on-hacs-2017/"/>
    <updated>2017-01-17T15:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2017/01/notes-on-hacs-2017</id>
    <content type="html"><![CDATA[<p><a href="https://www.realworldcrypto.com/rwc2017/">Real World Crypto</a> is probably one of my favorite conferences. It&rsquo;s a fine mix of practical and theoretical talks, plus a bunch of great hallway, lunch, and dinner conversations. It was broadcasted live for the first time this year, and the talks are <a href="https://www.totalwebcasting.com/view/?func=VOFF&amp;id=columbia&amp;date=2017-01-04&amp;seq=1">available online</a>. But I&rsquo;m not going to talk more about RWC, <a href="https://www.netmeister.org/blog/rwc2017.html">others have</a> <a href="https://alxdavids.xyz/2017/01/13/notes-from-rwc2017/">covered it</a> <a href="https://www.cryptologie.net/article/380/real-world-crypto-2017-day-1/">perfectly</a>.</p>

<h2>The HACS workshop</h2>

<p>What I want to tell you about is a lesser-known event that took place right after RWC, called HACS - the High Assurance Crypto Software workshop. An intense, highly interactive two-day workshop in its second year, organized by Ben Laurie, Gilles Barthe, Peter Schwabe, Meredith Whittaker, and Trevor Perrin.</p>

<p>Its stated goal is to bring together crypto-implementers and verification people from open source, industry, and academia; introduce them and their projects to each other, and develop practical collaborations that improve verification of crypto code.</p>

<h2>The projects &amp; people</h2>

<p>The formal verification community was represented by projects such as <a href="http://www.mitls.org/">miTLS</a>, <a href="https://github.com/mitls/hacl-star">HACL*</a>, <a href="https://project-everest.github.io/">Project Everest</a>, <a href="https://github.com/Z3Prover/z3/">Z3</a>, <a href="https://people.cs.kuleuven.be/~bart.jacobs/nfm2011.pdf">VeriFast</a>, <a href="http://trust-in-soft.com/tis-interpreter/">tis-interpreter</a>, <a href="https://fdupress.net/files/ctverif.pdf">ct-verif</a>, <a href="http://cryptol.net/">Cryptol</a>/<a href="http://saw.galois.com/">SAW</a>, <a href="http://formal.iti.kit.edu/~klebanov/software/entroposcope/">Entroposcope</a>, and other formal verification and synthesis projects based on <a href="https://coq.inria.fr/">Coq</a> or <a href="https://fstar-lang.org/">F*</a>.</p>

<p>Crypto libraries were represented by one or more maintainers of <a href="https://github.com/openssl/openssl/">OpenSSL</a>, <a href="https://boringssl.googlesource.com/boringssl/">BoringSSL</a>, <a href="https://bouncycastle.org/">Bouncy Castle</a>, <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS">NSS</a>, <a href="https://bearssl.org/">BearSSL</a>, <a href="https://github.com/briansmith/ring">*ring*</a>, and <a href="https://github.com/awslabs/s2n">s2n</a>. Other invited projects included <a href="http://llvm.org/">LLVM</a>, <a href="https://www.torproject.org/">Tor</a>, <a href="https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer">libFuzzer</a>, <a href="https://bitcoin.org/">Bitcoin</a>, and <a href="https://whispersystems.org/">Signal</a>. <em>(I&rsquo;m probably missing a few, sorry.)</em></p>

<p>Additionally, there were some attendees not directly involved with any of the above projects but who are experts in formal verification or synthesis, constant-time implementation of crypto algorithms, fast arithmetic in assembler, elliptic curves, etc.</p>

<p>All in all, somewhere between 70 and 80 people.</p>

<h2>HACS - Day 1</h2>

<p>After short one-sentence introductions on early Saturday morning we immediately started with simultaneous round-table discussions, focused on topics such as &ldquo;<em>The state of crypto libraries</em>&rdquo;, &ldquo;<em>Challenges in implementing crypto libraries</em>&rdquo;, &ldquo;<em>Efficient fuzzing</em>&rdquo;, &ldquo;<em>TLS implementation woes</em>&rdquo;, &ldquo;<em>The LLVM ecosystem</em>&rdquo;, &ldquo;<em>Fast and constant-time low-level algorithm implementations</em>&rdquo;, &ldquo;<em>Formal verification/synthesis with Coq</em>&rdquo;, and others.</p>

<p>These discussions were hosted by a rotating set of people, not always leading by pure expertise, sometimes also moderating, asking questions, and making sure we stay on track. We did this until lunch, and continued to talk over food with the people we just met. For the rest of the day, discussions became longer and more focused.</p>

<p>By this point people slowly started to sense what they wanted to focus on this weekend. They got to meet most of the other attendees, found out about their skills, projects, and ideas, and thought about possibilities for collaboration on projects for this weekend or the months to come.</p>

<p>In the evening we split into groups and went for dinner. Most people&rsquo;s brains were probably somewhat fried (as was mine) after hours of talking and discussing. Everyone was so engaged that you not once found the time to take out your laptop or phone, or had the desire to do so, which was great.</p>

<h2>HACS - Day 2</h2>

<p>The second day, early Sunday morning, continued much like the previous. We started off with a brainstorming session for what we think the group should be working on. The rest of the day was filled with long and focused discussions that were mostly a continuation from the day before.</p>

<p>A highlight of the day was the <em>skill sharing</em> session, where participants could propose a specific skill to share with others. If you didn&rsquo;t find something to share you could be one of the 50% of the group that gets to learn from others.</p>

<p>My lucky pick was Chris Hawblitzel from Microsoft Research, who did his best to explain to me (in about 45 minutes) how Z3 works, what its limitations are, and what higher-level languages exist that make it a little more usable. Thank you, Chris!</p>

<p>We ended the day by signing up for one or more projects for the last day.</p>

<h2>HACS - Day 3</h2>

<p>The third day of the workshop was optional, a hacking day with roughly 50% attendance. Some folks took the chance to arrive a little later after two days of intense discussions and socializing. By now you knew most people&rsquo;s names, and you had better, because no one cared to wear name tags anymore.</p>

<p>It was the time to get together with the people from the projects you signed up for and get your laptop out (if needed). I can&rsquo;t possibly remember all the things people worked on but here are a few examples:</p>

<ul>
<li>Verify DRBG implementations, various other crypto algorithms, and/or integrate synthesized implementations for different crypto libraries.</li>
<li>Brainstorm and experiment with a generic comparative fuzzing API for libFuzzer.</li>
<li>Come up with an ASCII representation for TLS records, similar to <a href="https://github.com/google/der-ascii">DER ASCII</a>, that could be used to write TLS implementation tests or feed fuzzers.</li>
<li>Start fuzzing projects like BearSSL and Tor. I do remember that at least the BearSSL effort quickly found a tiny (~900 byte) buffer overflow :)</li>
</ul>


<h2>See you again next year?</h2>

<p>I want to thank all the organizers (and sponsors) for spending their time (or money) planning and hosting such a great event. It always pays off to bring communities closer together and foster collaboration between projects and individuals.</p>

<p>I got to meet dozens of highly intelligent and motivated people, and left with a much bigger sense of community. I&rsquo;m grateful to all the attendees who participated in discussions and projects, shared their skills, asked hard questions, and were always open to suggestions from others.</p>

<p>I hope to be invited again to future workshops and check in on the progress we&rsquo;ve made at improving the verification and quality assurance of crypto code across the ecosystem.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[TLS Version Intolerance]]></title>
    <link href="https://timtaubert.de/blog/2016/09/tls-version-intolerance/"/>
    <updated>2016-09-30T16:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2016/09/tls-version-intolerance</id>
    <content type="html"><![CDATA[<p>A few weeks ago I listened to Hanno Böck talk about
<a href="https://www.int21.de/slides/berlinsec-versionintolerance/">TLS version intolerance</a>
at the <a href="https://berlinsec.github.io/">Berlin AppSec &amp; Crypto Meetup</a>. He
explained how with TLS 1.3 just around the corner there again are growing
concerns about faulty TLS stacks found in HTTP servers, load balancers,
routers, firewalls, and similar software and devices.</p>

<p>I decided to dig a little deeper and will use this post to explain version
intolerance, how version fallbacks work and why they&rsquo;re insecure, as well as
describe the downgrade protection mechanisms available in TLS 1.2 and 1.3. It
will end with a look at version negotiation in TLS 1.3 and a proposal that
aims to prevent similar problems in the future.</p>

<h2>What is version intolerance?</h2>

<p>Every time a new TLS version is specified, browsers usually are the fastest to
implement and update their deployments. Most major browser vendors have a few
people involved in the standardization process to guide the standard and give
early feedback about implementation issues.</p>

<p>As soon as the spec is finished, and often long before that, clients are
equipped with support for the new TLS protocol version and happily announce
this to any server they connect to:</p>

<blockquote><p><strong>Client:</strong> Hi! The highest TLS version I support is 1.2.<br/>
<strong>Server:</strong> Hi! I too support TLS 1.2 so let&rsquo;s use that to communicate.<br/>
<em>[TLS 1.2 connection will be established.]</em></p></blockquote>

<p>In this case the highest TLS version supported by the client is 1.2, and so
the server picks it because it supports that as well. Let&rsquo;s see what happens
if the client supports 1.2 but the server does not:</p>

<blockquote><p><strong>Client:</strong> Hi! The highest TLS version I support is 1.2.<br/>
<strong>Server:</strong> Hi! I only support TLS 1.1 so let&rsquo;s use that to communicate.<br/>
<em>[TLS 1.1 connection will be established.]</em></p></blockquote>

<p>This too is how it should work if a client tries to connect with a protocol
version unknown to the server. Should the client insist on any specific version
and not agree with the one picked by the server it will have to terminate the
connection.</p>
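<p>As a rough illustration, here is a minimal Python sketch of that negotiation.
The function names and the use of <code>(major, minor)</code> tuples are my own
choices for this example and not part of any TLS library; on the wire TLS 1.1 is
<code>{3, 2}</code> and TLS 1.2 is <code>{3, 3}</code>.</p>

```python
# Protocol versions as (major, minor) pairs, the way they appear on the wire.
TLS_1_0, TLS_1_1, TLS_1_2 = (3, 1), (3, 2), (3, 3)


def server_pick_version(client_max, server_max):
    """A tolerant server answers with the highest version both sides know."""
    return min(client_max, server_max)


def client_accepts(picked, client_min):
    """The client terminates the connection if the pick is below its minimum."""
    return picked >= client_min
```

<p>A server that only knows TLS 1.1 would answer a 1.2 client with
<code>server_pick_version(TLS_1_2, TLS_1_1)</code>, i.e. TLS 1.1, and leave it
to the client to decide whether that is acceptable.</p>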

<p>Unfortunately, there are a few servers and more devices out there that
implement TLS version negotiation incorrectly. The conversation might go
like this:</p>

<blockquote><p><strong>Client:</strong> Hi! The highest TLS version I support is 1.2.<br/>
<strong>Server:</strong> ALERT! I don&rsquo;t know that version. Handshake failure.<br/>
<em>[Connection will be terminated.]</em></p></blockquote>

<p>Or:</p>

<blockquote><p><strong>Client:</strong> Hi! The highest TLS version I support is 1.2.<br/>
<strong>Server:</strong> TCP FIN! I don&rsquo;t know that version.<br/>
<em>[Connection will be terminated.]</em></p></blockquote>

<p>Or even worse:</p>

<blockquote><p><strong>Client:</strong> Hi! The highest TLS version I support is 1.2.<br/>
<strong>Server:</strong> (I don&rsquo;t know this version so let&rsquo;s just not respond.)<br/>
<em>[Connection will hang.]</em></p></blockquote>

<p>The same can happen with the infamous F5 load balancer that can&rsquo;t handle
<code>ClientHello</code> messages with a length between 256 and 512 bytes. Other devices
abort the connection when receiving a large <code>ClientHello</code> split into multiple
TLS records. TLS 1.3 might actually cause more problems of this kind due to
more extensions and client key shares.</p>

<h2>What are version fallbacks?</h2>

<p>As browsers usually want to ship new TLS versions as soon as possible, more
than a decade ago vendors saw a need to prevent connection failures due to
version intolerance. The easy solution was to decrease the advertised version
number by one with every failed attempt:</p>

<blockquote><p><strong>Client:</strong> Hi! The highest TLS version I support is 1.2.<br/>
<strong>Server:</strong> ALERT! Handshake failure. (Or FIN. Or hang.)<br/>
<em>[TLS version fallback to 1.1.]</em><br/>
<strong>Client:</strong> Hi! The highest TLS version I support is 1.1.<br/>
<strong>Server:</strong> Hi! I support TLS 1.1 so let&rsquo;s use that to communicate.<br/>
<em>[TLS 1.1 connection will be established.]</em></p></blockquote>

<p>A client supporting everything from TLS 1.0 to TLS 1.2 would start trying to
establish a 1.2 connection, then a 1.1 connection, and if even that failed a
1.0 connection.</p>
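<p>In code, such a fallback could look roughly like the following Python sketch.
<code>try_handshake</code> is a hypothetical callback standing in for a real TLS
stack; I assume it returns the negotiated version or raises
<code>ConnectionError</code> on an alert, a FIN, or a timeout.</p>

```python
TLS_1_0, TLS_1_1, TLS_1_2 = (3, 1), (3, 2), (3, 3)


def connect_with_fallback(try_handshake, versions=(TLS_1_2, TLS_1_1, TLS_1_0)):
    """Advertise each version from highest to lowest until one handshake works."""
    for advertised in versions:
        try:
            return try_handshake(advertised)
        except ConnectionError:
            # The client cannot tell intolerance from a network error,
            # so it simply retries with the next lower version.
            continue
    raise ConnectionError("no TLS version worked")
```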

<h2>Why are these insecure?</h2>

<p>What makes these fallbacks insecure is that a MITM can downgrade the connection
by sending alerts or TCP packets to the client, or by blocking packets from the
server. To the client this is indistinguishable from a network error.</p>

<p>The <a href="https://www.openssl.org/~bodo/ssl-poodle.pdf">POODLE</a> attack is one
example where an attacker abuses the version fallback to force an SSL 3.0
connection. In response, browser vendors disabled version fallbacks to
SSL 3.0, and then SSL 3.0 entirely, to prevent even up-to-date clients from
being exploited. Insecure version fallbacks in browsers pretty much break the
actual version negotiation mechanisms.</p>

<p>Version fallbacks have been disabled since
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1084025">Firefox 37</a> and
<a href="https://www.chromestatus.com/feature/5685183936200704">Chrome 50</a>. Browser
telemetry data showed they were no longer necessary: after years, TLS 1.2 and
correct version negotiation were deployed widely enough.</p>

<h2>The TLS_FALLBACK_SCSV cipher suite</h2>

<p>You might wonder if there&rsquo;s a <em>secure</em> way to do version fallbacks, and other
people did so too. Adam Langley and Bodo Möller proposed a special cipher suite
in <a href="https://tools.ietf.org/html/rfc7507">RFC 7507</a> that would help a client
detect whether the downgrade was initiated by a MITM.</p>

<p>Whenever the client includes <code>TLS_FALLBACK_SCSV {0x56, 0x00}</code> in the list of
cipher suites it signals to the server that this is a repeated connection
attempt, but this time with a version lower than the highest it supports,
because previous attempts failed. If the server supports a higher version
than advertised by the client, it MUST abort the connection.</p>
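<p>The server side of that check is small. Here is a hedged Python sketch; the
exception class and function name are made up for this example, and the
exception merely stands in for the fatal <code>inappropriate_fallback</code>
alert defined by RFC 7507.</p>

```python
TLS_FALLBACK_SCSV = (0x56, 0x00)


class InappropriateFallback(Exception):
    """Stands in for the fatal inappropriate_fallback alert."""


def check_fallback_scsv(client_version, cipher_suites, server_max):
    """Abort if the client fell back although the server could do better."""
    if TLS_FALLBACK_SCSV in cipher_suites and server_max > client_version:
        raise InappropriateFallback("a higher protocol version is available")
```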

<p>The drawback here, however, is that a client, even if it implements fallback
with a Signaling Cipher Suite Value, doesn&rsquo;t know the highest protocol version
supported by the server, or whether it implements a <code>TLS_FALLBACK_SCSV</code> check.
Common web servers will likely be updated faster than others, but router or
load balancer manufacturers might not deem it important enough to implement
and ship updates for.</p>

<h2>Signatures in TLS 1.2</h2>

<p>It has long been known to be problematic that signatures in TLS 1.2 don&rsquo;t cover
the list of cipher suites and other messages sent before server authentication.
They sign the ephemeral DH parameters sent by the server and include the
<code>*Hello.random</code> values as nonces to prevent replay attacks:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre>h = Hash(ClientHello.random + ServerHello.random + ServerParams)
</pre></div></figure>
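<p>To make the gap concrete, here is a small Python sketch of that hash. It
assumes SHA-256 and treats all inputs as opaque byte strings; the function name
is mine. Note that the list of cipher suites is not an input at all, so
tampering with it leaves the signed value unchanged.</p>

```python
import hashlib


def signed_params_hash(client_random, server_random, server_params):
    """Hash the two nonces and the server's ephemeral parameters."""
    return hashlib.sha256(client_random + server_random + server_params).digest()
```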


<p>Signing at least the list of cipher suites would have helped prevent downgrade
attacks like <a href="https://freakattack.com/">FREAK</a> and <a href="https://weakdh.org/">Logjam</a>.
TLS 1.3 will sign all messages before server authentication, even though it makes
<a href="http://www.mitls.org/downloads/transcript-collisions.pdf">Transcript Collision Attacks</a>
somewhat easier to mount. With SHA-1 not allowed for signatures that will
hopefully not become a problem anytime soon.</p>

<h2>Downgrade Sentinels in TLS 1.3</h2>

<p>With neither the client version nor its cipher suites (for the SCSV) included
in the hash signed with the server&rsquo;s private key in TLS 1.2, how do you secure
TLS 1.3 against downgrades like FREAK and Logjam? Stuff a special value into
<code>ServerHello.random</code>.</p>

<p>The TLS WG decided to put static values (sometimes called downgrade sentinels)
into the server&rsquo;s nonce sent with the <code>ServerHello</code> message. TLS 1.3 servers
responding to a <code>ClientHello</code> indicating a maximum supported version of TLS 1.2
MUST set the last eight bytes of the nonce to:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre>0x44 0x4F 0x57 0x4E 0x47 0x52 0x44 0x01
</pre></div></figure>


<p>If the client advertises a maximum supported version of TLS 1.1 or below the
server SHOULD set the last eight bytes of the nonce to:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre>0x44 0x4F 0x57 0x4E 0x47 0x52 0x44 0x00
</pre></div></figure>


<p>If not connecting with a downgraded version, a client MUST check whether the
server nonce ends with either of the two sentinels and in such a case abort the
connection. The TLS 1.3 spec here introduces an update to TLS 1.2 that requires
servers and clients to update their implementation.</p>
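<p>Both sides of the sentinel mechanism fit into a few lines. The following
Python sketch uses names of my own choosing and assumes a 32-byte
<code>ServerHello.random</code>; it is meant as an illustration and not as a
drop-in for any TLS stack.</p>

```python
import os

TLS_1_2, TLS_1_3 = (3, 3), (3, 4)

# The first seven bytes spell "DOWNGRD" in ASCII.
SENTINEL_TLS12 = bytes([0x44, 0x4F, 0x57, 0x4E, 0x47, 0x52, 0x44, 0x01])
SENTINEL_TLS11_OR_BELOW = bytes([0x44, 0x4F, 0x57, 0x4E, 0x47, 0x52, 0x44, 0x00])


def make_server_random(client_max):
    """What a TLS 1.3 server would put into ServerHello.random."""
    nonce = os.urandom(32)
    if client_max >= TLS_1_3:
        return nonce
    if client_max == TLS_1_2:
        return nonce[:24] + SENTINEL_TLS12
    return nonce[:24] + SENTINEL_TLS11_OR_BELOW


def ends_with_sentinel(server_random):
    """Client side: True if the nonce carries one of the two sentinels."""
    return server_random[-8:] in (SENTINEL_TLS12, SENTINEL_TLS11_OR_BELOW)
```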

<p>Unfortunately, this downgrade protection relies on a <code>ServerKeyExchange</code>
message being sent and is thus of limited value. Static RSA key exchanges
are still valid in TLS 1.2, and unless the server admin disables all
non-forward-secure cipher suites the protection can be bypassed.</p>

<h2>The comeback of insecure fallbacks?</h2>

<p>Current measurements show that enabling TLS 1.3 by default would break a
significant fraction of TLS handshakes due to version intolerance. According to
Ivan Ristić, as of July 2016,
<a href="https://blog.qualys.com/ssllabs/2016/08/02/tls-version-intolerance-in-ssl-pulse">3.2% of servers from the SSL Pulse data set reject TLS 1.3 handshakes</a>.</p>

<p>This is a very high number and would affect way too many people. Alas, with TLS
1.3 we have only limited downgrade protection for forward-secure cipher
suites. And that is assuming that most servers either support TLS 1.3 or
update their 1.2 implementations. <code>TLS_FALLBACK_SCSV</code>, if supported by the
server, will help as long as there are no attacks tampering with the list
of cipher suites.</p>

<p>The TLS working group has been thinking about how to handle intolerance without
bringing back version fallbacks, and there might be light at the end of the
tunnel.</p>

<h2>Version negotiation with extensions</h2>

<p>The next version of the proposed TLS 1.3 spec, draft 16, will introduce a new
version negotiation mechanism based on extensions. The current <code>ClientHello.version</code>
field will be frozen to TLS 1.2, i.e. <code>{3, 3}</code>, and renamed to <code>legacy_version</code>.
Any number greater than that MUST be ignored by servers.</p>

<p>To negotiate a TLS 1.3 connection the protocol now requires the client to send
a <code>supported_versions</code> extension. This is a list of versions the client supports,
in preference order, with the most preferred version first. Clients MUST send
this extension as servers are required to negotiate TLS 1.2 if it&rsquo;s not present.
Any version number unknown to the server MUST be ignored.</p>
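<p>A server-side selection routine following these rules might look like the
Python sketch below. The function name is hypothetical, <code>None</code>
stands for an absent extension, and I assume the server honors the
client&rsquo;s preference order and supports TLS 1.2.</p>

```python
TLS_1_2, TLS_1_3 = (3, 3), (3, 4)


def select_version(supported_versions, server_versions):
    """Pick the client's most preferred version that the server knows.

    Returns None when there is no overlap at all.
    """
    if supported_versions is None:
        # Extension not present: negotiate TLS 1.2.
        return TLS_1_2
    for version in supported_versions:
        if version in server_versions:
            return version
        # Unknown version numbers are skipped, not treated as errors.
    return None
```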

<p>This still leaves potential problems with big <code>ClientHello</code> messages or
choking on unknown extensions unaddressed, but according to David Benjamin
<a href="https://www.ietf.org/mail-archive/web/tls/current/msg20679.html">the main problem is <code>ClientHello.version</code></a>.
We will hopefully be able to ship browsers that have TLS 1.3 enabled by default,
without bringing back insecure version fallbacks.</p>

<p>However, it&rsquo;s not unlikely that implementers will screw up even the new version
negotiation mechanism and we&rsquo;ll have similar problems in a few years down the
road.</p>

<h2>GREASE-ing the future</h2>

<p>David Benjamin, following Adam Langley&rsquo;s advice to
<a href="https://www.imperialviolet.org/2016/05/16/agility.html"><em>have one joint and keep it well oiled</em></a>,
proposed <a href="https://tools.ietf.org/html/draft-davidben-tls-grease-01">GREASE</a>
(Generate Random Extensions And Sustain Extensibility), a mechanism to prevent
extensibility failures in the TLS ecosystem.</p>

<p>The heart of the mechanism is to have clients inject &ldquo;unknown values&rdquo; into
places where capabilities are advertised by the client, and the best match
selected by the server. Servers MUST ignore unknown values to allow introducing
new capabilities to the ecosystem without breaking interoperability.</p>

<p>These values will be advertised pseudo-randomly to break misbehaving servers
early in the implementation process. Proposed injection points are cipher
suites, supported groups, extensions, and ALPN identifiers. Should the server
respond with a GREASE value selected in the <code>ServerHello</code> message the client
MUST abort the connection.</p>
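<p>Here is a hedged Python sketch of the client side for cipher suites. The
values are the ones RFC 8701, the later published version of the proposal,
reserves for this purpose (<code>0x0A0A</code>, <code>0x1A1A</code>, up to
<code>0xFAFA</code>); the function names are my own.</p>

```python
import random

# 0x0A0A, 0x1A1A, ..., 0xFAFA
GREASE_VALUES = [(n * 16 + 0x0A) * 0x0101 for n in range(16)]


def add_grease(cipher_suites, rng=random):
    """Return a copy of the list with one GREASE value at a random position."""
    suites = list(cipher_suites)
    suites.insert(rng.randrange(len(suites) + 1), rng.choice(GREASE_VALUES))
    return suites


def check_server_choice(selected):
    """A server that selects a GREASE value is broken, so the client aborts."""
    if selected in GREASE_VALUES:
        raise ConnectionError("server selected a GREASE value")
    return selected
```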
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Continuous Integration for NSS]]></title>
    <link href="https://timtaubert.de/blog/2016/08/continuous-integration-for-nss/"/>
    <updated>2016-08-09T16:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2016/08/continuous-integration-for-nss</id>
    <content type="html"><![CDATA[<p>The following image shows our <a href="https://treeherder.mozilla.org/#/jobs?repo=nss">TreeHerder dashboard</a>
after pushing a changeset to the <a href="https://hg.mozilla.org/projects/nss">NSS repository</a>.
It is the result of only a few weeks of work (on our side):</p>

<p><a href="https://timtaubert.de/images/treeherder.png" title="The TreeHerder dashboard showing the NSS repository" class="img"><img src="https://timtaubert.de/images/treeherder.png" title="The TreeHerder dashboard showing the NSS repository" ></a></p>

<p>Based on my experience from building a <a href="https://docs.taskcluster.net/">Taskcluster</a>
CI for NSS over the last weeks, I want to share a rough outline of the process
of setting this up for basically any Mozilla project, using NSS as an example.</p>

<h2>What is the goal?</h2>

<p>The development of NSS has for a long time been heavily supported by a fleet of
buildbots. You can see them in action by looking at our waterfall diagram
showing the build and test statuses of the latest pushes to the NSS repository.</p>

<p><a href="https://timtaubert.de/images/buildbots.png" title="The waterfall diagram showing buildbot statuses" class="img"><img src="https://timtaubert.de/images/buildbots.png" title="The waterfall diagram showing buildbot statuses" ></a></p>

<p>Unfortunately, this setup is rather complex and the bots are slow. Build and
test tasks are run sequentially and so on some machines it takes 10-15 hours
before you will be notified about potential breakage.</p>

<p>The first thing that needs to be done is to replicate the current setup as well
as possible and then split monolithic test runs into many small tasks that can
be run in parallel. Builds will be prepared by build tasks, test tasks will
later download those pieces (called <em>artifacts</em>) to run tests.</p>

<p>A good turnaround time is essential; ideally one should know whether a push
broke the tree after no more than 15-30 minutes. We want a <a href="https://github.com/mozilla/treeherder/">TreeHerder</a>
dashboard that gives a good overview of all current build and test tasks, as
well as an IRC and email notification system so we don&rsquo;t have to watch the
tree all day.</p>

<h2>Docker for Linux tasks</h2>

<p>To build and test on Linux, Taskcluster uses Docker. The build instructions for
the image containing all NSS dependencies, as well as the scripts to build and
run tests, can be found in the <a href="https://hg.mozilla.org/projects/nss/file/tip/automation/taskcluster/docker">automation/taskcluster/docker</a>
directory.</p>

<p>For a start, the fastest way to get something up and running (or building) is
to use <code>ADD</code> in the Dockerfile to bake your scripts into the image. That way
you can just pass them as the <em>command</em> in the task definition later.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="c"># Add build and test scripts.</span>
ADD bin /home/worker/bin
RUN chmod +x /home/worker/bin/*
</pre></div></figure>


<p>Once you have NSS and its tests building and running in a local Docker container,
the next step is to run a Taskcluster task in the <em>cloud</em>. You can use the
<a href="https://tools.taskcluster.net/task-creator/">Task Creator</a> to spawn a one-off
task, experiment with your Docker image, and with the task definition.
Taskcluster will automatically pull your image from Docker Hub:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="p">{</span>

  <span class="nt">&quot;created&quot;</span><span class="p">:</span> <span class="s2">&quot; ... &quot;</span><span class="p">,</span>
  <span class="nt">&quot;deadline&quot;</span><span class="p">:</span> <span class="s2">&quot; ... &quot;</span><span class="p">,</span>
  <span class="nt">&quot;payload&quot;</span><span class="p">:</span> <span class="p">{</span>
    <span class="nt">&quot;image&quot;</span><span class="p">:</span> <span class="s2">&quot;ttaubert/nss-ci:0.0.21&quot;</span><span class="p">,</span>
    <span class="nt">&quot;command&quot;</span><span class="p">:</span> <span class="p">[</span>
      <span class="s2">&quot;/bin/bash&quot;</span><span class="p">,</span>
      <span class="s2">&quot;-c&quot;</span><span class="p">,</span>
      <span class="s2">&quot;bin/build.sh&quot;</span>
    <span class="p">],</span>
    <span class="nt">&quot;maxRunTime&quot;</span><span class="p">:</span> <span class="mi">3600</span>
  <span class="p">},</span>

<span class="p">}</span>
</pre></div></figure>


<p>Docker and task definitions are well-documented, so this step shouldn&rsquo;t be too
difficult and you should be able to confirm everything runs fine. Now instead
of kicking off tasks manually the next logical step is to spawn tasks
automatically when changesets are pushed to the repository.</p>

<h2>Using taskcluster-github</h2>

<p>Triggering tasks on repository pushes should remind you of Travis CI, CircleCI,
or AppVeyor, if you have worked with any of those before. Taskcluster offers a similar
tool called <a href="https://github.com/taskcluster/taskcluster-github">taskcluster-github</a>
that uses a configuration file in the root of your repository for task definitions.</p>

<p>If your master is a Mercurial repository then it&rsquo;s very helpful that you don&rsquo;t
have to mess with it until you get the configuration right, and can instead
simply create a fork on GitHub. The <a href="http://docs.taskcluster.net/services/taskcluster-github/">documentation</a>
is rather self-explanatory, and the task definition is similar to the one used
by the Task Creator.</p>

<p>Once the WebHook is set up and receives pings, a push to your fork will make
&ldquo;Lisa Lionheart&rdquo;, the Taskcluster bot, comment on your push and leave either an
error message or a link to the task graph. If on the first try you see failures
about missing scopes you are lacking permissions and should talk to the nice
folks over in <a href="irc://irc.mozilla.org/taskcluster">#taskcluster</a>.</p>

<h2>Move scripts into the repository</h2>

<p>Once you have a GitHub fork spawning build and test tasks when pushing you
should move all the scripts you wrote so far into the repository. The only
script left on the Docker image would be a script that checks out the hg/git
repository and then uses the scripts in the tree to build and run tests.</p>

<p>This step will pay off very early in the process: rebuilding and pushing the
Docker image to Docker Hub is something that you really don&rsquo;t want to do too
often. All NSS scripts for Linux live in the <a href="https://hg.mozilla.org/projects/nss/file/tip/automation/taskcluster/scripts">automation/taskcluster/scripts</a>
directory.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="c">#!/usr/bin/env bash</span>

<span class="nb">set</span> -v -e -x

<span class="k">if</span> <span class="o">[</span> <span class="k">$(</span>id -u<span class="k">)</span> <span class="o">=</span> <span class="m">0</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
    <span class="c"># Drop privileges by re-running this script.</span>
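    <span class="nb">exec </span>su worker <span class="s2">&quot;</span><span class="nv">$0</span><span class="s2">&quot;</span> <span class="s2">&quot;</span><span class="nv">$@</span><span class="s2">&quot;</span>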
<span class="k">fi</span>

<span class="c"># Do things here ...</span>
</pre></div></figure>


<p>Use the above snippet as a template for your scripts. It sets a few flags
that help with debugging later and, when started as root, drops privileges by
rerunning itself as the unprivileged <em>worker</em> user. If you need to do things as root before building or
running tests, just put them before the <code>exec su ...</code> call.</p>

<h2>Split build and test runs</h2>

<p>Taskcluster encourages many small tasks. It&rsquo;s easy to split the big monolithic
test run I mentioned at the beginning into multiple tasks, one for each test
suite. However, you wouldn&rsquo;t want to build NSS before every test run again,
so we should build it only once and then reuse the binary. Taskcluster lets a
task leave artifacts after its run that subtasks can then download.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="c"># Build.</span>
<span class="nb">cd </span>nss <span class="o">&amp;&amp;</span> make nss_build_all

<span class="c"># Package.</span>
mkdir artifacts
tar cvfjh artifacts/dist.tar.bz2 dist
</pre></div></figure>


<p>The above snippet builds NSS and creates an archive containing all the binaries
and libraries. You need to let Taskcluster know that there&rsquo;s a directory with
artifacts so that it picks those up and makes them available to the public.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="p">{</span>

  <span class="nt">&quot;created&quot;</span><span class="p">:</span> <span class="s2">&quot; ... &quot;</span><span class="p">,</span>
  <span class="nt">&quot;deadline&quot;</span><span class="p">:</span> <span class="s2">&quot; ... &quot;</span><span class="p">,</span>
  <span class="nt">&quot;payload&quot;</span><span class="p">:</span> <span class="p">{</span>
    <span class="nt">&quot;image&quot;</span><span class="p">:</span> <span class="s2">&quot;ttaubert/nss-ci:0.0.21&quot;</span><span class="p">,</span>
    <span class="nt">&quot;artifacts&quot;</span><span class="p">:</span> <span class="p">{</span>
      <span class="nt">&quot;public&quot;</span><span class="p">:</span> <span class="p">{</span>
        <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;directory&quot;</span><span class="p">,</span>
        <span class="nt">&quot;path&quot;</span><span class="p">:</span> <span class="s2">&quot;/home/worker/artifacts&quot;</span><span class="p">,</span>
        <span class="nt">&quot;expires&quot;</span><span class="p">:</span> <span class="s2">&quot; ... &quot;</span>
      <span class="p">}</span>
    <span class="p">},</span>
    <span class="nt">&quot;command&quot;</span><span class="p">:</span> <span class="p">[</span>
      <span class="err">...</span>
    <span class="p">],</span>
    <span class="nt">&quot;maxRunTime&quot;</span><span class="p">:</span> <span class="mi">3600</span>
  <span class="p">},</span>

<span class="p">}</span>
</pre></div></figure>


<p>The test task then uses the <code>$TC_PARENT_TASK_ID</code> environment variable to
determine the correct download URL, unpacks the build and starts running tests.
Making artifacts automatically available to subtasks, without having to pass
the parent task ID and build a URL, will hopefully be added to Taskcluster in
the future.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="c"># Fetch build artifact.</span>
curl --retry <span class="m">3</span> -Lo dist.tar.bz2 https://queue.taskcluster.net/v1/task/<span class="nv">$TC_PARENT_TASK_ID</span>/artifacts/public/dist.tar.bz2
tar xvjf dist.tar.bz2

<span class="c"># Run tests.</span>
<span class="nb">cd </span>nss/tests <span class="o">&amp;&amp;</span> ./all.sh
</pre></div></figure>
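<p>If your test harness is written in Python instead of bash, the same URL can be
put together as in the sketch below. The helper names are hypothetical; the
queue URL and the <code>TC_PARENT_TASK_ID</code> variable are taken from the
snippet above.</p>

```python
import os

QUEUE_BASE = "https://queue.taskcluster.net/v1"


def artifact_url(task_id, name):
    """Build the download URL for an artifact of the given task."""
    return "{}/task/{}/artifacts/{}".format(QUEUE_BASE, task_id, name)


def parent_artifact_url(name, environ=os.environ):
    """Resolve an artifact of the parent task named in the environment."""
    return artifact_url(environ["TC_PARENT_TASK_ID"], name)
```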


<h2>Writing decision tasks</h2>

<p>Specifying task dependencies in your .taskcluster.yml file is unfortunately not
possible at the moment. Even though the set of builds and tasks you want may
be static you can&rsquo;t create the necessary links without knowing the random task
IDs assigned to them.</p>

<p>Your only option is to create a so-called <em>decision task</em>. A decision task is
the only task defined in your .taskcluster.yml file and started after you
push a new changeset. It will leave an artifact in the form of a JSON file that
Taskcluster picks up and uses to extend the task graph, i.e. schedule further
tasks with appropriate dependencies. You can use whatever tool or language you
like to generate these JSON files, e.g. Python, Ruby, Node, &hellip;</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="l-Scalar-Plain">task</span><span class="p-Indicator">:</span>
  <span class="l-Scalar-Plain">payload</span><span class="p-Indicator">:</span>
    <span class="l-Scalar-Plain">image</span><span class="p-Indicator">:</span> <span class="s">&quot;ttaubert/nss-ci:0.0.21&quot;</span>

    <span class="l-Scalar-Plain">maxRunTime</span><span class="p-Indicator">:</span> <span class="l-Scalar-Plain">1800</span>

    <span class="l-Scalar-Plain">artifacts</span><span class="p-Indicator">:</span>
      <span class="l-Scalar-Plain">public</span><span class="p-Indicator">:</span>
        <span class="l-Scalar-Plain">type</span><span class="p-Indicator">:</span> <span class="s">&quot;directory&quot;</span>
        <span class="l-Scalar-Plain">path</span><span class="p-Indicator">:</span> <span class="s">&quot;/home/worker/artifacts&quot;</span>
        <span class="l-Scalar-Plain">expires</span><span class="p-Indicator">:</span> <span class="s">&quot;7</span><span class="nv"> </span><span class="s">days&quot;</span>

    <span class="l-Scalar-Plain">graphs</span><span class="p-Indicator">:</span>
      <span class="p-Indicator">-</span> <span class="l-Scalar-Plain">/home/worker/artifacts/graph.json</span>
</pre></div></figure>


<p>All task graph definitions including the Node.JS build script for NSS can be
found in the <a href="https://hg.mozilla.org/projects/nss/file/tip/automation/taskcluster/graph">automation/taskcluster/graph</a>
directory. Depending on the needs of your project you might want to use a
completely different structure. All that matters is that in the end you
produce a valid JSON file. Slightly more intelligent decision tasks can be used
to implement features like <a href="https://wiki.mozilla.org/NSS:TryServer#Using_try_syntax">try syntax</a>.</p>
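<p>To give an idea of what such a generator does, here is a small Python sketch
that emits one build task and one dependent test task per suite. Treat the JSON
layout (<code>tasks</code>, <code>taskId</code>, <code>requires</code>), the
<code>TEST_SUITE</code> variable, and the <code>bin/run_tests.sh</code> script
as assumptions made for illustration; the real schema is defined by Taskcluster
and the NSS scripts live in the directories linked above.</p>

```python
import base64
import json
import uuid


def slug_id():
    """Return a URL-safe 22-character ID built from a random UUID."""
    raw = base64.urlsafe_b64encode(uuid.uuid4().bytes)
    return raw.decode("ascii").rstrip("=")


def build_graph(image, test_suites):
    """Describe one build task plus one dependent test task per suite."""
    build_id = slug_id()
    tasks = [{
        "taskId": build_id,
        "requires": [],
        "task": {"payload": {
            "image": image,
            "command": ["/bin/bash", "-c", "bin/build.sh"],
        }},
    }]
    for suite in test_suites:
        tasks.append({
            "taskId": slug_id(),
            "requires": [build_id],  # Only run once the build has finished.
            "task": {"payload": {
                "image": image,
                "env": {"TC_PARENT_TASK_ID": build_id, "TEST_SUITE": suite},
                "command": ["/bin/bash", "-c", "bin/run_tests.sh"],
            }},
        })
    return {"tasks": tasks}


def graph_json(image, test_suites):
    """Serialize the graph so it can be written to graph.json."""
    return json.dumps(build_graph(image, test_suites), indent=2)
```

<p>Generating the task IDs up front is what makes it possible to link the test
tasks to the build task before anything has been scheduled.</p>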

<h2>mozilla-taskcluster for Mercurial projects</h2>

<p>If you have all of the above working with GitHub but your main repository is
hosted on <em>hg.mozilla.org</em> you will want to have Mercurial spawn decision tasks
when pushing.</p>

<p>The Taskcluster team is working on making .taskcluster.yml files work for
Mozilla-hosted Mercurial repositories too, but while that work isn&rsquo;t finished
yet you have to add your project to <a href="https://github.com/taskcluster/mozilla-taskcluster/">mozilla-taskcluster</a>.
mozilla-taskcluster will listen for pushes and then kick off tasks just like
the WebHook.</p>

<h2>TreeHerder Configuration</h2>

<p>A CI is no CI without a proper dashboard. That&rsquo;s the role of <a href="https://github.com/mozilla/treeherder/">TreeHerder</a>
at Mozilla. Add your project to the end of the <a href="https://github.com/mozilla/treeherder/blob/master/treeherder/model/fixtures/repository.json">repository.json</a>
file and create a new pull request. It will usually take a day or two after
merging until your change is deployed and your project shows up in the
dashboard.</p>

<p>TreeHerder gets the per-task configuration from the task definition. You can
configure the symbol, the platform and collection (i.e. row), and other
parameters. Here&rsquo;s the configuration data for the green <em>B</em> at the start of the
fifth row of the image at the top of this post:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="p">{</span>

  <span class="nt">&quot;created&quot;</span><span class="p">:</span> <span class="s2">&quot; ... &quot;</span><span class="p">,</span>
  <span class="nt">&quot;deadline&quot;</span><span class="p">:</span> <span class="s2">&quot; ... &quot;</span><span class="p">,</span>
  <span class="nt">&quot;payload&quot;</span><span class="p">:</span> <span class="p">{</span>
    <span class="err">...</span>
  <span class="p">},</span>
  <span class="nt">&quot;extra&quot;</span><span class="p">:</span> <span class="p">{</span>
    <span class="nt">&quot;treeherder&quot;</span><span class="p">:</span> <span class="p">{</span>
      <span class="nt">&quot;jobKind&quot;</span><span class="p">:</span> <span class="s2">&quot;build&quot;</span><span class="p">,</span>
      <span class="nt">&quot;symbol&quot;</span><span class="p">:</span> <span class="s2">&quot;B&quot;</span><span class="p">,</span>
      <span class="nt">&quot;build&quot;</span><span class="p">:</span> <span class="p">{</span>
        <span class="nt">&quot;platform&quot;</span><span class="p">:</span> <span class="s2">&quot;linux64&quot;</span>
      <span class="p">},</span>
      <span class="nt">&quot;machine&quot;</span><span class="p">:</span> <span class="p">{</span>
        <span class="nt">&quot;platform&quot;</span><span class="p">:</span> <span class="s2">&quot;linux64&quot;</span>
      <span class="p">},</span>
      <span class="nt">&quot;collection&quot;</span><span class="p">:</span> <span class="p">{</span>
        <span class="nt">&quot;debug&quot;</span><span class="p">:</span> <span class="kc">true</span>
      <span class="p">}</span>
    <span class="p">}</span>
  <span class="p">}</span>

<span class="p">}</span>
</pre></div></figure>


<h2>IRC and email notifications</h2>

<p>Taskcluster is a very modular system and offers many APIs. It&rsquo;s built mostly
with Node, and thus there are many Node libraries available to interact with
the many parts. The communication between those is realized by <a href="https://wiki.mozilla.org/Auto-tools/Projects/Pulse">Pulse</a>,
a managed RabbitMQ cluster.</p>

<p>The last missing piece we wanted is an IRC and email notification system, a bot
that notifies about failures on IRC and sends emails to all parties involved.
It was a piece of cake to write <a href="https://github.com/ttaubert/nss-taskcluster">nss-tc</a>
that uses Taskcluster Node.JS libraries and Mercurial JSON APIs to connect to
the task queue and listen for task definitions and failures.</p>

<h2>A rough overview</h2>

<p>I could have probably written a detailed post for each of the steps outlined
here but I think it&rsquo;s much more helpful to start with an overview of what&rsquo;s
needed to get the CI for a project up and running. Each step and each part of
the system is hopefully more obvious now if you haven&rsquo;t had too much interaction
with Taskcluster and TreeHerder so far.</p>

<p><em>Thanks to the Taskcluster team, especially John Ford, Greg Arndt, and Pete
Moore! They helped us pull this off in a matter of weeks and besides Linux
builds and tests we already have Windows tasks, static analysis, ASan+LSan,
and are in the process of setting up workers for ARM builds and tests.</em></p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Evolution of Signatures in TLS]]></title>
    <link href="https://timtaubert.de/blog/2016/07/the-evolution-of-signatures-in-tls/"/>
    <updated>2016-07-26T16:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2016/07/the-evolution-of-signatures-in-tls</id>
    <content type="html"><![CDATA[<p>This post will take a look at the evolution of signature algorithms and schemes
in the TLS protocol since version 1.0. I first started taking notes for
myself but then decided to polish and publish them, hoping that others will
benefit as well.</p>

<p>(Let&rsquo;s ignore client authentication for simplicity.)</p>

<h2>Signature algorithms in TLS 1.0 and TLS 1.1</h2>

<p>In <a href="https://tools.ietf.org/html/rfc2246">TLS 1.0</a> as well as <a href="https://tools.ietf.org/html/rfc4346">TLS 1.1</a>
there are only two supported signature schemes: RSA with MD5/SHA-1 and DSA with
SHA-1. The RSA here stands for the PKCS#1 v1.5 signature scheme, naturally.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="n">select</span> <span class="p">(</span><span class="n">SignatureAlgorithm</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">case</span> <span class="nl">rsa</span><span class="p">:</span>
        <span class="n">digitally</span><span class="o">-</span><span class="kt">signed</span> <span class="k">struct</span> <span class="p">{</span>
            <span class="n">opaque</span> <span class="n">md5_hash</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
            <span class="n">opaque</span> <span class="n">sha_hash</span><span class="p">[</span><span class="mi">20</span><span class="p">];</span>
        <span class="p">};</span>
    <span class="k">case</span> <span class="nl">dsa</span><span class="p">:</span>
        <span class="n">digitally</span><span class="o">-</span><span class="kt">signed</span> <span class="k">struct</span> <span class="p">{</span>
            <span class="n">opaque</span> <span class="n">sha_hash</span><span class="p">[</span><span class="mi">20</span><span class="p">];</span>
        <span class="p">};</span>
<span class="p">}</span> <span class="n">Signature</span><span class="p">;</span>
</pre></div></figure>


<p>An RSA signature signs the concatenation of the MD5 and SHA-1 digests; a DSA
signature covers only the SHA-1 digest. The hashes are computed as follows:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="n">h</span> <span class="o">=</span> <span class="n">Hash</span><span class="p">(</span><span class="n">ClientHello</span><span class="p">.</span><span class="n">random</span> <span class="o">+</span> <span class="n">ServerHello</span><span class="p">.</span><span class="n">random</span> <span class="o">+</span> <span class="n">ServerParams</span><span class="p">)</span>
</pre></div></figure>


<p>The <code>ServerParams</code> are the actual data to be signed; the <code>*Hello.random</code> values
are prepended to prevent replay attacks. Because the random values are covered by
the signature, TLS 1.3 puts a
<a href="https://tlswg.github.io/tls13-spec/#server-hello">downgrade sentinel</a>
at the end of <code>ServerHello.random</code> for clients to check.</p>
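
<p>As a rough illustration, the bytes handed to the signing function could be
computed like this in Python. The function name <code>signed_hashes</code> and its
parameters are made up for this sketch, and it takes plain byte strings where a
real implementation would hash the serialized handshake structures.</p>

```python
import hashlib


def signed_hashes(client_random, server_random, server_params, sig_alg):
    """Return the bytes that get signed in TLS 1.0/1.1 (illustrative only)."""
    data = client_random + server_random + server_params
    sha_hash = hashlib.sha1(data).digest()        # 20 bytes
    if sig_alg == "rsa":
        md5_hash = hashlib.md5(data).digest()     # 16 bytes
        return md5_hash + sha_hash                # 36 bytes in total
    if sig_alg == "dsa":
        return sha_hash
    raise ValueError("unknown signature algorithm")
```

<p>The RSA case yields 36 bytes and the DSA case 20 bytes, matching the sizes
of the two structs shown earlier.</p>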

<p>The <a href="https://tools.ietf.org/html/rfc2246#section-7.4.3">ServerKeyExchange message</a>
containing the signature is sent only when static RSA/DH key exchange is <em>not</em>
used, i.e. when we have a DHE_* cipher suite, an RSA_EXPORT_* suite
downgraded due to export restrictions, or a DH_anon_* suite where neither
party authenticates.</p>

<h2>Signature algorithms in TLS 1.2</h2>

<p><a href="https://tools.ietf.org/html/rfc5246">TLS 1.2</a> brought bigger changes to
signature algorithms by introducing the <a href="https://tools.ietf.org/html/rfc5246#section-7.4.1.4.1">signature_algorithms extension</a>.
This is a <code>ClientHello</code> extension allowing clients to signal supported and
preferred signature algorithms and hash functions.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="k">enum</span> <span class="p">{</span>
    <span class="n">none</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">md5</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">sha1</span><span class="p">(</span><span class="mi">2</span><span class="p">),</span> <span class="n">sha224</span><span class="p">(</span><span class="mi">3</span><span class="p">),</span> <span class="n">sha256</span><span class="p">(</span><span class="mi">4</span><span class="p">),</span> <span class="n">sha384</span><span class="p">(</span><span class="mi">5</span><span class="p">),</span> <span class="n">sha512</span><span class="p">(</span><span class="mi">6</span><span class="p">)</span>
<span class="p">}</span> <span class="n">HashAlgorithm</span><span class="p">;</span>

<span class="k">enum</span> <span class="p">{</span>
    <span class="n">anonymous</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">rsa</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">dsa</span><span class="p">(</span><span class="mi">2</span><span class="p">),</span> <span class="n">ecdsa</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="p">}</span> <span class="n">SignatureAlgorithm</span><span class="p">;</span>

<span class="k">struct</span> <span class="p">{</span>
    <span class="n">HashAlgorithm</span> <span class="n">hash</span><span class="p">;</span>
    <span class="n">SignatureAlgorithm</span> <span class="n">signature</span><span class="p">;</span>
<span class="p">}</span> <span class="n">SignatureAndHashAlgorithm</span><span class="p">;</span>
</pre></div></figure>


<p>If a client does not include the <code>signature_algorithms</code> extension then it is
assumed to support RSA, DSA, or ECDSA (depending on the negotiated cipher suite)
with SHA-1 as the hash function.</p>
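
<p>To make the wire format a little more concrete, here is a small Python
sketch that encodes a preference list of hash/signature pairs, two bytes each,
using the enum values from above. The helper names <code>encode_pairs</code> and
<code>default_pair</code> are invented for this example, and the length prefix
that precedes the list in the real extension is left out.</p>

```python
HASH = {"none": 0, "md5": 1, "sha1": 2, "sha224": 3,
        "sha256": 4, "sha384": 5, "sha512": 6}
SIGNATURE = {"anonymous": 0, "rsa": 1, "dsa": 2, "ecdsa": 3}


def encode_pairs(pairs):
    """Encode (hash, signature) name pairs, most preferred first."""
    return b"".join(bytes([HASH[h], SIGNATURE[s]]) for h, s in pairs)


def default_pair(key_type):
    """Pair assumed when the client omits the extension: SHA-1 plus key type."""
    return ("sha1", key_type)


# SHA-256 with RSA first, SHA-1 with ECDSA second.
print(encode_pairs([("sha256", "rsa"), ("sha1", "ecdsa")]).hex())  # 04010203
```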

<p>Besides adding all SHA-2 family hash functions, TLS 1.2 also introduced ECDSA
as a new signature algorithm. Note that the extension does not allow
restricting the curve used for a given scheme; P-521 with SHA-1 is therefore
perfectly legal.</p>

<p>A new requirement for RSA signatures is that the hash has to be wrapped in a
DER-encoded <code>DigestInfo</code> sequence before passing it to the RSA sign function.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="n">DigestInfo</span> <span class="o">::=</span> <span class="n">SEQUENCE</span> <span class="p">{</span>
    <span class="n">digestAlgorithm</span> <span class="n">DigestAlgorithm</span><span class="p">,</span>
    <span class="n">digest</span> <span class="n">OCTET</span> <span class="n">STRING</span>
<span class="p">}</span>
</pre></div></figure>
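
<p>For a fixed hash function the DER encoding of <code>DigestInfo</code> boils down
to a constant prefix followed by the digest. The following Python sketch shows
this for SHA-256. The function name is hypothetical; the prefix bytes are the
well-known encoding of the SHA-256 algorithm identifier with NULL parameters,
followed by the header of a 32-byte OCTET STRING.</p>

```python
import hashlib

# SEQUENCE header, AlgorithmIdentifier for SHA-256 (with NULL parameters),
# and the OCTET STRING header for a 32-byte digest.
SHA256_DIGEST_INFO_PREFIX = bytes.fromhex(
    "3031300d060960864801650304020105000420")


def digest_info_sha256(message):
    """Return the DER-encoded DigestInfo for SHA-256(message)."""
    return SHA256_DIGEST_INFO_PREFIX + hashlib.sha256(message).digest()
```

<p>Verifiers that parse this structure leniently instead of comparing against
the expected bytes are exactly where the attacks mentioned next come from.</p>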


<p>This unfortunately led to attacks like <a href="https://www.ietf.org/mail-archive/web/openpgp/current/msg00999.html">Bleichenbacher&#8217;06</a>
and <a href="http://www.intelsecurity.com/advanced-threat-research/berserk.html">BERserk</a>
because it turns out handling ASN.1 correctly is hard. As in TLS 1.1, a
<code>ServerKeyExchange</code> message is sent only when static RSA/DH key exchange is not
used. The hash computation did not change either:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="n">h</span> <span class="o">=</span> <span class="n">Hash</span><span class="p">(</span><span class="n">ClientHello</span><span class="p">.</span><span class="n">random</span> <span class="o">+</span> <span class="n">ServerHello</span><span class="p">.</span><span class="n">random</span> <span class="o">+</span> <span class="n">ServerParams</span><span class="p">)</span>
</pre></div></figure>


<h2>Signature schemes in TLS 1.3</h2>

<p>The <code>signature_algorithms</code> extension introduced by TLS 1.2 was revamped in
<a href="https://tlswg.github.io/tls13-spec/#rfc.section.4.2.2">TLS 1.3</a> and MUST now
be sent if the client offers at least one non-PSK cipher suite. The format is
backwards compatible and keeps some old code points.</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="k">enum</span> <span class="p">{</span>
    <span class="cm">/* RSASSA-PKCS1-v1_5 algorithms */</span>
    <span class="n">rsa_pkcs1_sha1</span> <span class="p">(</span><span class="mh">0x0201</span><span class="p">),</span>
    <span class="n">rsa_pkcs1_sha256</span> <span class="p">(</span><span class="mh">0x0401</span><span class="p">),</span>
    <span class="n">rsa_pkcs1_sha384</span> <span class="p">(</span><span class="mh">0x0501</span><span class="p">),</span>
    <span class="n">rsa_pkcs1_sha512</span> <span class="p">(</span><span class="mh">0x0601</span><span class="p">),</span>

    <span class="cm">/* ECDSA algorithms */</span>
    <span class="n">ecdsa_secp256r1_sha256</span> <span class="p">(</span><span class="mh">0x0403</span><span class="p">),</span>
    <span class="n">ecdsa_secp384r1_sha384</span> <span class="p">(</span><span class="mh">0x0503</span><span class="p">),</span>
    <span class="n">ecdsa_secp521r1_sha512</span> <span class="p">(</span><span class="mh">0x0603</span><span class="p">),</span>

    <span class="cm">/* RSASSA-PSS algorithms */</span>
    <span class="n">rsa_pss_sha256</span> <span class="p">(</span><span class="mh">0x0700</span><span class="p">),</span>
    <span class="n">rsa_pss_sha384</span> <span class="p">(</span><span class="mh">0x0701</span><span class="p">),</span>
    <span class="n">rsa_pss_sha512</span> <span class="p">(</span><span class="mh">0x0702</span><span class="p">),</span>

    <span class="cm">/* EdDSA algorithms */</span>
    <span class="n">ed25519</span> <span class="p">(</span><span class="mh">0x0703</span><span class="p">),</span>
    <span class="n">ed448</span> <span class="p">(</span><span class="mh">0x0704</span><span class="p">),</span>

    <span class="cm">/* Reserved Code Points */</span>
    <span class="n">private_use</span> <span class="p">(</span><span class="mh">0xFE00</span><span class="p">.</span><span class="mf">.0</span><span class="n">xFFFF</span><span class="p">)</span>
<span class="p">}</span> <span class="n">SignatureScheme</span><span class="p">;</span>
</pre></div></figure>


<p>Instead of <code>SignatureAndHashAlgorithm</code>, a code point is now called a
<code>SignatureScheme</code> and tied to a hash function (if applicable) by the
specification. TLS 1.2 algorithm/hash combinations not listed here
are deprecated and MUST NOT be offered or negotiated.</p>

<p>New code points for RSA-PSS schemes, as well as Ed25519 and Ed448-Goldilocks
were added. ECDSA schemes are now tied to the curve given by the code point
name, to be enforced by implementations. SHA-1 signature schemes SHOULD NOT be
offered; if they are needed for backwards compatibility, they should only be
listed with the lowest priority, after all other schemes.</p>
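
<p>A short Python sketch may help to see why the format is backwards
compatible: the old code points still split into the TLS 1.2 hash and signature
bytes. The values are copied from the draft listing above, the helper names are
made up, and <code>sha1_last</code> is just one possible way to demote SHA-1
schemes in a preference list.</p>

```python
SCHEMES = {
    "rsa_pkcs1_sha1": 0x0201,
    "rsa_pkcs1_sha256": 0x0401,
    "ecdsa_secp256r1_sha256": 0x0403,
    "rsa_pss_sha256": 0x0700,
    "ed25519": 0x0703,
}


def legacy_pair(code_point):
    """Split a 16-bit code point into TLS 1.2 (hash, signature) bytes."""
    return divmod(code_point, 256)


def sha1_last(names):
    """Keep the preference order but move SHA-1 schemes to the end."""
    # sorted() is stable, so non-SHA-1 schemes keep their relative order.
    return sorted(names, key=lambda name: name.endswith("_sha1"))
```

<p>For example, <code>legacy_pair(SCHEMES["rsa_pkcs1_sha256"])</code> gives
<code>(4, 1)</code>, i.e. <code>sha256</code> and <code>rsa</code> in the TLS 1.2 enums.</p>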

<p>The current draft-13 lists RSASSA-PSS as the only valid signature algorithm
allowed to sign handshake messages with an RSA key. The rsa_pkcs1_* values
solely refer to signatures which appear in certificates and are not defined for
use in signed handshake messages.</p>

<p>To prevent various downgrade attacks like <a href="https://freakattack.com/">FREAK</a> and <a href="https://weakdh.org/">Logjam</a> the computation of the hashes to be signed
has changed significantly and covers the complete handshake, up until
<code>CertificateVerify</code>:</p>

<figure class='code'><figcaption><span></span></figcaption><div class="highlight"><pre><span class="n">h</span> <span class="o">=</span> <span class="n">Hash</span><span class="p">(</span><span class="n">Handshake</span> <span class="n">Context</span> <span class="o">+</span> <span class="n">Certificate</span><span class="p">)</span> <span class="o">+</span> <span class="n">Hash</span><span class="p">(</span><span class="n">Resumption</span> <span class="n">Context</span><span class="p">)</span>
</pre></div></figure>


<p>This includes amongst other data the client and server random, key shares, the
cipher suite, the certificate, and resumption information to prevent replay and
downgrade attacks. With static key exchange algorithms gone the
<a href="https://tlswg.github.io/tls13-spec/#rfc.section.4.3.2">CertificateVerify message</a>
is now the one carrying the signature.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Six Months as a Security Engineer]]></title>
    <link href="https://timtaubert.de/blog/2016/05/six-months-as-a-security-engineer/"/>
    <updated>2016-05-13T18:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2016/05/six-months-as-a-security-engineer</id>
    <content type="html"><![CDATA[<p>It&rsquo;s been a little more than six months since I officially switched to the
Security Engineering team here at Mozilla to work on
<a href="https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS">NSS</a> and
related code. I thought this might be a good time to share what I&rsquo;ve been up
to in a short status update:</p>

<h3>Removed SSLv2 code from NSS</h3>

<p>NSS contained quite a lot of SSLv2-specific code that was waiting to be removed.
It was not compiled by default so there was no way to enable it in Firefox even
if you wanted to. The removal was rather straightforward as the protocol changed
significantly with v3 and most of the code was well separated. Good riddance.</p>

<h3>Added ChaCha20/Poly1305 cipher suites to Firefox</h3>

<p>Adam Langley submitted a patch to bring ChaCha20/Poly1305 cipher suites to NSS
two years ago, but at that time we likely didn&rsquo;t have enough resources
to polish and land it. I picked up where he left off and updated it to conform to
the slightly updated specification. <a href="https://timtaubert.de/blog/2016/04/a-fast-constant-time-aead-for-tls/">Firefox 47 will ship with two new
ECDHE/ChaCha20 cipher suites enabled</a>.</p>

<h3>RSA-PSS for TLS v1.3 and the WebCrypto API</h3>

<p>Ryan Sleevi, also a while ago, implemented RSA-PSS in <code>freebl</code>, the lower
cryptographic layer of NSS. I hooked it up to some more APIs so Firefox can
support RSA-PSS signatures in its WebCrypto API implementation. In NSS itself
we need it to support new handshake signatures in our experimental TLS v1.3
code.</p>

<h3>Improve continuous integration for NSS</h3>

<p>Kai Engert from RedHat is currently doing a hell of a job maintaining quite a
few buildbots that run all of our NSS tests whenever someone pushes a new
changeset. Unfortunately the current setup doesn&rsquo;t scale too well and the
machines are old and slow.</p>

<p>Similar to e.g. Travis CI, Mozilla maintains its own continuous integration and
release infrastructure, called <a href="https://docs.taskcluster.net/">TaskCluster</a>.
Using TaskCluster we now have an experimental Docker image that builds NSS/NSPR
and runs all of our 17 (so far) test suites. The turnaround time is already very
promising. This is an ongoing effort and there are lots of things left to do.</p>

<h3>Joined the WebCrypto working group</h3>

<p>I&rsquo;ve been working on the Firefox WebCrypto API implementation for a while, long
before I switched to the Security Engineering team, and so it made sense to join
the working group to help finalize the specification. I&rsquo;m unfortunately still
struggling to carve out more time for involvement with the WG than just
attending meetings and representing Mozilla.</p>

<h3>Added HKDF to the WebCrypto API</h3>

<p>The main reason the WebCrypto API in Firefox did not support HKDF until recently
is that no one found the time to implement it. I finally did find some time and
brought it to Firefox 46. It is fully compatible with Chrome&rsquo;s implementation
(<a href="https://tools.ietf.org/html/rfc5869">RFC 5869</a>); the WebCrypto specification
still needs to be updated to reflect those changes.</p>

<h3>Added SHA-2 for PBKDF2 in the WebCrypto API</h3>

<p>Since we shipped the first early version of the WebCrypto API, SHA-1 was the
only available PRF to be used with PBKDF2. We now support PBKDF2 with SHA-2
PRFs as well.</p>

<h3>Improved the Firefox WebCrypto API threading model</h3>

<p>Our initial implementation of the WebCrypto API would naively spawn a new thread
every time a <code>crypto.subtle.*</code> method was called. We now use a thread pool per
process that is able to handle all incoming API calls much faster.</p>

<h3>Added WebCrypto API to Workers and ServiceWorkers</h3>

<p>After working on this on and off for more than six months, so even before I
officially joined the security engineering team, I managed to finally get it
landed, with a lot of help from Boris Zbarsky who had to adapt our WebIDL code
generation quite a bit. The WebCrypto API can now finally be used from
(Service)Workers.</p>

<h2>What&rsquo;s next?</h2>

<p>In the near future I&rsquo;ll be working further on improving our continuous
integration infrastructure for NSS, and cleaning up the library and its tests.
I will hopefully find the time to write more about it as we progress.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A Fast, Constant-time AEAD for TLS]]></title>
    <link href="https://timtaubert.de/blog/2016/04/a-fast-constant-time-aead-for-tls/"/>
    <updated>2016-04-29T15:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2016/04/a-fast-constant-time-aead-for-tls</id>
    <content type="html"><![CDATA[<p>The only TLS v1.2+ cipher suites with a dedicated AEAD scheme are the ones using
<a href="https://en.wikipedia.org/wiki/Galois/Counter_Mode">AES-GCM</a>, a block cipher
mode that turns AES into an <a href="https://en.wikipedia.org/wiki/Authenticated_encryption">authenticated cipher</a>.
From a cryptographic point of view these are preferable to non-AEAD-based cipher
suites (e.g. the ones with AES-CBC) because getting authenticated encryption
right is hard without using dedicated ciphers.</p>

<p>For CPUs without the <a href="https://en.wikipedia.org/wiki/AES_instruction_set">AES-NI instruction set</a>,
constant-time AES-GCM however is slow and also hard to write and maintain. The
majority of mobile phones, and mostly cheaper devices like tablets and notebooks
on the market thus cannot support efficient and safe AES-GCM cipher suite
implementations.</p>

<p>Even if we ignored all those aforementioned pitfalls we still wouldn&rsquo;t want to
rely on AES-GCM cipher suites as the only good ones available. We need more
diversity. Having widespread support for cipher suites using a second AEAD is
necessary to defend against weaknesses in AES or AES-GCM that may be discovered
in the future.</p>

<p><a href="https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant">ChaCha20</a> and
<a href="https://en.wikipedia.org/wiki/Poly1305">Poly1305</a>, a stream cipher and a
message authentication code, were designed with fast and constant-time
implementations in mind. A combination of those two algorithms yields a safe
and efficient AEAD construction, called ChaCha20/Poly1305, which allows TLS
with a negligible performance impact even on low-end devices.</p>

<p><a href="https://www.mozilla.org/en-US/firefox/47.0beta/releasenotes/">Firefox 47</a>
will ship with two new ECDHE/ChaCha20 cipher suites as specified in the
<a href="https://tools.ietf.org/html/draft-ietf-tls-chacha20-poly1305-04">latest draft</a>.
We are looking forward to seeing the adoption of these increase and will, as a
next step, work on prioritizing them over AES-GCM suites on devices not
supporting AES-NI.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Build Your Own Signal Desktop]]></title>
    <link href="https://timtaubert.de/blog/2016/01/build-your-own-signal-desktop/"/>
    <updated>2016-01-15T15:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2016/01/build-your-own-signal-desktop</id>
    <content type="html"><![CDATA[<p>The Signal Private Messenger is great. <strong>Use it.</strong> It&rsquo;s probably the best secure
messenger on the market. When recently a desktop app was announced people were
eager to join the beta and even happier when an invite finally showed up in
their inbox. So was I, it&rsquo;s a great app and works surprisingly well for an early
version.</p>

<p>The only problem is that it&rsquo;s a Chrome App. Apart from excluding folks with
other browsers it&rsquo;s also a shitty user experience. If you too want your
messaging app not tied to a browser then let&rsquo;s just build our own standalone
variant of Signal Desktop.</p>

<h2>NW.js beta with Chrome App support</h2>

<p>Signal Desktop is a Chrome App, so the easiest way to turn it into a standalone
app is to use <a href="http://nwjs.io/">NW.js</a>. Conveniently, their next release v0.13
will ship with Chrome App support and is available for download as a beta
version.</p>

<p>First, make sure you have <code>git</code> and <code>npm</code> installed. Then open a terminal and
prepare a temporary build directory to which we can download a few things and
where we can build the app:</p>

<figure class='code'><div class="highlight"><pre>$ mkdir signal-build
$ cd signal-build
</pre></div></figure>


<h2>[OS X] Packaging Signal and NW.js</h2>

<p>Download the latest beta of NW.js and <code>unzip</code> it. We&rsquo;ll extract the application
and use it as a template for our Signal clone. The NW.js project
unfortunately does not seem to provide a secure source (or at least hashes)
for their downloads.</p>

<figure class='code'><div class="highlight"><pre>$ wget http://dl.nwjs.io/v0.14.4/nwjs-sdk-v0.14.4-osx-x64.zip
$ unzip nwjs-sdk-v0.14.4-osx-x64.zip
$ cp -r nwjs-sdk-v0.14.4-osx-x64/nwjs.app SignalPrivateMessenger.app
</pre></div></figure>


<p>Next, clone the Signal repository and use NPM to install the necessary modules.
Run the <code>grunt</code> automation tool to build the application.</p>

<figure class='code'><div class="highlight"><pre>$ git clone https://github.com/WhisperSystems/Signal-Desktop.git
$ cd Signal-Desktop/
$ npm install
$ node_modules/grunt-cli/bin/grunt
</pre></div></figure>


<p>Finally, simply copy the <code>dist</code> folder containing all the juicy Signal files
into the application template we created a few moments ago.</p>

<figure class='code'><div class="highlight"><pre>$ cp -r dist ../SignalPrivateMessenger.app/Contents/Resources/app.nw
$ open ..
</pre></div></figure>


<p>The last command opens a Finder window. Move <code>SignalPrivateMessenger.app</code> to
your Applications folder and launch it as usual. You should now see a welcome
page!</p>

<h2>[Linux] Packaging Signal and NW.js</h2>

<p>The build instructions for Linux aren&rsquo;t too different but I&rsquo;ll write them down,
if only for convenience. Start by cloning the Signal Desktop repository and
building it.</p>

<figure class='code'><div class="highlight"><pre>$ git clone https://github.com/WhisperSystems/Signal-Desktop.git
$ cd Signal-Desktop/
$ npm install
$ node_modules/grunt-cli/bin/grunt
</pre></div></figure>


<p>The <code>dist</code> folder contains the app, ready to be launched. <code>zip</code> it and place
the resulting package somewhere handy.</p>

<figure class='code'><div class="highlight"><pre>$ cd dist
$ zip -r ../../package.nw *
</pre></div></figure>


<p>Back to the top. Download the NW.js binary, extract it, and change into the
newly created directory. Move the <code>package.nw</code> file we created earlier next to
the <code>nw</code> binary and we&rsquo;re done. The <code>nwjs-sdk-v0.14.4-linux-x64</code> folder
now contains the standalone Signal app.</p>

<figure class='code'><div class="highlight"><pre>$ cd ../..
$ wget http://dl.nwjs.io/v0.14.4/nwjs-sdk-v0.14.4-linux-x64.tar.gz
$ tar xfz nwjs-sdk-v0.14.4-linux-x64.tar.gz
$ cd nwjs-sdk-v0.14.4-linux-x64
$ mv ../package.nw .
</pre></div></figure>


<p>Finally, launch NW.js. You should see a welcome page!</p>

<figure class='code'><div class="highlight"><pre>$ ./nw
</pre></div></figure>


<h2>If you see something, file something</h2>

<p>Our standalone Signal clone mostly works, but it&rsquo;s far from perfect. We&rsquo;re
pulling from master and that might bring breaking changes that weren&rsquo;t
sufficiently tested.</p>

<p>We don&rsquo;t have the right icons. The app crashes when you click a media message.
It opens a blank popup when you click a link. It&rsquo;s quite big because NW.js
has bugs too and so we have to use the SDK build for now. In the future it would be
great to have automatic updates, and maybe even signed builds.</p>

<p>Remember, Signal Desktop is beta, and completely untested with NW.js. If you
want to help, file bugs, but only after checking that they affect the Chrome
App too. If you want to fix a bug only occurring in the standalone version
it&rsquo;s probably best to file a pull request and cross fingers.</p>

<h2>Is this secure?</h2>

<p>Great question! I don&rsquo;t know. I would love to get some more insights from people
that know more about the NW.js security model and whether it comes with all the
protections Chromium can offer. Another interesting question is whether bundling
Signal Desktop with NW.js is in any way worse (from a security perspective) than
installing it as a Chrome extension. If you happen to have an opinion about
that, I would love to hear it.</p>

<p>Another important thing to keep in mind is that when building Signal on your
own you will possibly miss automatic and signed security updates from the
Chrome Web Store. Keep an eye on the repository and rebuild your app from
time to time to not fall behind too much.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[More Privacy, Less Latency]]></title>
    <link href="https://timtaubert.de/blog/2015/11/more-privacy-less-latency-improved-handshakes-in-tls-13/"/>
    <updated>2015-11-16T18:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2015/11/more-privacy-less-latency-improved-handshakes-in-tls-13</id>
    <content type="html"><![CDATA[<blockquote><p><em>Please note that this post is about draft-11 of the TLS v1.3 standard.</em></p></blockquote>

<p><em>TLS must be <a href="https://istlsfastyet.com/">fast</a>.</em> Adoption will greatly benefit
from speeding up the initial handshake that authenticates and secures the
connection. You want to get the protocol out of the way and start delivering
data to visitors as soon as possible. This is crucial if we want the web to
succeed at <a href="https://blog.mozilla.org/security/2015/04/30/deprecating-non-secure-http/">deprecating non-secure HTTP</a>.</p>

<p>Let&rsquo;s start by looking at full handshakes as standardized in
<a href="https://tools.ietf.org/html/rfc5246">TLS v1.2</a>, and then continue to
abbreviated handshakes that decrease connection times for resumed sessions.
Once we understand the current protocol we can proceed to proposals made in
the latest <a href="https://tlswg.github.io/tls13-spec/">TLS v1.3 draft</a> to achieve
full 1-RTT and even 0-RTT handshakes.</p>

<p>It helps if you already have a rough idea of how TLS and Diffie-Hellman work
as I can&rsquo;t go into every detail. The focus of this post is on comparing current
and future handshakes and I might omit a few technicalities to get basic ideas
across more easily.</p>

<h2>Full TLS 1.2 Handshake (static RSA)</h2>

<p>Static RSA is a straightforward key exchange method, available since
<a href="https://tools.ietf.org/html/draft-hickman-netscape-ssl-00">SSLv2</a>. After
sharing basic protocol information via the <code>ClientHello</code> and <code>ServerHello</code>
messages the server sends its certificate to the client. <code>ServerHelloDone</code>
signals that for now there will be no further messages until the client
responds.</p>

<p><a href="https://timtaubert.de/images/tls-hs-static-rsa.png" title="Full TLS v1.2 Handshake with Static RSA Key Exchange (2-RTT)" class="img"><img src="https://timtaubert.de/images/tls-hs-static-rsa.png" width="600" title="Full TLS v1.2 Handshake with Static RSA Key Exchange (2-RTT)" ></a></p>

<p>The client then encrypts the so-called premaster secret with the server&rsquo;s
public key found in the certificate and wraps it in a <code>ClientKeyExchange</code>
message. <code>ChangeCipherSpec</code> signals that from now on messages will be encrypted.
<code>Finished</code>, the first message to be encrypted and the client&rsquo;s last message of
the handshake, contains a MAC of all handshake messages exchanged thus far to
prove that both parties saw the same messages, without interference from a MITM.</p>

<p>The server decrypts the premaster secret found in the <code>ClientKeyExchange</code>
message using its certificate&rsquo;s private key, and derives the master secret and
communication keys. It then too signals a switch to encrypted communication
and completes the handshake. <em>It takes two round-trips to establish a
connection.</em></p>

<p><strong>Authentication:</strong> With static RSA key exchanges, the connection is
authenticated by encrypting the premaster secret with the server certificate&rsquo;s
public key. Only the server in possession of the private key can decrypt,
correctly derive the master secret, and send an encrypted <code>Finished</code> message
with the right MAC.</p>

<p>The simplicity of static RSA has a serious drawback: it does not offer
<a href="https://en.wikipedia.org/wiki/Forward_secrecy">forward secrecy</a>. If a passive
adversary records all traffic to a server then every recorded TLS session can
be broken later by obtaining the certificate&rsquo;s private key.</p>

<p><em>This key exchange method will be <a href="https://tlswg.github.io/tls13-spec/#major-differences-from-tls-12">removed in TLS v1.3</a>.</em></p>

<h2>Full TLS 1.2 Handshake (ephemeral DH)</h2>

<p>A full handshake using (Elliptic Curve)
<a href="https://en.wikipedia.org/wiki/Diffie-Hellman_key_exchange">Diffie-Hellman</a> to
exchange ephemeral keys is very similar to the flow of static RSA. The main
difference is that after sending the certificate the server will also send a
<code>ServerKeyExchange</code> message. This message contains either the parameters of a
DH group or of an elliptic curve, paired with an ephemeral public key computed
by the server.</p>

<p><a href="https://timtaubert.de/images/tls-hs-ecdhe.png" title="Full TLS v1.2 Handshake with Ephemeral Diffie-Hellman Key Exchange (2-RTT)" class="img"><img src="https://timtaubert.de/images/tls-hs-ecdhe.png" width="600" title="Full TLS v1.2 Handshake with Ephemeral Diffie-Hellman Key Exchange (2-RTT)" ></a></p>

<p>The client too computes an ephemeral public key compatible with the given
parameters and sends it to the server. Knowing their private keys and the other
party&rsquo;s public key both sides should now share the same premaster secret and
can derive a shared master secret.</p>
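
<p>The arithmetic behind this can be sketched in a few lines of Python. The
group parameters below are toy values chosen only to keep the example short
(real deployments use standardized groups or elliptic curves), and the variable
names are made up for illustration.</p>

```python
import secrets

P = 2**127 - 1   # a Mersenne prime; far too small for real-world use
G = 3


def keypair():
    """Return an ephemeral (private, public) Diffie-Hellman key pair."""
    private = secrets.randbelow(P - 3) + 2    # value in [2, P - 2]
    return private, pow(G, private, P)


server_private, server_public = keypair()   # public part: ServerKeyExchange
client_private, client_public = keypair()   # public part: ClientKeyExchange

# Each side combines its own private key with the other side's public key.
server_secret = pow(client_public, server_private, P)
client_secret = pow(server_public, client_private, P)
assert server_secret == client_secret
```

<p>Only the public values travel over the wire; the private keys are thrown
away after the handshake, which is what gives ephemeral key exchanges their
forward secrecy.</p>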

<p><strong>Authentication:</strong> With (EC)DH key exchanges it&rsquo;s still the certificate that
must be signed by a CA listed in the client&rsquo;s trust store. To authenticate the
connection the server will sign the parameters contained in <code>ServerKeyExchange</code>
with the certificate&rsquo;s private key. The client verifies the signature with the
certificate&rsquo;s public key and only then proceeds with the handshake.</p>

<h2>Abbreviated Handshakes in TLS 1.2</h2>

<p>Since <a href="https://tools.ietf.org/html/draft-hickman-netscape-ssl-00">SSLv2</a>
clients have been able to use session identifiers as a way to resume previously
established TLS/SSL sessions. <a href="https://blog.cloudflare.com/tls-session-resumption-full-speed-and-secure/">Session resumption</a>
is important because a full handshake can take time: it has a high latency as
it needs two round-trips and might involve expensive computation to exchange
keys, or sign and verify certificates.</p>

<p><strong><a href="https://tools.ietf.org/html/rfc5246#appendix-F.1.4">Session IDs</a></strong>, assigned
by the server, are unique identifiers under which both parties store the master
secret and other details of the connection they established. The client may
include this ID in the <code>ClientHello</code> message of the next handshake to
short-circuit the negotiation and reuse previous connection parameters.</p>

<p><a href="https://timtaubert.de/images/tls-hs-session-ids.png" title="Abbreviated Handshake with Session IDs (1-RTT)" class="img"><img src="https://timtaubert.de/images/tls-hs-session-ids.png" width="600" title="Abbreviated Handshake with Session IDs (1-RTT)" ></a></p>

<p>If the server is willing and able to resume the session it responds with a
<code>ServerHello</code> message including the Session ID given by the client. This
handshake is effectively 1-RTT as the client can send application data
immediately after the <code>Finished</code> message.</p>

<p>Sites with lots of visitors will have to manage and secure big session caches,
or risk pushing out saved sessions too quickly. A setup involving multiple
load-balanced servers will need to securely synchronize session caches across
machines. The forward secrecy of a connection is bounded by how long session
information is retained on servers.</p>

<p><strong><a href="http://tools.ietf.org/html/rfc5077">Session tickets</a></strong>, created by the server
and stored by the client, are blobs containing all necessary information about
a connection, encrypted by a key only known to the server. If the client
presents this ticket with the <code>ClientHello</code> message, and proves that it knows
the master secret stored in the ticket, the session will be resumed.</p>
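<p>To make the idea of an encrypted, self-contained blob more concrete, here is a
rough sketch assuming Node.js. It uses AES-256-GCM for brevity, which is not the
construction RFC 5077 recommends, and <code>sealTicket()</code>, <code>openTicket()</code>, and the
shape of the session state are hypothetical names chosen for this example.</p>

```javascript
const crypto = require("node:crypto");

// Hypothetical ticket key; known only to the server and rotated periodically.
const ticketKey = crypto.randomBytes(32);

// Encrypt and authenticate the session state; the result is the opaque ticket.
function sealTicket(state, key) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([
    cipher.update(JSON.stringify(state), "utf8"),
    cipher.final()
  ]);
  // Layout: 12-byte IV, 16-byte authentication tag, ciphertext.
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

// Returns the stored state, or null if the ticket cannot be decrypted
// or was tampered with.
function openTicket(ticket, key) {
  try {
    const decipher = crypto.createDecipheriv("aes-256-gcm", key, ticket.subarray(0, 12));
    decipher.setAuthTag(ticket.subarray(12, 28));
    const body = Buffer.concat([decipher.update(ticket.subarray(28)), decipher.final()]);
    return JSON.parse(body.toString("utf8"));
  } catch (e) {
    return null;
  }
}

// Placeholder state, not a real TLS session.
const ticket = sealTicket({ masterSecret: "ab".repeat(48), version: "TLS 1.2" }, ticketKey);
```

<p>The server does not have to remember anything per client. Whoever holds the
ticket key can read every ticket sealed with it though, which is why the handling
of that key matters so much.</p>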

<p><a href="https://timtaubert.de/images/tls-hs-session-tickets.png" title="Abbreviated Handshake with Session Tickets (1-RTT)" class="img"><img src="https://timtaubert.de/images/tls-hs-session-tickets.png" width="600" title="Abbreviated Handshake with Session Tickets (1-RTT)" ></a></p>

<p>A server willing and able to decrypt the given ticket responds with a
<code>ServerHello</code> message including an empty <em>SessionTicket</em> extension, otherwise
the extension would be omitted completely. As with session IDs, the client will
start sending application data immediately after the <code>Finished</code> message to
achieve 1-RTT.</p>

<p>To avoid weakening the forward secrecy provided by (EC)DHE suites, session ticket
keys should be rotated periodically; otherwise stealing the ticket key would
allow recovering recorded sessions later. In a setup with multiple load-balanced
servers the main challenge here is to securely generate, rotate, and
synchronize keys across machines.</p>

<p><strong>Authentication:</strong> Both session resumption mechanisms retain the client&rsquo;s and
server&rsquo;s authentication states as established in the session&rsquo;s initial handshake.
Neither the server nor the client has to send and verify certificates a second
time, which can reduce connection times significantly, especially when
dealing with RSA certificates.</p>

<h2>Full Handshakes in TLS 1.3</h2>

<p>The first good news about handshakes in TLS v1.3 is that static RSA key
exchanges are no longer supported. Great! That means we can start with full
handshakes using forward-secure Diffie-Hellman.</p>

<p>Another important change is the removal of the <code>ChangeCipherSpec</code> protocol
(yes, it&rsquo;s actually a protocol, not a message). With TLS v1.3 every message
sent after <code>ServerHello</code> is encrypted with the so-called
<a href="https://tlswg.github.io/tls13-spec/#key-schedule">ephemeral secret</a> to lock
out passive adversaries very early in the game. <code>EncryptedExtensions</code> carries
Hello extension data that must be encrypted because it&rsquo;s not needed to set up
secure communication.</p>

<p><a href="https://timtaubert.de/images/tls13-hs-ecdhe.png" title="Full TLS v1.3 Handshake with Ephemeral Diffie-Hellman Key Exchange (1-RTT)" class="img"><img src="https://timtaubert.de/images/tls13-hs-ecdhe.png" width="600" title="Full TLS v1.3 Handshake with Ephemeral Diffie-Hellman Key Exchange (1-RTT)" ></a></p>

<p>Probably the most important change with regard to 1-RTT is the removal of the
<code>ServerKeyExchange</code> and <code>ClientKeyExchange</code> messages. The DH parameters and
public keys are now sent in special <em>KeyShare</em> extensions, a new type of
extension to be included in the <code>ServerHello</code> and <code>ClientHello</code> messages.
Moving this data into Hello extensions keeps the handshake compatible with TLS
v1.2 as it doesn&rsquo;t change the order of messages.</p>

<p>The client sends a list of <em>KeyShareEntry</em> values, each consisting of a named
(EC)DH group and an ephemeral public key. If the server accepts, it must respond
with one of the proposed groups and its own public key. If the server does not
support any of the given key shares the server will request retrying the
handshake or abort the connection with a fatal <code>handshake_failure</code> alert.</p>
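<p>The server&rsquo;s decision could be sketched roughly as follows. Everything here is
hypothetical: <code>SERVER_GROUPS</code>, <code>selectKeyShare()</code>, and the shape of the
returned objects are invented for illustration. The rule for choosing between a retry
and an abort (retry only if the client advertised a group the server also supports)
is an assumption of this sketch, not something the text above specifies.</p>

```javascript
// Groups this hypothetical server supports, in order of preference.
const SERVER_GROUPS = ["x25519", "secp256r1"];

// clientShares: KeyShareEntry-like objects of the form {group, publicKey}.
// clientGroups: every group the client says it supports, including those
// it did not send a key share for.
function selectKeyShare(clientShares, clientGroups) {
  for (const group of SERVER_GROUPS) {
    const share = clientShares.find(entry => entry.group === group);
    if (share) {
      return { action: "accept", group, peerPublicKey: share.publicKey };
    }
  }

  // No usable share. Ask the client to retry with a group both sides support.
  const common = SERVER_GROUPS.find(group => clientGroups.includes(group));
  if (common) {
    return { action: "retry", group: common };
  }

  // Nothing in common at all.
  return { action: "abort", alert: "handshake_failure" };
}
```

<p>Accepting one of the offered shares is the only path that keeps the handshake
at a single round-trip; a retry costs an extra one.</p>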

<p><strong>Authentication:</strong> The Diffie-Hellman parameters themselves aren&rsquo;t signed anymore;
authentication will be a tad more explicit in TLS v1.3. The server sends a
<code>CertificateVerify</code> message that contains a hash of all handshake messages
exchanged so far, signed with the certificate&rsquo;s private key. The client then
simply verifies the signature with the certificate&rsquo;s public key.</p>
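<p>As a rough illustration of signing a handshake transcript, consider the sketch
below. It assumes Node.js, uses an Ed25519 key pair as a stand-in for the
certificate&rsquo;s key, SHA-256 as the transcript hash, and placeholder strings
instead of real TLS message encodings. <code>transcriptHash()</code> is a hypothetical
helper.</p>

```javascript
const crypto = require("node:crypto");

// Stand-in for the key pair behind the server's certificate (assumption:
// Ed25519; a real certificate could just as well carry an RSA or ECDSA key).
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

// Hash over all handshake messages exchanged so far, in order.
function transcriptHash(messages) {
  const hash = crypto.createHash("sha256");
  for (const message of messages) {
    hash.update(message);
  }
  return hash.digest();
}

// Placeholder byte strings, not real TLS encodings.
const messages = ["ClientHello", "ServerHello", "EncryptedExtensions", "Certificate"]
  .map(name => Buffer.from(name));

// Server side: sign the hash of the transcript.
const signature = crypto.sign(null, transcriptHash(messages), privateKey);

// Client side: recompute the hash from its own view of the handshake and verify.
const ok = crypto.verify(null, transcriptHash(messages), publicKey, signature);
console.log(ok); // true
```

<p>Because the signature covers everything sent so far, including both key
shares, changing any earlier message makes verification fail.</p>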

<h2>Session Resumption in TLS 1.3 (PSK)</h2>

<p>Session resumption via identifiers and tickets is obsolete in TLS v1.3.
Both methods are replaced by a <a href="https://tlswg.github.io/tls13-spec/#rfc.section.6.2.3">pre-shared key (PSK) mode</a>.
A PSK is established on a previous connection after the handshake is completed,
and can then be presented by the client on the next visit.</p>

<p><a href="https://timtaubert.de/images/tls13-hs-resumption.png" title="Session Resumption / PSK Mode in TLS v1.3 (1-RTT)" class="img"><img src="https://timtaubert.de/images/tls13-hs-resumption.png" width="600" title="Session Resumption / PSK Mode in TLS v1.3 (1-RTT)" ></a></p>

<p>The client sends one or more <em>PSK identities</em> as opaque blobs of data. They can
be database lookup keys (similar to Session IDs), or self-encrypted and
self-authenticated values (similar to Session Tickets). If the server accepts
one of the given PSK identities it replies with the one it selected. The
<em>KeyShare</em> extension is sent to allow servers to ignore PSKs and fall back to
a full handshake.</p>

<p>Forward secrecy can be maintained by limiting the lifetime of PSK identities
sensibly. Clients and servers may also choose an (EC)DHE cipher suite for PSK
handshakes to provide forward secrecy for every connection, not just the whole
session.</p>

<p><strong>Authentication:</strong> As in TLS v1.2, the client&rsquo;s and server&rsquo;s authentication
states are retained and both parties don&rsquo;t need to exchange and verify
certificates again. A regular PSK handshake initiating a new session, instead
of resuming, omits certificates completely.</p>

<p>Session resumption still allows significantly faster handshakes when using RSA
certificates and can prevent user-facing client authentication dialogs on
subsequent connections. However, the fact that it requires a single round-trip
just like a full handshake might make it less appealing, especially if you
have an ECDSA or EdDSA certificate and do not require client authentication.</p>

<h2>Zero-RTT Handshakes in TLS 1.3</h2>

<p>The latest draft of the specification contains a proposal to let clients
encrypt application data and include it in their first flights. On a previous
connection, after the handshake completes, the server would send a
<code>ServerConfiguration</code> message that the client can use for
<a href="https://tlswg.github.io/tls13-spec/#zero-rtt-exchange">0-RTT handshakes</a>
on subsequent connections. The
<a href="https://tlswg.github.io/tls13-spec/#server-configuration">configuration</a>
includes a configuration identifier, the server&rsquo;s semi-static (EC)DH parameters,
an expiration date, and other details.</p>

<p><a href="https://timtaubert.de/images/tls13-hs-zero-rtt.png" title="TLS v1.3 0-RTT Handshake" class="img"><img src="https://timtaubert.de/images/tls13-hs-zero-rtt.png" width="600" title="TLS v1.3 0-RTT Handshake" ></a></p>

<p>With the very first TLS record the client sends its <code>ClientHello</code> and, changing
the order of messages, directly appends application data (e.g. <code>GET / HTTP/1.1</code>).
Everything after the <code>ClientHello</code> will be encrypted with the
<a href="https://tlswg.github.io/tls13-spec/#key-schedule">static secret</a>, derived from
the client&rsquo;s ephemeral <em>KeyShareEntry</em> and the semi-static DH parameters given
in the server&rsquo;s configuration. The <code>end_of_early_data</code> alert indicates the end
of the flight.</p>

<p>The server, if able and willing to decrypt, responds with its default set of
messages and immediately appends the contents of the requested resource. <em>That&rsquo;s
the same round-trip time as for an unencrypted HTTP request.</em> All communication
following the <code>ServerHello</code> will again be encrypted with the ephemeral secret,
derived from the client&rsquo;s <em>and</em> server&rsquo;s ephemeral key shares. After exchanging
<code>Finished</code> messages the server will be re-authenticated, and traffic encrypted
with keys derived from the master secret.</p>

<h3>Security of 0-RTT Handshakes</h3>

<p>At first glance, 0-RTT mode seems similar to session resumption or PSK, and you
might wonder why one wouldn&rsquo;t merge these mechanisms. The differences however
are subtle but important, and the security properties of 0-RTT handshakes are
weaker than those for other kinds of TLS data:</p>

<p><strong>1.</strong> To protect against replay attacks the server must incorporate a <em>server
random</em> into the master secret. That is unfortunately not possible before the
first round-trip and so the poor server can&rsquo;t easily tell whether it&rsquo;s a valid
request or an attacker replaying a recorded conversation. Replay protection
will be in place again after the <code>ServerHello</code> message is sent.</p>

<p><strong>2.</strong> The semi-static DH share given in the server configuration, used to
derive the static secret and encrypt first flight data, defies forward secrecy.
We need at least one round-trip to establish the ephemeral secret. As
configurations are shared between clients, and recovering the server&rsquo;s DH share
becomes more attractive, expiration dates should be limited sensibly. The
maximum allowed validity is 7 days.</p>

<p><strong>3.</strong> If the server&rsquo;s DH share is compromised a MITM can tamper with the
0-RTT data sent by the client, without being detected. This does not extend to
the full session as the client can retrospectively authenticate the server via
the remaining handshake messages.</p>

<h3>Defending against Replay Attacks</h3>

<p>Thwarting replay attacks without input from the server is fundamentally very
expensive. It&rsquo;s important to understand that this is a generic problem, not an
issue with TLS in particular, so alas one can&rsquo;t just borrow another protocol&rsquo;s
0-RTT model and put that into TLS.</p>

<p>It is possible to have servers keep a list of every <em>ClientRandom</em> they have
received in a given time window. Upon receiving a <code>ClientHello</code> the server
checks its list and rejects replays if necessary. This list must be globally
and temporally consistent as there are
<a href="https://www.ietf.org/mail-archive/web/tls/current/msg15594.html">possible attack vectors</a>
due to TLS&#8217; reliable delivery guarantee if an attacker can force a server to
lose its state, as well as with multiple servers in loosely-synchronized data
centers.</p>
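<p>For a single server, such a list could look roughly like the sketch below.
<code>ReplayCache</code> is a hypothetical class invented for this example. It also
assumes that the server separately rejects any <code>ClientHello</code> that claims to be
older than the time window, because entries are forgotten once they fall out of it.</p>

```javascript
// A single-process sketch; a real deployment would need this state to be
// shared and consistent across all servers.
class ReplayCache {
  constructor(windowMs) {
    this.windowMs = windowMs;
    this.seen = new Map(); // ClientRandom (hex string) -> time first seen
  }

  // Returns true if the ClientHello should be accepted, false for a replay.
  check(clientRandomHex, now = Date.now()) {
    // Forget entries that fell out of the time window.
    for (const [random, seenAt] of this.seen) {
      if (now - seenAt >= this.windowMs) {
        this.seen.delete(random);
      }
    }

    if (this.seen.has(clientRandomHex)) {
      return false;
    }

    this.seen.set(clientRandomHex, now);
    return true;
  }
}
```

<p>The sketch also hints at the problem: restart the process or add a second
server with its own <code>Map</code> and the guarantee is gone.</p>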

<p>Maintaining a consistent global state is possible, but only in some limited
circumstances, namely for very sophisticated operators or situations where
there is a single server with good state management. We will need something
better.</p>

<h3>Removing Anti-Replay Guarantee</h3>

<p>A possible solution might be a TLS stack API to let applications designate
certain data as replay-safe, for example <code>GET / HTTP/1.1</code> assuming that GET
requests against a given resource are idempotent.</p>

<figure class='code'><div class="highlight"><pre><span class="kd">let</span> <span class="nx">c</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TLSConnection</span><span class="p">(...);</span>
<span class="nx">c</span><span class="p">.</span><span class="nx">setReplayable0RTTData</span><span class="p">(</span><span class="s2">&quot;GET / ...&quot;</span><span class="p">);</span>
<span class="nx">c</span><span class="p">.</span><span class="nx">connect</span><span class="p">();</span>
</pre></div></figure>


<p>Applications can, before opening the connection, specify replayable 0-RTT data
to send on the first flight. If the server ignores the given 0-RTT data, the
TLS stack automatically replays it after the first round-trip.</p>

<h3>Removing Reliable Delivery Guarantee</h3>

<p>Another way of achieving the same outcome would be a TLS stack API that
again lets applications designate certain data as replay-safe, but does <em>not
automatically</em> replay if the server ignores it. The application can decide to
do this manually if necessary.</p>

<figure class='code'><div class="highlight"><pre><span class="kd">let</span> <span class="nx">c</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TLSConnection</span><span class="p">(...);</span>
<span class="nx">c</span><span class="p">.</span><span class="nx">setUnreliable0RTTData</span><span class="p">(</span><span class="s2">&quot;GET / ...&quot;</span><span class="p">);</span>
<span class="nx">c</span><span class="p">.</span><span class="nx">connect</span><span class="p">();</span>

<span class="k">if</span> <span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">delivered0RTTData</span><span class="p">())</span> <span class="p">{</span>
  <span class="c1">// Things are cool.</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
  <span class="c1">// Try to figure out whether to replay or not.</span>
<span class="p">}</span>
</pre></div></figure>


<p>Both of these APIs are early proposals and the final version of the
specification might look very different from what we can see above. Though, as
0-RTT handshakes are a charter goal, the working group will very likely find a
way to make them work.</p>

<h2>Summing up</h2>

<p>TLS v1.3 will bring major improvements to handshakes; how exactly will be
finalized in the coming months. They will be more private by default as all
information not needed to set up a secure channel will be encrypted as early
as possible. Clients will need only a single round-trip to establish secure
and authenticated connections to servers they never spoke to before.</p>

<p>Static RSA mode will no longer be available, forward secrecy will be the
default. The two session resumption standards, session identifiers and session
tickets, are merged into a single PSK mode which will allow streamlining
implementations.</p>

<p>The proposed 0-RTT mode is promising, for custom application communication
based on TLS but also for browsers, where a <code>GET / HTTP/1.1</code> request to your
favorite news page could deliver content blazingly fast as if no TLS was
involved. The security aspects of zero round-trip handshakes will become more
clear as the draft progresses.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A Firefox OS Password Storage]]></title>
    <link href="https://timtaubert.de/blog/2015/05/implementing-a-pbkdf2-based-password-storage-scheme-for-firefox-os/"/>
    <updated>2015-05-18T15:00:00+02:00</updated>
    <id>https://timtaubert.de/blog/2015/05/implementing-a-pbkdf2-based-password-storage-scheme-for-firefox-os</id>
    <content type="html"><![CDATA[<p>My esteemed colleague <a href="https://frederik-braun.com/">Frederik Braun</a> recently
took on the task of rewriting the module responsible for storing and checking passcodes
that unlock Firefox OS phones. While we are still working on actually landing
it in <a href="https://developer.mozilla.org/en-US/Firefox_OS/Platform/Gaia">Gaia</a> I
wanted to seize the chance to talk about this great use case of the
<a href="https://dvcs.w3.org/hg/webcrypto-api/raw-file/tip/spec/Overview.html">WebCrypto API</a>
in the wild and highlight a few important points when using
<a href="https://en.wikipedia.org/wiki/PBKDF2">password-based key derivation (PBKDF2)</a>
to store passwords.</p>

<h2>The Passcode Module</h2>

<p>Let us take a closer look at not the verbatim implementation but at a slightly
simplified version. The API offers the only two operations such a module needs
to support: setting a new passcode and verifying that a given passcode matches
the stored one.</p>

<figure class='code'><div class="highlight"><pre><span class="kd">let</span> <span class="nx">Passcode</span> <span class="o">=</span> <span class="p">{</span>
  <span class="nx">store</span><span class="p">(</span><span class="nx">code</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// ...</span>
  <span class="p">},</span>

  <span class="nx">verify</span><span class="p">(</span><span class="nx">code</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// ...</span>
  <span class="p">}</span>
<span class="p">};</span>
</pre></div></figure>


<p>When setting up the phone for the first time - or when changing the passcode
later - we call <code>Passcode.store()</code> to write a new code to disk.
<code>Passcode.verify()</code> will help us determine whether we should unlock the phone.
Both methods return a Promise as all operations exposed by the WebCrypto API
are asynchronous.</p>

<figure class='code'><div class="highlight"><pre><span class="nx">Passcode</span><span class="p">.</span><span class="nx">store</span><span class="p">(</span><span class="s2">&quot;1234&quot;</span><span class="p">).</span><span class="nx">then</span><span class="p">(()</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="k">return</span> <span class="nx">Passcode</span><span class="p">.</span><span class="nx">verify</span><span class="p">(</span><span class="s2">&quot;1234&quot;</span><span class="p">);</span>
<span class="p">}).</span><span class="nx">then</span><span class="p">(</span><span class="nx">valid</span> <span class="o">=&gt;</span> <span class="p">{</span>
  <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">valid</span><span class="p">);</span>
<span class="p">});</span>

<span class="c1">// Output: true</span>
</pre></div></figure>


<h2>Make the passcode look &ldquo;random&rdquo;</h2>

<p>The module should <em>absolutely not</em> store passcodes in the clear. We will use
<a href="https://en.wikipedia.org/wiki/PBKDF2">PBKDF2</a> as a
<a href="https://en.wikipedia.org/wiki/Pseudorandom_function_family">pseudorandom function (PRF)</a>
to retrieve a result that <em>looks random</em>. An attacker with read access to the
part of the disk storing the user&rsquo;s passcode should not be able to recover the
original input, assuming limited computational resources.</p>

<p>The function <code>deriveBits()</code> is a PRF that takes a passcode and returns a Promise
resolving to a random looking sequence of bytes. To be a little more specific,
it uses PBKDF2 to derive pseudorandom bits.</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">deriveBits</span><span class="p">(</span><span class="nx">code</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Convert string to a TypedArray.</span>
  <span class="kd">let</span> <span class="nx">bytes</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TextEncoder</span><span class="p">(</span><span class="s2">&quot;utf-8&quot;</span><span class="p">).</span><span class="nx">encode</span><span class="p">(</span><span class="nx">code</span><span class="p">);</span>

  <span class="c1">// Create the base key to derive from.</span>
  <span class="kd">let</span> <span class="nx">importedKey</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">importKey</span><span class="p">(</span>
    <span class="s2">&quot;raw&quot;</span><span class="p">,</span> <span class="nx">bytes</span><span class="p">,</span> <span class="s2">&quot;PBKDF2&quot;</span><span class="p">,</span> <span class="kc">false</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;deriveBits&quot;</span><span class="p">]);</span>

  <span class="k">return</span> <span class="nx">importedKey</span><span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">key</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="c1">// Salt should be at least 64 bits.</span>
    <span class="kd">let</span> <span class="nx">salt</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">getRandomValues</span><span class="p">(</span><span class="k">new</span> <span class="nx">Uint8Array</span><span class="p">(</span><span class="mi">8</span><span class="p">));</span>

    <span class="c1">// All required PBKDF2 parameters.</span>
    <span class="kd">let</span> <span class="nx">params</span> <span class="o">=</span> <span class="p">{</span><span class="nx">name</span><span class="o">:</span> <span class="s2">&quot;PBKDF2&quot;</span><span class="p">,</span> <span class="nx">hash</span><span class="o">:</span> <span class="s2">&quot;SHA-1&quot;</span><span class="p">,</span> <span class="nx">salt</span><span class="p">,</span> <span class="nx">iterations</span><span class="o">:</span> <span class="mi">5000</span><span class="p">};</span>

    <span class="c1">// Derive 160 bits using PBKDF2.</span>
    <span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">deriveBits</span><span class="p">(</span><span class="nx">params</span><span class="p">,</span> <span class="nx">key</span><span class="p">,</span> <span class="mi">160</span><span class="p">);</span>
  <span class="p">});</span>
<span class="p">}</span>
</pre></div></figure>


<h2>Choosing PBKDF2 parameters</h2>

<p>As you can see above, PBKDF2 takes a whole bunch of parameters. Choosing good
values is crucial for the security of our passcode module so it is best to take
a detailed look at every single one of them.</p>

<h3>Select a cryptographic hash function</h3>

<p>PBKDF2 is a <em>big</em> PRF that iterates a <em>small</em> PRF. The small PRF, iterated
multiple times (more on why this is done later), is fixed to be an
<a href="https://en.wikipedia.org/wiki/HMAC">HMAC</a> construction; you are however
allowed to specify the cryptographic hash function used inside HMAC itself. To
understand why you need to select a hash function it helps to take a look at
HMAC&rsquo;s definition, here with <a href="https://en.wikipedia.org/wiki/SHA-1">SHA-1</a> at
its core:</p>

<figure class='code'><div class="highlight"><pre>HMAC-SHA-1(k, m) = SHA-1((k ⊕ opad) + SHA-1((k ⊕ ipad) + m))
</pre></div></figure>


<p>The outer and inner padding <code>opad</code> and <code>ipad</code> are static values that can be
ignored for our purpose, the important takeaway is that the given hash function
will be called twice, combining the message <code>m</code> and the key <code>k</code>. Whereas HMAC
is usually used for authentication, PBKDF2 makes use of its PRF properties, meaning
that its output is computationally indistinguishable from random.</p>
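<p>The definition is short enough to write out by hand. The following sketch
assumes Node.js and its built-in <code>crypto</code> module; <code>sha1()</code> and
<code>hmacSha1()</code> are helper names made up for this example. The padding constants
<code>0x5c</code> and <code>0x36</code> and the 64-byte block size are the ones HMAC defines
for SHA-1.</p>

```javascript
const crypto = require("node:crypto");

// SHA-1 over the concatenation of all given chunks.
function sha1(...chunks) {
  const hash = crypto.createHash("sha1");
  for (const chunk of chunks) {
    hash.update(chunk);
  }
  return hash.digest();
}

// HMAC-SHA-1 written out according to the definition above.
function hmacSha1(key, message) {
  const blockSize = 64; // SHA-1 processes 64-byte blocks

  // Keys longer than a block are hashed first, shorter ones are zero-padded.
  const k = Buffer.alloc(blockSize);
  (key.length > blockSize ? sha1(key) : key).copy(k);

  const outerKey = k.map(byte => byte ^ 0x5c); // k XOR opad
  const innerKey = k.map(byte => byte ^ 0x36); // k XOR ipad

  // Two calls to the hash function, one nested inside the other.
  return sha1(outerKey, sha1(innerKey, message));
}
```

<p>The result should match what
<code>crypto.createHmac("sha1", key).update(message).digest()</code> returns for the same
inputs, which is a handy way to check the sketch.</p>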

<p><code>deriveBits()</code> as defined above uses <a href="https://en.wikipedia.org/wiki/SHA-1">SHA-1</a>
as well, and although it is <a href="http://valerieaurora.org/hash.html">considered broken</a>
as a <a href="https://en.wikipedia.org/wiki/Collision_resistance">collision-resistant</a>
hash function it is still a safe building block in the HMAC-SHA-1 construction.
HMAC only relies on a hash function&rsquo;s PRF properties, and while finding SHA-1
collisions is considered feasible it is still believed to be a secure PRF.</p>

<p>That said, it would not hurt to switch to a secure cryptographic hash function
like <a href="https://en.wikipedia.org/wiki/SHA-2">SHA-256</a>. Chrome supports other hash
functions for PBKDF2 today, Firefox unfortunately has to wait for an
<a href="https://bugzil.la/554827">NSS fix</a> before those can be unlocked for the
WebCrypto API.</p>

<h3>Pass a random salt</h3>

<p>The salt is a random component that PBKDF2 feeds into the HMAC function along
with the passcode. This prevents an attacker from simply computing the hashes
of, for example, all 8-character combinations of alphanumerics (~5.4 petabytes of
storage for SHA-1) and using a huge
<a href="https://en.wikipedia.org/wiki/Lookup_table">lookup table</a> to quickly reverse
a given password hash. Specify 8 random bytes as the salt and the poor attacker
will have to suddenly compute (and store!) 2<sup>64</sup> of those lookup tables and face
8 additional random characters in the input. Even without the salt, the effort
to create even one lookup table would be hard to justify because chances are
high you cannot reuse it to attack another target: they might be using a
different hash function or combining two or more of them.</p>
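<p>One way to arrive at a storage figure of that size is shown below. It assumes
62 alphanumeric characters, a table entry that holds the 8-byte password next to its
20-byte SHA-1 digest, and binary petabytes of 2<sup>50</sup> bytes. Other assumptions
would give somewhat different numbers.</p>

```javascript
const candidates = 62 ** 8;   // a-z, A-Z, 0-9 at each of 8 positions
const bytesPerEntry = 8 + 20; // assumed: the password plus its SHA-1 digest
const tableBytes = candidates * bytesPerEntry;

console.log(candidates);                        // 218340105584896
console.log((tableBytes / 2 ** 50).toFixed(2)); // 5.43
```

<p>Multiply that by 2<sup>64</sup> possible salt values and precomputation stops
being an option.</p>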

<p>The same goes for <a href="https://en.wikipedia.org/wiki/Rainbow_table">Rainbow Tables</a>.
A random salt included with the password would have to be incorporated
when precomputing the hash chains and the attacker is back to square one where
she has to compute a Rainbow Table for every possible salt value. That certainly
works ad-hoc for a single salt value but preparing and storing 2<sup>64</sup> of those
tables is impossible.</p>

<p>The salt is public and will be stored in the clear along with the derived bits.
We need the exact same salt to arrive at the exact same derived bits later
again. We thus have to modify <code>deriveBits()</code> to accept the salt as an argument
so that we can either generate a random one or read it from disk.</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">deriveBits</span><span class="p">(</span><span class="nx">code</span><span class="p">,</span> <span class="nx">salt</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Convert string to a TypedArray.</span>
  <span class="kd">let</span> <span class="nx">bytes</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TextEncoder</span><span class="p">(</span><span class="s2">&quot;utf-8&quot;</span><span class="p">).</span><span class="nx">encode</span><span class="p">(</span><span class="nx">code</span><span class="p">);</span>

  <span class="c1">// Create the base key to derive from.</span>
  <span class="kd">let</span> <span class="nx">importedKey</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">importKey</span><span class="p">(</span>
    <span class="s2">&quot;raw&quot;</span><span class="p">,</span> <span class="nx">bytes</span><span class="p">,</span> <span class="s2">&quot;PBKDF2&quot;</span><span class="p">,</span> <span class="kc">false</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;deriveBits&quot;</span><span class="p">]);</span>

  <span class="k">return</span> <span class="nx">importedKey</span><span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">key</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="c1">// All required PBKDF2 parameters.</span>
    <span class="kd">let</span> <span class="nx">params</span> <span class="o">=</span> <span class="p">{</span><span class="nx">name</span><span class="o">:</span> <span class="s2">&quot;PBKDF2&quot;</span><span class="p">,</span> <span class="nx">hash</span><span class="o">:</span> <span class="s2">&quot;SHA-1&quot;</span><span class="p">,</span> <span class="nx">salt</span><span class="p">,</span> <span class="nx">iterations</span><span class="o">:</span> <span class="mi">5000</span><span class="p">};</span>

    <span class="c1">// Derive 160 bits using PBKDF2.</span>
    <span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">deriveBits</span><span class="p">(</span><span class="nx">params</span><span class="p">,</span> <span class="nx">key</span><span class="p">,</span> <span class="mi">160</span><span class="p">);</span>
  <span class="p">});</span>
<span class="p">}</span>
</pre></div></figure>


<p>Keep in mind though that Rainbow Tables are mainly a thing of the past,
when password hashes were smaller and <a href="http://en.wikipedia.org/wiki/LM_hash">shittier</a>.
Salts are the bare minimum a good password storage scheme needs, but they
merely protect against a threat that is largely irrelevant today.</p>

<h3>Specify a number of iterations</h3>

<p>As computers became faster and Rainbow Table attacks infeasible due to the
prevalent use of salts everywhere, people started attacking password hashes
with dictionaries, simply by taking the public salt value and passing that
combined with their educated guess to the hash function until a match was found.
Modern password schemes thus employ a &ldquo;work factor&rdquo; to make hashing millions of
password guesses unbearably slow.</p>

<p>By specifying a <em>sufficiently high</em> number of iterations we can slow down
PBKDF2&rsquo;s inner computation so that an attacker will have to face a massive
performance decrease and be able to only try a few thousand passwords per
second instead of millions.</p>

<p>For a single-user disk or file encryption it might be acceptable if computing
the password hash takes a few seconds; for a lock screen 300-500ms might be
the upper limit to not interfere with user experience. Take a look at
<a href="http://security.stackexchange.com/questions/3959/recommended-of-iterations-when-using-pkbdf2-sha256/3993#3993">this great StackExchange post</a>
for more advice on what might be the right number of iterations for your
application and environment.</p>
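<p>A simple way to find a starting point is to measure on the target device.
The sketch below assumes Node.js and its synchronous <code>pbkdf2Sync()</code>;
<code>timePbkdf2()</code> and <code>calibrate()</code> are hypothetical helpers. A single timing
run is noisy, so treat the result as a rough estimate only.</p>

```javascript
const crypto = require("node:crypto");

// Time a single PBKDF2-HMAC-SHA-1 derivation for a given iteration count.
function timePbkdf2(iterations) {
  const salt = crypto.randomBytes(8);
  const start = process.hrtime.bigint();
  crypto.pbkdf2Sync("1234", salt, iterations, 20, "sha1");
  return Number(process.hrtime.bigint() - start) / 1e6; // milliseconds
}

// Double the iteration count for as long as one derivation fits the budget.
function calibrate(budgetMs, start = 1000) {
  let iterations = start;
  while (budgetMs >= timePbkdf2(iterations * 2)) {
    iterations *= 2;
  }
  return iterations;
}

// Example: the largest tested count that stays within roughly 300ms.
// let iterations = calibrate(300);
```

<p>Whatever number comes out is only valid for the hardware it was measured on;
an attacker will usually bring something faster than a phone.</p>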

<p>A much more secure version of a lock screen would allow not just four
digits but any number of characters. An additional delay of a few seconds
after a small number of wrong guesses might increase security even more,
assuming the attacker cannot access the PRF output stored on disk.</p>

<h3>Determine the number of bits to derive</h3>

<p>PBKDF2 can output an almost arbitrary amount of pseudo-random data. A single
execution yields the number of bits that is equal to the chosen hash function&rsquo;s
output size. If the desired number of bits exceeds the hash function&rsquo;s output
size, PBKDF2 will repeat the whole iterated computation for each additional
block of output until enough bits have been derived.</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">getHashOutputLength</span><span class="p">(</span><span class="nx">hash</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">switch</span> <span class="p">(</span><span class="nx">hash</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">case</span> <span class="s2">&quot;SHA-1&quot;</span><span class="o">:</span>   <span class="k">return</span> <span class="mi">160</span><span class="p">;</span>
    <span class="k">case</span> <span class="s2">&quot;SHA-256&quot;</span><span class="o">:</span> <span class="k">return</span> <span class="mi">256</span><span class="p">;</span>
    <span class="k">case</span> <span class="s2">&quot;SHA-384&quot;</span><span class="o">:</span> <span class="k">return</span> <span class="mi">384</span><span class="p">;</span>
    <span class="k">case</span> <span class="s2">&quot;SHA-512&quot;</span><span class="o">:</span> <span class="k">return</span> <span class="mi">512</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">throw</span> <span class="k">new</span> <span class="nb">Error</span><span class="p">(</span><span class="s2">&quot;Unsupported hash function&quot;</span><span class="p">);</span>
<span class="p">}</span>
</pre></div></figure>


<p>Choose 160 bits for SHA-1, 256 bits for SHA-256, and so on. Slowing down the
key derivation even further by requiring more than one round of PBKDF2 will not
increase the security of the password storage: an attacker only needs to compute
the first block of output to check a guess.</p>
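
<p>To make the cost of extra output concrete, here is a small sketch of the
arithmetic. The function names are made up for this example; the formula is
the block count from the PBKDF2 specification, where the number of blocks is
the requested length divided by the hash output length, rounded up.</p>

```javascript
// Number of PBKDF2 blocks needed to derive |bits| bits of output when the
// underlying hash function yields |hashBits| bits per block.
function pbkdf2BlockCount(bits, hashBits) {
  if (!Number.isInteger(bits) || bits <= 0) {
    throw new RangeError("bits must be a positive integer");
  }
  return Math.ceil(bits / hashBits);
}

// Every block runs the full number of iterations, so the work done by the
// legitimate user grows with the number of blocks.
function totalIterations(bits, hashBits, iterations) {
  return pbkdf2BlockCount(bits, hashBits) * iterations;
}
```

<p>Deriving 512 bits with SHA-1 takes four blocks and therefore four times the
work for us, while someone checking password guesses can stop after the first
160 bits.</p>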

<h2>Do not hard-code parameters</h2>

<p>Hard-coding PBKDF2 parameters - the name of the hash function to use in the
HMAC construction, and the number of HMAC iterations - is tempting at first.
We need to stay flexible, however, in case it turns out that SHA-1 can no
longer be considered a secure PRF, or we need to increase the number of
iterations to keep up with faster hardware.</p>

<p>To ensure future code can verify old passwords we store the parameters that
were passed to PBKDF2 at the time, including the salt. When verifying the
passcode we will read the hash function name, the number of iterations, and the
salt from disk and pass those to <code>deriveBits()</code> along with the passcode itself.
The number of bits to derive will be the hash function&rsquo;s output size.</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">deriveBits</span><span class="p">(</span><span class="nx">code</span><span class="p">,</span> <span class="nx">salt</span><span class="p">,</span> <span class="nx">hash</span><span class="p">,</span> <span class="nx">iterations</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Convert string to a TypedArray.</span>
  <span class="kd">let</span> <span class="nx">bytes</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">TextEncoder</span><span class="p">(</span><span class="s2">&quot;utf-8&quot;</span><span class="p">).</span><span class="nx">encode</span><span class="p">(</span><span class="nx">code</span><span class="p">);</span>

  <span class="c1">// Create the base key to derive from.</span>
  <span class="kd">let</span> <span class="nx">importedKey</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">importKey</span><span class="p">(</span>
    <span class="s2">&quot;raw&quot;</span><span class="p">,</span> <span class="nx">bytes</span><span class="p">,</span> <span class="s2">&quot;PBKDF2&quot;</span><span class="p">,</span> <span class="kc">false</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;deriveBits&quot;</span><span class="p">]);</span>

  <span class="k">return</span> <span class="nx">importedKey</span><span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="nx">key</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="c1">// Output length in bits for the given hash function.</span>
    <span class="kd">let</span> <span class="nx">hlen</span> <span class="o">=</span> <span class="nx">getHashOutputLength</span><span class="p">(</span><span class="nx">hash</span><span class="p">);</span>

    <span class="c1">// All required PBKDF2 parameters.</span>
    <span class="kd">let</span> <span class="nx">params</span> <span class="o">=</span> <span class="p">{</span><span class="nx">name</span><span class="o">:</span> <span class="s2">&quot;PBKDF2&quot;</span><span class="p">,</span> <span class="nx">hash</span><span class="p">,</span> <span class="nx">salt</span><span class="p">,</span> <span class="nx">iterations</span><span class="p">};</span>

    <span class="c1">// Derive |hlen| bits using PBKDF2.</span>
    <span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">deriveBits</span><span class="p">(</span><span class="nx">params</span><span class="p">,</span> <span class="nx">key</span><span class="p">,</span> <span class="nx">hlen</span><span class="p">);</span>
  <span class="p">});</span>
<span class="p">}</span>
</pre></div></figure>


<h2>Storing a new passcode</h2>

<p>Now that we are done implementing <code>deriveBits()</code>, the heart of the Passcode
module, completing the API is basically a walk in the park. For the sake of
simplicity we will use <a href="https://mozilla.github.io/localForage/">localforage</a>
as the storage backend. It provides a simple, asynchronous, and Promise-based
key-value store.</p>

<figure class='code'><div class="highlight"><pre><span class="c1">// &lt;script src=&quot;localforage.min.js&quot;/&gt;</span>

<span class="kr">const</span> <span class="nx">HASH</span> <span class="o">=</span> <span class="s2">&quot;SHA-1&quot;</span><span class="p">;</span>
<span class="kr">const</span> <span class="nx">ITERATIONS</span> <span class="o">=</span> <span class="mi">4096</span><span class="p">;</span>

<span class="nx">Passcode</span><span class="p">.</span><span class="nx">store</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">code</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Generate a new random salt for every new passcode.</span>
  <span class="kd">let</span> <span class="nx">salt</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">getRandomValues</span><span class="p">(</span><span class="k">new</span> <span class="nx">Uint8Array</span><span class="p">(</span><span class="mi">8</span><span class="p">));</span>

  <span class="k">return</span> <span class="nx">deriveBits</span><span class="p">(</span><span class="nx">code</span><span class="p">,</span> <span class="nx">salt</span><span class="p">,</span> <span class="nx">HASH</span><span class="p">,</span> <span class="nx">ITERATIONS</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="nx">bits</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">return</span> <span class="nx">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">([</span>
      <span class="nx">localforage</span><span class="p">.</span><span class="nx">setItem</span><span class="p">(</span><span class="s2">&quot;digest&quot;</span><span class="p">,</span> <span class="nx">bits</span><span class="p">),</span>
      <span class="nx">localforage</span><span class="p">.</span><span class="nx">setItem</span><span class="p">(</span><span class="s2">&quot;salt&quot;</span><span class="p">,</span> <span class="nx">salt</span><span class="p">),</span>
      <span class="nx">localforage</span><span class="p">.</span><span class="nx">setItem</span><span class="p">(</span><span class="s2">&quot;hash&quot;</span><span class="p">,</span> <span class="nx">HASH</span><span class="p">),</span>
      <span class="nx">localforage</span><span class="p">.</span><span class="nx">setItem</span><span class="p">(</span><span class="s2">&quot;iterations&quot;</span><span class="p">,</span> <span class="nx">ITERATIONS</span><span class="p">)</span>
    <span class="p">]);</span>
  <span class="p">});</span>
<span class="p">};</span>
</pre></div></figure>


<p>We generate a new random salt for every new passcode. The derived bits are
stored along with the salt, the hash function name, and the number of
iterations. <code>HASH</code> and <code>ITERATIONS</code> are constants that provide default values
for our PBKDF2 parameters and can be updated whenever desired. The Promise
returned by <code>Passcode.store()</code> will resolve when all values have been
successfully stored in the backend.</p>

<h2>Verifying a given passcode</h2>

<p>To verify a passcode all values and parameters stored by <code>Passcode.store()</code>
will have to be read from disk and passed to <code>deriveBits()</code>. Comparing the
derived bits with the value stored on disk tells us whether the passcode is valid.</p>

<figure class='code'><div class="highlight"><pre><span class="nx">Passcode</span><span class="p">.</span><span class="nx">verify</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">code</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">let</span> <span class="nx">loadValues</span> <span class="o">=</span> <span class="nx">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">([</span>
    <span class="nx">localforage</span><span class="p">.</span><span class="nx">getItem</span><span class="p">(</span><span class="s2">&quot;digest&quot;</span><span class="p">),</span>
    <span class="nx">localforage</span><span class="p">.</span><span class="nx">getItem</span><span class="p">(</span><span class="s2">&quot;salt&quot;</span><span class="p">),</span>
    <span class="nx">localforage</span><span class="p">.</span><span class="nx">getItem</span><span class="p">(</span><span class="s2">&quot;hash&quot;</span><span class="p">),</span>
    <span class="nx">localforage</span><span class="p">.</span><span class="nx">getItem</span><span class="p">(</span><span class="s2">&quot;iterations&quot;</span><span class="p">)</span>
  <span class="p">]);</span>

  <span class="k">return</span> <span class="nx">loadValues</span><span class="p">.</span><span class="nx">then</span><span class="p">(([</span><span class="nx">digest</span><span class="p">,</span> <span class="nx">salt</span><span class="p">,</span> <span class="nx">hash</span><span class="p">,</span> <span class="nx">iterations</span><span class="p">])</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="k">return</span> <span class="nx">deriveBits</span><span class="p">(</span><span class="nx">code</span><span class="p">,</span> <span class="nx">salt</span><span class="p">,</span> <span class="nx">hash</span><span class="p">,</span> <span class="nx">iterations</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="nx">bits</span> <span class="o">=&gt;</span> <span class="p">{</span>
      <span class="k">return</span> <span class="nx">compare</span><span class="p">(</span><span class="nx">bits</span><span class="p">,</span> <span class="nx">digest</span><span class="p">);</span>
    <span class="p">});</span>
  <span class="p">});</span>
<span class="p">};</span>
</pre></div></figure>


<h3>Should compare() be a constant-time operation?</h3>

<p><code>compare()</code> does not <em>have</em> to be constant-time. Even if the attacker learns
the first byte of the final digest stored on disk she cannot easily produce
inputs to guess the second byte - the opposite would imply knowing the
pre-images of all those two-byte values. She cannot do better than submitting
simple guesses that become harder the more bytes are known. For a successful
attack all bytes have to be recovered, which in turn means a valid pre-image
for the full final digest needs to be found.</p>

<p>If it makes you feel any better, you can of course implement <code>compare()</code> as a
constant-time operation. This might be tricky though given that all modern
JavaScript engines optimize code heavily.</p>
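
<p>For illustration, here is one possible shape for <code>compare()</code>. It is a
sketch under the assumption that both arguments are <code>ArrayBuffer</code>s, as returned
by <code>crypto.subtle.deriveBits()</code>. It avoids an early return on the first
mismatching byte, but it cannot promise constant-time behavior once a
JavaScript engine has optimized it.</p>

```javascript
// Sketch of compare(): returns true if both buffers hold the same bytes.
// Assumes |a| and |b| are ArrayBuffers.
function compare(a, b) {
  const x = new Uint8Array(a);
  const y = new Uint8Array(b);

  // Digest lengths are not secret, so bailing out early here is fine.
  if (x.length !== y.length) {
    return false;
  }

  // Accumulate the differences of all byte pairs instead of returning at
  // the first mismatch.
  let diff = 0;
  for (let i = 0; i < x.length; i++) {
    diff |= x[i] ^ y[i];
  }

  return diff === 0;
}
```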

<h2>What about bcrypt or scrypt?</h2>

<p>Both <a href="https://en.wikipedia.org/wiki/Bcrypt">bcrypt</a> and
<a href="https://en.wikipedia.org/wiki/Scrypt">scrypt</a> are probably better alternatives
to PBKDF2. Bcrypt automatically embeds the salt and cost factor into its output, and
most APIs are clever enough to parse and use those parameters when verifying a
given password.</p>

<p>Scrypt implementations can usually securely generate a random salt, so that is one
less thing for you to care about. The most important aspect of scrypt though is
that it allows consuming a lot of memory when computing the password hash, which
makes cracking passwords using ASICs or FPGAs considerably more expensive.</p>

<p>The Web Cryptography API unfortunately supports neither of the two
algorithms and there are currently no proposals to add them. In the case of
scrypt it might also be somewhat controversial to allow a website to consume
arbitrary amounts of memory.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Botching Forward Secrecy]]></title>
    <link href="https://timtaubert.de/blog/2014/11/the-sad-state-of-server-side-tls-session-resumption-implementations/"/>
    <updated>2014-11-17T18:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2014/11/the-sad-state-of-server-side-tls-session-resumption-implementations</id>
    <content type="html"><![CDATA[<blockquote><p><em>After you have finished reading this one, please also read the
<a href="https://timtaubert.de/blog/2017/02/the-future-of-session-resumption/">follow-up post</a>
that covers session resumption changes in TLS 1.3.</em></p></blockquote>

<p>Probably the oldest complaint about TLS is that its handshake is slow and
that, together with the transport encryption, it has a lot of CPU overhead. This
certainly <a href="https://istlsfastyet.com/">is not true anymore</a> if TLS is configured
correctly.</p>

<p>One of the most important features to improve user experience for visitors
accessing your site via TLS is session resumption.
<a href="http://vincent.bernat.im/en/blog/2011-ssl-session-reuse-rfc5077.html">Session resumption</a>
is the general idea of avoiding a full TLS handshake by storing the secret
information of previous sessions and reusing it when connecting to a host
the next time. This drastically reduces latency and CPU usage.</p>

<p>Enabling session resumption in web servers and proxies can however easily
<a href="https://media.blackhat.com/us-13/US-13-Daigniere-TLS-Secrets-WP.pdf">compromise forward secrecy</a>.
To find out why having a de facto standard TLS library (i.e. OpenSSL) can be a
bad thing and how to avoid
<a href="https://www.imperialviolet.org/2013/06/27/botchingpfs.html">botching PFS</a>
let us take a closer look at forward secrecy, and the current state of
server-side implementation of session resumption features.</p>

<h2>What is (Perfect) Forward Secrecy?</h2>

<p><a href="https://en.wikipedia.org/wiki/Perfect_forward_secrecy">(Perfect) Forward Secrecy</a>
is an important part of modern TLS setups. The core of it is to use ephemeral
(short-lived) keys for key exchange so that an attacker gaining access to a
server cannot use any of the keys found there to decrypt past TLS sessions they
may have recorded previously.</p>

<p>We must not use a server&rsquo;s RSA key pair, whose public key is contained in the
certificate, for key exchanges if we want PFS. This key pair is long-lived and
will most likely outlive certificate expiration dates as you would just use the
same key pair to generate a new certificate after the current one expires. In case
the server is compromised it would be far too easy to determine the location of
the private key on disk or in memory and use it to decrypt recorded TLS
sessions from the past.</p>

<p>Using <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange">Diffie-Hellman</a>
key exchanges, where key generation is <em>a lot</em> cheaper, we can use a key pair
exactly once and discard it afterwards. An attacker with access to the server
can still compromise the authentication part as shown above and MITM
everything from here on using the certificate&rsquo;s private key, but past TLS
sessions stay protected.</p>

<h2>How can Session Resumption botch PFS?</h2>

<p>TLS provides two session resumption features: Session IDs and Session Tickets.
To better understand how those can be attacked it is worth looking at them in
more detail.</p>

<h3>Session IDs</h3>

<p>In a full handshake the server sends a <em>Session ID</em> as part of the &ldquo;hello&rdquo;
message. On a subsequent connection the client can use this session ID and
pass it to the server when connecting. Because both server and client have
saved the last session&rsquo;s &ldquo;secret state&rdquo; under the session ID they can simply
resume the TLS session where they left off.</p>

<p>To support session resumption via session IDs the server must maintain a cache
that maps past session IDs to those sessions&#8217; secret states. The cache itself
is the main weak spot: stealing the cache contents allows an attacker to decrypt
all sessions whose session IDs are contained in it.</p>

<p>The forward secrecy of a connection is thus bounded by how long the session
information is retained on the server. Ideally, your server would use a
medium-sized cache that is purged daily. Purging your cache might however not
help if the cache itself lives on persistent storage, as it might be feasible
to restore deleted data from it. In-memory storage should be more resistant
to these kinds of attacks if it turns over about once a day and ensures old data
is overwritten properly.</p>

<h3>Session Tickets</h3>

<p>The second mechanism to resume a TLS session is
<a href="http://tools.ietf.org/html/rfc5077">Session Tickets</a>. This extension transmits
the server&rsquo;s secret state to the client, encrypted with a key only known to the
server. That ticket key is protecting the TLS connection now and in the future
and is the weak spot an attacker will target.</p>

<p>The client will store its secret information for a TLS session along with the
ticket received from the server. By transmitting that ticket back to the server
at the beginning of the next TLS connection both parties can resume their
previous session, given that the server can still access the secret key that
was used to encrypt the ticket.</p>

<p>We ideally want the same secrecy bounds for Session Tickets as for Session IDs.
To achieve this we need to ensure that the key used to encrypt tickets is
rotated about once a day. Just like the session cache, it should not live on
persistent storage so that it does not leave any trace.</p>
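
<p>The rotation policy can be sketched in a few lines. This is a conceptual
illustration only and not how OpenSSL or any of the servers discussed here
manage ticket keys: <code>TicketKeyRing</code>, its constructor arguments, and the 48-byte
key size are all assumptions made for the example.</p>

```javascript
// Conceptual sketch: keeps one in-memory ticket key and replaces it once it
// is older than |maxAgeMs|. |generateKey| must return a fresh Uint8Array,
// |now| returns the current time in milliseconds (injectable for testing).
class TicketKeyRing {
  constructor(generateKey, maxAgeMs, now = Date.now) {
    this.generateKey = generateKey;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
    this.current = {key: generateKey(), created: now()};
  }

  // Returns the key to encrypt new tickets with, rotating first if needed.
  currentKey() {
    if (this.now() - this.current.created >= this.maxAgeMs) {
      // Overwrite the old key material before dropping the reference.
      this.current.key.fill(0);
      this.current = {key: this.generateKey(), created: this.now()};
    }
    return this.current.key;
  }
}

// Example key generator for browsers; the key size is an arbitrary choice.
const randomTicketKey = () => crypto.getRandomValues(new Uint8Array(48));

// Usage: rotate about once a day.
// const ring = new TicketKeyRing(randomTicketKey, 24 * 60 * 60 * 1000);
```

<p>With this policy, tickets encrypted under the old key can no longer be
resumed after a rotation and those clients fall back to a full handshake.</p>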

<h2>Apache configuration</h2>

<p>Now that we have determined how we ideally want session resumption features to be
configured we should take a look at popular web servers and load balancers to
see whether that is supported, starting with Apache.</p>

<h3>Configuring the Session Cache</h3>

<p>The Apache HTTP Server offers the
<a href="http://httpd.apache.org/docs/trunk/mod/mod_ssl.html#sslsessioncache">SSLSessionCache directive</a>
to configure the cache that contains the session IDs of previous TLS sessions
along with their secret state. You should use <code>shmcb</code> as the storage type, which is
a high-performance cyclic buffer inside a shared memory segment in RAM. It will
be shared between all threads or processes and allow session resumption no
matter which of those handles the visitor&rsquo;s request.</p>

<figure class='code'><div class="highlight"><pre>SSLSessionCache shmcb:/path/to/ssl_gcache_data(512000)
</pre></div></figure>


<p>The example shown above establishes an in-memory cache via the path
<code>/path/to/ssl_gcache_data</code> with a size of 512,000 bytes (roughly 500 KiB).
Depending on the number of daily visitors the cache size might be too small
(i.e. have a high turnover rate) or too big (i.e. have a low turnover rate).</p>

<p>We ideally want a cache that turns over daily and there is no really good way
to determine the right session cache size. What we really need is a way to tell
Apache the maximum time an entry is allowed to stay in the cache before it gets
overwritten. This must happen regardless of whether the cyclic buffer has
actually cycled around yet and must be a periodic background job to ensure the
cache is purged even when there have not been any requests in a while.</p>
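
<p>The behavior asked for here can be sketched as a small data structure. This
is not Apache code and does not reflect how <code>shmcb</code> works internally;
<code>SessionCache</code> and its methods are hypothetical names used only to illustrate a
cache with a maximum entry age and a purge step that does not depend on
incoming requests.</p>

```javascript
// Conceptual sketch of a session cache with a maximum entry age.
// |now| returns the current time in milliseconds (injectable for testing).
class SessionCache {
  constructor(maxAgeMs, now = Date.now) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
    // Maps session ID -> {secret: Uint8Array, stored: timestamp}.
    this.entries = new Map();
  }

  set(id, secret) {
    this.entries.set(id, {secret, stored: this.now()});
  }

  // Expired entries are never handed out, even if not purged yet.
  get(id) {
    const entry = this.entries.get(id);
    if (!entry || this.now() - entry.stored >= this.maxAgeMs) {
      return undefined;
    }
    return entry.secret;
  }

  // Meant to run periodically, whether or not any requests come in.
  purge() {
    for (const [id, entry] of this.entries) {
      if (this.now() - entry.stored >= this.maxAgeMs) {
        entry.secret.fill(0);  // overwrite the secret state in memory
        this.entries.delete(id);
      }
    }
  }
}

// Usage: keep entries for at most a day, check once a minute.
// const cache = new SessionCache(24 * 60 * 60 * 1000);
// setInterval(() => cache.purge(), 60 * 1000);
```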

<blockquote><p>You might wonder whether the <code>SSLSessionCacheTimeout</code> directive can be of any
help here - unfortunately no. The timeout is only checked when a session ID
is given at the start of a TLS connection. It does not cause entries to be
purged from the session cache.</p></blockquote>

<h3>Configuring Session Tickets</h3>

<p>While Apache offers the
<a href="http://httpd.apache.org/docs/trunk/mod/mod_ssl.html#sslsessionticketkeyfile">SSLSessionTicketKeyFile directive</a>
to specify a key file that should contain 48 random bytes, it is recommended to
not specify one at all. Apache will simply generate a random key on startup and
use that to encrypt session tickets for as long as it is running.</p>

<p>The good thing about this is that the session ticket key will not touch
persistent storage; the bad thing is that it will never be rotated. Generated
once on startup it is only discarded when Apache restarts. For most of the
servers out there that means they use the same key for months, if not years.</p>

<p>To provide forward secrecy we need to rotate the session ticket key about daily
and current Apache versions provide no way of doing that. The only way to
achieve that might be to use a cron job to
<a href="http://mail-archives.apache.org/mod_mbox/httpd-dev/201309.mbox/%3C522339E0.2040005@opensslfoundation.com%3E">gracefully restart Apache daily</a>
to ensure a new key is generated. That does not sound like a real solution
though and nothing ensures the old key is properly overwritten.</p>

<p>Changing the key file while Apache is running does not do it either; you would
still need to gracefully restart the service to apply the new key. And do not
forget that if you use a key file it should be stored on a temporary file
system like <code>tmpfs</code>.</p>

<h3>Disabling Session Tickets</h3>

<p>Although disabling session tickets will undoubtedly have a negative performance
impact, for the time being you will need to do that in order to provide
forward secrecy:</p>

<figure class='code'><div class="highlight"><pre>SSLOpenSSLConfCmd Options -SessionTicket
</pre></div></figure>


<blockquote><p><a href="https://www.reddit.com/r/netsec/comments/2mkupe/the_sad_state_of_serverside_tls_session/">Ivan Ristic adds</a>
that to disable session tickets for Apache using <code>SSLOpenSSLConfCmd</code>, you have
to be running OpenSSL 1.0.2 which has not been released yet. If you want to
disable session tickets with earlier OpenSSL versions, Ivan
<a href="https://github.com/ivanr/bulletproof-tls/tree/master/apache">has a few patches</a>
for the Apache 2.2.x and Apache 2.4.x branches.</p></blockquote>

<p>To securely support session resumption via tickets Apache should provide a
configuration directive to specify the maximum lifetime for session ticket
keys, at least if auto-generated on startup. That would allow us to simply
generate a new random key and overwrite the old one daily.</p>

<h2>Nginx configuration</h2>

<p>Another very popular web server is Nginx. Let us see how that compares to
Apache when it comes to setting up session resumption.</p>

<h3>Configuring the Session Cache</h3>

<p>Nginx offers the <a href="http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_session_cache">ssl_session_cache directive</a>
to configure the TLS session cache. The type of the cache should be <code>shared</code> to
share it between multiple workers:</p>

<figure class='code'><div class="highlight"><pre>ssl_session_cache shared:SSL:10m;
</pre></div></figure>


<p>The above line establishes an in-memory cache with a size of 10 MB. We again
have no real idea whether 10 MB is the right size for the cache to turn over
daily. Just like Apache, Nginx should provide a configuration directive to allow
cache entries to be purged automatically after a certain time. Any entries not
purged properly could simply be read from memory by an attacker with full
access to the server.</p>

<blockquote><p>You guessed right, the <code>ssl_session_timeout</code> directive again only applies
when trying to resume a session at the beginning of a connection. Stale
entries will not be removed automatically after they time out.</p></blockquote>

<h3>Configuring Session Tickets</h3>

<p>Nginx allows specifying a session ticket key file using the
<a href="http://nginx.org/en/docs/http/ngx_http_ssl_module.html#ssl_session_ticket_key">ssl_session_ticket_key directive</a>,
and again you are probably better off by not specifying one and having the
service generate a random key on startup. The session ticket key will never be
rotated and might be used to encrypt session tickets for months, if not years.</p>

<p>Nginx, too, provides no way to automatically rotate keys. Reloading its
configuration daily using a cron job <a href="http://forum.nginx.org/read.php?2,229538,230872#msg-230872">might work</a>
but does not come close to a real solution either.</p>

<h3>Disabling Session Tickets</h3>

<p>The best you can do to provide forward secrecy to visitors is thus again to switch
off session ticket support until a proper solution is available.</p>

<figure class='code'><div class="highlight"><pre>ssl_session_tickets off;
</pre></div></figure>


<h2>HAProxy configuration</h2>

<p>HAProxy, a popular load balancer, suffers from basically the same problems as
Apache and Nginx. All of them rely on OpenSSL&rsquo;s TLS implementation.</p>

<h3>Configuring the Session Cache</h3>

<p>The size of the session cache can be set using the
<a href="http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#3.2-tune.ssl.cachesize">tune.ssl.cachesize directive</a>
that accepts a number of &ldquo;blocks&rdquo;. The HAProxy documentation tries to be helpful
and explains how many blocks would be needed per stored session but we again
cannot ensure an at least daily turnover. We would need a directive to
automatically purge entries just as for Apache and Nginx.</p>

<blockquote><p>And yes, the <code>tune.ssl.lifetime</code> directive does not affect how long entries
are persisted in the cache.</p></blockquote>

<h3>Configuring Session Tickets</h3>

<p>HAProxy does not allow configuring session ticket parameters. It implicitly
supports this feature because OpenSSL enables it by default. HAProxy will thus
always generate a session ticket key on startup and use it to encrypt tickets
for the whole lifetime of the process.</p>

<p>A graceful daily restart of HAProxy <em>might</em> be the only way to trigger key
rotation. This is a <em>pure assumption</em> though, please do your own testing before
using that in production.</p>

<h3>Disabling Session Tickets</h3>

<p>You can disable session ticket support in HAProxy using the
<a href="http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#no-tls-tickets">no-tls-tickets directive</a>:</p>

<figure class='code'><div class="highlight"><pre>ssl-default-bind-options no-sslv3 no-tls-tickets
</pre></div></figure>


<blockquote><p>A previous version of the post said it would be impossible to deactivate
session tickets. Thanks to the HAProxy team for correcting me!</p></blockquote>

<h2>Session Resumption with multiple servers</h2>

<p>If you have multiple web servers that act as front-ends for a fleet of back-end
servers you will unfortunately not get away with not specifying a session ticket
key file and a dirty hack that reloads the service configuration at midnight.</p>

<p>Sharing a session cache between multiple machines using memcached is possible,
but with session tickets you &ldquo;only&rdquo; have to share one or more session ticket
keys, not the whole cache. Clients would take care of storing and discarding
tickets for you.</p>

<p><a href="https://blog.twitter.com/2013/forward-secrecy-at-twitter">Twitter wrote a great post</a>
about how they manage multiple web front-ends and distribute session ticket
keys securely to each of their machines. I suggest reading that if you are
planning to have a similar setup and support session tickets to improve
response times.</p>

<p>Keep in mind though that Twitter had to write their own web server to handle
forward secrecy in combination with session tickets properly and this might not
be something you want to do yourself.</p>

<p>It would be great if either OpenSSL or all of the popular web servers and load
balancers would start working towards helping to provide forward secrecy by
default and server admins could get rid of custom front-ends or dirty hacks
to rotate keys.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Generating .onion Names for Tor Hidden Services]]></title>
    <link href="https://timtaubert.de/blog/2014/11/using-the-webcrypto-api-to-generate-onion-names-for-tor-hidden-services/"/>
    <updated>2014-11-02T16:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2014/11/using-the-webcrypto-api-to-generate-onion-names-for-tor-hidden-services</id>
    <content type="html"><![CDATA[<p>You have probably read that
<a href="https://www.facebook.com/notes/protect-the-graph/making-connections-to-facebook-more-secure/1526085754298237">Facebook unveiled its hidden service</a>
that lets users access their website more safely via Tor. While there are lots
of opinions about whether this is good or bad I think that
the Tor project described best <a href="https://blog.torproject.org/blog/facebook-hidden-services-and-https-certs">why that is not as crazy as it seems</a>.</p>

<p>The most interesting part to me however is that
<a href="https://lists.torproject.org/pipermail/tor-talk/2014-October/035412.html">Facebook brute-forced a custom hidden service address</a>
as it never occurred to me that this is something you might want to do. Again
ignoring the pros and cons of doing that, investigating the <em>how</em> seems like a
fun exercise to get more familiar with the
<a href="http://dvcs.w3.org/hg/webcrypto-api/raw-file/tip/spec/Overview.html">WebCrypto API</a>
if that is still unknown territory to you.</p>

<h2>How are .onion names created?</h2>

<p><a href="https://trac.torproject.org/projects/tor/wiki/doc/HiddenServiceNames">Names for Tor hidden services</a>
are meant to be self-authenticating. When creating a hidden service Tor
generates a new 1024 bit <a href="https://en.wikipedia.org/wiki/RSA_%28cryptosystem%29">RSA</a>
key pair and then computes the <a href="https://en.wikipedia.org/wiki/SHA-1">SHA-1</a>
digest of the public key. The .onion name will be the
<a href="http://en.wikipedia.org/wiki/Base32">Base32</a>-encoded first half of that digest.</p>

<p>By using a hash of the public key as the URL to contact a hidden service you
can easily authenticate it and bypass the existing CA structure. This 80 bit
URL is sufficient to prevent collisions: even with
a <a href="http://en.wikipedia.org/wiki/Birthday_attack">birthday attack</a> (and thus an
entropy of 40 bit) you can only find a <em>random</em> collision but not the key pair
matching a specific .onion name.</p>

<h2>Creating custom .onion names</h2>

<p>So how did Facebook manage to come up with a public key resulting in
<code>facebookcorewwwi.onion</code>? The answer is that they were incredibly lucky.</p>

<p>You can brute-force .onion names matching a specific pattern using tools like
<a href="https://github.com/katmagic/Shallot">Shallot</a> or
<a href="https://github.com/lachesis/scallion">Scallion</a>. Those will generate key pairs
until they find one resulting in a matching URL. That is usably fast for 1-5
characters. Finding a 6-character pattern takes on average 30 minutes and for
just 7 characters you might need to let it run for a full day.</p>
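
<p>The growth in search time follows from the Base32 alphabet: every additional
character multiplies the expected number of key pairs to try by 32. Here is a
rough sketch of that arithmetic, assuming the digest behaves like a uniformly
random value and the pattern has to match at the start of the name.
<em>expectedAttempts()</em> is a name made up for this illustration:</p>

```javascript
// Expected number of key pairs to generate before one matches a prefix
// of the given length. Each Base32 character encodes 5 bits, so there
// are 32 equally likely values per position (assuming a uniform hash).
function expectedAttempts(prefixLength) {
  return Math.pow(32, prefixLength);
}

// Going from a 6- to a 7-character prefix multiplies the work by 32.
var factor = expectedAttempts(7) / expectedAttempts(6);
console.log(factor);
```

<p>On the same hardware, 30 minutes for six characters would scale to roughly
16 hours for seven and about three weeks for eight, which is in line with the
timings mentioned above.</p>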

<p>Coming up with an .onion name <em>starting with</em> an 8-character pattern like
<code>facebook</code> would thus take even longer or need a lot more resources. As a
<a href="https://lists.torproject.org/pipermail/tor-talk/2014-October/035413.html">Facebook engineer confirmed</a>
they indeed got extremely lucky: they generated a few keys matching the pattern,
picked the best and then just needed to come up with an explanation for the
<code>corewwwi</code> part to let users memorize it better.</p>

<p>Without taking a closer look at &ldquo;Shallot&rdquo; or &ldquo;Scallion&rdquo; let us go with a naive
approach. We do not <em>need</em> to create another tool to find .onion names in the
browser (the existing ones work great) but it is a good opportunity to again
show what you can do with the WebCrypto API in the browser.</p>

<h2>Generating a random .onion name</h2>

<p>To generate a random name for a Tor hidden service we first need to generate
a new 1024 bit RSA key just as Tor would do:</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">generateRSAKey</span><span class="p">()</span> <span class="p">{</span>
  <span class="kd">var</span> <span class="nx">alg</span> <span class="o">=</span> <span class="p">{</span>
    <span class="c1">// This could be any supported RSA* algorithm.</span>
    <span class="nx">name</span><span class="o">:</span> <span class="s2">&quot;RSASSA-PKCS1-v1_5&quot;</span><span class="p">,</span>
    <span class="c1">// We won&#39;t actually use the hash function.</span>
    <span class="nx">hash</span><span class="o">:</span> <span class="p">{</span><span class="nx">name</span><span class="o">:</span> <span class="s2">&quot;SHA-1&quot;</span><span class="p">},</span>
    <span class="c1">// Tor hidden services use 1024 bit keys.</span>
    <span class="nx">modulusLength</span><span class="o">:</span> <span class="mi">1024</span><span class="p">,</span>
    <span class="c1">// We will use a fixed public exponent for now.</span>
    <span class="nx">publicExponent</span><span class="o">:</span> <span class="k">new</span> <span class="nx">Uint8Array</span><span class="p">([</span><span class="mh">0x03</span><span class="p">])</span>
  <span class="p">};</span>

  <span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">generateKey</span><span class="p">(</span><span class="nx">alg</span><span class="p">,</span> <span class="kc">true</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;sign&quot;</span><span class="p">,</span> <span class="s2">&quot;verify&quot;</span><span class="p">]);</span>
<span class="p">}</span>
</pre></div></figure>


<p><em>generateKey()</em> returns a Promise that resolves to the new key pair. The second
argument specifies that we want the key to be exportable as we need to do that
in order to check for pattern matches. We will not actually use the key to
<em>sign</em> or <em>verify</em> data but we need to specify valid usages for the public and
private keys.</p>

<p>To check whether a generated public key matches a specific pattern we of course
have to compute the hash for the .onion URL:</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">computeOnionHash</span><span class="p">(</span><span class="nx">publicKey</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Export the DER encoding of the SubjectPublicKeyInfo structure.</span>
  <span class="kd">var</span> <span class="nx">promise</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">exportKey</span><span class="p">(</span><span class="s2">&quot;spki&quot;</span><span class="p">,</span> <span class="nx">publicKey</span><span class="p">);</span>

  <span class="nx">promise</span> <span class="o">=</span> <span class="nx">promise</span><span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">spki</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Compute the SHA-1 digest of the SPKI.</span>
    <span class="c1">// Skip 22 bytes (the SPKI header) that are ignored by Tor.</span>
    <span class="k">return</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">digest</span><span class="p">({</span><span class="nx">name</span><span class="o">:</span> <span class="s2">&quot;SHA-1&quot;</span><span class="p">},</span> <span class="nx">spki</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mi">22</span><span class="p">));</span>
  <span class="p">});</span>

  <span class="k">return</span> <span class="nx">promise</span><span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">digest</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Base32-encode the first half of the digest.</span>
    <span class="k">return</span> <span class="nx">base32</span><span class="p">(</span><span class="nx">digest</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">));</span>
  <span class="p">});</span>
<span class="p">}</span>
</pre></div></figure>


<p>We first use <em>exportKey()</em> to get an <a href="https://tools.ietf.org/html/rfc5280">SPKI</a>
representation of the public key, use <em>digest()</em> to compute the SHA-1 digest
of that, and finally pass it to <em>base32()</em> to Base32-encode the first half of
that digest.</p>

<blockquote><p>Note: <em>base32()</em> is an <a href="https://tools.ietf.org/html/rfc3548">RFC 3548</a>
compliant Base32 implementation. <a href="https://github.com/chrisumbel/thirty-two">chrisumbel/thirty-two</a>
is a good one that unfortunately does not support ArrayBuffers, so I will use a
slightly adapted version of it in the example code.</p></blockquote>
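
<p>If you would rather not pull in a library, a minimal <em>base32()</em> could look
roughly like the sketch below. It is not the adapted library code from the
example. It is a small stand-in that assumes a lowercase alphabet (as used for
.onion names) and omits padding, which is not needed for a 10-byte input:</p>

```javascript
// Minimal Base32 encoder sketch: lowercase alphabet, no "=" padding.
// Accepts an ArrayBuffer or a typed array.
function base32(buffer) {
  var alphabet = "abcdefghijklmnopqrstuvwxyz234567";
  var bytes = new Uint8Array(buffer);
  var bits = 0;   // number of bits currently held in `value`
  var value = 0;  // bit accumulator
  var output = "";

  for (var i = 0; i < bytes.length; i++) {
    value = (value << 8) | bytes[i];
    bits += 8;

    // Emit one character for every complete group of 5 bits.
    while (bits >= 5) {
      output += alphabet[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }

  // Left-align any leftover bits in a final character.
  if (bits > 0) {
    output += alphabet[(value << (5 - bits)) & 31];
  }

  return output;
}
```

<p>Ten bytes are exactly 80 bits, so the first half of a SHA-1 digest always
encodes to 16 characters without any leftover bits.</p>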

<h2>Finding a specific .onion name</h2>

<p>The only thing missing now is a function that checks for pattern matches and
loops until we find one:</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">findOnionName</span><span class="p">(</span><span class="nx">pattern</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">var</span> <span class="nx">key</span><span class="p">;</span>

  <span class="c1">// Start by generating a random key pair.</span>
  <span class="kd">var</span> <span class="nx">promise</span> <span class="o">=</span> <span class="nx">generateRSAKey</span><span class="p">().</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">pair</span><span class="p">)</span> <span class="p">{</span>
    <span class="nx">key</span> <span class="o">=</span> <span class="nx">pair</span><span class="p">.</span><span class="nx">privateKey</span><span class="p">;</span>

    <span class="c1">// Generate the .onion hash of the public key.</span>
    <span class="k">return</span> <span class="nx">computeOnionHash</span><span class="p">(</span><span class="nx">pair</span><span class="p">.</span><span class="nx">publicKey</span><span class="p">);</span>
  <span class="p">});</span>

  <span class="k">return</span> <span class="nx">promise</span><span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">hash</span><span class="p">)</span> <span class="p">{</span>
    <span class="c1">// Try again if the pattern doesn&#39;t match.</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="nx">pattern</span><span class="p">.</span><span class="nx">test</span><span class="p">(</span><span class="nx">hash</span><span class="p">))</span> <span class="p">{</span>
      <span class="k">return</span> <span class="nx">findOnionName</span><span class="p">(</span><span class="nx">pattern</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="c1">// Key matches! Export and format it.</span>
    <span class="k">return</span> <span class="nx">formatKey</span><span class="p">(</span><span class="nx">key</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">formatted</span><span class="p">)</span> <span class="p">{</span>
      <span class="k">return</span> <span class="p">{</span><span class="nx">key</span><span class="o">:</span> <span class="nx">formatted</span><span class="p">,</span> <span class="nx">hash</span><span class="o">:</span> <span class="nx">hash</span><span class="p">};</span>
    <span class="p">});</span>
  <span class="p">});</span>
<span class="p">}</span>
</pre></div></figure>


<p>We simply use <em>generateRSAKey()</em> and <em>computeOnionHash()</em> as defined before.
In case of a pattern match we export the
<a href="http://tools.ietf.org/html/rfc5208">PKCS8</a> private key information, encode it
as <a href="https://en.wikipedia.org/wiki/Base64">Base64</a> and format it nicely:</p>

<figure class='code'><div class="highlight"><pre><span class="kd">function</span> <span class="nx">formatKey</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Export the DER-encoded ASN.1 private key information.</span>
  <span class="kd">var</span> <span class="nx">promise</span> <span class="o">=</span> <span class="nx">crypto</span><span class="p">.</span><span class="nx">subtle</span><span class="p">.</span><span class="nx">exportKey</span><span class="p">(</span><span class="s2">&quot;pkcs8&quot;</span><span class="p">,</span> <span class="nx">key</span><span class="p">);</span>

  <span class="k">return</span> <span class="nx">promise</span><span class="p">.</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">pkcs8</span><span class="p">)</span> <span class="p">{</span>
    <span class="kd">var</span> <span class="nx">encoded</span> <span class="o">=</span> <span class="nx">base64</span><span class="p">(</span><span class="nx">pkcs8</span><span class="p">);</span>

    <span class="c1">// Wrap lines after 64 characters.</span>
    <span class="kd">var</span> <span class="nx">formatted</span> <span class="o">=</span> <span class="nx">encoded</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="sr">/.{1,64}/g</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s2">&quot;\n&quot;</span><span class="p">);</span>

    <span class="c1">// Wrap the formatted key in a header and footer.</span>
    <span class="k">return</span> <span class="s2">&quot;-----BEGIN PRIVATE KEY-----\n&quot;</span> <span class="o">+</span> <span class="nx">formatted</span> <span class="o">+</span>
           <span class="s2">&quot;\n-----END PRIVATE KEY-----&quot;</span><span class="p">;</span>
  <span class="p">});</span>
<span class="p">}</span>
</pre></div></figure>


<blockquote><p>Note: <em>base64()</em> refers to an existing Base64 implementation that can deal with
ArrayBuffers. <a href="https://github.com/niklasvh/base64-arraybuffer">niklasvh/base64-arraybuffer</a>
is a good one that I will use in the example code.</p></blockquote>

<p>What is logged to the console can be directly used to replace any random key
that Tor has assigned before. Here is how you would use the code we just wrote:</p>

<figure class='code'><div class="highlight"><pre><span class="nx">findOnionName</span><span class="p">(</span><span class="sr">/ab/</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="kd">function</span> <span class="p">(</span><span class="nx">result</span><span class="p">)</span> <span class="p">{</span>
  <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">result</span><span class="p">.</span><span class="nx">hash</span> <span class="o">+</span> <span class="s2">&quot;.onion&quot;</span><span class="p">,</span> <span class="nx">result</span><span class="p">.</span><span class="nx">key</span><span class="p">);</span>
<span class="p">},</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="p">{</span>
  <span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">&quot;An error occurred, please reload the page.&quot;</span><span class="p">);</span>
<span class="p">});</span>
</pre></div></figure>


<p>The Promise returned by <em>findOnionName()</em> will not resolve until a match has been
found. When generating lots of keys Firefox currently sometimes fails with a
&ldquo;transient error&rdquo; that needs to be investigated. If you want a loop that runs
despite that error you could simply restart the search in the error handler.</p>
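
<p>Restarting in the error handler could be sketched like this.
<em>searchWithRetry()</em> is a hypothetical helper that is not part of the example
code, and it assumes every failure is transient, so it would also retry
forever on a permanent error:</p>

```javascript
// Hypothetical helper: keeps calling search(pattern) until it resolves,
// starting over whenever the returned Promise is rejected.
function searchWithRetry(search, pattern) {
  return search(pattern).then(null, function (err) {
    console.log("Search failed, restarting:", err);
    return searchWithRetry(search, pattern);
  });
}
```

<p>You would then call <code>searchWithRetry(findOnionName, /ab/)</code> instead of calling
<em>findOnionName()</em> directly and handle the result as before.</p>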

<p><a href="https://timtaubert.de/images/onion-console.png" title="The Web Console showing a found .onion name with its key" class="img"><img src="https://timtaubert.de/images/onion-console.png" title="The Web Console showing a found .onion name with its key" ></a></p>

<h2>The code</h2>

<p><a href="https://gist.github.com/ttaubert/389255d724f219f76900">https://gist.github.com/ttaubert/389255d724f219f76900</a></p>

<p>Include it in a minimal web site and have the Web Console open. It will run in
Firefox 33+ and Chrome 37+ with the WebCrypto API explicitly enabled (if
necessary).</p>

<h2>The pitfalls</h2>

<p>As said before, the approach shown above is quite naive and thus very slow. The
easiest optimization to implement might be to spawn multiple web workers and
let them search in parallel.</p>

<p>We could also speed up finding keys by not regenerating the whole RSA key every
loop iteration but instead increasing the public exponent by 2 (starting from 3)
until we find a match and then check whether that produces a valid key pair.
If it does not we can just continue.</p>

<p>Lastly, the current implementation does not perform any safety checks that Tor
might run on the generated key. All of these points would be great reasons for
a follow-up post.</p>

<blockquote><p><strong>Important</strong>: You should use the keys generated with this code to run a
hidden service only if you trust the host that serves it. Getting your keys
off of someone else&rsquo;s web server is a terrible idea. Do not be <em>that</em> guy or gal.</p></blockquote>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[HTTP Public-Key-Pinning Explained]]></title>
    <link href="https://timtaubert.de/blog/2014/10/http-public-key-pinning-explained/"/>
    <updated>2014-10-30T14:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2014/10/http-public-key-pinning-explained</id>
    <content type="html"><![CDATA[<p>In my last post
<a href="https://timtaubert.de/blog/2014/10/deploying-tls-the-hard-way/">&ldquo;Deploying TLS the hard way&rdquo;</a>
I explained how TLS and its extensions (as well as a few HTTP extensions) work
and what to watch out for when enabling TLS for your server. One of the HTTP
extensions mentioned is
<a href="https://tools.ietf.org/html/rfc7469">HTTP Public-Key-Pinning (HPKP)</a>.
As a short reminder, the header looks like this:</p>

<figure class='code'><div class="highlight"><pre>Public-Key-Pins:
  pin-sha256=&quot;GRAH5Ex+kB4cCQi5gMU82urf+6kEgbVtzfCSkw55AGk=&quot;;
  pin-sha256=&quot;lERGk61FITjzyKHcJ89xpc6aDwtRkOPAU0jdnUqzW2s=&quot;;
  max-age=15768000; includeSubDomains
</pre></div></figure>


<p>You can see that it specifies two <em>pin-sha256</em> values, that is the pins of two
public keys. One is the pin of any public key in your current certificate chain
and the other is the pin of any public key <em>not</em> in your current certificate
chain. The latter is a backup in case your certificate expires or has to be
revoked.</p>
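
<p>To make the structure of the header a little more tangible, here is a small
sketch that splits a header value into its directives.
<em>parsePublicKeyPins()</em> is a made-up name and the parser is deliberately
simplistic: it only knows the three directives shown above and ignores
everything else the specification allows:</p>

```javascript
// Simplified parser for a Public-Key-Pins header value (illustration only).
function parsePublicKeyPins(value) {
  var result = {pins: [], maxAge: null, includeSubDomains: false};

  value.split(";").forEach(function (part) {
    var directive = part.trim();
    var match;

    if ((match = /^pin-sha256="([^"]+)"$/i.exec(directive))) {
      // Base64-encoded SHA-256 digest of a public key.
      result.pins.push(match[1]);
    } else if ((match = /^max-age=(\d+)$/i.exec(directive))) {
      // Number of seconds the browser should remember the pins.
      result.maxAge = parseInt(match[1], 10);
    } else if (/^includeSubDomains$/i.test(directive)) {
      result.includeSubDomains = true;
    }
  });

  return result;
}
```

<p>Feeding it the header from above yields two pins, a <em>max-age</em> of 15768000
seconds (roughly half a year), and <em>includeSubDomains</em> set.</p>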

<p>It is definitely not obvious which public keys you should pin and what a good
backup pin would be. Let us answer those questions by starting with a more
detailed overview of how public key pinning and TLS certificates work.</p>

<h2>How are RSA keys represented?</h2>

<p>Let us go back to the beginning and start by taking a closer look at
<a href="https://en.wikipedia.org/wiki/RSA_%28cryptosystem%29">RSA</a> keys:</p>

<figure class='code'><div class="highlight"><pre>$ openssl genrsa 2048
</pre></div></figure>


<p>The above command generates a 2048 bit RSA key and prints it to the console.
Although it says <code>-----BEGIN RSA PRIVATE KEY-----</code> it does not only return the
private key but an
<a href="https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One">ASN.1</a> structure
that also contains the public key - we thus actually generated an RSA key pair.</p>

<p>A common misconception when learning about keys and certificates is that the
RSA key itself for a given certificate expires. RSA keys however never expire -
after all they are just numbers. Only the certificate containing the public key
can expire and only the certificate can be revoked. Keys &ldquo;expire&rdquo; or are
&ldquo;revoked&rdquo; as soon as there are no more valid certificates using the public key,
and you have thrown away the keys and stopped using them altogether.</p>

<h2>What does the certificate contain?</h2>

<p>By submitting the
<a href="https://en.wikipedia.org/wiki/Certificate_signing_request">Certificate Signing Request (CSR)</a>
containing your public key to a Certificate Authority it will issue a valid
certificate. That will again contain the public key of the RSA key pair we
generated above and an expiration date. Both the public key and the expiration
date will be signed by the CA so that modifications of any of the two would
render the certificate invalid immediately.</p>

<p>For simplicity I left out a few other fields that
<a href="https://en.wikipedia.org/wiki/X.509#Structure_of_a_certificate">X.509 certificates</a>
contain to properly authenticate TLS connections, for example your server&rsquo;s
hostname and other details.</p>

<h2>How does public key pinning work?</h2>

<p>The whole purpose of public key pinning is to detect when the public key of a
certificate for a specific host has changed. That may happen when an attacker
compromises a CA such that they are able to issue valid certificates for <em>any</em>
domain. A foreign CA might also just be the attacker, think of state-owned CAs
that you do not want to be able to MITM your site. Any attacker intercepting
a connection from a visitor to your server with a forged certificate can only
be prevented by detecting that the public key has changed.</p>

<p>After establishing a TLS session with the server, the browser will look up any
stored pins for the given hostname and check whether any of those stored pins
match any of the <a href="https://tools.ietf.org/html/rfc7469#section-2.4">SPKI fingerprints</a>
(the output of applying SHA-256 to the public key information) in the
certificate chain. The connection must be terminated immediately if
<a href="https://tools.ietf.org/html/rfc7469#section-2.6">pin validation</a> fails.</p>
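
<p>Computing such a fingerprint is a one-liner once you have the DER-encoded
public key information at hand. The sketch below uses Node&rsquo;s built-in
<em>crypto</em> module to keep things synchronous (in the browser you would reach for
<em>crypto.subtle.digest()</em> instead). <em>spkiFingerprint()</em> is a name I made up, and
extracting the SPKI bytes from a certificate is left out entirely:</p>

```javascript
var nodeCrypto = require("crypto");

// Returns the Base64-encoded SHA-256 digest of a DER-encoded
// SubjectPublicKeyInfo structure, i.e. the value used for pin-sha256.
// spkiDer is expected to be a Buffer or Uint8Array.
function spkiFingerprint(spkiDer) {
  return nodeCrypto.createHash("sha256").update(spkiDer).digest("base64");
}
```

<p>A SHA-256 digest is 32 bytes long, so every pin is a 44-character Base64
string ending in a single <code>=</code>, just like the two values in the header above.</p>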

<p>A valid certificate that passed all basic checks will be accepted if the
browser could not find any pins stored for the current hostname. This might
happen if the site does not support public key pinning and does not send any
HPKP headers at all, or if this is the first visit and the browser has
not seen the HPKP header yet in a previous visit.</p>
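
<p>The decision the browser makes can be boiled down to a few lines. This is a
simplified sketch and not how any browser actually implements it.
<em>passesPinValidation()</em> and its arguments are made up for illustration, and
it assumes the usual certificate checks have already succeeded:</p>

```javascript
// chainFingerprints: SPKI fingerprints of every certificate in the chain.
// storedPins: pins remembered for this hostname (empty if none are known).
function passesPinValidation(chainFingerprints, storedPins) {
  // No stored pins: there is nothing to enforce for this host.
  if (storedPins.length === 0) {
    return true;
  }

  // Otherwise at least one key in the chain has to match a stored pin.
  return chainFingerprints.some(function (fingerprint) {
    return storedPins.indexOf(fingerprint) !== -1;
  });
}
```

<p>Note that a single match anywhere in the chain is enough, be it the leaf, an
intermediate, or the root certificate.</p>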

<h2>What if you need to replace your certificate?</h2>

<p>If your certificate expires or an attacker stole the private key you will have
to replace (and possibly revoke) the leaf certificate. This might invalidate
your pin. The constraints for obtaining a new valid certificate are then the same as
for an attacker that tries to impersonate you and intercept TLS sessions.</p>

<p>Pin validation requires checking the SPKI fingerprints of all certificates in
the chain and will succeed if any of the public keys matches any of the pins.
When for example StartSSL signed your certificate you have another intermediate
Class 1 or 2 certificate and their root certificate in the chain. The browser
trusts only the root certificate but the intermediate ones are signed by the
root certificate. The intermediate certificate in turn signs the certificate
deployed on your server and that is called a chain of trust.</p>

<p>If you pinned your leaf certificate then the only way to recover is your backup
pin - whatever this points to must be included in your new certificate chain
if you want to allow users that stored your pin from previous connections back
on your server.</p>

<p>An easier solution would be available if you provided the SPKI fingerprint of
StartSSL&rsquo;s Class 1 intermediate certificate. To construct a new valid
certificate chain you simply have to ask StartSSL to re-issue a new certificate
for a new or your current key. This comes at the price of a slightly bigger
attack surface as someone that stole the private key of the CA&rsquo;s intermediate
certificate would be able to impersonate your site and pass key pinning checks.</p>

<p>Another possibility is pinning StartSSL&rsquo;s root certificate. Any certificate
issued by StartSSL would let you construct a new valid certificate chain. Again,
this slightly increases the attack surface as any compromised intermediate or
root certificate would allow an attacker to impersonate your site and pass pinning checks.</p>

<h2>What key should I pin?</h2>

<p>Given all of the above scenarios you might ask which key would be the best to
pin, and the answer is: it depends. You can pin one or all of the public keys
in your certificate chain and that will work. The specification requires you to
have at least two pins, so you must include the SPKI hash of another CA&rsquo;s root
certificate, another CA&rsquo;s intermediate certificate (a different tier of your
current CA would also work), or another leaf certificate. The only requirement
is that this pin is not equal to the hash of any of the certificates in the
current chain. The poor browser cannot tell whether you gave it a valid and
useful backup pin so it will happily accept random values too.</p>
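
<p>The two-pin requirement can be expressed as a simple check. This is only a
sketch of the rule as described above, with made-up names. It cannot tell
whether the backup pin belongs to a key you actually control, and neither can
the browser:</p>

```javascript
// pins: the pin-sha256 values you intend to send in the header.
// chainFingerprints: SPKI fingerprints of your current certificate chain.
function isUsablePinSet(pins, chainFingerprints) {
  var inChain = pins.filter(function (pin) {
    return chainFingerprints.indexOf(pin) !== -1;
  });
  var backups = pins.filter(function (pin) {
    return chainFingerprints.indexOf(pin) === -1;
  });

  // At least one pin has to match the current chain ...
  if (inChain.length === 0) {
    return false;
  }
  // ... and at least one must be a backup that is not part of it.
  return backups.length > 0;
}
```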

<p>Pinning to a small set of CAs that you are comfortable with helps you reduce the
risk to yourself. Pinning just your leaf certificates is only advised if you are
really certain that this is for you. It is a little like driving without a
seatbelt and might work most of the time. If something goes wrong it usually
goes really wrong and you want to avoid that.</p>

<p>Pinning only your own leaf certs also bears the risk of creating a backup key
that adheres to ancient standards and could not be used anymore when you have
to replace your current certificate. Assume it was three years ago, and your
backup was a 1024-bit RSA key pair. You pin for a year, and your certificate
expires. You go to a CA and say &ldquo;Hey, re-issue my cert for Key A&rdquo;, and they say
&ldquo;No, your key is too small/weak&rdquo;. You then say &ldquo;Ah, but what about my backup
key?&rdquo; - and that also gets rejected because it is too short. In effect, because
you only pinned to keys under your control you are now bricked.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Deploying TLS the Hard Way]]></title>
    <link href="https://timtaubert.de/blog/2014/10/deploying-tls-the-hard-way/"/>
    <updated>2014-10-27T19:00:00+01:00</updated>
    <id>https://timtaubert.de/blog/2014/10/deploying-tls-the-hard-way</id>
    <content type="html"><![CDATA[<blockquote><ol>
<li><a href="#tls">How does TLS work?</a></li>
<li><a href="#the-cert">The certificate</a></li>
<li><a href="#pfs">(Perfect) Forward Secrecy</a></li>
<li><a href="#cipher-suites">Choosing the right cipher suites</a></li>
<li><a href="#hsts">HTTP Strict Transport Security</a></li>
<li><a href="#hsts-preload">HSTS Preload List</a></li>
<li><a href="#ocsp-stapling">OCSP Stapling</a></li>
<li><a href="#hpkp">HTTP Public Key Pinning</a></li>
<li><a href="#attacks">Known attacks</a></li>
</ol>
</blockquote>

<p>Last weekend I finally deployed TLS for <code>timtaubert.de</code> and decided to write up
what I learned on the way hoping that it would be useful for anyone doing the
same. Instead of only giving you a few buzz words I want to provide background
information on how TLS and certain HTTP extensions work and why you should use
them or configure TLS in a certain way.</p>

<p>One thing that bugged me was that most posts only describe what to do but not
necessarily why to do it. I hope you appreciate me going into a little more
detail to end up with the bigger picture of what TLS currently is, so that you
will be able to make informed decisions when deploying yourselves.</p>

<p>To follow this post you will need some basic cryptography knowledge. Whenever
you do not know or understand a concept you should probably just head over to
Wikipedia and take a few minutes or just do it later and maybe re-read the
whole thing.</p>

<blockquote><p>Disclaimer: I am not a security expert or cryptographer but did my best to
research this post thoroughly. Please <a href="https://twitter.com/ttaubert">let me know</a>
of any mistakes I might have made and I will correct them as soon as possible.</p></blockquote>

<h2>But didn&rsquo;t Andy say this is all shit?</h2>

<p>I read <a href="http://wingolog.org/archives/2014/10/17/ffs-ssl">Andy Wingo&rsquo;s blog post</a>
too and I really liked it. Everything he says in there is true. But what is
also true is that TLS with the few add-ons is all we have nowadays and we
better make the folks working for the NSA earn their money instead of not
trying to encrypt traffic at all.</p>

<p>After you have finished reading this page, maybe go back to Andy&rsquo;s post and read it
again. You might have a better understanding of what he is ranting about than
you had before if the details of TLS are still dark matter to you.</p>

<h2><a name="tls"></a> So how does TLS work?</h2>

<p>Every TLS connection starts with both parties sharing their supported TLS
versions and cipher suites. As the next step the server sends its
<a href="https://en.wikipedia.org/wiki/X.509#Structure_of_a_certificate">X.509 certificate</a>
to the browser.</p>

<h3>Checking the server&rsquo;s certificate</h3>

<p>The following certificate checks need to be performed:</p>

<ul>
<li>Does the certificate contain the server&rsquo;s hostname?</li>
<li>Was the certificate issued by a CA that is in my list of trusted CAs?</li>
<li>Does the certificate&rsquo;s signature verify using the CA&rsquo;s public key?</li>
<li>Has the certificate expired already?</li>
<li>Was the certificate revoked?</li>
</ul>


<p>All of these are obviously crucial checks. To query a certificate&rsquo;s
revocation status the browser will use the
<a href="https://tools.ietf.org/html/rfc6960">Online Certificate Status Protocol (OCSP)</a>
which I will describe in more detail in a later section.</p>

<p>After the certificate checks are done and the browser ensured it is talking to
the right host both sides need to agree on secret keys they will use to
communicate with each other.</p>

<h3>Key Exchange using RSA</h3>

<p>A simple key exchange would be to let the client generate a <em>master secret</em>
and encrypt that with the server&rsquo;s public
<a href="https://en.wikipedia.org/wiki/RSA_%28cryptosystem%29">RSA</a> key given by the
certificate. Both client and server would then use that master secret to derive
symmetric encryption keys that will be used throughout this TLS session. An
attacker could however simply record the handshake and session for later, when
breaking the key has become feasible or the machine is susceptible to a
vulnerability. They may then use the server&rsquo;s private key to recover the whole
conversation.</p>

<h3>Key Exchange using (EC)DHE</h3>

<p>When using (Elliptic Curve)
<a href="https://en.wikipedia.org/wiki/Diffie-Hellman_key_exchange">Diffie-Hellman</a> as
the key exchange mechanism both sides have to collaborate to generate a master
secret. They generate DH key pairs (which is <em>a lot</em> cheaper than generating
RSA keys) and send their public key to the other party. With the private key
and the other party&rsquo;s public key the shared master secret can be calculated and
then again be used to derive session keys. We can provide
<a href="https://en.wikipedia.org/wiki/Forward_secrecy">Forward Secrecy</a> when using
ephemeral DH key pairs. See the section below on how to enable it.</p>
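
<p>To see why both sides end up with the same secret, here is classic
(non-elliptic) Diffie-Hellman with toy numbers. The values are far too small to
be secure and are chosen only so that you can follow the math by hand. Real TLS
uses much larger groups or elliptic curves, and all names below are made up for
this sketch:</p>

```javascript
// Square-and-multiply modular exponentiation; fine for tiny numbers only.
function modPow(base, exponent, modulus) {
  var result = 1;
  base = base % modulus;
  while (exponent > 0) {
    if (exponent % 2 === 1) {
      result = (result * base) % modulus;
    }
    base = (base * base) % modulus;
    exponent = Math.floor(exponent / 2);
  }
  return result;
}

// Public parameters known to everyone (toy values).
var p = 23;
var g = 5;

// Each side picks a private value that is never sent over the wire.
var clientPrivate = 4;
var serverPrivate = 3;

// The public values are exchanged during the handshake.
var clientPublic = modPow(g, clientPrivate, p);
var serverPublic = modPow(g, serverPrivate, p);

// Each side combines its own private value with the other's public value.
var clientShared = modPow(serverPublic, clientPrivate, p);
var serverShared = modPow(clientPublic, serverPrivate, p);

console.log(clientShared === serverShared);
```

<p>An eavesdropper only sees <em>p</em>, <em>g</em>, and the two public values. If the private
values are generated fresh for every handshake and thrown away afterwards,
recording the traffic and later stealing the server&rsquo;s long-term key does not
help to recover the session keys.</p>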

<p>We could in theory also provide forward secrecy with an RSA key exchange if
the server would generate an ephemeral RSA key pair, share its public key and
would then wait for the master secret to be sent by the client. As hinted above
RSA key generation is very expensive and does not scale in practice. That is
why RSA key exchanges are not a practical option for providing forward secrecy.</p>

<p>After both sides have agreed on session keys the TLS handshake is done and they
can finally start to communicate using symmetric encryption algorithms like
<a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a> that are
<em>much</em> faster than asymmetric algorithms.</p>

<h2><a name="the-cert"></a> The certificate</h2>

<p>Now that we understand authenticity is an integral part of TLS we know that in
order to serve a site via TLS we first need a certificate. The TLS protocol
can encrypt traffic between two parties just fine but the certificate provides
the necessary authentication towards visitors.</p>

<p>Without a certificate a visitor could securely talk to either us, the NSA, or
a different attacker but they probably want to talk to us. The certificate
ensures by cryptographic means that they established a connection to <em>our</em>
server.</p>

<h3>Selecting a Certificate Authority (CA)</h3>

<p>If you want a cheap certificate, have no specific needs, and only a single
subdomain (e.g. www) then StartSSL is an easy option. Do of course feel free
to take a look at different authorities - their services and prices will vary
heavily.</p>

<p>In the chain of trust the CA plays an important role: by verifying that you are
the rightful owner of your domain and signing your certificate it will let
browsers trust your certificate. The browsers do not want to do all this
verification themselves so they defer it to the CAs.</p>

<p>For your certificate you will need an RSA key pair, a public and private key.
The public key will be included in your certificate and thus also signed by the
CA.</p>

<h3>Generating an RSA key and a certificate signing request</h3>

<p>The example below shows how you can use OpenSSL on the command line to generate
a key for your domain. Simply replace <code>example.com</code> with the domain of your
website. <code>example.com.key</code> will be your new RSA key and <code>example.com.csr</code> will
be the
<a href="https://en.wikipedia.org/wiki/Certificate_signing_request">Certificate Signing Request</a>
that your CA needs to generate your certificate.</p>

<figure class='code'><div class="highlight"><pre>openssl req -new -newkey rsa:4096 -nodes -sha256 \
  -keyout example.com.key -out example.com.csr
</pre></div></figure>


<p>We will use a SHA-256 based signature for integrity as
<a href="https://blog.mozilla.org/security/2014/09/23/phasing-out-certificates-with-sha-1-based-signature-algorithms/">Firefox and Chrome will phase out support for SHA-1 based certificates soon</a>.
The RSA keys used to authenticate your website will use a 4096 bit modulus. If
you need to handle a lot of traffic or your server has a weak CPU you might
want to use 2048 bit. Never go below that as keys smaller than 2048 bit are
considered insecure.</p>

<h3>Get a signed certificate</h3>

<p>Sign up with the CA you chose and depending on how they handle this process you
probably will have to first verify that you are the rightful owner of the
domain that you claim to possess. StartSSL will do that by sending a token to
<code>postmaster@example.com</code> (or similar) and then asking you to confirm the receipt
of that token.</p>

<p>Now that you signed up and are the verified owner of <code>example.com</code> you simply
submit the <code>example.com.csr</code> file to request the generation of a certificate
for your domain. The CA will sign your public key and the other information
contained in the CSR with their private key and you can finally download the
certificate to <code>example.com.crt</code>.</p>

<p>Upload the .crt and .key files to your web server. Be aware that any
intermediate certificate in the CA&rsquo;s chain must be included in the .crt file as
well - you can just <code>cat</code> them together. StartSSL&rsquo;s free tier has an
intermediate Class 1 certificate - make sure to use
<a href="http://www.startssl.com/certs/class1/sha2/pem/sub.class1.server.sha2.ca.pem">the SHA-256 version</a>
of it. All files should be owned by root and must not be readable by anyone
else. Configure your web server to use those files and you should have TLS up
and running.</p>
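
<p>As a minimal sketch for Nginx, assuming the files were uploaded to the
placeholder directory <code>/path/to/ssl</code>, the relevant directives could look
roughly like this:</p>

<figure class='code'><div class="highlight"><pre>server {
  listen 443 ssl;
  server_name example.com;

  # Certificate followed by any intermediate certificates.
  ssl_certificate     /path/to/ssl/example.com.crt;
  # Private key, should be readable by root only.
  ssl_certificate_key /path/to/ssl/example.com.key;
}
</pre></div></figure>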

<h2><a name="pfs"></a> (Perfect) Forward Secrecy</h2>

<p>To properly deploy TLS you will want to provide
<a href="http://vincent.bernat.im/en/blog/2011-ssl-perfect-forward-secrecy.html">(Perfect) Forward Secrecy</a>.
Without forward secrecy TLS still secures your communication today, but recorded
traffic might be decrypted later if your private key is compromised in the future.</p>

<p>If a powerful adversary (think NSA) records all communication between a visitor
and your server, they can decrypt all this traffic years later by stealing your
private key or going the &ldquo;legal&rdquo; way to obtain it. This can be prevented by
using short-lived (ephemeral) keys for key exchanges that the server will
throw away after a short period.</p>

<h3>Diffie-Hellman key exchanges</h3>

<p>Using RSA with your certificate&rsquo;s private and public keys for key exchanges
does not provide forward secrecy, and generating a fresh RSA key pair for every
exchange is off the table as generating 2048+ bit primes is very expensive. We
thus need to switch to ephemeral (Elliptic Curve) Diffie-Hellman cipher suites.
For DH you can generate a 2048 bit parameter once; choosing a private key
afterwards is cheap.</p>

<figure class='code'><div class="highlight"><pre>openssl dhparam -out dhparam.pem 2048
</pre></div></figure>


<p>Simply upload <code>dhparam.pem</code> to your server and instruct the web server to use
it for Diffie-Hellman key exchanges. When using ECDH the predefined elliptic
curve represents this parameter and no further action is needed.</p>

<figure class='code'><div class="highlight"><pre>(Nginx)
ssl_dhparam /path/to/ssl/dhparam.pem;
</pre></div></figure>


<p>Apache unfortunately does not support custom DH parameters; the size is always
set to 1024 bit and is not user-configurable. This will hopefully be fixed in
future versions.</p>

<h3>Session IDs</h3>

<p>One of the most important mechanisms to improve TLS performance is
<a href="https://en.wikipedia.org/wiki/Transport_Layer_Security#Resumed_TLS_handshake">Session Resumption</a>.
In a full handshake the server sends a <em>Session ID</em> as part of the &ldquo;hello&rdquo;
message. On a subsequent connection the client can use this session ID and
pass it to the server when connecting. Because both the server and the client
have saved the last session&rsquo;s &ldquo;secret state&rdquo; under the session ID they can
simply resume the TLS session where they left off.</p>

<p>Now you might notice that this could violate forward secrecy as a compromised
server might reveal the secret state for all session IDs if the cache is just
large enough. The forward secrecy of a connection is thus bounded by how long
the session information is retained on the server. Ideally, your server would
use a medium-sized in-memory cache that is purged daily.</p>

<p>Apache lets you configure that using the <code>SSLSessionCache</code> directive and you
should use the high-performance cyclic buffer <code>shmcb</code>. Nginx has the
<code>ssl_session_cache</code> directive and you should use a <code>shared</code> cache that is
shared between workers. The right size of those caches depends on the amount of
traffic your server handles. You want browsers to resume TLS sessions but also
get rid of old ones roughly once a day.</p>
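
<p>For Nginx, a starting point might look like the sketch below. The zone name
<code>SSL</code> and the 10 MB size are assumptions to be tuned to your traffic; the
Nginx documentation estimates that one megabyte holds about 4000 sessions.</p>

<figure class='code'><div class="highlight"><pre># Session cache shared between all worker processes.
ssl_session_cache   shared:SSL:10m;
# Do not resume sessions that are older than a day.
ssl_session_timeout 1d;
</pre></div></figure>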

<h3>Session Tickets</h3>

<p>The second mechanism to resume a TLS session is
<a href="http://tools.ietf.org/html/rfc5077">Session Tickets</a>. This extension transmits
the server&rsquo;s secret state to the client, encrypted with a key only known to the
server. That ticket key is protecting the TLS connection now and in the future.</p>

<p>This might as well violate forward secrecy if the key used to encrypt session
tickets is compromised. The ticket (just as the session cache) contains all of
the server&rsquo;s secret state and would allow an attacker to reveal the whole
conversation.</p>

<p>Nginx and Apache by default generate a session ticket key at startup and
unfortunately provide no way to rotate it. If your server is running for months
without a restart then you will use that same session ticket key for months and
breaking into your server could reveal every recorded TLS conversation since
the web server was started.</p>

<p>Neither Nginx nor Apache has a sane way to work around this. Nginx might be able to
<a href="http://forum.nginx.org/read.php?2,229538,230872#msg-230872">rotate the key by reloading the server config</a>
which is rather easy to implement with a cron job. Make sure to test that this
actually works before relying on it though.</p>

<p>Thus if you really want to provide forward secrecy you should disable session
tickets using <code>ssl_session_tickets off</code> for Nginx and <code>SSLOpenSSLConfCmd
Options -SessionTicket</code> for Apache.</p>

<h2><a name="cipher-suites"></a> Choosing the right cipher suites</h2>

<p><a href="https://wiki.mozilla.org/Security/Server_Side_TLS#Modern_compatibility">Mozilla&rsquo;s guide on server side TLS</a>
provides a great list of modern cipher suites that need to be put in your web
server&rsquo;s configuration. The combinations below are unfortunately supported only
by modern browsers; for broader client support you might want to consider
using the &ldquo;intermediate&rdquo; list.</p>

<figure class='code'><div class="highlight"><pre>ECDHE-RSA-AES128-GCM-SHA256:   \
ECDHE-ECDSA-AES128-GCM-SHA256: \
ECDHE-RSA-AES256-GCM-SHA384:   \
ECDHE-ECDSA-AES256-GCM-SHA384: \
DHE-RSA-AES128-GCM-SHA256:     \
DHE-DSS-AES128-GCM-SHA256:     \
[...]
!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK
</pre></div></figure>


<p>All these cipher suites start with (EC)DHE which means they only support
ephemeral Diffie-Hellman key exchanges for forward secrecy. The last line
discards non-authenticated key exchanges, null-encryption (cleartext), legacy
weak ciphers marked exportable by US law, weak ciphers (3)DES and RC4, weak MD5
signatures, and pre-shared keys.</p>

<blockquote><p>Note: To ensure that the order of cipher suites is respected you need to set
<code>ssl_prefer_server_ciphers on</code> for Nginx or <code>SSLHonorCipherOrder on</code> for
Apache.</p></blockquote>
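
<p>For Nginx, the cipher list and the note above could be combined roughly as
follows. This sketch is shortened to the first two suites plus the exclusions
for readability; a real configuration would carry the complete list from
Mozilla&rsquo;s guide on a single line.</p>

<figure class='code'><div class="highlight"><pre># Abbreviated on purpose, use the full list in production.
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK';
# Let the server's order win over the client's preference.
ssl_prefer_server_ciphers on;
</pre></div></figure>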

<h2><a name="hsts"></a> HTTP Strict Transport Security (HSTS)</h2>

<p>Now that your server is configured to accept TLS connections you still want to
support HTTP connections on port 80 to redirect old links and folks typing
<code>example.com</code> in the URL bar to your shiny new HTTPS site.</p>

<p>At this point however a <a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack">Man-In-The-Middle</a>
(or Woman-In-The-Middle) attack can easily intercept and modify traffic to
deliver a forged HTTP version of your site to a visitor. The poor visitor might
never know because they did not realize you offer TLS connections now.</p>

<p>To ensure your users are secured when visiting your site the next time you
want to send an HSTS header to enforce
<a href="https://tools.ietf.org/html/rfc6797">strict transport security</a>.
Once the browser has seen this header it will not try to establish an HTTP
connection next time but directly connect to your website via TLS.</p>

<figure class='code'><div class="highlight"><pre>Strict-Transport-Security:
  max-age=15768000; includeSubDomains; preload
</pre></div></figure>


<p>Sending this header over an HTTPS connection (it will be ignored via HTTP)
lets the browser remember that this domain wants strict transport security for
the next six months (~15768000 seconds). The <code>includeSubDomains</code> token enforces
TLS connections for every subdomain of your domain and the non-standard
<code>preload</code> token will be required for the next section.</p>
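
<p>In Nginx, the redirect from port 80 and the HSTS header might be set up as in
the following sketch. The two <code>server</code> blocks and the
<code>example.com</code> host name are placeholders; certificate directives are
left out for brevity.</p>

<figure class='code'><div class="highlight"><pre># Redirect plain HTTP requests to the HTTPS site.
server {
  listen 80;
  server_name example.com;
  return 301 https://$host$request_uri;
}

server {
  listen 443 ssl;
  server_name example.com;
  # ssl_certificate and ssl_certificate_key go here.

  # Sent over TLS only, as this block does not serve plain HTTP.
  add_header Strict-Transport-Security &quot;max-age=15768000; includeSubDomains; preload&quot;;
}
</pre></div></figure>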

<h2><a name="hsts-preload"></a> HSTS Preload List</h2>

<p>If after deploying TLS the very first connection of a visitor is genuine we are
fine. Your server will send the HSTS header over TLS and the visitor&rsquo;s browser
remembers to use TLS in the future. The very first connection and every
connection after the HSTS header expires however are still vulnerable to a
MITM attack.</p>

<p>To prevent this Firefox and Chrome share an
<a href="https://chromium.googlesource.com/chromium/src/net/+/master/http/transport_security_state_static.json">HSTS Preload List</a>
that basically includes HSTS headers for all sites that would send that header
anyway when visited. So before connecting to a host Firefox and Chrome check
whether that domain is in the list and if so do not even try using an
insecure HTTP connection.</p>

<p>Including your page in that list is easy, just submit your domain using the
<a href="http://hstspreload.appspot.com/">HSTS Preload List submission form</a>. Your
HSTS header must be set up correctly and contain the <code>includeSubDomains</code> and
<code>preload</code> tokens to be accepted.</p>

<h2><a name="ocsp-stapling"></a> OCSP Stapling</h2>

<p><a href="https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol">OCSP</a> -
using an external server provided by the CA to check whether the certificate
given by the server was revoked - might sound like a great idea at first. On
second thought it actually sounds rather terrible. First, the CA providing
the OCSP server suddenly has to be able to handle a lot of requests: every
client opening a connection to your server will want to know whether your
certificate was revoked before talking to you.</p>

<p>Second, the browser contacting a CA and passing the certificate is an easy way
to monitor a user&rsquo;s browsing behavior. If all CAs worked together they probably
could come up with a nice data set of TLS sites that people visit, when and in
what order (not that I know of any plans to actually do that).</p>

<h3>Let the server do the work for your visitors</h3>

<p><a href="https://tools.ietf.org/html/rfc6066#section-8">OCSP Stapling</a> is a TLS
extension that enables the server to query its certificate&rsquo;s revocation status
at regular intervals in the background and send an OCSP response with the TLS
handshake. The stapled response itself cannot be faked as it needs to be
signed with the CA&rsquo;s private key. Enabling OCSP stapling thus improves
performance and privacy for your visitors immediately.</p>

<p>You need to create a certificate file that contains your CA&rsquo;s root certificate
prepended by any intermediate certificates that might be in your CA&rsquo;s chain.
StartSSL has an intermediate certificate for Class 1 (the free tier) - make
sure to use
<a href="http://www.startssl.com/certs/class1/sha2/pem/sub.class1.server.sha2.ca.pem">the one having the SHA-256 signature</a>.
Pass the file to Nginx using the <code>ssl_trusted_certificate</code> directive and to
Apache using the <code>SSLCACertificateFile</code> directive.</p>
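
<p>A minimal Nginx sketch of this setup could look as follows. The file name
<code>ca-certs.pem</code> and the resolver address are assumptions; use the chain
file you created and a DNS resolver you trust.</p>

<figure class='code'><div class="highlight"><pre>ssl_stapling        on;
# Verify the OCSP response against the chain file before stapling it.
ssl_stapling_verify on;
# Intermediate certificates followed by the root certificate.
ssl_trusted_certificate /path/to/ssl/ca-certs.pem;
# Needed so Nginx can resolve the host name of the OCSP responder.
resolver 8.8.8.8;
</pre></div></figure>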

<h3>OCSP Must Staple</h3>

<p>OCSP stapling however is unfortunately not a silver bullet. If a browser does
not know in advance that it will receive a stapled response then an attacker might as well
redirect HTTPS traffic to their server and block any traffic to the OCSP server
(in which case browsers soft-fail).
<a href="https://www.imperialviolet.org/2014/04/19/revchecking.html">Adam Langley explains</a>
all possible attack vectors in great detail.</p>

<p>One solution might be the proposed
<a href="https://tools.ietf.org/html/draft-hallambaker-muststaple-00">OCSP Must Staple Extension</a>.
This would add another field to the certificate issued by the CA that says a
server <em>must</em> provide a stapled OCSP response. The problem here is that the
proposal expired and in practice it would take years for CAs to support that.</p>

<p>Another solution would be to implement
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=901698">a header similar to HSTS</a>,
that lets the browser remember to require a stapled OCSP response when
connecting next time. This however has the same problems on first connection
just like HSTS, and we might have to maintain an &ldquo;OCSP-Must-Staple Preload List&rdquo;.
As of today there is unfortunately no immediate solution in sight.</p>

<h2><a name="hpkp"></a> HTTP Public Key Pinning (HPKP)</h2>

<p>Even with all those security checks when receiving the server&rsquo;s certificate
we would still be completely out of luck in case your
<a href="http://en.wikipedia.org/wiki/DigiNotar">CA&rsquo;s private key is compromised</a> or
your <a href="http://nakedsecurity.sophos.com/2013/01/08/the-turktrust-ssl-certificate-fiasco-what-happened-and-what-happens-next/">CA simply fucks up</a>.
We can prevent these kinds of attacks with an HTTP extension called
<a href="https://tools.ietf.org/html/draft-ietf-websec-key-pinning-21">Public Key Pinning</a>.</p>

<p>Key pinning is a trust-on-first-use (TOFU) mechanism. The first time a browser
connects to a host it lacks the information necessary to perform &ldquo;pin
validation&rdquo; so it will not be able to detect and thwart a MITM attack. This
feature only allows detection of these kinds of attacks after the first
connection.</p>

<h3>Generating an HPKP header</h3>

<p><a href="https://developer.mozilla.org/en-US/docs/Web/Security/Public_Key_Pinning">Creating an HPKP header is easy</a>,
all you need to do is to compute the base64-encoded &ldquo;SPKI fingerprint&rdquo; of your
server&rsquo;s certificate. An SPKI fingerprint is the output of applying SHA-256
to the public key information contained in your certificate.</p>

<figure class='code'><div class="highlight"><pre>openssl req -inform pem -pubkey -noout &lt; example.com.csr |
  openssl pkey -pubin -outform der |
  openssl dgst -sha256 -binary |
  base64
</pre></div></figure>


<p>The result of running the above command can be directly used as a
<em>pin-sha256</em> value for the <em>Public-Key-Pins</em> header as shown below:</p>

<figure class='code'><div class="highlight"><pre>Public-Key-Pins:
  pin-sha256=&quot;GRAH5Ex+kB4cCQi5gMU82urf+6kEgbVtzfCSkw55AGk=&quot;;
  pin-sha256=&quot;lERGk61FITjzyKHcJ89xpc6aDwtRkOPAU0jdnUqzW2s=&quot;;
  max-age=15768000; includeSubDomains
</pre></div></figure>


<p>Upon receiving this header the browser knows that it has to store the pins
given by the header and discard any certificates whose SPKI fingerprints do
not match for the next six months (max-age=15768000). We specified the
<code>includeSubDomains</code> token so the browser will verify pins when connecting
to any subdomain.</p>
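
<p>With Nginx, a header like the one above could be sent using
<code>add_header</code>, as in this sketch. The pins are the example values from
above and must be replaced by the fingerprints of your own keys.</p>

<figure class='code'><div class="highlight"><pre># Single quotes outside so the pins can keep their double quotes.
add_header Public-Key-Pins 'pin-sha256=&quot;GRAH5Ex+kB4cCQi5gMU82urf+6kEgbVtzfCSkw55AGk=&quot;; pin-sha256=&quot;lERGk61FITjzyKHcJ89xpc6aDwtRkOPAU0jdnUqzW2s=&quot;; max-age=15768000; includeSubDomains';
</pre></div></figure>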

<h3>Include the pin of a backup key</h3>

<p>It is considered good practice to include at least a second pin, the SPKI
fingerprint of a backup RSA key that you can generate exactly as the original
one:</p>

<figure class='code'><div class="highlight"><pre>openssl req -new -newkey rsa:4096 -nodes -sha256 \
  -keyout example.com.backup.key -out example.com.backup.csr
</pre></div></figure>


<p>In case your private key is compromised you might need to revoke your
current certificate and request the CA to issue a new one. The old pin however
would still be stored in browsers for six months which means they would not
be able to connect to your site. By sending two <em>pin-sha256</em> values the browser
will later accept a TLS connection when any of the stored fingerprints match
the given certificate.</p>

<h2><a name="attacks"></a> Known attacks</h2>

<p>In the past years (and especially the last year) a few attacks on SSL/TLS were
published. Some of those attacks can be worked around on the protocol or crypto
library level so that you basically do not have to worry as long as your web
server is up to date and the visitor is using a modern browser. A few attacks
however need to be thwarted by configuring your server properly.</p>

<h3>BEAST (Browser Exploit Against SSL/TLS)</h3>

<p><a href="http://blog.cryptographyengineering.com/2011/09/brief-diversion-beast-attack-on-tlsssl.html">BEAST</a>
is an attack that only affects TLSv1.0. Exploiting this vulnerability is
possible but rather difficult. You can either disable TLSv1.0 completely -
which is certainly the preferred solution although you might neglect folks
with old browsers on old operating systems - or you can just not worry. All
major browsers have implemented workarounds so that it should not be an issue
anymore in practice.</p>

<h3>BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext)</h3>

<p><a href="https://en.wikipedia.org/wiki/BREACH_%28security_exploit%29">BREACH</a> is a
security exploit against HTTPS when using HTTP compression. BREACH is based
on <a href="https://en.wikipedia.org/wiki/CRIME">CRIME</a> but unlike CRIME - which can be
successfully defended against by turning off TLS compression (which is the default
for Nginx and Apache nowadays) - BREACH can only be prevented by turning off
HTTP compression. Another method to mitigate this would be to use
<a href="https://en.wikipedia.org/wiki/Cross-site_request_forgery">cross-site request forgery (CSRF)</a>
protection or
<a href="https://community.qualys.com/blogs/securitylabs/2013/08/07/defending-against-the-breach-attack">disable HTTP compression selectively based on headers</a>
sent by the application.</p>
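
<p>As a rough Nginx sketch, HTTP compression could be switched off only for the
parts of a site that reflect secrets in their responses. The
<code>/account/</code> path is a hypothetical example.</p>

<figure class='code'><div class="highlight"><pre># Responses below this path may contain secrets such as CSRF tokens.
location /account/ {
  gzip off;
}
</pre></div></figure>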

<h3>POODLE (Padding Oracle On Downgraded Legacy Encryption)</h3>

<p><a href="https://en.wikipedia.org/wiki/POODLE">POODLE</a>
is yet another
<a href="https://en.wikipedia.org/wiki/Padding_oracle_attack">padding oracle attack</a> on
TLS. Luckily it only affects the predecessor of TLS which is SSLv3. The only
solution when deploying a new server is to just disable SSLv3 completely.
Fortunately, we already excluded SSLv3 in our list of preferred ciphers
previously. Firefox 34 will ship with SSLv3 disabled by default, Chrome and
others will hopefully follow soon.</p>
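
<p>One way to make the exclusion explicit at the protocol level in Nginx is to
list only the TLS versions you want to offer. This is a sketch; whether to keep
TLSv1.0 depends on the clients you need to support.</p>

<figure class='code'><div class="highlight"><pre># SSLv3 is left out on purpose.
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
</pre></div></figure>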

<h2>Further reading</h2>

<p>Thanks for reading and I am really glad you made it that far! I hope this post
did not discourage you from deploying TLS - after all getting your setup right
is the most important thing. And it certainly is better to know what you are
getting yourself into than leaving your visitors unprotected.</p>

<p>If you want to read even
more about setting up TLS, the Mozilla Wiki page on
<a href="https://wiki.mozilla.org/Security/Server_Side_TLS">Server-Side TLS</a> has more
information and proposed web server configurations.</p>

<blockquote><p>Thanks a lot to <a href="https://frederik-braun.com/">Frederik Braun</a> for taking the
time to proof-read this post and helping to clarify a few things!</p></blockquote>
]]></content>
  </entry>
  
</feed>
