To build secure software, we pay attention to many threats that come from outside: malicious users attempting SQL injection, denial-of-service attacks, and the like.

We also need to pay attention to threats from inside. By this I mean the software dependencies that we import and invoke from within our code. Malicious or compromised dependencies are not just a potent theoretical threat; successful attacks have been documented in the wild.

Given that we rely on dependencies, and that we need a streamlined process to keep them updated, how can we protect our builds from a compromised dependency?

I’ve been working on a tool to do just that, called hancock.

TL;DR

go get src.d10.dev/hancock/cmd/...
go doc src.d10.dev/hancock/cmd/hancock

Declaration of Dependence

If you’re developing software, you no doubt rely on some dependencies. Perhaps your product is a dependency of another.

By dependency, I mean source code or assets included in your software, but not originally authored by you.

Before releasing your product, you should be certain its dependencies behave as expected. Once released, those dependencies become a built-in part of your product.

How do you check the integrity of those dependencies? This can get complicated for several reasons. One is that code may be fetched via a variety of channels (e.g. your favorite version control tool, or your favorite package manager). Integrity then depends on the distribution channel as well as on the original author.

It would be nice if verifying all dependencies were as simple as:

  1. Opt in to each of your trusted dependency authors (or code auditors).
  2. Automatically verify all dependencies at build time.

This sounds like a job for end-to-end authentication!
That is exactly how hancock addresses it…

Introducing Hancock

Hancock is named after John Hancock. In the U.S., John Hancock is not only a famous name; it is also slang for a signature.

To demonstrate, let’s say Alice is the author of some source code. Carol uses Alice’s package as a dependency. Carol wants to be certain that she has an unaltered copy of the software as published by Alice.

Figure: The verifier must determine whether the local copy is authentic.

Carol wants this verification to be automated, so that future builds can include Alice’s updates and bug fixes, but will never include malicious code written by someone attempting to spoof Alice.

Signed Testimony

When releasing a version of her software, Alice uses hancock to produce testimony about source code files. Testimony is a verifiable data structure. Anyone with a copy of the source and Alice’s testimony can verify that the local copy is identical to Alice’s original.

Figure: The verifier compares the copy to the signed testimony.

Authentication

To verify, Carol provides hancock with Alice’s public key and the local copy of the source file. If the copy is authentic, the build proceeds; otherwise it fails.

Figure: The verifier compares the copy to the signed testimony.

In this example, Carol opts in to trusting the author, Alice. Carol also has the option to rely on other trusted authorities, such as a third-party code auditor, or perhaps her own self-signed patches.
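The opt-in model can be sketched as a check against a list of trusted public keys. Again, this is only an illustration under stated assumptions, not hancock's implementation: `Authority`, `Testimony`, `Authenticate`, `NewSigner`, and `Sign` are hypothetical names, and SHA-256 with Ed25519 is an assumed choice of primitives. The point is that a valid signature is not enough; the signer must also be someone Carol has chosen to trust.

```go
package main

import (
	"bytes"
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
	"os"
)

// Authority is a hypothetical record of someone the verifier trusts.
type Authority struct {
	Name string
	Key  ed25519.PublicKey
}

// Testimony is a simplified, hypothetical testimony: a signer's public key
// and a signature over the SHA-256 digest of a source file.
type Testimony struct {
	Signer    ed25519.PublicKey
	Signature []byte
}

// ErrUnverified is returned when no trusted authority vouches for the copy.
var ErrUnverified = errors.New("no trusted authority has signed this copy")

// Authenticate returns the name of the first trusted authority whose
// testimony covers the local copy, or ErrUnverified if there is none.
func Authenticate(trusted []Authority, local []byte, testimony []Testimony) (string, error) {
	digest := sha256.Sum256(local)
	for _, t := range testimony {
		for _, a := range trusted {
			if bytes.Equal(a.Key, t.Signer) && ed25519.Verify(a.Key, digest[:], t.Signature) {
				return a.Name, nil
			}
		}
	}
	return "", ErrUnverified
}

// NewSigner creates a key pair; the Authority half is what a verifier configures.
func NewSigner(name string) (Authority, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return Authority{Name: name, Key: pub}, priv
}

// Sign stands in for the author's side: sign the digest of the source.
func Sign(priv ed25519.PrivateKey, source []byte) Testimony {
	digest := sha256.Sum256(source)
	return Testimony{
		Signer:    priv.Public().(ed25519.PublicKey),
		Signature: ed25519.Sign(priv, digest[:]),
	}
}

func main() {
	alice, alicePriv := NewSigner("alice")
	_, malloryPriv := NewSigner("mallory")
	trusted := []Authority{alice} // Carol opts in to Alice only
	source := []byte("package dep\n")

	// Mallory's signature is valid, but Mallory is not a trusted authority.
	if _, err := Authenticate(trusted, source, []Testimony{Sign(malloryPriv, source)}); err != nil {
		fmt.Println("spoofed testimony rejected:", err)
	}

	// A build script would stop here on failure.
	name, err := Authenticate(trusted, source, []Testimony{Sign(alicePriv, source)})
	if err != nil {
		fmt.Fprintln(os.Stderr, "build aborted:", err)
		os.Exit(1)
	}
	fmt.Println("verified by", name, "- build may proceed")
}
```

Adding a code auditor or a self-signed patch key is then just another entry in the `trusted` slice.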

End-to-End

Hancock separates authentication from distribution, so that it works with either proprietary or open source dependencies. It works whether Carol cloned the dependency from BobHub.com, or installed it via Bob Package Manager.

hancock requires three things when authenticating a file:

  • Source Copy - the file being authenticated, which can be fetched via any channel.
  • Trusted Authority - the public keys of the authorities the verifier trusts, specified in a one-time configuration step.
  • Signed Testimony - fetched by hancock, currently via IPFS.

Testimony is treated as public information, and distributed widely. Testimony refers to source files by cryptographic hash, so the content of the source being authenticated is never leaked.

Testimony on IPFS must be addressed by a content identifier, which cannot be guessed. To discover testimony, hancock uses an index that maps source identifier and public key to testimony. Again, hashes are used so that testimony can be indexed without revealing sensitive data to the index.
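One way to picture such an index is as a lookup table whose keys are themselves hashes. The sketch below is a guess at the general idea, not a description of the real index: the `Index` type, the `HashSource`, `IndexKey`, `Publish`, and `Lookup` names, the key derivation (SHA-256 over the source hash followed by the public key), and the placeholder content identifier strings are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// HashSource returns the digest that identifies a source file.
func HashSource(source []byte) [sha256.Size]byte {
	return sha256.Sum256(source)
}

// IndexKey derives an opaque lookup key from a source hash and a public key.
// The derivation is an assumption for illustration. Because the source hash
// has a fixed length, the concatenation is unambiguous.
func IndexKey(sourceHash [sha256.Size]byte, publicKey []byte) string {
	h := sha256.New()
	h.Write(sourceHash[:])
	h.Write(publicKey)
	return hex.EncodeToString(h.Sum(nil))
}

// Index is a hypothetical in-memory index from opaque keys to the content
// identifiers of testimony. The operator sees only hashes.
type Index map[string][]string

// Publish records that testimony with the given identifier exists.
func (ix Index) Publish(sourceHash [sha256.Size]byte, publicKey []byte, cid string) {
	key := IndexKey(sourceHash, publicKey)
	ix[key] = append(ix[key], cid)
}

// Lookup returns the identifiers of testimony by publicKey about the source.
func (ix Index) Lookup(sourceHash [sha256.Size]byte, publicKey []byte) []string {
	return ix[IndexKey(sourceHash, publicKey)]
}

func main() {
	ix := Index{}
	source := HashSource([]byte("package alice\n"))
	aliceKey := []byte("alice-public-key-placeholder")

	// The identifier below is a made-up placeholder, not a real IPFS CID.
	ix.Publish(source, aliceKey, "example-testimony-cid")

	fmt.Println(ix.Lookup(source, aliceKey))                    // [example-testimony-cid]
	fmt.Println(len(ix.Lookup(source, []byte("someone-else")))) // 0
}
```

A verifier who already holds the source copy and the authority's public key can compute the lookup key locally, while someone holding only the index cannot work backwards to the source content.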

By default, hancock uses an index hosted by Beyond Central.

ETC.

See the hancock source code for more documentation.

Hancock is open source software available under the AGPL 3 license.