MIT 6.824: Lecture 18 - Certificate Transparency
This lecture is about building systems out of mutually untrustworthy components—using the Web as a case study. The systems we have seen so far are closed systems for which we have assumed that all the participants are trustworthy. But in an open system like the Web where anyone can take part, and there is no universally trusted authority, trust and security are top-level issues to address.
A fundamental challenge in building open systems is verifying the identity of each component involved. We can frame that challenge as each computer in the system asking: Am I talking to the right computer?
Certificate Transparency (CT) aims to help answer this question, but before going into CT, I'll give a brief tour of the evolution of security on the Web.
- Without HTTPS (Man-in-the-middle attacks)
- Certificate Transparency
- Further Reading
Without HTTPS (Man-in-the-middle attacks) #
A man-in-the-middle attack happens when a third party intercepts a connection between a user and an application, as illustrated in the figure below.
Figure 1: Man-in-the-middle attack.
When an attacker intercepts an HTTP connection, they can read and change any packets being sent over the network. These packets may contain anything from passwords to bank details to other private information that the user never intended an intruder to see.
HTTPS was invented to make communication over the internet more secure. It takes the original HTTP protocol and adds a layer of security known as SSL/TLS. I'll refer to this layer as TLS for the rest of the post.
With TLS (Transport Layer Security), you only send and receive encrypted data over the network, and only a secret key agreed on by your computer and the site you're visiting can decrypt this data. Thus, while an attacker can still intercept your HTTPS connection, they cannot make sense of the transmitted packets.
How HTTPS works #
TLS is based on public-key cryptography, which means here that the server has a public/private key pair. The server exposes the public key and keeps the private key private. When a client encrypts data using the server's public key, only the private key can decrypt it.
The first step in enabling HTTPS for a server is to get a certificate from a Certificate Authority (CA). This certificate is an ID for the server that contains its domain name, information about its owners, the server's public key, the CA's identity, and a digital signature produced by the CA.
At a high level, when your browser connects to an HTTPS server:
- The server responds with its certificate to prove its identity to the browser.
- Your browser then checks the validity of this certificate using the digital signature from the CA. I'll describe how this works soon.
- After verifying the certificate, your browser will generate a random key and encrypt it using the server's public key. It will then send this encrypted key to the server as a challenge. The challenge is for the server to prove that it holds the private key corresponding to the public key by decrypting the encrypted message.
Once the server has decrypted this random key, each party is satisfied that it is talking to the right party, and both agree to use the key for subsequent communication.
These steps make up the TLS handshake. After the handshake is complete, both parties encrypt HTTP requests and responses using the key they agreed on, which only the other party can decrypt.
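The key exchange in these steps can be sketched in a few lines. This is only a toy illustration of the simplified handshake described above, not real TLS: the RSA numbers are the tiny textbook values (3233, 17, 2753), and `keystream_xor` is a hypothetical stand-in for a real symmetric cipher.

```python
import hashlib
import secrets

# Toy RSA key pair for the server (tiny textbook numbers, NOT secure).
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def keystream_xor(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Browser: pick a random session key and encrypt it with the server's public key.
session_key = secrets.randbelow(N - 2) + 2
challenge = pow(session_key, E, N)

# Server: recover the session key using its private key.
server_key = pow(challenge, D, N)

# Both sides now hold the same key and use it for the HTTP traffic.
request = keystream_xor(session_key, b"GET / HTTP/1.1")
received = keystream_xor(server_key, request)
```

Only a server holding the private exponent `D` can turn `challenge` back into `session_key`, which is what lets the browser conclude it is talking to the holder of the certificate's key.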
Certificate Authorities protect us... #
Without the digital signature from a Certificate Authority, anyone can create a certificate falsely claiming to be, say, 'netflix.com' and get your browser to trust them. Note that your browser will trust them as long as the fake server can decrypt a message encrypted with the public key in that certificate, which it can, since it generated the key pair itself. To prevent this, only a small set of authorized CAs may issue certificates that browsers accept.
Like servers, CAs also have a public/private key pair. When a CA issues a certificate, it encrypts a hash of the certificate's content with its private key and uses the result as the digital signature for the certificate. Anyone can decrypt the signature using the CA's public key.
Each browser comes with a pre-installed list of the public keys of all the CAs it trusts. When your browser receives a server's certificate, it first checks that the issuing CA is on this list, then decrypts the digital signature using that CA's public key. If the decrypted value matches the certificate's contents (in practice, a hash of them), your browser is sure that a trusted CA issued the certificate and continues the TLS handshake.
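Here is a minimal sketch of that sign-then-verify flow. It reuses toy textbook RSA numbers, and the names `ca_sign`, `browser_verify`, `ToyCA`, and the certificate text are all made up for illustration. Real CAs use far larger keys and padded signature schemes.

```python
import hashlib

# Toy RSA key pair for a hypothetical CA (tiny textbook numbers, NOT secure).
CA_N, CA_E, CA_D = 3233, 17, 2753


def digest(cert_body: bytes, modulus: int) -> int:
    """Hash the certificate body and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(cert_body).digest(), "big") % modulus


def ca_sign(cert_body: bytes) -> int:
    """CA side: 'encrypt' the hash with the CA's private key."""
    return pow(digest(cert_body, CA_N), CA_D, CA_N)


def browser_verify(cert_body: bytes, signature: int, issuer: str, trusted_cas: dict) -> bool:
    """Browser side: check the issuer is trusted, then check the signature."""
    if issuer not in trusted_cas:
        return False
    n, e = trusted_cas[issuer]
    return pow(signature, e, n) == digest(cert_body, n)


# The browser's pre-installed list maps CA names to public keys.
trusted_cas = {"ToyCA": (CA_N, CA_E)}

cert = b"domain=netflix.com;public_key=...;issuer=ToyCA"
signature = ca_sign(cert)

genuine_ok = browser_verify(cert, signature, "ToyCA", trusted_cas)
unknown_issuer_ok = browser_verify(cert, signature, "SomeoneElse", trusted_cas)
forged_ok = browser_verify(cert, (signature + 1) % CA_N, "ToyCA", trusted_cas)
```

Notice that the check says nothing about whether the CA *should* have issued the certificate, only that it did. That gap is what the rest of this post is about.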
...But they can go rogue #
Unfortunately, CAs can get compromised or go rogue and end up issuing "bogus" certificates, i.e., a CA may issue the certificate for a domain name to the wrong owner. This has happened before. Since any CA can issue a certificate for any domain name, the least secure CA limits the overall security of the certificate mechanism.
Thus, while HTTPS can increase our confidence that we are talking to the right computers, it is not enough.
Can we have an online database of valid certificates? #
To limit the effect of a bogus certificate, ideally our browsers would be able to detect and reject it. One way this could work is with a database of all the valid certificates in existence that our browsers could query. This raises several questions, though:
- Given that there's no single authority that the entire world trusts, who would run this database?
- How do we decide who owns a domain name?
- How do we handle situations where people change their CAs, renew their certificates or lose their private key and have to request a new one? These will all look like a second certificate for an existing domain name.
Certificate Transparency is an approach to answering these questions, and we'll look at it next.
Certificate Transparency #
Certificate Transparency (CT) is a system for making the existence of all certificates publicly available to domain owners, CAs, and browsers. This way, when a rogue CA issues a certificate for 'netflix.com' to the wrong person, the certificate is immediately visible to the right owners for them to act on it.
CT works by introducing three components to the certificate system: certificate logs, monitors, and auditors.
Certificate Logs #
Certificate logs contain an append-only record of certificates. Anyone can add a certificate to the logs, though typically only certificate authorities do. When a CA issues a new certificate, it must add it to the logs. Anyone can also query a log to verify that it contains a certificate.
Certificate logs are hosted on a group of servers spread over the world and can be managed independently by a CA or any interested party.
Monitors are servers that periodically check the log servers to see whether a CA has issued any suspicious certificates for the domain names they are aware of. They are hosted by organizations which manage a set of domain names. For example, a company like Netflix could host its own monitors and run periodic checks on the certificate logs to see whether any suspicious certificates exist for 'netflix.com'.
When a monitor detects a suspicious certificate, I believe there is a manual step involved where a human checks whether the certificate is actually OK or was wrongly issued. There is a certificate revocation process to get rid of bad certificates.
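As a rough sketch of what a monitor's periodic check might look like, assume each log entry can be reduced to a domain, an issuer, and a fingerprint of the public key. The `LogEntry` type, the `find_suspicious` helper, and the sample data are all hypothetical. Real monitors fetch and parse full certificates from the logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    domain: str
    issuer: str
    key_fingerprint: str


def find_suspicious(entries, watched_domains, known_fingerprints):
    """Return entries for watched domains whose key the owner does not recognize."""
    return [
        entry
        for entry in entries
        if entry.domain in watched_domains
        and entry.key_fingerprint not in known_fingerprints
    ]


# Entries a monitor might have downloaded from a log.
entries = [
    LogEntry("netflix.com", "GoodCA", "fp-1"),
    LogEntry("example.org", "OtherCA", "fp-9"),
    LogEntry("netflix.com", "RogueCA", "fp-666"),
]

# The domain owner knows which keys it actually requested certificates for.
suspicious = find_suspicious(entries, {"netflix.com"}, {"fp-1"})
```

Anything in `suspicious` would then go to a human for the manual check mentioned above.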
An auditor runs in a web browser and checks whether the certificate it receives from a server has been registered in the certificate logs.
These components work together to bring openness to the SSL certificate system. Quoting the lecture notes:
If browsers and monitors see the same log, and monitors raise an alarm if there's a bogus cert in the log, and browsers require that each cert they use is in the log, then browsers can feel safe using any cert that's in the log.
Note that auditors and monitors also communicate with each other to exchange information about the logs.
Everyone must see the same logs #
For certificate transparency to work, there are two critical requirements for the logs:
- No deletion: This requirement prevents a situation where a log server claims that a bogus certificate is in the log and shows it to the browser, but then the log operator deletes it from the log before the monitor can detect it.
- No equivocation: All parties must see the same log content, otherwise, a log server could show browsers a log with the bogus certificate, and show the monitor a log without the certificate.
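One simple way to picture equivocation detection: each party records the log's size and a digest of its contents, and parties compare notes. The sketch below uses a plain SHA-256 over all entries as a stand-in for the log's real summary hash, and `log_digest` and `views_conflict` are hypothetical names.

```python
import hashlib


def log_digest(entries) -> str:
    """Digest of the whole log; a stand-in for the log's real summary hash."""
    h = hashlib.sha256()
    for entry in entries:
        # Length-prefix each entry so entry boundaries affect the digest.
        h.update(len(entry).to_bytes(4, "big"))
        h.update(entry)
    return h.hexdigest()


def views_conflict(view_a, view_b) -> bool:
    """Two (size, digest) views conflict if the sizes match but the digests differ."""
    (size_a, digest_a), (size_b, digest_b) = view_a, view_b
    return size_a == size_b and digest_a != digest_b


# A misbehaving log shows the browser a fork containing a bogus certificate...
shown_to_browser = [b"cert-a", b"bogus-cert"]
# ...and shows the monitor a version without it.
shown_to_monitor = [b"cert-a", b"cert-b"]

browser_view = (len(shown_to_browser), log_digest(shown_to_browser))
monitor_view = (len(shown_to_monitor), log_digest(shown_to_monitor))

equivocation_detected = views_conflict(browser_view, monitor_view)
```

This only catches forks between views of the same size. Comparing views of different sizes needs a way to show that the larger log extends the smaller one.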
With these requirements, if a CA issues a bogus certificate for a domain name, it must add the certificate to the log. And since a log operator can't delete it, the domain name's owner will eventually see it. But meeting these requirements is difficult because, like CAs, log operators can also get compromised and may even conspire with malicious CAs.
To show that it isn't violating any of the requirements, a certificate log must be able to prove two things:
- That a particular certificate is in the log.
- That if it is showing a version with new certificates added, that version is consistent with the previous version. Proving this confirms that a log operator has modified no certificates in the log and the log has never been branched or forked.
It does this by storing the certificates in a Merkle Tree data structure. I won't go into the details of that here, but I recommend reading this post if you're interested in that.
The key property of Certificate Transparency is that everyone sees the same logs. Because of this, domain owners can detect when CAs have issued bogus certificates for their domain names, and browsers can be confident that any certificates in the log are approved by their owners, which means they are talking to the right servers.
Finally, note that Certificate Transparency does not completely prevent the effect of bogus certificates. There might be a window where the certificates may dupe a browser before the monitors can detect them. What CT offers, though, is a system for quicker detection of these certificates, which will reduce their effect.
Figure 1 image lifted from this post by Imperva.
Further Reading #
- Lecture 18: Certificate Transparency - MIT 6.824 lecture notes
- Certificate Transparency FAQ - Additional material from 6.824
- Executing a Man-in-the-Middle Attack in just 15 Minutes by Patrick Nohe
- How does SSL actually work? by Robert Heaton
- What is Certificate Transparency?
- How Certificate Transparency Works
- Transparent Logs for Skeptical Clients
- How Log Proofs Work