XML Signatures are a bad idea executed even worse

Let’s say you’re going to send Bob a message, and you want Bob to know it’s you that sent the message just by looking at the message. You’ll probably:

Make a public/private keypair, and show Bob the public key
Serialize your message into bytes
Sign the bytes with the private key
Concatenate the message and the signature in some sane dumb way, like base64(msg) + "." + base64(signature)
Send that

Bob checks your message is legit by:

Splitting your payload into msg and signature
Checking that the signature does match the msg bytes (he has your public key already)

If that checks out, then Bob can go about his business with msg, knowing it came from you.

Congrats, that’s basically how a JSON Web Token works. It’s simple, it works. Life is good.
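Here’s a minimal sketch of that flow in Python. One loud assumption: the standard library has no public-key signatures, so HMAC-SHA256 with a shared key stands in for "sign with the private key". The names sign, pack, and unpack are made up for illustration. The shape is the point: bytes in, signature over exactly those bytes, dumb concatenation.

```python
import base64
import hashlib
import hmac


def sign(key: bytes, msg: bytes) -> bytes:
    # ASSUMPTION: HMAC-SHA256 stands in for a real private-key signature.
    return hmac.new(key, msg, hashlib.sha256).digest()


def pack(key: bytes, msg: bytes) -> str:
    """Build base64(msg) + "." + base64(signature)."""
    sig = sign(key, msg)
    return base64.b64encode(msg).decode() + "." + base64.b64encode(sig).decode()


def unpack(key: bytes, payload: str) -> bytes:
    """Split the payload, check the signature over the received bytes, return msg."""
    msg_b64, sig_b64 = payload.split(".")
    msg = base64.b64decode(msg_b64)
    sig = base64.b64decode(sig_b64)
    if not hmac.compare_digest(sig, sign(key, msg)):
        raise ValueError("signature does not match message bytes")
    return msg
```

Bob never has to guess which bytes were signed: they are the bytes he decoded.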

eXtensible Markup Language

Software engineers circa 2002 were committed to making their lives difficult, and a cornerstone of this masochism was doing everything with XML if at all possible. XML was hot. People were using its CSS-type thing and transformation language. There were like a dozen competing schema specs. Conferences with keynotes. Super cool.

<Cool very="yes" />

So let’s repeat the above situation, but we’re going to pretend it’s 2002. Instead of reinventing the JWT, you’re gonna use the preferred standard of the time: XML Signatures. Conceptually, it works the same way, because there’s really only one way to skin this cat: convert message to bytes, sign bytes, send both to Bob.

Aside: Why anyone uses XML Signatures today

The joke here is that we’re pretending it’s 2002, but I assure you this story is relevant today. At bigger companies, employees log into all their software from tools like Okta or Microsoft Entra. That works using a decentralized, XML-flavored protocol called SAML. SAML relies entirely on XML Signatures to prove message authenticity. (At SSOReady, we’re in the business of making SAML a breeze to implement, which is why we care about it.)

With that aside, back to our 2002 hypothetical. iPods. Frosted tips. XML.

You’re sending Bob a message.

The central conceit of XML Signatures is that you’re gonna use as many XML-related specs as possible to get this message to Bob.

Step 1 of this endeavor is to realize:

The core bad idea of XML Signatures

I don’t need to concatenate my message and its signature! I can just embed the signature in the message.

This is unwise, because instead of doing XML-y things followed by crypto-y things, you now have to do both simultaneously. You have to take your message:

<Message>
    ... stuff ...
</Message>

And then print it out to a buffer, sign the bytes in that buffer, and then modify the original message in an XML-aware way to insert the signature into it:

<Message>
    <Signature>...</Signature>
    ... stuff ...
</Message>

How is Bob supposed to verify your message now? He needs to check the exact, byte-for-byte message that you signed. But that’s not what you sent Bob. You sent Bob something similar, but with extra Signature stuff embedded in that. Bob now needs to exactly undo those modifications to check your message.

How is Bob supposed to know which of these was your original message?

<Message>... stuff ...</Message>

<Message>
... stuff ...
</Message>

<Message>

    ... stuff ...
</Message>

The answer is you canonicalize your message when you write it out to your byte-buffer to sign. Canonicalization is an XML-aware algorithm that takes your in-memory data structures, and writes them out to an exact set of bytes. Canonicalization needs to have one critical property. When Bob runs doc.Signature = null and then canonicalizes what’s left of doc, he needs to get exactly the same message that you signed.
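A rough sketch of that round trip, using Python’s built-in xml.etree.ElementTree. Assumptions to flag: ET.canonicalize (Python 3.8+) implements C14N 2.0, not the algorithms XML Signatures actually names, and the Signature content here is a placeholder rather than a real signature. It only illustrates the "strip the Signature, canonicalize, get the same bytes" property.

```python
import xml.etree.ElementTree as ET


def canonical_bytes(elem: ET.Element) -> bytes:
    # ASSUMPTION: C14N 2.0 from the stdlib, used purely for illustration.
    return ET.canonicalize(ET.tostring(elem, encoding="unicode")).encode("utf-8")


# Sender: canonicalize, "sign" those bytes, then embed a Signature element.
original = ET.fromstring('<Message><Body b="2" a="1">stuff</Body></Message>')
signed_bytes = canonical_bytes(original)
signature = ET.Element("Signature")
signature.text = "placeholder"  # a real signature over signed_bytes would go here
original.insert(0, signature)
wire = ET.tostring(original, encoding="unicode")

# Bob: parse what arrived, drop the Signature, canonicalize what is left.
received = ET.fromstring(wire)
received.remove(received.find("Signature"))
recovered_bytes = canonical_bytes(received)
```

In this sketch recovered_bytes equals signed_bytes, even though the attribute order on the wire differs from the canonical form. Worth knowing: this implementation keeps whitespace inside element content by default, so canonicalization irons out markup details like attribute order and quoting, not every difference you can see.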

Canonicalization is hairy, and that’s why you shouldn’t embed signatures within messages. XML Signature suggests two algorithms. Neither works in practice, and instead everyone now uses a follow-on, “exclusive” algorithm that handles XML namespaces more correctly. It contains fun sentences like this:

(This step for xmlns="" is necessary because it is not represented in the XPath data model as a namespace node, but as the absence of a namespace node; see §4.7 Propagation of Default Namespace Declaration in Document Subsets [XML-C14N].)

But whatever, those are incidental design flaws, and for now we’re focusing on the inherent ones.

Here’s a load-bearing sentence in XML Signatures to ponder over:

the validity of a transformed document on the basis of a valid signature should operate over the data that was transformed (including canonicalization) and signed, not the original pre-transformed data

In other words, when Bob processes your message, he can only ever trust the post-canonicalization version of your message, not the message you sent him.

If he does this:

public_key = ...  # known in advance
msg = ...  # load from outside world over http / carrier pigeons / etc
# verify() checks msg.Signature against the canonicalized remainder of msg
assert verify(public_key, msg.Signature, msg.without("Signature"))

# ok cool, process `msg` now...

Then he’s screwed! Because XML canonicalization does fun things like producing this, trusted, authenticated message:

<Message>
    <TheBearerOfThisMessageIs>
        steve.jobs@apple.com.evilcorp.com
    </TheBearerOfThisMessageIs>
</Message>

Out of an untrusted, user-inputted message like:

<Message>
    <TheBearerOfThisMessageIs>
        steve.jobs@apple.com<!-- -->.evilcorp.com
    </TheBearerOfThisMessageIs>
</Message>

So if you had a lapse in judgement, “verified” msg, and then kept using it instead of throwing it away and only ever looking at canonicalize(msg), then code like this:

msg["TheBearerOfThisMessageIs"].children[0].text

returns

steve.jobs@apple.com

instead of what was actually signed, which is:

steve.jobs@apple.com.evilcorp.com

Someone you trust told you to log them in as steve.jobs@apple.com.evilcorp.com; they signed a message to that effect, and the resulting message is basically a self-signed bearer token. But the attacker manipulated the message — and with a single comment, tricked you into thinking the signed message was about steve.jobs@apple.com, dropping the .evilcorp.com. They present you that manipulated, but legit as far as XML Signatures is concerned, message. You log them in as Steve Jobs. Oops!
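You can reproduce the gap with nothing but Python’s standard library. A sketch, with assumptions: xml.dom.minidom plays the naive application (it keeps comments as separate nodes), ET.canonicalize (C14N 2.0, comments dropped by default) plays the canonicalizer, and I’ve removed the whitespace around the email so the values print cleanly.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

untrusted = (
    "<Message><TheBearerOfThisMessageIs>"
    "steve.jobs@apple.com<!-- -->.evilcorp.com"
    "</TheBearerOfThisMessageIs></Message>"
)

# Naive: read the first text node of the message as received.
dom = minidom.parseString(untrusted)
bearer = dom.getElementsByTagName("TheBearerOfThisMessageIs")[0]
naive_value = bearer.childNodes[0].data

# Careful: canonicalize first, then only ever read the canonical form.
canonical = ET.canonicalize(untrusted)
signed_value = ET.fromstring(canonical).findtext("TheBearerOfThisMessageIs")
```

Here naive_value stops at the comment, while signed_value is the full string that the signature actually covers.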

My takeaway here is, yes — this one’s on you. Should have read the spec more carefully. But also, why do we do this? Why are we making chandeliers out of swords of Damocles?

The moral of the story here is:

If you find yourself in a situation where you need to precisely canonicalize messages in order to correctly cryptographically sign them,
Realize you’re in a hole
Put down the shovel

Mixing your data structure semantics with your cryptography semantics is one of the core bad ideas of software engineering.

Fun, extra credit bad ideas in XML Signatures

It doesn’t stop there, of course. As you may have sensed from the presence of two canonicalization algorithms which are both wrong followed by an “exclusive” one that works better, XML Signatures features a number of incidental, unforced errors as well.

Let’s see some of the highlights.

Including an untrusted key in the signature

Remember how you sent Bob a public key out-of-band so that he could verify your messages? Why not just include it in the untrusted message?

Here’s an actual XML Signature in the wild:

<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
  <ds:SignedInfo>
    <ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
    <ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/>
    <ds:Reference URI="#id35528194005172931133953195">
      <ds:Transforms>
        <ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
        <ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
      </ds:Transforms>
      <ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/>
      <ds:DigestValue>tQ3cGy9Kax5v8DdRTNTVPboMtL5viRVZLNmBIgpx/rQ=</ds:DigestValue>
    </ds:Reference>
  </ds:SignedInfo>
  <ds:SignatureValue>
    Jbtjo4MLglMSc6SopDHj2ZdRf8IA0bT5nlLeaysYgGlj0kd3gO6vYFzsybD6EqRiZvrUrOJU8JANuz17vpPxSGLmt8h1N1Uy0vVRpL3VQYU7KNgr6o2xtSU87IzBKCaGfFqPqN4CLaCs1wbKkAdkxKnwdEo6kHE//hAEckDofmKXdEJDihy8h6uUxO/EwKJgg9+G/8UYD3YiKpeFHfJTI0W+rDKLGmPXbRvHNF/JriltOTPSSZ8noQk2fz7WWYyO0F179MDMBDyxRHhA1uOf9JCYr28pCQ9iPQIIQnABVgAdaq++hixIHhvR4jNrwpGItwJb7aqCqd28TuXXzBUkxw==
  </ds:SignatureValue>
  <ds:KeyInfo>
    <ds:X509Data>
      <ds:X509Certificate>
        MIIDqjCCApKgAwIBAgIGAY8W9FSqMA0GCSqGSIb3DQEBCwUAMIGVMQswCQYDVQQGEwJVUzETMBEG
        A1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwNU2FuIEZyYW5jaXNjbzENMAsGA1UECgwET2t0YTEU
        MBIGA1UECwwLU1NPUHJvdmlkZXIxFjAUBgNVBAMMDXRyaWFsLTEwMjI4NjMxHDAaBgkqhkiG9w0B
        CQEWDWluZm9Ab2t0YS5jb20wHhcNMjQwNDI1MjAzMDAyWhcNMzQwNDI1MjAzMTAyWjCBlTELMAkG
        A1UEBhMCVVMxEzARBgNVBAgMCkNhbGlmb3JuaWExFjAUBgNVBAcMDVNhbiBGcmFuY2lzY28xDTAL
        BgNVBAoMBE9rdGExFDASBgNVBAsMC1NTT1Byb3ZpZGVyMRYwFAYDVQQDDA10cmlhbC0xMDIyODYz
        MRwwGgYJKoZIhvcNAQkBFg1pbmZvQG9rdGEuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB
        CgKCAQEAh8g24a5HDZpwtWuA/HP1JuecGMZ1Wh8R3QC/DQb4aNJtNwJlzMN746MQhkEtXI4TYTah
        3bpbJc5jUFunjZdy8I4+pHCa4wS7lf9Z3c2Ptc9R1XzAX9zhC1Cuj01L69vAinNF8JR1tTx1A7im
        pAWqjtKEQAZNsWrjo0TkQVZlU2wY/CLW+w/zRmHmxSzuCHIVtD9SkgPVXr/Wr2X2SFUc0miGc09x
        FKSl1ARIRVf7jrI0hcSpB5lOd4jrZaM6pvYPTHZYsvtvE9IJUtRlD3OAenBeiHBvkzPwbnhIFUm0
        2Rq9Q7Fvr2CMD8+w/vdgFECelHS0euNVx3uOGydnUh9WOQIDAQABMA0GCSqGSIb3DQEBCwUAA4IB
        AQBqUvihKyejxTpV/mcm7KQu4g3NUx5blTa1jRj2jCDfbn3YckqGI9i0j8BAHNaZw56Nu7OIzDrL
        nxsi8uMmdRAJqAQA7iILGAEJuMvHfv2SJkcu2goB9Xl69Kh34UgZd3tucDEgM3cwhUlltU8yV+P2
        +uzhNaHJkDargKeEI1NQG0lvcFJHP5ESTR9idIipJDdBcSxais3wLkRlhvufp3Rr71Z6TylTVvc3
        QwAjCyTmfR2YjhQkVVfWdOEwqOYhyIn2d+gUex0gEGOZqzmMgCD20mNkiL+YTEsz5XqDaUDQsLrS
        whMgwbzHoz7vrWZiwq2K2AYIu8Uh//DZxsDM9g0B
      </ds:X509Certificate>
    </ds:X509Data>
  </ds:KeyInfo>
</ds:Signature>

(That’s the signature for a legitimate SAML assertion issued by Okta.)

That X509Certificate contains an RSA public key. Can you trust it? Nope! An attacker can put anything there, hoping that you use that public key instead of the correct one, which is established out-of-band.

In case you’re curious, here’s the relevant standard-ese:

KeyInfo may contain keys, names, certificates and other public key management information […] However, questions of trust of such key information (e.g., its authenticity or strength) are out of scope of this specification and left to the application.

That’s design committee language for “good luck out there!”
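One way to stay out of trouble is to treat the embedded certificate as a hint at most, and only ever verify against the one you got out-of-band. A sketch under assumptions: pinned_der is whatever certificate bytes you configured in advance, and the function names are invented for this example.

```python
import base64
import hmac
import xml.etree.ElementTree as ET

DS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}


def embedded_certificate(signature_xml: str) -> bytes:
    """Return the bytes inside ds:X509Certificate. Attacker-controlled!"""
    node = ET.fromstring(signature_xml).find(
        "ds:KeyInfo/ds:X509Data/ds:X509Certificate", DS
    )
    if node is None or not node.text:
        raise ValueError("no embedded certificate")
    # The base64 text is usually wrapped across lines, so drop whitespace.
    return base64.b64decode("".join(node.text.split()))


def certificate_to_trust(signature_xml: str, pinned_der: bytes) -> bytes:
    """Return the pinned certificate, rejecting messages that embed another."""
    if not hmac.compare_digest(embedded_certificate(signature_xml), pinned_der):
        raise ValueError("embedded certificate is not the pinned one")
    return pinned_der
```

Note what the function returns: the pinned bytes, never the embedded ones. Ignoring KeyInfo entirely would be just as defensible.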

Signing subsets of the data

Check out this little nugget in the above payload:

<ds:Reference URI="#id35528194005172931133953195">
    ...
</ds:Reference>

That URI is a pointer to somewhere in the payload. I left it out in the previous dump:

<?xml version="1.0" encoding="UTF-8"?>
<saml2p:Response ID="id35528194005172931133953195">
  <ds:Signature>
    ... copy-pasted above ...
  </ds:Signature>
  [... okta-issued information about a user ...]
</saml2p:Response>

The Signature indicates what it’s signing using URI. In theory, you’re supposed to only use the subset of the data that the Signature points to (post-canonicalization, as mentioned above). In practice, a lot of people have their day ruined because they:

Get an XML message
Look for a signature
Check that it’s a valid signature, i.e. it really is a good signature for the data it says it’s good for
Process the entire message

The spec begs you to make this mistake. And attackers can just take your message, and do this:

<?xml version="1.0" encoding="UTF-8"?>
<saml2p:Response>
  <!-- the original, legit message, just relocated somewhere pointless -->
  <saml2p:Response ID="id35528194005172931133953195">
    [... okta-issued information about a user ...]
  </saml2p:Response>
  <!-- end of original message -->

  <!-- the original, legit signature and its pointer is still valid -->
  <ds:Signature>
    ... copy-pasted above ...
  </ds:Signature>

  [... attacker-issued payload ...]
</saml2p:Response>

In other words, just keep the signature there but put the signed legit data in a corner off to the side. People call this general class of nastiness “signature wrapping attacks”. Lots of different variations you can try. All of them caused by a cursed mix of trusted and untrusted data.
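A sketch of one defensive check, with assumptions spelled out: the Signature is expected as a direct child of the root, the Reference must be a same-document #ID pointer, and that ID must belong to the root and to nothing else. The function name is hypothetical, and this does not verify the signature itself; it only ties "what the signature points at" to "what you’re about to process".

```python
import xml.etree.ElementTree as ET

NS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}


def element_covered_by_signature(document_xml: str) -> ET.Element:
    """Return the root only if the Signature's Reference points at it."""
    root = ET.fromstring(document_xml)
    reference = root.find("ds:Signature/ds:SignedInfo/ds:Reference", NS)
    if reference is None:
        raise ValueError("no Signature in the expected place")
    uri = reference.get("URI", "")
    if not uri.startswith("#") or len(uri) < 2:
        raise ValueError("expected a same-document #ID reference")
    target_id = uri[1:]
    # Exactly one element may carry that ID, and it must be the root.
    matches = [e for e in root.iter() if e.get("ID") == target_id]
    if matches != [root]:
        raise ValueError("Signature does not cover the element we would process")
    return root
```

Fed the wrapped document above, this raises: the ID lives on the relocated inner Response, not on the root the application is about to read.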

You can embed multiple signatures in an XML message, and this is actually a thing people do: most real-world SAML assertions have one at the top-level Response level and another at the child Assertion level.

What is the spec’s advice of “only trust post-URI, post-canonicalization messages” even supposed to mean when you have multiple Signature elements in play?

Signing the wrong hash

Let’s take another look at that Signature. I’m gonna remove a bunch of stuff to focus on two pieces of data:

<ds:Signature>
  <ds:SignedInfo>
    ...
    <ds:Reference>
      ...
      <ds:DigestValue>tQ3cGy9Kax5v8DdRTNTVPboMtL5viRVZLNmBIgpx/rQ=</ds:DigestValue>
    </ds:Reference>
  </ds:SignedInfo>
  <ds:SignatureValue>
    Jbtjo4MLglMSc6SopDHj2ZdRf8IA0bT5nlLeaysYgGlj0kd3gO6vYFzsybD6EqRiZvrUrOJU8JANuz17vpPxSGLmt8h1N1Uy0vVRpL3VQYU7KNgr6o2xtSU87IzBKCaGfFqPqN4CLaCs1wbKkAdkxKnwdEo6kHE//hAEckDofmKXdEJDihy8h6uUxO/EwKJgg9+G/8UYD3YiKpeFHfJTI0W+rDKLGmPXbRvHNF/JriltOTPSSZ8noQk2fz7WWYyO0F179MDMBDyxRHhA1uOf9JCYr28pCQ9iPQIIQnABVgAdaq++hixIHhvR4jNrwpGItwJb7aqCqd28TuXXzBUkxw==
  </ds:SignatureValue>
  ...
</ds:Signature>

There’s a DigestValue and a SignatureValue. Here’s how they work:

DigestValue is the SHA-256 of the canonicalized data
SignatureValue is the RSA signature of SignedInfo, which contains DigestValue.

Here’s a fun attack: just sign a different DigestValue, and put that in the SignatureValue. Maybe you get lucky and your target only checks DigestValue-SignatureValue correspondence, and not whether the DigestValue is in fact the correct SHA-256 of the payload.

Why even include a DigestValue? It’s extra work, just to add new ways to screw this up. Just do the normal digest-and-sign, and only put one value to verify.
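To make the two-step dance concrete, here’s a sketch of a verifier that does both checks. Assumptions: HMAC-SHA256 stands in for the RSA signature so this runs on the standard library alone, payload and signed_info are taken to be already canonicalized bytes, and the function names are invented for the example.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

DIGEST_VALUE = ".//{http://www.w3.org/2000/09/xmldsig#}DigestValue"


def sign(key: bytes, data: bytes) -> bytes:
    # ASSUMPTION: HMAC-SHA256 stands in for RSA-SHA256.
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, signed_info: bytes, signature_value: bytes) -> bool:
    # Check 1: SignatureValue really covers SignedInfo.
    signature_ok = hmac.compare_digest(signature_value, sign(key, signed_info))
    # Check 2: the DigestValue inside SignedInfo really is SHA-256(payload).
    claimed = ET.fromstring(signed_info).findtext(DIGEST_VALUE, default="")
    actual = base64.b64encode(hashlib.sha256(payload).digest())
    digest_ok = hmac.compare_digest(claimed.strip().encode("utf-8"), actual)
    # Skipping either check leaves the payload effectively unsigned.
    return signature_ok and digest_ok
```

Drop check 2 and any payload sails through, as long as the attacker leaves SignedInfo and SignatureValue alone.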

Settings, settings, settings

If your goal is to introduce a spec with maximal damage, you would want to make it:

(1) Security-critical
(2) Hard to use correctly
(3) Hard to implement

You do (3) because that way, fewer implementations ever happen. Everyone flocks to them, requesting features. It’s hard to say “no” as a maintainer when you know full well folks don’t have an alternative to you, so you say “yes” to a lot of people, getting you (2).

You get these libraries where everything is possible, nothing is easy, and misconfiguration is vulnerability.

Here are some of the things the W3C folks recommend go into every implementation:

All of XPath is recommended. XSLT is mercifully “optional”.
So far, I’ve only ever talked about the “enveloped signature” approach, where <Signature> goes inside the payload. Almost every cognizable permutation of putting the payload, Signature, SignedInfo, and DigestValue into the same or separate messages is possible.
They say you’re required to implement “Canonical XML”, even though everyone instead uses the “exclusive” variant.
For signatures, you’re required to support the following HMAC algorithms:
- HMAC-SHA1
- HMAC-SHA256
And the following signature algorithms:
- RSAwithSHA256
- ECDSAwithSHA256
- DSAwithSHA1
Oh right, to this point I’ve only mentioned public-key crypto. But yeah, the spec has HMACs because it supports symmetric-key stuff too.
Canonicalizations can be parameterized. For example, you can specify an InclusiveNamespaces child element to the relevant Transform in the Signature, whose PrefixList attribute guarantees that certain XML namespaces are always considered “visible” by that canonicalization algorithm.

There’s so much more. Remember that URI on the Reference inside SignedInfo from earlier? We saw it have values like #id.... Looks like a URL “fragment” part. Because it is a URL fragment part. Because URI is supposed to contain a (possibly but not always) relative-URI. Here’s a fun paragraph:

XML signature applications MUST be able to parse URI syntax. We RECOMMEND they be able to dereference URIs in the HTTP scheme. Dereferencing a URI in the HTTP scheme MUST comply with the Status Code Definitions of [HTTP] (e.g., 302, 305 and 307 redirects are followed to obtain the entity-body of a 200 status code response). Applications should also be cognizant of the fact that protocol parameter and state information, (such as HTTP cookies, HTML device profiles or content negotiation), may affect the content yielded by dereferencing a URI.

Fun! Fun.
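If you’d rather your signature verifier not make HTTP requests on an attacker’s say-so, one option is to accept only same-document fragment references and refuse everything else. A sketch, with a made-up function name. It’s stricter than the spec, which also gives meaning to an empty URI.

```python
from urllib.parse import urlsplit


def same_document_id(reference_uri: str) -> str:
    """Return the ID from a "#some-id" reference; reject anything else."""
    parts = urlsplit(reference_uri)
    if parts.scheme or parts.netloc or parts.path or parts.query:
        raise ValueError("refusing to dereference a non-fragment URI")
    if not parts.fragment:
        raise ValueError("expected a #fragment reference")
    return parts.fragment
```

So "#id35528194005172931133953195" yields its ID, while anything with a scheme, host, or path is rejected before a single byte goes over the network.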

How we implemented XML Signatures

At SSOReady, we need to implement XML Signatures because they are the only way you can tell if a SAML assertion is legitimate. Here’s how we went about building it:

Read the entire spec in its heaving, boundless, expansionist glory.
Put that all aside.
Look at how SAML, and by extension XML Signature, works in working production systems.
Implement against that profile of the specification.

For instance, at SSOReady we don’t “discover” XML signatures, in the way the spec often seems to want you to. We expect the signature is exactly in the place every implementer in practice puts it (rejecting the request if it isn’t there), pluck out exactly the data we’ve decided in advance we care about, and then carry out a specific sequence of checks.

In practice, this looks to the outside like an implementation of the spec. But it’s not. It only implements the subset of the spec that people have mostly converged on.
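To give a flavor of what "a strict subset" can look like, here’s an illustrative sketch. To be loud about it: this is not SSOReady’s actual code. The expected algorithm list is simply copied from the Okta signature shown earlier, and matches_profile is a hypothetical name. The idea is to reject anything that isn’t exactly the shape you planned for, before doing any crypto.

```python
import xml.etree.ElementTree as ET

DS = "{http://www.w3.org/2000/09/xmldsig#}"

# ASSUMPTION: one fixed profile, taken from the example signature in this post.
EXPECTED_ALGORITHMS = {
    "CanonicalizationMethod": ["http://www.w3.org/2001/10/xml-exc-c14n#"],
    "SignatureMethod": ["http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"],
    "Transform": [
        "http://www.w3.org/2000/09/xmldsig#enveloped-signature",
        "http://www.w3.org/2001/10/xml-exc-c14n#",
    ],
    "DigestMethod": ["http://www.w3.org/2001/04/xmlenc#sha256"],
}


def matches_profile(signature_xml: str) -> bool:
    """True only if the Signature uses exactly the expected algorithms."""
    signature = ET.fromstring(signature_xml)
    if signature.tag != DS + "Signature":
        return False
    for name, expected in EXPECTED_ALGORITHMS.items():
        found = [e.get("Algorithm") for e in signature.iter(DS + name)]
        if found != expected:
            return False
    # Exactly one Reference, so there is no question about what is signed.
    return len(signature.findall(DS + "SignedInfo/" + DS + "Reference")) == 1
```

A check like this turns "everything is possible" into "one thing is possible", which is most of the point.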

We think this is the only responsible way to handle something like XML Signatures. You either implement a strict subset yourself, or you tie your fate to something like libxml2 and hope for the best.

Contemporary lessons from the XML era

The core lesson, in my view, from XML Signatures is this: keep it simple, stupid.

XML Signatures are the way they are because the goal was to solve everyone’s problems at once.

The “enveloped signature” design comes from trying to retrofit cryptography into every existing XML-processing system. They couldn’t just say “stick a signature in an HTTP header”, because not everyone was on HTTP.
The URI design and its associated signature wrapping attacks come from a desire to be able to have a signature for data totally independent from any copy of that data. I’m sure somebody liked that, but everyone else was profoundly inconvenienced.
XML canonicalization is a mess because people thought namespaces were a great idea in software and thus a great idea in wire protocols. It’s a good idea for certain documents, but it’s a massive pain for everyone else. The fact that we all get along mostly fine with dumb old JSON suggests that dumb-and-plain is usually good enough.

I’m not saying the designers of XML Signatures are foolish. I’m saying their aims — the aim of unifying and systematizing all of cryptography — left them with no room for the most important thing: simplicity.

In my view, the lesson is this:

Work backwards from what you can do in a simple way.
Focus on actual problems, not potential ones.
Make only specific pains go away.
Don’t try to unify things into one complicated thing, when doing a hundred simple things would do instead.

Present laughter, not utopian bliss.