XML Signatures are a bad idea executed even worse
Let’s say you’re going to send Bob a message, and you want Bob to know it’s you that sent the message just by looking at the message. You’ll probably:
- Make a public/private keypair, and show Bob the public key
- Serialize your message into bytes
- Sign the bytes with the private key
- Concatenate the message and the signature in some sane dumb way, like base64(msg) + "." + base64(signature)
- Send that
Bob checks your message is legit by:
- Splitting your payload into msg and signature
- Checking that the signature does match the msg bytes (he has your public key already)
If that checks out, then Bob can go about his business with msg, knowing it came from you.
Congrats, that’s basically how a JSON Web Token works. It’s simple, it works. Life is good.
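Here’s a minimal Python sketch of that flow. One loud assumption: Python’s standard library has no public-key signatures, so an HMAC with a shared demo key stands in for the sign/verify step. Swap in a real asymmetric signature and the shape stays the same. The names DEMO_KEY, pack, and unpack are made up for illustration.

```python
import base64
import hashlib
import hmac

# ASSUMPTION: a shared secret stands in for the public/private keypair,
# because the standard library ships no asymmetric signature scheme.
DEMO_KEY = b"demo-key-not-for-production"


def sign(msg: bytes, key: bytes = DEMO_KEY) -> bytes:
    """Stand-in for 'sign the bytes with the private key'."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def pack(msg: bytes, key: bytes = DEMO_KEY) -> str:
    """Build base64(msg) + "." + base64(signature)."""
    sig = sign(msg, key)
    return (
        base64.urlsafe_b64encode(msg).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(sig).decode("ascii")
    )


def unpack(payload: str, key: bytes = DEMO_KEY) -> bytes:
    """Split the payload, check the signature, and return the message bytes."""
    # "." is not in the base64 alphabet, so the split is unambiguous.
    msg_b64, sig_b64 = payload.split(".")
    msg = base64.urlsafe_b64decode(msg_b64)
    sig = base64.urlsafe_b64decode(sig_b64)
    if not hmac.compare_digest(sig, sign(msg, key)):
        raise ValueError("signature does not match message bytes")
    return msg
```

The point is that the message bytes are never touched after signing: Bob verifies exactly the bytes he received, and only then parses them.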
eXtensible Markup Language
Software engineers circa 2002 were committed to making their lives difficult, and a cornerstone of this masochism was doing everything with XML if at all possible. XML was hot. People were using its CSS-type thing and transformation language. There were like a dozen competing schema specs. Conferences with keynotes. Super cool.
<Cool very="yes" />
So let’s repeat the above situation, but we’re going to pretend it’s 2002. Instead of reinventing the JWT, you’re gonna use the preferred standard of the time: XML Signatures. Conceptually, it works the same way, because there’s really only one way to skin this cat: convert message to bytes, sign bytes, send both to Bob.
Aside: Why anyone uses XML Signatures today
The joke here is that we’re pretending it’s 2002, but I assure you this story is relevant today. At bigger companies, employees log into all their software from tools like Okta or Microsoft Entra. That works using a decentralized, XML-flavored protocol called SAML. SAML relies entirely on XML Signatures to prove message authenticity. (At SSOReady, we’re in the business of making SAML a breeze to implement, which is why we care about it.)
With that aside, back to our 2002 hypothetical. iPods. Frosted tips. XML.
You’re sending Bob a message.
The central conceit of XML Signatures is that you’re gonna use as many XML-related specs as possible to get this message to Bob.
Step 1 of this endeavor is to realize:
The core bad idea of XML Signatures
I don’t need to concatenate my message and its signature! I can just embed the signature in the message.
This is unwise, because instead of doing XML-y things followed by crypto-y things, you now have to do both simultaneously. You have to take your message:
<Message>
... stuff ...
</Message>
And then print it out to a buffer, sign the bytes in that buffer, and then modify the original message in an XML-aware way to insert the signature back in:
<Message>
<Signature>...</Signature>
... stuff ...
</Message>
How is Bob supposed to verify your message now? He needs to check the exact, byte-for-byte message that you signed.
But that’s not what you sent Bob. You sent Bob something similar, but with extra Signature stuff embedded in that. Bob now needs to exactly undo those modifications to check your message.
How is Bob supposed to know whether your original message was this:
<Message>... stuff ...</Message>
or
<Message>
... stuff ...
</Message>
or
<Message>
    ... stuff ...
</Message>
The answer is you canonicalize your message when you write it out to your byte-buffer to sign. Canonicalization is an XML-aware algorithm that takes your in-memory data structures, and writes them out to an exact set of bytes.
Canonicalization needs to have one critical property. When Bob runs doc.Signature = null and then canonicalizes what’s left of doc, he needs to get exactly the same message that you signed.
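To make the enveloped round trip concrete, here’s a rough Python sketch using only the standard library. Assumptions, loudly: ET.canonicalize (Python 3.8+) implements C14N 2.0 rather than the exclusive algorithm real SAML uses, an HMAC with a demo key stands in for the private-key signature, and the bare Signature element holding a base64 MAC is a toy format, not the real ds:Signature structure. The names sign_enveloped and verify_enveloped are hypothetical.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

DEMO_KEY = b"demo-key-not-for-production"  # ASSUMPTION: stand-in for a keypair


def canonical_bytes(root: ET.Element) -> bytes:
    """Serialize the tree, then canonicalize it to one exact byte string."""
    text = ET.tostring(root, encoding="unicode")
    return ET.canonicalize(text).encode("utf-8")


def sign_enveloped(xml_text: str, key: bytes = DEMO_KEY) -> str:
    """Sign the canonical form, then embed the result inside the message."""
    root = ET.fromstring(xml_text)
    mac = hmac.new(key, canonical_bytes(root), hashlib.sha256).digest()
    sig = ET.Element("Signature")
    sig.text = base64.b64encode(mac).decode("ascii")
    root.insert(0, sig)  # the XML-aware modification of the original message
    return ET.tostring(root, encoding="unicode")


def verify_enveloped(xml_text: str, key: bytes = DEMO_KEY) -> bool:
    """Undo the embedding (doc.Signature = null), re-canonicalize, compare."""
    root = ET.fromstring(xml_text)
    sig = root.find("Signature")
    if sig is None or not sig.text:
        return False
    root.remove(sig)
    expected = hmac.new(key, canonical_bytes(root), hashlib.sha256).digest()
    try:
        claimed = base64.b64decode(sig.text)
    except ValueError:
        return False
    return hmac.compare_digest(claimed, expected)
```

Because both sides hash the canonical form, re-serializing the signed document with a different attribute order or quote style should still verify, while changing the content should not.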
Canonicalization is hairy, and that’s why you shouldn’t embed signatures within messages. XML Signature suggests two algorithms. Neither works in practice, and instead everyone now uses a follow-on, “exclusive” algorithm that handles XML namespaces more correctly. It contains fun sentences like this:
(This step for xmlns="" is necessary because it is not represented in the XPath data model as a namespace node, but as the absence of a namespace node; see §4.7 Propagation of Default Namespace Declaration in Document Subsets [XML-C14N].)
But whatever, those are incidental design flaws, and for now we’re focusing on the inherent ones.
Here’s a load-bearing sentence in XML Signatures to ponder over:
the validity of a transformed document on the basis of a valid signature should operate over the data that was transformed (including canonicalization) and signed, not the original pre-transformed data
In other words, when Bob processes your message, he can only ever trust the post-canonicalization version of your message, not the message you sent him.
If he does this:
public_key = ...  # known in advance
msg = ...  # load from outside world over http / carrier pigeons / etc
# verify() is a stand-in for whatever signature check you use
assert verify(public_key, msg.Signature, msg.without("Signature"))
# ok cool, process `msg` now...
Then he’s screwed! Because XML canonicalization does fun things like producing this trusted, authenticated message:
<Message>
<TheBearerOfThisMessageIs>
steve.jobs@apple.com.evilcorp.com
</TheBearerOfThisMessageIs>
</Message>
Out of an untrusted, user-inputted message like:
<Message>
<TheBearerOfThisMessageIs>
steve.jobs@apple.com<!-- -->.evilcorp.com
</TheBearerOfThisMessageIs>
</Message>
And so if you had a lapse in judgement and just “verified” msg and forgot to throw it away and only ever look at canonicalize(msg), then code like this:
msg["TheBearerOfThisMessageIs"].children[0].text
returns
steve.jobs@apple.com
instead of what was actually signed, which is:
steve.jobs@apple.com.evilcorp.com
Someone you trust told you to log them in as steve.jobs@apple.com.evilcorp.com; they signed a message to that effect, and the resulting message is basically a self-signed bearer token. But the attacker manipulated the message and, with a single comment, tricked you into thinking the signed message was about steve.jobs@apple.com, dropping the .evilcorp.com. They present you that manipulated, but legit as far as XML Signatures is concerned, message. You log them in as Steve Jobs. Oops!
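You can reproduce the mismatch with Python’s standard library. This is a sketch under assumptions: ET.canonicalize (Python 3.8+) implements C14N 2.0, which, like the comment-less canonicalization discussed here, drops comments by default; and the parser below is explicitly configured to keep comment nodes, the way many DOM-style parsers do. Python’s default ElementTree parser discards comments, so it would not show the problem on its own.

```python
import xml.etree.ElementTree as ET

# The attacker-modified message, written on one line to keep whitespace out of it.
UNTRUSTED = (
    "<Message><TheBearerOfThisMessageIs>"
    "steve.jobs@apple.com<!-- -->.evilcorp.com"
    "</TheBearerOfThisMessageIs></Message>"
)

# Naive path: parse what arrived, keeping comment nodes in the tree.
keep_comments = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
naive_root = ET.fromstring(UNTRUSTED, parser=keep_comments)
# .text is only the text before the first child node, and the comment is a child.
naive_value = naive_root.find("TheBearerOfThisMessageIs").text

# Careful path: canonicalize first, then only ever read the canonical form.
canonical = ET.canonicalize(UNTRUSTED)  # comments are stripped here
safe_root = ET.fromstring(canonical)
safe_value = safe_root.find("TheBearerOfThisMessageIs").text

print("naive read:    ", naive_value)
print("canonical read:", safe_value)
```

The naive read stops at the comment, while the canonical read sees the full string that was actually signed. Same document, two different answers, and only one of them is covered by the signature.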
My takeaway here is, yes — this one’s on you. Should have read the spec more carefully. But also, why do we do this? Why are we making chandeliers out of swords of Damocles?
The moral of the story here is:
- If you find yourself in a situation where you need to precisely canonicalize messages in order to correctly cryptographically sign them,
- Realize you’re in a hole
- Put down the shovel
Mixing your data structure semantics with your cryptography semantics is one of the core bad ideas of software engineering.
Fun, extra credit bad ideas in XML Signatures
It doesn’t stop there, of course. As you may have sensed from the presence of two canonicalization algorithms which are both wrong followed by an “exclusive” one that works better, XML Signatures features a number of incidental, unforced errors as well.
Let’s see some of the highlights.
Including an untrusted key in the signature
Remember how you sent Bob a public key out-of-band so that he could verify your messages? Why not just include it in the untrusted message?
Here’s an actual XML Signature in the wild:
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<ds:SignedInfo>
<ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/>
<ds:Reference URI="#id35528194005172931133953195">
<ds:Transforms>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
<ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
</ds:Transforms>
<ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/>
<ds:DigestValue>tQ3cGy9Kax5v8DdRTNTVPboMtL5viRVZLNmBIgpx/rQ=</ds:DigestValue>
</ds:Reference>
</ds:SignedInfo>
<ds:SignatureValue>
Jbtjo4MLglMSc6SopDHj2ZdRf8IA0bT5nlLeaysYgGlj0kd3gO6vYFzsybD6EqRiZvrUrOJU8JANuz17vpPxSGLmt8h1N1Uy0vVRpL3VQYU7KNgr6o2xtSU87IzBKCaGfFqPqN4CLaCs1wbKkAdkxKnwdEo6kHE//hAEckDofmKXdEJDihy8h6uUxO/EwKJgg9+G/8UYD3YiKpeFHfJTI0W+rDKLGmPXbRvHNF/JriltOTPSSZ8noQk2fz7WWYyO0F179MDMBDyxRHhA1uOf9JCYr28pCQ9iPQIIQnABVgAdaq++hixIHhvR4jNrwpGItwJb7aqCqd28TuXXzBUkxw==
</ds:SignatureValue>
<ds:KeyInfo>
<ds:X509Data>
<ds:X509Certificate>
MIIDqjCCApKgAwIBAgIGAY8W9FSqMA0GCSqGSIb3DQEBCwUAMIGVMQswCQYDVQQGEwJVUzETMBEG
A1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwNU2FuIEZyYW5jaXNjbzENMAsGA1UECgwET2t0YTEU
MBIGA1UECwwLU1NPUHJvdmlkZXIxFjAUBgNVBAMMDXRyaWFsLTEwMjI4NjMxHDAaBgkqhkiG9w0B
CQEWDWluZm9Ab2t0YS5jb20wHhcNMjQwNDI1MjAzMDAyWhcNMzQwNDI1MjAzMTAyWjCBlTELMAkG
A1UEBhMCVVMxEzARBgNVBAgMCkNhbGlmb3JuaWExFjAUBgNVBAcMDVNhbiBGcmFuY2lzY28xDTAL
BgNVBAoMBE9rdGExFDASBgNVBAsMC1NTT1Byb3ZpZGVyMRYwFAYDVQQDDA10cmlhbC0xMDIyODYz
MRwwGgYJKoZIhvcNAQkBFg1pbmZvQG9rdGEuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB
CgKCAQEAh8g24a5HDZpwtWuA/HP1JuecGMZ1Wh8R3QC/DQb4aNJtNwJlzMN746MQhkEtXI4TYTah
3bpbJc5jUFunjZdy8I4+pHCa4wS7lf9Z3c2Ptc9R1XzAX9zhC1Cuj01L69vAinNF8JR1tTx1A7im
pAWqjtKEQAZNsWrjo0TkQVZlU2wY/CLW+w/zRmHmxSzuCHIVtD9SkgPVXr/Wr2X2SFUc0miGc09x
FKSl1ARIRVf7jrI0hcSpB5lOd4jrZaM6pvYPTHZYsvtvE9IJUtRlD3OAenBeiHBvkzPwbnhIFUm0
2Rq9Q7Fvr2CMD8+w/vdgFECelHS0euNVx3uOGydnUh9WOQIDAQABMA0GCSqGSIb3DQEBCwUAA4IB
AQBqUvihKyejxTpV/mcm7KQu4g3NUx5blTa1jRj2jCDfbn3YckqGI9i0j8BAHNaZw56Nu7OIzDrL
nxsi8uMmdRAJqAQA7iILGAEJuMvHfv2SJkcu2goB9Xl69Kh34UgZd3tucDEgM3cwhUlltU8yV+P2
+uzhNaHJkDargKeEI1NQG0lvcFJHP5ESTR9idIipJDdBcSxais3wLkRlhvufp3Rr71Z6TylTVvc3
QwAjCyTmfR2YjhQkVVfWdOEwqOYhyIn2d+gUex0gEGOZqzmMgCD20mNkiL+YTEsz5XqDaUDQsLrS
whMgwbzHoz7vrWZiwq2K2AYIu8Uh//DZxsDM9g0B
</ds:X509Certificate>
</ds:X509Data>
</ds:KeyInfo>
</ds:Signature>
(That’s the signature for a legitimate SAML assertion issued by Okta.)
That X509Certificate contains an RSA public key. Can you trust it? Nope! An attacker can put anything there, hoping that you use that public key instead of the correct one, which is established out-of-band.
In case you’re curious, here’s the relevant standard-ese:
KeyInfo may contain keys, names, certificates and other public key management information […] However, questions of trust of such key information (e.g., its authenticity or strength) are out of scope of this specification and left to the application.
That’s design committee language for “good luck out there!”
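One defensive habit is to treat the embedded certificate as a hint at most, and compare it against a fingerprint you pinned out-of-band. Here’s a small Python sketch of that comparison. The function name embedded_cert_matches_pin and the idea of pinning a SHA-256 fingerprint of the DER bytes are assumptions for illustration, not something the spec prescribes.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

DS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}


def embedded_cert_matches_pin(signature_xml: str, pinned_sha256_hex: str) -> bool:
    """Return True only if the cert inside KeyInfo is the one we already trust.

    pinned_sha256_hex is the hex SHA-256 of the certificate's DER bytes,
    obtained out-of-band (ASSUMPTION: that is how the pin is stored).
    """
    root = ET.fromstring(signature_xml)
    node = root.find(".//ds:X509Certificate", DS)
    if node is None or not node.text:
        return False
    try:
        # b64decode ignores the line breaks that wrap the certificate text.
        der = base64.b64decode(node.text)
    except ValueError:
        return False
    fingerprint = hashlib.sha256(der).hexdigest()
    return hmac.compare_digest(fingerprint, pinned_sha256_hex.lower())
```

Even when this returns True, the signature check itself should use the key you stored out-of-band. The embedded copy never gets promoted to a source of trust.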
Signing subsets of the data
Check out this little nugget in the above payload:
<ds:Reference URI="#id35528194005172931133953195">
...
</ds:Reference>
That URI is a pointer to somewhere in the payload, which I left out of the previous dump:
<?xml version="1.0" encoding="UTF-8"?>
<saml2p:Response ID="id35528194005172931133953195">
<ds:Signature>
... copy-pasted above ...
</ds:Signature>
[... okta-issued information about a user ...]
</saml2p:Response>
The Signature indicates what it’s signing using URI. In theory, you’re supposed to only use the subset of the data that the Signature points to (post-canonicalization, as mentioned above). In practice, a lot of people have their day ruined because they:
- Get an XML message
- Look for a signature
- Check that it’s a valid signature, i.e. it really is a good signature for the data it says it’s good for
- Process the entire message
The spec begs you to make this mistake. And attackers can just take your message, and do this:
<?xml version="1.0" encoding="UTF-8"?>
<saml2p:Response>
<!-- the original, legit message, just relocated somewhere pointless -->
<saml2p:Response ID="id35528194005172931133953195">
[... okta-issued information about a user ...]
</saml2p:Response>
<!-- end of original message -->
<!-- the original, legit signature and its pointer is still valid -->
<ds:Signature>
... copy-pasted above ...
</ds:Signature>
[... attacker-issued payload ...]
</saml2p:Response>
In other words, just keep the signature there but put the signed legit data in a corner off to the side. People call this general class of nastiness “signature wrapping attacks”. Lots of different variations you can try. All of them caused by a cursed mix of trusted and untrusted data.
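A structural check along these lines catches the relocation trick shown above. It’s a sketch with simplifying assumptions: element names are unprefixed to keep the example short, the policy “the signature must cover the root element we’re about to process” is one choice among several, and the function name element_covered_by_signature is made up. It checks structure only; the cryptographic verification still has to happen separately.

```python
import xml.etree.ElementTree as ET


def element_covered_by_signature(xml_text: str) -> ET.Element:
    """Return the element the signature points at, or raise ValueError.

    ASSUMED POLICY: exactly one Signature directly under the root, its
    Reference URI is a same-document "#id" pointer, that ID is unique,
    and it names the root element itself.
    """
    root = ET.fromstring(xml_text)
    refs = root.findall("./Signature/Reference")
    if len(refs) != 1:
        raise ValueError("expected exactly one Signature/Reference under the root")
    uri = refs[0].get("URI", "")
    if not uri.startswith("#") or len(uri) == 1:
        raise ValueError("Reference URI must be a same-document #id pointer")
    target_id = uri[1:]
    matches = [el for el in root.iter() if el.get("ID") == target_id]
    if len(matches) != 1:
        raise ValueError("referenced ID must appear exactly once")
    if matches[0] is not root:
        raise ValueError("signature covers some other element, not the one being processed")
    return root
```

With the wrapped document, the only element carrying the referenced ID is the relocated inner Response, so the last check fails and the attacker payload is never read.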
You can embed multiple signatures in an XML message, and this is actually a thing people do: most real-world SAML assertions have one at the top-level Response level and one at the child Assertion level.
What is the spec’s advice of “only trust post-URI, post-canonicalization messages” even supposed to mean when you have multiple Signature elements in play?
Signing the wrong hash
Let’s take another look at that Signature. I’m gonna remove a bunch of stuff to focus on two pieces of data:
<ds:Signature>
<ds:SignedInfo>
...
<ds:Reference>
...
<ds:DigestValue>tQ3cGy9Kax5v8DdRTNTVPboMtL5viRVZLNmBIgpx/rQ=</ds:DigestValue>
</ds:Reference>
</ds:SignedInfo>
<ds:SignatureValue>
Jbtjo4MLglMSc6SopDHj2ZdRf8IA0bT5nlLeaysYgGlj0kd3gO6vYFzsybD6EqRiZvrUrOJU8JANuz17vpPxSGLmt8h1N1Uy0vVRpL3VQYU7KNgr6o2xtSU87IzBKCaGfFqPqN4CLaCs1wbKkAdkxKnwdEo6kHE//hAEckDofmKXdEJDihy8h6uUxO/EwKJgg9+G/8UYD3YiKpeFHfJTI0W+rDKLGmPXbRvHNF/JriltOTPSSZ8noQk2fz7WWYyO0F179MDMBDyxRHhA1uOf9JCYr28pCQ9iPQIIQnABVgAdaq++hixIHhvR4jNrwpGItwJb7aqCqd28TuXXzBUkxw==
</ds:SignatureValue>
...
</ds:Signature>
There’s a DigestValue and a SignatureValue. Here’s how they work:
- DigestValue is the SHA-256 of the canonicalized data
- SignatureValue is the RSA signature of SignedInfo, which contains DigestValue.
Here’s a fun attack: just sign a different DigestValue, and put that in the SignatureValue. Maybe you get lucky and your target only checks DigestValue-SignatureValue correspondence, and not whether the DigestValue is in fact the correct SHA-256 of the payload.
Why even include a DigestValue? It’s extra work, just to add new ways to screw this up. Just do the normal digest-and-sign, and only put one value to verify.
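Here’s a toy Python model of the two-level check, to show what the sloppy verifier misses. Assumptions: an HMAC with a demo key stands in for the RSA signature, SignedInfo is reduced to a byte string wrapping the DigestValue, and canonicalization is skipped entirely. The names sloppy_verify and strict_verify are hypothetical.

```python
import base64
import hashlib
import hmac

DEMO_KEY = b"demo-key-not-for-production"  # ASSUMPTION: stand-in for an RSA key


def make_signed_info(data: bytes) -> bytes:
    """Toy SignedInfo: just the base64 SHA-256 digest of the data, wrapped in tags."""
    digest_value = base64.b64encode(hashlib.sha256(data).digest())
    return b"<SignedInfo><DigestValue>" + digest_value + b"</DigestValue></SignedInfo>"


def sign(signed_info: bytes, key: bytes = DEMO_KEY) -> bytes:
    """Toy SignatureValue: computed over SignedInfo, not over the data."""
    return hmac.new(key, signed_info, hashlib.sha256).digest()


def sloppy_verify(data: bytes, signed_info: bytes, signature_value: bytes,
                  key: bytes = DEMO_KEY) -> bool:
    """Checks SignedInfo against SignatureValue only; never looks at data."""
    return hmac.compare_digest(signature_value, sign(signed_info, key))


def strict_verify(data: bytes, signed_info: bytes, signature_value: bytes,
                  key: bytes = DEMO_KEY) -> bool:
    """Checks the signature AND that SignedInfo's digest matches the data."""
    if not hmac.compare_digest(signature_value, sign(signed_info, key)):
        return False
    # Rebuild SignedInfo from the data we actually received and compare.
    return hmac.compare_digest(signed_info, make_signed_info(data))
```

If someone swaps the payload but leaves SignedInfo and SignatureValue alone, sloppy_verify still says yes, because nothing it checks depends on the payload. strict_verify has to do both halves, which is exactly the extra surface area being complained about.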
Settings, settings, settings
If your goal is to introduce a spec with maximal damage, you would want to make it:
1. Security-critical
2. Hard to use correctly
3. Hard to implement
You do (3) because that way, fewer implementations ever happen. Everyone flocks to them, requesting features. It’s hard to say “no” as a maintainer when you know full well folks don’t have an alternative to you, so you say “yes” to a lot of people, getting you (2).
You get these libraries where everything is possible, nothing is easy, and misconfiguration is vulnerability.
Here are some of the things the W3C folks recommend go into every implementation:
- So far, I’ve only ever talked about the “enveloped signature” approach, where <Signature> goes inside the payload. Almost every cognizable permutation of putting the payload, Signature, SignedInfo, and DigestValue into the same or separate messages is possible.
- They say you’re required to implement “Canonical XML”, even though everyone instead uses the “exclusive” variant.
- For signatures, you’re required to support the following HMAC algorithms:
  - HMAC-SHA1
  - HMAC-SHA256
  And the following signature algorithms:
  - RSAwithSHA256
  - ECDSAwithSHA256
  - DSAwithSHA1
- Oh right: up to this point I’ve only mentioned public-key crypto. But yeah, the spec has HMACs because it also supports symmetric-key stuff.
- Canonicalizations can be parameterized. For example, you can specify an InclusiveNamespaces child element to the relevant Transform in the Signature, whose PrefixList attribute guarantees that certain XML namespaces are always considered “visible” by that canonicalization algorithm.
There’s so much more. Remember that URI on the Reference inside SignedInfo from earlier? We saw it have values like #id... . Looks like a URL “fragment” part. Because it is a URL fragment part. Because URI is supposed to contain a URI reference, which is often, but not always, relative. Here’s a fun paragraph:
XML signature applications MUST be able to parse URI syntax. We RECOMMEND they be able to dereference URIs in the HTTP scheme. Dereferencing a URI in the HTTP scheme MUST comply with the Status Code Definitions of [HTTP] (e.g., 302, 305 and 307 redirects are followed to obtain the entity-body of a 200 status code response). Applications should also be cognizant of the fact that protocol parameter and state information, (such as HTTP cookies, HTML device profiles or content negotiation), may affect the content yielded by dereferencing a URI.
Fun! Fun.
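One way to cope with all those knobs is to refuse to turn them: accept only the exact algorithms you expect and only same-document references. Below is a Python sketch of such a profile check. The allowlist simply mirrors the algorithm URIs from the Okta signature shown earlier; treating that set as “the profile”, and the function name profile_violations, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}

# ASSUMED PROFILE: the algorithm URIs seen in the sample signature above.
ALLOWED_ALGORITHMS = {
    "CanonicalizationMethod": {"http://www.w3.org/2001/10/xml-exc-c14n#"},
    "SignatureMethod": {"http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"},
    "Transform": {
        "http://www.w3.org/2000/09/xmldsig#enveloped-signature",
        "http://www.w3.org/2001/10/xml-exc-c14n#",
    },
    "DigestMethod": {"http://www.w3.org/2001/04/xmlenc#sha256"},
}


def profile_violations(signature_xml: str) -> list:
    """Return a list of human-readable reasons to reject this Signature."""
    root = ET.fromstring(signature_xml)
    problems = []
    for tag, allowed in ALLOWED_ALGORITHMS.items():
        nodes = root.findall(f".//ds:{tag}", DS)
        if not nodes:
            problems.append(f"missing {tag}")
        for node in nodes:
            algorithm = node.get("Algorithm")
            if algorithm not in allowed:
                problems.append(f"{tag}: unexpected algorithm {algorithm!r}")
    references = root.findall(".//ds:Reference", DS)
    if len(references) != 1:
        problems.append("expected exactly one Reference")
    for reference in references:
        uri = reference.get("URI", "")
        # Same-document fragments only: never fetch anything over HTTP.
        if not uri.startswith("#") or len(uri) == 1:
            problems.append(f"Reference URI {uri!r} is not a same-document #id")
    return problems
```

An empty list means the signature stays inside the profile and is worth passing on to the actual cryptographic checks. Anything else gets rejected before a single byte is canonicalized or dereferenced.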
How we implemented XML Signatures
At SSOReady, we need to implement XML Signatures because they are the only way you can tell if a SAML assertion is legitimate. Here’s how we went about building it:
- Read the entire spec in its heaving, boundless, expansionist glory.
- Put that all aside.
- Look at how SAML, and by extension XML Signature, works in working production systems.
- Implement against that profile of the specification.
For instance, at SSOReady we don’t “discover” XML signatures, in the way the spec often seems to want you to. We expect the signature is exactly in the place every implementer in practice puts it (rejecting the request if it isn’t there), pluck out exactly the data we’ve decided in advance we care about, and then carry out a specific sequence of checks.
In practice, this looks to the outside like an implementation of the spec. But it’s not. It only implements the subset of the spec that people have mostly converged on.
We think this is the only responsible way to handle something like XML Signatures. You either implement a strict subset yourself, or you tie your fate to something like libxml2 and hope for the best.
Contemporary lessons from the XML era
The core lesson, in my view, from XML Signatures is this: keep it simple, stupid.
XML Signatures are the way they are because the goal was to solve everyone’s problems at once.
- The “wrapped Signature” design comes from trying to retrofit cryptography into every existing XML-processing system. They couldn’t just say “stick a signature in an HTTP header”, because not everyone was on HTTP.
- The URI design and its associated signature wrapping attacks come from a desire to be able to have a signature for data totally independent from any copy of that data. I’m sure somebody liked that, but everyone else was profoundly inconvenienced.
- XML canonicalization is a mess because people thought namespaces were a great idea in software and thus a great idea in wire protocols. It’s a good idea for certain documents, but it’s a massive pain for everyone else. The fact that we all get along mostly fine with dumb old JSON suggests that dumb-and-plain is usually good enough.
I’m not saying the designers of XML Signatures are foolish. I’m saying their aims — the aim of unifying and systematizing all of cryptography — left them with no room for the most important thing: simplicity.
In my view, the lesson is this:
- Work backwards from what you can do in a simple way.
- Focus on actual problems, not potential ones.
- Make only specific pains go away.
- Don’t try to unify things into one complicated thing, when doing a hundred simple things would do instead.