Why SCIM exists

Imagine you’re running a relatively large company, one with a few thousand employees. All of those employees use at least some software to do their jobs. It’s probably safe to assume that you’re dealing with hundreds of different SaaS applications across the company. You’ll have an app for approving expenses, an app for managing salespeople’s compensation, an app for piping data into your data warehouse, and much more. There’s an awfully long list of stuff.

Every employee needs access to some subset of apps. They need to do their jobs, after all. But you can’t give everyone access to everything. That’d cause all kinds of security, compliance, and practical problems. You need a way to assign different permissions to different people.

To handle access and permissions all in one centralized place, companies tend to use IT management software like Entra, Okta, or OneLogin (among many others); people tend to describe these tools as identity providers.

An identity provider (IDP) behaves a bit like a database. It maintains a list of employees along with a bunch of information about each person. Similarly, it maintains a list of different software applications. It keeps track of the mappings between people and applications. It’s very easy for the IT team to modify and create relationships between records.

Simply having a list of users and their access privileges in a database doesn’t help anyone much, though. The identity provider also needs to communicate information about users with other software.

The identity provider basically needs to communicate three kinds of changes to other software:

The addition of new users (e.g. new hires)
The change of any existing user’s attributes (e.g. name or job title)
The removal of any existing users (e.g. departing employees)

Identity providers typically rely on a standard called SCIM (the System for Cross-domain Identity Management) for these three communication tasks. They use SCIM to make every integration with other software look roughly the same, which eliminates the need for complicated bespoke integrations with the myriad applications they need to support.

What SCIM (basically) does

At a certain level of abstraction, a SCIM implementation looks a bit like a CS101 problem set. All we’re doing is making one list in your software look like another list in your customer’s software.

Put very simply, SCIM just defines some rules for the JSON that the identity provider sends and the JSON that the identity provider expects to receive in response. The JSON we’re trading with the identity provider exists solely to help us perform matching CRUD operations.

Let’s go through a conceptual example.

Suppose you’re selling a new inventory management system to Dunder Mifflin. Their IT team wants you to provision users programmatically from their identity provider. They have a list of users that need access to the inventory software:

Kevin Malone, Accountant
Darryl Philbin, Shipping Manager
Creed Bratton, Quality Assurance Representative

Great – via SCIM, you’ll get a pretty standardized block of JSON telling you which users to create. With that block of JSON, you’ll modify your Users table to include records for Kevin, Darryl, and Creed.

Suppose Darryl gets transitioned to a new role as Marketing Director. Great, IT updates his title in the identity provider. Within a few moments, you’ll get a standard block of JSON telling you to change Darryl’s job title. You’ll again just modify your Users table. Nothing too crazy.

Given Darryl’s job change, the Dunder Mifflin IT team decides he doesn’t need access to his account in the inventory management system anymore. Great, they just remove the mapping between Darryl and your software in their identity provider. Before long, you receive a standard block of JSON that tells you to deprovision – remove – Darryl’s account. You’ll make the corresponding changes in your Users table.

That’s all that SCIM is trying to do.
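To make the walkthrough concrete, here is a minimal sketch of those three changes applied to an in-memory stand-in for a Users table. The function names, the dict-based "table", and the email addresses for Kevin and Darryl are illustrative assumptions rather than anything SCIM prescribes.

```python
# Hypothetical in-memory stand-in for your Users table, keyed by email.
users = {}

def provision(email, name, title):
    """Create a user record (someone newly granted access)."""
    users[email] = {"name": name, "title": title, "active": True}

def update(email, **changes):
    """Change attributes on an existing record."""
    users[email].update(changes)

def deprovision(email):
    """Flag the account as removed rather than deleting the row."""
    users[email]["active"] = False

# The identity provider tells us to create three users...
provision("kevin.malone@example.com", "Kevin Malone", "Accountant")
provision("darryl.philbin@example.com", "Darryl Philbin", "Shipping Manager")
provision("creed.bratton@example.com", "Creed Bratton", "Quality Assurance Representative")

# ...then to change Darryl's title, and finally to remove his access.
update("darryl.philbin@example.com", title="Marketing Director")
deprovision("darryl.philbin@example.com")
```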

What SCIM isn’t

We often meet people who think SCIM support means that they need to make major changes to their software. This really should not be the case. Here, I’ll go through a few common misconceptions.

SCIM doesn’t really have anything to do with compliance. SCIM isn’t really related to SOC 2, even though there’s probably overlap in the kinds of customers who care about SCIM and the customers who care about SOC 2.

SCIM doesn’t really have anything to do with data retention. This is usually a separate conversation you might need to have with your customer.

SCIM doesn’t have any direct effect on your single sign-on implementation. Although it’s typical for people to use SCIM in combination with SAML SSO, these two standards actually don’t need to exist together or interact at all. It would be unusual, but you could use SCIM to manage users that use vanilla passwords to access your software.

SCIM doesn’t have any direct effect on how you manage your sessions. You can use whatever session management tools you’d like. Relatedly, SCIM doesn’t require single log-out support. If you receive an instruction from your customer to deprovision a user, you don’t typically need to revoke any active sessions belonging to that user.

SCIM doesn’t require major changes to your users schema. As long as you have a way of translating the JSON you receive into the desired CRUD operations, you can keep the schema you already have.

SCIM doesn’t even really mean you have to support hard-deleting users or their data. You should set clear expectations with your customers, but it’s usually sufficient from your customer’s perspective that de-provisioning a user results in their account appearing no longer to exist. If that simply means adding some boolean column in your database that looks like IS_DELETED, that’s probably fine.

SCIM doesn’t require real-time updates. In practice, it’s usually fine if you’re processing updates every few hours. Many companies’ identity providers can’t support real-time updates anyway.

How SCIM works at a (minimally) technical level

Earlier, I mentioned that SCIM just performs CRUD operations via JSON. We should take that relatively literally. SCIM will only ever handle creating, reading, updating, and deleting records.

To do so, SCIM maps pretty familiar HTTP verbs onto these CRUD tasks. We’ll need to support GET and POST, as you would expect. We’ll also need to spend time thinking about PUT, PATCH, and DELETE, which can get annoying.

There’s something really important to bear in mind as we venture a little deeper into SCIM here. If your customer wants your software to hook into their identity provider, your software becomes the server – and your customer’s identity provider becomes the client. This might feel a little weird at first. After all, your customer’s identity provider stores the data you need. But it should make sense as we go.

Client/server relationship and authentication in SCIM

For any given SCIM operation, your customer’s identity provider will send one or more requests to your software. You need to process the request correctly, then you need to respond in a manner that the identity provider expects and understands.

Your software will play a passive role in SCIM. It will stay online and process requests as they roll in. The customer’s identity provider is the client. Your software is the server.

Given its role as the client, your customer’s identity provider needs to authenticate itself to you. It needs to prove that it’s actually the identity provider – and not some attacker – to make the changes it’s requesting. (It would be really bad if anyone could come along and provision users!)

We have a few different ways of authenticating SCIM clients, but bearer tokens are the most widely supported option. You as the server need to generate a secret, share it with your customer, and have the customer’s identity provider present that secret in an HTTP header when it makes requests. You’ll consider the presentation of a valid bearer token to be sufficient proof of identity and, consequently, you’ll honor any HTTP request that correctly presents a valid bearer token.
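A minimal sketch of that bearer token check might look like the following. It assumes the request headers arrive as a plain dict and that you store one token per customer. The function names are hypothetical, and a real deployment would more likely store a hash of the token than the token itself.

```python
import hmac
import secrets

def generate_scim_token() -> str:
    """Generate a random secret to share with the customer's identity provider."""
    return secrets.token_urlsafe(32)

def is_authorized(headers: dict, expected_token: str) -> bool:
    """Return True only if the Authorization header carries the expected bearer token."""
    value = headers.get("Authorization", "")
    scheme, _, presented = value.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

Any request that fails this check should be rejected (typically with a 401) before your code touches any user data.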

The data that we handle in SCIM operations

SCIM’s creators built it around a generic concept that they call a resource. In this context, we should understand a resource to mean a particular kind of record.

In principle, we can represent basically anything as a SCIM resource. In practice, though, we pretty much only care about two of them. SCIM has users, and it has groups. People don’t tend to use SCIM for anything else.

Users in SCIM work pretty much like you’d expect. They’re just records of the people who use your software. You can think of groups as lists of users. (We can also have groups of groups.)

Read operations in SCIM

Let’s say you’ve just correctly configured a client/server trust relationship with your customer’s identity provider.

Before anything else happens, your customer’s identity provider will send you an HTTP GET.

The identity provider simply hits your /Users endpoint, in this case with a userName query parameter. It’s effectively asking, “Tell me what data you have for creed.bratton@example.com.” You’ll reply with a standard response code and some JSON.

In this case, we don’t have any data for Creed, so we’ll respond with a status code 200 and the following JSON:

{
    "itemsPerPage": 0,
    "Resources": [],
    "schemas": [
        "urn:ietf:params:scim:api:messages:2.0:ListResponse"
    ],
    "startIndex": 1,
    "totalResults": 0
}

Notice the schemas property here. It references urn:ietf:params:scim:api:messages:2.0:ListResponse, the SCIM spec’s built-in message type for query results. User records themselves reference urn:ietf:params:scim:schemas:core:2.0:User, the built-in concept of users from the SCIM spec.
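As a rough sketch, the server side of this lookup could build its reply as below. It uses the ListResponse message format from RFC 7644. It also assumes the userName has already been extracted from the query string and that users live in a dict keyed by id, both simplifications. The function name is hypothetical.

```python
LIST_RESPONSE_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:ListResponse"

def list_users_by_username(users: dict, user_name: str) -> dict:
    """Build a SCIM list response containing every user with the given userName."""
    matches = [u for u in users.values() if u["userName"] == user_name]
    return {
        "schemas": [LIST_RESPONSE_SCHEMA],
        "totalResults": len(matches),
        "startIndex": 1,
        "itemsPerPage": len(matches),
        "Resources": matches,
    }
```

An empty Resources list with a 200 status is how you say "no such user yet"; it is not an error condition.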

Creation operations in SCIM

The identity provider sees that we don’t have any data for Creed yet – based on the JSON we sent back – and so it instructs us to create a record.

To do so, it will use an HTTP POST with the following request body:

{
    "name": {
        "familyName": "Bratton",
        "givenName": "Creed"
    },
    "userName": "creed.bratton@example.com"
}

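We just take this JSON and use our preferred method of creating a user record for Creed in our backend. We then respond with a status code 201 and the newly created record, including the id we assigned to it. The identity provider will use that id to refer to Creed in later requests.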
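Here is a minimal sketch of handling that POST, assuming an in-memory dict as the user store and server-generated UUIDs as ids. The function name is hypothetical. The 409 response with a scimType of uniqueness follows RFC 7644’s rule for duplicate resources.

```python
import uuid

USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
ERROR_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:Error"

def create_user(users: dict, body: dict):
    """Handle a SCIM POST: store the user and return (status, response body)."""
    user_name = body["userName"]
    if any(u["userName"] == user_name for u in users.values()):
        # A duplicate userName is reported as 409 Conflict.
        return 409, {
            "schemas": [ERROR_SCHEMA],
            "scimType": "uniqueness",
            "status": "409",
            "detail": "userName already exists",
        }
    user_id = str(uuid.uuid4())
    record = {
        "schemas": [USER_SCHEMA],
        "id": user_id,
        "userName": user_name,
        "name": body.get("name", {}),
        "active": body.get("active", True),
        "meta": {"resourceType": "User"},
    }
    users[user_id] = record
    return 201, record
```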

Update operations in SCIM

Updates in SCIM will use either HTTP PUT or HTTP PATCH, depending on the identity provider. Some identity providers tend only to use PUT. Others tend only to use PATCH. It’s kind of chaotic.

PUT operations in SCIM are pretty simple. They basically communicate: replace this user’s data with the data I’m showing you here. For example, Okta might send the following PUT request:

PUT /scim/v2/Users/23a35c27-23d3-4c03-b4c5-6443c09e7173 HTTP/1.1
User-Agent: Okta SCIM Client 1.0.0
Authorization: <Authorization credentials>

{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "id": "23a35c27-23d3-4c03-b4c5-6443c09e7173",
    "userName": "test.user@okta.local",
    "name": {
        "givenName": "Another",
        "middleName": "Excited",
        "familyName": "User"
    },
    "emails": [{
        "primary": true,
        "value": "test.user@okta.local",
        "type": "work",
        "display": "test.user@okta.local"
    }],
    "active": true,
    "groups": [],
    "meta": {
        "resourceType": "User"
    }
}

You as the server will just look up the user by the id in the request path (here, the record for test.user@okta.local) and replace its data. Not too bad!
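A sketch of that replacement logic, again assuming a dict keyed by id as the user store and a hypothetical function name:

```python
ERROR_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:Error"

def replace_user(users: dict, user_id: str, body: dict):
    """Handle a SCIM PUT: swap the stored record for the one in the request."""
    if user_id not in users:
        return 404, {
            "schemas": [ERROR_SCHEMA],
            "status": "404",
            "detail": f"User {user_id} not found",
        }
    replacement = dict(body)
    # The id comes from the URL path; never let the body change it.
    replacement["id"] = user_id
    users[user_id] = replacement
    return 200, replacement
```

Because PUT is a full replacement, any attribute missing from the request body disappears from the stored record. That is the behavior that distinguishes it from PATCH.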

PATCH requests in SCIM can be a little more complicated. PATCH represents a partial update, whereas PUT represents a complete replacement. Here’s an example PATCH from the SCIM specification:

PATCH /Groups/acbf3ae7-8463-...-9b4da3f908ce
Host: example.com
Accept: application/scim+json
Content-Type: application/scim+json
Authorization: Bearer h480djs93hd8
If-Match: W/"a330bc54f0671c9"

{
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations":[{
            "op":"add",
            "path":"members",
            "value":[{
                    "display": "Babs Jensen",
                    "$ref": "https://example.com/v2/Users/2819c223...413861904646",
                    "value": "2819c223-7f76-453a-919d-413861904646"
                }
            ]
        }
    ]
}

Notice the Operations array here. The SCIM specification allows PATCH operations to add, remove, or replace data. This turns out to be a little complicated. (More on this later.)
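To give a feel for what handling the Operations array involves, here is a deliberately narrow sketch that only understands operations on a group’s members. The function names and the in-memory group dict are assumptions. It treats attribute names case-insensitively and ignores everything else the spec allows, such as other paths and more general filters.

```python
import re

# Matches paths of the form: members[value eq "some-id"]
MEMBER_FILTER = re.compile(r'^members\[value eq "([^"]+)"\]$')

def lower_keys(obj: dict) -> dict:
    """SCIM attribute names are case-insensitive, so normalize them."""
    return {key.lower(): value for key, value in obj.items()}

def patch_group(group: dict, patch: dict) -> dict:
    """Return a copy of the group with the members operations applied."""
    members = list(group.get("members", []))
    for raw in patch["Operations"]:
        operation = lower_keys(raw)
        op = operation["op"].lower()
        path = operation.get("path", "")
        values = [lower_keys(m) for m in operation.get("value", [])]
        filtered = MEMBER_FILTER.match(path)
        if op == "add" and path == "members":
            known = {m["value"] for m in members}
            members += [m for m in values if m["value"] not in known]
        elif op == "replace" and path == "members":
            members = values
        elif op == "remove" and path == "members":
            members = []
        elif op == "remove" and filtered:
            members = [m for m in members if m["value"] != filtered.group(1)]
        else:
            raise ValueError(f"unsupported operation: {op} {path!r}")
    return {**group, "members": members}
```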

Delete operations in SCIM

According to the SCIM specification, the identity provider should send an HTTP DELETE request, something like the following:

DELETE /Users/2819c223-7f76-453a-919d-413861904646
Host: example.com
Authorization: Bearer h480djs93hd8
If-Match: W/"c310cd84f0281b7"

This is sometimes how things work, but it’s not how things always work in practice.

For example, Okta doesn’t want to use DELETE. Instead, they PUT or PATCH a record such that the active property gets set to false.
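One way to cope with both behaviors is to route them to the same soft-delete logic, sketched below. Treating active set to false as equivalent to deletion is an assumption you should confirm with your customer. The is_deleted flag and the function names are hypothetical.

```python
def handle_delete(users: dict, user_id: str) -> int:
    """Handle an HTTP DELETE for a user; 204 No Content signals success."""
    if user_id not in users or users[user_id].get("is_deleted"):
        return 404
    # Soft-delete: keep the row but flag it so the account stops appearing.
    users[user_id]["is_deleted"] = True
    return 204

def handle_active_change(users: dict, user_id: str, active: bool) -> int:
    """Handle a PUT or PATCH whose net effect is to set the active attribute."""
    if user_id not in users:
        return 404
    # Assumption: an inactive user is treated the same as a deleted one.
    users[user_id]["is_deleted"] = not active
    return 200
```

Note that the active path is reversible, since an identity provider may later set active back to true, whereas a deleted user would normally be recreated with a fresh POST.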

Is it a good idea to build SCIM?

The SCIM specification is basically good, but has some subtle details

We really like the SCIM specification. Compared to other standards (looking at you, SAML), SCIM makes a lot of sense. It’s conceptually pretty simple, and it doesn’t come with major design flaws.

It does have some subtle quirks, though. I’ll mention just a few here.

As I mentioned earlier, PATCH can get a bit complicated. You have to handle a bunch of different kinds of operations to support PATCH. Don’t worry about the content – but here’s an excerpt from the SCIM spec on how PATCH’s add operation is supposed to work:

The result of the add operation depends upon what the target location indicated by “path” references:

If omitted, the target location is assumed to be the resource itself. The “value” parameter contains a set of attributes to be added to the resource.

If the target location does not exist, the attribute and value are added.

If the target location specifies a complex attribute, a set of sub-attributes SHALL be specified in the “value” parameter.

If the target location specifies a multi-valued attribute, a new value is added to the attribute.

If the target location specifies a single-valued attribute, the existing value is replaced.

If the target location specifies an attribute that does not exist (has no value), the attribute is added with the new value.

If the target location exists, the value is replaced.

If the target location already contains the value specified, no changes SHOULD be made to the resource, and a success response SHOULD be returned. Unless other operations change the resource, this operation SHALL NOT change the modify timestamp of the resource.

That’s a decent number of ifs to support! Individually, none of these is too bad. But taken as a whole, they represent a decent chunk of code that you’ll have to write. More importantly, they represent an awful lot of tests that you’ll have to write. Yuck.
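For a sense of scale, here is a partial sketch of just the add rules quoted above. It is restricted to top-level attribute names as paths, with no sub-attribute paths and no filters. The function name and the in-place dict mutation are illustrative choices, not something the spec dictates.

```python
def apply_add(resource: dict, operation: dict) -> bool:
    """Apply one PATCH add operation in place; return True if anything changed."""
    path = operation.get("path")
    value = operation["value"]
    if path is None:
        # No path: the value is a set of attributes to add to the resource itself.
        changed = False
        for name, attr_value in value.items():
            changed |= apply_add(resource, {"path": name, "value": attr_value})
        return changed
    current = resource.get(path)
    if isinstance(current, list):
        # Multi-valued attribute: append only values not already present.
        new_items = value if isinstance(value, list) else [value]
        missing = [item for item in new_items if item not in current]
        current.extend(missing)
        return bool(missing)
    if isinstance(current, dict) and isinstance(value, dict):
        # Complex attribute: merge the supplied sub-attributes.
        merged = {**current, **value}
        if merged == current:
            return False
        resource[path] = merged
        return True
    if current == value:
        # Already holds this value: succeed without touching the resource.
        return False
    # Attribute absent or single-valued: set (or replace) the value.
    resource[path] = value
    return True
```

The boolean return value is there so a caller can leave the resource’s modify timestamp alone when nothing changed, as the last rule requires. Remove and replace each need their own set of cases on top of this.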

I alluded to another quirk earlier. SCIM stakes out some strangely specific opinions, such as the expected instant messaging providers (e.g. Yahoo) associated with a built-in user attribute. It does not, however, stake out any strong opinion on the meaning of active. Whether active represents a concept like is_deleted or something else altogether is left up to you and the identity provider. That just creates some weird, needless ambiguity.

Open source repos like Authentik’s are just littered with weird issues resulting from ambiguity. No one seems to have all of the answers.

Identity providers don’t always implement the spec

Some identity providers do some really weird stuff in non-compliance with the SCIM spec. And then they don’t document their weird choices properly. (I’m looking at you, Microsoft!) You just have to collide with their weird SCIM APIs in the real world and tweak your implementation as you go.


Microsoft publishes a list of its known SCIM non-compliance issues. You may notice that an issue called Update PATCH behavior to ensure compliance (such as active as boolean and proper group membership removals) still has a planned fix date of TBD … on an article last updated in October 2023.

It turns out that Microsoft’s default behavior sends a boolean value as a string:

{
  "schemas": [
      "urn:ietf:params:scim:api:messages:2.0:PatchOp"
  ],
  "Operations": [
      {
          "op": "Replace",
          "path": "active",
          "value": "False"
      }
  ]
}

It should really be sending the following JSON:

{
  "schemas": [
      "urn:ietf:params:scim:api:messages:2.0:PatchOp"
  ],
  "Operations": [
      {
          "op": "replace",
          "path": "active",
          "value": false
      }
  ]
}

You can force Microsoft to send you the proper JSON if you use a certain feature flag (aadOptscim062020), but that’s really not an obvious solution! You really have to dig. It’s more practical just to accept that Microsoft misbehaves and modify your code accordingly.
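In practice, "modify your code accordingly" can be as small as a normalization step that runs before your PATCH logic. The sketch below assumes each operation arrives as a dict shaped like the examples above. The function name is hypothetical.

```python
def normalize_operation(operation: dict) -> dict:
    """Return a copy of a PATCH operation with op lowercased and active as a bool."""
    normalized = dict(operation)
    normalized["op"] = str(operation.get("op", "")).lower()
    value = operation.get("value")
    if operation.get("path") == "active" and isinstance(value, str):
        lowered = value.strip().lower()
        if lowered not in ("true", "false"):
            raise ValueError(f"unexpected value for active: {value!r}")
        # Convert the string form ("False") into a real boolean (False).
        normalized["value"] = lowered == "true"
    return normalized
```

Values for other paths are left untouched, so a job title that happens to be the string "False" is not coerced.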

This sort of stuff is very time-consuming and demoralizing to resolve.

You probably shouldn’t implement SCIM from scratch

I really do not recommend building SCIM in-house. To be clear, you absolutely could. We’re not exactly splitting the atom here.

It’s just that you definitely have better things to spend your time on – things that are closer to the problems your customers care most about.

Although SCIM lacks much conceptual complexity, it comes with a bunch of annoying baggage. If you build SCIM yourself, it will become someone’s de facto job to babysit the SCIM API and debug unforeseen subtleties. Your customers will have a not-so-great experience. You will likely be very unhappy and distracted.

This is a case where you should probably look for an off-the-shelf solution and move on.
