On trying hard not to capture value
In taking SSOReady to market, we have chosen to offer a strong product comfortably below its market price. Sometimes our customers express confusion, even skepticism. I hope in this post to articulate our strategic bet with something that approaches rigor.
It’s mostly in the service of my own thinking, but maybe others will find it interesting.
Comments on the SaaS model
I’ll start with relatively brief comments on SaaS as a business. I frequently find that even professional investors misunderstand the SaaS model, so I’ll frame my later comments with some context.
We most commonly understand SaaS to describe the sale of some software’s use over a period of time. Conventionally, we’re talking about an ordinary subscription with an annual contract, but we can also mean some usage-based or otherwise unconventional monetization scheme.
Critically, the SaaS model implies recurring fees in perpetuity (or at least over the lifetime of a customer, since a customer who stops paying is no longer entitled to use the software in question). In this sense, SaaS looks very unlike many other businesses; we cannot calculate unit economics in SaaS as we might for fast-moving consumer goods.
To gauge the efficiency of a software business, we often estimate a customer acquisition cost (or CAC), a rough approximation of the upfront expenditures needed to win a customer. We’ll use our CAC estimate to project a payback period, usually the number of months elapsed before accumulated gross profit from the customer exceeds CAC.
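As a minimal sketch of the payback idea, assuming a constant monthly gross profit per customer (a simplification; real contracts vary) and counting the month in which accumulated gross profit first reaches or exceeds CAC. The function name and the numbers are hypothetical.

```python
def payback_period_months(cac, monthly_gross_profit):
    """Return the first month in which accumulated gross profit reaches CAC.

    Returns None when the customer never pays back (non-positive gross profit).
    Assumes gross profit is the same every month, which is a simplification.
    """
    if monthly_gross_profit <= 0:
        return None
    months = 0
    accumulated = 0.0
    while accumulated < cac:
        months += 1
        accumulated += monthly_gross_profit
    return months


# Hypothetical numbers: $30,000 to acquire, $2,500 gross profit per month.
print(payback_period_months(cac=30_000, monthly_gross_profit=2_500))  # 12
```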
This post explores a similar, albeit slightly different framework for imagining unit economics in SaaS. It will not be an especially practical framework, except insofar as it structures my own thinking.
Comments on CAC
We have to estimate customer acquisition costs using some judgment. Unless you’re running a high-volume consumer e-commerce business and spending nearly every dollar on digital advertising, any CAC number surfaced from a dashboard will bear no resemblance to reality.
We have to make some guesses about which expenses go where. What percentage of an account manager’s salary gets counted toward new customer acquisition? What percentage of OOH marketing do we allocate to the partnerships business?
We’ll sum up some guess about a sensible allocation of sales and marketing dollars during a window of time, then we’ll divide by the quantity of customers we got. If we’re doing something really fancy, we might lag sales and marketing spending by a few months to adjust for long deal cycles. But it’s still basically just stuff we spent divided by number of companies that we got as customers.
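Here is a rough sketch of that arithmetic, including the optional lag. The `allocation` share is exactly the kind of judgment call described above, and every name and number below is a hypothetical assumption rather than anything we actually measure.

```python
def estimate_cac(monthly_spend, monthly_new_customers, allocation=1.0, lag_months=0):
    """Allocated sales-and-marketing spend divided by customers won.

    monthly_spend[k] and monthly_new_customers[k] describe the same calendar month k.
    With lag_months = L, customers won in month k are matched to spend from month k - L.
    allocation is the guessed share of spend attributable to new-customer acquisition.
    """
    if lag_months < 0:
        raise ValueError("lag_months must be non-negative")
    if lag_months:
        spend = monthly_spend[:-lag_months]
        customers = monthly_new_customers[lag_months:]
    else:
        spend, customers = monthly_spend, monthly_new_customers
    total_customers = sum(customers)
    if total_customers == 0:
        raise ValueError("no customers acquired in the window")
    return allocation * sum(spend) / total_customers


# Hypothetical six-month window.
spend = [100_000, 120_000, 80_000, 90_000, 110_000, 100_000]
new_customers = [2, 3, 1, 4, 5, 5]

print(round(estimate_cac(spend, new_customers, allocation=0.8)))                # 24000
print(round(estimate_cac(spend, new_customers, allocation=0.8, lag_months=2)))  # 20800
```

The lagged figure differs only because it pairs later customers with earlier spending; neither number is more "true" than the other.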
Conventional lifetime value (LTV)
Many companies spend more CAC on a sale than they’ll earn back in the entire first year – or sometimes first several years – of a customer’s life. They’ll have long payback periods before breakeven. And that’s usually fine, as long as the customer’s lifetime is sufficiently long.
As a gut-check for whether some CAC makes practical sense, we might estimate some lifetime value, or LTV. We’re just guessing the total amount a given customer will pay us.
For lack of a better alternative, we’ll usually assume some churn rate and assume a constant contract value, often the contract value at the initial time of acquisition.
If we assume a 20% churn rate each year and a constant $100,000 annual contract value, then we can conclude that average customer lifetime value totals $500,000. (Just a convergent geometric series.)
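To make the geometric series concrete, here is a small sketch. It assumes the customer pays the full contract value in year zero and survives into year \(n\) with probability \(0.8^n\); the function names are my own.

```python
def simple_ltv(annual_contract_value, annual_churn_rate):
    """Closed form of ACV * (1 + r + r^2 + ...), where r = 1 - churn."""
    if not 0 < annual_churn_rate <= 1:
        raise ValueError("churn rate must be in (0, 1]")
    return annual_contract_value / annual_churn_rate


def simple_ltv_by_summation(annual_contract_value, annual_churn_rate, years=200):
    """Partial sum of the same series, truncated at a long but finite horizon."""
    retention = 1 - annual_churn_rate
    return sum(annual_contract_value * retention ** n for n in range(years))


print(round(simple_ltv(100_000, 0.20)))               # 500000
print(round(simple_ltv_by_summation(100_000, 0.20)))  # 500000
```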
Generalizing LTV
We can elaborate a little bit on that crude approximation, though.
Let’s say \(f_i(t)\) represents the contracted price for a customer \(i\) in month \(t\). We can calculate some naive idea of the customer’s lifetime value as \( \sum_{t = 0}^{\infty} f_i(t)\). We’re just summing up all the payments they’ll make.
Without any loss of generality, we can choose to imagine that \(f\) represents some quantity like gross profit, so we’ll just imagine away COGS and choose to believe that gross margin is 100%. We’ll capture churn simply by acknowledging that \(f\) varies in \(t\) such that \(f_i(t) = 0\) on certain intervals of \(t\) for certain customers \(i\).
Now, because \(f_i\) varies in time, we may choose to apply some positive discount rate \(d\), such that lifetime value \(v_i = \sum_{t = 0}^{\infty} \frac{f_i(t)}{(1 + d)^t}\)
The discount here represents the cost of capital, which we can loosely imagine as the time value of money: we strictly prefer a dollar today to an otherwise equivalent dollar tomorrow. Why? Imagine I’m an investor. I can apply my cash to any number of productive ends – things that make me money. A dollar today is worth \((1+x) * $1\) in the future for some \(x>0\). When we discount by \(d\) above, we’ve just moved the cost of capital term from one side of the equation to another.
A given customer makes economic sense if and only if \(v_i = \sum_{t = 0}^{\infty} \frac{f_i(t)}{(1 + d)^t} > c_i\) for some customer acquisition cost \(c_i\). That is, allowing for the fact that capital is itself expensive, we need to make more money from a customer over time than we spend on that customer. Not terribly complicated stuff, really.
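A minimal sketch of that test, assuming we truncate the infinite sum to a finite list of payments (anything after the list is treated as \(f_i(t) = 0\), i.e. the customer has churned). The function names and figures are hypothetical.

```python
def discounted_ltv(payments, discount_rate):
    """v_i = sum over t of f_i(t) / (1 + d)^t, for a finite list of payments.

    payments[t] plays the role of f_i(t); periods past the end of the list count as zero.
    """
    return sum(f_t / (1 + discount_rate) ** t for t, f_t in enumerate(payments))


def makes_economic_sense(payments, discount_rate, cac):
    """True when discounted lifetime value strictly exceeds acquisition cost."""
    return discounted_ltv(payments, discount_rate) > cac


# Hypothetical customer: $1,000 a month for 36 months, 1% monthly discount rate.
payments = [1_000] * 36
print(makes_economic_sense(payments, discount_rate=0.01, cac=20_000))  # True
print(makes_economic_sense(payments, discount_rate=0.01, cac=36_000))  # False
```

The second case fails even though the undiscounted payments total exactly $36,000, which is the whole point of discounting.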
How we typically try to improve LTV
Under the conventional SaaS LTV framework, we have relatively few ways to push \(v_i-c_i\) further above zero – particularly since we can treat \(d\) as an exogenous constant.
We can either shift \(f_i(t)\) up on some interval of \(t\) or drive down \(c_i\). We can either make more money from the customer or spend less to acquire the customer.
Lots of options present themselves for improving \(f_i(t)\), though. An improved onboarding flow might boost retention. Or maybe a cross-sold product yields some net increase in sales. We might similarly find a bunch of ways to improve \(c_i\), e.g. by cutting back on some inefficient marketing program or debugging some suboptimal conversion rate high up in the funnel.
It’s all pretty conventional, familiar stuff.
Adding an additional layer to lifetime value
I’d argue a customer’s value to a business often exceeds the discounted sum of its lifetime payments. You see, customers have a way of affecting other customers’ behavior. We know this intuitively in consumer markets. I’m reminded of the Midtown Four Seasons, which became known as a destination for power lunches. The clientele skewed rich and famous … because the clientele was rich and famous.
Critically, a customer may acquire another customer on your behalf and at functionally no incremental cost. Sometimes, SaaS takesters like to use terms like network effects and viral loops to describe similar behavior. But real virality is pretty rare, and we don’t really need virality for referral-driven behavior to be relevant. We might even be interested in linear referral behaviors like the below:
We can ultimately trace the two men on the right back to the man in the orange shirt. Their acquisition as customers relied on the original man’s becoming a customer.
The point is, \(v_i = \sum_{t = 0}^{\infty} \frac{f_i(t)}{(1 + d)^t} > c_i\) doesn’t sufficiently capture a customer’s real lifetime value. There’s some other indirect, residual component \(g_i(\cdot)\). Using \(g_i(\cdot)\), we can identify some alternative measure of lifetime value \(v_i^*\) such that \(v_i^* = v_i + g_i(\cdot)\).
Attempting to model \(g_i(\cdot)\)
Let’s imagine there exist basically two kinds of customers: (1) inorganic customers acquired directly and (2) organic customers acquired indirectly via referral from existing customers.
We can imagine all referral-acquired customers arranged in a tree that ultimately points up to a directly-acquired customer as the tree’s root. We’ll attribute all lifetime value for the indirect customers back to their respective roots. We’ll further say that \(g_i(\cdot)\) for a directly-acquired customer consists solely of the lifetime value of its associated referral-acquired customers.
Let’s take some directly-acquired customer \(i\), to which we can link \(N\) referral-acquired customers. Let’s say each referral-acquired customer \(j\) makes some payment \(f_j(t)\) at time \(t\).
We can then say:
\[ v_i^* = v_i + g_i(\cdot)\]
\[ v_i^* = \sum_{t = 0}^{\infty} \frac{f_i(t)}{(1 + d)^t} + g_i(\cdot)\]
\[ v_i^* = \sum_{t = 0}^{\infty} \frac{f_i(t)}{(1 + d)^t} + \sum_{t = 0}^{\infty} \sum_{j = 1}^N \frac{f_j(t)}{(1 + d)^t}\]
\[v_i^* = \sum_{t = 0}^{\infty} \left( \frac{f_i(t)}{(1 + d)^t} + \sum_{j = 1}^N \frac{f_j(t)}{(1 + d)^t} \right) \]
There’s relatively little that anyone can do to change \( \frac{f_j(t)}{(1+ d)^t}\) for some \(t\). We should consider referral-acquired customers basically exogenous; they just kind of … show up, fortuitously.
But what if we can vary \(N\)? What if we can do something to make referral-acquired customers more abundant?
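To see how \(N\) enters the arithmetic, here is a small sketch of \(v_i^*\). It assumes the referral tree has already been flattened into a list of payment streams, one per referral-acquired customer, all indexed on the root customer's clock (so a referral who arrives in period 2 has zeros in periods 0 and 1). The names and numbers are hypothetical.

```python
def discounted_ltv(payments, discount_rate):
    """Discounted sum of a finite payment stream; later periods count as zero."""
    return sum(f_t / (1 + discount_rate) ** t for t, f_t in enumerate(payments))


def total_ltv(direct_payments, referral_payment_streams, discount_rate):
    """v_i^* = v_i + g_i, where g_i sums the discounted value of N referral customers.

    referral_payment_streams holds one list f_j(t) per referral-acquired customer j,
    so N is simply len(referral_payment_streams).
    """
    v_i = discounted_ltv(direct_payments, discount_rate)
    g_i = sum(discounted_ltv(stream, discount_rate) for stream in referral_payment_streams)
    return v_i + g_i


# Hypothetical root customer with two referrals, no discounting for readability.
direct = [10, 10]
referrals = [[0, 5], [0, 0, 5]]
print(total_ltv(direct, referrals, discount_rate=0.0))  # 30.0
```

Adding a third stream to `referrals` raises the total without touching the root customer's own payments, which is all that varying \(N\) means here.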
Value, price, and customer surplus
Earlier, I introduced \(f_i(t)\) to represent customer \(i\)’s total payments at some time \(t\). We can write \(f_i(t)\) as the product of a price vector and a quantity vector over a set of distinct products, indexed here from \(0\) to \(k\).
\[ f_i(t) = \begin{bmatrix} p_{i,0,t} & p_{i,1,t} & ... & p_{i,k,t} \end{bmatrix} \begin{bmatrix} q_{i,0,t} \\ q_{i,1,t} \\ ... \\ q_{i,k,t} \end{bmatrix} \]
Expanding \(f_i(t)\) like this helps us think in terms of price. Suppose every customer perceives some value in a given product. That customer will pay for the product if and only if their perceived value exceeds the price.
So using \(u\) to describe value and \(s\) to represent consumer surplus, we can rewrite \(f_i(t)\) as the following:
\[ f_i(t) = \begin{bmatrix} u_{i,0,t} - s_{i,0,t} & u_{i,1,t} - s_{i,1,t} & ... & u_{i,k,t} - s_{i,k,t} \end{bmatrix} \begin{bmatrix} q_{i,0,t} \\ q_{i,1,t} \\ ... \\ q_{i,k,t} \end{bmatrix} \]
Distributing our quantity terms and aggregating across products, we can identically rewrite this as:
\[ f_i(t) = U_{i,t} - S_{i,t} \]
Conventionally, a rational company would try hard to set prices \(p\) to minimize \(s\). Any positive \(s\) on any interval of \(t\) for any product unambiguously reduces \(f_i(t)\). Under an ordinary lifetime value framework, you want to price as closely as you can to perceived value.
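A quick numerical check of the identity \(f_i(t) = U_{i,t} - S_{i,t}\) for a single customer in a single period, taking per-unit surplus as \(s = u - p\). The two-product catalog and every figure below are made up for illustration.

```python
def payment(prices, quantities):
    """f_i(t): price vector times quantity vector for one customer in one period."""
    return sum(p * q for p, q in zip(prices, quantities))


def aggregate_value_and_surplus(values, prices, quantities):
    """Return (U, S): total perceived value and total consumer surplus.

    Per-unit surplus is s = u - p, so U - S equals the payment by construction.
    """
    total_value = sum(u * q for u, q in zip(values, quantities))
    total_surplus = sum((u - p) * q for u, p, q in zip(values, prices, quantities))
    return total_value, total_surplus


# Hypothetical two-product example.
values = [120, 50]     # perceived value per unit, u
prices = [100, 40]     # price per unit, p
quantities = [3, 10]   # units bought, q

f = payment(prices, quantities)
U, S = aggregate_value_and_surplus(values, prices, quantities)
print(f, U, S)  # 700 860 160
```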
Establishing the propensity to refer
But we know lifetime value consists of more than just \(f_i(t)\). Earlier, we said that \(v_i^* = v_i +g_i(\cdot)\). And \(g_i(\cdot)\) depends on \(N\), the number of referral-driven customers.
What if we think of \(N\) as a function increasing in \(s\)? What if giving customers a better deal makes them more likely to refer other customers?
Let’s suppose every customer has some propensity to refer other customers that’s bounded from above. We’ll pin down that bound with an arbitrary constant \(\lambda_i\), where \(\lambda_i > 1 \).
Let’s say for simplicity that the customer knows \(U_{i,t}\) and \(S_{i, t}\) in perpetuity, such that we can treat \( \frac{\sum_{t = 0}^{\infty}S_{i,t}}{\sum_{t = 0}^{\infty}U_{i,t}} \) as a constant (with respect to \(t\)) on the interval [0, 1]. As a shorthand, we’ll just call this quantity \(\sigma_i\).
Then we can express the customer’s propensity to refer as \(r_i = \lambda_i^{\sigma_i}-1\). As \(\sigma_i\) approaches zero (meaning aggregate customer surplus approaches zero), then \(\lambda_i^{\sigma_i}\) approaches 1, and consequently \(r_i\) approaches zero. Relatedly, as \(\sigma_i\) approaches 1, then \(r_i\) approaches some positive quantity \(\lambda_i-1\).
Then we can express \(N\) as a time-invariant function increasing monotonically in \(r_i\). Say, for instance, \( N = \lfloor{r_i} \rfloor = \lfloor{ \lambda_i^{\sigma_i} - 1} \rfloor \). It’s therefore increasing in \( S \), which in turn decreases in price when holding perceived value constant.
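Here is a small sketch of \(r_i\) and \(N\) as defined above. I've picked \(\lambda_i = 16\) purely because its fractional powers come out as round numbers; nothing about that value is empirical.

```python
import math


def propensity_to_refer(lam, sigma):
    """r_i = lam ** sigma - 1, with lam > 1 and surplus share sigma in [0, 1]."""
    if lam <= 1:
        raise ValueError("lambda must exceed 1")
    if not 0 <= sigma <= 1:
        raise ValueError("sigma must lie in [0, 1]")
    return lam ** sigma - 1


def referral_count(lam, sigma):
    """N = floor(r_i): the number of referral-acquired customers."""
    return math.floor(propensity_to_refer(lam, sigma))


# Hypothetical lambda of 16: more surplus share, more referrals.
for sigma in (0.0, 0.5, 1.0):
    print(sigma, referral_count(16, sigma))
# 0.0 0
# 0.5 3
# 1.0 15
```

Because of the floor, \(N\) moves in steps: small changes in \(\sigma_i\) may not change the count at all.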
Returning to lifetime value
If we choose to imagine lifetime value as the following:
\[v_i^* = \sum_{t = 0}^{\infty} \left( \frac{f_i(t)}{(1 + d)^t} + \sum_{j = 1}^N \frac{f_j(t)}{(1 + d)^t} \right) \]
Then we may choose identically to imagine lifetime value as:
\[v_i^* = \sum_{t = 0}^{\infty} \left( \frac{f_i(t)}{(1 + d)^t} + \sum_{j = 1}^{\lfloor{ \lambda_i^{\sigma_i} - 1} \rfloor} \frac{f_j(t)}{(1 + d)^t} \right) \]
We should then recognize that even bringing price in line with value for some product \(k\) at time \(t\) effects an ambiguous outcome. It raises the direct component of lifetime value but lowers the residual component. And we don’t immediately know which matters more.
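A toy illustration of that ambiguity, under loudly stated assumptions: the direct customer's discounted perceived value is fixed at 100, we capture the share \(1 - \sigma_i\) of it, and each referral-acquired customer is worth an exogenous 40 in discounted terms. None of these figures come from our business; they only show that the answer flips with \(\lambda_i\).

```python
import math


def total_value(sigma, lam, direct_perceived_value=100.0, value_per_referral=40.0):
    """Toy v_i^*: captured direct value plus exogenous value from N referrals.

    Direct component: (1 - sigma) * direct_perceived_value.
    Residual component: floor(lam ** sigma - 1) * value_per_referral.
    All inputs are hypothetical, already-discounted quantities.
    """
    direct = (1 - sigma) * direct_perceived_value
    referrals = math.floor(lam ** sigma - 1)
    return direct + referrals * value_per_referral


for lam in (2, 16):
    print(lam, [total_value(sigma, lam) for sigma in (0.0, 0.5, 1.0)])
# 2 [100.0, 50.0, 40.0]
# 16 [100.0, 170.0, 600.0]
```

With a low \(\lambda_i\), pricing at perceived value wins. With a high \(\lambda_i\), leaving surplus on the table wins by a wide margin.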
Back to our business
We understand – or at least believe – basically two things to be true about distribution in our market:
- We face very high direct customer acquisition costs. Engineers are discerning, particular buyers. They don’t like to be sold stuff.
- Our customers have unusually high propensities to refer other potential customers. Our users very frequently discuss potential vendors with their peers; our users find software interesting.
Consequently, we expect that, compared to other businesses, we should invest relatively more effort in generating customer surplus. Compared to other businesses, we will recoup a relatively greater share of LTV from \( g(\cdot) \).
To be explicit, I don’t care about short-term margins. Through some medium-run horizon, our business will focus on delivering great products instead of maximizing \(f(\cdot)\). I expect that deferring monetization yields greater long-run \(v^*\) relative to customer acquisition costs.