Threat modeling isn’t just for software security; you can even threat model threat modeling. When a major software incident occurs, the first thing we should be asking is “show us the threat model”.
In any endeavor, unless you anticipate threats you risk scrambling to patch things up reactively, learning of each one the hard way. What could be more obvious and fundamental than this? Threat modeling is a systematic approach to software security that arose decades ago and was popularized in 1999 at Microsoft (the author concocted the STRIDE acronym as part of the responsible task force). In recent years I have come to see that threat modeling is not only critical for software engineering but useful for a broad range of applications — as will be shown, we can even threat model threat modeling!
Readers unfamiliar with threat modeling can learn the basic concept from Adam Shostack’s elegant Four Questions rendition: What are we working on? What can go wrong? What are we going to do about it? Did we do a good job? For our purposes here, the second question is the heart of the matter — What can go wrong? — because it leads us to enumerate potential threats. This much about what to do is straightforward, but how to identify threats without missing any big ones is not so simple. Additionally, the first question gives context, and the third and fourth entail considering mitigations and then evaluating the effectiveness of the defenses implemented.
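To make the Four Questions concrete, here is a minimal sketch of one entry in a threat model, organized around them. All names and fields are illustrative assumptions for this post, not a standard schema.

```python
from dataclasses import dataclass, field

# One record in a hypothetical threat model, with a field per question.
@dataclass
class ThreatEntry:
    component: str          # Q1: what are we working on?
    threat: str             # Q2: what can go wrong?
    mitigations: list = field(default_factory=list)  # Q3: what are we going to do about it?
    verified: bool = False  # Q4: did we do a good job?

# Example entry (purely illustrative):
entry = ThreatEntry(
    component="update channel",
    threat="malformed content file crashes the endpoint agent",
)
entry.mitigations.append("validate content files before deployment")
entry.verified = True
```

A real model is a collection of such entries, reviewed until every known threat has at least one mitigation and a verification status.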
The recent Crowdstrike incident is a stunning example in software demonstrating how threat modeling could have helped. We don’t know if they did or didn’t threat model, tried to but missed an important threat, or identified the threat but then failed to prevent its occurrence, but given the reality that the event happened, one of those three things must be true. (As of this writing in August 2024, we have no information about Crowdstrike threat modeling.) Using threat modeling concepts we can build a Q&A decision tree, but we need answers to actually learn what happened.
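The Q&A decision tree mentioned above can be sketched as a tiny function. The three branches correspond to the three possibilities named in the paragraph; the inputs are unknown as of this writing, so this only shows how the questions narrow down what must have happened.

```python
# Hypothetical decision tree: given answers to two yes/no questions,
# identify which of the three explanations applies.
def diagnose(modeled: bool, threat_identified: bool) -> str:
    if not modeled:
        return "no threat model: the threat was never considered"
    if not threat_identified:
        return "threat model existed but missed this threat"
    return "threat identified, but prevention failed in execution"
```

Given that the incident occurred, exactly one of these branches must describe reality; the tree is useless without the answers.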
More importantly, the lens of threat modeling helps us understand what went wrong, which if disclosed helps everyone else learn of the disaster. If we are kept in the dark then we learn nothing and only risk repetitions of similar events going forward. Threat modeling as we know it began for software, but what important human endeavor exists that it wouldn’t be a good idea to methodically consider upfront: What can go wrong?
People proactively anticipate threats all the time (in its simplest form it’s called “worrying”) concerning all manner of things, but what makes threat modeling unique is that it’s methodical. Done right, threat modeling enumerates all the threats (to an appropriate level of rigor), then proposes one or more ways of addressing each. Mitigations are the point of the exercise, and for a given threat these may be in part or in full: avoiding, reducing, or raising visibility of the risk, transferring the risk to a trusted party, purchasing insurance against it happening, or accepting the risk as unavoidable or not worth averting because it’s survivable.
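The mitigation options just listed can be written down as a simple enumeration; a minimal sketch, with names of my own choosing rather than any standard taxonomy:

```python
from enum import Enum

# The mitigation strategies from the paragraph above, as an enum.
class Mitigation(Enum):
    AVOID = "avoid the risk entirely"
    REDUCE = "reduce likelihood or impact"
    RAISE_VISIBILITY = "raise visibility of the risk"
    TRANSFER = "transfer the risk to a trusted party"
    INSURE = "purchase insurance against it"
    ACCEPT = "accept the risk as survivable"

def plan(threat: str, choices: list[Mitigation]) -> dict:
    # A threat may be addressed partly or fully by any combination of strategies.
    return {"threat": threat, "mitigations": [c.value for c in choices]}
```

For example, `plan("supplier outage", [Mitigation.REDUCE, Mitigation.INSURE])` records that a single threat is being both reduced and insured against.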
Anytime an endeavor experiences a setback that takes the team by surprise, threat modeling could probably have averted the problem or at least provided an opportunity to plan ahead and take action proactively. Since trouble has a way of surprising us, it’s incredible what a wide range of things this applies to: buying goods or services, crafting legislation, military planning, starting a business, investing, choosing a college and degree program, getting married, you name it. If you don’t consider what could possibly go wrong up front, you can easily make a bad decision or act unprepared.
In the aftermath of the Crowdstrike incident we also have a great example of how difficult it is for responsible companies with software products to disclose details openly and transparently — even though doing so is clearly essential, both to give customers the complete picture they need to recover and to regain trust by demonstrating how the company will change to prevent any repetition. Legal liability is clearly a big factor we must leave to the lawyers, but even in terms of simple human psychology it’s always hard to share details about your very worst episode.
Crowdstrike is slowly disclosing more and more details now, but undoubtedly they are walking on very thin legal ice as they do so. They blog to “dispel some common misinterpretations,” but such misinformation is hardly surprising given that it fills an information vacuum. Having a full threat model published in advance would nicely shut down such speculation (and commentary that ignores it could easily be tagged as amateurish and unprofessional by comparison).
Fortunately there’s an easy solution. We should be sharing proactively — threat models, policy, process, designs, test regimens, monitoring — based first on pride of work, and also as the best sales tool ever. Then when things go sideways we have established context, making it easy to disclose what specifically went wrong despite all the great work demonstrated. I think people would generally respond positively to more technical transparency, and of course doing such planning transparently would make these incidents that much less likely.
Someone might suggest that making details of the system public is risky, that educating customers tips off competitors to the “secret sauce,” but I see that as a coward’s response. Threat modeling is way too important for anyone to hold monopoly power, and the description of your highly robust system design isn’t the hard part — reliably innovating and executing professionally is. (The plan to win the 100m race is to run it in world record time, but that hardly makes it easy for anyone to win: actually achieving that result is the trick.) In the end, wouldn’t this put companies in stronger legal positions, and surely avert many possible disasters from occurring in the first place?
Done well, threat modeling can be an invigorating activity, shining light on potential problems at the best time to deal with them — at the design phase when it’s easiest to do something about it. Developing threat models might seem like a lot of work, but shouldn’t a professional company have done all this groundwork in the first place? What parts of anticipating threats to the business and anticipating the needs of its customers is optional? Releasing it might be frightening to some unused to practicing transparency, but why shouldn’t you share the impeccable work behind your products with pride? If security is important, hire a consultant to review everything to suggest improvements and ensure there are no embarrassing errors or omissions. If you are afraid of public exposure, start by sharing the work with potential customers under NDA if you must.
Skipping threat modeling entirely or “doing it in your head” isn’t a viable approach for professional software that others depend on: it simply cannot be good practice to make no concerted attempt to identify risks so you can anticipate them. That said, threat modeling need not be a major project to be effective (learn how easy it can be starting with the Threat Modeling Manifesto).
When system designers threat model privately without sharing the results — even if they manage to adapt the design to accommodate the threats they find — by not explicitly calling those threats out they are rolling the dice: later changes by others might break the mitigations, or developers might not understand the design and fail to implement it properly. Without a common threat model, others might assume certain threats are already taken care of and miss an opportunity to take action. Inevitably, when bad things happen it’s sure to be a complete surprise. Nothing good about any of that.
Competitors might freeload on your public threat models, but if they are smart that’s one of the riskiest places for them to cut corners. Solidly executing on the mitigations is where the hard work is, and there’s no easy way to fake that. Brandishing your excellent threat model is a huge opportunity to grab customers by showing the professional quality of your work. Imagine a world where customers get to choose among competing products complete with threat models they can evaluate in the context of their own applications, compared to today’s “just trust us”. Lacking any of this information, it’s just a duel between sales and marketing teams based on vague promises and fancy presentations with very little actual evidence of true quality in sight.
Imagine if Crowdstrike had already published complete threat models, design docs, operations processes, and more to explain how their system works. If these were complete and competent, the threat of such an incident would have been anticipated and quite possibly it wouldn’t have happened at all. If it did happen, it would be easy to pinpoint where execution broke down (or perhaps it was just incredibly bad luck); we may never know unless they share what threat modeling, if any, they did in advance.
The pressure of disclosing how they secure their product and protect customers serves as a forcing function to get it right in the first place. Had the threat modeling been incomplete, that would be evident — they’d have egg on their face, but quite possibly a customer or researcher would spot the missing threat, provide feedback, and there would be a chance to fix it without a disaster to deal with. The same goes for an insufficient mitigation: it would be right there for anyone to see, and the odds of catching the omission are far better with many eyes on it.
Possibly the threat model was solid and the mitigations strong, and the problem arose anyway: from poor execution, from an unforeseeable event (say, a meteor striking a key datacenter), or from an inside attack by a covert spy. People will make their own assessments, but the company can make a strong case that what happened was exceptional and not due to incompetence. Or perhaps policy was robust but routinely ignored due to a lax culture: the threat model brings that failing into focus and guides countermeasures, such as closer monitoring of process adherence or extra layers of redundant testing and signoff.
Public threat models protect competent efforts done in good faith while shining light on remedies when problems arise. By contrast, secrecy and minimal disclosures protect the good and bad actors alike, so why not distinguish yourself as a pro if that’s the kind of work you really are doing?
Considering the matter of whether to publish a threat model in the first place, let’s analyze the argument made by those who prefer to keep such details proprietary. We can threat model that:
- Keep the threat model secret:
  - avoids possible embarrassment (mistakes; why aren’t they proud of the work?)
  - helps the competition (are they so incompetent this is a huge help?)
- Publish a complete threat model:
  - demonstrates solid engineering and product quality
  - educates and gives assurance to customers
  - opportunity to receive helpful feedback on any shortcomings
  - establishes company-wide commitment to security and trustworthiness
  - stronger positioning in the event of a failure
It’s worth pointing out that the above isn’t what you would normally call a threat model, but I think it’s more readily understandable and basically equivalent. Consider the first sub-bullet about avoiding embarrassment: this could be rewritten as embarrassment being a threat in the case of publishing an inadequate threat model. The next one (helping the competition) could also be phrased as a threat resulting from disclosure, and to be effective the precise way competitors might benefit should be spelled out (in general I’m unsure how it would help them significantly, so I’ve left it unstated). To that last threat, one mitigation might be to redact some details in a more abstract public-facing threat model, keeping a more detailed one for internal use. Similarly, under publishing the threat model, demonstrating quality could be stated as a threat of choosing to keep it secret, and so on. However you prefer to express it — as threats and mitigations, or as pros and cons — the concepts are the same and the method is just as powerful.
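One way to make the equivalence concrete is to restate each pro and con above as a (decision, threat, mitigation) record. The entries below are illustrative restatements of points already made in this post, not an exhaustive model:

```python
# Pros/cons of publishing, rephrased as threats with candidate mitigations.
tradeoffs = [
    {"decision": "publish",
     "threat": "embarrassment from an inadequate published model",
     "mitigation": "have a consultant review everything before release"},
    {"decision": "publish",
     "threat": "competitors benefit from the disclosure",
     "mitigation": "publish an abstracted public model; keep details internal"},
    {"decision": "keep secret",
     "threat": "lose the chance to demonstrate quality and earn trust",
     "mitigation": "share with potential customers under NDA"},
]
```

Either framing supports the same analysis; the records just make the threats and their mitigations explicit and reviewable.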
To conclude, let’s get meta and threat model the question of whether to threat model (suppose we are building a software product, but this could apply to many other things as explained above).
What are we working on? Threat modeling our software project.
What can go wrong?
- It’s a waste of time because everyone already understands all the threats. Unlikely.
- The threat model is incomplete. Likely it’s better than ad hoc threat mitigation.
- Nobody will use the threat model. The project team certainly can and should.
- We will find too many big threats and give up. Better to know sooner than later.
What are we going to do about it?
- Constrain time allotted to threat modeling appropriate to the project size.
- Include all stakeholders and conduct reviews. Incomplete model is better than none.
- Educate and promote using the threat model for all stakeholders.
- Significant threats require solid mitigation or management review to assess the product’s viability and realistic potential. The sooner weaknesses are known the better.
Did we do a good job? This is for project staff, and ultimately, customers, to assess.
Alternatively, not threat modeling amounts to threat blindness and is fraught with danger:
- Serious design flaws are overlooked, caught late (when they’re expensive to fix), or blow up in the field.
- Misunderstandings about security are more likely when the threats aren’t clearly documented.
- Even if some people understand some of the threats, nobody on the team will know them all.
- Opportunities to make strategic design decisions that mitigate or eliminate threats are missed.
The choice is straightforward: what is everyone waiting for, and who is brave enough to go first?