Threat modeling is the most powerful, underutilized, easy-to-do security methodology we have: why isn’t everybody doing it already, and why do those who are keep their work secret? If you already threat model your digital systems and products, you are doing security right and should share that work with pride. Published threat models may be the best evidence of excellent security work that customers and users can appreciate the value of, short of a rigorous detailed design and code review. You’ve already done the work (and if you haven’t, you really should), and making it public is not only great promotion but also helps all stakeholders understand their respective roles and responsibilities in securing larger systems.
Value proposition
Threat models have long been recognized as an essential foundation for building secure software, identifying risks and proactively mitigating them. In addition, they offer value to other stakeholders in ways yet to be fully appreciated. The software marketplace is flooded with “most secure cloud storage” or “best overall security” offerings, but invariably these claims are vague and unsubstantiated. While we don’t have objective metrics for security levels, a published threat model details the maker’s view of the product’s security characteristics in terms of threats and mitigations, and provides concrete evidence of security work done.
Beyond aspirational slogans, customers can assess threat models to choose among competing products, as well as inquire about possible omissions or other details in an informed way to better understand the offering. The difference between a very basic threat model (or none at all) and a thorough one that lists specifics, describes attack surfaces and mitigations, and assesses the defenses is night and day. In addition, end users better understand security policy and controls in the context of threats, resulting in better compliance and alignment.
Security questionnaires are a common practice today for vendor assessment (full disclosure: I have never filled out nor requested one myself), but I believe that a threat model provides a much more useful description of security posture, with risks and countermeasures presented in the context of the overall system rather than as itemized statements about its various parts. Unless fully standardized, many different questionnaires may be required, but a threat model is fitted to the product, so it stands as universal documentation of its security.
It’s important to state that when using threat models for a variety of purposes with different stakeholders, some flexibility in presentation is necessary to make them accessible to a wider audience of readers. The most detailed form of the threat model may only be usable internally due to references to the organization or other confidential information and internal details, but from this master version (perhaps by tagging sections for audiences) derivative versions can be compiled: a high-level executive summary for the C-suite or sales presentations; business risk management; technical integrators; software developers; end users.
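As a minimal sketch of how such derivation might work (an illustrative approach of my own, with a hypothetical schema and audience names, not an established tool or format), sections of the master model can be tagged by audience and derivative documents compiled by filtering:

```python
# Illustrative sketch: tag sections of the master threat model by audience,
# then compile derivative versions by filtering. The schema and audience
# names are hypothetical, not an established tool or format.

MASTER_MODEL = [
    {"heading": "Executive summary",
     "audiences": {"exec", "customer", "developer"},
     "body": "Top risks, overall posture, and key mitigations..."},
    {"heading": "Attack surfaces and mitigations",
     "audiences": {"customer", "developer"},
     "body": "Network listeners, file parsers, update channel..."},
    {"heading": "Internal review and release process",
     "audiences": {"developer"},  # internal only: names, tooling, signoffs
     "body": "Confidential process details..."},
]

def compile_for(audience: str) -> str:
    """Assemble the version of the threat model for one audience."""
    return "\n\n".join(f"{s['heading']}\n{s['body']}"
                       for s in MASTER_MODEL if audience in s["audiences"])

print(compile_for("customer"))  # public version omits internal-only sections
```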
Historical perspective
Since it’s important that threat models include perspectives from all stakeholders, this informative section is included for the benefit of readers new to the concept of threat modeling.
Imagine the surprise in Troy the day after they accepted (let through the gates) that “gift horse” and the invaders hidden inside wreaked havoc: they had no idea that such an attack was possible. Had even one Trojan guard thought about the “gift horse” possibly being a trap (it did come from their bitter enemies, after all) it wouldn’t have taken a genius to consider that something could be hidden inside, and even a cursory inspection would have been sufficient to save the city.
The Trojans were defeated because they failed to threat model — over three thousand years later, the internet spans the globe, computers are faster and cheaper than ever, we have advanced generative AI, and most software companies apparently are making the same mistake. (I say “apparently” because if they are threat modeling they are doing so in secret for some reason, choosing “security by obscurity”.)
Any digital product or service is potentially exposed to attack (unless it’s in an air-gapped system, and even then sometimes there are ways), so guarding all “gates” (attack surface is the term of art) is essential, and solid defense begins with a threat model so you know what to expect as incoming.
Case study
If you think modern software makers can’t possibly be missing the threat modeling boat, consider the recent Crowdstrike incident that crashed millions of machines, disrupting operations at numerous large corporations and causing thousands of flight cancellations. The company has not disclosed any threat models, so we don’t know what happened, but the possibilities are:
- They had a great up-to-date threat model
- They had a threat model but it was outdated or low quality
- They did not have a threat model
Option 1 makes no sense: why in the world wouldn’t they share their threat model as solid evidence of the great job they were doing? But more importantly, with hindsight we know that a great threat model would have included the threat of “pushing a bad update that crashes millions of customer systems” … in which case they need to explain how that threat was allowed to occur.
Option 2 fits the facts, a partial threat model omitting the aforementioned crashing threat. This would be extremely embarrassing (and therefore management could decide to never mention the threat model at all), but a quick check of the threat model against what happened would immediately identify the problem internally: failure to identify, and hence mitigate, crashes.
Option 3 (no threat model) does seem most likely, but in the absence of disclosure we can only guess from the outside. Presumably the developers knew well that software updates can have bugs and lead to crashes (not to mention the extreme risk of code running in the kernel), so how was this allowed to happen? My best speculation is that many people considered the risk but assumed it was somebody else’s job; with a threat model, the responsibility is documented (or not) for all to see.
Note that threat modeling is usually considered a “software security” technique to mitigate vulnerabilities, implicitly to proactively defend against malicious “attacks”; however, in the Crowdstrike incident there is no evidence of any bad actor. All the trouble was self-inflicted, and the problem quickly expanded to massive scale because the Blue Screen crash prevented customer machines from updating: even though the flaw was promptly fixed, the crash apparently occurred before the “check for updates” stage of the code was reached, resulting in an endless reboot loop. While we cannot forget our hindsight perspective, when you drop the “attacker” mindset and simply focus on the crashing threat to availability (Denial of Service in STRIDE terms), crashing in the kernel before “check for updates” is clearly a risky exposure.
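To make the ordering issue concrete, here is a minimal sketch in Python of why crashing before the update check turns one bad content file into an endless reboot loop. Everything here is a hypothetical illustration: the real agent is kernel code, and its actual structure has not been disclosed.

```python
# Hypothetical sketch of an endpoint agent's startup ordering. All names
# and the sequence itself are assumptions for illustration; the real
# agent is kernel code whose actual design has not been disclosed.

def load_content_files() -> bytes:
    # In reality: read locally cached threat-intel content files from disk.
    return b"\x00" * 16  # imagine this is the malformed pushed update

def parse(content: bytes) -> None:
    # A malformed file triggers a fatal error here, in the kernel,
    # before any recovery logic has a chance to run.
    if content.startswith(b"\x00"):
        raise RuntimeError("kernel crash (Blue Screen)")

def check_for_updates() -> None:
    print("fetching fixed content files...")  # never reached after a bad push

def boot_agent() -> None:
    content = load_content_files()
    parse(content)        # crashes HERE on the bad file...
    check_for_updates()   # ...so the fix can never be downloaded
    print("agent running")

# Each reboot replays the same sequence against the same cached bad file:
for attempt in range(3):
    try:
        boot_agent()
    except RuntimeError as crash:
        print(f"reboot {attempt}: {crash}")
# Checking for updates before parsing, or keeping a last-known-good
# fallback, would break the loop; surfacing that tradeoff is exactly
# what a threat model is for.
```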
With no official word on threat modeling we don’t know the facts, but we can enumerate a complete set of options (a coarse-grained breakdown that could be refined; a minimal machine-readable sketch follows the list) with respect to this important threat that emerged suddenly, causing so much disruption:
- The threat was never explicitly identified (poor, outdated, or non-existent threat model)
- The threat was identified but poorly mitigated (a bug or process failure allowed it to happen)
- The threat mitigation was outsourced (insurance, third party responsibility) and they failed
- The threat was accepted (unlikely to occur, not worth the effort, etc.) and left unmitigated
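These dispositions are exactly what a threat model records. As a minimal sketch (using an illustrative schema of my own devising, not any standard format), each entry in a threat register might carry an explicit disposition so that this question is answerable at a glance:

```python
# Illustrative threat-register entry; the schema and the entry itself
# are hypothetical, not an established standard.
from dataclasses import dataclass
from typing import Literal

Disposition = Literal["mitigated", "transferred", "accepted", "unhandled"]

@dataclass
class Threat:
    identifier: str
    description: str
    disposition: Disposition
    rationale: str   # why this disposition was chosen
    owner: str       # who is accountable for it

bad_update = Threat(
    identifier="T-017",  # hypothetical ID
    description="Malformed content update crashes the kernel driver "
                "(denial of service) before recovery logic can run",
    disposition="mitigated",
    rationale="Staged rollout, defensive parsing, last-known-good fallback",
    owner="release engineering",
)
print(bad_update.disposition)
```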
Threat analysis
Surely the details get more complicated, but in terms of first principles the explanations above should cover the map of possibilities. Notably missing is “the threat was identified and robustly mitigated,” because then (by definition) the incident would not have happened.
The “accepted” category is worth explaining a little more because there are a few reasonable possibilities lurking there. One example might be insider attack risk (sabotage): by all accounts this is extremely rare in software tech (so far as we know), and crucial positions (administrator privileges in production operations) always go to high-performing staff with years of experience. Additionally, it’s very hard to eliminate risk at this level of authority, and requiring multiple reviews and signoffs for every crucial action would be infeasible in practice, so it’s quite reasonable to accept this risk (and in this case it’s highly unlikely anything like this was a factor). The other kind of accepted risk is the extremely unlikely event or external catastrophe beyond anyone’s control: simultaneous power outages in five different locations, terrorism or war, and so on.
My purpose in dissecting the Crowdstrike incident through the lens of threat modeling is to demonstrate how central understanding threats and mitigations is to security. With a threat model in hand (or the knowledge that none exists, as the case may be), when something like this happens we can immediately zoom in on the proximate cause.
- Was the threat identified?
- If so, was the threat mitigated, outsourced, or accepted?
- If mitigated, were the countermeasures sufficient and robust?
- If outsourced, how did they perform and what assurance was obtained?
- If accepted, was that a reasonable decision (given that this just happened)?
Responding to the incident is now a matter of drilling down into the relevant details: review the mitigation plan, compare it to the process and implementation, and so on. Threat modeling provides a map for analysis and remediation; without it, understanding is fragmented and responses to incidents are scattershot.
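The questions above amount to a simple decision procedure. Here is a minimal sketch of how an incident triage might walk it, reusing the illustrative (and again hypothetical) register schema from earlier:

```python
# Hypothetical incident-triage walk over a threat register, following
# the questions above. Schema and wording are illustrative only.

def triage(register: dict, incident: str) -> str:
    # 1. Was the threat identified at all?
    entry = next((t for t in register.values()
                  if incident.lower() in t["description"].lower()), None)
    if entry is None:
        return "Not identified: fix the threat model, then mitigate."
    # 2. If so, was it mitigated, outsourced, or accepted, and did that hold?
    disposition = entry["disposition"]
    if disposition == "mitigated":
        return "Mitigated: audit why the countermeasures failed."
    if disposition == "transferred":
        return "Outsourced: review the third party's performance and assurance."
    if disposition == "accepted":
        return "Accepted: revisit whether that decision was reasonable."
    return "Identified but unhandled: a known gap was left open."

register = {
    "T-017": {"description": "malformed content update crashes the kernel driver",
              "disposition": "accepted"},
}
print(triage(register, "malformed content update"))
# -> Accepted: revisit whether that decision was reasonable.
```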
Secretly threat modeling
We may never know the full story of the Crowdstrike incident now that litigation has begun, and I want to be clear this is not meant to throw shade; rather, I’m writing in the spirit of learning from what already happened, because it’s a prominent example of my point. Writing in 2024, virtually all software companies threat model in secrecy, legally shored up by serious NDAs ensuring it stays private, and Crowdstrike was just one more case of this industry practice.
Now let’s consider the cost of threat modeling behind a curtain of secrecy:
- First of all, we have no idea if threat modeling is even being done anywhere!
- If it is being done, who is doing it well (or not), how thoroughly, and is it up to date?
- When incidents happen, the maker is now in a bind: that’s a bad time to release a model.
- Most importantly, customers who depend on a product have no idea what threats the maker is aware of and mitigating, so they don’t know what’s on them to defend against.
It goes without saying that plenty of folks are unhappy with the status quo of software security, but nobody knows what to do. CISA rightly calls on software makers to take more security responsibility and urges that “customers demand better security from their technology providers,” but it’s very unclear how that is going to happen, great aspiration though it is.
Without seeing their threat model we potential customers can only blindly trust that the software is secure with no idea how thoroughly the maker has actually done the work. Software marketing tends to imply “great security” so customers naturally expect that all relevant threats are carefully identified and mitigated, but never get any evidence of this — until they learn otherwise the hard way as we saw. In no small part, this lack of visibility leads to mismatched expectations.
Openly threat modeling
A world where software makers routinely publish threat models would be a huge step forward. Doing so would provide useful details about what providers are doing to secure their products in an actionable way, giving customers something to assess instead of marketing hype and empty promises. By broad consensus they should be threat modeling already, so sharing this work openly (which they should be proud of assuming it’s done well) is not at all hard to do!
Threats are a reality of any software product, so enumerating them and discussing mitigation isn’t divulging any proprietary information or making the product easy to clone as full source code disclosure might (and most customers don’t have the time or skills to evaluate all that anyway). Threat models boil it all down to the essential security matters.
If software makers published their threat models it would let customers make informed security decisions when choosing software offerings. Once a few bold software makers take the first step, threat models could quickly become the norm for quality software.
- No threat model? Let’s look elsewhere… they aren’t even trying to be secure.
- Weak threat model? No thanks, they don’t have expertise and/or security isn’t a priority.
- Decent threat model, but they missed X. Now you can have a discussion about X: maybe they can improve and handle it, or maybe the buyer can cover that easily, etc.
- Great threat model, very thorough. Let’s get this one, definitely!
- They say they have a great threat model but cannot share it: why not?
Threats are mainly inherent to a software component by virtue of its functionality, so listing threats should not endanger valuable trade secrets. Some threats may be implementation dependent, but you don’t have to detail the mitigation technology; just assert that you recognize and address the relevant threats. Alternatively, if your database component is only meant for theoretical research purposes, disclose that information leakage is not mitigated so people know.
Therefore, open threat modeling should never divulge any crown jewels; it’s more like a “spec sheet” of security considerations. It’s also worth mentioning that for large products, having separate internal and external versions of a threat model makes sense. Details of internal techniques, review processes, testing, release procedures, and approvals can be kept private.
Dropping the ball
Interfaces are always challenging in software, especially with different developers on either side, across teams or between organizations, so many details need to align just right for things to work smoothly. Complete documentation of all interfaces is often unavailable or outdated, there are changes with newer versions, and so on, so it’s no surprise that the ball often gets dropped.
Threat models help at a high level by clarifying who is responsible for specific threats. Customers still have to trust providers to do a quality job, but having clear commitments about which threats they do or do not take responsibility for makes interfacing much more straightforward. There’s no more “but we assumed you handled that”.
For example, one of the big challenges of securing software is doing input validation. Injection attacks are a common symptom of not doing this well, e.g. SQL injection, cross-site scripting, and more. One very common pattern is that, across an interface, both sides assume the other will do input validation (it’s always easier to hope someone else takes care of things); another miscommunication is where the two sides use slightly different notions of what input is valid. Good threat models will defend attack surfaces (such as unauthenticated web traffic) and can specify what input validation is performed (ideally clearly defined) so there is no room for misunderstanding. There is no right and wrong about which side should handle untrusted input; clarity on responsibility is the key point.
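To make the failure pattern concrete, here is a minimal sketch using Python’s standard sqlite3 module, showing the classic SQL injection mistake alongside the parameterized fix; the function names and schema are hypothetical. The threat model’s job is to state, on the record, which side of the interface performs this validation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

def lookup_unsafe(name: str):
    # Vulnerable: if the caller assumed *we* validate and we assumed *they*
    # did, attacker input like "' OR '1'='1" changes the query's meaning.
    query = f"SELECT role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def lookup_safe(name: str):
    # Parameterized query: the driver treats input strictly as data,
    # mitigating the injection threat regardless of upstream validation.
    return conn.execute("SELECT role FROM users WHERE name = ?",
                        (name,)).fetchall()

print(lookup_unsafe("' OR '1'='1"))  # returns every row: injection succeeded
print(lookup_safe("' OR '1'='1"))    # returns []: treated as a literal name
```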
Do they know? Do they care?
We see major software security debacles on a regular basis these days, but it doesn’t seem that we are learning important lessons from them, so these failures are likely, indeed all but certain, to continue. Exactly how these mistakes happen seems invariably veiled from view for very understandable reasons (including legal risk and embarrassment), yet every time we duck getting to the bottom of what happened, we miss an enormous learning opportunity. Even with vulnerabilities in open source, disclosures rarely if ever mention process: was there a code review? Does the team think about security responsibility? Do tests include security cases where access should be blocked? Is there a threat model, and is it up to date?
This isn’t about shaming, so I won’t mention any particulars, but over and over I end up wondering, “Do they know?” (about SQL injection attacks, or integer overflow, or whatever the flaw happens to be), because if not, this indicates we have an education and awareness problem. Alternatively, if they do indeed know, then I wonder, “Do they care?”, because that suggests a failure of process or of execution. Perhaps someone was rushing and skipped code review, or a code review flagged the issue but the change was never made, or nobody considered that security was even an issue.
Answering these most basic of questions is important because it allows us to see what actually caused the failure. Looking at a code diff (vulnerable code changed to secure code) tells us nothing about where the ball was dropped, or whether it was due to inattention or ignorance. When we can pinpoint how problems occur, it serves as a valuable lesson to us all. It’s one thing to say that the root cause of a vulnerability is a buffer overflow, but underlying that is the more important story of how that code came to be released. For example, if the senior developer on a project got into the habit of skipping code reviews and that led to a vulnerability, then others in the same habit might get the message about how risky it is.
If security problems are due to people not knowing, then we can work on that through education; if the problem is not caring, then it’s a matter of incentives and of customers demanding security assurance, including open threat models as concrete evidence of trustworthiness. But when vulnerabilities “just happen” without any clear root cause (not the vulnerable code itself, but the reason that code was released), we learn nothing and are bound to fall into the same traps again and again. And the best way I know to get to the root of these lapses is via the lens of threat modeling.
Knowing and caring enough
Of course the reality of modern software security is more complicated than a binary choice between not knowing and not caring. It’s a simplification to characterize any organization of people in such a way, and attitudes vary over time and circumstances, but it’s the responsibility of management to lead and to institute culture and policy that foster right thinking and action. Additionally, even when we know and we care, humans are fallible and mistakes can always happen. For major software products, IMHO, process and oversight should make this so vanishingly unlikely that if it does happen, it’s a major failure resulting in well-deserved reputation loss. When this happens, it’s a signal that mitigations were insufficient, and the best way to repair trust is to explain how, despite all efforts, this happened (maybe asteroids hit two data centers the same day!) and what changes will prevent recurrence.
As a bonus, such open disclosures, which should be rare, serve as valuable lessons for others to prevent similar pitfalls where applicable. Just as airlines refrain from competing on safety (after an airline disaster the others don’t pile on claiming to be safer), a mature software industry should share security experience and countermeasures generously for the good of the industry as a whole. Doing so begins with maximal transparency, as unnatural as that may seem now, and meaningful security understanding is always best couched in terms of threat models.
Adam Shostack (in a private communication) points out yet another cautionary factor to recognize within knowing and caring. Software makers should endeavor to include all stakeholder points of view in their threat models, but of course the world is large and complicated, and software gets used in ways its makers never imagined, so this is at best an imperfect best effort. Software customers can be inventive, using software in creative ways that produce surprises: some positive, while other unanticipated use cases result in problems. When this happens, all we can do is learn from experience, share the details to help others avoid similar failures, and update the threat models accordingly. Typically this means responding to the new threat(s) with mitigations, transferring the risk to be the customer’s responsibility, or advising against such use explicitly. And the best way to do that, yet again, is updating the threat model to incorporate the new perspectives gained when unexpected problems arise.
What if …?
Imagine traveling back in time to mid-2024: you are hired to do SecOps duty for a large IT system that uses Crowdstrike EDR technology. You are still learning the ropes, so first you have a look at their threat model, which in our alternative timeline is published on their website. Now consider the options we identified earlier but have no information about:
- They have a great up-to-date threat model: excellent; now if the same incident occurs and everything crashes suddenly, it’s clearly on them, DevOps did not screw it up.
- They have a threat model but it omits the crashing threat: you escalate that, consider short-term mitigations on your side and demand they handle it or you will go elsewhere.
- They have an outdated threat model: you contact them and insist on a prompt update.
- They did not have a threat model: given competitors who do, ideally you are already using one of them.
Whatever the case, the relationship between provider and customer is far better informed and productive — responsibilities are clarified, or choices to accept risk are made explicit. This might even be the beginning of a better software market where customers are empowered to demand better security, or at least have a far clearer picture of what risks are on them to handle.
Conclusion
Professional software developers should all be using threat modeling to ensure awareness of applicable threats and that sufficient mitigations are implemented and tested. Especially in recent years there is growing acknowledgment of the importance of threat models, yet whatever work is actually being done appears to happen behind closed doors, so we have no idea whether it’s being utilized for 1% or 99% of our software.
This article makes the case for open threat modeling published as an important “spec sheet” for any software (or its absence signifying the security characteristics are unknown). Just as opening threat modeling to all stakeholder perspectives is important, sharing the full model with everyone is equally important to foster better understanding and clear communication of responsibilities for security. Since the point of threat modeling is understanding how the software will interact with humans in a larger environment, it’s most effective when “open” both in the sense of being published widely and also broad in scope. Risks potentially impact everyone and are a shared community responsibility: that’s why threat modeling is not just for experts, not just about malicious attacks (but including accidents and hardware failures), not only used internally, and actually not strictly limited only to software.
As I wrote earlier this year, “We understand software security best through specific threats and mitigations, articulated by threat models shared openly.”
Postscript
The following is pure speculation, but it’s an interesting perspective. The Crowdstrike incident suggests many interesting tradeoffs were in play (without details, these are guesswork):
- EDR needs to load into the kernel for power and visibility throughout the system
- it needs to load early to watch everything as early as possible
- checking for updates later (after parsing, so after it crashed, blocking recovery) means faster bootup time
- more kernel code runs faster (vs customers saying it runs too slowly)
- pushing content files ASAP with new threat intel is valuable for protection (vs careful review and methodical release process)
- pushing content updates to all customers ASAP (vs staggered release over time so customer-side crashing would be caught early and remedied)
- moving all validation to the Crowdstrike mothership (vs on each client as an extra layer of protection) is more efficient
- proprietary obfuscated content files mean customers cannot validate independently (vs open format allowing competition to steal their rules)
I can see how a company mindset could dial up all the “speed” knobs at the expense of “reliability”, and with a stable, complex system (given that nothing like this had yet happened) it would be easy for staff to assume sufficient checks and balances were in place. Unless there was a threat model, of course.