Vulnerabilities are Mistakes


Spilled coffee beans, breaking the sound barrier, and software security

The Right Stuff is Tom Wolfe’s popular history of the US astronaut program, and it begins by recounting the early effort to break the sound barrier, which involved crashes so frequent that funerals for test pilots were a weekly occurrence. What’s most striking about this account of the period that preceded the space program is how the pilots gathering to bury their comrades would invariably talk themselves into believing that they themselves would never crash: it was always the other guy who messed up and sadly paid the price.

Vulnerabilities continue to plague modern software, and it seems to me that a similar mentality applies. Many of us think we understand how to write and maintain secure code, yet somebody always drops the ball, and as an industry, secure software remains elusive. What’s going on?

I’ve been learning to roast coffee, and a story from the roastery may offer a good analogy for thinking about this problem from a new perspective. Small-batch coffee roasters are simple machines built around a rotating heated drum, not unlike a clothes dryer. The operator charges the roaster with green beans, gravity-fed from a hopper above the drum, and after roasting, the crackling, smoky beans dump out into a cooling bin. Both the input and the output of the drum are operated by mechanical levers. One day an experienced operator inexplicably made a simple mistake: instead of dumping the roasted beans, they grabbed the other lever and dumped the green beans instead, creating an unusable mix of green and roasted coffee.

The question I want to consider is: what went wrong? Obviously, the operator understands the process and would never intentionally make this mistake, yet in practice they did. Little would be gained by making the operator take a refresher course in coffee roasting. The two levers are similar, but they are located far apart and are of different lengths, and by their positions it is crystal clear at a glance what each one does.

The point here is that humans sometimes make mistakes even when they know better, and it’s unclear that this foible can ever be fully eliminated. Checklists or automation may mitigate such problems, but at a cost, and with attendant overhead and complexity. Punishment and ridicule are usually counterproductive; attempts to educate easily backfire unless genuinely new information or insight is presented.

I suspect that, to no small degree, software engineering suffers from similar failure modes. To the degree that this is so, what more can we do to reduce vulnerabilities? First, we should make security an acknowledged priority and ensure that everyone involved is fully aware of the threats and countermeasures. Security needs to be baked into the requirements and design phases, not slapped on later as an afterthought. Since slip-ups are inevitable, we should use automation to reduce exposure, and have a strong process in place to ensure that reviews provide needed second opinions. Finally, when problems do occur, it’s important to respond quickly and thoroughly, and to transparently publish a post-mortem detailing what happened, the extent of the response, and what measures were taken to prevent recurrence.
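To make the automation point concrete, here is a minimal sketch in Python (the roaster model and all names are hypothetical, purely illustrative and not from the book) of designing an interface so that the dangerous slip-up is rejected automatically rather than left to operator vigilance:

```python
from enum import Enum, auto


class RoasterState(Enum):
    """Hypothetical states for the toy roaster model."""
    IDLE = auto()      # drum is empty
    ROASTING = auto()  # a batch is in the drum


class Roaster:
    """Toy model: the interface refuses operations that would ruin the batch,
    instead of relying on the operator to pull the right lever."""

    def __init__(self) -> None:
        self.state = RoasterState.IDLE

    def charge_green_beans(self) -> None:
        # Guard: charging mid-roast would mix green and roasted beans.
        if self.state is not RoasterState.IDLE:
            raise RuntimeError("cannot charge green beans while a batch is in the drum")
        self.state = RoasterState.ROASTING

    def dump_roasted_beans(self) -> None:
        # Guard: there must actually be a roasted batch to dump.
        if self.state is not RoasterState.ROASTING:
            raise RuntimeError("no roasted batch to dump")
        self.state = RoasterState.IDLE
```

The same idea carries over to security-sensitive code: an interface that refuses out-of-order or unsafe calls catches the inevitable slip-up mechanically, instead of depending on every caller to remember the rules.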

My book, Designing Secure Software: A Guide for Developers, offers actionable ideas in all of these areas. Check it out, and it just might reduce your chances of spilling the beans or causing a crash.

This article was originally published here.