Root causes


Each time a high-profile software security bug is reported, I wonder how this happened yet again. I don’t expect vulnerabilities to approach zero any time soon, but still I’d like to know how this keeps happening over and over, so we can do better. For example, was a developer unaware of the implications of their code change that broke security, or did they know but were just sloppy? How do these bugs get past a code reviewer, or was that considered unnecessary and skipped? Why weren’t there test cases to prevent such problems? And once we get a fix, how was it tested, and did anyone check for similar vulnerabilities that might exist elsewhere in the code? We can’t make much progress if we don’t know why well-known countermeasures aren’t working.

I don’t want to pick on Apache, but the recent series of vulnerabilities and patches is worrisome and a perfect example of these unanswered questions, and this in an open source project. Considering the security flaws that a generic web server might have, a path traversal is probably the first kind of vulnerability that comes to mind. When a major web platform that should, after over twenty years, have a mature technology base introduces and then struggles to repair such an obvious vulnerability, to my mind that is a big red flag that something is very wrong (and this is just one recent example that’s hardly exceptional).

A path traversal is exactly what a recent code change released in Apache HTTP Server 2.4.49 introduced (CVE-2021-41773)[^1]. The attack is just an HTTP GET with an odd-looking URL. The team dutifully released a fix in 2.4.50, but the fix proved insufficient: new exploits were quickly reported against 2.4.50 (CVE-2021-42013), and these were finally repaired[^2] days later by the 2.4.51 release. From a quick look I was unable to find answers to any of the usual questions I mentioned above. Some proof-of-concept attack examples are available, but those don’t shed light on how this happened, nor do they give much assurance that things are now solid. Understanding the cause and the response requires more context than just the code changes.

I understand the reluctance to disclose a detailed post-mortem, but without this information how are we going to do better? Is the problem a lack of understanding of security, insufficient testing, or rushed code reviews? Or, if these vulnerabilities happen despite competent developers who know what they are doing, where is the process failing? The software community virtually never learns the full story, so the pattern continues.

I wrote a book to set down the basics of what we should be doing, because not knowing the basics appears to be part of the problem. The book explains path traversals, with examples of vulnerable code and then fixed code; it also discusses several ways to test that code is secure, as well as how to do security code reviews. This is basic material, but it’s critical that everybody working on software understands it, so I made the book as clear and comprehensible as possible. Books don’t solve problems, but they can inform and motivate, and convey ways of putting these practices to work.
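To make the idea of vulnerable and fixed code concrete, here is a minimal Python sketch of a path traversal check. It is not taken from the book or from Apache's C code; the names `DOC_ROOT`, `resolve_naive`, and `resolve_checked` are hypothetical, and the request path is only an illustration of an "odd-looking URL" that uses percent-encoded dots. The flawed version looks for `..` before decoding; the second version decodes first and then confirms the result still lies under the document root.

```python
import posixpath
from urllib.parse import unquote

DOC_ROOT = "/var/www/html"  # hypothetical document root, for illustration only


def resolve_naive(url_path: str) -> str:
    """Flawed: checks for '..' segments BEFORE percent-decoding the path."""
    if ".." in url_path.split("/"):
        raise PermissionError("path traversal rejected")
    decoded = unquote(url_path)  # '.%2e' becomes '..' only after the check
    return posixpath.normpath(posixpath.join(DOC_ROOT, decoded.lstrip("/")))


def resolve_checked(url_path: str) -> str:
    """Decode first, normalize, then verify the result is inside DOC_ROOT."""
    decoded = unquote(url_path)
    candidate = posixpath.normpath(
        posixpath.join(DOC_ROOT, decoded.lstrip("/"))
    )
    # commonpath compares whole path components, so '/var/www/html2'
    # is not mistaken for a location inside '/var/www/html'.
    if posixpath.commonpath([DOC_ROOT, candidate]) != DOC_ROOT:
        raise PermissionError("path traversal rejected")
    return candidate


if __name__ == "__main__":
    attack = "/cgi-bin/.%2e/.%2e/.%2e/.%2e/etc/passwd"
    print(resolve_naive(attack))  # escapes the document root
    try:
        resolve_checked(attack)
    except PermissionError as err:
        print("blocked:", err)
```

A regression test that feeds encoded traversal paths like this one to the resolver is the kind of test case that would flag a change reintroducing the flaw. The sketch only works on path strings; a real server would also need to account for symbolic links and any further decoding steps applied after the check.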

[^1] Details at https://httpd.apache.org/security/vulnerabilities_24.html

[^2] Code changes: r1893775 (2.4.50); r1893977, r1893980, r1893982 (2.4.51)