The following is an excerpt from the book Designing Secure Software: A Guide for Developers by Loren Kohnfelder, Copyright 2022, No Starch Press
File path traversals are a common vulnerability closely related to
injection attacks. Instead of escaping from quotation marks, as we saw
in the previous section’s examples, this attack escapes into parent
directories to make unexpected access to other parts of the
filesystem. For example, to serve a collection of images, an
implementation might collect image files in a directory named
/server/data/image_store
and then process requests for an image named
X by fetching image data from the path /server/data/image_store/X
,
formed from the (untrusted) input name X.
The obvious attack would be requesting the name ../../secret/key
,
which would return the file /server/secret/key that should have been
private. Recall that .
(dot) is a special name for the current
directory and ..
(dot-dot) is the parent directory that allows
traversal toward the filesystem root, as shown by this sequence of
equivalent pathnames:
/server/data/image_store/../../secret/key
/server/data/../secret/key
/server/secret/key
The best way to secure against this kind of attack is to limit the character set allowed in the input (X in our example). Often, input validation ensuring that the input is an alphanumeric string suffices to completely close the door. This works well because it excludes the troublesome file separator and parent directory forms needed to escape from the intended part of the filesystem.
However, sometimes that approach is too limiting. When it’s necessary to handle arbitrary filenames this simple method is too restrictive, so you have more work to do, and it can get complicated because filesystems are complicated. Furthermore, if your code will run across different platforms, you need to be aware of possible filesystem differences (for example, the *nix path separator is a slash, but on Microsoft Windows it’s a backslash).
Here is a simple example of a function that inspects input strings
before using them as subpaths for accessing files in the directory
that this Python code resides in (denoted by __file__
).
The idea is to
provide access only to files in a certain directory or its
subdirectories—but absolutely not to arbitrary files elsewhere. In the
version shown here, the guard function safe_path checks the input for
a leading slash (which goes to the filesystem root) or parent
directory dot-dot and rejects inputs that contain these. To get this
right you should work with paths using standard libraries, such as
Python’s os.path suite of functionality, rather than ad hoc string
manipulation. But this alone isn’t sufficient to ensure against
breaking out of the intended directory:
def safe_path(path):
"""Checks that argument path is a safe file path. If not, returns None.
If safe, returns the normalized absolute file path.
"""
if path.startswith('/') or path.startswith('..'):
return None
base_dir = os.path.dirname(os.path.abspath(__file__))
filepath = os.path.normpath(os.path.join(base_dir, path))
return filepath
The remaining hole in this protection is that the path can name a
valid directory, and then go up to the parent directory, and so on to
break out. For example, since the current directory this sample code
runs in is five levels below the root, the path
./../../../../../etc/passwd
(with five dot-dots) resolves to the
/etc/passwd
file.
We could improve the string-based tests for invalid paths by rejecting any path containing dot-dot, but such an approach can be risky, since it’s hard to be certain that we’ve anticipated all possible tricks and completely blocked them. Instead, there’s a straightforward solution that relies on the os.path library, rather than constructing path strings with your own code:
def safe_path(path):
"""Checks that argument path is a safe file path. If not, returns None.
If safe, returns the normalized absolute file path.
"""
base_dir = os.path.dirname(os.path.abspath(__file__))
filepath = os.path.normpath(os.path.join(base_dir, path))
if base_dir != os.path.commonpath([base_dir, filepath]):
return None
return filepath
This protection you can take to the bank, and here’s why. The base directory is a reliable path, because there is no involvement of untrusted input: it’s fully derived from values completely under the programmer’s control. After joining with the input path string, that path gets normalized, which resolves any dot-dot parent references to produce an absolute path: filepath. Now we can check that the longest common subpath of these is the intended directory to which we want to restrict access.