Last Updated: Wednesday 14th August 2013

When you use a scripting language like Python, one thing you will find yourself doing over and over again is walking a directory tree and processing the files in it. While there are many ways to do this, Python's standard library offers a function, os.walk, that makes the process a breeze.

Basic Python Directory Traversal

Here's a really simple example that walks a directory tree, printing out the name of each directory and the files contained:
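A minimal sketch of such a loop, using the variable names described just below. The function name print_tree and the exact message format are assumptions for illustration.

```python
import os

def print_tree(rootDir):
    """Print every directory under rootDir, followed by the files it contains."""
    for dirName, subdirList, fileList in os.walk(rootDir):
        print('Found directory: %s' % dirName)
        for fname in fileList:
            # Indent file names under the directory that contains them
            print('\t%s' % fname)
```

Calling print_tree('.') walks the tree rooted at the current directory.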

os.walk takes care of the details, and on every pass of the loop, it gives us three things:

  • dirName: The path of the next directory it found.
  • subdirList: A list of the names of the sub-directories in the current directory.
  • fileList: A list of the names of the files in the current directory.
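Since the two lists hold bare names rather than full paths, a common step is joining each name onto dirName. A minimal sketch, with the hypothetical helper name list_file_paths:

```python
import os

def list_file_paths(rootDir):
    """Return the full path of every file found under rootDir."""
    paths = []
    for dirName, subdirList, fileList in os.walk(rootDir):
        for fname in fileList:
            # fileList holds bare names, so join each onto its directory
            paths.append(os.path.join(dirName, fname))
    return paths
```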

Let's say we have a directory tree that looks like this:

+--- test.py
|
+--- [subdir1]
|     |
|     +--- file1a.txt
|     +--- file1b.png
|
+--- [subdir2]
      |
      +--- file2a.jpeg
      +--- file2b.html

Run against this tree, the example prints the parent directory (.) with test.py first, followed by each sub-directory and the files it contains.
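To make that concrete, here is a sketch that rebuilds the tree in a temporary directory and walks it. The helper names build_tree and walk_lines, the 'Found directory' message, and the sorting (added so the order is reproducible across platforms) are assumptions, not part of the original listing.

```python
import os
import tempfile

def build_tree(root):
    """Create the example tree under root, using empty files."""
    layout = {
        '': ['test.py'],
        'subdir1': ['file1a.txt', 'file1b.png'],
        'subdir2': ['file2a.jpeg', 'file2b.html'],
    }
    for subdir, names in layout.items():
        os.makedirs(os.path.join(root, subdir), exist_ok=True)
        for name in names:
            open(os.path.join(root, subdir, name), 'w').close()

def walk_lines(root):
    """Walk root top-down and return one line per directory and per file."""
    lines = []
    for dirName, subdirList, fileList in os.walk(root):
        subdirList.sort()  # sorting in place fixes the order of descent
        lines.append('Found directory: %s' % os.path.relpath(dirName, root))
        lines.extend('\t%s' % fname for fname in sorted(fileList))
    return lines

with tempfile.TemporaryDirectory() as root:
    build_tree(root)
    print('\n'.join(walk_lines(root)))

# Prints (file names are tab-indented):
# Found directory: .
#     test.py
# Found directory: subdir1
#     file1a.txt
#     file1b.png
# Found directory: subdir2
#     file2a.jpeg
#     file2b.html
```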

Changing the Way the Directory Tree is Traversed

By default, os.walk traverses the directory tree top-down: a directory is passed to you for processing first, and only then does os.walk descend into its sub-directories. We can see this behaviour in the output above; the parent directory (.) was printed first, then its two sub-directories.

Sometimes we want to traverse the directory tree bottom-up instead, processing the files at the very bottom of the tree first and then working our way up through the directories. We can tell os.walk to do this by passing topdown=False.
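A minimal sketch of a bottom-up walk. The function name walk_bottom_up is hypothetical, and the order among sibling directories is left to the platform.

```python
import os

def walk_bottom_up(rootDir):
    """Return the directories under rootDir in the order a bottom-up walk yields them."""
    visited = []
    for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
        # Each directory is yielded only after all of its sub-directories
        visited.append(dirName)
    return visited
```

Whatever the tree looks like, rootDir itself is the last entry in the returned list.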

For the tree above, the two sub-directories and their files are now listed first, and the parent directory (.) comes last as we ascend the directory tree.

Selectively Recursing Into Sub-Directories

The examples so far have simply walked the entire directory tree, but os.walk allows us to selectively skip parts of the tree.

For each directory os.walk gives us, it also provides a list of sub-directories (in subdirList). If we modify this list, we can control which sub-directories os.walk will descend into. Let's tweak our example above so that we skip the first sub-directory.
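A sketch of one way to do this, deleting the first entry of subdirList in place. The function name is hypothetical, and the sort is an added assumption so that "first" means the same thing on every platform. The deletion applies at every level of the tree, not only at the root.

```python
import os

def walk_skipping_first(rootDir):
    """Walk top-down, skipping the first sub-directory of each directory visited."""
    visited = []
    for dirName, subdirList, fileList in os.walk(rootDir):
        visited.append(dirName)
        subdirList.sort()  # in-place sort, so "first" is well defined
        if subdirList:
            # Delete in place so that os.walk sees the shortened list
            del subdirList[0]
    return visited
```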

For the tree above, only the parent directory (.) and subdir2 are listed; the first sub-directory (subdir1) was indeed skipped.

This only works when the directory tree is being traversed top-down. In a bottom-up traversal, sub-directories are processed before their parent directory, so by the time we could modify subdirList, those sub-directories have already been processed!
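A short sketch of that limitation: emptying subdirList during a bottom-up walk changes nothing about which directories are visited. The function name is hypothetical.

```python
import os

def prune_bottom_up(rootDir):
    """Try (and fail) to prune sub-directories during a bottom-up walk."""
    visited = []
    for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
        visited.append(dirName)
        # Too late: every sub-directory listed here has already been yielded
        del subdirList[:]
    return visited
```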

It's also important to modify subdirList in-place, so that os.walk will see the changes. If we instead assigned a new list to the subdirList name, we would create a new list of sub-directories, one that os.walk wouldn't know about.
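The difference can be sketched side by side; both function names are hypothetical. The first rebinds the name and prunes nothing, while the second uses slice assignment to change the list that os.walk reads from.

```python
import os

def walk_rebinding(rootDir):
    """Rebind subdirList to a new list: the walk is NOT pruned."""
    visited = []
    for dirName, subdirList, fileList in os.walk(rootDir):
        visited.append(dirName)
        # Only the local name changes; os.walk still holds the original list
        subdirList = subdirList[1:]
    return visited

def walk_in_place(rootDir):
    """Modify subdirList in place: the first sub-directory IS skipped."""
    visited = []
    for dirName, subdirList, fileList in os.walk(rootDir):
        visited.append(dirName)
        # Slice assignment mutates the very list os.walk will read next
        subdirList[:] = subdirList[1:]
    return visited
```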

For a more comprehensive tutorial on Python's os.walk function, check out the recipe Recursive File and Directory Manipulation in Python. Or, to look at traversing directories another way (using recursion), check out the recipe Recursive Directory Traversal in Python: Make a list of your movies!.