Command-line scripting can be a powerful approach to Python, but like all good things in programming, it can get complicated.
Whether you're looking to simply script on the command line or just want to use Python alongside command-line or other applications, you'll benefit from learning about the subprocess module.
You can do everything from running shell commands to launching GUI apps with it. Very convenient!
This guide will walk you through how processes and subprocesses work before exploring how you can use the subprocess module in everyday programming.
The Basics of Processes and Subprocesses
From the time you boot your computer to when you shut it off, you're always interacting with programs in some capacity. And processes are involved in every program you use.
You see, processes are an operating system's abstraction of a program. So, every program running on a machine has at least one corresponding process. Everything from your start menu to a video game is made up of processes.
Contemporary operating systems report at least a few hundred, but typically a few thousand processes running at the same time. That said, CPUs have a limited number of cores, leaving a computer limited to processing a handful of instructions simultaneously.
So, how is a computer able to run thousands of processes?
The answer isn't as complicated as you might expect: operating systems are designed for multitasking effectively. To add to that, CPUs are extremely powerful and operate at the nanosecond timescale.
They are called the brain of a computer for good reason – one of them being that they are astronomically faster than other computer components. You might be surprised to learn that the typical hard disk is thousands of times slower at reading data than a CPU.
So, when a process wants to write data onto a hard disk, the CPU isn't actively involved in the task most of the time. As far as that process is concerned, it simply idles. And this is the case for most processes. But since the operating system keeps switching the CPU between processes, the CPU is always busy tending to some instruction and hence isn't perpetually idling.
Another reason why operating systems are excellent at multitasking is that they are remarkably organized. An OS continually tracks every process using a process control block or process table. Besides file handles and address space references, you will also find information such as security context in this table.
The table is designed to aid the operating system in suspending a process when circumstances call for it. It does this by storing all the information the CPU needs to resume work on the process at a later time.
A process is typically interrupted several thousand times during its execution, and the operating system manages to keep track of where the CPU stopped processing and where it needs to continue.
It's interesting to note that an OS doesn't load thousands of processes at startup. Most of the processes running on an OS are launched because you launch programs after booting it up.
Subprocesses and Their Relationship with Processes
When you're scripting on the command line, what you're doing is using the command line – which is a process – to start a Python application – which is also a process.
So, you're using one process to start another. The process that starts another is called the parent process. As you'd expect, the new process is called the child process or a subprocess.
Conveniently, parent and child processes don't typically have anything to do with each other. They run independently, and sometimes, the child inherits some resources from the parent.
But the relationship between processes and subprocesses can change as circumstances demand. They might share resources, such as inputs and outputs. Other times, the parent process ends long before its subprocess does. When this happens, the subprocess is referred to as an orphan process. A zombie process is a different case: it is a child that has already terminated but whose exit status the parent has not yet collected.
That said, when any process has finished performing its task, it typically terminates. And when it does, it returns an integer called a "return code." It is also sometimes called the "exit status."
When a process returns zero, it means it performed its task successfully. If it returns any other integer, it indicates that the process failed.
Developers use the exit status to keep track of the reason why a process failed. Just like you can code a function to return a value in Python, the OS expects a return value from a process on its termination.
For this reason, the main() function in C programming is always written so it returns an integer – you might be familiar with this if you have some basic C programming skills.
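As a small preview of the module covered later in this guide, the sketch below shows return codes in action. It assumes only the standard library and uses sys.executable (the path of the running interpreter) with inline -c programs as stand-ins for real child processes.

```python
import subprocess
import sys

# A child that does nothing and exits normally.
ok = subprocess.run([sys.executable, "-c", "pass"])

# A child that deliberately exits with status 3.
failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

print(ok.returncode)      # 0, the conventional "success" value
print(failed.returncode)  # 3, the value the child passed to sys.exit()
```

The parent decides what a non-zero value means; the operating system only passes the integer along.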
Checking the Active Processes on Your Machine
In this section, we'll explore how you can view the processes running on your OS right now. We'll do this by looking at the process tree. Taking a look at how processes are structured will help you visualize how Python's subprocess module works.
There are several utilities available on every major operating system that enable you to see what processes are running on the machine.
On Windows
Launching the Task Manager is the most basic method of viewing the running processes on Windows. You might already be familiar with this. Hit Shift+Ctrl+Esc, and it will launch. Then, navigate to the "Processes" tab to view the processes.
To view a process tree, you can use a third-party app such as Process Hacker. You can download its executable here or install it using Chocolatey on PowerShell like so:
PS> choco install processhacker
Launch the app, and you will see the process tree right away.
On Linux
Every Linux distro offers a handful of command-line utilities that allow you to view the process tree. The best-known utility for this task is "top," which is installed by default on nearly every Linux machine. Run it, and hit Shift+V to switch to "forest view."
You can also use "htop," "atop," "pstree," and "bpytop" to explore the process tree.
On macOS
Since macOS is UNIX-based, you can view the process tree using many of the utilities that work on Linux, although some of them need to be installed separately. Another way to go about this is to use the Activity Monitor app. You'll find it in your utilities. If you decide to use the app, you will need to navigate to the View menu and choose "All Processes, Hierarchically" to find the process tree.
One solution to viewing the process tree that works on all operating systems is the psutil Python library.
Regardless of which operating system you're running and how you view the process tree, you'll notice that every process has a Process Identification Number associated with it. This unique integer allows the operating system to identify processes.
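You can also ask a Python process for its own PID and for the PID of its parent. This is a minimal sketch using only the standard library's os module; the numbers printed will differ on every run.

```python
import os

pid = os.getpid()          # PID of this Python process
parent_pid = os.getppid()  # PID of the process that launched it

print(f"This process: {pid}")
print(f"Parent process: {parent_pid}")
```

If you run this from a terminal, the parent PID should match the shell's entry in whichever process viewer you have open.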
Besides the PID, you will likely see the resource usage of every process. The utilities typically show the RAM and CPU percentage in use. If a program starts hogging your resources, these are the indicators you look at to find the one causing the trouble.
Checking the resource utilization also helps develop and debug scripts that are using the subprocess module. Surprisingly, you won't need the PID or other information about which resources the processes within your code are using.
We'll be looking at some examples of using the subprocess module in the coming sections. It's a good idea to leave the process tree open to see the processes you run appear in it.
The Subprocess Module
The function of the subprocess module is to launch child processes. The module can launch both shell and GUI apps – what you use it for is up to you.
The module was introduced in Python 2.4 as an alternative to the process-spawning functions in the os module. The Python team has improved upon the module since. The examples have been tested on 3.10.4, but any version from 3.8 onward should run them.
It's worth noting that most of the time you use the subprocess module, you will use it through the run() function. It is a blocking function: it starts the child process and waits for that process to terminate before returning. Only then does your Python program move on to its next statement.
The official Python documentation recommends using the run() method for every case it is designed to handle, although the Popen class exists.
The Popen class is the underlying class for the entirety of the module. Meaning all the methods in the subprocess module are simply wrappers around the Popen() constructor, and of course, its underlying instance methods.
We'll discuss why the Python documentation suggests using run() over the Popen class towards the end of this guide.
As you use the subprocess module, you are bound to come across related functions such as call(), check_call(), and check_output(). These belong to the module's older high-level API, which predates Python 3.5, the release that added run(). Everything these functions do, run() can do too.
If you're wondering why these functions still exist, it's primarily for backward compatibility. We won't be discussing how those older functions work in this guide.
It's worth noting that as powerful as the subprocess module is, it also has a fair amount of redundancy. You can do the same thing in several ways, and we'll stick to the most useful variations – mainly run() – in this guide.
How to Use the Subprocess Module
All the programs below use the standard Python library, so you need not get any dependencies on your machine. You also don't need a virtual environment to follow along with the tutorial.
Let's explore the fundamentals of the subprocess module with a simple command-line program that uses run().
Assume that you need to write a program that counts the number of seconds and functions like a timer. Here's what the code would look like:
# timer.py
from argparse import ArgumentParser
from time import sleep

parser = ArgumentParser()
parser.add_argument("time", type=int)
args = parser.parse_args()
print(f"Starting timer of {args.time} seconds")
for _ in range(args.time):
    print(".", end="", flush=True)
    sleep(1)
print("Done!")

As you can see, we're using argparse to accept an integer argument from the user. The argument holds the number of seconds our timer program should wait before it terminates. The waiting is handled by the sleep() function.
When called with 5, the program will print a dot every second till it reaches the five-second mark, then print "Done!"
This is a bare-bones program – it's nothing special.
However, it is a convenient stand-in for any cross-platform process that runs for a few seconds. You can easily tinker around with it and make it as complex as you like. If you were to use it as an independent executable, all you'd need to do is call it with the subprocess module.
But bear in mind calling a Python program with the module doesn't make sense. There's no need for Python modules to exist in separate processes since Python offers you the ability to import them directly.
The whole idea behind the subprocess module is to call programs that aren't in Python. The only reason we're using Python programs in our examples is that you likely already have it installed and that Python programs work across platforms.
You should note that although the module seems like the perfect way to achieve concurrency, this isn't the intended use case for it. There are other modules for that purpose, and we will discuss them towards the end of this guide.
Now that our timer program is ready, launch an interactive Python session and use the following code to call the program using the subprocess module, like so:
>>> import subprocess
>>> subprocess.run(["python", "timer.py", "5"])
Starting timer of 5 seconds
.....Done!
CompletedProcess(args=['python', 'timer.py', '5'], returncode=0)

As you can see, we've imported the subprocess module and called run() with a list of strings. The whole list is passed as a single argument, the args parameter of run().
As expected, the program prints a dot on-screen for every passing second. When it terminates, it returns a CompletedProcess class instance.
If you were to run this program on the command line, you could use the following string:
$ python timer.py 5
But calling run() is different from calling a program on the command line since the function makes a system call directly. So, using the shell isn't necessary. run() expects you to pass the command as a sequence, as shown above. The items in the sequence are the tokens that the system call uses to start the process.
Besides, shells work out the required tokenization without needing prompting, which is why you can write commands as a single long string on the command line.
That said, the subprocess module demands that you chunk up the command into tokens manually. So, your arguments, executable names, and flags must be one token each.
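If you already have a command written as a single string, the standard library's shlex module can do the tokenizing for you. The sketch below assumes POSIX-style quoting rules, which is what shlex.split() follows; notes.txt is a hypothetical file name used only for illustration.

```python
import shlex

# Split a command string the way a POSIX shell would.
tokens = shlex.split("python timer.py 5")
print(tokens)  # ['python', 'timer.py', '5']

# Quoted text stays together as one token.
quoted = shlex.split('grep "hello world" notes.txt')
print(quoted)  # ['grep', 'hello world', 'notes.txt']
```

The resulting list can be passed straight to run() as its args parameter.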
Using Subprocess to Run Any Application
Any app you can launch from the Start menu, you can start with the subprocess module. But you will need to know the exact name or path of the program for this.
Let's see how you can open Notepad on Windows using subprocess:
>>> subprocess.run(["notepad"])
CompletedProcess(args=['notepad'], returncode=0)

The command above will open a text editor window. Remember that run() won't return the CompletedProcess, and the REPL won't display it, until you close the window.
The subprocess module works the same way on macOS and Linux, too. On macOS, though, you will need to use the "open" launcher process to run TextEdit. For this reason, on that OS, CompletedProcess will appear right after you run the command.
It's worth noting that launcher processes run a process and then terminate. Some programs, notably web browsers, come with these launcher processes built-in. We won't be exploring these launchers in this guide.
What you should take away is that launcher processes can manipulate an OS's process tree and reassign process-subprocess relationships.
The CompletedProcess Object
The run() function always returns a CompletedProcess instance when the associated subprocess terminates. This object offers several attributes that can help you accomplish various tasks. You've already encountered the args and returncode attributes.
To closely observe how run() works, assign the result of the method to a variable. Let's see how you would do this for the returncode attribute:
>>> import subprocess
>>> completed_process = subprocess.run(["python", "timer.py"])
usage: timer.py [-h] time
timer.py: error: the following arguments are required: time
>>> completed_process.returncode
2

In the example above, the exit status doesn't indicate success, yet no exception is raised. When a child process fails, you usually want an exception so that the failure cannot go unnoticed. You can pass check=True to have run() raise one, like so:

>>> completed_process = subprocess.run(
...     ["python", "timer.py"],
...     check=True
... )
usage: timer.py [-h] time
timer.py: error: the following arguments are required: time
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command '['python', 'timer.py']' returned non-zero exit status 2.
There are several approaches to dealing with failed processes. We'll discuss a few solutions in the next section. What you must take away from this section is that the run() method won't raise an exception unless you use the right argument, as demonstrated above.
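If you would rather inspect the result first and decide later whether a failure should become an exception, the CompletedProcess object has a check_returncode() method for that. The following is a minimal sketch; the inline -c program is a hypothetical stand-in for a child that exits with status 2.

```python
import subprocess
import sys

# Stand-in for a failing child process: exits with status 2.
completed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(2)"])

try:
    # Raises CalledProcessError only if returncode is non-zero.
    completed.check_returncode()
    outcome = "succeeded"
except subprocess.CalledProcessError as exc:
    outcome = f"failed with exit status {exc.returncode}"

print(outcome)
```

This raises the same exception type as check=True, only at a point of your choosing.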
Communicating with Subprocesses
Most tasks involving the subprocess module will demand that you accept inputs dynamically. To help you understand how you can communicate with subprocesses, here's a quick overview of the standard I/O streams:
The I/O Streams in Python
The term "stream" in the context of Python refers to a sequence of elements. These elements aren't transmitted all at once.
When you read a file with Python, you're working with a stream. It's just that the stream presents itself as a file object. Coming to our point – when you initialize any subprocess, it uses three standard streams:
- stdin: For reading the input
- stdout: For writing the output
- stderr: To report errors
A child process may inherit these streams from its parent, which is exactly what happens when you run subprocess.run() on a REPL. The subprocess inherits the stdout stream of Python's interpreter to pass outputs.
Simply put, the REPL is a CLI process that uses all three of the standard I/O streams.
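To make the three streams concrete, here is a small sketch in which the parent feeds data to a child's stdin and collects what the child writes to stdout, while stderr stays inherited from the parent. The child program is a hypothetical one-liner passed with -c; input and stdout=subprocess.PIPE are regular parameters of run().

```python
import subprocess
import sys

# Hypothetical child program: reads all of stdin and writes it back
# to stdout in upper case.
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input=b"hello from the parent\n",  # becomes the child's stdin
    stdout=subprocess.PIPE,            # stdout is captured instead of inherited
)
print(result.stdout)
```

Because stderr was not redirected, any error message from the child would still appear directly in your terminal.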
Capturing the Output of the run() Method
Let's say you're working with a number generator for a project and don't have access to the source of the program that generates the number. Your wrapper program would read the value from the subprocess's stdout stream. Here's a stand-in for the generator, saved as magic_number.py:

from random import randint

print(randint(0, 1000))

You can grab the output using the capture_output argument in run(), like so:
>>> import subprocess
>>> magic_number_process = subprocess.run(
...     ["python", "magic_number.py"], capture_output=True
... )
>>> magic_number_process.stdout
b'769\n'
What's happening here is that the capture_output argument is making the subprocess's output available at .stdout.
See how the value is returned as a bytes object? This makes it necessary to carefully approach reading the output, since encoding is involved.
Note that the .stdout attribute of the CompletedProcess is not a stream. By the time run() returns, the stream has already been read to the end and its contents stored in .stdout as a bytes object.
But now that the output is available, you can employ several subprocesses to grab values and operate on them. Here's how that would work:
>>> import subprocess
>>> sum(
...     int(
...         subprocess.run(
...             ["python", "magic_number.py"], capture_output=True
...         ).stdout
...     )
...     for _ in range(2)
... )
1085

In the code above, the int() constructor accepts the bytes object directly because it contains only ASCII digits and whitespace. For anything else you need to decode the output yourself. One of the best ways to handle encoding and decoding when using the subprocess module is to put the module in text mode. You can do this by passing text=True.
Though the module will attempt to use the default encoding, it's a good idea to explicitly state what encoding to use using the "encoding" argument.
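Here is a short sketch of text mode with an explicit encoding. It uses an inline -c program as a stand-in for magic_number.py so that it runs without any extra files.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print(769)"],
    capture_output=True,
    text=True,         # decode stdout and stderr to str
    encoding="utf-8",  # explicit, instead of relying on the locale default
)

print(repr(result.stdout))  # '769\n' is now a str, not bytes
number = int(result.stdout)
print(number + 1)           # 770
```

With text=True, any input you pass to the child must also be a str rather than bytes.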
Exceptions in the Subprocess Module
Most use cases of the subprocess module aren't long and complicated. You can expect to write a few short scripts with it and not spend much time using it.
But whenever you write scripts that involve the module, you must ensure your subprocesses fail early and raise exceptions, so you don't face problems later on.
The CalledProcessError
When the exit status of a subprocess isn't zero, you must conclude that it failed.
As demonstrated earlier, you can use the check=True argument to introduce exceptions – more specifically, the CalledProcessError. This error appears when the subprocess returns a non-zero return code. And this simple exception might be enough for you if the script you've written is short and personal.
However, if you're interested in handling the error more elegantly, the section on exception handling below can help you.
It's important to note that this error doesn't appear when a subprocess hangs and blocks execution indefinitely. To avoid this situation, it makes sense to use the timeout parameter with run().
The TimeoutExpired Error
Not every subprocess you create will behave well. Some subprocesses might take too long to execute. Others might refuse to execute at all. In such situations, it's best to use the timeout parameter.
If you pass "timeout=1" to the run() function, it will stop waiting after one second, kill the subprocess, and raise the TimeoutExpired error like so:

>>> import subprocess
>>> subprocess.run(["python", "timer.py", "5"], timeout=1)
Starting timer of 5 seconds
.Traceback (most recent call last):
  ...
subprocess.TimeoutExpired: Command '['python', 'timer.py', '5']' timed out after 1.0 seconds

Notice the single dot in front of "Traceback" in the output above. That dot is the first piece of output from the timer program we discussed earlier in this guide. Python terminated the subprocess before it could print the rest.
The FileNotFoundError
This is the third exception you will commonly encounter when working with the subprocess module. It appears when you call a program that doesn't exist. Let's take a look:

>>> import subprocess
>>> subprocess.run(["now_you_see_me"])
Traceback (most recent call last):
  ...
FileNotFoundError: The system cannot find the file specified
The nice thing about this type of error is that it doesn't require you to use an argument to indicate to the subprocess module that it must be raised. It appears automatically when the situation demands it.
Most use cases of the subprocess module demand that you use the check and timeout arguments. Knowing to use these arguments is sufficient to use the module well. The idea is to always be in the know when your subprocesses fail. Because if the subprocess fails, there's likely an issue with your script.
If you intend to use the subprocess module in complicated scripts that call several processes over long periods, it's best to rely on the try … except construct to handle your exceptions.
Handling Exceptions in the Subprocess Module
Below is some code that demonstrates how to handle the three primary exceptions discussed above:
import subprocess

try:
    subprocess.run(
        ["python", "timer.py", "5"], timeout=10, check=True
    )
except FileNotFoundError as exc:
    # The executable named in the first token does not exist.
    print(f"Process failed because the executable could not be found.\n{exc}")
except subprocess.CalledProcessError as exc:
    # Raised by check=True when the exit status is non-zero.
    print(
        f"Process failed because it did not return a successful return code. "
        f"Returned {exc.returncode}\n{exc}"
    )
except subprocess.TimeoutExpired as exc:
    # Raised when the subprocess runs longer than the timeout.
    print(f"Process timed out.\n{exc}")
Modules Associated with the Subprocess Module
Learning about some of the modules associated with the subprocess module will help you decipher which tasks are a good fit for the module and which aren't.
Before the subprocess module was introduced, os.system() was the go-to function for running commands. Today, subprocess is the recommended replacement for os.system() and the os.spawn*() family, although the os module itself remains in everyday use for other tasks such as working with files and environment variables.
There is little you could do with those older functions that the subprocess module cannot do. The official Python documentation has a section on replacing older functions with the subprocess module. You can check it out if you're interested in migrating old code to subprocess.
If you're hotfixing something or writing a simple script for personal use, using the subprocess module to achieve concurrency is not a bad idea. However, using the multiprocessing module is among the best ways to achieve concurrency.
Depending on the solution you're trying to create, you can also consider using the threading or asyncio modules to achieve concurrency. If the entirety of your solution is written in Python, these two modules are your best bet.
The asyncio module has a high-level API that allows you to create and manage subprocesses. So, this is the module to use if you're looking to control non-Python parallel processes.
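As a rough illustration of that API, the sketch below starts a child with asyncio.create_subprocess_exec() and awaits its output. The coroutine name run_child and the inline -c program are hypothetical choices made for this example.

```python
import asyncio
import sys


async def run_child():
    # Start the child without blocking the event loop.
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('hello from the child')",
        stdout=asyncio.subprocess.PIPE,
    )
    # Wait for the child to finish and collect its stdout.
    stdout, _ = await process.communicate()
    return process.returncode, stdout.decode().strip()


returncode, output = asyncio.run(run_child())
print(returncode, output)
```

While one child is being awaited, the event loop is free to run other coroutines, which is what makes this approach suitable for supervising several processes at once.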
The Popen Class and Popen() Constructor
Early on in this guide, we mentioned that the Popen class is the underlying class of the subprocess module. Every function in the subprocess module calls the Popen() constructor, and using the constructor directly gives you more control over the subprocesses.
What you'll now find interesting is that the run() function is really a call to the Popen() constructor, plus some setup and a call to .communicate(), which is a blocking method that returns the stdout and stderr data when the subprocess terminates.
The name "Popen" comes from popen(), a UNIX C library function whose name is short for "pipe open." That function creates a pipe and starts a new process that invokes the shell. The difference between popen() and the subprocess module is that subprocess doesn't invoke the shell by default.
So, the run() method is a blocking function, meaning it cannot dynamically interact with a process. That said, the Popen() constructor starts a process and leaves it running in parallel.
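To make the relationship between run() and Popen concrete, here is a deliberately simplified sketch of a run()-like helper. It is not the standard library's actual implementation: run_sketch is a hypothetical name, and the real function handles many more options.

```python
import subprocess
import sys


def run_sketch(args, timeout=None):
    """Simplified stand-in for subprocess.run() that captures output."""
    with subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    ) as process:
        try:
            # Blocks until the child exits or the timeout elapses.
            stdout, stderr = process.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            process.kill()  # stop the child, then let the error propagate
            process.wait()
            raise
    return subprocess.CompletedProcess(args, process.returncode, stdout, stderr)


result = run_sketch([sys.executable, "-c", "print('hi')"])
print(result.returncode, result.stdout)
```

The blocking behaviour comes entirely from .communicate(); the Popen() call itself returns as soon as the child has started.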
Using the Popen() Constructor
The Popen() constructor looks similar to the run() method when in use. Additionally, the constructor works with virtually every argument that the run() method works with.
The primary difference between the two is that Popen() isn't a blocking function. It doesn't wait for the process to finish. Rather, it runs it in parallel. So, if you decide to use Popen(), you must remember this characteristic if you intend to read the process's output.
Let's see what calling the timer program we wrote earlier would look like if we used Popen():

import subprocess
from time import sleep

with subprocess.Popen(
    ["python", "timer.py", "5"], stdout=subprocess.PIPE
) as process:

    def poll_and_read():
        print(f"Output from poll: {process.poll()}")
        print(f"Output from stdout: {process.stdout.read1().decode('utf-8')}")

    poll_and_read()
    sleep(3)
    poll_and_read()
    sleep(3)
    poll_and_read()
The code above calls the timer process in a context manager and then assigns stdout to a pipe. Next, it uses .poll() on the Popen object and reads the stdout.
A note on the .poll() method: it's a simple method that checks if a process is running and if yes, it returns "None." Else, it returns its exit code.
Coming back to the program above, we then use the .read1() method to read the bytes that are currently available at .stdout, instead of waiting for the stream to close as .read() would.
When you run the code, you will see it first prints None, then whatever is available at stdout. You will see the initial message of the program and then the first dot.
When three seconds pass, the timer won't have finished, so the program prints None and two more dots. Another three seconds pass, and the process will have terminated. Now, the .poll() method returns 0, which is printed along with the remaining dots and "Done!"
In most real-life use cases, you won't need this much control over your program. In the final section below, we'll see how you can pipe one process into another.
Connecting Two Processes Using Pipes
Using the run() function won't work if you want to connect two running processes since it's a blocking call. With a blocking call, the second process can only start once the first one has ended. This makes it impossible to link the stdout of one process to the stdin of another while both are running.
However, you can use the Popen() constructor for this purpose since it offers that flexibility. Piping isn't common in Windows systems, so the following example is only demonstrated for UNIX-based systems:
import subprocess

ls_process = subprocess.Popen(["ls", "/usr/bin"], stdout=subprocess.PIPE)
grep_process = subprocess.Popen(
    ["grep", "python"], stdin=ls_process.stdout, stdout=subprocess.PIPE
)

for line in grep_process.stdout:
    print(line.decode("utf-8").strip())
In this program, we're starting two processes in parallel and joining them with a pipe. The loop reads the pipe at stdout and supplies the output.
It's important to note that the Popen() constructor returns a Popen object. It doesn't return a CompletedProcess object like run() does. More importantly, the attributes of the Popen object point to the real I/O streams, whereas the attributes of the CompletedProcess object hold bytes objects or strings that have already been read.
This is what makes it possible for you to communicate with processes while they run.
Connecting processes with pipes is not always necessary, though. Before you decide to use Popen, think about whether you'll lose much if you use run() exclusively and mediate between the processes with Python.
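Mediating with Python could look like the sketch below: the first run() call captures its output, and the second receives that output through the input parameter. The two inline programs are hypothetical stand-ins for ls and grep, written with -c so the example runs on any platform.

```python
import subprocess
import sys

# Stand-in for `ls`: prints a few names, one per line.
producer = "for name in ['perl', 'python3', 'ruby', 'python3.10']: print(name)"

# Stand-in for `grep python`: keeps only lines containing "python".
consumer = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'python' in line:\n"
    "        print(line.strip())"
)

first = subprocess.run(
    [sys.executable, "-c", producer], capture_output=True, text=True, check=True
)
second = subprocess.run(
    [sys.executable, "-c", consumer],
    input=first.stdout,  # Python carries the data between the two processes
    capture_output=True,
    text=True,
    check=True,
)

matches = second.stdout.splitlines()
print(matches)  # ['python3', 'python3.10']
```

The trade-off is that the first process must finish, and its whole output must fit in memory, before the second one starts.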
Conclusion
Now that you've understood the ins and outs of the subprocess module, it should be easy to decide whether it's the right fit for the problem you want to solve. It should also be easy for you to decide whether you want to invoke the shell or not.
Besides, you're now also prepared to manipulate processes as you see fit using the Popen() constructor.