Published: Friday 1st March 2024

How To Parse a String in Python: A Step-by-Step Guide

Python programmers often use the string data type to store and modify text as needed. 

Sometimes, developers find themselves needing to extract some specific information from strings. For example, a programmer may need to extract all the URLs present in a block of text. This process is referred to as parsing a string. 

Python offers several methods to parse strings, including string functions, parsing libraries, and regular expressions. 

In this short article, we cover the various methods of parsing strings in Python.

The Three Ways to Parse Strings in Python

The three most popular methods of parsing in Python are:

  1. String methods: Python's built-in string methods make it easy to split strings into smaller pieces, search them for specific substrings, and replace parts of them with other values.
  2. Parsing libraries: Python has a plethora of parsing libraries that can extract information stored in structured formats like XML, JSON, and CSV.
  3. Regular expressions: These let you describe complex patterns and search for matching substrings within a larger string.

Let's explore these three techniques in closer detail with some examples. You will then be ready to parse strings in Python!
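As a quick taste of the second technique, here is a minimal sketch using the standard-library json and csv modules. The sample data and the variable names (json_text, csv_text, user, rows) are made up for this illustration.

```python
import csv
import io
import json

# Hypothetical sample data, invented for this illustration.
json_text = '{"name": "Ada", "languages": ["Python", "C"]}'
csv_text = "name,age\nAda,36\nLinus,54\n"

# json.loads() turns a JSON string into Python objects (here, a dict).
user = json.loads(json_text)
print(user["languages"][0])  # Python

# csv.DictReader reads each CSV row into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[1]["name"])  # Linus
```

Note that the csv module returns every field as a string, so numeric columns such as age still need an explicit int() conversion.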

How to Split Strings in Python

If you need to break a string into smaller parts, you can use one of the three functions below. 

#1 The split() Function

Most developers rely on the split() function to break down strings into smaller pieces. The function accepts an optional delimiter as an argument and returns a list of the separated substrings.

Here's a simple example:

statement = "This is a statement."

parts = statement.split()

print(parts)

 

The output of the code is:

['This', 'is', 'a', 'statement.']

The split() function splits on whitespace by default. If you want to use a different delimiter, you can pass it as an argument. Here's what this looks like:

statement = "This,is,a,statement."

parts = statement.split(",")

print(parts)

 

The code gives the output:

['This', 'is', 'a', 'statement.']
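When the delimiter can also appear inside the data, the optional maxsplit argument of split() helps. The sketch below assumes a made-up "key=value" setting; the names setting, key, and value are illustrative only.

```python
# Hypothetical "key=value" setting, invented for this illustration.
setting = "url=https://example.com/?page=2"

# Without maxsplit, every "=" is a split point.
all_parts = setting.split("=")
print(all_parts)  # ['url', 'https://example.com/?page', '2']

# maxsplit=1 stops after the first "=", keeping the value intact.
key, value = setting.split("=", 1)
print(key)    # url
print(value)  # https://example.com/?page=2

# With no delimiter, runs of whitespace count as a single separator.
words = "  spaced   out  text ".split()
print(words)  # ['spaced', 'out', 'text']
```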

#2 The partition() Function

The partition() function works similarly to the split() function, except it splits only at the first occurrence of the delimiter and returns a tuple instead of a list. The tuple always has exactly three elements: the string before the delimiter, the delimiter itself, and the string after the delimiter.

Let's call the partition() function in a simple example:

textFile = "example.txt"

fileName, _, extension = textFile.partition(".")

print(fileName)

print(extension)

 

The code above splits the textFile string at the first period into the string before the delimiter and the string after it, discarding the delimiter itself by assigning it to _. It prints example and then txt.
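Two edge cases are worth keeping in mind: names with more than one delimiter, and names with none. The sketch below uses made-up file names and also shows rpartition(), the built-in counterpart that splits at the last delimiter.

```python
# Hypothetical file names, invented for this illustration.
archive = "backup.tar.gz"
no_extension = "README"

# partition() splits at the FIRST delimiter only.
first = archive.partition(".")
print(first)  # ('backup', '.', 'tar.gz')

# rpartition() splits at the LAST delimiter instead.
last = archive.rpartition(".")
print(last)  # ('backup.tar', '.', 'gz')

# If the delimiter is missing, partition() returns the whole string
# followed by two empty strings, so three-way unpacking never fails.
missing = no_extension.partition(".")
print(missing)  # ('README', '', '')
```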

#3 The rsplit() Function

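You can think of the rsplit() function as the mirror image of the split() function: it splits the string starting from the end instead of the beginning. The difference only shows when you limit the number of splits with the maxsplit argument; without it, both functions return the same list.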

Here's an example of the rsplit() function breaking a sentence into three parts:

statement = "This is a statement."

parts = statement.rsplit(maxsplit=2)

print(parts)

 

The code gives the output:

['This is', 'a', 'statement.']

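As you can see, the rsplit() function breaks the sentence into three parts, with the first part having two words. This is a result of using the maxsplit argument: maxsplit=2 allows only two splits, counted from the right, so the remaining text "This is" stays together.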
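To make the direction of splitting concrete, here is a small side-by-side sketch of split() and rsplit() on the same sentence. The variable names from_left and from_right are illustrative.

```python
statement = "This is a statement."

# Without maxsplit, split() and rsplit() return the same list.
same_result = statement.split() == statement.rsplit()
print(same_result)  # True

# With maxsplit, the direction decides which end stays unsplit.
from_left = statement.split(maxsplit=2)
from_right = statement.rsplit(maxsplit=2)
print(from_left)   # ['This', 'is', 'a statement.']
print(from_right)  # ['This is', 'a', 'statement.']
```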

How to Slice Strings in Python

Sometimes, developers need to extract a portion of a string to process it further. Extracting a specific portion of a string is called "slicing." 

Though there are many ways to slice strings in Python, most developers use one of the following two methods:

#1 Using Indexing

One fact that not every new Python programmer knows is that every string is indexed, starting with the number 0. In other words, the first character in a string has an index of 0, the second has an index of 1, and so on.

You can slice a string by specifying the start and end indices of the string you require. You must place a colon between these indices, and the indices themselves must be within square brackets. 

Here's an example:

fullString = "Examples are always helpful"

sliceOfText = fullString[13:27]

print(sliceOfText) 

 

This script gives the output:

always helpful

Notice that the end index is exclusive: the slice stops just before it, so the character at the end index is never included.

Here the string has 27 characters, indexed 0 through 26, so an end index of 27 is what it takes to reach the final character.
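Slices also accept omitted bounds, negative indices, and a step. The sketch below reuses the same sentence; the variable names are illustrative.

```python
fullString = "Examples are always helpful"

first_word = fullString[:8]    # omitted start: begin at index 0
last_word = fullString[20:]    # omitted end: run to the end of the string
also_last = fullString[-7:]    # negative index: count from the end
every_other = fullString[::2]  # step of 2: every second character
backwards = fullString[::-1]   # step of -1: the string reversed

print(first_word)  # Examples
print(last_word)   # helpful
print(also_last)   # helpful
print(every_other)
print(backwards)
```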

#2 Using slice()

The slice() function comes built-in with Python and accepts up to three arguments: the beginning index, the end index, and a step. Only the end index is mandatory; when called with a single argument, slice() treats it as the end index and starts from the beginning of the string.

The optional step value behaves as one when not specified. When you use slice(), it returns a slice object that you can use to extract the substring you want.

Here's how:

fullString = "Examples are always helpful"

sliceOfText = slice(13, 27)

print(fullString[sliceOfText])

 

This script gives the output:

always helpful

Bear in mind that the slice() function is most helpful when you need to apply the same slice several times across your program. Assign the slice object to a named variable, then use that variable wherever you need the slice, on the same string or on different ones.
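One place where named slice objects pay off is fixed-width text. The sketch below assumes a made-up record layout (10 characters of name followed by a 4-digit year); the names NAME, YEAR, records, and parsed are illustrative.

```python
# Hypothetical fixed-width records: a name padded to 10 characters,
# followed by a 4-digit year.
records = [
    f"{'Ada':<10}1815",
    f"{'Grace':<10}1906",
]

# Name the slices once, then reuse them on every record.
NAME = slice(0, 10)
YEAR = slice(10, 14)

parsed = []
for record in records:
    name = record[NAME].strip()  # drop the padding spaces
    year = int(record[YEAR])
    parsed.append((name, year))

print(parsed)  # [('Ada', 1815), ('Grace', 1906)]
```

If the layout changes, only the two slice definitions need updating.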

How to Use Regular Expressions to Parse Strings

Regular expressions are sequences of characters that define search patterns, which makes them helpful for parsing strings. In simple words, you can use these expressions to find specific combinations of characters in a string.

Python comes with a module called "re" that enables the use of regular expressions. It has several functions that help you find, split, and replace text in strings.

You must import the "re" module to use regular expressions in Python. Let's look at how you can find some email addresses in a string using regular expressions:

import re

sample_text = "You can reach me at [email protected] or [email protected]"

email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

found_emails = re.findall(email_pattern, sample_text)

print(found_emails)
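The addresses in the sample text above appear obscured, so here is a sketch of the same approach with placeholder addresses. The example.com and example.org addresses are assumptions made up for this illustration, not the original sample. The sketch also uses re.search(), which returns only the first match.

```python
import re

# Placeholder addresses, assumed for this illustration.
sample_text = "You can reach me at alice@example.com or bob.smith@example.org"

email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# findall() returns every non-overlapping match as a list of strings.
found_emails = re.findall(email_pattern, sample_text)
print(found_emails)  # ['alice@example.com', 'bob.smith@example.org']

# search() returns the first match object, or None when nothing matches.
first_match = re.search(email_pattern, sample_text)
print(first_match.group())  # alice@example.com
```

This pattern is a pragmatic approximation and is not a full validator for every legal email address.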

 

Conclusion

Parsing strings is a basic programming skill every developer should possess. We've covered many useful techniques in this article, so extracting data from strings should now be easy.

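Just remember to pick a parsing method that suits the data you are working with: string functions for simple delimiters and fixed positions, parsing libraries for structured formats like XML, JSON, and CSV, and regular expressions for more complex patterns.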