This article is part 2 of 4 in the series Python Regular Expressions Tutorial

Last Updated: Thursday 12th December 2013

In the first part of this series, we looked at the basic syntax of regular expressions and some simple examples. In this part, we'll take a look at some more advanced syntax and a few of the other features Python has to offer.

Regular Expression Captured Groups

So far, we've searched within a string using a regular expression and used the returned MatchObject to extract the entire sub-string that was matched. Now we'll look at how we can extract parts within the sub-string that was matched.

A regular expression such as \d{2}-\d{2}-\d{4} will match a date with the following format:

  • A 2-digit date.
  • A hyphen.
  • A 2-digit month.
  • A hyphen.
  • A 4-digit year.

For example:
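Here is a minimal sketch of the date format described above. The exact spelling of the pattern (\d{2}-\d{2}-\d{4}) and the sample strings are assumptions made for illustration.

```python
import re

# Assumed spelling of the pattern described above: DD-MM-YYYY.
date_pattern = re.compile(r"\d{2}-\d{2}-\d{4}")

# Hypothetical sample text, chosen only for illustration.
match = date_pattern.search("Invoice dated 12-12-2013, due in 30 days")
print(match.group())  # 12-12-2013

# No match here: the year has only two digits.
short_year = date_pattern.search("12-12-13")
print(short_year)  # None
```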

We can capture various parts of this regular expression by putting them in parentheses, for instance (\d{2})-(\d{2})-(\d{4}).

If Python matches this regular expression, we can then retrieve each captured group separately.
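As a rough sketch, assuming each part of the date is wrapped in its own pair of parentheses, the groups can be read back by number. Group 0 is always the whole match, and the sample date is made up for illustration.

```python
import re

# One pair of parentheses per part of the date (assumed pattern).
grouped = re.compile(r"(\d{2})-(\d{2})-(\d{4})")

m = grouped.search("25-12-2013")
print(m.group(0))  # 25-12-2013 (the entire match)
print(m.group(1))  # 25
print(m.group(2))  # 12
print(m.group(3))  # 2013
print(m.groups())  # ('25', '12', '2013')
```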

When you start writing more complex regular expressions, with lots of captured groups, it can be useful to refer to them by a meaningful name rather than a number. The syntax is (?P<name>...), where ... is the regular expression to be captured, and name is the name you want to give to the group.
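The sketch below applies named groups to the same date format. The group names day, month and year are our own choice for illustration, not something the re module requires.

```python
import re

# Each part of the date gets a (hypothetical) descriptive name.
named = re.compile(r"(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})")

m = named.search("25-12-2013")
print(m.group("day"))    # 25
print(m.group("month"))  # 12
print(m.group("year"))   # 2013

# Named groups are still numbered, so both styles work.
print(m.group(3))        # 2013

# groupdict() returns every named group at once.
print(m.groupdict())     # {'day': '25', 'month': '12', 'year': '2013'}
```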

Re-using Captured Groups with Regular Expressions

We can also take captured groups and re-use them later in the regular expression! (?P=name) means match whatever was previously matched in the named group. For example:
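One possible illustration is finding a word that has accidentally been typed twice in a row. The group name word and the sample sentence are assumptions chosen for this sketch.

```python
import re

# Capture a word, then require the very same text again after whitespace.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

m = doubled.search("Paris in the the spring")
print(m.group())        # the the
print(m.group("word"))  # the

# No word is repeated here, so there is nothing to match.
no_repeat = doubled.search("Paris in the spring")
print(no_repeat)  # None
```

Note that (?P=word) matches the text the group captured, not the pattern \w+ again, which is why 'in the' is not treated as a repeat.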

Python Regular Expression Assertions

Sometimes we want to match something only if it is followed by something else, which means that Python needs to peek ahead as it is searching the string. This is called a look-ahead assertion and the syntax is (?=...), where ... is a regular expression for what needs to follow.

In the example below, the regular expression ham(?= and eggs) means match 'ham' but only if it is followed by ' and eggs'.
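A minimal sketch of this look-ahead, using a sample sentence made up for illustration:

```python
import re

text = "I'd like ham and eggs, please"  # hypothetical sample sentence
m = re.search(r"ham(?= and eggs)", text)

print(m.group())  # ham
print(m.span())   # (9, 12)

# The look-ahead consumed nothing: ' and eggs' still follows the match.
print(text[m.end():])  #  and eggs, please
```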

Note that the matched sub-string is only 'ham', and not 'ham and eggs'. The ' and eggs' part is simply a requirement for the 'ham' part to be matched. Let's see what happens if this requirement is not met.
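To sketch the failing case, we can try the same pattern on strings where 'ham' is not immediately followed by ' and eggs'. Both test strings are our own, picked for illustration.

```python
import re

lookahead = re.compile(r"ham(?= and eggs)")

# 'ham' is followed by ', bacon', so the assertion fails.
first = lookahead.search("ham, bacon and eggs")
print(first)  # None

# 'ham' is followed by ' and cheese', so the assertion fails again.
second = lookahead.search("ham and cheese")
print(second)  # None
```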

Unfortunately, Python only does simple character matching: it will match the string 'ham' only when it is immediately followed by ' and eggs'. Artificial intelligence and semantic analysis are a whole 'nother article. :)

We can also do negative look-ahead assertions, that is, an element matches only if it is not followed by something else.
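The syntax for a negative look-ahead is (?!...). As a small sketch, reusing the 'ham' example with sample strings of our own, ham(?! and eggs) matches 'ham' only when it is not followed by ' and eggs'.

```python
import re

negative = re.compile(r"ham(?! and eggs)")

# 'ham' is followed by something other than ' and eggs': match.
m = negative.search("ham and cheese")
print(m.group())  # ham

# 'ham' is followed by ' and eggs': no match.
rejected = negative.search("ham and eggs")
print(rejected)  # None
```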