Study Guide: Python Programming: Python Regular Expressions
Source: https://www.fatskills.com/python/chapter/python-programming-python-regular-expressions

Python Programming: Python Regular Expressions

By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.


You may have met regular expressions in UNIX, where they are used to find strings or sets of strings using a specialized pattern syntax. In the same way, a Python regular expression is a special sequence of characters that matches a string or set of strings according to a particular pattern.

In Python, the “re” module provides support for regular expressions. If an error occurs while compiling or using a regular expression, the module raises the exception “re.error”.
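As a minimal sketch of how “re.error” can be handled, the snippet below wraps re.compile in a try/except. The helper name try_compile and the sample patterns are hypothetical and chosen only for illustration.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # exc.msg holds the human-readable reason for the failure
        print(f"invalid pattern {pattern!r}: {exc.msg}")
        return None

good = try_compile(r"\d+")        # a valid pattern compiles normally
bad = try_compile(r"(unclosed")   # unbalanced parenthesis raises re.error
```

Catching the exception where the pattern is compiled keeps a malformed pattern from stopping the rest of the program.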

The “re” module has two important functions: “match” and “search”.

In the following Python regular expression examples, patterns are written as raw strings, in the form r'expression'.
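To see why raw strings are convenient, here is a small sketch comparing an ordinary string with its raw equivalent. The sample text and variable names are made up for the illustration.

```python
import re

# Without the r prefix, each backslash must be doubled so that Python
# does not treat it as a string escape before the re module sees it.
plain = "\\d+\\.\\d+"
raw = r"\d+\.\d+"

# Both spellings produce exactly the same pattern text.
same_pattern = plain == raw

# The raw form is easier to read: digits, a literal dot, digits.
price = re.search(raw, "total: 12.50 USD")
print(same_pattern, price.group())
```

The raw form leaves backslashes untouched, so the pattern reads the same in the source code as it does to the “re” module.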

The match Function
This function in the “re” module tries to match the RE pattern at the beginning of the string, with optional flags.

Syntax

re.match(pattern, string, flags=0)

Following is the description of these parameters.

PARAMETER   DESCRIPTION
pattern     The regular expression to be matched.
string      The string that is searched; the pattern must match at its beginning.
flags       Optional modifiers, combined using bitwise OR (|). They are described under Regular Expression Modifiers: Option Flags.



The re.match function returns a match object when the match succeeds and None when it fails.

We can then call “group(num)” or “groups()” on the match object to get the matched text.

 

MATCH OBJECT METHOD   DESCRIPTION
group(num=0)          Returns the entire match, or the specific subgroup num.
groups()              Returns all matching subgroups in a tuple. The tuple is empty if there are no subgroups.
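The following is an illustrative sketch of re.match together with group and groups. The sample sentence and the pattern are assumptions made for this example, not the guide's own program.

```python
import re

line = "Cats are smarter than dogs"

# Two capturing groups: the first word, and the word right after "are".
result = re.match(r"(\w+) are (\w+)", line)

if result:
    whole = result.group()     # the entire match
    first = result.group(1)    # first subgroup
    second = result.group(2)   # second subgroup
    both = result.groups()     # all subgroups as a tuple
    print(whole, first, second, both)
else:
    print("No match")

# match only looks at the beginning of the string, so this returns None.
not_at_start = re.match(r"dogs", line)
```

Because re.match returns None on failure, the result is tested before group is called on it.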




The search Function
This function in the “re” module searches for the first occurrence of the RE pattern anywhere within the string, with optional flags.

Syntax

 

re.search(pattern, string, flags=0)

 


Following is the description of these parameters:

PARAMETER   DESCRIPTION
pattern     The regular expression to be matched.
string      The string that is searched; the pattern may match anywhere in it.
flags       Optional modifiers, combined using bitwise OR (|). They are described under Regular Expression Modifiers: Option Flags.

 


The re.search function returns a match object when the match succeeds and None when it fails. We can then call “group(num)” or “groups()” on the match object to get the matched text.

 

 

MATCH OBJECT METHOD   DESCRIPTION
group(num=0)          Returns the entire match, or the specific subgroup num.
groups()              Returns all matching subgroups in a tuple. The tuple is empty if there are no subgroups.
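Here is a short sketch of re.search locating a pattern in the middle of a string. The log line and the date pattern are hypothetical and serve only to illustrate the calls described above.

```python
import re

log = "ERROR 2024-05-17 disk usage at 91%"

# The date does not start the string, so search is needed to find it.
found = re.search(r"(\d{4})-(\d{2})-(\d{2})", log)

if found:
    date = found.group()     # the entire matched text
    year = found.group(1)    # first subgroup
    parts = found.groups()   # all subgroups as a tuple
    print(date, year, parts)
else:
    print("Nothing found")

# The flags argument changes how matching behaves; re.I ignores case.
level = re.search(r"error", log, re.I)
```

Only the first occurrence is reported, even if the pattern appears several times in the string.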




match vs search Functions of the “re” Module
Both functions are primitive operations that match a string against a particular pattern. The only difference is where they look.

The match function checks for a match only at the beginning of the string, whereas the search function checks for a match anywhere in the string. Compared with Perl, search corresponds to Perl's default matching operation.
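The difference can be seen by running both functions with the same pattern on the same string. This is a minimal sketch with a made-up sentence.

```python
import re

text = "dogs are smarter than cats"

# match only succeeds if the pattern fits at position 0.
at_start = re.match(r"cats", text)

# search scans forward and reports the first place the pattern fits.
anywhere = re.search(r"cats", text)

# Anchoring a search with ^ makes it behave like match on this string.
anchored = re.search(r"^cats", text)

print(at_start, anywhere.span(), anchored)
```

Here at_start and anchored are both None, while anywhere is a match object covering the last word.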

Search and Replace
The “re” module has an important function called “sub”, which performs search-and-replace operations. Following is the syntax for this function.

Syntax

re.sub(pattern, repl, string, count=0, flags=0)

The “sub” function replaces occurrences of the regular expression pattern in the string with “repl”. It replaces all occurrences unless a limit is passed in “count”, and it returns the modified string.

PARAMETER   DESCRIPTION
pattern     The regular expression to be matched.
repl        The string that replaces each matching portion of the main string.
string      The main string, which is searched for the pattern anywhere within it.
count       An optional parameter that limits the maximum number of substitutions. The default, 0, replaces every occurrence.
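As an illustrative sketch of search and replace, the snippet below cleans up a phone number string. The sample string is an assumption for this example. The limit on substitutions is passed by keyword as count, which is the parameter name used by Python's re.sub.

```python
import re

phone = "2004-959-559 # This is a phone number"

# Delete the trailing comment: everything from '#' to the end.
number = re.sub(r"#.*$", "", phone)

# Delete every character that is not a digit.
digits = re.sub(r"\D", "", number)

# count=1 stops after the first substitution.
first_only = re.sub(r"-", "", "2004-959-559", count=1)

print(repr(number), digits, first_only)
```

The original string is never changed; each call returns a new string.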




Regular Expression Modifiers: Option Flags
Regular expression functions accept optional modifiers that control various aspects of matching. These modifiers are passed as the optional flags argument. Multiple modifiers can be combined using the bitwise OR (|) operator, for example re.I | re.M. The modifiers are described below.

 

 

MODIFIER   DESCRIPTION
re.I       Performs case-insensitive matching.
re.L       Interprets words according to the current locale. This affects the alphabetic group (\w and \W) as well as word boundary behavior (\b and \B). In Python 3 it can be used only with bytes patterns.
re.M       Makes $ match the end of each line, and not just the end of the string. Makes ^ match the start of each line, and not just the start of the string.
re.S       Makes a period (dot) match any character, including a newline.
re.U       Interprets letters according to the Unicode character set. This affects the behavior of \w, \W, \b and \B. In Python 3 this is already the default for str patterns.
re.X       Permits more readable regular expression syntax. Whitespace is ignored except inside a set [] or when escaped by a backslash, and an unescaped # is treated as a comment marker.
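The sketch below tries a few of these modifiers on a made-up multi-line string. It uses re.findall, a real “re” function not covered in this guide, only because it returns every match as a list and so makes the effect of each flag easy to see.

```python
import re

text = "first line\nSecond LINE\nthird line"

# Without flags, ^ and $ only anchor to the whole string, so nothing fits.
no_flags = re.findall(r"^\w+ line$", text)

# re.I ignores case and re.M lets ^ and $ match at every line.
with_flags = re.findall(r"^\w+ line$", text, re.I | re.M)

# By default the dot stops at a newline; re.S lets it cross lines.
dot_default = re.search(r"first.*third", text)
dot_all = re.search(r"first.*third", text, re.S)

# re.X allows whitespace and comments inside the pattern.
year_month = re.compile(r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    """, re.X)
parts = year_month.match("2024-05").groups()

print(no_flags, with_flags, dot_default, parts)
```

Combining flags with | passes them as a single flags value, so any number of modifiers can be applied in one call.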

 


Regular Expression Pattern Summary

 

PATTERN        DESCRIPTION
^              Matches the beginning of the string, and with re.M the beginning of each line.
$              Matches the end of the string or just before a newline at the end of the string, and with re.M the end of each line.
.              Matches any single character except newline. The re.S option allows it to match newline as well.
[...]          Matches any single character in brackets.
[^...]         Matches any single character not in brackets.
re*            Matches 0 or more occurrences of the preceding expression.
re+            Matches 1 or more occurrences of the preceding expression.
re?            Matches 0 or 1 occurrence of the preceding expression.
re{n}          Matches exactly n occurrences of the preceding expression.
re{n,}         Matches n or more occurrences of the preceding expression.
re{n,m}        Matches at least n and at most m occurrences of the preceding expression.
a|b            Matches either a or b.
(re)           Groups regular expressions and remembers the matched text.
(?imx)         Turns on the i, m, or x options. In Python this applies to the whole pattern and belongs at the start of the expression.
(?-imx)        Turns off the i, m, or x options in Perl-style syntax. Python's “re” module expects the scoped form (?-imx:re) instead.
(?:re)         Groups regular expressions without remembering the matched text.
(?imx:re)      Turns on the i, m, or x options within the parentheses only.
(?-imx:re)     Turns off the i, m, or x options within the parentheses only.
(?#...)        A comment. Its contents are ignored.
(?=re)         Positive lookahead. Asserts that re matches at this position without consuming any characters.
(?!re)         Negative lookahead. Asserts that re does not match at this position without consuming any characters.
(?>re)         Matches an independent (atomic) pattern without backtracking. Python supports this from version 3.11.
\w             Matches word characters.
\W             Matches non-word characters.
\s             Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].
\S             Matches non-whitespace.
\d             Matches digits. For ASCII text, equivalent to [0-9].
\D             Matches non-digits.
\A             Matches the beginning of the string.
\Z             Matches the end of the string. Unlike in Perl, Python's \Z does not match before a trailing newline.
\z             Matches the end of the string. Python accepts \z only in recent releases; \Z works in all versions.
\G             Matches the point where the last match finished. This is Perl syntax, and Python's “re” module does not support it.
\b             Matches word boundaries when outside brackets. Inside brackets it matches backspace (0x08).
\B             Matches non-word boundaries.
\n, \t, etc.   Matches newlines, carriage returns, tabs, etc.
\1...\9        Matches the same text as the nth grouped subexpression.
\10            Matches the same text as the 10th grouped subexpression. In Python, a number that starts with 0 or has three octal digits (such as \012) denotes a character code instead.
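To make a few rows of the summary concrete, here is a hedged sketch that exercises quantifiers, character sets, groups, backreferences, lookahead and end-of-string anchors. All sample strings are invented for the illustration, and re.findall is used only to list every match at once.

```python
import re

# Quantifier {n,m}: whole numbers that are two or three digits long.
two_or_three = re.findall(r"\b\d{2,3}\b", "7 42 365 2024")

# Character sets: vowels, and everything that is not a vowel.
vowels = re.findall(r"[aeiou]", "regex")
others = re.findall(r"[^aeiou]", "regex")

# Non-capturing group with alternation, followed by an optional "s".
animal = re.match(r"(?:cat|dog)s?", "dogs").group()

# Backreference \1: the same word repeated after a space.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Positive lookahead: words followed by "!", without including the "!".
excited = re.findall(r"\w+(?=!)", "wow! nice! ok")

# Negative lookahead: whole words that are not followed by "!".
calm = re.findall(r"\b\w+\b(?!!)", "wow! nice! ok")

# $ also matches just before a final newline; \Z only at the very end.
dollar = re.search(r"end$", "the end\n")
strict = re.search(r"end\Z", "the end\n")

print(two_or_three, vowels, others, animal, repeated, excited, calm)
```

The last two lines show why $ and \Z are listed separately: on a string that ends with a newline, only $ finds a match.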