A Quick Regex Overview

Types

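re.search scans the whole string and returns the first place where the pattern matches. re.match only matches at the beginning of the string, so it behaves like a "starts with" check.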
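As a minimal sketch of the difference between the two functions (the sample string and variable names are illustrative, not from the original notes):

```python
import re

text = "order 66 confirmed"

# re.search scans the whole string and returns the first match anywhere.
found_anywhere = re.search(r"[0-9]+", text)

# re.match only succeeds if the pattern matches at position 0.
found_at_start = re.match(r"[0-9]+", text)

print(found_anywhere.group())  # 66
print(found_at_start)          # None
```

Both functions return None when nothing matches, so check the result before calling .group().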

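The r prefix before the quotes creates a raw string, so Python does not process backslash escapes and passes them to the regex engine unchanged, e.g. re.split(r"\s+", text)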
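A small sketch of why the r prefix matters, assuming a made-up pattern that uses \b (word boundary in a regex, but the backspace character in an ordinary Python string):

```python
import re

# Without the r prefix, "\b" is the Python backspace character (one character).
plain = "\bcat\b"
# With the r prefix, the backslashes survive and the regex engine sees \b (word boundary).
raw = r"\bcat\b"

print(len(plain), len(raw))                 # 5 7
print(re.findall(raw, "cat concat cat"))    # ['cat', 'cat']
print(re.findall(plain, "cat concat cat"))  # []
```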

^ anchors the pattern to the start of the string ("starts with") and $ anchors it to the end ("ends with")
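The anchors can be tried out with a short sketch like this one (the filename is an invented example):

```python
import re

filename = "report_2024.csv"

# ^ pins the pattern to the start of the string, $ pins it to the end.
starts_with_report = re.search(r"^report", filename) is not None
ends_with_csv = re.search(r"[.]csv$", filename) is not None
ends_with_txt = re.search(r"[.]txt$", filename) is not None

print(starts_with_report, ends_with_csv, ends_with_txt)  # True True False
```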

Character Repetition:

ab?c (? means 0 or 1 occurrences of b, so it matches ac and abc)

ab*c (* means any number of b’s, from 0 upwards, so it matches ac, abc, abbbc and so on)

ab+c (+ means one or more occurrences of b, so it matches abc and abbbc but not ac)

[0-9]{3} ({3} means exactly three of the preceding item, here three digits in a row)
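The repetition rules above can be checked with a short sketch. The candidate strings are invented for illustration, and re.fullmatch is used so that each pattern has to cover the whole string:

```python
import re

candidates = ["ac", "abc", "abbbc"]
patterns = [r"ab?c", r"ab*c", r"ab+c"]

# For each pattern, keep the candidates that it matches in full.
results = {p: [s for s in candidates if re.fullmatch(p, s)] for p in patterns}
for pattern, matched in results.items():
    print(pattern, matched)
# ab?c ['ac', 'abc']
# ab*c ['ac', 'abc', 'abbbc']
# ab+c ['abc', 'abbbc']

# {3} consumes exactly three digits, so only the first three are taken.
three_digits = re.search(r"[0-9]{3}", "id 12345").group()
print(three_digits)  # 123
```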

Extract values between two commas

Input text: idnid,10GB,fdbf

Regex: [,](\w{1,6})[,]

Output: group 1 is 10GB

Second input text: jbbf:fbdbd,rollover:3392, iofdfdfd:8438043

Regex: rollover[:](\w{1,6})[,]

Output: group 1 is 3392
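Both extractions can be run in Python roughly as follows. This is a sketch using the two input texts above; the variable names are illustrative:

```python
import re

first = "idnid,10GB,fdbf"
second = "jbbf:fbdbd,rollover:3392, iofdfdfd:8438043"

between_commas = re.search(r"[,](\w{1,6})[,]", first)
rollover = re.search(r"rollover[:](\w{1,6})[,]", second)

# group(1) is the text captured by the first pair of parentheses.
print(between_commas.group(1))  # 10GB
print(rollover.group(1))        # 3392
```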

Example Regex in Python

import re

telephone = '+4454587982641'

match = re.search(r"(([+][0-9]{2})([0-9]{4})([0-9]*))", telephone)

# Print the different groups (group 1 is the whole number)

print('Country Code:', match[2])

print('Local Code:', match[3])

print('The Rest:', match[4])

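Output (match[1], which is not printed, holds the whole number +4454587982641 because the outer parentheses form group 1):

Country Code: +44

Local Code: 5458

The Rest: 7982641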

Extract elements from phone number:

Regex:

([+][0-9]{2})([0-9]{4})([0-9]*)

Input Text:

+447958425686

Outcomes:

Group 1 = +44

Group 2 = 7958

Group 3 = 425686 (the rest of the digits)
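re.search returns None when the input does not fit the pattern, so indexing the result directly would raise an error. A minimal sketch that guards against this, using a hypothetical helper named split_phone:

```python
import re

def split_phone(number):
    """Return (country, local, rest), or None when the pattern does not match."""
    match = re.search(r"([+][0-9]{2})([0-9]{4})([0-9]*)", number)
    if match is None:
        return None
    # groups() returns all captured groups as a tuple.
    return match.groups()

print(split_phone("+447958425686"))  # ('+44', '7958', '425686')
print(split_phone("not a number"))   # None
```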

Split strings at multiple delimiters

This splits the string wherever any one of the characters in the square brackets appears.

line = 'kieran:jacek,bartek;99'

print(re.split(r"[,;.:]", line))

Check whether any one of several words appears in a string:

line = 'kieran:jacek,bartek;99'

if re.search(r'kieran|bob|clive', line):

    print(re.split(r"[,;.:]", line))

Output: kieran is found in line, so the split list ['kieran', 'jacek', 'bartek', '99'] is printed.

Match between 1 and 6 characters followed by a dot:

\D{1,6}[.] = \D matches any non-digit character (letters, but also spaces and punctuation)

\d{1,6}[.] = \d matches digits

\w{1,6}[.] = \w matches word characters, including a-z, A-Z, 0-9 and the underscore
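To see which inputs each class accepts, here is a short sketch with invented sample strings. re.fullmatch is used so the whole string must fit the pattern:

```python
import re

samples = ["abc.", "123.", "a_1.", "!!."]
patterns = [r"\D{1,6}[.]", r"\d{1,6}[.]", r"\w{1,6}[.]"]

# For each pattern, keep the samples that it matches in full.
results = {p: [s for s in samples if re.fullmatch(p, s)] for p in patterns}
for pattern, matched in results.items():
    print(pattern, matched)
# \D{1,6}[.] ['abc.', '!!.']
# \d{1,6}[.] ['123.']
# \w{1,6}[.] ['abc.', '123.', 'a_1.']
```

Note that \D accepts "!!." because punctuation is not a digit, while \w rejects it.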

Case sensitivity

Matching is case sensitive by default. Passing re.IGNORECASE makes the search ignore the case of the string being searched.

if re.search(r'kieran|bob|clive', line, re.IGNORECASE):

    print(re.split(r"[,;.:]", line))
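The effect of the flag only shows when the case differs, so this sketch uses an upper-case variant of the sample line (an assumption for illustration):

```python
import re

shouted = "KIERAN:jacek,bartek;99"

# Default matching is case sensitive, so the lower-case pattern misses KIERAN.
case_sensitive = re.search(r"kieran|bob|clive", shouted)
# With re.IGNORECASE the same pattern finds it.
case_insensitive = re.search(r"kieran|bob|clive", shouted, re.IGNORECASE)

print(case_sensitive)            # None
print(case_insensitive.group())  # KIERAN
```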

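Find all occurrences

Use re.findall, which returns a list of every non-overlapping match. When the pattern contains one capture group, the list holds the captured text rather than the whole match.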

import re

line = 'jbbf:fbdbd,rollover:3392, iofdfdfd:8438043, rollover:5585'

match = re.findall(r"rollover[:](\w{1,6})", line)

print(match)
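The shape of the list returned by re.findall depends on the number of capture groups. A sketch on the same input, where the second pattern with two groups is an illustrative addition:

```python
import re

line = "jbbf:fbdbd,rollover:3392, iofdfdfd:8438043, rollover:5585"

# One capture group: a list of strings.
values = re.findall(r"rollover[:](\w{1,6})", line)
print(values)  # ['3392', '5585']

# Two capture groups: a list of tuples, one tuple per match.
pairs = re.findall(r"(\w+)[:](\w+)", line)
print(pairs)
# [('jbbf', 'fbdbd'), ('rollover', '3392'), ('iofdfdfd', '8438043'), ('rollover', '5585')]
```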