Moses Odhiambo
2 Feb 2021
Regular expressions are a part of programming that most developers don’t enjoy or avoid entirely. I would not blame them, since it can be boring as hell and a cumbersome topic to navigate. Nonetheless, it is an important skill to gain, and it may be one of the last bridges to cross to become a “Pro” developer.
A regular expression (RegEx) is a pattern used to check whether a string matches a given structure. A good example is validating a password that should be between 4 and 12 characters long and contain at least one digit, one uppercase character and one lowercase character.
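To make the password example concrete, here is a minimal sketch in JavaScript. The function name isValidPassword is hypothetical, and splitting the rules into separate checks is just one possible approach. The pieces used here (\d, [A-Z], [a-z]) are explained later in this post.

```javascript
// Hypothetical helper for illustration only.
function isValidPassword(password) {
  // Between 4 and 12 characters long.
  const lengthOk = password.length >= 4 && password.length <= 12;
  // At least one digit, one uppercase and one lowercase character.
  const hasDigit = /\d/.test(password);
  const hasUpper = /[A-Z]/.test(password);
  const hasLower = /[a-z]/.test(password);
  return lengthOk && hasDigit && hasUpper && hasLower;
}

console.log(isValidPassword("Pass1")); // true
console.log(isValidPassword("password")); // false (no digit, no uppercase)
```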
I will be using a simple online tool, regex101.com, which I highly recommend. The first thing to do is select the ECMAScript (JavaScript) item in the Flavor section of the left menu.
You will notice 2 fields: (Regular expression), where we type in our RegEx, and (Text String), where we type in our sample text.
Notice the forward slashes at the start and the end of the Regular expression input, with a trailing “g” or “gm”. These trailing letters are flags. Clicking the flag opens a dropdown with multiple options. I prefer to uncheck all the options before beginning.
In JavaScript, a regular expression is written between two forward slashes. Note that the expression is similar in most languages :).
Note: In my examples, there will be starting and ending forward slashes.
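As a rough illustration of the format, the sketch below writes the same pattern in two ways: as a literal between forward slashes and through the RegExp constructor. The pattern abc and the variable names are placeholders chosen for this example.

```javascript
// Literal form: the pattern sits between two forward slashes.
// Flags, if any, go after the closing slash.
const literalForm = /abc/;

// Constructor form: the same pattern passed as a string.
const constructorForm = new RegExp("abc");

// test() returns true when the pattern is found in the text.
console.log(literalForm.test("xxabcxx")); // true
console.log(constructorForm.test("xxabcxx")); // true
console.log(literalForm.test("ab")); // false
```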
1 Simple pattern
Let's start with something very simple. Create a RegEx that looks for the word hello in the text “hello world”. The regular expression will be:
/hello/
The expression checks if the word hello exists in the entire text and finds only one match. Changing the text to “hell” will fail to match.
Changing the text to “hello world and hello humans” will still find only one match. This is because the expression only looks for the 1st instance of “hello”. To find all instances we need to update it to:
/hello/g
We add the flag “g”, which stands for global. It allows us to find more than one instance. We will discover new flags as we move on. In regex101, simply select the flag icon to display the dropdown and select the global option.
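Outside regex101, the same behaviour can be checked in JavaScript. The sketch below uses the built-in String match() method; the variable names are made up for this example.

```javascript
const sampleText = "hello world and hello humans";

// Without the g flag, match() stops at the first occurrence.
const firstOnly = sampleText.match(/hello/);

// With the g flag, match() returns every occurrence.
const allMatches = sampleText.match(/hello/g);

console.log(firstOnly[0]); // "hello"
console.log(firstOnly.index); // 0
console.log(allMatches); // ["hello", "hello"]
```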
When you change the text to “Hello world”, the match fails. This is because regular expressions are case sensitive. We can use the “i” flag to fix this. In regex101, simply select the flag icon to display the dropdown and select the insensitive option.
/hello/gi
Now a match is found.
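Here is a small sketch of the same comparison in JavaScript, with and without the “i” flag. The variable names are placeholders.

```javascript
const greeting = "Hello world";

// Case sensitive: the capital H does not match the lowercase h,
// so match() returns null.
const caseSensitive = greeting.match(/hello/g);

// The i flag ignores case, so "Hello" now matches.
const caseInsensitive = greeting.match(/hello/gi);

console.log(caseSensitive); // null
console.log(caseInsensitive); // ["Hello"]
```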
Now let's make things a little more interesting by matching the words “cat” and “rat” in the sentence “The rat killed the cat with a hat”. You will notice both words share the same last 2 characters (at) but have a different 1st character. We need an expression that allows the first character to be either “c” or “r”.
/(c|r)at/gi
Simply wrap the variable characters in parentheses () and separate each character with a vertical bar |. You can have as many characters as you want (c|r|p|g).
There is another way to do this, which I tend to use most of the time.
/[cr]at/gi
Simply wrap the variable characters in square brackets []. You will not need to separate each character in this case. You can have as many characters as you want [crpg].
In all these cases we achieve a match on both words.
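The two approaches can be compared side by side in JavaScript. This is only a sketch; the variable names are invented for the example.

```javascript
const ratSentence = "The rat killed the cat with a hat";

// Parentheses with a vertical bar: "c" or "r", followed by "at".
const withGroup = ratSentence.match(/(c|r)at/gi);

// Square brackets: any one of the listed characters, followed by "at".
const withBrackets = ratSentence.match(/[cr]at/gi);

console.log(withGroup); // ["rat", "cat"]
console.log(withBrackets); // ["rat", "cat"]
```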
In the previous text we can choose to match the word “rat” and exclude all other words ending with “at”.
/[^ch]at/gi
Notice the caret ^ after the opening square bracket [. Every character listed after it inside the brackets will be excluded. Only the word “rat” will match.
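A quick way to confirm this in JavaScript is sketched below (the variable names are placeholders). Keep in mind that the excluded set still needs some character in that position, so any character other than “c” or “h” would qualify, including a space.

```javascript
const catSentence = "The rat killed the cat with a hat";

// [^ch] means: any single character except "c" or "h".
const notCatOrHat = catSentence.match(/[^ch]at/gi);

console.log(notCatOrHat); // ["rat"]
```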
Now let's imagine we want to match all words ending with “at” by allowing any alphabetic character in the first position.
/[abcdefghijklmnopqrstuvwxyz]at/gi
It works, but it is certainly not neat and may be prone to errors. Instead, we can use a range.
/[a-z]at/gi
Ranges are values separated by a hyphen -. They can span from any point: c to x [c-x] or g to p [g-p]. Ranges can also be uppercase [A-Z] or numerical [0-9]. Remove the “i” flag to test.
/[A-Z]at/g
/[0-9]at/g
To match both uppercase and lowercase characters, we can place both ranges inside the same square brackets.
/[a-zA-Z]at/g
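The sketch below tries each range against one sample sentence. The sentence is made up for this example so that it contains a lowercase, an uppercase and a numeric first character.

```javascript
// Hypothetical sample text with three different first characters.
const rangeText = "The rat killed the Cat with a 9at";

const lowercaseOnly = rangeText.match(/[a-z]at/g);
const uppercaseOnly = rangeText.match(/[A-Z]at/g);
const digitsOnly = rangeText.match(/[0-9]at/g);
const anyLetter = rangeText.match(/[a-zA-Z]at/g);

console.log(lowercaseOnly); // ["rat"]
console.log(uppercaseOnly); // ["Cat"]
console.log(digitsOnly); // ["9at"]
console.log(anyLetter); // ["rat", "Cat"]
```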
Now let's match any word that has exactly 5 characters.
/[a-z]{5}/gi
Simply wrap the required number of characters in curly brackets {5}. Any word shorter than 5 characters will not match. Longer words are matched in chunks of 5: a 10-letter word produces two matches, while leftover characters that do not fill a complete chunk are ignored.
We can also match any word that is at least 5 characters.
/[a-z]{5,}/gi
A comma , is added after the number in this case. Any word with 5 or more characters will be matched in one chunk.
Lastly, we can match any word that is between 5 and 12 characters.
/[a-z]{5,12}/gi
The min and max lengths are separated with a comma. Any word shorter than 5 characters will not match. A word longer than 12 characters is split into chunks: the first 12 characters match, and the remainder matches again only if it is at least 5 characters long.
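To see the chunking in action, here is a short sketch. The sample texts and variable names are chosen purely for illustration: “javascript” has 10 letters and “internationalization” has 20.

```javascript
const shortText = "cat hello javascript";
const longText = "hi internationalization";

// Exactly 5: "cat" is too short, "javascript" splits into two chunks.
const exactlyFive = shortText.match(/[a-z]{5}/gi);

// At least 5: each qualifying word is matched whole.
const atLeastFive = shortText.match(/[a-z]{5,}/gi);

// Between 5 and 12: the 20-letter word yields a 12-letter chunk,
// then the remaining 8 letters match as a second chunk.
const fiveToTwelve = longText.match(/[a-z]{5,12}/gi);

console.log(exactlyFive); // ["hello", "javas", "cript"]
console.log(atLeastFive); // ["hello", "javascript"]
console.log(fiveToTwelve); // ["internationa", "lization"]
```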
Some characters in RegEx have a special meaning. They start with a backslash \ followed by a character. Here are the most common ones.
\d, match any digit character (0-9)
\w, match any word character (a-z, A-Z, 0-9 and _)
\s, match any whitespace character (spaces, tabs, line breaks)
\t, match a tab character only
Let us match a word with 5 characters
/\w{5}/
Let us match a number between 4 and 10 digits
/\d{4,10}/
Let us match a word with 5 characters that ends with 2 spaces
/\w{5}\s{2}/
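These three expressions can be tried out in JavaScript as sketched below. The sample strings and variable names are invented for this example.

```javascript
// 5 word characters.
const wordMatch = "hello".match(/\w{5}/);

// Between 4 and 10 digits: the 8-digit number is matched whole.
const digitMatch = "order 20210202 shipped".match(/\d{4,10}/);

// 5 word characters followed by 2 whitespace characters.
const spacedMatch = "hello  world".match(/\w{5}\s{2}/);

console.log(wordMatch[0]); // "hello"
console.log(digitMatch[0]); // "20210202"
console.log(spacedMatch[0].length); // 7 (5 letters plus 2 spaces)
```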
You have a license to play around and get comfortable with them.
In this post, we got comfortable with the basics of regular expressions at a beginner level. Take your time and practice every step.
This post is the first of two. Stay tuned for the next one.
Have fun and recommend the article if you liked it 👍