JavaScript Junkies

A Practical Guide to Regular Expressions (RegEx) In JavaScript

If you have ever worked with text data, chances are you have heard of regular expressions, or “regex” for short. Regular expressions are a powerful tool for pattern matching and text manipulation. In this guide, we will cover the basics of regular expressions, as well as some more advanced topics.

What are Regular Expressions?

A regular expression is a sequence of characters that defines a search pattern. It is used to find, validate, and manipulate text that follows that pattern. Regular expressions are supported by many programming languages, text editors, and command-line tools.

How to Create A Regular Expression

Creating a regular expression involves identifying a pattern that you want to match in a string, and then using special characters and syntax to define that pattern in a regular expression. Here are the basic steps to create a regular expression:

Identify the pattern

Start by identifying the specific pattern you want to match in a string. This could be a word, phrase, or set of characters that follow a certain structure.

Choose the regular expression syntax

Once you’ve identified the pattern you want to match, choose the regular expression syntax that will help you define that pattern. For example, you might use character classes, quantifiers, or anchors to match specific types of characters or sequences.

Write the regular expression

Using the chosen syntax, write the regular expression pattern that will match the desired pattern in the string. This might involve using special characters like parentheses, brackets, or backslashes to define the pattern.

Test and refine the regular expression

After writing the regular expression, test it on a variety of different strings to ensure that it matches the desired pattern correctly. You may need to refine the regular expression by tweaking the syntax or pattern until it matches exactly what you want.
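
The four steps above can be sketched as follows. This is only an illustration: the goal (checking that a string is exactly five digits) and the names zipLiteral, zipFromString, and zipSamples are assumptions made for the example, not something the steps prescribe.

```javascript
// Step 1: the pattern we want is "exactly five digits and nothing else".
// Steps 2 and 3: anchors (^ and $), a character class (\d), and a quantifier ({5}).
const zipLiteral = /^\d{5}$/;                 // written as a regex literal
const zipFromString = new RegExp("^\\d{5}$"); // same pattern via the RegExp constructor

// Step 4: test against strings that should and should not match.
const zipSamples = ["12345", "1234", "123456", "12a45"];
const zipResults = zipSamples.map((s) => zipLiteral.test(s));

console.log(zipResults);                  // [ true, false, false, false ]
console.log(zipFromString.test("90210")); // true
```

Note that the backslash has to be doubled inside the string passed to the RegExp constructor, because the string literal consumes one level of escaping before the pattern is compiled.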

Basic Regular Expression Syntax

Regular expressions consist of a combination of characters and metacharacters. Characters are matched literally, while metacharacters have special meanings. Here are some basic metacharacters:

. – Matches any single character except newline.
* – Matches zero or more occurrences of the preceding character or group.
+ – Matches one or more occurrences of the preceding character or group.
? – Matches zero or one occurrence of the preceding character or group.
\ – Escapes the following character, treating it as a literal character rather than a metacharacter.
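
A quick way to get a feel for these metacharacters is to try each one with test(). The sample strings and the metaChecks object below are made up for illustration.

```javascript
// Each property records whether the pattern matched the sample string.
const metaChecks = {
  dot: /h.t/.test("hat"),           // "." stands in for the "a"
  star: /go*gle/.test("ggle"),      // "o*" allows zero "o" characters
  plus: /go+gle/.test("ggle"),      // "o+" needs at least one "o"
  optional: /colou?r/.test("color"), // "u?" makes the "u" optional
  escaped: /3\.14/.test("3x14"),    // "\." matches only a literal dot
  unescaped: /3.14/.test("3x14"),   // an unescaped "." matches the "x"
};

console.log(metaChecks);
// { dot: true, star: true, plus: false, optional: true, escaped: false, unescaped: true }
```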

Advanced Regular Expression Topics

Regular expressions can be quite complex, and there are many advanced topics to explore. Here are a few examples:

Character Classes

Character classes in regular expressions are used to match specific sets of characters. Here are some common character classes and examples of how to use them:

[abc] – Matches any of the characters “a”, “b”, or “c”. For example, the regular expression b[ae]g would match “bag” or “beg”, but not “big” or “bug”.
[a-z] – Matches any lowercase letter. For example, the regular expression [a-z]+ would match one or more lowercase letters.
[A-Z] – Matches any uppercase letter. For example, the regular expression [A-Z]+ would match one or more uppercase letters.
[0-9] – Matches any digit. For example, the regular expression [0-9]+ would match one or more digits.
[^abc] – Matches any character except “a”, “b”, or “c”. For example, the regular expression [^abc]+ would match one or more characters that are not “a”, “b”, or “c”.
[\w] – Matches any word character. This is equivalent to [a-zA-Z0-9_]. For example, the regular expression [\w]+ would match one or more word characters.
[\W] – Matches any non-word character. This is equivalent to [^a-zA-Z0-9_]. For example, the regular expression [\W]+ would match one or more non-word characters.
[\d] – Matches any digit. This is equivalent to [0-9]. For example, the regular expression [\d]+ would match one or more digits.
[\D] – Matches any non-digit character. This is equivalent to [^0-9]. For example, the regular expression [\D]+ would match one or more non-digit characters.
[\s] – Matches any whitespace character. For example, the regular expression [\s]+ would match one or more whitespace characters.
[\S] – Matches any non-whitespace character. For example, the regular expression [\S]+ would match one or more non-whitespace characters.

Using character classes in regular expressions can make your pattern matching much more flexible and powerful.
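
Here is a small sketch of a few of these classes applied to one string. The sample text and variable names are invented for the example; the g flag is used so that match() returns every occurrence.

```javascript
const classSample = "Order #42 shipped to Bay 7!";

const digitRuns = classSample.match(/[0-9]+/g);           // runs of digits
const shorthandDigits = classSample.match(/\d+/g);         // \d is shorthand for [0-9]
const letterRuns = classSample.match(/[a-zA-Z]+/g);        // runs of letters
const punctuation = classSample.match(/[^a-zA-Z0-9\s]+/g); // negated class

// b[ae]g accepts only "a" or "e" in the middle position.
const bagOrBeg = ["bag", "beg", "big", "bug"].filter((w) => /^b[ae]g$/.test(w));

console.log(digitRuns);   // [ '42', '7' ]
console.log(letterRuns);  // [ 'Order', 'shipped', 'to', 'Bay' ]
console.log(punctuation); // [ '#', '!' ]
console.log(bagOrBeg);    // [ 'bag', 'beg' ]
```

Outside of square brackets the shorthand classes work on their own, so \d+ and [\d]+ behave the same way.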

Anchors

Anchors are used to match specific positions in the text. Here are some common anchors:

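  • ^ – Matches the start of the string, or the start of each line when the m flag is set.
  • $ – Matches the end of the string, or the end of each line when the m flag is set.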
  • \b – Matches a word boundary.
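
The following sketch shows the three anchors at work. The sample strings are invented for illustration, and the m flag is used only to contrast string-level and line-level matching.

```javascript
const anchorLines = "cat\ncatalog\nbobcat";

const startsWithCat = /^cat/.test("catalog"); // true
const endsWithCat = /cat$/.test("bobcat");    // true
const exactlyCat = /^cat$/.test("catalog");   // false: extra characters after "cat"

// Without the m flag, ^ matches only at the very start of the string.
const stringStart = anchorLines.match(/^cat/g);
// With the m flag, ^ also matches right after each line break.
const lineStart = anchorLines.match(/^cat/gm);

// \b keeps the match from landing inside a longer word.
const wholeWord = "cat catalog bobcat".match(/\bcat\b/g);

console.log(stringStart); // [ 'cat' ]
console.log(lineStart);   // [ 'cat', 'cat' ]
console.log(wholeWord);   // [ 'cat' ]
```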

Groups

Groups are used to group together multiple characters or expressions. They are enclosed in parentheses. Here are some examples:

  • (abc) – Matches the sequence “abc”.
  • (a|b) – Matches either “a” or “b”.
  • (?:abc) – Matches the sequence “abc”, but does not create a capturing group.
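
As a rough sketch of how these group types differ in practice, the example below captures the parts of a date, uses alternation inside a group, and compares a capturing group with a non-capturing one. The date string and variable names are assumptions chosen for the example.

```javascript
// Capturing groups: each pair of parentheses becomes an entry in the match array.
const dateParts = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/);
const [, year, month, day] = dateParts; // index 0 holds the full match

// Alternation inside a group: "a" or "e" in the third position.
const grayOrGrey = ["gray", "grey", "groy"].filter((w) => /^gr(a|e)y$/.test(w));

// A capturing group adds an entry to the result; a non-capturing group does not.
const capturing = "abcabc".match(/(abc)+/);
const nonCapturing = "abcabc".match(/(?:abc)+/);

console.log(year, month, day);    // 2024 03 15
console.log(grayOrGrey);          // [ 'gray', 'grey' ]
console.log(capturing.length);    // 2
console.log(nonCapturing.length); // 1
```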

Lookarounds

Lookarounds are used to look ahead or behind the current position in the text without consuming any characters. Here are some examples:

(?=abc) – Matches the current position if it is followed by the sequence “abc”.
(?<=abc) – Matches the current position if it is preceded by the sequence “abc”.
(?!abc) – Matches the current position if it is not followed by the sequence “abc”.
(?<!abc) – Matches the current position if it is not preceded by the sequence “abc”.
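
The sketch below applies each of the four lookarounds to made-up price strings. Lookbehind, (?<=...) and (?<!...), was added to JavaScript in ES2018, so this assumes a reasonably modern engine.

```javascript
const amounts = "100 USD, 200 EUR";
const prices = "$30 and €40";

// Lookahead: digits that are followed by " USD" (the " USD" is not part of the match).
const usdAmounts = amounts.match(/\d+(?= USD)/g);
// Negative lookahead: whole numbers that are not followed by " USD".
const nonUsdAmounts = amounts.match(/\b\d+\b(?! USD)/g);

// Lookbehind: digits that are preceded by a dollar sign.
const dollarPrices = prices.match(/(?<=\$)\d+/g);
// Negative lookbehind: numbers that are not preceded by a dollar sign.
const otherPrices = prices.match(/(?<!\$)\b\d+/g);

console.log(usdAmounts);    // [ '100' ]
console.log(nonUsdAmounts); // [ '200' ]
console.log(dollarPrices);  // [ '30' ]
console.log(otherPrices);   // [ '40' ]
```

The \b word boundaries in the negative cases stop the engine from settling for a partial number such as “10” out of “100”.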

Quantifiers 

Quantifiers in regular expressions are used to specify the number of times that a character or group of characters should appear in a match. Here are some common quantifiers and examples of how to use them:

* – Matches zero or more occurrences of the preceding character or group. For example, the regular expression ab*c would match “ac”, “abc”, “abbc”, and so on.
+ – Matches one or more occurrences of the preceding character or group. For example, the regular expression ab+c would match “abc”, “abbc”, “abbbc”, and so on.
? – Matches zero or one occurrence of the preceding character or group. For example, the regular expression ab?c would match “ac” or “abc”.
{n} – Matches exactly n occurrences of the preceding character or group. For example, the regular expression a{3} would match “aaa”.
{n,} – Matches n or more occurrences of the preceding character or group. For example, the regular expression a{2,} would match “aa”, “aaa”, “aaaa”, and so on.
{n,m} – Matches between n and m occurrences of the preceding character or group. For example, the regular expression a{2,4} would match “aa”, “aaa”, or “aaaa”, but not “a”. Without anchors it can still match the first four characters of “aaaaa”; use ^a{2,4}$ to reject longer strings.
. – Not a quantifier itself, but often combined with one: it matches any single character except a newline character. For example, the regular expression a.c would match “abc”, “adc”, “aec”, and so on, and a.*c would match “a”, followed by any run of characters, followed by “c”.

Quantifiers can make your regular expressions much more powerful and flexible. By specifying the number of occurrences of a character or group of characters, you can match more specific patterns and make your regular expressions more precise.
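
To see the counted quantifiers in action, the sketch below filters a list of sample strings. The array runsOfA and the other names are made up for the example; the patterns are anchored with ^ and $ so that the whole string has to fit the count.

```javascript
const runsOfA = ["a", "aa", "aaa", "aaaa", "aaaaa"];

const exactlyThree = runsOfA.filter((s) => /^a{3}$/.test(s));
const twoOrMore = runsOfA.filter((s) => /^a{2,}$/.test(s));
const twoToFour = runsOfA.filter((s) => /^a{2,4}$/.test(s));

// Without anchors, the pattern only needs to match somewhere inside the string.
const unanchoredHit = /a{2,4}/.test("aaaaa");
const unanchoredText = "aaaaa".match(/a{2,4}/)[0];

console.log(exactlyThree);   // [ 'aaa' ]
console.log(twoOrMore);      // [ 'aa', 'aaa', 'aaaa', 'aaaaa' ]
console.log(twoToFour);      // [ 'aa', 'aaa', 'aaaa' ]
console.log(unanchoredHit);  // true
console.log(unanchoredText); // aaaa
```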

Backreferences

In regular expressions, a backreference allows you to refer to a previously matched group and use it in another part of the pattern. This can be useful for finding repeated patterns or for replacing parts of a string.
Backreferences are created by using a backslash followed by the number of the group you want to reference. Groups are numbered from 1, in the order their opening parentheses appear in the pattern. Here’s an example:

(\w+)\s+\1

This pattern would match any word that appears twice in a row, separated by one or more whitespace characters. The \w+ part of the pattern matches one or more word characters (letters, numbers, or underscores), and the \s+ part matches one or more whitespace characters. The (\w+) part is enclosed in parentheses to create a capturing group, and the \1 backreference refers to the first capturing group.
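
Here is a minimal sketch of that pattern in JavaScript. The word-bounded variant, boundedRepeat, is an addition for illustration and is not part of the pattern above: it shows one way to avoid matching the tail of one word against the next.

```javascript
const repeatedWord = /(\w+)\s+\1/;
const found = "hello hello world".match(repeatedWord);

console.log(found[0]); // hello hello
console.log(found[1]); // hello (the text captured by group 1)

// The plain pattern also fires on "This is", because "is" ends "This".
const looseHit = repeatedWord.test("This is fine");

// Assumed refinement: \b on both sides requires whole words.
const boundedRepeat = /\b(\w+)\s+\1\b/;
const strictHit = boundedRepeat.test("This is fine");

// In a replacement string, $1 refers to the same captured group.
const deduped = "hello hello world".replace(boundedRepeat, "$1");

console.log(looseHit);  // true
console.log(strictHit); // false
console.log(deduped);   // hello world
```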

Substitutions

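In JavaScript, substitutions with regular expressions are performed using the replace() method on a string. This method takes two arguments: the pattern to match, and the replacement string (or a function that returns one). If the regular expression does not have the g flag, only the first match is replaced.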

Here’s an example:

let input_str = "The quick brown fox jumps over the lazy dog";
let output_str = input_str.replace(/fox/, "cat");
console.log(output_str);

This code would output the string “The quick brown cat jumps over the lazy dog”. The regular expression /fox/ matches the word “fox” in the input string, and the replacement string “cat” is used to replace it.
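
A few common variations on replace() are sketched below: replacing every match with the g flag, reusing captured groups with $1 and $2, and computing the replacement with a function. The sample strings and variable names are invented for the example.

```javascript
const foxSentence = "the fox saw another fox";

const firstOnly = foxSentence.replace(/fox/, "cat");  // only the first match
const everyMatch = foxSentence.replace(/fox/g, "cat"); // all matches

// $1 and $2 insert the text captured by the first and second groups.
const swappedName = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1");

// A function receives the matched text and returns the replacement.
const doubled = "3 apples".replace(/\d+/, (n) => String(Number(n) * 2));

console.log(firstOnly);   // the cat saw another fox
console.log(everyMatch);  // the cat saw another cat
console.log(swappedName); // John Doe
console.log(doubled);     // 6 apples
```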

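Regular Expression Syntax: A Worked Example

The following pattern, which matches a simple email address, shows how the pieces covered so far fit together:

/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/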

^ : matches the start of the string
[a-zA-Z0-9._%+-]+ : matches one or more alphanumeric characters, plus the special characters commonly allowed before the “@” in an email address (dots, underscores, percent signs, plus signs, and hyphens)
@ : matches the “@” symbol
[a-zA-Z0-9.-]+ : matches one or more alphanumeric characters, plus dots and hyphens, to match the domain name
\. : matches a literal dot
[a-zA-Z]{2,} : matches two or more alphabetic characters to match the top-level domain
$ : matches the end of the string
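
Running the pattern against a few sample addresses shows what it accepts and rejects. The addresses below are made up, and the pattern is a simplification: it does not cover every address that the email standards allow.

```javascript
const simpleEmail = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const emailChecks = {
  plain: simpleEmail.test("jane.doe+news@example.co.uk"), // true
  noTld: simpleEmail.test("jane@example"),                // false: no dot and TLD
  withSpace: simpleEmail.test("jane doe@example.com"),    // false: space not allowed
  noLocalPart: simpleEmail.test("@example.com"),          // false: nothing before "@"
};

console.log(emailChecks);
```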

What are Regular Expression Flags?

Regular expression flags are additional options that can be used to modify the behavior of a regular expression when matching a string. Here are some of the most common regular expression flags:

g: Global flag, which tells the regular expression to match all occurrences of the pattern in the input string, rather than just the first occurrence.
i: Case-insensitive flag, which tells the regular expression to match characters regardless of case. For example, /abc/i would match “abc”, “ABC”, “aBc”, and so on.
m: Multiline flag, which makes ^ and $ match the start and end of each line in a multiline string, rather than just the start and end of the whole string.
s: Dot-all flag, which tells the regular expression to treat the dot character (.) as matching any character, including line breaks.
u: Unicode flag, which makes the pattern work on whole Unicode code points rather than 16-bit code units, and enables Unicode escapes such as \u{...} and \p{...}.
y: Sticky flag, which tells the regular expression to match only at the position stored in its lastIndex property, rather than searching forward from there.

To use a regular expression flag, you append it after the closing forward slash of the regular expression literal. Flags can be combined in any order. For example, the regular expression /hello/gi would match all occurrences of the string “hello” in a case-insensitive manner.
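
The sketch below exercises each flag once. The sample strings and variable names are made up for illustration; the s flag needs an ES2018 or later engine.

```javascript
const greeting = "Hello hello HELLO";
const caseSensitive = greeting.match(/hello/g); // [ 'hello' ]
const allCases = greeting.match(/hello/gi);     // [ 'Hello', 'hello', 'HELLO' ]

// m: ^ matches at the start of every line.
const lineStarts = "first\nsecond".match(/^\w+/gm); // [ 'first', 'second' ]

// s: the dot also matches a line break.
const dotDefault = /a.b/.test("a\nb"); // false
const dotAll = /a.b/s.test("a\nb");    // true

// u: the dot consumes a whole code point instead of half a surrogate pair.
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji); // false
const withU = /^.$/u.test(emoji);   // true

// y: the match must begin exactly at lastIndex.
const stickyFoo = /foo/y;
stickyFoo.lastIndex = 3;
const stickyAtThree = stickyFoo.test("barfoo"); // true
stickyFoo.lastIndex = 0;
const stickyAtZero = stickyFoo.test("barfoo");  // false

console.log(allCases, lineStarts, dotAll, withU, stickyAtThree);
```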

Regex Examples

Matching a phone number in the format (555) 555-5555

/^\(\d{3}\)\s\d{3}-\d{4}$/

Explanation:

  • ^ and $ match the start and end of the string, respectively.
  • \( and \) match literal parentheses.
  • \d{3} matches three digits.
  • \s matches a single whitespace character.
  • \d{4} matches four digits.
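
A short check of the phone pattern against a few invented inputs might look like this:

```javascript
const phonePattern = /^\(\d{3}\)\s\d{3}-\d{4}$/;

const phoneChecks = {
  expectedFormat: phonePattern.test("(555) 555-5555"), // true
  dashesOnly: phonePattern.test("555-555-5555"),       // false: parentheses required
  twoSpaces: phonePattern.test("(555)  555-5555"),     // false: \s allows one space only
  tooShort: phonePattern.test("(555) 555-555"),        // false: last group needs 4 digits
};

console.log(phoneChecks);
```

Because the pattern is anchored at both ends, any extra text before or after the number causes the test to fail.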

Matching an email address

/^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/

Explanation:

  • ^ and $ match the start and end of the string, respectively.
  • [\w-\.]+ matches one or more word characters, hyphens, or periods.
  • @ matches a literal at sign.
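  • ([\w-]+\.)+ matches one or more runs of word characters or hyphens, each followed by a period. This covers the domain name and any subdomains.
  • [\w-]{2,4} matches two to four word characters or hyphens, which is the TLD portion of the email address. Longer TLDs such as “.museum” are rejected by this limit.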
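
The sketch below tries this pattern on a few made-up addresses, including one that shows the effect of the {2,4} limit on the TLD.

```javascript
const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;

const addressChecks = {
  simple: emailPattern.test("first.last@example.org"),   // true
  subdomain: emailPattern.test("user@mail.example.com"), // true: the group repeats
  longTld: emailPattern.test("user@example.museum"),     // false: TLD longer than 4
  shortTld: emailPattern.test("user@example.c"),         // false: TLD shorter than 2
};

console.log(addressChecks);
```

If longer TLDs need to be accepted, one option is to widen the final quantifier, for example to {2,}.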

Matching a URL with an optional “http://” or “https://” prefix:

/^(https?:\/\/)?[a-z0-9]+\.[a-z]+(\/[a-z0-9\-._~:\/\?#\[\]@!$&'()*+,;=]*)?$/i

Explanation:

  • ^ and $ match the start and end of the string, respectively.
  • (https?:\/\/)? matches the optional “http://” or “https://” protocol.
  • [a-z0-9]+\.[a-z]+ matches a simple domain name: one alphanumeric label, a dot, and a TLD. Subdomains and hyphens are not covered.
  • (\/[a-z0-9\-._~:\/\?#\[\]@!$&'()*+,;=]*)? matches the optional path and query string.
  • The i flag makes the regular expression case-insensitive.
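
Trying the pattern on a few invented URLs shows both what it accepts and where it falls short:

```javascript
const urlPattern = /^(https?:\/\/)?[a-z0-9]+\.[a-z]+(\/[a-z0-9\-._~:\/\?#\[\]@!$&'()*+,;=]*)?$/i;

const urlChecks = {
  withPath: urlPattern.test("https://example.com/path?x=1"), // true
  noProtocol: urlPattern.test("example.com"),                // true: protocol is optional
  upperCase: urlPattern.test("HTTP://EXAMPLE.COM"),          // true: i flag
  subdomain: urlPattern.test("https://www.example.com"),     // false: only one dot allowed
  otherScheme: urlPattern.test("ftp://example.com"),         // false: not http or https
};

console.log(urlChecks);
```

The subdomain result is a limitation of this particular pattern, so it is best treated as a starting point rather than a complete URL validator.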

Regular Expression Methods

Regular expressions are often used in programming to search for and manipulate text. Here are some common methods for working with regular expressions in JavaScript:

test():

This method is used to test a string against a regular expression and returns a boolean value indicating whether or not the string matches the regular expression.

let myRegex = /hello/;
let myString = "Hello, world!";
let result = myRegex.test(myString);
console.log(result); // outputs false, because /hello/ is case-sensitive and the string contains "Hello"
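
Two details about test() are worth a quick sketch: the i flag makes the example above succeed, and a regular expression with the g flag remembers where its last match ended (in its lastIndex property), so repeated calls on the same object can alternate between true and false. The variable names are made up for the example.

```javascript
const helloAnyCase = /hello/i;
const matchesHello = helloAnyCase.test("Hello, world!"); // true

const globalA = /a/g;
const firstCall = globalA.test("a");  // true, and lastIndex becomes 1
const secondCall = globalA.test("a"); // false: the search starts at index 1, then lastIndex resets to 0
const thirdCall = globalA.test("a");  // true again

console.log(matchesHello, firstCall, secondCall, thirdCall); // true true false true
```

For a plain yes-or-no check, leaving off the g flag avoids this surprise.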

match():

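This method is used to search a string for a regular expression. Without the g flag, it returns an array describing the first match, including any capturing groups. With the g flag, it returns an array of all matches. If nothing matches, it returns null.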

let myRegex = /hello/;
let myString = "Hello, world!";
let result = myString.match(myRegex);
console.log(result); // outputs null, because the case-sensitive pattern finds no match
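
When the pattern does match, the shape of the result depends on the g flag. The sketch below uses a made-up sentence; matchAll() (ES2020) is shown as one way to get every match together with its position.

```javascript
const fishText = "one fish two fish";

// Without g: a single match object with extra details such as index.
const firstFish = fishText.match(/fish/);
// With g: a plain array of every matched substring.
const everyFish = fishText.match(/fish/g);
// matchAll requires the g flag and yields one match object per occurrence.
const fishPositions = [...fishText.matchAll(/fish/g)].map((m) => m.index);

console.log(firstFish[0], firstFish.index); // fish 4
console.log(everyFish);                     // [ 'fish', 'fish' ]
console.log(fishPositions);                 // [ 4, 13 ]
```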

search():

This method is used to search a string for a regular expression and returns the index of the first match found, or -1 if no match is found.

let myRegex = /hello/;
let myString = "Hello, world!";
let result = myString.search(myRegex);
console.log(result); // outputs -1, because the case-sensitive pattern finds no match

replace():

This method is used to replace text in a string that matches a regular expression with a specified replacement string.

let myRegex = /hello/i; // the i flag lets the pattern match "Hello"
let myString = "Hello, world!";
let newString = myString.replace(myRegex, "Hi");
console.log(newString); // outputs "Hi, world!"

Conclusion

Regular expressions are a powerful tool for pattern matching and text manipulation. In this guide, we covered the basics of regular expression syntax, character classes, anchors, groups, and lookarounds. We also mentioned some advanced topics that you can explore further.

With this comprehensive guide, you should have a good understanding of regular expressions and be able to use them effectively in your text processing tasks.
