Regex for SEO

Finally, apply advanced segments to your Search Console data with regex!

Posted by Craig 👋 on October 16 2019

You need to track the performance of 3 out of the 4 sections of your client's website.

You want to pull all keywords data for an obscure pattern match.

You want to understand how many people are using certain brand terms in their queries.

You want to segment your top performers.

Well, now you can.

We've baked Regular Expression into our 'Index' reports, allowing you to match regex against either URLs or Queries from Search Console data.

This means you can apply advanced segmentation to your data, deepen your analysis and become a hero in your reporting.

Try it out now >>

An SEO's Guide to Regular Expression

The following is some regex knowledge which has served me well throughout my career. 

While it won't cover every possible situation, it will enable you to add regex to your analysis toolbox and set you down your own exploratory path. 

What is regex?

Regular Expression, aka regex, is simply a string of text which describes a pattern to match. If you've spent any time in Excel, you'll have come across the wildcard character (*), which matches any sequence of characters. Regex is similar to this.

Regex is well worth knowing: you can use it in Google Analytics, Data Studio and even Screaming Frog.

Let's look at some examples:

Here we've written the word 'awesome' as our regular expression to match against the string "regex is awesome". We can see 'awesome' has been highlighted aka matched.
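If you'd rather see that in code, here is a minimal Python sketch of the same literal match. The variable names are just for illustration.

```python
import re

# A plain word used as a pattern matches those exact characters.
match = re.search(r"awesome", "regex is awesome")

# match.group(0) is the piece of the string that was matched.
matched_text = match.group(0) if match else None
print(matched_text)
```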

While primitive, I hope that gets the point across: we are just matching patterns. Now let's look at some special characters.

In this example, we are trying to match strings which contain the words 'error code' followed by an error code beginning with a letter and ending with exactly 2 numbers.

As we can see the 2 strings are matched. Yay.
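Here is a rough Python sketch of that kind of match. I'm assuming the pattern is `error code [a-z][0-9]{2}`, and the sample strings are made up for illustration, so treat this as an approximation rather than the exact example.

```python
import re

# Assumed pattern: 'error code', a space, one lower-case letter, two digits.
pattern = re.compile(r"error code [a-z][0-9]{2}")

# Hypothetical sample strings, made up for this sketch.
samples = [
    "error code a12",
    "error code z99",
    "error code 123",  # no letter straight after 'error code '
    "error code ab1",  # the first letter is not followed by two digits
]

# Keep only the strings where the pattern is found.
matched = [s for s in samples if pattern.search(s)]
print(matched)
```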

Let's break it down.

The words 'error code' were matched as expected. This is just straight-up matching, as we've already seen. So far so good.

[a-z] and [0-9] are what you use to match any single lower-case letter or number (or whatever range you want). The {2} is what's known as a Quantifier, which is a fancy way of saying "repeat the previous item exactly this many times". In our case, it means [0-9] must appear twice in a row.

If the Quantifier scares you, it's the same as doing this:
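As a rough sketch in Python, and assuming the pattern in question is `error code [a-z][0-9]{2}`, the long-hand version simply writes [0-9] out twice. The sample strings are hypothetical.

```python
import re

# Assumed pattern using the {2} quantifier.
with_quantifier = re.compile(r"error code [a-z][0-9]{2}")

# The same pattern written out long-hand, with [0-9] repeated by hand.
written_out = re.compile(r"error code [a-z][0-9][0-9]")

# Hypothetical sample strings, made up for this sketch.
samples = ["error code a12", "error code b7", "error code c345", "error code 99"]

# Both patterns should agree on every sample.
same_results = all(
    bool(with_quantifier.search(s)) == bool(written_out.search(s))
    for s in samples
)
print(same_results)
```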

OK, let's look at something a little more exciting. Let's say we want to match all error codes beginning with an 'a' followed by any 2 characters (numbers or letters, not just 2 numbers like our last example).

Let's break this down:

"." (period/dot/full stop) is our wildcard in regex: it matches any single character.

"*" (asterisk) matches the previous item zero or more times, so these 2 together will match any run of characters (including none) leading up to the next part of the pattern… in this case the "[a]"

With the [a] we are specifying that the code must start with the letter 'a'. Next we have [a-z0-9]{2}. We've already covered these pieces; the difference is that we've put a-z and 0-9 inside the same pair of square brackets, meaning each character can be either a lower-case letter or a number!
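To try this yourself, here is a small Python sketch. I'm assuming the full pattern is `.*[a][a-z0-9]{2}`, built from the pieces described above, and the sample strings are hypothetical.

```python
import re

# Assumed pattern: anything, then an 'a', then two letters or numbers.
pattern = re.compile(r".*[a][a-z0-9]{2}")

# Hypothetical sample strings, made up for this sketch.
samples = [
    "error code a12",
    "error code ab9",
    "error code b12",  # the code does not start with 'a'
    "error code a1",   # only one character after the 'a'
]

# Keep only the strings where the pattern is found.
matched = [s for s in samples if pattern.search(s)]
print(matched)
```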

OK, let's change things up a bit…

Here we have 2 URLs: /regex-guide and /regex-training

If we want to match both of these, we can use the '|' character. This is known as OR, meaning match this-string OR this-other-string.

But what if you have something like the following and don't want to match all the different URLs?
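A minimal Python sketch of the OR idea, assuming the pattern is simply the two URLs joined with '|'. The third URL is a made-up one that should not match.

```python
import re

# Assumed pattern: one URL path OR the other.
pattern = re.compile(r"/regex-guide|/regex-training")

# The two URLs from above, plus a hypothetical third one.
urls = ["/regex-guide", "/regex-training", "/regex-cheatsheet"]

# Keep only the URLs where either alternative is found.
matched = [u for u in urls if pattern.search(u)]
print(matched)
```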

You can add a start and/or end to your patterns using the following:

^ signals the start of a string

$ signals the end of a string
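Here is a short Python sketch of both anchors working together. The URLs are hypothetical, chosen so that an unanchored pattern would match all of them.

```python
import re

# Hypothetical URLs that all contain '/regex-guide' somewhere.
urls = ["/regex-guide", "/regex-guide/part-2", "/blog/regex-guide"]

# Without anchors, the pattern is found in every URL.
unanchored = re.compile(r"/regex-guide")
matched_unanchored = [u for u in urls if unanchored.search(u)]

# With ^ and $, the whole URL must be exactly '/regex-guide'.
anchored = re.compile(r"^/regex-guide$")
matched_anchored = [u for u in urls if anchored.search(u)]

print(matched_unanchored)
print(matched_anchored)
```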

Tada!

Hopefully, that stands you in good stead for getting started with regex. My advice is to spend 30 minutes with a tool like https://regex101.com/ and your own data: practice what I've shown you and then have a play (searching for a regex cheatsheet will help).

To follow my examples, or to unlock your regex analysis skills with Big Metrics, be sure to select Python as your regex flavour on regex101.com.

#ProTip: Even if you're a regex pro, it's best to always use a regex testing tool to double-check your queries.