Spark Regex Cheatsheet

About the Cheatsheet

Some parts of Spark (namely Page Groups and Variables) rely on regex to function. Regex can be incredibly powerful, but we know it can be painful to work with at times.

To help you out, we've pulled together a "Regex Cheatsheet" for some common tasks.

Regex for Page Groups

A few things to note:

  • Only the URL path is used - don't include the domain in your regex.
  • Our matching is currently set to be case-insensitive.
  • URL parameters are currently not included in the match.

Task                                   Pattern
Match every page                       .*
Match only the homepage                ^/$
Match only one page                    ^/about\/?$
Match all pages in a subdirectory      ^/directory/.*\/?$
Match everything except a directory    ^((?!directory).)*(\/)$
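
If you'd like to try these patterns before saving a Page Group, the rough sketch below approximates the rules listed above (path only, case-insensitive, URL parameters dropped) using Python's re module. Spark's own regex engine may behave differently, so treat this as an assumption rather than a description of how Spark works internally. The helper names path_only and matches, and the example.com URLs, are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Patterns copied from the Page Group table.
PAGE_GROUP_PATTERNS = {
    "every page": r".*",
    "homepage only": r"^/$",
    "one page": r"^/about\/?$",
    "subdirectory": r"^/directory/.*\/?$",
    "everything except a directory": r"^((?!directory).)*(\/)$",
}


def path_only(url: str) -> str:
    """Drop the scheme, domain, URL parameters and fragment, keeping the path."""
    return urlsplit(url).path or "/"


def matches(pattern: str, url: str) -> bool:
    """Assumed approximation of Spark's matching: path only, case-insensitive."""
    return re.search(pattern, path_only(url), flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    sample_urls = (
        "https://example.com/",
        "https://example.com/About?ref=nav",
        "https://example.com/directory/page/",
    )
    for url in sample_urls:
        hits = [name for name, pattern in PAGE_GROUP_PATTERNS.items()
                if matches(pattern, url)]
        print(url, "->", hits)
```

Under Python's re, the "everything except a directory" pattern only matches paths that end in a slash, and it rejects any path containing the word "directory" anywhere, so it is worth checking against a few of your own URLs.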

Regex for Variables

Once you've provided Spark with the "Parent Selector" for a variable, the full HTML of the first element matching that selector will be passed into your regex pattern.

For instance, you might target p#location, which could return HTML similar to the following:

<p id="location">Businesses in Suffolk</p>

The extraction regex is then used to get the particular value you need out of this element. The examples below extract values from the HTML above.

Task                 Pattern                         Example
All element text     <.*>(.*)<\/.*>                  Businesses in Suffolk
A particular word    <.*>Businesses in (.*)<\/.*>    Suffolk
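
To check an extraction pattern against your own HTML, a minimal sketch along these lines may help. It assumes the extracted value is the first capture group, which fits the examples above but isn't something we can confirm about Spark's internals, and it uses Python's re module as a stand-in engine. The extract helper is hypothetical.

```python
import re
from typing import Optional

# Sample HTML from the p#location example.
ELEMENT_HTML = '<p id="location">Businesses in Suffolk</p>'


def extract(pattern: str, html: str) -> Optional[str]:
    """Return the first capture group, or None if the pattern does not match."""
    match = re.search(pattern, html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract(r"<.*>(.*)<\/.*>", ELEMENT_HTML))                # Businesses in Suffolk
    print(extract(r"<.*>Businesses in (.*)<\/.*>", ELEMENT_HTML))  # Suffolk
```

One thing to watch for: because .* is greedy, an element containing nested tags (for example a <b> inside the <p>) can make the first pattern capture an empty string in Python's re, so test against the real markup of your page.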