Regular Expressions

Regular Expressions (or RegEx) provide a useful approach within Discover to extract meaningful data from a captured data input. RegEx is used across the solution and specifically in the following capabilities:

  • Hit Attributes: Captured patterns from the Request or Response documents
  • Step Attributes: Specifically targeting the JSON messages sent by the UIC (button clicks, form fills, etc.)
  • Advanced Mode Events: Pulling specific values (e.g. Order ID) out of a complex JSON object or HTML response
  • Privacy Rules: Masking PII / SPII before it is stored
  • Replay Rules

Using RegEx

Proficiency with RegEx comes with practice. Tools such as regexr or regex101 (set to the JavaScript or PCRE flavour) can greatly support your learning and testing. YouTube also has a wealth of videos that can help.

RegEx is about identifying patterns. Here is how its components act on the patterns Discover may be provided with.

| Component | Purpose | Examples |
| --- | --- | --- |
| Literals | Exact characters to match | /abc/ matches the string abc |
| Shorthands | Presets for common character types | \d for digits, \w for word chars, \s for whitespace |
| Quantifiers | How many times a character should repeat | + 1 or more, * 0 or more, ? 0 or 1, {n} exactly n times |
| Anchors | Match positions in the text, not characters | ^ denotes the start, $ the end, \b a word boundary |
| Groups | Treat part of the pattern as a single unit | (abc) captures the group abc; (?:...) is non-capturing |
| Flags | Modify how the regex engine searches | i ignore case, g global, m multiline |
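
To see how these components combine, here is a minimal JavaScript sketch. The input string and its "Order ID" label are hypothetical and chosen only for illustration; they are not taken from Discover.

```javascript
// Hypothetical input line (an assumption for this example).
const line = "Order ID: 48213";

// ^          anchor: start of the string
// order id   literal characters
// :\s*       literal colon, then optional whitespace (shorthand + quantifier)
// (\d+)      capturing group: one or more digits
// $          anchor: end of the string
// i          flag: ignore case
const componentPattern = /^order id:\s*(\d+)$/i;

const componentMatch = line.match(componentPattern);
// The first capturing group is at index 1 of the match array.
const orderId = componentMatch ? componentMatch[1] : null; // "48213"
```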

Let's look at some of these components.

Flags

These can also be called Modifiers. The following table lists the flags used with RegEx, with an example of each.

| Flag | Name | Description | Example |
| --- | --- | --- | --- |
| i | Ignore Case | Makes the match case-insensitive | /error/i matches error, Error, and ERROR |
| g | Global | Finds all matches rather than stopping at the first | "123-456".match(/\d+/g) returns 123 and 456 |
| m | Multiline | ^ and $ match the start / end of each line | Useful for multi-line logs or HTML blocks |
| s | Dot All | Allows . to match newline characters | /<div>.*<\/div>/s matches across lines |
| u | Unicode | Supports non-ASCII / international characters | /\u{61}/u matches a |
| y | Sticky | Matches only from the exact lastIndex | Used in high-performance custom JS parsers |

The i and g flags, or no flags at all, are the most commonly used within Discover.

The i flag is the most frequently used flag in Discover events. Since URLs and form field names can sometimes vary in casing (e.g., checkout vs Checkout), always use the ignore case option unless you have a specific reason to be case-sensitive.

The g flag is useful when you need to count how many times a specific error appears on a single page: the global flag is necessary to capture every instance rather than just the first one. The other available RegEx flags (m, s, u and y) are listed in the table above.
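
As a rough illustration of the i and g flags in plain JavaScript, the sketch below counts error occurrences in a made-up page text. The sample string is an assumption, not Discover data.

```javascript
// Hypothetical page text with the word "error" in three different casings.
const page = "Error: card declined. error: retry failed. ERROR: timeout.";

// No flags: case-sensitive, and only the first match is returned.
const firstExact = page.match(/error/);

// i flag: any casing, but still only the first match.
const firstAnyCase = page.match(/error/i);

// g and i together: every occurrence, in any casing.
const allErrors = page.match(/error/gi) || [];
const errorCount = allErrors.length; // 3

// The example from the flags table: g returns every run of digits.
const numbers = "123-456".match(/\d+/g); // ["123", "456"]
```

Without the g flag, match() returns only the first hit, so a count based on it would never exceed one.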

Shorthand

The following table lists the shorthand character classes used with RegEx, with an example of each.

| Character | Matches | Example |
| --- | --- | --- |
| \d | Any digit (0-9) | \d+ matches 22 |
| \D | Any non-digit | \D+ matches AB |
| \w | Any word character (letter, digit, or _) | \w+ matches Order_123 |
| \s | Any whitespace (space, tab, newline) | \s matches the space in " " |
| . | Wildcard: matches any single character (except a newline, unless the s flag is set) | h.t matches hat, hot, and h9t |
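
The shorthands can be tried out quickly in JavaScript. This is a small sketch using an invented sample string; none of the values come from a real Discover session.

```javascript
// Hypothetical sample text (an assumption for this example).
const sample = "Order_123 shipped to AB 22";

// \d+ : runs of digits
const digitRuns = sample.match(/\d+/g); // ["123", "22"]

// \w+ : runs of word characters (letters, digits, underscore)
const wordRuns = sample.match(/\w+/g); // ["Order_123", "shipped", "to", "AB", "22"]

// \D+ : the leading run of non-digits
const leadingNonDigits = "AB22".match(/\D+/)[0]; // "AB"

// \s : split on any single whitespace character (space, tab, newline)
const fields = "a b\tc".split(/\s/); // ["a", "b", "c"]

// . : exactly one character between h and t, so "ht" is rejected
const wildcardHits = ["hat", "hot", "h9t", "ht"].filter((s) => /^h.t$/.test(s));
```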

Quantifiers

These specify how many instances of a character, group, or character class must be present in the input for a match to be found. Quantifiers are "greedy" by default, meaning they will match as much text as possible. You can make them "lazy" by adding a ? after them, which tells the engine to match the shortest possible string.

With Discover, distinguishing between greedy and lazy quantifiers is key to preventing over-matching when parsing HTML or JSON.

| Quantifier | Description | Example |
| --- | --- | --- |
| * | 0 or more repetitions | a* matches (none), a, aaaa |
| + | 1 or more repetitions | a+ matches a, aaaa |
| ? | 0 or 1 repetition (optional) | colou?r matches color or colour |
| {n} | Exact repetition count | \d{5} matches a 5-digit product code, such as 12345 or 55225 |
| {n,m} | Bounded repetition range | \d{2,4} matches 12, 123, or 1234 |
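
The difference between greedy and lazy matching is easiest to see side by side. The following is a minimal sketch with an invented HTML fragment; the ^ and $ anchors in the later checks are added so that the whole string must match.

```javascript
// Hypothetical HTML fragment with two <b> elements.
const fragment = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the LAST </b>, so it over-matches.
const greedy = fragment.match(/<b>(.*)<\/b>/)[1]; // "one</b> and <b>two"

// Lazy: .*? stops at the FIRST </b>.
const lazy = fragment.match(/<b>(.*?)<\/b>/)[1]; // "one"

// The other quantifiers from the table.
const optionalU = ["color", "colour", "colouur"].map((s) => /^colou?r$/.test(s)); // [true, true, false]
const exactlyFive = /^\d{5}$/.test("12345"); // true
const tooShort = /^\d{5}$/.test("1234"); // false
const twoToFour = ["12", "123", "1234", "12345"].map((s) => /^\d{2,4}$/.test(s)); // [true, true, true, false]
```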

Examples

Often, a transaction ID is contained within a JSON response. A simple "Start and End Tag" might fail if the JSON structure changes slightly.

The Data: {"status": "success", "order_ref": "ABC-12345-XYZ", "timestamp": 17050123}

The RegEx: "order_ref"\s*:\s*"([^"]+)"

This expression looks for the key order_ref, accounts for optional whitespace \s*, and captures everything inside the quotes ([^"]+). Within an advanced event, you can access this using RegExp.$1.
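
Outside Discover, the same expression can be checked in plain JavaScript. This is a sketch only: extractOrderRef is a hypothetical helper name, and it reads the captured value from index 1 of the match array, which holds the same first-group value that RegExp.$1 refers to.

```javascript
// Response body from the example above.
const body = '{"status": "success", "order_ref": "ABC-12345-XYZ", "timestamp": 17050123}';

// The expression from the text, written as a JavaScript regex literal.
const orderRefPattern = /"order_ref"\s*:\s*"([^"]+)"/;

// Hypothetical helper: returns the captured reference, or null when the key is absent.
function extractOrderRef(text) {
  const m = orderRefPattern.exec(text);
  return m ? m[1] : null;
}

const orderRef = extractOrderRef(body); // "ABC-12345-XYZ"

// \s* means the pattern still matches when the JSON has no spaces around the colon.
const compactRef = extractOrderRef('{"order_ref":"ABC-12345-XYZ"}');

// No order_ref key: the helper returns null instead of throwing.
const missingRef = extractOrderRef('{"status": "failed"}');
```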

Masking Credit Cards (Privacy). To ensure PCI compliance, you must mask card numbers in the browser before they are transmitted.

The RegEx: \b(?:\d[ -]*?){13,16}\b

The expression looks for 13 to 16 digits that might be separated by spaces or hyphens. In Privacy configurations, you would map this RegEx to a maskType (e.g., replacing digits with X).
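
To preview what this expression would select, it can be exercised in plain JavaScript. The sketch below is not the Discover Privacy configuration: maskCards is a hypothetical helper that stands in for a maskType by replacing each digit with X, and the card number is a made-up test value.

```javascript
// The expression from the text, with the g flag so every card number in the text is found.
const cardPattern = /\b(?:\d[ -]*?){13,16}\b/g;

// Hypothetical helper: replaces every digit inside a matched card number with X,
// leaving the separators (spaces or hyphens) in place.
function maskCards(text) {
  return text.replace(cardPattern, (card) => card.replace(/\d/g, "X"));
}

const spaced = maskCards("Card: 4111 1111 1111 1111 exp 12/27");
// "Card: XXXX XXXX XXXX XXXX exp 12/27"

const hyphenated = maskCards("4111-1111-1111-1111");
// "XXXX-XXXX-XXXX-XXXX"

// Fewer than 13 digits: left untouched.
const shortNumber = maskCards("Order 12345 confirmed");
```

Any run of 13 to 16 digits matches, so other long numeric identifiers would be masked as well; it is worth testing the expression against representative data first.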

Capturing Error Messages. If the site you are working with returns error messages in a consistent UI component:

The Data: <div class="error-msg">Invalid Promo Code: SPRING20</div>

The RegEx: <div class="error-msg">([^<]+)</div>

The expression locates the opening div and captures everything until it hits the next < character.
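
A quick way to check this expression is a few lines of JavaScript. In this sketch the pattern is passed to the RegExp constructor as a string, so it can be written exactly as shown above; in a regex literal the / in </div> would need escaping as <\/div>. The variable names are illustrative only.

```javascript
// HTML from the example above.
const response = '<div class="error-msg">Invalid Promo Code: SPRING20</div>';

// The expression from the text, unchanged, built from a string.
const errorPattern = new RegExp('<div class="error-msg">([^<]+)</div>');

const errorMatch = response.match(errorPattern);
const errorMessage = errorMatch ? errorMatch[1] : null; // "Invalid Promo Code: SPRING20"

// [^<]+ stops at the first "<", so nested markup inside the div is not captured.
const nested = '<div class="error-msg"><b>Invalid</b></div>'.match(errorPattern); // null
```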

Best Practices

Test & Event Tester

Always deploy any significant new or updated expression to a testing or pre-production environment for thorough testing before moving it into production. Discover provides the Event Tester to execute expressions against real session data for Hit Attributes and/or Events.

  1. Explore your expression in tools/services such as regexr or regex101
  2. Use Event Tester to ensure you are getting the right results
  3. Deploy into a pre-production environment for testing

.* pattern

The pattern .* tells the engine within Discover to match everything. In a large HTML response, this can cause "backtracking", where the engine tries many possible combinations, slowing down processing. Be specific: use [^"]+ (everything except a quote) or \d+ (only digits) instead of .* or .+.
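
The effect on the captured value can be seen in a short JavaScript sketch. The HTML fragment and the query string are invented for illustration.

```javascript
// Hypothetical HTML with two links.
const links = '<a href="/cart">Cart</a> <a href="/help">Help</a>';

// Broad: .* runs to the end of the input and backtracks to the LAST quote.
const broad = links.match(/href="(.*)"/)[1];
// '/cart">Cart</a> <a href="/help'

// Specific: [^"]+ cannot cross a quote, so it stops at the first closing quote.
const specific = links.match(/href="([^"]+)"/)[1]; // "/cart"

// \d+ captures only digits and stops at the first non-digit.
const quantity = "qty=42&ref=x".match(/qty=(\d+)/)[1]; // "42"
```

The specific patterns return the intended value and give the engine far less text to reconsider when a match fails.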

Anchor

Anchor expressions where possible. Anchors tell the engine where to start looking.

  • ^ (Start of string)
  • $ (End of string)
  • Example: If searching for a URL path, use ^/checkout/confirmation rather than just checkout/confirmation.
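
The URL path example can be sketched in JavaScript as follows. The paths are hypothetical, and the / characters are escaped because the patterns are written as regex literals.

```javascript
// Hypothetical URL paths (assumptions for this example).
const paths = [
  "/checkout/confirmation",
  "/checkout/confirmation?id=7",
  "/blog/why-checkout/confirmation-pages-matter",
];

// Unanchored: matches anywhere in the path, including the blog URL.
const unanchoredHits = paths.filter((p) => /checkout\/confirmation/.test(p));

// Anchored at the start: only paths that BEGIN with /checkout/confirmation.
const startAnchoredHits = paths.filter((p) => /^\/checkout\/confirmation/.test(p));

// Anchored at both ends: only the exact path.
const exactHits = paths.filter((p) => /^\/checkout\/confirmation$/.test(p));
```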

Non-Capturing Groups

Use Non-Capturing Groups. If you need to group parts of a RegEx but don't need to extract the value, use (?:...).

  • Example: (?:https|http):// can be slightly more efficient than (https|http):// because the system doesn't have to "save" the result for later use.
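
The practical difference shows up in the numbering of the captured groups. This is a minimal JavaScript sketch; the URL is a placeholder and the host-capturing group ([^\/]+) is added here for illustration only.

```javascript
// Placeholder URL (an assumption for this example).
const url = "https://example.com/checkout";

// Capturing group for the scheme: the host becomes group 2.
const withCapture = url.match(/(https|http):\/\/([^\/]+)/);
// withCapture[1] is "https", withCapture[2] is "example.com"

// Non-capturing group for the scheme: the host is group 1, and nothing extra is stored.
const withoutCapture = url.match(/(?:https|http):\/\/([^\/]+)/);
// withoutCapture[1] is "example.com"
```

With the non-capturing form, the value you actually want stays at index 1 even if more grouping is added to the front of the expression later.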