regex_extract function
The regex_extract function returns a list of strings that match a regular expression from the source string.
Syntax
regex_extract(
str lvarchar,
re lvarchar,
limit integer DEFAULT 0,
copts integer DEFAULT 1)
returns lvarchar
regex_extract(
str clob,
re lvarchar,
limit integer DEFAULT 0,
copts_string lvarchar)
returns lvarchar
Parameters
- str
- The string to search. Can be of type CHAR, NCHAR, VARCHAR, NVARCHAR, LVARCHAR, or CLOB. A null value is treated as an empty string.
- re
- The regular expression. Can be of type CHAR, NCHAR, VARCHAR, NVARCHAR, or LVARCHAR. Cannot be null.
- limit (Optional)
- 0 = Default. All matches are returned.
- copts (Optional)
- The type of regex search:
- 0 = Basic regex
- 1 = Default. Extended POSIX regex
- 2 = Basic regex and ignore case
- 3 = Extended POSIX regex and ignore case
- copts_string (Optional)
- The type of regex search:
- basic = Basic regex
- extended = Default. Extended POSIX regex
- basic,icase = Basic regex and ignore case
- extended,icase = Extended POSIX regex and ignore case
- basic,rtrim = Basic regex with rtrim
- extended,rtrim = Extended POSIX regex with rtrim
- basic,icase,rtrim = Basic regex and ignore case with rtrim
- extended,icase,rtrim = Extended POSIX regex and ignore case with rtrim
Description
Use the regex_extract function iteratively return each substring that matches a regular expression. You can limit the number of substrings returned.
Returns
A set of text values that match the input regular expression pattern.
No rows found = A 0-length match, for example, a newline character.
An exception = An error occurred.
Example: Find a pattern within words
In this example, you want to find the patterns "would" and
"wood":
How much wood could a
woodchuck chuck if a woodchuck could chuck wood?
A woodchuck could chuck as much wood as a woodchuck would chuck
if a woodchuck could chuck wood.
The following regular expression matches both the word "wood" and the word
"would":
wo[ou]l?d
When you use this regular expression, the regex_extract function shows that
the two words occur ten times in the string, but does not provide the entire word in which the
pattern is
found:
execute function
regex_extract(
'How much wood could a woodchuck chuck if a woodchuck could chuck wood?
A woodchuck could chuck as much wood as a woodchuck would chuck
if a woodchuck could chuck wood.',
'wo[ou]l?d'
);
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) wood
(expression) would
(expression) wood
(expression) wood
10 row(s) retrieved.
Example: Return the pattern plus the rest of the word
You can expand the regular expression to include the entire word. Start by specifying that more
lowercase characters can follow the primary subexpression:
wo[ou]l?d[a-z]*
Next, add that the word ends with a space or a punctuation character. For completeness, you can
specify more punctuation characters than the ones that occur in the text.
Note: If a hyphen appears
as the first character in a character class, it means a literal hyphen, not a range of
values.
wo[ou]l?d[a-z]*[- .?!:;]
When you run the regex_extract function with this expression, the function
returns the whole words in which the pattern
occurs:
execute function
regex_extract(
'How much wood could a woodchuck chuck if a woodchuck could chuck wood?
A woodchuck could chuck as much wood as a woodchuck would chuck
if a woodchuck could chuck wood.',
'wo[ou]l?d[a-z]*[- .?!:;]'
);
(expression) wood
(expression) woodchuck
(expression) woodchuck
(expression) wood?
(expression) woodchuck
(expression) wood
(expression) woodchuck
(expression) would
(expression) woodchuck
(expression) wood.
10 row(s) retrieved.
In the following example, you limit the results to the first
two:
execute function
regex_extract(
'How much wood could a woodchuck chuck if a woodchuck could chuck wood?
A woodchuck could chuck as much wood as a woodchuck would chuck
if a woodchuck could chuck wood.',
'wo[ou]l?d[a-z]*[- .?!:;]',
2
);
(expression) wood
(expression) woodchuck
2 row(s) retrieved.
Example: Return the pattern plus the beginning of the word
You can include the beginning of the word in the regular expression. The beginning of a word can
be a space or the beginning of the line, ( |^), followed by upper or lowercase
letters:
( |^)[A-Za-z]*
In this example, you want to find all instances of the string "tter" in the following
text:
Betty bought some bitter
butter and it made her batter bitter,
so Betty bought some better
butter to make her bitter batter better.
The following statement returns the pattern plus the beginnings of the
words:
execute function
regex_extract(
'Betty bought some bitter butter and it made her batter bitter,
so Betty bought some better butter to make her bitter batter better.',
'( |^)[A-Za-z]*tter'
);
(expression) bitter
(expression) butter
(expression) batter
(expression) bitter
(expression) better
(expression) butter
(expression) bitter
(expression) batter
(expression) better
9 row(s) retrieved.
Example: A match with 0 length
If the regular expression results in a 0-length match, the query returns the message "No rows
found." For example, although the regex_match function returns
t for a match on a begin-line character ("^"), the
regex_extract function returns no rows because a search of "^" matches a string
that has a length of
0:
execute function regex_match('Hello world', '^');
(expression) t
1 row(s) retrieved.
execute function regex_extract('Hello world', '^');
No rows found.