Title: Your Oracle RX Regular Expressions in an Oracle World
1Your Oracle RX - Regular Expressions in an
Oracle World
2Definition
- regular expressions provide a concise and
flexible means for identifying strings of text of
interest, such as particular characters, words,
or patterns of characters
3Regular Expressions
- Used to search strings to find matching patterns
- Match patterns can be fairly simple to extremely
difficult - Its much easier to understand your own
expressions! - Search for match starts at beginning of string
and stops when first match is found
4Found In
- Text Editors
- Unix Utilities
- ed (text editor)
- Grep
- Programming Languages
- Perl
- Tcl
- And since Oracle 10g
- SQL
- Application Express
5Patterns
- Generally characters and patterns represent
themselves - Special Characters
6Sample Patterns
- Pattern
- 800
- matches 800
- ORA
- matches ORA
- ..-.
- Matches
- 44313-2323
- A4313-d3r3
- 444-313-23234
7Simple Repetition
8Quiz
- Given a 9 digit zip code will the pattern match?
- '..-. '
- '.?-.'
- '.-.'
Yes
Yes
Yes
9Quantifies
- count defines an exact repetition count of the
prior object - A5 matches AAAAA and AAAAAAAA
- Zip code could be defined as .5-.4
10More on Quantifiers
- m,n defines an exact repetition count of the
prior object - m minimum number of matches
- n maximum number of matches
- A1,5 matches A and AAA and AAAAAA
- m, defines a repetition count of m or more
11Sample Patterns
- Does the pattern 216-.3,-.4,4 match?
- 216-588-5023
- 216-5888888-5023
- 011-216-588-5023
- 011-216--58-5-23
- 216-5888-888-5023
Yes
Yes
Yes skip the 011 and you get the match
Yes . Matches anything
Yes 3,) eats all charters to the -5023
12Anchors
- Anchors
- Start of line
- End of line
- Does ..-. match?
- 44313-1234 441313-1234 4431--abcd
- 44313-1234c a44313-1234
No
Yes
Yes
No
No
13Alternate and Grouping
- char character list
- alternation (boolean or operator)
- ( ) group subexpression
-
14Character Expression
- abc defines a list of characters that can be
matched to a single a or b or c - Our zip code match can be expressed as
- 01234567895-01234567894
4 of the prior pattern
5 of the prior pattern
A dash
Any number
Any number
15More on alternation
- 91 9 or 1
- What is this string matching?
- (81)(91)12345678902-12345678903-
12345678904
Group 1
Group 2
Order makes no difference
Or the groups
900-555-5125 Or 823-123-4567
16Ranges
- a-z matches any letter from a to z
- 0-9 matches any digit from 0 to 9
-
- a-zA-Z matches any letter from a to Z
- Does a0-3a-zA-Z match
- 1A 45a ?34 9
Order is important must start with lower and go
to higher
Yes
Yes
No
Yes
17Predefined Character Classes(restricted to
character lists)
18New Zip Code
- Does digit5-digit4 match
- 44313-1234 441313-1234 4431--abcd
- 44313-1234c a44313-1234
Yes
Yes
No
Yes
Yes
19Groups
- What is?
- (digit5)(-digit4)?
- Note
- digit
- This is a special case of as it is in a
character list
When is not at the beginning of a string it
negates
20Greediness
- Regular expression operators are greedy, they
match the maximum set - a-z will match the entire string
- abcedef
- a-z? add ? after the quantity, and you get
the lazy or minimum match - Will match just the letter a in 123abce
21Eagerness
Greedy and Eager are not the same
- Oracles regular expression parser is eager. What
that means is that the parser will stop looking
for a match once one is found in an alternation
group. - e.g. given pattern regexregex not
- And text regex not
- Eager will return regex
22What are these?
- digit3-digit2-digit4,4
- Matches SSN 302-77-1234
- -?(0-9.?0-9.0-9)(eE-?0-9)?
- Matches fp number 23.344E12
SELECT REGEXP_REPLACE( '23.2323e12','-?(
0-9.?0-9.0-9)(eE-?0-9)?','match')
FROM dual
23Amtrust Example
Matched 10-MAR-2008
- Whats the functional difference between
- ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
IO2UPCBPCFPCCSQLEPC-0-90-90-90-9Sq
l\Loader-0-90-9SQL\Loader-0-90-9 - and
- (ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
IO2UPCBPCFPCCSQLEPC)-0-90-90-90-9Sq
l\Loader-0-90-9SQL\Loader-0-90-9
Escape
Does not match 10-MAR-2008
24Amtrust Example
- Whats the functional difference between
- ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
IO2UPCBPCFPCCSQLEPC-0-90-90-90-9Sq
l\Loader-0-90-9SQL\Loader-0-90-9 - and
- (ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
IO2UPCBPCFPCCSQLEPC)-0-90-90-90-9Sq
l\Loader-0-90-9SQL\Loader-0-90-9
Group instead of Character List
25Groups
- Specified with ( )
- Oracle supports up to 9
- \1 is defined by the first open (
- \2 is defined by the second open (
- \3 is defined by the third open (
- And so on through to \9
- Once defined they can be used later in a pattern
back reference
26Back References
- Round bracket create groups which can be
referenced later in the same pattern or in a
replacement pattern. - HTML tag
- lt(A-ZA-Z0-9)\b(gt)gt.lt/\1gt
Must start with a capital letter possibly
followed by additional letters
Opening tag again, must match exactly the text in
the first parens
Optional group can occur 0 or more times these
are attributes
A blank followed by zero or more letters not gt
27Back References
- Given pattern ((0-9)3-(0-9)3-(0-94))
- And string My number, 216-588-5023, is working
- What are
- \1
- \2
- \3
- \4
216-588-5023
6
8
5023
28Oracle Reg Expression Functions
- 10g provides these regular expression analogs for
existing string functions - REGEXP_LIKE
- REGEXP_INSTR
- REGEXP_SUBSTR
- REGEXP_REPLACE
- Source text can be CHAR, VARCHAR2, NCHAR,
NVARCHAR2, CLOB or NCLOB
29Oracle 10g Regular Expressions
- REGEXP_LIKE returns TRUE if a match found,
otherwise FALSE - REGEXP_INSTR returns character position of first
match, otherwise 0 - REGEXP_SUBSTR returns first matched string, null
if no matches found - REGEXP_REPLACE replaces all matched strings,
returns original string if no matches found
30Function Match Parameters
31REGEXP_LIKE
- REGEXP_LIKE(
- source,
- pattern, -- Match pattern
- match_parameters -- Function parameters
- ) RETURN BOOLEAN
-
32REGEXP_LIKE
- u31125_at_DEDWgt SELECT 'Yes this is a match'
- 2 FROM dual
- 3 WHERE regexp_like('44313-2345',
- 4 'digit5-digit
4' - 5 )
- 'YESTHISISAMATCH'
- -------------------
- Yes this is a match
33REGEXP_INSTR
- REGEXP_INSTR(
- source,
- pattern,
- start_position, --start searching from here
- occurrence, --which occurrence should be ret
- return_position, -- 0 start of occurrence
- -- 1 end of
occurrence - match_parameters
- ) RETURN NUMBER
34REGEXP_INSTR
- U31125_at_DEDWgtSELECT
- 2 REGEXP_INSTR(
- 3 'The quick red fox jumped over the
lazy brown dog.', - 4 'quick',
- 5 1, --start searching from here
- 6 1, --which occurrence should be ret
- 7 0, -- 0 start of occurrence
- 8 -- 1 end of occurrence
- 9 NULL -- match_parameters
- 10 ) match_pos
- 11 FROM dual
- MATCH_POS
- ----------
- 5
- 1 row selected.
35REGEXP_INSTR
- U31125_at_DEDWgtSELECT
- 2 REGEXP_INSTR(
- 3 'The quick red fox jumped over the
lazy brown dog.', - 4 'quick',
- 5 1, --start searching from here
- 6 1, --which occurrence should be ret
- 7 1, -- 0 start of occurrence
- 8 -- 1 end of occurrence
- 9 NULL -- match_parameters
- 10 ) match_pos
- 11 FROM dual
- MATCH_POS
- ----------
- 10
- 1 row selected.
36REGEXP_REPLACE
- REGEXP_REPLACE(
- source,
- pattern,
- rep_string, -- replaces matched text
- position, -- search start position
- occurrence, -- match occurrence
- -- 0 replaces all matches
- match_parameters
- ) RETURN VARCHAR2
37REGEXP_REPLACE
- Compress two or more spaces
- SELECT
- REGEXP_REPLACE(
- '500 Oracle Parkway, Redwood
Shores, CA', - '( )2,',
- ' '
- ) "REGEXP_REPLACE"
- FROM DUAL
- REGEXP_REPLACE
- ------------------------------------------
- 500 Oracle Parkway, Redwood Shores, CA
38REGEXP_REPLACE
- Provide better error message
- SELECT REGEXP_REPLACE (
- Error on employee lts1gt whose name is
lts2gt', - '(.)lts1gt(.)lts2gt',
- '\1A31124\2RGravenstein'
- ) AS "REGEXP_REPLACE"
- FROM DUAL
- REGEXP_REPLACE
- --------------------------------------------------
- Error on employee A31124 whose name is
RGravenstein - 1 row selected.
39REGEXP_SUBSTR
- REGEXP_SUBSTR(
- source,
- pattern,
- position, -- search start position
- occurrence, -- which occurrence should
- -- be found and sub-stringed
- match_parameter
- ) RETURN VARCHAR2
40Parsing Example
- set serveroutput on
- DECLARE
- x VARCHAR2(20)
- y VARCHAR2(20)
- c VARCHAR2(40) '13,46,810,34,76,1112'
- BEGIN
- x REGEXP_SUBSTR(c,'', 1, 1)
- y REGEXP_SUBSTR(c,',', 1, 2)
- dbms_output.put_line('lt'x'-'y'gt')
- END
- /
- lt1-46gt
- PL/SQL procedure successfully completed.
First non-colon found
Starting after the first non-comma through to the
second comma
41Where to use Regular Expression
- Validation of
- e-mail addresses
- Credit card numbers
- SSN
-
- Complicated string parsing
- Where the standard Oracle functions LIKE, INSTR,
REPLACE and SUBSTR cant do the job without a lot
of work
Lots of examples can be found on the internet
42Performance Considerations
- Regular expressions are more compute intensive
than non-regular expression equivalents. - Most database processes are IO bound and
therefore some additional cpu load is normally
not an issue
43References
The best reference
- http//www.regular-expressions.info/reference.html
- http//www.oracle.com/technology/oramag/webcolumns
/2003/techarticles/rischert_regexp_pt1.html - http//www.psoug.org/reference/regexp.html
- http//www.dba-oracle.com/t_regular_expressions.ht
m - http//rootshell.be/yong321/computer/OracleRegExp
.html - http//www.databasejournal.com/features/oracle/art
icle.php/3501826 - John Garmany Being Regular with Regular
Expresssions Collaborate08