Your Oracle RX Regular Expressions in an Oracle World - PowerPoint PPT Presentation

1 / 43
About This Presentation
Title:

Your Oracle RX Regular Expressions in an Oracle World

Description:

regular expressions provide a concise and flexible means for ... printable characters [:print:] any numeric [:digit:] control characters (non-printing) ... – PowerPoint PPT presentation

Number of Views:355
Avg rating:3.0/5.0
Slides: 44
Provided by: rumpigra
Category:

less

Transcript and Presenter's Notes

Title: Your Oracle RX Regular Expressions in an Oracle World


1
Your Oracle RX - Regular Expressions in an
Oracle World
  • Gravenstein, Rebholtz

2
Definition
  • regular expressions provide a concise and
    flexible means for identifying strings of text of
    interest, such as particular characters, words,
    or patterns of characters

3
Regular Expressions
  • Used to search strings to find matching patterns
  • Match patterns can be fairly simple to extremely
    difficult
  • Its much easier to understand your own
    expressions!
  • Search for match starts at beginning of string
    and stops when first match is found

4
Found In
  • Text Editors
  • Unix Utilities
  • ed (text editor)
  • Grep
  • Programming Languages
  • Perl
  • Tcl
  • And since Oracle 10g
  • SQL
  • Application Express

5
Patterns
  • Generally characters and patterns represent
    themselves
  • Special Characters

6
Sample Patterns
  • Pattern
  • 800
  • matches 800
  • ORA
  • matches ORA
  • ..-.
  • Matches
  • 44313-2323
  • A4313-d3r3
  • 444-313-23234

7
Simple Repetition
  • Quantifiers

8
Quiz
  • Given a 9 digit zip code will the pattern match?
  • '..-. '
  • '.?-.'
  • '.-.'

Yes
Yes
Yes
9
Quantifies
  • count defines an exact repetition count of the
    prior object
  • A5 matches AAAAA and AAAAAAAA
  • Zip code could be defined as .5-.4

10
More on Quantifiers
  • m,n defines an exact repetition count of the
    prior object
  • m minimum number of matches
  • n maximum number of matches
  • A1,5 matches A and AAA and AAAAAA
  • m, defines a repetition count of m or more

11
Sample Patterns
  • Does the pattern 216-.3,-.4,4 match?
  • 216-588-5023
  • 216-5888888-5023
  • 011-216-588-5023
  • 011-216--58-5-23
  • 216-5888-888-5023

Yes
Yes
Yes skip the 011 and you get the match
Yes . Matches anything
Yes 3,) eats all charters to the -5023
12
Anchors
  • Anchors
  • Start of line
  • End of line
  • Does ..-. match?
  • 44313-1234 441313-1234 4431--abcd
  • 44313-1234c a44313-1234

No
Yes
Yes
No
No
13
Alternate and Grouping
  • char character list
  • alternation (boolean or operator)
  • ( ) group subexpression

14
Character Expression
  • abc defines a list of characters that can be
    matched to a single a or b or c
  • Our zip code match can be expressed as
  • 01234567895-01234567894

4 of the prior pattern
5 of the prior pattern
A dash
Any number
Any number
15
More on alternation
  • 91 9 or 1
  • What is this string matching?
  • (81)(91)12345678902-12345678903-
    12345678904

Group 1
Group 2
Order makes no difference
Or the groups
900-555-5125 Or 823-123-4567
16
Ranges
  • a-z matches any letter from a to z
  • 0-9 matches any digit from 0 to 9
  • a-zA-Z matches any letter from a to Z
  • Does a0-3a-zA-Z match
  • 1A 45a ?34 9

Order is important must start with lower and go
to higher
Yes
Yes
No
Yes
17
Predefined Character Classes(restricted to
character lists)
18
New Zip Code
  • Does digit5-digit4 match
  • 44313-1234 441313-1234 4431--abcd
  • 44313-1234c a44313-1234

Yes
Yes
No
Yes
Yes
19
Groups
  • What is?
  • (digit5)(-digit4)?
  • Note
  • digit
  • This is a special case of as it is in a
    character list

When is not at the beginning of a string it
negates
20
Greediness
  • Regular expression operators are greedy, they
    match the maximum set
  • a-z will match the entire string
  • abcedef
  • a-z? add ? after the quantity, and you get
    the lazy or minimum match
  • Will match just the letter a in 123abce

21
Eagerness
Greedy and Eager are not the same
  • Oracles regular expression parser is eager. What
    that means is that the parser will stop looking
    for a match once one is found in an alternation
    group.
  • e.g. given pattern regexregex not
  • And text regex not
  • Eager will return regex

22
What are these?
  • digit3-digit2-digit4,4
  • Matches SSN 302-77-1234
  • -?(0-9.?0-9.0-9)(eE-?0-9)?
  • Matches fp number 23.344E12

SELECT REGEXP_REPLACE( '23.2323e12','-?(
0-9.?0-9.0-9)(eE-?0-9)?','match')
FROM dual
23
Amtrust Example
Matched 10-MAR-2008
  • Whats the functional difference between
  • ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
    AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
    IO2UPCBPCFPCCSQLEPC-0-90-90-90-9Sq
    l\Loader-0-90-9SQL\Loader-0-90-9
  • and
  • (ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
    AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
    IO2UPCBPCFPCCSQLEPC)-0-90-90-90-9Sq
    l\Loader-0-90-9SQL\Loader-0-90-9

Escape
Does not match 10-MAR-2008
24
Amtrust Example
  • Whats the functional difference between
  • ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
    AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
    IO2UPCBPCFPCCSQLEPC-0-90-90-90-9Sq
    l\Loader-0-90-9SQL\Loader-0-90-9
  • and
  • (ORASP2EXPIMPKUPDBVLCDQSMRMANLRMLFIPLS
    AMDTNSNNCNNONNLNPLNNFNMPNCRNZEMODO2FO2
    IO2UPCBPCFPCCSQLEPC)-0-90-90-90-9Sq
    l\Loader-0-90-9SQL\Loader-0-90-9

Group instead of Character List
25
Groups
  • Specified with ( )
  • Oracle supports up to 9
  • \1 is defined by the first open (
  • \2 is defined by the second open (
  • \3 is defined by the third open (
  • And so on through to \9
  • Once defined they can be used later in a pattern
    back reference

26
Back References
  • Round bracket create groups which can be
    referenced later in the same pattern or in a
    replacement pattern.
  • HTML tag
  • lt(A-ZA-Z0-9)\b(gt)gt.lt/\1gt

Must start with a capital letter possibly
followed by additional letters
Opening tag again, must match exactly the text in
the first parens
Optional group can occur 0 or more times these
are attributes
A blank followed by zero or more letters not gt
27
Back References
  • Given pattern ((0-9)3-(0-9)3-(0-94))
  • And string My number, 216-588-5023, is working
  • What are
  • \1
  • \2
  • \3
  • \4

216-588-5023
6
8
5023
28
Oracle Reg Expression Functions
  • 10g provides these regular expression analogs for
    existing string functions
  • REGEXP_LIKE
  • REGEXP_INSTR
  • REGEXP_SUBSTR
  • REGEXP_REPLACE
  • Source text can be CHAR, VARCHAR2, NCHAR,
    NVARCHAR2, CLOB or NCLOB

29
Oracle 10g Regular Expressions
  • REGEXP_LIKE returns TRUE if a match found,
    otherwise FALSE
  • REGEXP_INSTR returns character position of first
    match, otherwise 0
  • REGEXP_SUBSTR returns first matched string, null
    if no matches found
  • REGEXP_REPLACE replaces all matched strings,
    returns original string if no matches found

30
Function Match Parameters
31
REGEXP_LIKE
  • REGEXP_LIKE(
  • source,
  • pattern, -- Match pattern
  • match_parameters -- Function parameters
  • ) RETURN BOOLEAN

32
REGEXP_LIKE
  • u31125_at_DEDWgt SELECT 'Yes this is a match'
  • 2 FROM dual
  • 3 WHERE regexp_like('44313-2345',
  • 4 'digit5-digit
    4'
  • 5 )
  • 'YESTHISISAMATCH'
  • -------------------
  • Yes this is a match

33
REGEXP_INSTR
  • REGEXP_INSTR(
  • source,
  • pattern,
  • start_position, --start searching from here
  • occurrence, --which occurrence should be ret
  • return_position, -- 0 start of occurrence
  • -- 1 end of
    occurrence
  • match_parameters
  • ) RETURN NUMBER

34
REGEXP_INSTR
  • U31125_at_DEDWgtSELECT
  • 2 REGEXP_INSTR(
  • 3 'The quick red fox jumped over the
    lazy brown dog.',
  • 4 'quick',
  • 5 1, --start searching from here
  • 6 1, --which occurrence should be ret
  • 7 0, -- 0 start of occurrence
  • 8 -- 1 end of occurrence
  • 9 NULL -- match_parameters
  • 10 ) match_pos
  • 11 FROM dual
  • MATCH_POS
  • ----------
  • 5
  • 1 row selected.

35
REGEXP_INSTR
  • U31125_at_DEDWgtSELECT
  • 2 REGEXP_INSTR(
  • 3 'The quick red fox jumped over the
    lazy brown dog.',
  • 4 'quick',
  • 5 1, --start searching from here
  • 6 1, --which occurrence should be ret
  • 7 1, -- 0 start of occurrence
  • 8 -- 1 end of occurrence
  • 9 NULL -- match_parameters
  • 10 ) match_pos
  • 11 FROM dual
  • MATCH_POS
  • ----------
  • 10
  • 1 row selected.

36
REGEXP_REPLACE
  • REGEXP_REPLACE(
  • source,
  • pattern,
  • rep_string, -- replaces matched text
  • position, -- search start position
  • occurrence, -- match occurrence
  • -- 0 replaces all matches
  • match_parameters
  • ) RETURN VARCHAR2

37
REGEXP_REPLACE
  • Compress two or more spaces
  • SELECT
  • REGEXP_REPLACE(
  • '500 Oracle Parkway, Redwood
    Shores, CA',
  • '( )2,',
  • ' '
  • ) "REGEXP_REPLACE"
  • FROM DUAL
  • REGEXP_REPLACE
  • ------------------------------------------
  • 500 Oracle Parkway, Redwood Shores, CA

38
REGEXP_REPLACE
  • Provide better error message
  • SELECT REGEXP_REPLACE (
  • Error on employee lts1gt whose name is
    lts2gt',
  • '(.)lts1gt(.)lts2gt',
  • '\1A31124\2RGravenstein'
  • ) AS "REGEXP_REPLACE"
  • FROM DUAL
  • REGEXP_REPLACE
  • --------------------------------------------------
  • Error on employee A31124 whose name is
    RGravenstein
  • 1 row selected.

39
REGEXP_SUBSTR
  • REGEXP_SUBSTR(
  • source,
  • pattern,
  • position, -- search start position
  • occurrence, -- which occurrence should
  • -- be found and sub-stringed
  • match_parameter
  • ) RETURN VARCHAR2

40
Parsing Example
  • set serveroutput on
  • DECLARE
  • x VARCHAR2(20)
  • y VARCHAR2(20)
  • c VARCHAR2(40) '13,46,810,34,76,1112'
  • BEGIN
  • x REGEXP_SUBSTR(c,'', 1, 1)
  • y REGEXP_SUBSTR(c,',', 1, 2)
  • dbms_output.put_line('lt'x'-'y'gt')
  • END
  • /
  • lt1-46gt
  • PL/SQL procedure successfully completed.

First non-colon found
Starting after the first non-comma through to the
second comma
41
Where to use Regular Expression
  • Validation of
  • e-mail addresses
  • Credit card numbers
  • SSN
  • Complicated string parsing
  • Where the standard Oracle functions LIKE, INSTR,
    REPLACE and SUBSTR cant do the job without a lot
    of work

Lots of examples can be found on the internet
42
Performance Considerations
  • Regular expressions are more compute intensive
    than non-regular expression equivalents.
  • Most database processes are IO bound and
    therefore some additional cpu load is normally
    not an issue

43
References
The best reference
  • http//www.regular-expressions.info/reference.html
  • http//www.oracle.com/technology/oramag/webcolumns
    /2003/techarticles/rischert_regexp_pt1.html
  • http//www.psoug.org/reference/regexp.html
  • http//www.dba-oracle.com/t_regular_expressions.ht
    m
  • http//rootshell.be/yong321/computer/OracleRegExp
    .html
  • http//www.databasejournal.com/features/oracle/art
    icle.php/3501826
  • John Garmany Being Regular with Regular
    Expresssions Collaborate08
Write a Comment
User Comments (0)
About PowerShow.com