Regular Expressions: Concepts - PowerPoint PPT Presentation

Provided by: joereyn
Transcript and Presenter's Notes

1
Regular Expressions: Concepts
  • CSC492 Topics in Perl
  • Joe Reynoldson
  • 3/14/05

2
Regular Expressions (regex): Basics
  • The purpose of a regex is to match a string
    against a pattern
  • The syntax of regex is simple (although sometimes
    confusing), and it basically represents a whole
    new language
  • Regex is not unique to Perl, but each regex
    dialect is unique
  • Many (including Java) imitate Perl regex
  • Regexes and globs are 2 different things
  • Globs are wildcard matches, often in UNIX or DOS
    shell
  • Example: *.pl

3
Regex explained
  • Regexes divide an infinite set of input strings
    into 2 groups:
  • Matched
  • Didn't match
  • There's no in between (no fuzzy matching)
  • Regexes are often used as boolean tests in
    conditional expressions (such as in if statements
    or while loops)
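
The two-group idea can be sketched in a few lines of Perl. This is only an illustration, not part of the original slides: the partition helper and the sample strings are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: sort input strings into the two groups a regex
# defines, "matched" and "didn't match".
sub partition {
    my ($regex, @inputs) = @_;
    my (@matched, @unmatched);
    for (@inputs) {
        # The match against $_ is used as a plain boolean test.
        if (/$regex/) {
            push @matched, $_;
        } else {
            push @unmatched, $_;
        }
    }
    return (\@matched, \@unmatched);
}

my ($yes, $no) = partition(qr/Waldo/, 'Where is Waldo?', 'Nobody here', 'Waldo!');
print "Matched:      @$yes\n";
print "Didn't match: @$no\n";
```

Every input lands in exactly one of the two lists; there is no third bucket.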

4
Matching with the matching operator
  • The matching operator ( m// ) matches $_ against
    the pattern specified between the slashes
  • Most Perl hackers leave off the leading 'm' and
    just use slashes

while(<IN>) {
  if( /Waldo/ ) {
    print "I found Waldo in $_!\n";
  }
}

5
Metacharacters: Wildcard ( . )
  • There are many characters which have special
    meaning in regexes
  • The dot ( . ) is a wildcard which matches any
    character (except newline... we'll get to that)
  • The pattern below matches if $_ is "I have a cat"
    or "I have a car"
  • Also matches can, cab, cam, and ca9

/I have a ca./

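
A quick way to see the wildcard at work is to run the pattern over a few strings. This is a minimal sketch; the candidate strings are chosen only for illustration.

```perl
use strict;
use warnings;

# Candidate strings: the last two show where the dot refuses to match.
my @candidates = ('I have a cat', 'I have a car', 'I have a ca9',
                  'I have a ca', "I have a ca\n");

my @results;
for (@candidates) {
    # The dot needs exactly one character, and by default that
    # character may not be a newline.
    push @results, /I have a ca./ ? 1 : 0;
}
print "@results\n";    # prints "1 1 1 0 0"
```

The fourth string fails because nothing follows "ca", and the fifth fails because the only thing that follows is a newline.
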
6
Metacharacters: Escape ( \ )
  • To match a period, you must escape it
  • The escape character ( \ ) is our second
    metacharacter
  • Use an escaped whack to match a whack: /\\/
  • Use whack-a-mole to relieve tension

/3\.14159/

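
To see why the escape matters, compare the escaped and unescaped patterns on a string that only looks like pi. This is an illustrative sketch; the input strings are made up.

```perl
use strict;
use warnings;

my @inputs = ('3.14159', '3x14159');

# Unescaped: the dot is a wildcard, so the "x" is accepted too.
my @unescaped = map { /3.14159/ ? 1 : 0 } @inputs;

# Escaped: only a literal period is accepted.
my @escaped = map { /3\.14159/ ? 1 : 0 } @inputs;

print "unescaped: @unescaped\n";    # prints "unescaped: 1 1"
print "escaped:   @escaped\n";      # prints "escaped:   1 0"

# A whack is matched by escaping it with another whack.
$_ = 'C:\temp';
my $has_whack = /\\/ ? 1 : 0;
print "has whack: $has_whack\n";    # prints "has whack: 1"
```
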
7
Matching Operator Variation
  • The slashes can be replaced by pretty much any
    character
  • This is handy when matching against data with
    many slashes (such as the full path to a file)
  • Perhaps m#/home/j/jreynold# would be more
    explicit for non-hackers
  • Better than the alternative:

if( m#/home/j/jreynold# ) {
  print "$_ contains my home directory\n";
}

/\/home\/j\/jreynold/

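
The sketch below checks that different delimiters describe the same match. The delimiters # and { } are picked here as common choices, and the sample path is hypothetical; neither is confirmed by the slide.

```perl
use strict;
use warnings;

# Hypothetical path, used only for this illustration.
$_ = '/home/j/jreynold/public_html/index.html';

# Plain slashes: every slash in the path has to be escaped.
my $with_slashes = /\/home\/j\/jreynold/ ? 1 : 0;

# With an explicit m, other delimiters remove the need for escaping.
my $with_hashes = m#/home/j/jreynold# ? 1 : 0;
my $with_braces = m{/home/j/jreynold} ? 1 : 0;

print "$with_slashes $with_hashes $with_braces\n";    # prints "1 1 1"
```

All three forms test for the same text; only the readability differs.
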
8
A Quick CGI Example
  • Suppose you want to determine if a web page was
    requested by a browser in the usd.edu domain
  • Of course this regex fails miserably, but how?
    (Hint: There are an infinite number of input
    strings)

$_ = $ENV{'REMOTE_HOST'};
if( /\.usd\.edu/ ) {
  print "You appear to be on a <b>U.</b> computer\n";
}

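
One plausible reading of the hint, sketched below: the pattern succeeds whenever ".usd.edu" appears anywhere in the host name. The host names are hypothetical, and this may not be the only failure the slide has in mind.

```perl
use strict;
use warnings;

# Hypothetical REMOTE_HOST values; only the first is really in usd.edu.
my @hosts = (
    'lab1.usd.edu',
    'www.usd.edu.example.com',
    'host.usd.education.example',
);

my @accepted;
for (@hosts) {
    # Same test as above: true if ".usd.edu" appears anywhere in $_.
    push @accepted, $_ if /\.usd\.edu/;
}

print "Accepted: $_\n" for @accepted;
```

All three hosts are accepted, even though two of them merely contain the text somewhere in the middle.
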
9
Metacharacters: Simple Quantifiers
  • Quantifiers allow a programmer to specify how
    many matches s?he wants (hee hee)
  • Here they are, in no particular order:
  • ? matches a pattern 0 or 1 time (/s?he/ would
    match she or he, and now you're in on my geeky
    joke! You too can be the life of the party!)
  • * matches the pattern 0 or more times (/s*he/
    matches he, she, sshe, ssshe, ...)
  • + matches the pattern 1 or more times (/s+he/
    matches she, sshe, ssshe, ...)
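
The three quantifiers ( ?, * and + ) can be compared by looking at which part of a string each pattern actually consumed. This sketch goes slightly beyond the slides: the matched_text helper is hypothetical, and it uses Perl's =~ binding operator and the $& variable (the text of the last successful match).

```perl
use strict;
use warnings;

# Hypothetical helper: return the text a pattern matched, or a marker
# string when there was no match at all.
sub matched_text {
    my ($regex, $string) = @_;
    return $string =~ $regex ? $& : '(no match)';
}

# "s" zero or one time: only one s can be part of the match.
print matched_text(qr/s?he/, 'sshe'), "\n";    # prints "she"

# "s" zero or more times: both s characters are included.
print matched_text(qr/s*he/, 'sshe'), "\n";    # prints "sshe"
print matched_text(qr/s*he/, 'he'),   "\n";    # prints "he"

# "s" one or more times: a bare "he" no longer matches.
print matched_text(qr/s+he/, 'he'),   "\n";    # prints "(no match)"
```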

10
Escaping Quantifiers
  • Each quantifier is a metacharacter, and therefore
    they must be escaped if you'd like to explicitly
    match them
  • /c\++/ matches c+, c++, c+++, ... (but not c)
  • /get out\??/ matches get out and get out?
  • /\*+bold\*+/ matches *bold*, **bold**,
    ***bold***, oh you get the picture...
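
A short check of escaped quantifiers follows. The transcript lost its punctuation here, so the first pattern, /c\++/ (a literal plus sign followed by the + quantifier), is an assumed reconstruction; the test strings are chosen for illustration.

```perl
use strict;
use warnings;

# Assumed reconstruction: an escaped, literal plus sign ( \+ ) followed
# by the + quantifier, so at least one "+" must follow the "c".
my @plus_hits = map { /c\++/ ? 1 : 0 } ('c', 'c+', 'c++');
print "@plus_hits\n";    # prints "0 1 1"

# An escaped question mark made optional by the ? quantifier.
my @out_hits = map { /get out\??/ ? 1 : 0 } ('get out', 'get out?', 'get in');
print "@out_hits\n";     # prints "1 1 0"
```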

11
Metacharacters: Grouping Patterns
  • Parentheses ( ( ) ) can group items together for
    the sake of quantifying

print "Continue? (yes or no) ";
chomp($_ = <STDIN>);
if( /y(es)?/ ) {    # check for y or yes
  print "Continuing!\n";
}

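
To see what the parentheses change, compare /y(es)?/ with the ungrouped /yes?/, where the quantifier applies only to the final s. This is an illustrative sketch: the matched_text helper is hypothetical, and it relies on Perl's =~ operator and $& variable, which the slides have not introduced.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text a pattern matched, or a marker
# string when there was no match at all.
sub matched_text {
    my ($regex, $string) = @_;
    return $string =~ $regex ? $& : '(no match)';
}

# Grouped: the whole "es" is optional, so a lone "y" is enough.
print matched_text(qr/y(es)?/, 'y'),   "\n";    # prints "y"
print matched_text(qr/y(es)?/, 'yes'), "\n";    # prints "yes"

# Ungrouped: only the "s" is optional, so "ye" is required.
print matched_text(qr/yes?/, 'y'),  "\n";       # prints "(no match)"
print matched_text(qr/yes?/, 'ye'), "\n";       # prints "ye"
```
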
12
Metacharacters: Alternation
  • Alternation is a fancy regex term for 'or'
  • The vertical bar ( | ) placed between two
    patterns means match the left hand side or the
    right hand side
  • /text|pdf|html/ means match the string text, or
    the string pdf, or the string html
  • From our previous example:

if( /(Y|y)(es|eah|ah)?/ )

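
Alternation can be tried out with grep, which keeps the list items for which the match is true. This is a sketch only: the vertical bars in both patterns are reconstructed from the garbled transcript, and the sample inputs are made up.

```perl
use strict;
use warnings;

# Keep only the strings that contain one of the three alternatives.
my @types = ('text', 'pdf', 'html', 'doc');
my @known = grep { /text|pdf|html/ } @types;
print "@known\n";    # prints "text pdf html"

# Alternation inside groups: Y or y, optionally followed by es, eah or ah.
my @answers = ('yes', 'Yeah', 'y', 'Yah', 'no');
my @go = grep { /(Y|y)(es|eah|ah)?/ } @answers;
print "@go\n";       # prints "yes Yeah y Yah"
```
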