Regular Expressions - PowerPoint PPT Presentation

Slides: 46
Provided by: andrew184
Transcript and Presenter's Notes

1
Software Tools
  • Regular Expressions

2
What is a Regular Expression?
  • A regular expression is a pattern to be matched
    against a string. For example, the pattern Bill.
  • Matching either succeeds or fails.
  • Sometimes you may want to replace a matched
    pattern with another string.
  • Regular expressions are used by many other Unix
    commands and programs, such as grep, sed, awk,
    vi, emacs, and even some shells.

3
Simple Uses of Regular Expressions
  • If we are looking for all the lines in a file
    that contain the string Shakespeare, we could use
    the grep command:
  • grep Shakespeare movie > result
  • Here, Shakespeare is the regular expression that
    grep looks for in the file movie.
  • Lines that match are redirected to result.

4
Simple Uses of Regular Expressions
  • In Perl, we can make Shakespeare a regular
    expression by enclosing it in slashes
  • if (/Shakespeare/) {
  •     print $_;
  • }
  • What is tested in the if-statement?
  • Answer: $_.
  • When a regular expression is enclosed in
    slashes, $_ is tested against the regular
    expression, returning true if there is a match,
    false otherwise.

5
Simple Uses of Regular Expressions
  • if (/Shakespeare/) {
  •     print $_;
  • }
  • The previous example tests only one line, and
    prints out the line if it contains Shakespeare.
  • To work on all lines, add a loop:
  • while (<>) {
  •     if (/Shakespeare/) {
  •         print;
  •     }
  • }

6
Simple Uses of Regular Expressions
  • What if we are not sure how to spell Shakespeare?
  • Certainly the first part is easy (Shak), and there
    must be an r near the end.
  • How can we express our idea?
  • grep: grep "Shak.*r" movie > result
  • Perl: while (<>) {
  •           if (/Shak.*r/) {
  •               print;
  •           }
  •       }
  • .* means zero or more of any character.

7
Simple Uses of Regular Expressions
  • grep: grep "Shak.*r" movie > result
  • The double quotes in this grep example are needed
    to prevent the shell from expanding * as a
    filename wildcard.
  • Since Shakespeare ends in e, shouldn't it be
  • Shak.*r.* ?
  • Answer: No need. Any character can come before
    or after the pattern.
  • Shak.*r is the same as .*Shak.*r.*
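
The claim that an unanchored pattern behaves as if it were padded with .* on both sides can be checked with a short Perl sketch. The sample lines are assumptions for illustration (one is a made-up misspelling), and grep here is Perl's built-in list filter, not the Unix command.

```perl
use strict;
use warnings;

# Sample lines; "Shakspere in Love" is a made-up misspelling for illustration.
my @lines = ("Titanic", "Shakespeare in Love", "Shakspere in Love");

# An unanchored pattern may match anywhere in the line...
my @short  = grep { /Shak.*r/ } @lines;
# ...so padding it with .* on both sides selects exactly the same lines.
my @padded = grep { /.*Shak.*r.*/ } @lines;

print "$_\n" for @short;
print "Both patterns select the same lines\n" if "@short" eq "@padded";
```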

8
Substitution
  • Another simple use of regular expressions is the
    substitute operator.
  • It replaces part of a string that matches the
    regular expression with another string.
  • s/Shakespeare/Bill Gates/
  • $_ is matched against the regular expression
    (Shakespeare).
  • If the match is successful, the part of the
    string that matched is discarded and replaced by
    the replacement string (Bill Gates).
  • If the match is unsuccessful, nothing happens.

9
Substitution
  • The program:
  • $ cat movie
  • Titanic
  • Saving Private Ryan
  • Shakespeare in Love
  • Life is Beautiful
  • $ cat sub1
  • #!/usr/local/bin/perl5 -w
  • while (<>) {
  •     if (/Shakespeare/) {
  •         s/Shakespeare/Bill Gates/;
  •         print;
  •     }
  • }
  • $ sub1 movie
  • Bill Gates in Love

10
Substitution
  • An even shorter way to write it:
  • $ cat sub2
  • #!/usr/local/bin/perl5 -w
  • while (<>) {
  •     if (s/Shakespeare/Bill Gates/) {
  •         print;
  •     }
  • }
  • $ sub2 movie
  • Bill Gates in Love
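
The shorter form works because s/// returns true only when it actually made a replacement. A minimal sketch of the same idea without reading a file, using the movie titles from the earlier slide as an in-memory list (the array names are assumptions):

```perl
use strict;
use warnings;

my @movies = ("Titanic", "Saving Private Ryan",
              "Shakespeare in Love", "Life is Beautiful");
my @changed;

for (@movies) {    # each title is aliased to $_ in turn
    # s/// returns true only when it actually replaced something,
    # so no separate /Shakespeare/ test is needed.
    if (s/Shakespeare/Bill Gates/) {
        push @changed, $_;
    }
}

print "$_\n" for @changed;    # Bill Gates in Love
```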

11
Patterns
  • A regular expression is a pattern.
  • Some parts of the pattern match a single
    character (a).
  • Other parts of the pattern match multiple
    characters (.*).

12
Single-Character Patterns
  • The dot . matches any single character except
    the newline (\n).
  • For example, the pattern /a./ matches any
    two-character sequence that starts with a, except
    a\n.
  • Use \. if you really want to match the period.
  • $ cat test
  • hi
  • hi bob.
  • $ cat sub3
  • #!/usr/local/bin/perl5 -w
  • while (<>) {
  •     if (/\./) { print; }
  • }
  • $ sub3 test
  • hi bob.

13
Single-Character Groups
  • If you want to specify one out of a group of
    characters to match, use [ ]:
  • /[abcde]/
  • This matches a string containing any one of the
    first 5 lowercase letters, while
  • /[aeiouAEIOU]/
  • matches any of the 5 vowels in either upper or
    lower case.

14
Single-Character Groups
  • If you want ] in the group, put a backslash
    before it, or put it as the first character in
    the list:
  • /[abcde]]/   matches [abcde] followed by ]
  • /[abcde\]]/  okay
  • /[]abcde]/   also okay
  • Use - for ranges of characters (like a through
    z):
  • /[0123456789]/  any single digit
  • /[0-9]/         same
  • If you want - in the list, put a backslash before
    it, or put it at the beginning/end:
  • /[X-Z]/   matches X, Y, Z
  • /[X\-Z]/  matches X, -, Z
  • /[XZ-]/   matches X, Z, -
  • /[-XZ]/   matches -, X, Z

15
Single-Character Groups
  • More range examples
  • /[0-9\-]/       match 0-9, or minus
  • /[0-9a-z]/      match any digit or lowercase
    letter
  • /[a-zA-Z0-9_]/  match any letter, digit,
    underscore
  • There is also a negated character group, which
    starts with a ^ immediately after the left
    bracket. This matches any single character not in
    the list.
  • /[^0123456789]/  match any single non-digit
  • /[^0-9]/         same
  • /[^aeiouAEIOU]/  match any single non-vowel
  • /[^\^]/          match any single character except ^
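
The character groups above can be tried out by collecting every character that a group matches. This is only a sketch: the sample string is made up, and it relies on a match with the g option in list context returning all matches, which these slides have not introduced.

```perl
use strict;
use warnings;

my $text = "Room 42-B, floor 3";    # made-up sample string

# A /g match in list context returns every match, one per element.
my @digits     = $text =~ /[0-9]/g;      # any single digit
my @non_digits = $text =~ /[^0-9]/g;     # any single non-digit
my @dash_digit = $text =~ /[0-9\-]/g;    # a digit or a minus sign

print scalar(@digits), " digits, ", scalar(@non_digits), " non-digits\n";
print "digit or minus: @dash_digit\n";
```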

16
Single-Character Groups
  • For convenience, some common character groups are
    predefined:
  • Predefined        Group           Negated          Negated Group
  • \d (a digit)      [0-9]           \D (non-digit)   [^0-9]
  • \w (word char)    [a-zA-Z0-9_]    \W (non-word)    [^a-zA-Z0-9_]
  • \s (space char)   [ \t\n\r\f]     \S (non-space)   [^ \t\n\r\f]
  • \d matches any digit
  • \w matches any letter, digit, underscore
  • \s matches any space, tab, newline, carriage
    return, or form feed
  • You can use these predefined groups in other
    groups:
  • /[\da-fA-F]/  match any hexadecimal digit
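
As a small illustration of a predefined group used inside another group, the sketch below filters the characters of a made-up string down to its hexadecimal digits. The helper name is_hex_digit is hypothetical.

```perl
use strict;
use warnings;

# is_hex_digit is a hypothetical helper name, not part of the slides.
sub is_hex_digit {
    my ($char) = @_;
    # \d inside the brackets contributes the digits 0-9 to the group.
    return $char =~ /[\da-fA-F]/ ? 1 : 0;
}

# split(//, ...) breaks the string into single characters.
my @hex = grep { is_hex_digit($_) } split(//, "7fG-c");
print "@hex\n";    # 7 f c
```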

17
Multipliers
  • Multipliers allow you to say "one or more of
    these" or "up to four of these."
  • * means zero or more of the immediately previous
    character (or character group).
  • + means one or more of the immediately previous
    character (or character group).
  • ? means zero or one of the immediately previous
    character (or character group).

18
Multipliers
  • Example
  • /Ga+te?s/
  • matches a G followed by one or more a's followed
    by t, followed by an optional e, followed by s.
  • *, +, and ? are greedy, and will match as many
    characters as possible:
  • $_ = "Bill xxxxxxxxx Gates";
  • s/x+/Cheap/;
  • gives Bill Cheap Gates
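
To see what greediness buys here, the following sketch compares the greedy x+ with a bare x on the same string. The variable names are assumptions, and the copy-then-substitute idiom is used so the original string stays intact.

```perl
use strict;
use warnings;

my $text = "Bill " . ("x" x 9) . " Gates";    # nine x's, as on the slide

# Greedy x+ swallows the whole run of x's in one match.
(my $greedy = $text) =~ s/x+/Cheap/;
# A bare x matches a single character, so only the first x is
# replaced and the other eight remain.
(my $single = $text) =~ s/x/Cheap/;

print "$greedy\n";    # Bill Cheap Gates
print "$single\n";
```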

19
General Multiplier
  • How do you say five to ten x's?
  • /xxxxxx?x?x?x?x?/  works, but ugly
  • /x{5,10}/          nicer
  • How do you say five or more x's?
  • /x{5,}/
  • How do you say exactly five x's?
  • /x{5}/
  • How do you say up to five x's?
  • /x{0,5}/

20
General Multiplier
  • How do you say c followed by any 5 characters
    (which can be different) and ending with d?
  • /c.{5}d/
  • * is the same as {0,}
  • + is the same as {1,}
  • ? is the same as {0,1}
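
A sketch of the general multiplier in use. One assumption to note: the pattern is wrapped in the ^ and $ anchors (covered on a later slide) so that it must account for the whole string, because an unanchored /x{5,10}/ would also succeed on a longer run by matching part of it. The helper name five_to_ten is hypothetical.

```perl
use strict;
use warnings;

# five_to_ten is a hypothetical helper; ^ and $ pin the pattern to the
# whole string so that longer runs are not accepted.
sub five_to_ten {
    my ($s) = @_;
    return $s =~ /^x{5,10}$/ ? 1 : 0;
}

for my $n (4, 5, 10, 11) {
    my $run = "x" x $n;
    printf "%2d x's: %s\n", $n, five_to_ten($run) ? "match" : "no match";
}
```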

21
Pattern Memory
  • How would we match a pattern that starts and ends
    with the same letter or word?
  • For this, we need to remember the pattern.
  • Use ( ) around any pattern to put that part of
    the string into memory (it has no effect on the
    pattern itself).
  • To recall memory, include a backslash followed by
    an integer.
  • /Bill(.)Gates\1/

22
Pattern Memory
  • Example
  • /Bill(.)Gates\1/
  • This example matches a string starting with
    Bill, followed by any single non-newline
    character, followed by Gates, followed by that
    same single character.
  • So, it matches
  • Bill!Gates! Bill-Gates-
  • but not
  • Bill?Gates! Bill-Gates_
  • (Note that /Bill.Gates./ would match all four)

23
Pattern Memory
  • More examples
  • /a(.)b(.)c\2d\1/
  • This example matches a string starting with a, a
    character (#1), followed by b, another single
    character (#2), c, the character #2, d, and the
    character #1.
  • So it matches a-b!c!d-.

24
Pattern Memory
  • The reference part can have more than a single
    character.
  • For example
  • /a(.*)b\1c/
  • This example matches an a, followed by any number
    of characters (even zero), followed by b,
    followed by the same sequence of characters,
    followed by c.
  • So it matches aBillbBillc and abc, but not
    aBillbBillGatesc.
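
The pattern-memory examples can be checked by filtering the candidate strings from the slides through each pattern. This is a minimal sketch; the array names are assumptions, and the second pattern is written with (.*) so that the remembered part can be any length.

```perl
use strict;
use warnings;

# \1 must repeat exactly what the first pair of parentheses captured.
my @same_char = grep { /Bill(.)Gates\1/ }
    ("Bill!Gates!", "Bill-Gates-", "Bill?Gates!", "Bill-Gates_");

# With (.*) the remembered part can be any length, including empty.
my @same_run = grep { /a(.*)b\1c/ }
    ("aBillbBillc", "abc", "aBillbBillGatesc");

print "@same_char\n";    # Bill!Gates! Bill-Gates-
print "@same_run\n";     # aBillbBillc abc
```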

25
Alternation
  • How about picking from a set of alternatives when
    there is more than one character in the patterns?
  • The following example matches either Gates or
    Clinton or Shakespeare:
  • /Gates|Clinton|Shakespeare/
  • For single character alternatives,
  • /a|b|c/
  • is the same as
  • /[abc]/.
26
Anchoring Patterns
  • Anchors require that the pattern be at the
    beginning or end of the line.
  • ^ matches the beginning of the line (only if ^ is
    the first character of the pattern)
  • /^Bill/   match lines that begin with Bill
  • /^Gates/  match lines that begin with Gates
  • /Bill\^/  match lines containing Bill^
    somewhere
  • /\^/      match lines containing ^
  • $ matches the end of the line (only if $ is the
    last character of the pattern)
  • /Bill$/   match lines that end with Bill
  • /Gates$/  match lines that end with Gates
  • /$Bill/   match with contents of scalar $Bill
  • /\$/      match lines containing $
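
A short sketch of the anchors and their escaped forms, run over a few made-up lines (the sample data and array names are assumptions for illustration):

```perl
use strict;
use warnings;

# Made-up sample lines; single quotes keep $ and ^ literal in the strings.
my @lines = ('Bill Gates', 'Gates, Bill', 'Price: $5', '2^10');

my @starts = grep { /^Bill/ } @lines;    # lines that begin with Bill
my @ends   = grep { /Bill$/ } @lines;    # lines that end with Bill
my @dollar = grep { /\$/ }    @lines;    # lines containing a literal $
my @caret  = grep { /\^/ }    @lines;    # lines containing a literal ^

print "begin with Bill: @starts\n";
print "end with Bill:   @ends\n";
print "contain dollar:  @dollar\n";
print "contain caret:   @caret\n";
```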

27
Precedence
  • So what happens with the pattern a|b*
  • Is this (a|b)* or
  • a|(b*) ?
  • Precedence of patterns from highest to lowest:
  • Name                    Representation
  • Parentheses             ( )
  • Multipliers             ? + * {m,n}
  • Sequence and anchoring  abc ^ $
  • Alternation             |
  • By the table, * has higher precedence than |, so
    it is interpreted as a|(b*).

28
Precedence
  • What if we want the other interpretation in the
    previous example?
  • Answer: Simple, just use parentheses: (a|b)*
  • Use parentheses in ambiguous cases to improve
    clarity, even if not strictly needed.
  • When you use parentheses for precedence, they
    also go into memory (\1, \2, \3).

29
Precedence
  • More precedence examples
  • abc*     matches ab, abc, abcc, abccc, ...
  • (abc)*   matches "", abc, abcabc, abcabcabc, ...
  • ^a|b     matches a at beginning of line, or b
    anywhere
  • ^(a|b)   matches either a or b at the beginning
    of line
  • a|bc|d       a, or bc, or d
  • (a|b)(c|d)   ac, ad, bc, or bd
  • (Bill Gates)|(Bill Clinton)   Bill Gates, Bill
    Clinton
  • Bill (Gates|Clinton)   Bill Gates, Bill Clinton
  • (Mr\. Bill)|(Bill (Gates|Clinton))
  • Mr. Bill, Bill Gates, Bill Clinton
  • (Mr\. )?Bill( Gates| Clinton)?
  • Bill, Mr. Bill, Bill Gates, Bill Clinton,
  • Mr. Bill Gates, Mr. Bill Clinton
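
The difference between ^a|b and ^(a|b) is easy to miss, so here is a small sketch that runs both over the same made-up inputs (the words and array names are assumptions):

```perl
use strict;
use warnings;

my @inputs = ("apple", "cab", "xyz");    # made-up sample words

# ^ binds tighter than |, so this means: a at the start, OR b anywhere.
my @loose = grep { /^a|b/ }   @inputs;
# The parentheses make ^ apply to both alternatives.
my @tight = grep { /^(a|b)/ } @inputs;

print "without parentheses: @loose\n";    # apple cab
print "with parentheses:    @tight\n";    # apple
```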

30
  • What if you want to match a different variable
    than $_?
  • Answer: Use =~.
  • Examples:
  • $name = "Bill Shakespeare";
  • $name =~ /Bill/;     # true
  • $name =~ /(.)\1/;    # also true (matches ll)
  • if ($name =~ /(.)\1/) {
  •     print "$name\n";
  • }

31
  • An example using =~ to match <STDIN>:
  • $ cat match1
  • #!/usr/local/bin/perl5 -w
  • print "Quit (y/n)? ";
  • if (<STDIN> =~ /[yY]/) {
  •     print "Quitting\n";
  •     exit;
  • }
  • print "Continuing\n";
  • $ match1
  • Quit (y/n)? y
  • Quitting

32
  • Another example using =~ to match <STDIN>:
  • $ cat match2
  • #!/usr/local/bin/perl5 -w
  • print "Wakeup (y/n)? ";
  • while (<STDIN> =~ /[nN]/) {
  •     print "Sleeping\n";
  •     print "Wakeup (y/n)? ";
  • }
  • $ match2
  • Wakeup (y/n)? n
  • Sleeping
  • Wakeup (y/n)? N
  • Sleeping
  • Wakeup (y/n)? y

33
Ignoring Case
  • In the previous examples, we used [yY] and [nN]
    to match either upper or lower case.
  • Perl has an ignore case option for pattern
    matching: /somepattern/i
  • $ cat match1a
  • #!/usr/local/bin/perl5 -w
  • print "Quit (y/n)? ";
  • if (<STDIN> =~ /y/i) {
  •     print "Quitting\n";
  •     exit;
  • }
  • print "Continuing\n";
  • $ match1a
  • Quit (y/n)? Y
  • Quitting

34
Slash and Backslash
  • If your pattern has a slash character (/), you
    must precede each with a backslash (\)
  • $ cat slash1
  • #!/usr/local/bin/perl5 -w
  • print "Enter path: ";
  • $path = <STDIN>;
  • if ($path =~ /\/usr\/local\/bin/) {
  •     print "Path is /usr/local/bin\n";
  • }
  • $ slash1
  • Enter path: /usr/local/bin
  • Path is /usr/local/bin

35
Different Pattern Delimiters
  • If your pattern has lots of slash characters (/),
    you can also use a different pattern delimiter
    with the form m#somepattern#
  • The # can be any non-alphanumeric character.
  • $ cat slash1a
  • #!/usr/local/bin/perl5 -w
  • print "Enter path: ";
  • $path = <STDIN>;
  • if ($path =~ m#/usr/local/bin#) {
  • # if ($path =~ m@/usr/local/bin@) also works
  •     print "Path is /usr/local/bin\n";
  • }
  • $ slash1a
  • Enter path: /usr/local/bin
  • Path is /usr/local/bin

36
Special Read-Only Variables
  • After a successful pattern match, the variables
    $1, $2, $3, ... are set to the same values as \1,
    \2, \3, ...
  • You can use $1, $2, $3, ... later in your program.
  • $ cat read1
  • #!/usr/local/bin/perl5 -w
  • $_ = "Bill Shakespeare in Love";
  • /(\w+)\W+(\w+)/;    # match first two words
  • # $1 is now "Bill" and $2 is now "Shakespeare"
  • print "The first name of $2 is $1\n";
  • $ read1
  • The first name of Shakespeare is Bill

37
Special Read-Only Variables
  • You can also get the values of $1, $2, $3, ... by
    placing the match in a list context:
  • $ cat read2
  • #!/usr/local/bin/perl5 -w
  • $_ = "Bill Shakespeare in Love";
  • ($first, $last) = /(\w+)\W+(\w+)/;
  • print "The first name of $last is $first\n";
  • $ read2
  • The first name of Shakespeare is Bill
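
One practical point about the list-context form is that a failed match returns the empty list, which makes failure easy to detect. A minimal sketch, assuming a hypothetical helper named first_two_words:

```perl
use strict;
use warnings;

# first_two_words is a hypothetical helper name for illustration.
sub first_two_words {
    my ($text) = @_;
    # In list context a successful match returns the captured groups;
    # a failed match returns the empty list.
    my @words = $text =~ /(\w+)\W+(\w+)/;
    return @words;
}

my ($first, $last) = first_two_words("Bill Shakespeare in Love");
my @none           = first_two_words("Titanic");    # only one word

print "The first name of $last is $first\n";
print "No match for a one-word title\n" unless @none;
```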

38
Special Read-Only Variables
  • Other read-only variables
  • $& is the part of the string that matched the
    pattern.
  • $` is the part of the string before the match
  • $' is the part of the string after the match
  • $ cat read3
  • #!/usr/local/bin/perl5 -w
  • $_ = "Bill Shakespeare in Love";
  • / in /;
  • print "Before: $`\n";
  • print "Match: $&\n";
  • print "After: $'\n";
  • $ read3
  • Before: Bill Shakespeare
  • Match:  in
  • After: Love
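
Because the matched text in this example begins and ends with a space, the boundaries are hard to see in plain output. The sketch below copies the three read-only variables into ordinary variables (names are assumptions) and prints them inside brackets.

```perl
use strict;
use warnings;

my $text = "Bill Shakespeare in Love";
my ($before, $match, $after) = ("", "", "");

if ($text =~ / in /) {
    # Copy the read-only match variables right after the match succeeds.
    ($before, $match, $after) = ($`, $&, $');
}

# Brackets make the spaces around "in" visible.
print "Before: [$before]\n";
print "Match:  [$match]\n";
print "After:  [$after]\n";
```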

39
More on Substitution
  • If you want to replace all matches instead of
    just the first match, use the g option for
    substitution:
  • $ cat sub3
  • #!/usr/local/bin/perl5 -w
  • $_ = "Bill Shakespeare in love with Bill Gates";
  • s/Bill/William/;
  • print "Sub1: $_\n";
  • $_ = "Bill Shakespeare in love with Bill Gates";
  • s/Bill/William/g;
  • print "Sub2: $_\n";
  • $ sub3
  • Sub1: William Shakespeare in love with Bill
    Gates
  • Sub2: William Shakespeare in love with William
    Gates
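
In addition to changing the string, a substitution with the g option returns the number of replacements it made, which can be handy for reporting. A small sketch with assumed variable names:

```perl
use strict;
use warnings;

my $text = "Bill Shakespeare in love with Bill Gates";

# Without g only the first Bill is replaced.
(my $first_only = $text) =~ s/Bill/William/;

# With g every Bill is replaced; the return value is the count.
my $all   = $text;
my $count = ($all =~ s/Bill/William/g);

print "$first_only\n";
print "$all ($count replacements)\n";
```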

40
More on Substitution
  • You can use variable interpolation in
    substitutions:
  • $ cat sub4
  • #!/usr/local/bin/perl5 -w
  • $find = "Bill";
  • $replace = "William";
  • $_ = "Bill Shakespeare in love with Bill Gates";
  • s/$find/$replace/g;
  • print "$_\n";
  • $ sub4
  • William Shakespeare in love with William Gates

41
More on Substitution
  • Pattern characters in the regular expression
    allow patterns to be matched, not just fixed
    characters:
  • $ cat sub5
  • #!/usr/local/bin/perl5 -w
  • $_ = "Bill Shakespeare in love with Bill Gates";
  • s/(\w+)/<$1>/g;
  • print "$_\n";
  • $ sub5
  • <Bill> <Shakespeare> <in> <love> <with> <Bill> <Gates>

42
More on Substitution
  • Substitution also allows you to:
  • ignore case
  • use alternate delimiters
  • use =~
  • $ cat sub6
  • #!/usr/local/bin/perl5 -w
  • $line = "Bill Shakespeare in love with bill Gates";
  • $line =~ s#bill#William#gi;
  • $line =~ s@Shakespeare@Gates@gi;
  • print "$line\n";
  • $ sub6
  • William Gates in love with William Gates

43
split
  • The split function allows you to break a string
    into fields.
  • split takes a regular expression and a string,
    and breaks up the line wherever the pattern
    occurs.
  • $ cat split1
  • #!/usr/local/bin/perl5 -w
  • $line = "Bill Shakespeare in love with Bill Gates";
  • @fields = split(/ /, $line);
  • # split $line using space as delimiter
  • print "$fields[0] $fields[3] $fields[6]\n";
  • $ split1
  • Bill love Gates

44
split
  • You can use $_ with split.
  • By default, split breaks the string on runs of
    whitespace.
  • $ cat split2
  • #!/usr/local/bin/perl5 -w
  • $_ = "Bill Shakespeare in love with Bill Gates";
  • @fields = split;
  • # split $_ using whitespace (default) as delimiter
  • print "$fields[0] $fields[3] $fields[6]\n";
  • $ split2
  • Bill love Gates
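
The two split examples give the same result on that input, but they differ when words are separated by more than one space. A sketch of the difference, using a made-up line with a doubled space:

```perl
use strict;
use warnings;

my $line = "Bill  Shakespeare in love";    # note the two spaces after Bill

# A single-space pattern splits at every space, so the doubled space
# yields an empty field.
my @by_space = split(/ /, $line);

# With no arguments, split works on $_ and treats any run of
# whitespace as one delimiter.
$_ = $line;
my @by_default = split;

print scalar(@by_space),   " fields with / /\n";      # 5
print scalar(@by_default), " fields by default\n";    # 4
```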

45
join
  • The join function allows you to glue strings in a
    list together.
  • $ cat join1
  • #!/usr/local/bin/perl5 -w
  • @list = qw(Bill Shakespeare dislikes Bill Gates);
  • $line = join(" ", @list);
  • print "$line\n";
  • $ join1
  • Bill Shakespeare dislikes Bill Gates
  • Note that the glue string is not a regular
    expression, just a normal string.
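
To illustrate that the glue is an ordinary string, the sketch below joins the same list with a comma and with ".*" (which is inserted literally, not interpreted as a pattern), then uses split to undo the first join. The variable names are assumptions.

```perl
use strict;
use warnings;

my @list = qw(Bill Shakespeare dislikes Bill Gates);

my $csv  = join(", ", @list);    # glue is the two characters ", "
my $odd  = join(".*", @list);    # ".*" is plain text here, not a pattern
my @back = split(/, /, $csv);    # split reverses the first join

print "$csv\n";
print "$odd\n";
print "round trip ok\n" if "@back" eq "@list";
```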