Using regular expressions

Provided by: usersDrew2
Learn more at: https://users.drew.edu

1
Using regular expressions
  • Search for a single occurrence of a specific
    string.
  • Search for all occurrences of a string.
  • Approximate string matching.

2
Forming RegExps
  • Strings
  • Variables
  • Patterns

3
Strings and Variables
  • /Joey Ramone/ - match a specific string.
  • /$name/, where $name = "Joey Ramone" - match the
    string stored in a variable.
  • /Joey $name/ - match a pattern defined by a
    mixture of strings and variables.
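
The three cases above can be sketched in Python's re module (the slides use Perl's /.../ syntax; Python is used here only for illustration). The sample text and the variable names are assumptions. Unlike Perl interpolation, the variable is passed through re.escape() so that any metacharacters in it are treated literally.

```python
import re

text = "Hey ho, Joey Ramone!"   # sample text, assumed for illustration
name = "Joey Ramone"
last = "Ramone"

# A literal pattern: matches the exact text "Joey Ramone".
literal = re.search(r"Joey Ramone", text)

# A pattern taken from a variable. re.escape() protects any regex
# metacharacters the variable might contain.
from_variable = re.search(re.escape(name), text)

# A mixture of literal text and a variable.
mixed = re.search(r"Joey " + re.escape(last), text)

print(literal.group(), "|", from_variable.group(), "|", mixed.group())
```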

4
Character classes
  • abc match abc
  • . match any single character (e.g. a.b).
  • [abc] match a or b or c
  • [0123456789] match 0 or 1 or ... or 9
  • [0-9] same as previous
  • [a-z] match a or b or ... or z
  • [A-Z] same as previous only with caps
  • [...] match any single occurrence of any of the
    characters found within the brackets.
  • [0-9a-zA-Z-] match any alphanumeric or the
    minus sign
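
A short Python sketch of the classes above (the slides target Perl; the bracket syntax is the same for these cases). The sample strings are made up for illustration.

```python
import re

# "." stands for exactly one character, so "ab" alone does not match a.b
dot = re.findall(r"a.b", "acb a-b ab")

# A bracketed class matches one character from the set.
one_of = re.findall(r"[abc]", "cabbage")
digits = re.findall(r"[0-9]", "Room 101")

# A "-" placed last in the class is a literal minus sign, not a range.
alnum_or_minus = re.findall(r"[0-9a-zA-Z-]", "x-1 !?")

print(dot)             # ['acb', 'a-b']
print(one_of)          # ['c', 'a', 'b', 'b', 'a']
print(digits)          # ['1', '0', '1']
print(alnum_or_minus)  # ['x', '-', '1']
```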

5
Negated character classes
  • [^0-9] match any single character that is not a
    numeric digit
  • [^aeiouAEIOU] match any single character that
    is not a vowel
  • Works only for single characters
  • We'll discuss matching negated strings of
    characters later.
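
A minimal Python sketch of negated classes, with assumed sample strings. Note that "not a vowel" is broader than "consonant": spaces and digits qualify too.

```python
import re

# Every character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2")

# Every character that is not a vowel, including the space and the digit.
non_vowels = re.findall(r"[^aeiouAEIOU]", "Audio 1")

print(non_digits)  # ['a', 'b']
print(non_vowels)  # ['d', ' ', '1']
```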

6
Escape characters
  • \ - use the backslash to match any special
    character as the character itself.
  • /\$name/ - match the literal string $name.
  • /a\.b/ - match the literal string a.b rather
    than a followed by any character, followed by
    b.
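
The effect of the backslash can be checked with a small Python sketch (sample strings assumed). In Python "$" is the end-of-line anchor rather than a variable sigil, but it is escaped the same way.

```python
import re

sample = "a.b axb"

# Unescaped: "." matches any character, so both words match.
unescaped = re.findall(r"a.b", sample)

# Escaped: only the literal "a.b" matches.
escaped = re.findall(r"a\.b", sample)

# An escaped "$" matches a literal dollar sign.
dollar = re.search(r"\$name", "print $name;")

print(unescaped)       # ['a.b', 'axb']
print(escaped)         # ['a.b']
print(dollar.group())  # $name
```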

7
Convenience character classes
  • \d (a digit) - [0-9]
  • \D (digits, not!) - [^0-9]
  • \w (word char) - [a-zA-Z0-9_]
  • \W (words, not!) - [^a-zA-Z0-9_]
  • \s (space char) - [ \r\t\n\f]
  • \S (space, not!) - [^ \r\t\n\f]
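
A sketch of the shorthand classes in Python, using an invented sample string. By default Python's \d, \w and \s are Unicode-aware (and \s also covers vertical tab), so re.ASCII is passed here to stay close to the definitions listed above.

```python
import re

text = "Track 07:\tBlitzkrieg_Bop\n"   # sample text, assumed

digits = re.findall(r"\d", text, flags=re.ASCII)
words = re.findall(r"\w+", text, flags=re.ASCII)
spaces = re.findall(r"\s", text, flags=re.ASCII)
non_word = re.findall(r"\W", text, flags=re.ASCII)

print(digits)    # ['0', '7']
print(words)     # ['Track', '07', 'Blitzkrieg_Bop']
print(spaces)    # [' ', '\t', '\n']
print(non_word)  # [' ', ':', '\t', '\n']
```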

8
Sequences
  • + - one or more of preceding pattern
  • /[a-zA-Z]+/ (match a string of alpha characters
    such as a name).
  • ? (match zero or one instance of preceding
    character).
  • /[a-zA-Z]+-?[a-zA-Z]+/ (Now we can match
    hyphenated names).
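
A minimal Python sketch of + and ? applied to names. The pattern is the hyphenated-name pattern as read from the slide; the test names are assumptions. fullmatch() is used so the whole string has to fit the pattern.

```python
import re

name = re.compile(r"[a-zA-Z]+-?[a-zA-Z]+")

plain = name.fullmatch("Lennon") is not None
hyphenated = name.fullmatch("Smith-Jones") is not None
# "?" allows at most one hyphen, so a double hyphen is rejected.
double_hyphen = name.fullmatch("Smith--Jones") is not None

print(plain, hyphenated, double_hyphen)  # True True False
```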

9
Sequences
  • * (match zero or more of preceding pattern)
  • Example list of names:
  • George Harrison
  • Paul McCartney
  • Richard Ringo Starkey
  • John Winston Lennon
  • /[a-zA-Z]+ [a-zA-Z]+/ (match first and last name)
  • /[a-zA-Z]+ [a-zA-Z]*\ ?[a-zA-Z]+/ (match first
    name, middle name, if it exists, and last name)
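
The optional-middle-name pattern can be tried against the list above with a short Python sketch. The pattern is a reconstruction of the slide's expression, and "Madonna" is an added counterexample.

```python
import re

# first name, optional middle name, last name
full_name = re.compile(r"[a-zA-Z]+ [a-zA-Z]* ?[a-zA-Z]+")

names = [
    "George Harrison",
    "Paul McCartney",
    "Richard Ringo Starkey",
    "John Winston Lennon",
]

matched = [n for n in names if full_name.fullmatch(n)]
# A single word has no space, so it does not match.
single_word = full_name.fullmatch("Madonna") is not None

print(len(matched), single_word)  # 4 False
```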

10
Sequences
  • {k} match k instances of preceding pattern.
  • Example: floating point numbers to 2 decimal
    places
  • /[0-9]+\.[0-9]{2}/
  • {k,j} match at least k instances of preceding
    pattern, but no more than j.
  • Example: floating point numbers that may or may
    not have a decimal component.
  • /[0-9]+\.?[0-9]{0,2}/
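
A Python sketch of the two counted patterns above, with assumed inputs. It also shows a side effect of making the decimal point optional: a bare trailing point such as "42." is accepted.

```python
import re

two_places = re.compile(r"[0-9]+\.[0-9]{2}")
optional_decimals = re.compile(r"[0-9]+\.?[0-9]{0,2}")

def accepts(pattern, text):
    # True when the whole string fits the pattern.
    return pattern.fullmatch(text) is not None

exact = [accepts(two_places, s) for s in ["3.14", "3.1", "3.141"]]
loose = [accepts(optional_decimals, s) for s in ["42", "42.", "42.5", "42.555"]]

print(exact)  # [True, False, False]
print(loose)  # [True, True, True, False]
```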

11
Grouping
  • /(John|Paul|George|Ringo)/ matches any one of
    either John, Paul, George, or Ringo
  • /((John|Paul|George|Ringo) )+/
  • Matches the Beatles names listed in any order.
  • John Paul George Ringo
  • Paul George John Ringo
  • Ringo Paul George John
  • Actually, this will also match
  • Paul Paul Paul Paul Paul Paul Paul Paul Paul
  • Be careful about what assumptions you make.
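
The warning above can be reproduced in Python. The regex is the grouped pattern as reconstructed from the slide; is_full_lineup() is a hypothetical stricter check, not part of the slides, that requires each name exactly once.

```python
import re

# One or more "name followed by a space" units.
beatles = re.compile(r"((John|Paul|George|Ringo) )+")

def matches_pattern(text):
    # A space is appended so the last name also has its trailing space.
    return beatles.fullmatch(text + " ") is not None

def is_full_lineup(text):
    # Hypothetical stricter check: each of the four names exactly once.
    return sorted(text.split()) == ["George", "John", "Paul", "Ringo"]

print(matches_pattern("Ringo Paul George John"))  # True
print(matches_pattern("Paul Paul Paul Paul"))     # True
print(is_full_lineup("Paul Paul Paul Paul"))      # False
```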

12
Problem
  • Write a regular expression that will match a social
    security number.
  • Format: 555-55-5555

13
A solution
  • /[0-9]{3}-[0-9]{2}-[0-9]{4}/
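
A quick Python check of this solution, with made-up inputs. Because the pattern is not anchored, a search also finds it inside a longer run of digits; fullmatch() is one way to require the entire string to fit.

```python
import re

ssn = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

valid = ssn.fullmatch("555-55-5555") is not None
wrong_grouping = ssn.fullmatch("5555-55-555") is not None

# An unanchored search still finds a match inside a longer string.
embedded = ssn.search("id 1555-55-55559")

print(valid, wrong_grouping, embedded.group())  # True False 555-55-5555
```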

14
Problem
  • Write a regular expression that will match a
    phone number.
  • Formats
  • 319-337-3663
  • 319.337.3663

15
A solution
  • /[0-9]{3}[\.-][0-9]{3}[\.-][0-9]{4}/

16
Add another format
  • 3193373663

17
A solution
  • /[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}/
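
A Python sketch that runs this pattern over the three formats, plus two extra inputs added for illustration. Each separator is matched independently, so a mixed form such as 319-337.3663 is also accepted.

```python
import re

phone = re.compile(r"[0-9]{3}[.-]?[0-9]{3}[.-]?[0-9]{4}")

samples = [
    "319-337-3663",
    "319.337.3663",
    "3193373663",
    "319-337.3663",   # mixed separators, still accepted
    "319 337 3663",   # spaces are not allowed by the class
]

results = [phone.fullmatch(s) is not None for s in samples]
print(results)  # [True, True, True, True, False]
```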

18
Problem
  • Write a regular expression that will match an
    email address.
  • Legal characters for names are
  • Letters, numbers, -, and _
  • Legal characters for domain names are
  • Letters only
  • Assume form username@machine.domain.suffix

19
A solution
  • /[a-z0-9-_]+\@[a-z]+(\.[a-z]+){2}/
  • More general version
  • /[a-z0-9-_]+\@[a-z]+(\.[a-z]+)+/

20
Problem
  • Write a regular expression that will match an
    HTML anchor start tag.
  • Assume anchor tag is of the form
  • <a href="some url">some anchor text</a>

21
A solution
  • /<a href="[^"]+">/
  • Actually, quotes are not required
  • So it should be
  • /<a href="?[^>]+"?>/
  • How would we assign the url to a variable?
  • How would we assign the url to a variable?

22
A solution
  • ($url) = ($htmlText =~ m/<a href="?([^">]+)"?>/);
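
The same capture can be sketched in Python, where the parenthesized group is read back with group(1). The HTML snippets are assumptions made for illustration. The class excludes both the quote and > so that a closing quote does not end up inside the captured URL.

```python
import re

anchor = re.compile(r'<a href="?([^">]+)"?>')

quoted = '<p><a href="https://users.drew.edu">Drew</a></p>'
unquoted = "<p><a href=https://users.drew.edu>Drew</a></p>"

# group(1) holds the text matched by the parenthesized part.
url_quoted = anchor.search(quoted).group(1)
url_unquoted = anchor.search(unquoted).group(1)

print(url_quoted)    # https://users.drew.edu
print(url_unquoted)  # https://users.drew.edu
```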

23
Take Away
  • There is almost always a pattern that will match
    what you want it to match.
  • The best way to learn is to simply jump in and
    start writing your own patterns.
  • If you have a question about how to construct
    one, feel free to ask me.
  • One typically learns Perl by asking people with
    more experience.