Regular Expressions

Provided by: ianpat
Learn more at: https://www.csh.rit.edu

1
$address =~ m/(\d+ .*)\n(.*?), ([A-Z]{2}) (\d{5})-?(\d{0,5})/
2
Introduction to Regular Expressions
  • It's all about patterns
  • Character Classes match any text of a certain
    type
  • Repetition operators specify a recurring pattern
  • Search flags change how the RegEx operates
  • In this presentation
  • green denotes a character class
  • yellow denotes a repetition quantifier
  • orange denotes a search flag or other symbol
  • My examples use Perl syntax

3
Introduction to Regular Expressions
  • Basic syntax
  • All RegEx statements must begin and end with /
  • /something/
  • Escaping reserved characters is crucial
  • /(i.e. / is invalid because ( must be closed
  • However, /\(i\.e\. / is valid for finding (i.e.
  • Reserved characters include
  • . * ? + [ ] ( ) { } ^ $ | / \
  • Also some characters have special meanings based
    on their position in the statement
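
The escaping rule can be tried out with a minimal sketch. It uses Python's re module rather than the Perl syntax of these slides, so there are no / delimiters; the sample text and variable names are illustrative assumptions only.

```python
import re

text = "Use a short form (i.e. an abbreviation) where possible."

# An unescaped "(" opens a group that is never closed, so compiling fails.
try:
    re.compile("(i.e. ")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Escaping "(" and "." turns them into literal characters.
match = re.search(r"\(i\.e\. ", text)
found = match.group(0) if match else None

# re.escape() adds the backslashes for a literal string automatically.
escaped = re.escape("(i.e.")
```

Here found holds the literal text "(i.e. " and unescaped_ok stays False.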

4
Regular Expression Matching
  • Text Matching
  • A RegEx can match plain text
  • ex. if ($name =~ /Dan/) { print "match"; }
  • But this will match Dan, Danny, Daniel, etc.
  • Full Text Matching with Anchors
  • Might want to match a whole line (or string)
  • ex. if ($name =~ /^Dan$/) { print "match"; }
  • This will only match Dan
  • ^ anchors to the front of the line
  • $ anchors to the end of the line
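
A small sketch of the difference between an unanchored and an anchored match, written with Python's re module instead of Perl. The list of names is made up for illustration.

```python
import re

names = ["Dan", "Danny", "Daniel", "Jordan"]

# Unanchored: the pattern may match anywhere inside the string.
loose = [n for n in names if re.search(r"Dan", n)]

# Anchored: ^ pins the match to the start and $ pins it to the end.
strict = [n for n in names if re.search(r"^Dan$", n)]
```

loose keeps Dan, Danny and Daniel, while strict keeps only Dan. Jordan is rejected by both because the match is case sensitive.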

5
Regular Expression Matching
  • Order of results
  • The search will begin at the start of the string
  • This can be altered, but don't ask yet
  • Every character is important
  • Any plain text in the expression is treated
    literally
  • Nothing is neglected (close doesn't count)
  • / s/ is not the same as /  s/ (one space versus two)
  • Far easier to write than to debug!

6
Regular Expression Char Classes
  • Allows specification of only certain allowable
    chars
  • [dofZ] matches only the letters d, o, f, and Z
  • If you have a string "dog" then /[dofZ]/ would
    match d only even though o is also in the
    class
  • So this expression can be stated "match one of
    either d, o, f, or Z."
  • [A-Za-z] matches any letter
  • [a-fA-F0-9] matches any hexadecimal character
  • [^*+/\\] matches anything BUT *, +, /, or \
  • The ^ in the front of the char class specifies
    not
  • In a char class, you only need to escape \ ] ^ -
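
The character-class rules can be checked with a short sketch in Python's re module (not the Perl used on the slides). The sample strings are illustrative assumptions.

```python
import re

# One character out of the set d, o, f, Z: the search stops at the first hit.
first = re.search(r"[dofZ]", "dog").group(0)

# Ranges: collect every hexadecimal digit in a string.
hex_chars = re.findall(r"[a-fA-F0-9]", "0xBEEF-zz")

# A leading ^ negates the class: everything that is not a letter.
non_letters = re.findall(r"[^A-Za-z]", "a1 b2!")
```

first is "d" even though "o" is also in the class, because a class matches exactly one character.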

7
Regular Expression Char Classes
  • Special character classes match specific
    characters
  • \d matches a single digit
  • \w matches a word character (A-Z, a-z, 0-9, _)
  • \b matches a word boundary /\bword\b/
  • \s matches a whitespace character (spc, tab,
    newln)
  • . wildcard matches everything except newlines
  • Use very carefully, you could get anything!
  • To match anything but a given class, capitalize
    the char class
  • i.e. \D matches anything that isn't a digit

8
Regular Expression Char Classes
  • Character Class Examples
  • $bodyPart =~ /e\w\w/
  • Matches ear, eye, etc.
  • $thing = "1, 2, 3 strikes!"; $thing =~ /\s\d/
  • Matches " 2" (a space followed by a digit)
  • $thing = "1, 2, 3 strikes!"; $thing =~ /[\s\d]/
  • Matches 1
  • Not always useful to match single characters
  • $phone =~ /\d\d\d-\d\d\d-\d\d\d\d/
  • There's a better way
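
The shorthand classes \s, \d, \D and \b behave the same way in Python's re module, so a rough sketch in Python (not Perl) can illustrate them. The test strings other than the "strikes" sentence are assumptions.

```python
import re

thing = "1, 2, 3 strikes!"

# \s\d needs a whitespace character followed by a digit.
space_digit = re.search(r"\s\d", thing).group(0)

# [\s\d] needs a single character that is whitespace or a digit.
either = re.search(r"[\s\d]", thing).group(0)

# \D is the negation of \d.
non_digits = re.findall(r"\D", "a1b2")

# \b marks a word boundary, so "ear" inside "heart" does not count.
whole_word = re.search(r"\bear\b", "my ear hurts") is not None
inside_word = re.search(r"\bear\b", "heart") is not None
```

space_digit is " 2" while either is "1", which shows how much a pair of square brackets changes the meaning.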

9
Regular Expression Repetition
  • Repetition allows for flexibility
  • Range of occurrences
  • $weight =~ /\d{2,3}/
  • Matches any weight from 10 to 999
  • $name =~ /\w{5,}/
  • Matches any name of 5 or more letters
  • if ($SSN !~ /^\d{9}$/) { print "Invalid SSN!"; }
  • The pattern matches exactly 9 digits
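
A minimal sketch of the {min,max} quantifiers using Python's re module in place of Perl. The helper name is_valid_ssn and the sample values are hypothetical.

```python
import re

def is_valid_ssn(ssn: str) -> bool:
    """Return True when the whole string is exactly nine digits."""
    return re.fullmatch(r"\d{9}", ssn) is not None

# {2,3}: between two and three digits.
two_or_three = re.search(r"\d{2,3}", "weight: 185 lbs").group(0)

# {5,}: five or more word characters.
five_plus = [n for n in ["Ian", "Daniel", "Jenny"] if re.search(r"\w{5,}", n)]
```

fullmatch plays the role of the ^ and $ anchors here: without them, a ten-digit string would still contain nine digits and pass.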

10
Regular Expression Repetition
  • General Quantifiers
  • Some more special characters
  • $favoriteNumber =~ /\d*/
  • Matches any size number or no number at all
  • $firstName =~ /\w+/
  • Matches one or more characters
  • $middleInitial =~ /\w?/
  • Matches one or zero characters
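
The three general quantifiers (* for zero or more, + for one or more, ? for zero or one) can be compared side by side. This is a sketch in Python's re module with made-up inputs, not the Perl of the slides.

```python
import re

# *: zero or more, so it succeeds with an empty match when no digit is there.
star = re.match(r"\d*", "abc").group(0)

# +: one or more, so it needs at least one word character.
plus = re.match(r"\w+", "Ian P").group(0)
plus_empty = re.match(r"\w+", "")

# ?: zero or one character, tested against the whole string.
optional = [re.fullmatch(r"\w?", s) is not None for s in ["", "J", "JR"]]
```

Note that star is an empty string, not a failure: a pattern made only of * items can match nothing at all.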

11
Regular Expression Repetition
  • Greedy vs Nongreedy matching
  • Greedy matching gets the longest results possible
  • Nongreedy matching gets the shortest possible
  • Let's say $robot = "The12thRobotIs2ndInLine"
  • $robot =~ /\w*\d+/ (greedy)
  • Matches The12thRobotIs2
  • Maximizes the length of \w*
  • $robot =~ /\w*?\d+/ (nongreedy)
  • Matches The12
  • Minimizes the length of \w*?
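
A quick way to see greedy and nongreedy matching side by side, sketched in Python's re module (whose * and *? behave like Perl's). The patterns \w*\d+ and \w*?\d+ are assumed here because they reproduce the results quoted on the slide.

```python
import re

robot = "The12thRobotIs2ndInLine"

# Greedy: \w* grabs as much as it can, then backs off to the last digit.
greedy = re.search(r"\w*\d+", robot).group(0)

# Nongreedy: \w*? grabs as little as it can, stopping at the first digits.
lazy = re.search(r"\w*?\d+", robot).group(0)
```

greedy is "The12thRobotIs2" and lazy is "The12".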

12
Regular Expression Repetition
  • Greedy vs Nongreedy matching
  • Suppose $txt = "something is so cool"
  • $txt =~ /something/
  • Matches something
  • $txt =~ /so(mething)?/
  • Matches something and the second so
  • $txt =~ /so(mething)??/
  • Matches only the so in something and the
    second so
  • Doesn't really make sense to do this
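
The optional group can be made greedy or nongreedy too. A sketch in Python's re module (not Perl), using finditer to list every match in the string:

```python
import re

txt = "something is so cool"

# Greedy ?: take the optional group whenever it fits.
greedy_hits = [m.group(0) for m in re.finditer(r"so(mething)?", txt)]

# Nongreedy ??: skip the optional group whenever the match can still succeed.
lazy_hits = [m.group(0) for m in re.finditer(r"so(mething)??", txt)]
```

Since nothing follows the group in the pattern, the nongreedy version never needs it, which is why it rarely makes sense to write this.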

13
Regular Expression Real Life Examples
  • Using what you've learned so far, you can
  • Validate a standard 8.3 file name
  • $path =~ /^\w{1,8}\.[A-Za-z0-9]{2,3}$/
  • Account for poorly spelled user input
  • $answer =~ /ban{1,2}an{1,2}a/
  • $iansLastName =~ /Pa[et]{1,2}ers[oe]n/
  • $iansFirstName =~ /E?[Ii]?[aeo]?n/
  • Matches Ian, Ean, Eian, Eon, Ien, Ein
  • At least everyone gets the n right
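
These real-life patterns can be exercised with a small sketch in Python's re module (not Perl). The anchors, the helper name check and the sample inputs are assumptions added for the demonstration.

```python
import re

FILE_8_3 = re.compile(r"^\w{1,8}\.[A-Za-z0-9]{2,3}$")
BANANA = re.compile(r"^ban{1,2}an{1,2}a$")
FIRST_NAME = re.compile(r"^E?[Ii]?[aeo]?n$")

def check(pattern, candidates):
    """Return the candidates that the compiled pattern accepts."""
    return [c for c in candidates if pattern.search(c)]

files = check(FILE_8_3, ["readme.txt", "averylongname.txt", "notes.c"])
fruit = check(BANANA, ["banana", "bannanna", "bnana"])
names = check(FIRST_NAME, ["Ian", "Ean", "Eian", "Eon", "Ien", "Ein", "Jan"])
```

Only readme.txt passes the 8.3 check: the second name is too long and the third has a one-character extension.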

14
Alternation
  • Alternation allows multiple possibilities
  • Let $story = "He went to get his mother"
  • $story =~ /(He|She)\b.*?\b(his|her)\b.*?(mother|father|brother|sister|dog)/
  • Also matches She punched her fat brother
  • Make sure the grouping is correct!
  • $ans =~ /^(true|false)$/
  • Matches only true or false
  • $ans =~ /^true|false$/ (same as /(^true|false$)/)
  • Matches "true never" or "not really false"
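
Why the grouping matters can be shown with a short sketch in Python's re module, where | and the anchors work as in Perl. The list of answers is illustrative.

```python
import re

grouped = re.compile(r"^(true|false)$")    # anchors apply to both options
ungrouped = re.compile(r"^true|false$")    # "^true" OR "false$"

answers = ["true", "false", "true never", "not really false"]
grouped_hits = [a for a in answers if grouped.search(a)]
ungrouped_hits = [a for a in answers if ungrouped.search(a)]

# The story pattern: each alternation group captures the option that matched.
story = re.search(
    r"(He|She)\b.*?\b(his|her)\b.*?(mother|father|brother|sister|dog)",
    "She punched her fat brother",
)
story_groups = story.groups()
```

Without the parentheses, | splits the whole expression, so each anchor belongs to only one alternative.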

15
Grouping for Backreferences
  • Backreferences
  • With all these wildcards and possible matches, we
    usually need to know what the expression finally
    ended up matching.
  • Backreferences let you see what was matched
  • Can be used after the expression has evaluated or
    even inside the expression itself
  • Handled very differently in different languages
  • Numbered from left to right, starting at 1

16
Grouping for Backreferences
  • Perl backreferences
  • Used inside the expression
  • $txt =~ /\b(\w+)\s\1\b/
  • Finds any duplicated word, must use \1 here
  • Used after the expression
  • $class =~ /(.+?)-(\d+)/
  • The text before the hyphen is stored in the
    Perl variable $1 (not \1) and the number goes in
    $2
  • print "I am in class $1, section $2";
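
For comparison with the Perl version, here is a sketch of the same two uses in Python's re module: \1 inside the expression, and m.group(n) afterwards in place of Perl's numbered variables. The input strings, including the class code "CS101-02", are made up.

```python
import re

# Inside the expression: \1 must repeat whatever group 1 captured.
dup = re.search(r"\b(\w+)\s\1\b", "this is is a test")
repeated_word = dup.group(1)

# After the expression: read the captured groups from the match object.
m = re.search(r"(.+?)-(\d+)", "CS101-02")
sentence = f"I am in class {m.group(1)}, section {m.group(2)}"
```

repeated_word is "is", and sentence reads "I am in class CS101, section 02".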

17
Grouping for Backreferences
  • Java backreferences
  • Annoying but still useful
  • Pattern p = Pattern.compile("(.+?)-(\\d+)");
  • Matcher m = p.matcher(mySchedule);
  • m.find();
  • System.out.println("I am in class " + m.group(1)
        + ", section " + m.group(2));
  • Ugly, but usually better than the alternative
  • m.group() returns the entire string matched

18
Grouping for Backreferences
  • Javascript backreferences
  • Used inside the expression
  • Supported with the same \1 syntax as Perl
  • Used after the expression
  • /(.+?)-(\d+)/.test(className)
  • alert(RegExp.$1)
  • str = str.replace(/(\S+)\s(\S+)/, "$2 $1")
  • RegExp supports all of Perl's special
    backreference variables (wait a few slides)

19
Grouping for Backreferences
  • PHP/Python backreferences
  • Allows the use of specifically named
    backreferences
  • Groups also maintain their numbers
  • .NET backreferences
  • Allows named backreferences
  • If you try to access named groups by number,
    stuff breaks
  • Check the web for info on how to use
    backreferences in these and other languages.
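
As one concrete case, Python's re module writes a named group as (?P<name>...). The sketch below uses hypothetical group names (course, section) and a made-up class code to show that named groups keep their numbers as well.

```python
import re

m = re.search(r"(?P<course>.+?)-(?P<section>\d+)", "CS101-02")

by_name = (m.group("course"), m.group("section"))
by_number = (m.group(1), m.group(2))
as_dict = m.groupdict()
```

Both access styles return the same text, and groupdict() collects all named groups in one dictionary.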

20
Grouping without Backreferences
  • Sometimes you just need to make a group
  • If important groups must be backreferenced,
    disable backreferencing for any unimportant
    groups
  • $sentence =~ /(?:He|She) likes (\w+)\./
  • I don't care if it's a he or she
  • All I want to know is what he/she likes
  • Therefore I use (?:) to forgo the backreference
  • $1 will contain that thing that he/she likes
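
The effect of (?:...) on group numbering can be seen in a short sketch with Python's re module, which uses the same syntax as Perl here. The sentence is an invented example.

```python
import re

sentence = "She likes Perl."

# Non-capturing group: only the liked thing is numbered.
quiet = re.search(r"(?:He|She) likes (\w+)\.", sentence).groups()

# Capturing group: the pronoun takes number 1 and pushes the rest to 2.
noisy = re.search(r"(He|She) likes (\w+)\.", sentence).groups()
```

quiet is ("Perl",), while noisy is ("She", "Perl").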

21
Matching Modes
  • Matching has different functional modes
  • Modes can be set by flags outside the expression
    (only in some languages' implementations)
  • $name =~ /[a-z]+/i
  • i turns off case sensitivity
  • $xml =~ /title(\w+).*keywords(\w+)/s
  • s enables . to match newlines
  • $report =~ /^\s*Name\s\S+?The End.\s*$/m
  • m lets ^ and $ match at the start and end of
    every line, not just of the whole string
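
In Python's re module the same three modes are passed as flag constants rather than trailing letters: re.IGNORECASE for i, re.DOTALL for s and re.MULTILINE for m. A rough sketch with invented sample text:

```python
import re

# i: ignore case.
shout = re.search(r"[a-z]+", "DAN", re.IGNORECASE).group(0)

text = "first\nsecond"

# s: without it, "." refuses to cross the newline.
dot_plain = re.search(r"first.second", text)
dot_all = re.search(r"first.second", text, re.DOTALL)

# m: without it, ^ and $ only anchor to the whole string.
lines_plain = re.findall(r"^\w+$", text)
lines_multi = re.findall(r"^\w+$", text, re.MULTILINE)
```

lines_plain is empty because the string as a whole is not a single word, while lines_multi finds one word per line.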

22
Matching Modes
  • Matching has different functional modes
  • Modes can be set by flags inside the expression
    (except in Javascript and Ruby)
  • $password =~ /^[a-z](?i)[a-jp-xz0-9]{4,11}$/
  • If an insane web site specifies that your
    password must begin with a lowercase letter
    followed by 4 to 11 upper/lower alphanumeric
    characters excluding k through o and y.
  • $element =~ /(?i)[A-Z](?-i)[a-z]?/
  • (?i) makes the first letter case insensitive (if
    they type o, but meant O, we still know they mean
    oxygen). (?-i) makes sure the second letter is
    lowercase, otherwise it's 2 elements
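
A sketch of the same idea in Python's re module. One difference from Perl is assumed here: recent Python versions reject a bare (?i) in the middle of a pattern, so the scoped form (?i:...) is used to switch case insensitivity on for one part only. The helper names and test values are hypothetical.

```python
import re

# (?i:...) applies case insensitivity only inside the parentheses.
ELEMENT = re.compile(r"(?i:[A-Z])[a-z]?")
PASSWORD = re.compile(r"[a-z](?i:[a-jp-xz0-9]){4,11}")

def is_element_symbol(s: str) -> bool:
    """First letter in either case, optional second letter in lowercase."""
    return ELEMENT.fullmatch(s) is not None

def is_valid_password(s: str) -> bool:
    """Lowercase first letter, then 4 to 11 allowed characters in any case."""
    return PASSWORD.fullmatch(s) is not None
```

For example, is_element_symbol("o") and is_element_symbol("He") are accepted, while "hE" is rejected because the second letter must be lowercase.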

23
Regular Expression Replacing
  • Replacements simplify complex data modification
  • Generally the first part of a replace command is
    the regular expression and the second part is
    what to replace the matched text with
  • Usually a backreference variable can be used in
    the replacement text to refer to a group matched
    in the expression
  • The RegEx engine continues searching at the point
    in the string following the replacement
  • Replacements use all the same syntax, but have
    several unique features and are implemented very
    differently in various languages.

24
Regular Expression Replacing
  • Perl replacement syntax
  • $phone =~ s/\D//
  • Removes the first non-digit character in a phone
    number
  • Note that leaving the replacement blank deletes
    the matched text
  • $html =~ s/^(\s*)/$1\t/
  • Adds a tab to a line of HTML using backreferences
  • $sample =~ s/[abc]/[ABC]/
  • Might not do what is expected
  • The second part is NOT a regular expression, it's
    a string
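
The same three replacements can be sketched with re.sub from Python's re module. Two differences from Perl are worth flagging: re.sub replaces every match unless count=1 is given, and the backreference in the replacement is written \1. The sample strings are assumptions.

```python
import re

# Empty replacement deletes the first non-digit only.
first_only = re.sub(r"\D", "", "(585) 555-1234", count=1)

# \1 in the replacement re-inserts the captured leading whitespace.
indented = re.sub(r"^(\s*)", r"\1\t", "  <p>hi</p>", count=1)

# The replacement is plain text, so "[ABC]" is inserted literally.
literal = re.sub(r"[abc]", "[ABC]", "cab", count=1)
```

literal comes out as "[ABC]ab": the square brackets in the replacement are ordinary characters, not a character class.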

25
Regular Expression Replacing
  • Java replacement syntax (sucks)
  • Pattern p = Pattern.compile("\\\\\\\\server(\\d)");
  • p.matcher(netPath).replaceAll("\\\\\\\\workstation$1");
  • Yes, you actually have to use 8 \s to make \\
  • Any \ in the expression needs to be doubled
  • The Matcher parses the replacement for $1
  • This has the same effect but is slightly faster
    than
  • netPath.replaceAll("\\\\\\\\server(\\d)",
        "\\\\\\\\workstation$1");
  • No, you can't use .replace() here: it treats its
    arguments as literal text, not as a regular
    expression

26
Replacement Modes
  • Replacements can be performed singly or globally
  • The examples I have been using replace only
    single occurrences of patterns
  • Use the g flag to force the expression to scan
    the entire string
  • $phone =~ s/\D//g
  • Removes all non-digits in the phone number
  • $myGarage =~ s/Jeep|Cougar/Boeing/g
  • Gives me jets in exchange for cars
  • Don't use it if it's not necessary
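
Single versus global replacement, sketched in Python's re module. Note the reversed default: re.sub is global unless count limits it, whereas Perl needs the g flag. The garage contents are invented.

```python
import re

# Global: every non-digit is removed.
all_digits = re.sub(r"\D", "", "(585) 555-1234")

garage = "Jeep, Cougar, Jeep"

# Global: every car becomes a jet.
all_jets = re.sub(r"Jeep|Cougar", "Boeing", garage)

# Single: count=1 stops after the first replacement.
one_jet = re.sub(r"Jeep|Cougar", "Boeing", garage, count=1)
```

all_jets is "Boeing, Boeing, Boeing" and one_jet is "Boeing, Cougar, Jeep".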

27
Combining Replace and Match Modes
  • Combining modes is easy
  • To combine modes, just append the flags
  • $alphabet =~ s/Q//gi
  • Get rid of the pesky letter Q (and q too)
  • $response =~ /(?im)([aeiou].*?)(?-m)(.*)/
  • This example sucks. Point is you can combine
    modes inside the statement, too.
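
In Python's re module, flags are combined with the | operator instead of being appended as letters. A minimal sketch with made-up input:

```python
import re

# Global and case-insensitive: re.sub is already global, so only i is needed.
alphabet = "OPQRS opqrs"
no_q = re.sub(r"q", "", alphabet, flags=re.IGNORECASE)

# i and m together: words that start a line with a vowel, in either case.
lines = "Apple\nbanana\norange"
vowel_words = re.findall(r"^[aeiou]\w*", lines, re.IGNORECASE | re.MULTILINE)
```

no_q is "OPRS oprs" and vowel_words is ["Apple", "orange"].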

28
References for Learning More
  • Tutorials for other programming languages
  • http://www.regular-expressions.info/
  • In-depth syntax
  • http://kobesearch.cpan.org/htdocs/perl/perlreref.html
  • Code Search (ex ip address regex)
  • http://www.google.com/codesearch