Python and Perl - PowerPoint PPT Presentation

Slides: 10
Provided by: andersb6


1
Python and Perl
  • Lecture 2
  • Regular Expressions

2
Why use Regular Expressions?
  • Regular expressions are very powerful for extracting information
    from flat files.
  • They make it easy to identify text rows containing interesting data.
  • They make it easy to retrieve subgroups/substrings from text
    rows.
  • They make it easy to identify numbers, words, whitespace and
    separators.

3
Real-world example
ID TRBG361 standard mRNA PLN 1859 BP.
XX
AC X56734 S46826
Regular expression /ID/ matches the ID row.
Regular expression /AC\s\w/ matches the AC row.
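A minimal Python sketch of the two patterns above applied row by row. The rows are copied from the slide (single spaces, as shown there); the names entry_lines, id_rows and ac_rows are illustrative only. re.search() is used because, like Perl's m/.../, it looks for the pattern anywhere in the row.

```python
import re

# Rows from the real-world example slide (single-space separated, as on the slide).
entry_lines = [
    "ID TRBG361 standard mRNA PLN 1859 BP.",
    "XX",
    "AC X56734 S46826",
]

id_re = re.compile(r"ID")      # the slide's /ID/
ac_re = re.compile(r"AC\s\w")  # the slide's /AC\s\w/: AC, one whitespace, one word char

# search() scans the whole row, like Perl's unanchored m/.../
id_rows = [row for row in entry_lines if id_re.search(row)]
ac_rows = [row for row in entry_lines if ac_re.search(row)]

print(id_rows)  # ['ID TRBG361 standard mRNA PLN 1859 BP.']
print(ac_rows)  # ['AC X56734 S46826']
```

Note that \s matches exactly one whitespace character, so a row with several spaces after AC would need \s+ instead.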
4
Meta characters - 1
  • Ordinary characters match themselves (a matches a,
    H matches H, Arsenal matches Arsenal).
  • Meta characters are special characters that
    control how other characters are interpreted.
  • . Matches any single character (except newline \n).
  • ^ The characters that follow ^ must match
    the first characters of the line/string.
  • $ The characters before $ must match the last
    characters of the line/string.
  • * The character before * matches zero or
    more occurrences.
  • + The character before + matches one or
    more occurrences.
  • ? The character before ? is optional: it matches
    zero or one occurrence.
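The metacharacters listed above can be checked quickly with re.search(). This is a small sketch; the test strings are made up for illustration, and each comment names the metacharacter being exercised.

```python
import re

# (pattern, text) pairs; each comment names the metacharacter shown.
examples = [
    (r"h.t", "hot"),          # .  any single character
    (r"a.b", "a\nb"),         # .  does NOT match a newline
    (r"^ID", "ID TRBG361"),   # ^  pattern must start the string
    (r"^ID", "XX ID"),        # ^  fails: ID is not at the start
    (r"BP\.$", "1859 BP."),   # $  pattern must end the string
    (r"ab*c", "ac"),          # *  zero or more b's
    (r"ab+c", "ac"),          # +  needs at least one b, so this fails
    (r"ab+c", "abbbc"),       # +  several b's are fine
    (r"colou?r", "color"),    # ?  the u is optional
]

results = [bool(re.search(p, t)) for p, t in examples]
for (p, t), ok in zip(examples, results):
    print(f"{p:10} {t!r:14} {ok}")
```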

5
Meta characters - 2
  • {n} The character preceding {n} must
    repeat n times; an interval like a{2,4}
    matches aa, aaa or aaaa.
  • [] Characters enclosed in brackets, like [xyz],
    match any one of x, y or z.
  • | Patterns on either side of |, like x|y,
    match x or y.
  • () Characters enclosed by (), like
    ab(xyz)cd, define a subgroup xyz in the
    match.
  • \ A meta character preceded by \ matches
    itself; the backslash removes its special
    meaning. For example, \+ matches a plus sign.

6
Special Sequences
  • \d Matches any digit, same as [0-9] (for ASCII text).
  • \D Inverse of \d; matches any non-digit
    character.
  • \s Matches any whitespace character, same as
    [ \t\n\r\f\v].
  • \S Inverse of \s; matches any non-whitespace
    character.
  • \w Matches any alphanumeric character or underscore,
    same as [a-zA-Z0-9_] (for ASCII text).
  • \W Inverse of \w.
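The special sequences can be tried on the AC row from the real-world example. A sketch; the "gene_1, gene-2" string is invented. The last two lines illustrate the "for ASCII text" caveat: Python 3 string patterns are Unicode-aware, so \d also matches non-ASCII digits unless the re.ASCII flag is given.

```python
import re

row = "AC X56734 S46826"  # the AC row from the real-world example
digits = re.findall(r"\d+", row)        # runs of digits
non_digits = re.findall(r"\D+", row)    # runs of non-digits
fields = re.findall(r"\S+", row)        # whitespace-separated fields
spaces = re.findall(r"\s", row)         # the separators themselves

mixed = "gene_1, gene-2"
word_runs = re.findall(r"\w+", mixed)   # _ counts as a word character
non_word = re.findall(r"\W", mixed)     # comma, space and hyphen

# U+0663 is ARABIC-INDIC DIGIT THREE: a digit in Unicode, not in ASCII.
unicode_digit = re.findall(r"\d", "\u0663")
ascii_only = re.findall(r"\d", "\u0663", re.ASCII)

print(digits)      # ['56734', '46826']
print(non_digits)  # ['AC X', ' S']
print(fields)      # ['AC', 'X56734', 'S46826']
print(word_runs)   # ['gene_1', 'gene', '2']
print(non_word)    # [',', ' ', '-']
print(len(unicode_digit), len(ascii_only))  # 1 0
```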

7
How to apply RE? - overview
  • Python
      • Compile a pattern
          -> returns a re object.
          reobj = re.compile(pattern)
      • Match the re object against a string variable.
          -> returns a match object, or None if there is no match.
          mobj = reobj.match(stringvar)
      • Test the match object
          if mobj:
              # code block for match
          else:
              # code block for non-match
  • Perl
      • Use the binding operator =~
          $string =~ m/pattern/
          -> returns true if the match was successful.
      • Example
          if ( $string =~ m/pattern/ ) {
              # code block for match
          }

See Lecture 1, example 7 for a complete program
using regular expressions.
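The Python steps above (compile, match, test) in runnable form. One difference from Perl is worth seeing: reobj.match() only tries at the start of the string, whereas Perl's m/pattern/ (and Python's reobj.search()) looks anywhere. The helper name report is hypothetical.

```python
import re

reobj = re.compile(r"AC\s\w")

def report(stringvar):
    """Hypothetical helper following the compile/match/test steps."""
    mobj = reobj.match(stringvar)   # match() only tries at position 0
    if mobj:
        return "match: " + mobj.group()
    else:
        return "no match"

at_start = report("AC M50362")
not_at_start = report("XX AC M50362")
# search() scans the whole string, like Perl's unanchored m/AC\s\w/.
found_anywhere = reobj.search("XX AC M50362") is not None

print(at_start)        # match: AC M
print(not_at_start)    # no match
print(found_anywhere)  # True
```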
8
How to apply RE with subgroups? - overview
  • Python
      • Compile a pattern with subgroups, like
        'patt(sub1)er(sub2)n'
          -> compile returns a re object.
          reobj = re.compile('p(sub1)a(sub2)t')
      • Match the re object against a string variable.
          -> match/search returns a match object.
          mobj = reobj.match(stringvar)
      • Test the match object
          if mobj:
      • Extract the subgroups
              subgr1 = mobj.group(1)
              subgr2 = mobj.group(2)
  • Perl
      • Use the binding operator =~
          if ( $string =~ m/p(sub1)a(sub2)t/ ) {
              $subgr1 = $1;
              $subgr2 = $2;
          }
      • $1, $2, etc. are built-in variables in Perl. They
        are set only by a successful match.

    import re
    racc = re.compile(r'KW\s(.)\s(.)')
    m = racc.match(line)  # line holds one row of text
    if m:
        subgr1 = m.group(1)
        subgr2 = m.group(2)
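Subgroups applied to the rows from the real-world example. A sketch only: the ID-row pattern below is a hypothetical one written for this single row, not a general parser for the file format. It also shows groups(), which returns all subgroups as a tuple, and that a failed match gives None (so there is nothing to read groups from).

```python
import re

# Hypothetical pattern: entry name and sequence length from the ID row.
id_re = re.compile(r"ID\s(\w+)\s.*\s(\d+)\sBP\.")
mobj = id_re.match("ID TRBG361 standard mRNA PLN 1859 BP.")
if mobj:
    subgr1 = mobj.group(1)   # entry name
    subgr2 = mobj.group(2)   # sequence length, still a string
    print(subgr1, subgr2)    # TRBG361 1859

# groups() returns every subgroup at once as a tuple.
ac_groups = re.match(r"AC\s(\w+)\s(\w+)", "AC X56734 S46826").groups()
print(ac_groups)  # ('X56734', 'S46826')

# A failed match returns None instead of a match object.
print(id_re.match("XX"))  # None
```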
9
What does the syntax look like?
My RE Perl program (filename: RE.pl)
    $line = "AC M5032";
    if ( $line =~ m/AC\s\w/ ) {
        print "Found a match\n";
    } else {
        print "No match found\n";
    }
My RE Python program (filename: RE.py)
    import re
    line = "AC M50362"
    racc = re.compile(r'AC\s\w')
    if racc.match(line):
        print("Found a match")
    else:
        print("No match found")
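The program above tests a single line. In practice the same test runs on every row of a flat file; a minimal sketch, assuming io.StringIO as a stand-in for an opened file so it runs as-is (the file contents are the rows from the real-world example, and the capture group is an addition for illustration).

```python
import io
import re

# io.StringIO stands in for open(...) on a real flat file (hypothetical contents).
flat_file = io.StringIO(
    "ID TRBG361 standard mRNA PLN 1859 BP.\n"
    "XX\n"
    "AC X56734 S46826\n"
)

racc = re.compile(r"AC\s(\w+)")  # capture the first accession number
accessions = []
for line in flat_file:           # iterating a file yields one row at a time
    mobj = racc.match(line)
    if mobj:
        accessions.append(mobj.group(1))

print(accessions)  # ['X56734']
```

Compiling once outside the loop, as on the slide, avoids re-parsing the pattern for every row.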