CSE467/567 Computational Linguistics - PowerPoint PPT Presentation



1
CSE467/567 Computational Linguistics
  • Carl Alphonce
  • cse-467-alphonce@cse.buffalo.edu
  • Computer Science & Engineering
  • University at Buffalo

2
Words: the building blocks of sentences
3
Words have internal structure
  • readable = read + able
  • readability = read + able + ity
  • The structure of words can be described using a
    regular grammar.
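A finite-state automaton has the same expressive power as a regular grammar, so one way to make the claim concrete is an automaton that accepts exactly the allowed morpheme sequences. This is a minimal Python sketch, assuming the word has already been split into morphemes; the state names and the three-morpheme lexicon are hypothetical. It also ignores the spelling change that turns able into abil before ity.

```python
# Hypothetical automaton over morphemes: (state, morpheme) -> next state.
TRANSITIONS = {
    ("start", "read"): "verb",
    ("verb", "able"): "adjective",
    ("adjective", "ity"): "noun",
}
ACCEPTING = {"verb", "adjective", "noun"}

def accepts(morphemes):
    """Return True if the morpheme sequence is a word this automaton allows."""
    state = "start"
    for m in morphemes:
        state = TRANSITIONS.get((state, m))
        if state is None:  # no transition for this morpheme: reject
            return False
    return state in ACCEPTING

print(accepts(["read", "able"]))         # True
print(accepts(["read", "able", "ity"]))  # True
print(accepts(["read", "ity"]))          # False
```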

4
Chomsky hierarchy
5
Problem
  • I often need to find an e-mail, but I have
    thousands of e-mails in my various folders.
    Suppose I want to find an e-mail about geese.
    The e-mail may mention 'geese' or 'goose'; also,
    if the word appears at the start of a sentence,
    its initial letter will be capitalized. I need to
    match goose, geese, Goose, or Geese.

6
Regular expressions (in Perl)
  • A regular expression is an algebraic notation
    for characterizing a set of strings (p. 22).
  • Regular expressions are commonly used to specify
    search strings. The UNIX utility program grep
    lets the user specify a pattern to search for in
    files.
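The grep idea can be sketched in a few lines of Python, whose re module uses Perl-style syntax for the constructs on these slides (patterns are written as strings rather than between slashes). This is a minimal illustration, not grep itself: the function name grep and the sample subject lines are hypothetical, and it filters a list of strings instead of reading files.

```python
import re

def grep(pattern, lines):
    """Return the lines in which `pattern` matches somewhere, as grep does."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical subject lines standing in for a mail folder.
mail = [
    "Meeting moved to Tuesday",
    "Geese spotted on the lake",
    "Re: goose recipe",
]
print(grep("oose", mail))  # ['Re: goose recipe']
```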

7
Sequences of characters
  • Matching a sequence of characters
  • A pattern is written between slashes: /pattern/
  • Examples
  • /a/ matches the character a
  • /fred/ matches the string fred
  • Note
  • /fred/ does not match Fred! Patterns are
    case-sensitive.
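A quick check of these rules, sketched with Python's re.search, which looks for the pattern anywhere in the string. Python spells case-insensitive matching as the re.IGNORECASE flag (Perl uses the /i modifier).

```python
import re

print(bool(re.search(r"a", "cat")))                       # True
print(bool(re.search(r"fred", "alfred")))                 # True: match inside a longer string
print(bool(re.search(r"fred", "Fred")))                   # False: patterns are case-sensitive
print(bool(re.search(r"fred", "Fred", re.IGNORECASE)))    # True once case is ignored
```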

8
Disjunction
  • Square brackets are used to indicate disjunction:
    alternate ways in which a pattern can be
    satisfied.
  • Examples
  • /[Ff]/ matches either f or F
  • /[Ff]red/ matches either fred or Fred
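A bracketed class stands for exactly one character, so [Ff]red accepts only the two spellings above. A minimal Python check (re.fullmatch requires the whole string to match):

```python
import re

pattern = re.compile(r"[Ff]red")
for word in ["fred", "Fred", "FRED"]:
    print(word, bool(pattern.fullmatch(word)))
# fred True
# Fred True
# FRED False
```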

9
Ranges
  • Sometimes it is useful to specify any digit or
    any letter.
  • Any digit can be written as /[0123456789]/,
    since any of the ten digits satisfies the
    pattern.
  • An alternative is to use a special range
    notation /[0-9]/
  • Any letter can be specified as /[A-Za-z]/
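The sketch below checks in Python that the long and range forms accept the same single characters, then uses a letter-then-digit pattern on a made-up string. Note that [A-Za-z] covers ASCII letters only.

```python
import re
import string

digits_long = re.compile(r"[0123456789]")
digits_range = re.compile(r"[0-9]")

# Both patterns accept exactly the same printable characters.
same = all(
    bool(digits_long.fullmatch(c)) == bool(digits_range.fullmatch(c))
    for c in string.printable
)
print(same)                                             # True
print(re.findall(r"[A-Za-z][0-9]", "Gate B6, seat 12C"))  # ['B6']
```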

10
Negation
  • To search for the opposite of a pattern, put a
    caret (^) immediately after the opening square
    bracket.
  • Examples
  • /[^a]/ matches any single character except a
  • /[^0-9]/ matches any single character except a digit
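A negated class still consumes exactly one character. This Python sketch lists the non-a characters of a word and strips every non-digit from a made-up phone string.

```python
import re

print(re.findall(r"[^a]", "banana"))             # ['b', 'n', 'n']
print(re.sub(r"[^0-9]", "", "Tel: 555-0123"))    # 5550123
```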

11
Matching 0 or 1 occurrence
  • The ? matches zero or one occurrences of the
    preceding expression.
  • Examples
  • /a?/ matches a or the empty string
  • /cats?/ matches cat or cats
  • Note that the preceding expression, in these
    examples, is a single letter. We'll see how to
    form longer expressions later.
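A minimal Python check that ? makes only the single preceding letter optional:

```python
import re

pattern = re.compile(r"cats?")
for word in ["cat", "cats", "catss", "ca"]:
    print(word, bool(pattern.fullmatch(word)))
# cat True
# cats True
# catss False
# ca False
```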

12
The Kleene star and plus
  • The Kleene star (*) matches zero or more
    occurrences of the preceding expression.
  • Examples
  • /a*/ matches the empty string, a, aa, aaa, etc.
  • /[ab]*/ matches the empty string, a, b, aa, ab,
    ba, bb, etc.
  • The Kleene plus (+) matches one or more
    occurrences of the preceding expression.
  • + is not strictly necessary: /[ab]+/ is
    equivalent to /[ab][ab]*/
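The sketch below shows in Python that * accepts the empty string while + does not, and spot-checks the equivalence of [ab]+ and [ab][ab]* on a few sample strings (a test, not a proof). A zero-length match object is still truthy, which is why the empty string passes the first filter.

```python
import re

star_matches = [w for w in ["", "a", "aaa", "b"] if re.fullmatch(r"a*", w)]
plus_matches = [w for w in ["", "ab", "ba", "abc"] if re.fullmatch(r"[ab]+", w)]
print(star_matches)  # ['', 'a', 'aaa']
print(plus_matches)  # ['ab', 'ba']

# [ab]+ and [ab][ab]* agree on every sample string.
plus = re.compile(r"[ab]+")
star = re.compile(r"[ab][ab]*")
samples = ["", "a", "ba", "abba", "abc"]
equivalent = all(bool(plus.fullmatch(s)) == bool(star.fullmatch(s)) for s in samples)
print(equivalent)    # True
```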

13
Wildcard
  • The period (.) matches any single character
    except the newline (\n).
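A Python check of the wildcard, including the newline exception. The re.DOTALL flag (Perl's /s modifier) lets . match a newline as well.

```python
import re

print(bool(re.fullmatch(r"b.g", "bag")))               # True
print(bool(re.fullmatch(r"b.g", "b g")))               # True: . matches a space
print(bool(re.fullmatch(r"b.g", "b\ng")))              # False: . skips newline
print(bool(re.fullmatch(r"b.g", "b\ng", re.DOTALL)))   # True with DOTALL
```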

14
Anchors
  • Anchors are used to restrict a match to a
    particular position within a string.
  • ^ anchors to the start of a line
  • $ anchors to the end of a line
  • \b anchors to a word boundary
  • \B anchors to a non-boundary

15
Conjunction
  • Two regular expressions are conjoined by
    juxtaposition (placing the expressions side by
    side).
  • Examples
  • /a/ matches a
  • /m/ matches m
  • /am/ matches am but not a or m alone
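Juxtaposition in Python looks the same: writing a and m side by side requires both, in that order.

```python
import re

for s in ["am", "a", "m", "ma"]:
    print(s, bool(re.fullmatch(r"am", s)))
# am True
# a False
# m False
# ma False
```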

16
Disjunction
  • We have already seen disjunction of characters
    using the square bracket notation
  • General disjunction is expressed using the
    vertical bar (|), also called the pipe symbol.
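Unlike square brackets, the vertical bar can separate whole multi-character alternatives. A Python sketch on made-up text:

```python
import re

print(re.findall(r"cat|dog", "hotdog and bobcat"))          # ['dog', 'cat']
print(re.findall(r"goose|geese", "one goose, two geese"))   # ['goose', 'geese']
```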

17
Grouping
  • Parentheses, ( and ), are used to group
    subpatterns of a larger pattern.
  • Ex. /[Gg](ee|oo)se/
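Combining a character class with a grouped disjunction solves the geese problem from slide 5. This Python sketch adds \b word-boundary anchors (an addition, not part of the slide's pattern) so that words such as mongoose are not matched; the sample text is made up. It uses finditer with group(0) because findall would return only the text of the parenthesized group when a pattern contains one.

```python
import re

geese = re.compile(r"\b[Gg](ee|oo)se\b")
text = "Goose season: geese fly south. (Not to be confused with a mongoose.)"
found = [m.group(0) for m in geese.finditer(text)]
print(found)  # ['Goose', 'geese']
```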

18
Replacement
  • In addition to matching, we can do replacements
    when a match is found
  • Example
  • To replace the British spelling of color with
    the American spelling, we can write
  • s/colour/color/
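Python's counterpart to s/// is re.sub. One difference to watch: Perl's s/colour/color/ replaces only the first match unless the /g modifier is added, whereas re.sub replaces every match unless count is given. A sketch on a made-up sentence:

```python
import re

text = "The colour of the sky; a colourful scene."
first_only = re.sub(r"colour", "color", text, count=1)  # like s/colour/color/
all_replaced = re.sub(r"colour", "color", text)         # like s/colour/color/g
print(first_only)    # The color of the sky; a colourful scene.
print(all_replaced)  # The color of the sky; a colorful scene.
```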

19
Registers saving matches
  • To save the part of a match that corresponds to
    a parenthesized subpattern and reuse it later,
    Perl provides registers
  • Registers are named \1, \2, \3, ..., where the
    digit is the number of the register
  • Ex.
  • DE DO DO DO DE DA DA DA
  • IS ALL I WANT TO SAY TO YOU
  • /(D[AEO].)/ will match the first line
  • /(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3/
    matches more specifically
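Python's re module writes registers (backreferences) inside a pattern as \1, \2, and so on, just as Perl does; after a match, the saved text is read with m.group(n) where Perl would use $1, $2. This sketch assumes the class [AEO] (D followed by A, E, or O) for the patterns on this slide and checks them against the two lyric lines.

```python
import re

line1 = "DE DO DO DO DE DA DA DA"
line2 = "IS ALL I WANT TO SAY TO YOU"

simple = re.compile(r"(D[AEO].)")
print(repr(simple.search(line1).group(1)))  # 'DE '
print(simple.search(line2))                 # None: the second line has no D

# \1, \2, \3 must repeat exactly what groups 1, 2, 3 captured.
specific = re.compile(r"(D[AEO])(.D[AEO])\2\2\s\1(.D[AEO])\3\3")
m = specific.fullmatch(line1)
print(m.groups())  # ('DE', ' DO', ' DA')
```

Because group 2 captures " DO" including its leading space, the backreference \2 reproduces the space too, which is why no extra spaces appear between the references.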