1
Layered Combinator Parsers with a Unique State
  • Pieter Koopman, Rinus Plasmeijer
  • Nijmegen, The Netherlands

2
Overview
  • conventional parser combinators
  • requirements for new combinators
  • system-architecture
  • new parser combinators
  • separate scanner and parser
  • error handling

3
parser combinators
  • Non-deterministic, list of results
  • :: Parser s r :== [s] -> ParseResult s r
    :: ParseResult s r :== [([s],r)]
  • fail and yield
  • fail = \ss = []
    yield r = \ss = [(ss,r)]
  • recognize a symbol
  • satisfy :: (s->Bool) -> Parser s s
    satisfy f = p
    where p [s:ss] | f s = [(ss,s)]
          p _ = []
  • symbol sym = satisfy ((==) sym)

4
parser combinators 2
  • sequence-combinators
  • (<&>) infixr 6 :: (Parser s r) (r -> Parser s t) -> Parser s t
    (<&>) p1 p2 = \ss1 = [ tuple
                         \\ (ss2,r1) <- p1 ss1
                         ,  tuple <- p2 r1 ss2 ]
  • (<++>) infixl 6 :: (Parser s (r->t)) (Parser s r) -> Parser s t
    (<++>) p1 p2 = \ss1 = [ (ss3,f r)
                          \\ (ss2,f) <- p1 ss1
                          ,  (ss3,r) <- p2 ss2 ]
  • choose-combinator
  • (<|>) infixr 4 :: (Parser s r) (Parser s r) -> Parser s r
    (<|>) p1 p2 = \ss = p1 ss ++ p2 ss

5
parser combinators 3
  • some useful abbreviations
  • (@>) infixr 7
    (@>) f p = yield f <++> p
  • (<:&>) infixl 6
    (<:&>) p1 p2 = (\h t = [h:t]) @> p1 <++> p2

6
parser combinators 4
  • Kleene star
  • star p = p <:&> star p
           <|> yield []
  • plus p = p <:&> star p
  • parsing an identifier
  • identifier :: Parser Char String
  • identifier = toString @>
        (    satisfy isAlpha
        <:&> star (satisfy isAlphanum) )
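
The list-of-successes combinators on the preceding slides can be tried out in a few lines. The following is a minimal Python sketch, not the authors' Clean code: the names `yield_`, `bind`, `choice`, `cons` and `fmap` are stand-ins chosen here for the deck's operators, and input is a plain string instead of a list of symbols.

```python
# A parser maps the remaining input to a list of (rest, result) pairs.
# An empty list means failure; several pairs mean several possible parses.

def fail(ss):
    return []

def yield_(r):
    # Succeed without consuming input.
    return lambda ss: [(ss, r)]

def satisfy(f):
    # Accept one symbol for which the predicate f holds.
    def p(ss):
        return [(ss[1:], ss[0])] if ss and f(ss[0]) else []
    return p

def symbol(sym):
    return satisfy(lambda c: c == sym)

def bind(p1, f):
    # Monadic sequence: the second parser may depend on the first result.
    return lambda ss1: [t for (ss2, r1) in p1(ss1) for t in f(r1)(ss2)]

def choice(p1, p2):
    # Non-deterministic choice: concatenate the results of both parsers.
    return lambda ss: p1(ss) + p2(ss)

def fmap(f, p):
    # Apply f to every result of p.
    return bind(p, lambda r: yield_(f(r)))

def cons(p1, p2):
    # Sequence that puts the first result in front of a list result.
    return bind(p1, lambda h: bind(p2, lambda t: yield_([h] + t)))

def star(p):
    # Kleene star; the lambda delays the recursion until input is supplied.
    return lambda ss: choice(cons(p, star(p)), yield_([]))(ss)

def plus(p):
    return cons(p, star(p))

identifier = fmap("".join, cons(satisfy(str.isalpha), star(satisfy(str.isalnum))))
```

Applied to `"ab1 x"`, `identifier` returns every prefix that is an identifier, longest first. This is the non-determinism that the later slides restrict to the places where it is needed.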

8
parser combinators 5
  • context sensitive parsers
  • twice the same character
  • doubleChar = satisfy isAlpha <&> \c -> symbol c
  • arbitrary look-ahead
  • lookAhead
        =   symbol 'a' &> symbol 'b'
        <|> symbol 'a' &> symbol 'c'
        <|> star (satisfy isSpace) &> symbol 'a'
        <|> symbol 'x'
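
The two ideas on this slide, a parser that depends on an earlier result and alternatives that share a prefix, can be illustrated with a small self-contained Python sketch. It is an illustration under assumptions, not the authors' code: `bind`, `then` and `choice` are names picked here for the monadic sequence, the "keep the right result" sequence and the choice combinator.

```python
def satisfy(f):
    # Accept one character for which f holds.
    return lambda ss: [(ss[1:], ss[0])] if ss and f(ss[0]) else []

def symbol(sym):
    return satisfy(lambda c: c == sym)

def bind(p1, f):
    # Monadic sequence: f receives the result of p1 and builds the next parser.
    return lambda ss1: [t for (ss2, r1) in p1(ss1) for t in f(r1)(ss2)]

def then(p1, p2):
    # Sequence that discards the result of p1.
    return bind(p1, lambda _: p2)

def choice(p1, p2):
    return lambda ss: p1(ss) + p2(ss)

# Context sensitive: the second parser is built from the character just read.
double_char = bind(satisfy(str.isalpha), symbol)

# Look-ahead: both alternatives start with 'a'; the list of results
# makes the backtracking implicit.
look_ahead = choice(then(symbol("a"), symbol("b")),
                    then(symbol("a"), symbol("c")))
```

`double_char("aab")` succeeds with `'a'` and leaves `"b"`, while `double_char("ab")` fails. `look_ahead("ac")` succeeds through the second alternative even though the first one already consumed the `'a'`.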

9
properties of combinators
  • + concise and clear parsers
  • + full power of the functional programming language (fpl) available
  • + context sensitive
  • + arbitrary look-ahead
  • + can be efficient: continuations, IFL '98
  • - no error handling (messages & recovery)
  • - no unique symbol tables
  • - separate scanner yields problems:
    the entire file is scanned before the parser starts

10
Requirements
  • parse state with
  • error file
  • notion of position
  • user-defined extension e.g. symbol table
  • possibility to add separate scanner
  • efficient implementation, continuations
  • for programming languages we want a single result
    (deterministic grammar)

11
Uniqueness
  • files and windows that should be single-threaded
    are unique in Clean
  • fwritec :: Char *File -> *File
  • data-structures can be updated destructively when
    they are unique
  • only unique arrays can be changed

12
System-architecture
  • replace the list of symbols by a structure
    containing
  • actual input
  • position
  • error administration
  • user defined part of the state
  • use a type constructor class to allow multiple
    levels

13
Type constructor class
  • Reading a symbol
  • class PSread ps s st :: (ps s st) -> (s, ps s st)
  • Copying the state is not allowed, use functions
    to manipulate the input
  • class PSsplit ps s st :: (s, ps s st) -> (s, ps s st)
  • class PSback ps s st :: (s, ps s st) -> (s, ps s st)
  • class PSclear ps s st :: (s, ps s st) -> (s, ps s st)
  • Minimal parser state, requires Clean 2.0
  • class ParserState ps symbol state
        | PSread, PSsplit, PSback, PSclear ps symbol state

14
New parser combinators
  • Parsers have three arguments
  • success-continuation determines action upon
    success
  • :: SuccCont :== Item FailCont State -> (Result, State)
  • fail-continuation specifies what to do if parser
    fails
  • :: FailCont :== State -> (Result, State)
  • current input state
  • :: State :== (Symbol, ParserState)

15
New parser combinators 2
  • yield and fail, apply appropriate continuation
  • yield r = \succ fail tuple = succ r fail tuple
  • failComb = \succ fail tuple = fail tuple
  • sequence of parsers, change continuation
  • (<&>) p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t
  • choice, change continuations
  • (<|>) p1 p2 = \succ fail tuple =
        p1 (\r f t = succ r fail (PSclear t))
           (\t2 = p2 succ fail (PSback t2))
           (PSsplit tuple)
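
The continuation-passing combinators on this slide can be sketched in Python as follows. This is a hedged transliteration, not the authors' implementation: the state type `St`, the helper `run` and the names `seq` and `choice` are assumptions made here, and the state is an immutable tuple rather than a unique Clean value. The history stack plays the role of PSsplit, PSback and PSclear.

```python
from typing import NamedTuple, Tuple

class St(NamedTuple):
    text: str
    pos: int = 0
    hist: Tuple[int, ...] = ()   # remembered positions, newest first

def ps_split(st):
    # Remember the current position before trying an alternative.
    return st._replace(hist=(st.pos,) + st.hist)

def ps_back(st):
    # Return to the most recently remembered position.
    return st._replace(pos=st.hist[0], hist=st.hist[1:])

def ps_clear(st):
    # The alternative succeeded: forget the remembered position.
    return st._replace(hist=st.hist[1:])

def yield_(r):
    return lambda succ, fail, st: succ(r, fail, st)

def fail_comb(succ, fail, st):
    return fail(st)

def symbol(c):
    def p(succ, fail, st):
        if st.pos < len(st.text) and st.text[st.pos] == c:
            return succ(c, fail, st._replace(pos=st.pos + 1))
        return fail(st)
    return p

def seq(p1, f):
    # Sequence: on success of p1, continue with the parser f builds.
    return lambda sc, fc, st: p1(lambda a, _fc, st2: f(a)(sc, fc, st2), fc, st)

def choice(p1, p2):
    # Choice: try p1 from a remembered position; only if it fails, back up and try p2.
    def p(succ, fail, st):
        return p1(lambda r, _f, t: succ(r, fail, ps_clear(t)),
                  lambda t2: p2(succ, fail, ps_back(t2)),
                  ps_split(st))
    return p

def run(p, text):
    # Top-level continuations: return the result, or None on failure.
    return p(lambda r, _f, st: (r, st), lambda st: (None, st), St(text))

look = choice(seq(symbol("a"), lambda _: symbol("b")),
              seq(symbol("a"), lambda _: symbol("c")))
```

Unlike the list-of-successes version, each parser returns a single result: once the first alternative succeeds, the second is never tried.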

16
string input
  • a very simple instance of ParserState
  • :: StringInput symbol state =
        { si_string :: String      // string holds input
        , si_pos    :: Int         // index of current char
        , si_hist   :: [Int]       // to remember old positions
        , si_state  :: state       // user-defined extension
        , si_error  :: ErrorState
        }
  • instance PSread StringInput Char state
    where PSread si=:{si_string,si_pos}
            = (si_string.[si_pos], {si & si_pos = si_pos+1})
  • instance PSsplit StringInput Char state
    where PSsplit (c,si=:{si_pos,si_hist})
            = (c, {si & si_hist = [si_pos:si_hist]})
  • instance PSback StringInput Char state
    where PSback (_,si=:{si_string,si_hist=[h:t]})
            = (si_string.[h-1], {si & si_pos = h, si_hist = t})
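
As an illustration of how this state behaves, here is a small Python sketch of the same record and its three operations. It is an approximation under assumptions: Python has no uniqueness typing, so the single-threaded, destructively updated state is modelled as a mutable object. The user-defined part and the error state are left out, and `ps_clear` is not shown on the slide, so its body here is a guess.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StringInput:
    si_string: str                                     # string holds input
    si_pos: int = 0                                    # index of current char
    si_hist: List[int] = field(default_factory=list)   # remembered positions

def ps_read(si):
    # Return the current character and advance; raises IndexError at end of input.
    c = si.si_string[si.si_pos]
    si.si_pos += 1
    return c, si

def ps_split(c, si):
    # Remember the position just after the current symbol c.
    si.si_hist.append(si.si_pos)
    return c, si

def ps_back(_, si):
    # Restore the last remembered position and the symbol in front of it.
    h = si.si_hist.pop()
    si.si_pos = h
    return si.si_string[h - 1], si

def ps_clear(c, si):
    # Assumed behaviour: drop the last remembered position, keep everything else.
    si.si_hist.pop()
    return c, si
```

Reading `'a'` from `"abc"`, splitting, reading `'b'` and then calling `ps_back` yields `'a'` again with the position back at 1, which is what a choice combinator needs when its first alternative fails.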

17
Separate scanner and parser
  • sometimes it is convenient to have a separate
    scanner, e.g. to implement the offside rule
  • the tasks of scanner and parser are similar, so use
    the same combinators
  • due to the type constructor class we can nest
    parser states

18
a simple scanner
  • use of combinators doesn't change
  • produces tokens (algebraic datatype)
  • scanner = skipSpace &>
        (   generateOffsideToken
        <|> satisfy isAlpha <:&> star (satisfy isAlphanum)
                                   <@ testReserved o toString
        <|> plus (satisfy isDigit) <@ IntToken o to_number 0
        <|> symbol '='             <@ K EqualToken
        <|> symbol '('             <@ K OpenToken
        <|> symbol ')'             <@ K CloseToken
        )

19
generating offside tokens
  • use an ordinary parse function
  • generateOffsideToken
        =   pAcc getCol     <&> \col ->      // get current column
            pAcc getOffside <&> \os_col ->   // get offside position
            handleOS col os_col
    where
        handleOS col os_col
            | EndGroupGenerated os_col
                | col < os_col
                    = pApp popOffside (yield EndOfGroupToken)
                    = pApp ClearEndGroup failComb
            | col < os_col
                = pApp SetEndGroup (yield EndOfDefToken)
                = failComb
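
The decision logic of `handleOS` can be sketched as an ordinary function over a small state. This Python version follows one reading of the slide, whose guards did not survive extraction, so the branch structure is an assumption. The class `OffsideState`, its fields, the sentinel column 0 at the bottom of the stack and the use of `None` for failure are all hypothetical names and conventions chosen here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OffsideState:
    # Stack of offside columns, innermost group last; 0 is a sentinel.
    stack: List[int] = field(default_factory=lambda: [0])
    end_group_generated: bool = False

def generate_offside_token(col: int, os: OffsideState) -> Optional[str]:
    """Return an offside token for a symbol at column col, or None (the parser fails)."""
    os_col = os.stack[-1]
    if os.end_group_generated:
        if col < os_col:
            os.stack.pop()                  # popOffside: leave one group
            return "EndOfGroupToken"
        os.end_group_generated = False      # ClearEndGroup, then fail
        return None
    if col < os_col:
        os.end_group_generated = True       # SetEndGroup
        return "EndOfDefToken"
    return None                             # nothing to generate
```

With groups opened at columns 4 and 8 and the next symbol at column 2, repeated calls first end the current definition, then close both groups, and finally fail so that the scanner continues with its ordinary alternatives.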

20
Parser state for nesting
  • parser state contains scanner and its state
  • :: NestedInput token state
        = E. .ps sym scanState:
        { ni_scanSt  :: (ps sym scanState)
        , ni_scanner :: (ps sym scanState) ->
                        (token, ps sym scanState)
        , ni_buffer  :: [token]
        , ni_history :: [token]
        , ni_state   :: state
        }
  • can be nested to any depth
  • we can, but don't have to, use this

21
Parser state for nesting 2

[Diagram; surviving labels: NestedInput, File, ScanState, ErrorState, OffsideState, scanner, HashTable]

22
Parser state for nesting 3
  • apply scanner to read token
  • instance PSread NestedInput token state
    where
        PSread ni=:{ni_scanner, ni_scanSt}
            # (tok, state) = ni_scanner ni_scanSt
            = (tok, {ni & ni_scanSt = state})
  • here we ignore the buffer
  • define instances for the other functions in class
    ParserState
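
The layering can be illustrated with a short Python sketch in which reading a token from the outer state means running the scanner once on the inner state. It rests on assumptions: the hand-written `scan` function, its `(text, pos)` state and the token tuples are hypothetical stand-ins for a combinator-built scanner, and the buffer and history fields are omitted, as on the slide.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class NestedInput:
    ni_scan_st: Any                                   # state of the inner layer
    ni_scanner: Callable[[Any], Tuple[Any, Any]]      # inner state -> (token, inner state)

def ps_read(ni):
    # Reading a token runs the scanner once on the inner state.
    tok, scan_st = ni.ni_scanner(ni.ni_scan_st)
    ni.ni_scan_st = scan_st
    return tok, ni

def scan(state):
    # Hypothetical scanner over (text, pos); yields one token per call.
    text, pos = state
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos >= len(text):
        return ("EOF", None), (text, pos)
    start = pos
    if text[pos].isdigit():
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        return ("Int", int(text[start:pos])), (text, pos)
    if text[pos].isalpha():
        while pos < len(text) and text[pos].isalnum():
            pos += 1
        return ("Ident", text[start:pos]), (text, pos)
    return ("Sym", text[pos]), (text, pos + 1)
```

Because tokens are produced on demand, the parser never needs the whole file to be scanned before it starts.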

23
error handling
  • general error correction is difficult
  • correct simple errors
  • skip to new definition otherwise
  • Good error messages
  • location: position in file
  • what are we parsing: stack of contexts
  • Error t.icl,20,caseAlt,Expression ) expected instead of

24
error handling 2
  • basic error generation
  • parseError expected val = \succ fail (t,ps) =
        let msg = toString expected +++ " expected instead of " +++ toString t
        in  succ val fail (PSerror msg (PSread ps))
  • useful primitives
  • wantSymbol sym = symbol sym <|> parseError sym sym
  • want p msg value = p <|> parseError msg value
  • skipToSymbol sym = symbol sym <|>
        parseError sym sym &> star (satisfy ((<>) sym)) &> symbol sym
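
The idea behind `parseError` and `wantSymbol`, recording a message and continuing as if the expected symbol had been read, can be shown with a minimal Python sketch. It is not the authors' code: the state `St`, the simplified `choice` (which restarts the second parser from the original immutable state rather than using PSsplit and PSback) and the helpers `then` and `run` are assumptions made for this example.

```python
from typing import NamedTuple, Tuple

class St(NamedTuple):
    text: str
    pos: int = 0
    errors: Tuple[str, ...] = ()   # collected error messages

def symbol(c):
    def p(succ, fail, st):
        if st.pos < len(st.text) and st.text[st.pos] == c:
            return succ(c, fail, st._replace(pos=st.pos + 1))
        return fail(st)
    return p

def choice(p1, p2):
    # Simplified choice: if p1 fails, run p2 from the state choice started in.
    return lambda succ, fail, st: p1(succ, lambda _st: p2(succ, fail, st), st)

def then(p1, p2):
    # Sequence that keeps only the result of p2.
    return lambda sc, fc, st: p1(lambda _r, _f, st2: p2(sc, fc, st2), fc, st)

def parse_error(expected, val):
    def p(succ, fail, st):
        found = st.text[st.pos] if st.pos < len(st.text) else "end of input"
        msg = f"{expected} expected instead of {found}"
        # Record the message, skip the offending symbol, and succeed with val.
        nxt = min(st.pos + 1, len(st.text))
        return succ(val, fail, st._replace(pos=nxt, errors=st.errors + (msg,)))
    return p

def want_symbol(sym):
    return choice(symbol(sym), parse_error(sym, sym))

def run(p, text):
    return p(lambda r, _f, st: (r, st), lambda st: (None, st), St(text))

parens = then(want_symbol("("), then(want_symbol("a"), want_symbol(")")))
```

Running `parens` on `"(a]"` still succeeds, and the final state carries the message `") expected instead of ]"`, so parsing can go on and report further errors.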

25
Parser
  • Parsing expressions
  • pExpression = "Expression" :>
            BV @> match mBasicValue
        <|> pIdentifier
        <|> symbol CaseToken &> pDeter Case
              @> pCompoundExpression <& wantSymbol OfToken
              <++> star pCaseAlt <& skipToSymbol EndOfGroupToken
        <|> symbol OpenToken &> pCompoundExpression
              <& wantSymbol CloseToken

26
identifiers in hashtable
  • use a parse-function
  • hashtable is user-defined state in ParserState
  • pIdentifier
        =   match mIdentToken
        <&> \ident = pAccSt (putNameInHashTable ident)
        <@  \name = {app_symb = UnknownSymbol name, app_args = []}
  • the function pAccSt applies a function to the
    user-defined state

27
limitations of this approach
  • syntax specified by parse functions
  • grammar is not a datastructure
  • no detection of left recursion: runtime error
    instead of nice message
  • no automatic left-factoring: do it by hand, or
    accept runtime overhead
        p1 = p <&> q1 <|> p <&> q2
        p2 = p <&> (q1 <|> q2)
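
The cost of the missing left-factoring can be made visible by counting how often the shared prefix parser runs. The sketch below uses simple list-of-successes combinators in Python; the `counted` wrapper and the `build` helper are hypothetical, introduced only for this measurement.

```python
def symbol(c):
    # Accept exactly the character c.
    return lambda ss: [(ss[1:], c)] if ss[:1] == c else []

def then(p1, p2):
    # Sequence that pairs both results.
    return lambda ss: [(ss3, (r1, r2))
                       for (ss2, r1) in p1(ss)
                       for (ss3, r2) in p2(ss2)]

def choice(p1, p2):
    return lambda ss: p1(ss) + p2(ss)

def counted(p, counter):
    # Wrap p so that every application increments counter[0].
    def q(ss):
        counter[0] += 1
        return p(ss)
    return q

def build(counter):
    p = counted(symbol("a"), counter)
    q1, q2 = symbol("b"), symbol("c")
    unfactored = choice(then(p, q1), then(p, q2))   # p1 on the slide
    factored = then(p, choice(q1, q2))              # p2 on the slide
    return unfactored, factored
```

Both parsers accept the same inputs, but on `"ac"` the unfactored form applies the prefix parser twice and the factored form only once. For a large shared prefix this difference is the runtime overhead the slide refers to.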

28
discussion
  • old advantages
  • concise, fpl-power, arbitrary look-ahead, context
    sensitive
  • new advantages
  • unique and extendable parser state
  • one or more layers
  • decent error handling, simple error correction
    can be added
  • still efficient, overhead < 2
  • non-determinism only when needed