Getting Started with ANTLR - PowerPoint PPT Presentation

Provided by: columbusst

1
Getting Started with ANTLR
  • Chapter 1

2
Domain Specific Languages
  • DSLs are high-level languages designed for
    specific tasks
  • DSLs include data formats, configuration file
    formats, and text-processing languages, among
    others
  • DSLs make their users effective in a specific
    domain

3
The Big Picture
  • Translators map input sentences to output
    sentences
  • Translators have to recognize many different
    sentences
  • We break recognition into two similar but
    distinct tasks: lexical analysis and parsing

4
Lexical Analysis
  • Lexical analysis consists of reading the input
    stream, character by character.
  • Characters are combined and output as tokens
  • Input: if (x > 312) System.out.println("Hi");
  • Tokens: if, (, x, WS, >, WS, 312, ),
    System.out.println, (, "Hi", ), ;
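
As a rough illustration of the character-by-character scan described on this slide, here is a hand-written Java sketch. It is not ANTLR-generated code: the class name TinyLexer and its token rules are assumptions made for this example, and it reports every run of whitespace as a WS token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer; ANTLR would generate a lexer from a grammar instead.
class TinyLexer {
    /** Scans the input left to right and returns the text of each token. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                // Collapse a run of whitespace into one WS token.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                tokens.add("WS");
            } else if (Character.isLetter(c)) {
                // Keyword or name; dots are allowed so a qualified name stays one token.
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '.')) i++;
                tokens.add(input.substring(start, i));
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(input.substring(start, i));
            } else if (c == '"') {
                i++; // skip the opening quote
                while (i < input.length() && input.charAt(i) != '"') i++;
                if (i < input.length()) i++; // include the closing quote
                tokens.add(input.substring(start, i));
            } else {
                i++; // any other character is a one-character token
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if (x > 312) System.out.println(\"Hi\");"));
    }
}
```

Because this sketch emits WS for every whitespace run, its token list contains a few more WS entries than the list on the slide.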

5
Lexical Analysis
  • Tokens carry additional information (such as a
    token type and position) beyond the characters
    they represent
  • ANTLR generates a lexical analyzer (a lexer)
    from the grammar it is given
  • We will be building grammars and having ANTLR
    generate the lexer code for us

6
Parsing
  • Parsing consists of reading tokens and trying to
    organize them into a valid sentence in the
    language
  • The parser can generate output immediately based
    on the sentences it recognizes or preserve the
    structure in the form of an abstract syntax tree
    (AST) which can be further processed
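
To make the idea of organizing tokens into a valid sentence concrete, here is a minimal recursive-descent sketch in Java. The rule stat : ID '=' INT ';' and the class TinyParser are hypothetical, chosen only for illustration. The tree is returned as a parenthesized string rather than as real AST nodes, and whitespace tokens are assumed to have been dropped already.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written parser for one rule: stat : ID '=' INT ';' ;
class TinyParser {
    private final List<String> tokens;
    private int pos = 0; // index of the next token to consume

    TinyParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Recognizes one statement and returns its tree as text, e.g. (= width 200). */
    String stat() {
        String id = match("[A-Za-z_]\\w*");
        match("=");
        String value = match("\\d+");
        match(";");
        if (pos != tokens.size()) {
            throw new IllegalStateException("unexpected token " + tokens.get(pos));
        }
        return "(= " + id + " " + value + ")";
    }

    /** Consumes the next token if it matches the regex, otherwise reports an error. */
    private String match(String pattern) {
        if (pos >= tokens.size() || !tokens.get(pos).matches(pattern)) {
            throw new IllegalStateException("expected " + pattern + " at token " + pos);
        }
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        System.out.println(new TinyParser(Arrays.asList("width", "=", "200", ";")).stat());
    }
}
```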

7
Translation Data Flow
Characters -> Lexer -> Tokens -> Parser -> AST -> Tree Walker -> Output

8
Finally
  • An emitter can take the output of the parser and
    emit output based on all computations of the
    previous phases
  • An emitter can use templates (documents with
    holes) that are filled in with computed values
  • ANTLR uses the StringTemplate engine to make it
    easier to build emitters
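
The "document with holes" idea can be sketched in a few lines of plain Java. This is not the StringTemplate API: the class TinyTemplate, its render method, and the $name$ hole syntax used here are assumptions made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mini template filler; not the StringTemplate library.
class TinyTemplate {
    /** Replaces each $name$ hole with its value from attrs; unknown holes are left as written. */
    static String render(String template, Map<String, String> attrs) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            int open = template.indexOf('$', i);
            int close = open < 0 ? -1 : template.indexOf('$', open + 1);
            if (close < 0) {
                // No complete hole remains; copy the rest of the text unchanged.
                out.append(template.substring(i));
                break;
            }
            out.append(template, i, open); // literal text before the hole
            String name = template.substring(open + 1, close);
            String value = attrs.get(name);
            out.append(value != null ? value : "$" + name + "$");
            i = close + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("id", "width");
        attrs.put("value", "200");
        System.out.println(render("int $id$ = $value$;", attrs));
    }
}
```

Keeping the output text in a template and the computed values in a map separates what is emitted from how it is computed, which is the property that makes emitters easier to maintain.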

9
Characters, Tokens, ASTs
  • Lexers consume characters from a CharStream such
    as ANTLRStringStream or ANTLRFileStream
  • These streams assume that the entire input fits
    into memory and, as a result, can buffer all
    characters in memory
  • Tokens point directly to character strings in the
    buffer rather than creating String objects for
    each token
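
The following Java sketch shows one way a token can refer to the shared character buffer by index instead of holding its own String. BufferToken and its fields are hypothetical names for this example and are only loosely modeled on the idea above; they are not the ANTLR runtime classes.

```java
// Hypothetical token that stores indexes into a shared buffer rather than a String.
class BufferToken {
    final char[] buffer; // the whole input, shared by all tokens
    final String type;   // token type name, e.g. "ID" or "INT"
    final int start;     // index of the first character (inclusive)
    final int stop;      // index of the last character (inclusive)

    BufferToken(char[] buffer, String type, int start, int stop) {
        this.buffer = buffer;
        this.type = type;
        this.start = start;
        this.stop = stop;
    }

    /** Builds the token text only when someone asks for it. */
    String getText() {
        return new String(buffer, start, stop - start + 1);
    }

    public static void main(String[] args) {
        char[] input = "width = 200;\n".toCharArray();
        BufferToken id = new BufferToken(input, "ID", 0, 4);
        BufferToken value = new BufferToken(input, "INT", 8, 10);
        System.out.println(id.type + ":" + id.getText() + " " + value.type + ":" + value.getText());
    }
}
```

Creating a token this way costs two integers and a reference; a String is allocated only if getText() is called.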

10
Characters, Tokens, ASTs
Characters (CharStream): ... W I D T H 2 0 0 \n
Tokens (Token): ID WS WS INT WS (each WS marked x)
AST (CommonTree): ID, INT

11
Characters, Tokens, ASTs
  • AST nodes point at token objects rather than
    copying token data into a tree node
  • CommonTree is a predefined node containing a
    Token payload.
  • The type of an ANTLR AST node is treated as
    Object so there are no restrictions on tree data
    types
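
A minimal Java sketch of a tree node that points at a token object instead of copying its data is shown below. TinyTree and Tok are hypothetical classes written for this example, not the ANTLR runtime types, and the "=" root used in main is an assumed example. The payload is declared as Object to mirror the point that any type may serve as tree data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token with just enough data for the example.
class Tok {
    final String type;
    final String text;

    Tok(String type, String text) {
        this.type = type;
        this.text = text;
    }

    @Override
    public String toString() {
        return text;
    }
}

// Hypothetical tree node holding a reference to its payload.
class TinyTree {
    final Object payload; // usually a token, but any object is allowed
    final List<TinyTree> children = new ArrayList<>();

    TinyTree(Object payload) {
        this.payload = payload;
    }

    TinyTree add(TinyTree child) {
        children.add(child);
        return this;
    }

    /** Prints a leaf as its payload and an inner node as (root child1 child2 ...). */
    String toStringTree() {
        if (children.isEmpty()) return String.valueOf(payload);
        StringBuilder sb = new StringBuilder("(").append(payload);
        for (TinyTree c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        TinyTree root = new TinyTree(new Tok("ASSIGN", "="))
                .add(new TinyTree(new Tok("ID", "width")))
                .add(new TinyTree(new Tok("INT", "200")));
        System.out.println(root.toStringTree());
    }
}
```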

12
A-mazing Analogy
  • The book focuses on discovering the implicit
    tree structure in input sentences and on
    generating structured text
  • A maze can be thought of as a language
    recognizer. Imagine a maze with words written on
    the floor
  • Any sentence that guides you from the entrance to
    the exit is valid
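
The maze analogy can be tried out with a small Java sketch. The Maze class, the room names "entrance" and "exit", and the sample words are all assumptions made for illustration. Each path between rooms is labeled with a word, and for simplicity each room has at most one path per word.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical maze: rooms connected by paths, each path labeled with a word.
class Maze {
    private final Map<String, Map<String, String>> paths = new HashMap<>();

    void addPath(String from, String word, String to) {
        paths.computeIfAbsent(from, k -> new HashMap<>()).put(word, to);
    }

    /** A sentence is valid if following its words leads from the entrance to the exit. */
    boolean recognizes(String sentence) {
        String room = "entrance";
        for (String word : sentence.trim().split("\\s+")) {
            Map<String, String> exits = paths.get(room);
            if (exits == null || !exits.containsKey(word)) {
                return false; // no path with this word: dead end
            }
            room = exits.get(word);
        }
        return room.equals("exit");
    }

    public static void main(String[] args) {
        Maze maze = new Maze();
        maze.addPath("entrance", "hello", "hall");
        maze.addPath("hall", "there", "hall");
        maze.addPath("hall", "world", "exit");
        System.out.println(maze.recognizes("hello there world"));
        System.out.println(maze.recognizes("world hello"));
    }
}
```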