Analyzing SQL embedded in COBOL and Java - PowerPoint PPT Presentation

1 / 41
About This Presentation
Title:

Analyzing SQL embedded in COBOL and Java

Description:

Desired results. Show relation between host- and embedded language ... Desired results: Metrics. First idea: reuse complexity ... results: ... – PowerPoint PPT presentation

Number of Views:47
Avg rating:3.0/5.0
Slides: 42
Provided by: win4
Category:

less

Transcript and Presenter's Notes

Title: Analyzing SQL embedded in COBOL and Java


1
Analyzing SQL embedded in COBOL and Java
  • Martin van der Vlist
  • Supervisor Mark van den Brand

2
Contents
  • Introduction
  • Background used techniques
  • Desired results
  • COBOL analysis results
  • Java analysis results
  • Work in progress

3
Introduction
  • Goal asses quality of embedded languages
  • Mean
  • Analyze SQL embedded in COBOL/Java
  • Output data in SQuAVisiT format
  • Reuse existing parsers for COBOL/Java?
  • Surprisingly not much research found
  • Software Improvement Group

Old programs never die, they even don't fade...
4
Contents
  • Introduction
  • Background used techniques
  • Desired results
  • COBOL analysis results
  • Java analysis results
  • Work in progress

5
Background Parsing
Architecture of traditional parser like ANTLR
and JavaCC
String x something String x something()
6
Background Parser comparison
  • Pico language used for comparison
  • Easy to parse language
  • Extended for embedded SQL
  • begin
  • declare
  • input natural,
  • output natural,
  • repnr natural,
  • rep natural
  • input 10
  • output 1
  • while input - 1 do
  • rep output
  • repnr input
  • while repnr - 1 do
  • output output rep
  • repnr repnr - 1
  • od
  • input input - 1
  • od
  • end

7
Background JavaCC
  • Generates traditional top-down parsers in Java
  • Includes tree builder and visitor generation
  • 'Special' features
  • Lexical states
  • Java actions
  • Mixed lexer/parser grammars allowed
  • Lookahead specification per production
  • Used in other LaQuSo tools

8
Background JavaCC embedded
  • No direct support for multiple languages
  • With a trick, multiple lexer/parser combinations
    can share input
  • Doesn't work with lookahead

//Added productions in the grammar void
SqlStatement() //the sql production is
opened ltSQL_BEGINgt sql() ltSQL_ENDgt void
sql() //this calls the java function
defined elsewhere delegate()
9
Background ANTLR
  • Also top-down parser
  • Multiple output languages (Java, C, C, )
  • Special features
  • No real separation between lexer and parser
  • Automatic lookahead per production
  • For non-recursive structures
  • Java actions
  • Modularity, provided that language is still in
    LL()

10
Background ANTLR embedded
  • Similar construct
  • Because java actions in Lexer are supported, no
    issues with lookahead

SQL EXEC SQL // create a new SQL
lexer/parser duo // that feeds off the
current input stream SqlLexer j new
SqlLexer(input) CommonTokenStream tokens new
CommonTokenStream(j) tokens.discardTokenType(Sq
lLexer.WS) SqlParser p new
SqlParser(tokens) // parse the embedded
language p.sqlStatement() // do something
with the generated output tablenames.addAll(p.ta
blenames) SQL_END END-EXEC
emit(Token.EOF_TOKEN)
11
Background SDF
  • Developed at CWI
  • Any context-free grammar supported
  • Scannerless Generalized LR parser
  • Modular approach
  • Strings etc can be reused in all languages
  • More than just a parser-generator

12
Background SDF embedded
  • Because of modular structure, just include the
    embedded grammar

module languages/PicoSql/PicoSql imports
languages/pico/syntax/Pico imports
languages/pico/syntax/comments imports
languages/tinySql/TinySql exports context-free
syntax "EXEC SQL" SelectStm "END-EXEC" -gt
STATEMENT "EXEC SQL" Stm "END-EXEC" -gt
STATEMENT context-free start-symbols PROGRAM
13
Background JavaCC for COBOL
  • SDF and ANTLR have better support for embedded
    languages
  • JavaCC still preferred
  • Several COBOL dialects
  • Working grammar for JavaCC available
  • Embedded languages not directly supported
  • Workaround use lexical states

14
Background JavaCC for COBOL
  • Using lexical states, analysis can be completely
    separated
  • Often used for comments / .... /
  • Maintenance issue copy/paste not desired
  • Solution recognize entire query as token
  • TOKEN
  • "EXEC SQL" IN_SQL_CODE
  • ltIN_SQL_CODEgt MORE
  • ltgt
  • ltIN_SQL_CODEgt TOKEN
  • "END-EXEC" DEFAULT

15
Background JavaCC for COBOL
  • Benefits of approach
  • Recovery after parse error in query
  • SQL parser is reusable

16
Background JavaCC Visitor
  • One of the recommended methods
  • Visitor.visit(AST t, Argu a) calls
    t.accept(this, a)
  • Node.accept(Visitor v, Argu a) calls
    v.visit(this, a)

17
Contents
  • Introduction
  • Background used techniques
  • Desired results
  • COBOL analysis results
  • Java analysis results
  • Work in progress

18
Desired results
  • Show relation between host- and embedded language
  • Dependency graph for tables
  • Complexity metrics for queries
  • Output compatible with SQuAVisiT

The evil is in the details
19
Desired results Metrics
  • First idea reuse complexity metrics used for
    COBOL
  • Not applicable Cyclomatic, if-count etc
  • Questionable Halstead, no proof
  • Other complexity metrics
  • Example number of tables, number of parameters
  • All are questionable no solid proof

20
Desired results interface SQuAViSiT
  • SQuAVisiT calls batchfile/executable per file
    with arguments
  • batchfile ltin_pathgt ltout_pathgt ltlog_pathgt
    ltmetrics_pathgt ltcalls_pathgt
  • Path is the other option
  • batchfile ltin_dirgt ltout_dirgt ltlog_pathgt
    ltmetrics_pathgt ltcalls_pathgt
  • All subdirectories should be analyzed by
    batchfile
  • Metrics can be read in CSV format
  • Call files have special format

21
Contents
  • Introduction
  • Background used techniques
  • Desired results
  • COBOL analysis results
  • Java analysis results
  • Work in progress

22
COBOL analysis
  • Parsing issues
  • Dynamic SQL
  • Case study results

Cobol isnt dead yet, it just smells funny
23
COBOL parsing issues
  • Issues with COBOL-parser architecture
  • SQL dialect
  • Performance
  • 25 minutes for case study initial
  • 2 minutes after change to file-list based
  • 158 minutes using multithreaded on P4 HT

24
COBOL Dynamic SQL
  • Even in COBOL, Strings can be executed as SQL
  • Partial analysis created, supporting
  • Some string functions
  • COBOL data structures
  • No flow graph analysis
  • Not used often in COBOL, test cases irrelevant

25
COBOL Dynamic SQL
  • Record example
  • Referenced as Q_TABLENAME OF TEST_QUERY in COBOL
  • Referenced as TEST_QUERY.Q_TABLENAME in SQL

01 TEST_QUERY. 03 FILLER PIC X(15) VALUE
"SELECT id FROM ". 03 Q_TABLENAME PIC X(15).
03 FILLER PIC X(6) VALUE "WHERE ". 03
Q_FIELDNAME PIC X(15). 03 FILLER PIC X(3)
VALUE " ". 03 Q_VALUE PIC X(15).
26
COBOL case study
  • Software of Insurance company
  • COBOL with PL/SQL embedded
  • 2816 files, 1603 files with queries
  • 1.62 Million lines of code

27
Case study dependencies
IVA100 table calls
All table calls
28
Case study dependencies Regular structures of
IVA100
29
Case study dependencies 3 Irregular structure
of FP101A
30
Case study dependencies 4 Matrix view
31
Case study metrics
32
Contents
  • Introduction
  • Background used techniques
  • Desired results
  • COBOL analysis results
  • Java analysis results
  • Work in progress

33
Java analysis
  • All queries are dynamic String argument to
    method
  • Statement.execute(String query)
  • Connection.prepare(String query)
  • Any Java operation possible to assemble query
  • Static Analysis for values is NP-hard
  • Solution pattern matching
  • Interpret basic String and String Builder
    operations
  • Interpret assignments
  • Regard integers as random values

34
Java analysis overview
  • Three analyzers using Visitor-pattern
  • Process overview
  • Store list of classes
  • Analyze dependencies
  • Analyze String values for relevant methods
  • If needed, do dependency analysis for referenced
    types
  • Output results to text files

35
Java storage of intermediate results
  • JavaCC generated parser creates AST
  • ClassNames visitor creates 2 HashMaps
  • Dependency Analyzer generates 'smart tree'Class
    definitions contain
  • search functions up and down
  • add/remove node functions
  • decision to analyze another file
  • Value Analyzer enriches 'smart tree'

36
Java storage of intermediate results
  • JavaCC generated parser creates AST
  • ClassNames visitor creates 2 HashMaps
  • Dependency Analyzer generates 'smart tree'Class
    definitions contain
  • Value Analyzer enriches 'smart tree'
  • Add ValueNode where appropriate
  • Add DecisionScopeNode at decisions
  • Delete DecisionScopeNode and merge results

37
Java results
  • Case study Compiere
  • GPL project
  • 0.7 million lines of code
  • Method calls found found 1972
  • With only one option 1555
  • Value only partially detected 395
  • With only one option 293
  • Discard reasons
  • Too many alternatives 3
  • No value known (members?) 164
  • Method return value used 260
  • Variable couldnt be located 32
  • Total alternative count 11.055
  • Parsed queries 5.427
  • 682 have unknown values as tablename
  • At least 86 unparsable queries are invalid
  • Parse errors mostly due to other SQL dialect

38
Contents
  • Introduction
  • Background used techniques
  • Desired results
  • COBOL analysis results
  • Java analysis results
  • Work in progress

39
Work in progress
  • Case study at Logica
  • A Java program with embedded SQL
  • Appointment at Monday
  • Improve Java analysis
  • Write thesis

40
W.I.P. improvements Java analysis
  • Make output readable for SQuAVisiT
  • Analyze class members (initialization)
  • Cleanup

41
Questions
Write a Comment
User Comments (0)
About PowerShow.com