Java Development for HLT - PowerPoint PPT Presentation

1
Java Development for HLT
  • Lars Degerstedt
  • Linköping university, IDA
  • larde_at_ida.liu.se
  • Towards available and useful NLP software

2
This Lecture
  • 1st hour - Course introduction
  • purpose
  • motivation
  • course overview
  • relevance for NLP
  • 2nd hour - experiences of NLPLAB

3
Aim of the Course
  • Use of the Java platform for NLP
  • Experience from software design
  • Experience of mainstream techniques

How can basic NLP research lead to products?
4
Trad. NLP(LAB) Results
  • Larger projects: slower growth per person
  • Conflict between results
  • paper or code?
  • Subexpert vs. holistic view
  • the GUI is not important
  • Code is (at best) stable but not mature. Even
    less useful.

5
Weak points of NLP Software
  • Closed architecture
  • Weak on software methodology
  • No differentiation of users
  • Difficult to use
  • Difficult to integrate
  • Unclear in functionality
  • No reuse
  • Weak maintenance
  • Imposes new formalisms
  • A lot of bugs...

6
Weak points of NLP Development
  • Waterfall methods
  • Little real usage during development
  • think-a-year then code-a-week
  • Prolog, Lisp, Java
  • No research value
  • Large projects
  • Subspecialists
  • Lack of programmers

7
So, What are the Solutions?
  • Use commercially available technique
  • what can we learn from industry?
  • Global cooperation on code-level
  • join mainstream technology?
  • Adjust our working methods
  • how do we better interact with society/real
    usage?

8
Selected Course Topics
  • Lecture 2 - Java
  • Language and platform
  • Lecture 3 - object-oriented design
  • Basic concepts/techniques
  • Lecture 4 - design patterns
  • Extremely useful architectural techniques

9
History of OO-related Concepts (My View)
[Diagram: concepts placed by time of creation (60s, 70s, 80s, 90s) and by scale, from programming in the small to programming in the large. Concepts shown: component systems, iterative development/software evolution, system architecture, operating system design/scripts, components, subsystems/modules, OO frameworks, design patterns, objects, object-oriented design, web-centered development/open source, interfaces/APIs, contracts, protocols, code-level design, formal specification, idioms, high-level languages, declarative languages, extensive free libraries.]
This is just a sketch!
10
What is Java?
  • In short: C syntax, byte code,
    platform-neutral
  • High-level platform
  • Unix/C is more low-level
  • Easy access beyond the desktop...
  • Sub-platforms: J2SE, J2EE, J2ME, Jini
  • Buzz: security, connectivity, heterogeneous,
    multimedia, Swing, XML, beans, distributed

11
Why use Java for NLP?
  • High-quality information available.
  • Mature community: free code!
  • Rewrite for Java - not C
  • Utilities: sound, 3D graphics, XML
  • Integration with industry.
  • Joining the OO-movement.

12
What is software development?
[Diagram: two views of software development. One Project View (development in the small): a single project passing through analysis, specification, design, implementation, evaluation and testing. Evolutionary Process View (development in the large): a sequence of such projects.]
13
What is Design? (Part 1: The Ws)
  • What: software units, UI, interaction, language
  • When: role in the development model, time constraints
  • Who (by/for): product design, linguistics,
    hackers
  • Where: organization, legacy, single/multiple
    projects
  • Why: internal/external readers, publication

14
What is Design? (Part 2: Definition)
  • Theory of something (not everything)
  • Design is sold (not proven)
  • Defines the system, rather than realizes it
  • Partitioning of the system
  • Contracts for the interaction between the parts
  • Design phase result: a specification, e.g.
  • Interfaces/APIs (with comments)
  • documents
  • Conceptual prototype
  • Intertwined concepts: architecture, development,
    requirements analysis/capture, implementation
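A design result can take the form of an interface whose comments state the contract. The sketch below is a hypothetical Java example (the names `Lexicon`, `tagsOf` and `MapLexicon` are illustrative assumptions, not from any particular system): the interface defines a part of the system and its contract, while `MapLexicon` is only one possible realization of it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Design-level contract (hypothetical example): a lexicon maps word forms
 * to part-of-speech tags. It defines the part, not how lookup is realized.
 */
interface Lexicon {
    /**
     * Returns the tags recorded for a word form.
     * Contract: never returns null; an unknown word yields an empty set;
     * the returned set is read-only.
     */
    Set<String> tagsOf(String wordForm);

    /** True if and only if tagsOf(wordForm) is non-empty. */
    default boolean contains(String wordForm) {
        return !tagsOf(wordForm).isEmpty();
    }
}

/** One possible realization of the contract, kept separate from it. */
final class MapLexicon implements Lexicon {
    private final Map<String, Set<String>> entries = new HashMap<>();

    /** Adds (or replaces) an entry; the tag set is copied as read-only. */
    MapLexicon add(String wordForm, Set<String> tags) {
        entries.put(wordForm, Set.copyOf(tags));
        return this;
    }

    @Override
    public Set<String> tagsOf(String wordForm) {
        return entries.getOrDefault(wordForm, Collections.emptySet());
    }
}
```

Clients written against `Lexicon` keep working if the map-based class is later replaced by, say, a database-backed one; that separation is what lets a design be agreed on before it is realized.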

15
What is Object-Oriented Design?
Design in the Small
  • Object-Oriented Modeling
  • Finding the objects
  • Domain and artefact models: problem vs domain
  • Taxonomy and aggregation
  • Real-life mapping/customer satisfaction: UI
    prototypes and scenarios
  • Object Interface Design
  • Abstract Data Types (ADTs): data + operations,
    information hiding, ...
  • Object Roles: information, system, passive,
    active, ...
  • Objects as Machines: state + methods, orthogonal
    methods, ...
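An ADT bundles data with the operations on it and hides the representation behind those operations. A minimal sketch, assuming a word-frequency table as the example (the class `WordCounter` and its methods are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Abstract data type (hypothetical example): a word-frequency table.
 * Data (the map) plus operations (add, count, total, frequency);
 * the representation is private, so it can change without affecting clients.
 */
final class WordCounter {
    private final Map<String, Integer> counts = new HashMap<>(); // hidden representation
    private int total = 0; // invariant: equals the sum of all counts

    /** Records one occurrence of the word. */
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
        total++;
    }

    /** Number of occurrences of the word; 0 if it was never added. */
    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    /** Total number of recorded occurrences. */
    int total() {
        return total;
    }

    /** Relative frequency in [0, 1]; 0 for an empty table. */
    double frequency(String word) {
        return total == 0 ? 0.0 : (double) count(word) / total;
    }
}
```

Because `counts` and `total` are private, the class could switch to, for instance, a sorted or disk-based representation without changing any caller.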

16
What are Design Patterns?
Design in the Middle
  • Micro architecture - abstract designs.
  • not a concept - a catalogue!
  • Useful reuse of successful design.
  • Used: abstracts from experience.
  • Usable: includes coding details.

17
Why Design Patterns for NLP?
Design patterns are truly useful!!
  • Fill a gap between library modules and system
    architectures.
  • Patterns are open-ended, not straitjackets.
  • Codify the (OO) design expertise.
  • Open question: what do NLP design patterns look
    like?
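As a small illustration of a catalogued pattern applied to NLP, here is a sketch of the Strategy pattern from Gamma et al.: tokenization algorithms become interchangeable objects behind one interface. All class names are hypothetical, and the two regex-based tokenizers are deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Strategy interface: one interchangeable tokenization algorithm. */
interface TokenizationStrategy {
    List<String> tokenize(String text);
}

/** Splits on runs of whitespace only. */
final class WhitespaceTokenizer implements TokenizationStrategy {
    @Override
    public List<String> tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }
}

/** Separates word characters (Unicode-aware) from punctuation marks. */
final class PunctuationTokenizer implements TokenizationStrategy {
    private static final Pattern TOKEN =
            Pattern.compile("\\w+|[^\\w\\s]", Pattern.UNICODE_CHARACTER_CLASS);

    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}

/** Context: the analyzer is configured with a strategy instead of hard-coding one. */
final class Analyzer {
    private TokenizationStrategy strategy;

    Analyzer(TokenizationStrategy strategy) {
        this.strategy = strategy;
    }

    void setStrategy(TokenizationStrategy strategy) {
        this.strategy = strategy;
    }

    int countTokens(String text) {
        return strategy.tokenize(text).size();
    }
}
```

`new Analyzer(new WhitespaceTokenizer())` counts `Hello, world!` as two tokens; after `setStrategy(new PunctuationTokenizer())` the same text gives four. The analyzer itself never changes, which is the point of the pattern.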

18
This is a Project Course!
Use your own code - write code you would want to
use.
  • Not a basic programming course.
  • Creative ideas but concrete results.
  • Write reusable (generic enough) code; try to
    reuse when possible

19
Course Examination
  • Individual examination
  • cooperation is encouraged.
  • Two parts: term paper and project
  • Term paper (1/3): 75 hours
  • Project (2/3): 125 hours
  • Metrics for finished project:
  • Two iterations (with deliverables)
  • Well-designed code (document how/why)

20
Course Literature
  • Recommended readings
  • See the course PM on the GSLT course page
  • Recommended book to buy
  • Erich Gamma et al., Design Patterns: Elements of
    Reusable Object-Oriented Software, Addison-Wesley
    1994
  • Further readings
  • Stefan Sigfried, Understanding OO Software
    Engineering, IEEE Press 1996
  • Clemens Szyperski, Component Software: Beyond
    Object-Oriented Programming, Addison-Wesley 2001
  • www.javaworld.com

21
Related NLP activities
  • nlpFarm and openNLP
  • NLP OSS development and platform
  • GATE 2
  • tool-box for NLP processing
  • SVENSK
  • Swedish NLP platform
  • NLSR
  • NLP software registry (DFKI)

22
nlpFarm
  • An OSS Java software project at SourceForge.
  • Farmstead mission: a place where early research
    prototypes evolve into robust and useful open source.
  • practical work towards useful things
  • Global/Scandinavian cooperation?
  • Will nlpFarm work? It is an OpenNLP experiment
    sponsored by Vinnova

23
SVENSK
  • Language processing tool-box for Swedish.
  • Reuse of existing NLP components.
  • Based on the GATE architecture.
  • Its successor, Kaba, is for information access and
    refinement only.

24
GATE 2
  • GATE: document manager, GUI, components.
  • Installed at > 250 sites.
  • GATE 2 rewritten in Java
  • A platform for language engineering.
  • Broad range of packages, e.g.
  • gate.sgml, gate.swing, gate.email

25
This Lecture (2nd Hour)
  • 1st hour - Course introduction
  • 2nd hour - Experiences of NLPLAB
  • Evolutionary Process Model
  • Iterative Method
  • BirdQuest
  • TvGuide
  • nlpFarm

26
NLPLAB Projects of Today
[Diagram: current projects in two rows, applications (App: TvGuide, BirdQuest) and facility software (Facility: Quaks, JavaChart, PGP, ...; QUAC, DM, FS, TGEN, Guidia, ...; MOLINC, FUNs, JavaChart, ...), in three project groups of 2, 5 and 4 persons.]
Iterative, incremental with free evolution
(mixing bottom-up and top-down design)
27
Evolutionary Process Model for NLP/LE
[Diagram: interdependent parts of development (application artefacts, facility artefacts, artefact construction theory, language modeling) linked by arrows labelled p (possibilities) and n (needs).]
Multi-dimensional approach to NLP/LE development:
avoid one-sided approaches!
28
Issues in Evolutionary Design
  • Iterative and Incremental Design
  • Robust for change: formal revisions
  • Refactorings
  • Respect for legacy: both theory and code
  • ...but don't be a slave to it!
  • Free evolution of design
  • Mixed bottom-up and top-down design
  • Multiple-project approach
  • Use feedback (both pos. and neg.) seriously!
  • It is a bumpy ride!
  • ...sometimes improvements make it worse!

29
Application-Driven Dialogue System Development
30
Two Problems
  • Too much time is devoted to discussions on
    features of the system that are interesting but
    often rare and hard to realise
  • it is not easy to subdivide the work with design
    and implementation into manageable pieces when
    developing a dialogue system.

What does the incremental evolution path of a
dialogue application look like?
31
Development space for DM
[Diagram: three dimensions of the DM development space]
  • DM Framework Customisation: tools, framework templates, code patterns
  • DM Capabilities: sub-dialogue control, history, atomic request handling
  • DM Design: knowledge representation, modularisation, interfaces
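Two of the DM capabilities named on this slide, atomic request handling and history, can be sketched in a few lines. This is a hypothetical illustration, not the NLPLAB dialogue manager: each request type maps to one handler that answers in a single step, and every exchange is appended to a history that later turns could consult. Sub-dialogue control is not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Minimal dialogue manager sketch (hypothetical names): atomic request
 * handlers plus a dialogue history.
 */
final class DialogueManager {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();
    private final List<String> history = new ArrayList<>();

    /** Registers the handler for one request type, e.g. "lookup". */
    void register(String requestType, Function<String, String> handler) {
        handlers.put(requestType, handler);
    }

    /** Handles one request atomically and records the exchange in the history. */
    String handle(String requestType, String argument) {
        Function<String, String> handler = handlers.get(requestType);
        String answer = handler == null
                ? "Sorry, I cannot handle '" + requestType + "'."
                : handler.apply(argument);
        history.add(requestType + "(" + argument + ") -> " + answer);
        return answer;
    }

    /** Read-only view of the dialogue history, oldest exchange first. */
    List<String> history() {
        return Collections.unmodifiableList(history);
    }
}
```

Keeping each handler atomic makes it a natural unit for incremental development: a new request type is one new `register` call rather than a change to shared control code.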
32
BirdQuest: Two GUIs, Phase-Based Design
  • Bird encyclopaedia
  • Corpus with user questions
  • Dialogue systems framework

33
Client-Server Design for BirdQuest
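[Diagram: two client-server configurations; in both, the server code accesses the bird database via JDBC]
  • Stand-alone application: UI layout and UI feature code on the client; calls go over RMI to an RMI server running the server code
  • Web: UI layout in the browser; HTTP to a web server where a servlet holds the UI feature code and calls the server code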
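The point of this layout is that both front-ends share the same server code. A minimal sketch of that separation with hypothetical names; an in-memory map stands in for the JDBC-backed bird database, and the RMI and servlet plumbing is left out so the example runs on its own.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Shared "server code" interface (hypothetical). */
interface BirdService {
    Optional<String> describe(String birdName);
}

/** Stand-in for the JDBC-backed implementation; map keys are expected in lower case. */
final class InMemoryBirdService implements BirdService {
    private final Map<String, String> birds;

    InMemoryBirdService(Map<String, String> birds) {
        this.birds = Map.copyOf(birds);
    }

    @Override
    public Optional<String> describe(String birdName) {
        return Optional.ofNullable(birds.get(birdName.toLowerCase(Locale.ROOT)));
    }
}

/** Desktop front-end: plain-text answers (in BirdQuest the call would go via RMI). */
final class DesktopFrontEnd {
    private final BirdService service;

    DesktopFrontEnd(BirdService service) {
        this.service = service;
    }

    String answer(String birdName) {
        return service.describe(birdName).orElse("Unknown bird: " + birdName);
    }
}

/** Web front-end: the same answers as HTML (would run inside a servlet; no escaping here). */
final class WebFrontEnd {
    private final BirdService service;

    WebFrontEnd(BirdService service) {
        this.service = service;
    }

    String answerHtml(String birdName) {
        return service.describe(birdName)
                .map(text -> "<p>" + text + "</p>")
                .orElse("<p>Unknown bird: " + birdName + "</p>");
    }
}
```

Only the formatting differs between the two front-ends; database access and domain logic live once, behind `BirdService`.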
34
Phase-based NLP of BirdQuest
35
TvGuide Evolutionary Re-design
[Diagram: system development (round 1) of the application artefact and facility software, followed by evaluation and dialogue model re-design, leading to system development (round 2) of the application artefact and facility software.]
36
Encapsulation for TvGuide
[Diagram: (non-strict) layering with the layers application, subsystems, components, ?, framelets/tools and libraries, plus "KR?" (knowledge representation).]
37
Summing Up: Two Basic Design Dimensions
  • Problem division (horizontal design) splits the problem, e.g. into parsing, access and generation modules. Sign of success: high cohesion and low coupling between modules.
  • Abstraction/layering (vertical design) creates a language, e.g. an agenda built on a list built on an array.
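Vertical design can be illustrated with the agenda/list/array example from this slide: each layer uses only the vocabulary of the layer below and offers a new, higher-level vocabulary upwards. A sketch under the assumption that the agenda is a FIFO queue of pending tasks (for example chart-parser edges); the class names are hypothetical, and production code would simply use `java.util.ArrayDeque`.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

/** Layer 1: a growable list implemented on top of a plain array. */
final class SimpleList<T> {
    private Object[] items = new Object[4];
    private int size = 0;

    void add(T item) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // double the underlying array
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (T) items[index];
    }

    int size() {
        return size;
    }
}

/**
 * Layer 2: an agenda of pending tasks, expressed only in terms of SimpleList.
 * Clients speak the agenda "language" (schedule, next, isEmpty) and never
 * see arrays or indices.
 */
final class Agenda<T> {
    private final SimpleList<T> tasks = new SimpleList<>();
    private int nextIndex = 0; // tasks before this index are already processed (FIFO order)

    void schedule(T task) {
        tasks.add(task);
    }

    boolean isEmpty() {
        return nextIndex >= tasks.size();
    }

    T next() {
        if (isEmpty()) {
            throw new NoSuchElementException("agenda is empty");
        }
        return tasks.get(nextIndex++);
    }
}
```

Processed entries are never reclaimed in this sketch; the point is only that each layer adds a vocabulary and hides the one below.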
38
http://nlpfarm.sourceforge.net
  • Public web resource with open source
  • A place to work
  • Cooperation over time and place
  • Development support
  • Mostly facility software
  • Formal release system
  • Towards robust and useful code
  • Link between research and industrial products

39
Experiences from nlpFarm - Method
  • Separate application from facility
  • Different structure and methods
  • Interdependent artefacts: needs and possibilities
  • Variation of evolutionary approach, e.g.
  • Bottom-up vs top-down
  • Theory vs code
  • Background of personnel / type of result
  • Distinguish beginners from experts
  • Newbies have the creative eyes of a child
  • Experts should focus on hidden continuity work
  • Software experts should make the overall design
  • Don't work alone: find feedback

40
Experiences from nlpFarm - Design
[Diagram: facility software and application artefacts, each with its own design principles]
  • Facility software (library modules, framelets, and a kernel with an external API package, a 2nd layer, kernel packages, ...?): 1. Non-strict layering 2. Work bottom-up with real applications 3. Add code only 4. Design patterns in the kernel 5. Inheritance/taxonomy in the external layer
  • Application artefacts (old applications, new application): 1. Method is important 2. Focus on the possible 3. Look at the whole 4. Avoid duplication 5. Reuse from legacy
41
Experience from nlpFarm - Implementation
  • Inter-project conventions are hard to follow
  • Code conventions important for continuity
  • Project build support saves time and improves results
  • Version management is hard with beginners' code
  • Automatic testing is important
  • Context-independent unit tests for facility
    software
  • System tests for applications, with support for
    incremental evolution
  • Code quality is generally low and programming is
    time-consuming
  • Stay focused and make existing solutions a
    little better
  • There is no script layer where everything
    becomes easy
  • Software construction is inherently creative;
    every problem is unique. Don't kid
    yourself!
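A context-independent unit test exercises facility code without files, network or application state, so it can run automatically on every build. A hypothetical sketch: `Normalizer` is an invented facility class, and the hand-rolled runner stands in for a test framework such as JUnit to keep the example dependency-free.

```java
import java.util.Locale;

/** Facility code under test (hypothetical): normalizes whitespace and case. */
final class Normalizer {
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}

/** Context-independent unit tests: no files, network or application state. */
final class NormalizerTest {
    private static int failures = 0;

    private static void check(String name, String expected, String actual) {
        if (!expected.equals(actual)) {
            failures++;
            System.out.println("FAIL " + name + ": expected '" + expected + "' but got '" + actual + "'");
        }
    }

    /** Runs every test case and returns the number of failures. */
    static int runAll() {
        failures = 0;
        check("collapses whitespace", "a b c", Normalizer.normalize("a \t b\n\nc"));
        check("trims ends", "word", Normalizer.normalize("  word  "));
        check("lower-cases", "gate", Normalizer.normalize("GATE"));
        check("blank input", "", Normalizer.normalize("   "));
        return failures;
    }

    public static void main(String[] args) {
        int failed = runAll();
        System.out.println(failed == 0 ? "All tests passed" : failed + " test(s) failed");
        if (failed != 0) {
            System.exit(1); // non-zero exit lets a build script stop on failure
        }
    }
}
```

Because the tests depend on nothing but the class itself, they stay valid as the applications built on top of it evolve.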

42
Experiences from nlpFarm - Community
  • Too early?
  • Not all can be users or script fillers
  • Kernel of developers must exist (> 3?)
  • Projects/community are not important, but results
    are
  • Are linguists like programmers?
  • Will the Open Source/free software manifesto work
    outside Hackerdale?
  • Willingness to engage in the e-society for its
    own sake
  • What is the modern (90s) evolutionary society
    vision of NLP?
  • OpenNLP needs a vision like GNU, but still lacks
    one...
  • A talking thinking computer? Hm,...?

43
Summing Up
  • Java: the language for NLP?
  • It has kept its promise so far!
  • Java 1.5 is coming...
  • Higher-orderness/meta-programming is still a
    problem
  • The Java platform for NLP?
  • Better than promised in many ways
  • Example of well-handled software evolution
  • Many elegant designs
  • Still open: knowledge representation and
    mainstream technology?
  • XML in Java shows both possibilities and problems
  • XML is a format at a low layer in the formalism
    stack!
  • XML as a scripting language, e.g. the build tool Ant,
    shows the way?
  • W3C is an example of evolution of representation
    formats...
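Reading XML in Java needs nothing beyond the JDK's DOM parser (`javax.xml.parsers`). The sketch also illustrates the layering point: the parser only delivers a tree, while the meaning of `token` and `pos` belongs to a made-up annotation format invented for this example, i.e. to a higher layer of the formalism stack.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Reads word/tag pairs from a hypothetical token-annotation XML format. */
final class XmlTokens {
    static List<String> taggedTokens(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<String> result = new ArrayList<>();
            NodeList tokens = doc.getElementsByTagName("token"); // document order
            for (int i = 0; i < tokens.getLength(); i++) {
                Element token = (Element) tokens.item(i);
                // getTextContent() gives the word, getAttribute() the tag ("" if absent)
                result.add(token.getTextContent() + "/" + token.getAttribute("pos"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML input", e);
        }
    }
}
```

For untrusted input the parser factory should additionally be hardened against external entities; that is left out here to keep the sketch short.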