Title: Dickson K.W. Chiu
1. A Script Language for Generating Internet-bots
- Dickson K.W. Chiu
- Department of Computer Science and Engineering
- Chinese University of Hong Kong
- Shatin, Hong Kong
- kwchiu_at_ieee.org
2. Agenda
- Introduction and Motivation
- WebScript Operating Environment
- The WebScript Language
- Example Application
- Conclusion and Further Work
3. Motivation of WebScript
- Most web services and agents are available only through manual web pages (e.g. online ordering)
- Need human attention
- Long delay impairing E-commerce
- Motivated by scripts in terminal emulation programs (Telix/Procomm)
- Generic tool for automating web interactions
- Also useful for casual end-users, e.g. get a stock price into a personal db from a web page
4. WebScript Features
- Minimal core language
- Complete set of primitives for responding to HTML forms
- Information extraction from web pages based on pattern-matching
- Interfacing to back-end databases or storing data to files
- Raising exceptions and alerting
5. Architecture and Operating Environment
- Translator (instead of interpreter)
- Perl or Java target language
- Stand-alone utility at client
- Server-side utility: script supplied by client or from repository
- Translator service for thin clients (translated code executed at client-side)
- Programming productivity tool
- Part of complex information system
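To illustrate the translator approach (generating target code rather than interpreting the script), here is a minimal sketch in Python. It is not the actual WebScript translator. It maps three statement forms that appear in the example scripts (URL, Timeout, Expect title) to lines of target source. The rule table and the generated helper names `fetch`, `set_timeout` and `expect_title` are hypothetical, and Python stands in for the Perl or Java targets named above.

```python
import re

# Hypothetical statement patterns and target-code templates (assumptions for illustration).
RULES = [
    (re.compile(r"^URL\s+(\S+)$", re.I), "page = fetch({0!r})"),
    (re.compile(r"^Timeout\s+(\d+)$", re.I), "set_timeout({0})"),
    (re.compile(r"^Expect\s+title\s+(.+)$", re.I), "expect_title(page, {0!r})"),
]


def translate(script: str) -> str:
    """Translate each recognised statement into one line of target source."""
    out = []
    for lineno, raw in enumerate(script.splitlines(), start=1):
        stmt = raw.strip()
        if not stmt:
            continue  # skip blank lines
        for pattern, template in RULES:
            m = pattern.match(stmt)
            if m:
                out.append(template.format(*m.groups()))
                break
        else:
            # No rule matched: report the offending line instead of guessing.
            raise SyntaxError(f"line {lineno}: unrecognised statement: {stmt}")
    return "\n".join(out)
```

Because the output is ordinary source text, it could be run at the client, on a server, or shipped to a thin client, which is the flexibility the bullets above describe.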
6. The WebScript Language Mechanism
- Based on HTML features
- Automates HTTP messages
- Simulates a user browsing a target web page, entering information and pressing buttons
- Carries out delegated actions and/or extracts relevant information from pages
7. Basic Language Constructs
- Variables / Parameters
- String type and structured type
- Structured type based on db table / class definition
- Simple control flow primitives
- Perl expressions and functions
- Subroutines
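The idea of a structured variable whose fields come from a db table can be sketched as follows. This is an illustrative Python analogue, not WebScript itself: `sqlite3` stands in for the host database, and the helper names `declare_from_table` and `fetch_into` are hypothetical. The table and column names are borrowed from the first example script.

```python
import sqlite3


def declare_from_table(conn: sqlite3.Connection, table: str) -> dict:
    """Build an empty structured variable whose fields mirror the table's columns."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    # The table name is assumed to be trusted (it is interpolated directly).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return {name: None for name in cols}


def fetch_into(cursor: sqlite3.Cursor, var: dict) -> bool:
    """Copy the next result row into the structured variable; False when exhausted."""
    row = cursor.fetchone()
    if row is None:
        return False
    for name, value in zip(list(var), row):
        var[name] = value
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrars (Reg_name TEXT, URL TEXT, price REAL)")
conn.execute("INSERT INTO registrars VALUES ('A', 'http://example.com/', 35.0)")
r = declare_from_table(conn, "registrars")
cursor = conn.execute("SELECT * FROM registrars")
```

A loop such as `while fetch_into(cursor, r): ...` then plays the role of advancing a cursor over the query result one tuple at a time.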
8. Interfacing and Information Extraction
- Connect to db (ODBC, MySQL, Postgres)
- Send db statements (SQL) to obtain results (with cursor)
- Insert a tuple / object from a structured variable
- Download URL for processing
- Save to file or as objects in host db
- Extract information by matching regular expressions
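Extraction by pattern matching can be illustrated with a short Python sketch. It assumes, as the example scripts suggest, that a value is located by a regular expression searched between an "after" marker and a "before" marker. Treating the two markers as literal strings is an assumption, and `extract_first` is a hypothetical helper name.

```python
import re
from typing import Optional


def extract_first(page: str, like: str, after: str, before: str) -> Optional[str]:
    """Return the first match of `like` found between `after` and `before`, else None."""
    start = page.find(after)
    if start == -1:
        return None  # leading marker missing: the page layout may have changed
    start += len(after)
    end = page.find(before, start)
    if end == -1:
        return None  # trailing marker missing
    m = re.search(like, page[start:end])
    return m.group(0) if m else None


page = "<td>Price: US$ 35.00 per year</td>"
newprice = extract_first(page, r"[0-9]+\.?[0-9]*", "US$", "per year")
```

Returning `None` when a marker or the pattern is missing gives the caller a simple way to detect that the page has changed and to raise an exception.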
9. HTML Form Dialogue
- From script variables and expressions, fill in fields, select check-boxes / radio-buttons / pop-up lists, etc.
- Press buttons
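Filling in a form and pressing a button amounts to sending the encoded field values to the form's action URL. The sketch below shows one way to build such a request with the Python standard library. It only constructs the request and does not send it. The URL, the field names and the helper name `build_form_post` are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(action_url: str, fields: dict) -> Request:
    """Encode form fields as application/x-www-form-urlencoded in a POST request."""
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# The pressed button is submitted as one more name/value pair.
req = build_form_post(
    "https://example.com/cgi-bin/search.cgi",
    {"Dname": "example.org", "Search": "Search"},
)
```

Passing `req` to `urllib.request.urlopen` would actually submit the form; that step is left out so the sketch runs without network access.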
10. Example 1: Database-driven script for checking Registration Price
    Webscript CheckDomainReg
        DBconnect (h, MySQL, localhost, OrderClerk, pwd, services)
        /* registrars in a table in the RDBMS */
        Declare r h.registrars
        Declare newprice
        DBcommand h select * from registrars result r continue q1
        Timeout 5000
        After retry 5 Raise
        On error retry Checkpoint 1
        While (r.Reg_name <> NULL)
            /* while there's another registrar */
            Checkpoint 1
            URL r.URL
            Expect title r.title Raise page_changed
            Extract first like [0-9]+\.?[0-9]* after r.pattern1 before r.pattern2 to newprice
            If (newprice = NULL)
                Raise page_changed
            If newprice <> h.price
                Dbcommand h update registrars set date_changed = curdate(), price = newprice
            DBcontinue q1 result r /* get next registrar */
        Dbdisconnect h
        Return
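The error-handling statements in this script ("On error retry Checkpoint 1", "After retry 5 Raise") can be read as a bounded retry loop. The Python sketch below shows one possible reading, in which a failing step is restarted from its checkpoint up to five times before the error is raised to the caller. That interpretation of the retry count is an assumption, the Timeout statement is not modelled, and the names `PageChanged` and `run_from_checkpoint` are hypothetical.

```python
class PageChanged(Exception):
    """Hypothetical stand-in for the script's page_changed exception."""


def run_from_checkpoint(step, max_retries: int = 5):
    """Run `step`; on error restart it, and re-raise once `max_retries` retries are used up."""
    retries = 0
    while True:
        try:
            return step()
        except Exception:
            if retries >= max_retries:
                raise  # retries exhausted: propagate the last error
            retries += 1


attempts = []


def flaky_step():
    """Fails twice (e.g. a transient page error), then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise PageChanged("unexpected title")
    return "ok"


result = run_from_checkpoint(flaky_step)
```

Keeping the retry policy outside the step mirrors the script, where the policy is declared once and applies to everything after the checkpoint.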
12. Example 2: Online Domain Name Registration

    Webscript regdomainame (n string)
        /* input form number */
        DBconnect (h, MySQL, localhost, OrderClerk, pwd, services)
        /* order_form in a table in the RDBMS */
        Declare o h.order_form
        DBcommand h select * from order_form where order_num = n result o
        Timeout 5000
        After retry 5 raise
        On error retry Checkpoint 1
        Checkpoint 1
        URL http://www.niceg.com/registrate.html
        Expect title regist
        Form 1 post https://www.nicreg.com/cgi-bin/domain_search.cgi
        Fillform Dname order_form.domain
        Button Search
        Expect page available raise domain_not_available
        Form 1 post https://www.nicreg.com/cgi-bin/registrate.cgi
        Fillform Dname default
        Button Register
        On error retry no
        Form 1 post https://www.nicreg.com/cgi-bin/autoccreg.cgi
        Fillform Company order_form.company
        Fillform Address order_form.address
        ...
        Button Submit for Processing
        Expect page Successful Registration
        DBdisconnect h
        Return
13. Concluding Remarks
- Part of the ADOME-WFMS project
- Flexible script language, WebScript, for generating Internet-bots
- Simple, application-oriented
- Tailor-made primitives for db access, web dialogue and exception handling
- Suitable for E-commerce environments
- Easier to develop, understand, debug and maintain
14. Further Work
- Scripting with XML pages
- Java code generation
- Script development tools
- Recording, monitoring and debugging tools
- Displaying form and db fields for drag-and-drop
- Script development methodology