Title: Using OGSA-DAI in a commercial environment
Terry Sloan, EPCC
Telephone: +44 131 650 5155
Email: tsloan@epcc.ed.ac.uk
Overview
- FirstDIG 
- INWA 
- Outstanding issues raised by these projects 
First Data Investigation on the Grid
FirstDIG: http://www.epcc.ed.ac.uk/firstdig/
Motivation
- Few UK e-Science projects involve service companies such as First plc
- First plc
  - Operates worldwide in a variety of transport sectors
  - Over 10,000 vehicles in the UK, 23% of the market
  - The UK's largest operator
- The challenge for First
  - Meeting the needs of the travelling public whilst making money
  - Data integration and mining may help, but the data sources are numerous and fragmented
Data Sources in the Bus Industry
- Many different kinds of data are involved in running a bus company
  - Mileage, revenue, customer contact, schedules, fuel consumption, vehicle maintenance, routes
- Many means of collecting data
  - Data entered manually at the depot
  - Data collected on buses from ticket machines
  - Data collected on buses from GPS systems
    - The GPS system notes when a bus passes through a predefined footprint and records the time at which this happens
Answering Business Questions
- Want to combine data from more than one source
  - Complaints versus lateness
  - Revenue versus lost miles
  - Complaints versus lost miles
- Want data aggregated in some way
  - By service
  - By day
- Want to consider subsets of the data
  - e.g. weekdays only
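The three requirements above (combine two sources, aggregate by service or by day, restrict to weekdays) can be sketched in plain Python once records from each source have been fetched. This is a minimal illustration, not FirstDIG code: it assumes each source yields hypothetical `(service, day, value)` tuples, and the service names and dates in the example are invented.

```python
from collections import defaultdict
from datetime import date


def by_service(service, day):
    """Aggregation key: one bucket per bus service."""
    return service


def by_day(service, day):
    """Aggregation key: one bucket per calendar day."""
    return day


def aggregate(records, key, weekdays_only=True):
    """Sum hypothetical (service, day, value) records into buckets chosen by `key`.

    Weekend records (Saturday=5, Sunday=6) are skipped when weekdays_only is True.
    """
    totals = defaultdict(float)
    for service, day, value in records:
        if weekdays_only and day.weekday() >= 5:
            continue
        totals[key(service, day)] += value
    return dict(totals)


def combine(complaints, lost_miles, key):
    """Pair complaint totals with lost-mile totals for every bucket seen in either source.

    A bucket missing from one source gets 0.0 for that source.
    """
    c = aggregate(complaints, key)
    m = aggregate(lost_miles, key)
    return {k: (c.get(k, 0.0), m.get(k, 0.0)) for k in sorted(set(c) | set(m))}


# Invented sample data: (service, day, complaint count) and (service, day, lost miles).
# 2004-03-06 is a Saturday, so those rows are excluded.
complaints = [("52", date(2004, 3, 1), 1), ("52", date(2004, 3, 6), 1), ("X1", date(2004, 3, 2), 1)]
lost_miles = [("52", date(2004, 3, 1), 4.5), ("X1", date(2004, 3, 2), 2.0), ("X1", date(2004, 3, 6), 10.0)]
print(combine(complaints, lost_miles, by_service))  # {'52': (1.0, 4.5), 'X1': (1.0, 2.0)}
```

Passing the key as a function keeps "by service" and "by day" as one code path; swapping `by_service` for `by_day` regroups the same weekday-filtered data.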
Disparate Databases
- Data is typically stored in disparate databases
  - There are various reasons for this, e.g. incremental construction of systems
  - Not a problem for day-to-day running and querying, but it introduces challenges for data analysis
- Systems introduced at different times
  - Different database engines
  - Different front-ends
  - Different operating systems
  - Different physical locations
  - Different ways of representing data
- These issues are NOT unique to buses
OGSA-DAI
- OGSA-DAI
  - Open Grid Services Architecture Data Access and Integration
  - Potentially provides a solution
  - Needs business users in order to make the transition from science to commerce
- Grid middleware
  - Assists with the access and integration of data from separate data sources via the Grid
  - Represents databases as Grid Services
  - Enables secure access from other machines
FirstDIG Achievements
- Deployment at First South Yorkshire
- Combined two databases to answer real business questions
  - The Customer Contact System
    - Microsoft Access
    - Information on customer complaints, e.g. time, service, nature
  - The Mileage database
    - dBASE IV
    - Information on bus mileage, e.g. lost miles
- Produced a generic Grid Data Service Browser
  - SQL access, including joins across the databases
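What a single SQL join across two separate databases looks like from the client's side can be sketched with SQLite's `ATTACH DATABASE`. This is an analogy only: it does not use OGSA-DAI, and the slides do not describe how the Grid Data Service Browser executes its joins. The two attached in-memory databases stand in for the Access and dBASE IV sources, and all table and column names are hypothetical.

```python
import sqlite3


def open_two_databases():
    """Return one connection with two separate in-memory databases attached.

    `contact` stands in for the Customer Contact System and `mileage` for the
    Mileage database; the table layouts are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS contact")
    conn.execute("ATTACH DATABASE ':memory:' AS mileage")
    conn.execute("CREATE TABLE contact.complaints (service TEXT, day TEXT, nature TEXT)")
    conn.execute("CREATE TABLE mileage.lost_miles (service TEXT, day TEXT, miles REAL)")
    return conn


# One SQL statement whose join spans both databases.
# Assumes at most one lost_miles row per (service, day).
COMPLAINTS_VS_LOST_MILES = """
    SELECT c.service, c.day, COUNT(*) AS complaints, m.miles
    FROM contact.complaints AS c
    JOIN mileage.lost_miles AS m
      ON c.service = m.service AND c.day = m.day
    GROUP BY c.service, c.day, m.miles
    ORDER BY c.service, c.day
"""


def complaints_vs_lost_miles(conn):
    """Run the cross-database join and return its rows."""
    return conn.execute(COMPLAINTS_VS_LOST_MILES).fetchall()


# Invented sample rows.
conn = open_two_databases()
conn.executemany(
    "INSERT INTO contact.complaints VALUES (?, ?, ?)",
    [("52", "2004-03-01", "late"), ("52", "2004-03-01", "driver"), ("X1", "2004-03-02", "late")],
)
conn.executemany(
    "INSERT INTO mileage.lost_miles VALUES (?, ?, ?)",
    [("52", "2004-03-01", 4.5), ("X1", "2004-03-02", 2.0)],
)
print(complaints_vs_lost_miles(conn))  # [('52', '2004-03-01', 2, 4.5), ('X1', '2004-03-02', 1, 2.0)]
```

The point of the analogy is that the user writes one query and the schema prefixes (`contact.`, `mileage.`) route each table reference to a different database.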
[Figure: First Grid Data Service Browser]
Informing Business and Regional Policy: Grid-enabled fusion of global data and local knowledge
INWA: http://www.epcc.ed.ac.uk/inwa/
INWA
- An e-Social Science demonstrator
- Demonstrates how grid technologies can improve business
- Combines private and public data sources
  - Finance and telecommunications
- Uses many grid technologies
  - TOG (Transfer-queue Over Globus) from the Sun Data and Compute Grids project provides access to a remote HPC resource
  - OGSA-DAI provides access control and discovery of distributed heterogeneous data resources
  - The FirstDIG Grid Data Service Browser provides SQL access to OGSA-DAI-enabled resources
  - Globus Toolkit 2 and 3
INWA Grid Infrastructure
[Figure: INWA Grid infrastructure diagram with components User@Curtin, User@Edinburgh, FirstDIG (two instances), Grid Engine, Bank, Telco, TOG, Globus Grid, Curtin, Bank data and Telco data]
References
- EPCC: http://www.epcc.ed.ac.uk/
- FirstDIG: http://www.epcc.ed.ac.uk/firstdig/
- OGSA-DAI: http://www.ogsadai.org.uk
- INWA: http://www.epcc.ed.ac.uk/inwa
- Sun Data and Compute Grids: http://www.epcc.ed.ac.uk/sungrid/
- Transfer-queue Over Globus (TOG): http://gridengine.sunsource.net/project/gridengine/tog.html
Outstanding issues raised by FirstDIG and INWA
Outstanding Issues: Usability
- OGSA-DAI is middleware; the client toolkit helps
- Incorporating the demo First browser is somewhat helpful
- But what is really wanted is:
  - Interfaces to real data analysis and DBMS packages, e.g. SPSS
    - Otherwise users could end up building applications that replicate these, e.g. the First Grid Data Service Browser
  - The ability to point Access, Excel, etc. at a grid data source and examine it
Outstanding Issues: Data
- CSV (comma-separated value) data sources
  - are common, but current JDBC-ODBC drivers do not have sufficient functionality (NOT an OGSA-DAI issue per se)
  - No support for BIT type fields
  - Or for others, e.g. BOOLEAN, BINARY, etc.
- Certain characters (e.g. &, >) are not handled by the OGSA-DAI XML parser
  - Company names often have & in them
- Dates from certain sources are not handled properly
  - The First Grid Data Service has to handle this internally
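One possible client-side workaround for the character problem (not necessarily what FirstDIG did) is to escape SQL text before embedding it in an XML request. The sketch below uses the standard library's `xml.sax.saxutils.escape`, which replaces `&`, `<` and `>` with entity references. The `<request><sql>` element names are hypothetical and are not the OGSA-DAI request format. Strictly, XML only requires `&` and `<` to be escaped in character data, but escaping `>` as well is harmless.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def sql_request_xml(sql):
    """Embed an SQL statement in a hypothetical <request><sql>...</sql></request> document.

    escape() turns & into &amp;, < into &lt; and > into &gt;, so a company name
    such as 'Smith & Sons' no longer breaks the XML.
    """
    return "<request><sql>" + escape(sql) + "</sql></request>"


def extract_sql(xml_text):
    """Parse the request and return the SQL text with entities decoded again."""
    return ET.fromstring(xml_text).find("sql").text


def is_well_formed(xml_text):
    """Return True if the XML parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Invented query containing both problem characters.
query = "SELECT * FROM customers WHERE name = 'Smith & Sons' AND lost_miles > 3"
print(sql_request_xml(query))
# <request><sql>SELECT * FROM customers WHERE name = 'Smith &amp; Sons' AND lost_miles &gt; 3</sql></request>
print(is_well_formed("<request><sql>" + query + "</sql></request>"))  # False: bare '&' is rejected
print(extract_sql(sql_request_xml(query)) == query)  # True: the escaped query round-trips unchanged
```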
Outstanding Issues: Miscellaneous
- Security
  - The rolemap file is not encrypted
  - If one GDS (Grid Data Service) accesses another GDS, the user's security credentials are not passed on, so the access fails
- Installation and testing
  - Install and set-up
    - Well explained, but still a fair amount of user effort involved
  - Lack of an example OGSA-DAI site to point at to test that your OGSA-DAI installation works
- Large result sets
  - The JVM heap size can be increased, but this is not scalable
  - This occurred with most datasets
- Integration
  - DQP (distributed query processing) is a start (Linux, OQL)
  - Why use OGSA-DAI?
    - Easysoft etc.
    - http://www.easysoft.com/products/2001/main.phtml
Why use OGSA-DAI?
An RDBMS engine that appears to client apps as a fully conformant ODBC 3.5 data source. It can be used to provide real-time, heterogeneous access to multiple target data sources.