Achieving Adaptivity for OLAP-XML Federations
- Torben Bach Pedersen
- Aalborg University
- Joint work with Dennis Pedersen, TARGIT
Overview
- Background: OLAP-XML federations
- New challenges
  - XML data changes
  - Slow or unreliable XML sources
  - Schema changes in data sources
  - Other challenges
- Integration in TARGIT architecture
- Other applications of the techniques
- Conclusion and future work
- Related work
Data Warehousing and OLAP
- Multidimensional analysis: TARGIT Analysis
OLAP
- Good for complex ad hoc queries
- Simple, natural, graphical queries
- Fast pre-aggregation
- A number of problems with physical integration
  - Short-term and varying data needs
    - Population, product info, ...
  - Dynamic data
    - Stock quotes, competitor pricing, ...
  - Data with limited access
    - Competitor product info, public databases, ...
OLAP-XML Federations
- Traditional OLAP architecture
OLAP-XML Federations
- Logical integration of XML data
  - External dimensions
  - External measures
  - Data combined at query time
[Figure: Client on top of the Federation, which combines the XML sources and the Cube]
- Transparent for users
- Flexible: many XML sources
- Quick: up and running in a few minutes
- Data is always fresh
- Performance often comparable to physical integration
XPath Queries for Fetching XML
    <Books>
      <Book>
        <Title>1984</Title>
        <Author>Orwell</Author>
      </Book>
      <Book>
        <Title>Of Mice and Men</Title>
        <Author>Steinbeck</Author>
      </Book>
    </Books>
- XPath: /Books/Book[Author="Steinbeck"]/Title
- Dimension value: Of Mice and Men
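A minimal sketch of the lookup above, using Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. Because of that, the absolute path is written relative to the `<Books>` root. The helper name `dimension_values` is hypothetical; this is not the federation's own XPath engine.

```python
import xml.etree.ElementTree as ET

# The Books document from the slide.
BOOKS_XML = """
<Books>
  <Book><Title>1984</Title><Author>Orwell</Author></Book>
  <Book><Title>Of Mice and Men</Title><Author>Steinbeck</Author></Book>
</Books>
"""


def dimension_values(xml_text: str, path: str) -> list[str]:
    """Evaluate a path relative to the document root and return the text values."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(path)]


# Equivalent of /Books/Book[Author="Steinbeck"]/Title
print(dimension_values(BOOKS_XML, "Book[Author='Steinbeck']/Title"))
```

Each returned string becomes one value of the external dimension.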
Old and New TARGIT Architecture
New Challenges
- Our previous work focused on basic aspects
  - Flexibility
  - General performance
  - Implementation
- New: what can go wrong? Need for adaptivity
  - XML data changes
  - XML sources slow or unreliable
  - Schema changes (XML, OLAP, federation)
- We often have no control over the XML sources
- A solution is of broad interest: views over XML sources
XML Data Changes
- Basic federation
  - XML data is integrated at query time => XML data changes are handled automatically
  - However, XML data is cached for performance
  - A cache timeout value ensures fresh data (set manually or automatically)
  - Cache timeout of 0 => always fetch from the source
- Only a few current XML databases inform about changes
  - Xyleme allows users to subscribe to changes
  - Only the delta should be transferred
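A minimal sketch of a timeout-based XML cache, assuming a single timeout per cache and a caller-supplied fetch function; the class name `TimedXmlCache` and its methods are hypothetical. A timeout of 0 makes every lookup go to the source, matching the slide.

```python
import time
from typing import Callable


class TimedXmlCache:
    """Caches fetched XML results; entries older than `timeout` seconds are refetched."""

    def __init__(self, fetch: Callable[[str], str], timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # queries the XML source
        self._timeout = timeout  # 0 => never serve from cache
        self._clock = clock      # injectable for testing
        self._entries: dict[str, tuple[float, str]] = {}  # query -> (fetched_at, result)

    def get(self, query: str) -> str:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._timeout:
            return entry[1]          # still fresh: serve from cache
        result = self._fetch(query)  # missing, expired, or timeout 0: go to the source
        self._entries[query] = (now, result)
        return result
```

Choosing the timeout trades freshness against load on the source; setting it automatically (as the slide mentions) would replace the constant with an estimate.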
ICE: Information and Content Exchange
- Protocol proposed by W3C for automatically informing about and requesting changes
  - Supported by major vendors
- Push: subscribe to changes and keep the cache up to date
- Pull: request changes from the source at query time
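The push and pull modes can be sketched with a source that numbers its changes and a cache that remembers the last change it has seen, so only the delta is transferred. This is a toy model under that assumption, not the ICE message format; `ChangeLogSource` and `DeltaCache` are hypothetical names.

```python
from typing import Optional

Change = tuple[int, str, str]  # (version, key, new value)


class ChangeLogSource:
    """Hypothetical XML source that numbers every change it makes."""

    def __init__(self) -> None:
        self.log: list[Change] = []
        self.subscribers: list["DeltaCache"] = []

    def update(self, key: str, value: str) -> None:
        change = (len(self.log) + 1, key, value)
        self.log.append(change)
        for cache in self.subscribers:  # push: send the delta to subscribers
            cache.apply([change])

    def changes_since(self, version: int) -> list[Change]:
        return [c for c in self.log if c[0] > version]  # only the delta


class DeltaCache:
    def __init__(self, source: ChangeLogSource, push: bool) -> None:
        self.source = source
        self.version = 0  # last change applied to this cache
        self.items: dict[str, str] = {}
        if push:
            source.subscribers.append(self)

    def apply(self, delta: list[Change]) -> None:
        for version, key, value in delta:
            self.items[key] = value
            self.version = max(self.version, version)

    def query(self, key: str, pull: bool = False) -> Optional[str]:
        if pull:  # pull: ask the source for changes at query time
            self.apply(self.source.changes_since(self.version))
        return self.items.get(key)
```

Push keeps the cache current at the cost of traffic for data nobody queries; pull only pays at query time but adds a round trip to every query.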
Slow and Unreliable XML Sources
- Causes: overload, maintenance, hardware breakdown, attacks
  - Often we have no influence on this
- Incremental presentation for the user
- What if the source is too slow or does not reply at all?
  - Inform the user that the system is not working?
  - Specification of alternative sources
    - Several queries per external dimension/measure
    - Increased fault tolerance, also better performance
Slow and Unreliable XML Sources
- Start several queries and use the fastest
  - Always uses the fastest, but heavy load on the sources
  - Use the first response time as an indicator of the total time
- Start one query at a time
  - Minimal load on the sources, but slower
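The two strategies can be sketched with `asyncio`: one races all alternative sources and cancels the losers, the other tries them in order with a per-source timeout. `Fetch` stands for a hypothetical zero-argument coroutine function that queries one source; failure handling is simplified.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical: a zero-argument function that runs one query against one source.
Fetch = Callable[[], Awaitable[str]]


async def fetch_fastest(fetches: list[Fetch]) -> str:
    """Start every query at once and return the first successful result."""
    tasks = [asyncio.create_task(f()) for f in fetches]
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                return await next_done
            except Exception:
                continue  # this source failed; wait for the next one to finish
        raise RuntimeError("all sources failed")
    finally:
        for t in tasks:
            t.cancel()  # stop the slower queries to reduce load on the sources
        await asyncio.gather(*tasks, return_exceptions=True)


async def fetch_one_at_a_time(fetches: list[Fetch], timeout: float) -> str:
    """Try sources in order; minimal load, but a slow source costs up to `timeout`."""
    for f in fetches:
        try:
            return await asyncio.wait_for(f(), timeout)
        except Exception:
            continue  # timed out or failed: move on to the next source
    raise RuntimeError("all sources failed")
```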
Slow and Unreliable XML Sources
- Are alternative sources of lower quality better than no data?
- Alternatives
  - Expired cache data
  - Google, Xyleme, The Wayback Machine
  - Backup disk, tape
  - Etc.

Source            Speed       Quality
Local cache       Fastest     Fresh
Original source   Fast?       Freshest
Expired cache     Fastest     Old
Backup source     Fast/slow   Very old
Slow and Unreliable XML Sources
- In practice?
  - Sources with equal priority chosen at random
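A minimal sketch of ordering alternative sources, assuming each source carries a numeric priority (lower is preferred, for example reflecting the quality column in the table above) and that ties are broken by shuffling. `Source` and `try_order` are hypothetical names.

```python
import random
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Source:
    name: str
    priority: int  # hypothetical: 1 = preferred, larger = fallback


def try_order(sources: list[Source], rng: random.Random) -> list[Source]:
    """Order sources by priority; sources with equal priority are shuffled."""
    ordered: list[Source] = []
    by_priority = sorted(sources, key=lambda s: s.priority)
    for _, group in groupby(by_priority, key=lambda s: s.priority):
        members = list(group)
        rng.shuffle(members)  # equal priority: choose at random
        ordered.extend(members)
    return ordered
```

Randomizing within a priority level spreads the load over equivalent sources instead of always hitting the first one listed.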
Result: Algorithm for Fetching XML Data
Experiments
- 1st experiment: fetching a 137 KB dimension
  - Start 8 queries; when the first 3 respond, cancel the last 5; when the fastest query finishes, cancel the remaining 2
  - A fast reply is a good indication of overall speed
- 2nd experiment: search the local cache, then the Google cache
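The staged strategy of the first experiment can be simulated with `asyncio`. The sketch assumes each query is modeled by a time to first reply and a transfer time, with unique source names; it ignores failures and is not the implementation used in the experiment.

```python
import asyncio


async def query(name: str, first_reply: float, transfer: float,
                replies: asyncio.Queue) -> str:
    """Simulated query: first reply after `first_reply` s, done `transfer` s later."""
    await asyncio.sleep(first_reply)
    await replies.put(name)  # the source has started answering
    await asyncio.sleep(transfer)
    return name


async def staged_fetch(sources: list[tuple[str, float, float]], keep: int = 3) -> str:
    replies: asyncio.Queue = asyncio.Queue()
    tasks = {name: asyncio.create_task(query(name, fr, tr, replies))
             for name, fr, tr in sources}
    keep = min(keep, len(tasks))
    # Stage 1: wait for the first `keep` replies, then cancel the other queries.
    first = [await replies.get() for _ in range(keep)]
    for name, task in tasks.items():
        if name not in first:
            task.cancel()
    # Stage 2: the first surviving query to finish wins; cancel the rest.
    done, pending = await asyncio.wait([tasks[n] for n in first],
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*tasks.values(), return_exceptions=True)
    return done.pop().result()
```

Keeping three candidates after the first replies hedges against a source that answers quickly but transfers slowly, while still freeing five of the eight sources early.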
Schema Changes in XML Sources
- How to synchronize XML views after a schema change? (solution described in a separate paper)
[Figure: two alternative schemas for the same Bibliography data (elements Bibliography, Book, Author, AName, Title, Publisher, PName, Price) and the view /Bibliography/Author[AName="Orwell"]/Book/Title]
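The problem can be illustrated with two hypothetical documents that use the element names from the figure: the view matches an author-centric layout but silently returns nothing once the data is book-centric. The sketch uses Python's `xml.etree.ElementTree` (limited XPath, paths relative to the root) and only shows detection plus a hand-rewritten view; it is not the synchronization method of the separate paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; the exact tree structures are assumptions.
AUTHOR_CENTRIC = """<Bibliography>
  <Author><AName>Orwell</AName><Book><Title>1984</Title></Book></Author>
</Bibliography>"""

BOOK_CENTRIC = """<Bibliography>
  <Book><Title>1984</Title><Author><AName>Orwell</AName></Author></Book>
</Bibliography>"""

# /Bibliography/Author[AName="Orwell"]/Book/Title, relative to the root element.
VIEW = "Author[AName='Orwell']/Book/Title"
# Hand-rewritten equivalent for the book-centric layout ('..' selects the parent).
REWRITTEN_VIEW = "Book/Author[AName='Orwell']/../Title"


def evaluate(xml_text: str, path: str) -> list[str]:
    return [el.text for el in ET.fromstring(xml_text).findall(path)]


def view_is_broken(xml_text: str, path: str) -> bool:
    """An empty result is one simple (not conclusive) signal of a schema change."""
    return not evaluate(xml_text, path)
```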
Additional Challenges
- Changes to the federation schema
  - Cache may be invalidated
  - Discard affected cache results (unproblematic)
- OLAP data changes
  - Cache may be invalidated
  - Less frequent than XML data changes => the cache will often have expired anyway
- OLAP schema changes
  - Federated schema may be invalidated
  - Rare and easy to detect (and correct)
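Discarding affected cache results can be sketched by recording, for each cached query result, which external dimensions and measures it used. The class and method names below are hypothetical.

```python
class FederationCache:
    """Cached query results plus the external dimensions/measures each one used."""

    def __init__(self) -> None:
        self._results: dict[str, object] = {}
        self._deps: dict[str, set[str]] = {}

    def put(self, query: str, result: object, uses: set[str]) -> None:
        self._results[query] = result
        self._deps[query] = set(uses)

    def get(self, query: str) -> object:
        return self._results.get(query)

    def invalidate(self, changed: set[str]) -> list[str]:
        """Discard every cached result that used a changed dimension or measure."""
        stale = [q for q, deps in self._deps.items() if deps & changed]
        for q in stale:
            del self._results[q]
            del self._deps[q]
        return stale
```

Unaffected results stay cached, which is why discarding is unproblematic: the cost is only a refetch for the queries that touched the changed part of the schema.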
Integrating Techniques: Architecture
Integrating Techniques: Query Processing
- Query Evaluator splits the query into XML/OLAP parts and determines the query plan based on cost
- Execution Engine coordinates and executes the plan
- Cache Manager maintains the cache, e.g., through ICE
- XML Component Interface fetches XML data and chooses between the available XML sources (Algorithm 1)
- View Synchronizer handles schema changes
- Metadata Manager manages info about external dimensions and measures, and XML component characteristics
Other Applications
- All XPath-based views on XML data
  - Links to parts of XML documents
    - Web pages
    - Documents (DocBook)
    - Software applications
    - and many more
- Automatic recreation of broken links
- Increased fault tolerance and performance using alternative sources
Conclusion and Future Work
- Operational problems in OLAP-XML federations
  - XML data changes
  - Slow and unreliable XML sources
    - Using several sources (Algorithm 1)
    - Experiment with Algorithm 1
- Techniques integrated into federation architecture
- Schema evolution and other challenges
- Future work
  - TARGIT implementation and testing
  - Using techniques in other applications
Related Work
- Data changes in XML/semistructured documents
  - Xyleme, Zhuge
- Schema changes in scientific documents
  - Not XML
- Adaptive/dynamic query optimization
  - Telegraph project
  - We adapt once per source, rather than per tuple
- Each of these ignores one or more of: OLAP-XML concepts, schema changes, slow and unreliable sources
- Own previous OLAP-XML work is not adaptive