Title: Query Optimization Techniques and Performance Issues in XML and Parallel databases
1Query Optimization Techniques and Performance
Issues in XML and Parallel databases
- CSE 8330
- Instructor: Dr. Margaret H. Dunham
- Presenter: Akshaya Aradhya
2Topics to be covered
- Introduction
- Query optimization in XML databases
- Query optimization in Parallel databases
- Comparison
- Conclusion and Future work
- Bibliography
3Introduction
- XML is an emerging standard for exchanging, storing and representing data.
- Data encoded in XML conforms to a DTD (Document Type Definition).
- XML structure is intuitive, and its tree-like structure makes it easy to interpret.
4Introduction
- The XML data model is much more complex than the relational model, which yields a larger search space for optimizing XML queries.
- To optimize an XML query, we first need to study equivalence with respect to both the data and the query, so that query equivalence is established before the query is transformed.
5Introduction
- XML query optimization techniques can be divided into groups based on content and structure.
- Content-based query optimization: based on statistics or classification.
- Query execution can be improved by classifying the elements and transforming the query based on constraints obtained from the data.
6Introduction
- Parallel database systems are applied in decision support systems and a wide range of modern database applications.
- The machine architecture of parallel database systems is based on a parallel dataflow architecture, which makes use of a conventional, shared-nothing hardware design.
- For each relation in the database, the tuples are declustered (partitioned) across disk storage units, which are attached to individual processors.
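The declustering described in the last bullet can be sketched as follows. This is a minimal illustration, assuming hash partitioning on an integer key attribute; the function name `decluster`, the `orders` relation and the choice of four nodes are hypothetical, and round-robin or range partitioning would be equally valid strategies.

```python
from collections import defaultdict


def decluster(tuples, key, num_nodes):
    """Assign each tuple to a node by hashing its partitioning attribute."""
    partitions = defaultdict(list)
    for row in tuples:
        # Simple modulo hash; assumes the key attribute holds integers.
        node = row[key] % num_nodes
        partitions[node].append(row)
    return dict(partitions)


# Hypothetical relation with eight tuples, spread over four nodes.
orders = [{"order_id": i, "amount": 10 * i} for i in range(8)]
placement = decluster(orders, "order_id", num_nodes=4)
```

Each node then holds roughly one quarter of the relation, so a scan or selection can run on all four partitions at once.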
7Introduction
- Parallelism demonstrates two properties that make it very desirable.
- The first is linear scale-up: after the number of processors is increased by a factor of k, the system can perform a task k times the size in the same span of time.
- The second is linear speedup: the response time is reduced by a factor of k if the number of processors is increased by a factor of k.
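The two properties can be expressed as simple ratios of elapsed times. The sketch below assumes the usual definitions (speedup compares the same task on a small and a large system; scale-up compares a task on the small system with a k-times larger task on the k-times larger system). The function names and the timings are hypothetical, chosen only to illustrate the linear case.

```python
def speedup(small_system_time, large_system_time):
    """Same task on both systems; linear speedup means the result equals k."""
    return small_system_time / large_system_time


def scaleup(small_task_small_system_time, large_task_large_system_time):
    """k-times larger task on a k-times larger system; linear scale-up means 1.0."""
    return small_task_small_system_time / large_task_large_system_time


k = 4  # processors increased by a factor of 4 (hypothetical)

# Hypothetical timings in seconds.
observed_speedup = speedup(small_system_time=120, large_system_time=30)
observed_scaleup = scaleup(small_task_small_system_time=120,
                           large_task_large_system_time=120)

is_linear_speedup = observed_speedup == k
is_linear_scaleup = observed_scaleup == 1.0
```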
8Introduction
- During query processing in parallel databases, parallelism can be exploited in three different ways.
- With independent parallelism, different processors can execute different query operators in parallel if the operators do not depend on each other.
- With pipelining (inter-operator parallelism), two or more operators in a producer-consumer relationship run in parallel, and the output of the producer is passed on to the consumer as it is produced.
- Finally, with intra-operator (partitioned) parallelism, copies of the same query operator run on multiple processors simultaneously, each operating on a partition of the data.
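Partitioned (intra-operator) parallelism, the last of the three forms above, can be sketched as follows. This is an illustration under stated assumptions: threads stand in for the separate processors of a shared-nothing system, and the operator `select_large`, the threshold and the partition contents are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def select_large(partition, threshold=50):
    """One copy of a selection operator, applied to a single partition."""
    return [value for value in partition if value > threshold]


# Hypothetical data, already declustered into three partitions.
partitions = [[10, 60, 75], [5, 55], [90, 20, 30]]

# Run one copy of the operator per partition; map() keeps partition order.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partial_results = list(pool.map(select_large, partitions))

# Merge the per-partition outputs into the operator's final result.
result = [value for part in partial_results for value in part]
```

The same operator code runs unchanged on every partition; only the merge step at the end is aware that the data was split.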
9Optimization mechanism using ToXin tree
- The ToXin indexing scheme was developed to overcome the limitations of applying optimization to path query processing.
- The scheme was developed with the primary goal of exploiting the path structure of XML databases in all stages of query processing.
- There are two types of index structures in ToXin: the value index and the path index.
10Optimization mechanism using ToXin tree
- Algorithm: ConstructIndexTree
- Output: Tree T
- ConstructIndexTree()
- 1. Perform a depth-first traversal of the tree.
- 2. For each visited edge:
- 2.1 Check whether the corresponding index edge has been added.
- 2.1.1 If not, for the current index edge of the XML element:
- 2.1.1.1 Update the instance function in two redundant hash tables representing the forward and backward navigation tables.
- 2.1.1.2 Add the parent node and child node.
- 2.2 If it has been added already, skip to the next index edge.
- 3. Stop.
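The steps above can be sketched in Python. This is a simplified illustration, not the ToXin implementation: the function name `construct_index_tree`, the use of label paths to identify index edges, and the integer node ids are all assumptions. It also assumes that the instance function is updated for every document edge, including edges whose index edge already exists, which goes slightly beyond the literal wording of step 2.2.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import defaultdict


def construct_index_tree(xml_text):
    """Build one index edge per distinct label path, with navigation tables."""
    root = ET.fromstring(xml_text)
    ids = itertools.count()  # node ids assigned in depth-first (preorder) order
    index_tree = {}          # index edge -> forward and backward hash tables

    def visit(node, path, node_id):
        for child in node:
            child_id = next(ids)
            child_path = f"{path}/{child.tag}"
            edge = (path, child_path)
            # Create the index edge the first time this label path is seen.
            tables = index_tree.setdefault(
                edge, {"forward": defaultdict(list), "backward": {}}
            )
            # Record the parent/child instance in both navigation tables.
            tables["forward"][node_id].append(child_id)
            tables["backward"][child_id] = node_id
            visit(child, child_path, child_id)

    visit(root, f"/{root.tag}", next(ids))
    return index_tree


# Hypothetical document: two books, the second one also has an author.
doc = "<lib><book><title/></book><book><title/><author/></book></lib>"
index = construct_index_tree(doc)
```

The forward table answers "which children does this node have along this path", and the backward table answers "which parent does this node have", so a path query can be navigated in either direction.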
11Optimization mechanism in Lore
- An input query is divided into a set of subqueries, and each operation is evaluated separately as a part of the query.
- An effective execution order for these operations is obtained by evaluating the alternatives for the whole set of operations, which in turn helps execute the query faster.
- The final result is obtained by joining the results of the subqueries together.
12Optimization mechanism in Lore
- Algorithm: PlanSelectionAlgorithm
- Input: Input list (for the query)
- Output: Plan P
- PlanSelectionAlgorithm(input list)
- 1. Create a structure to track the bound variables.
- 2. While the input list is not empty:
- 2.1 For each element in the input list:
- 2.1.1 Based on the currently bound variables, find the cheapest access method for the remaining steps.
- 2.1.2 If the step has the least cost, mark its variables as bound and add it to the plan P.
- 2.1.3 Remove the chosen step from the input list.
- 3. Return the final plan P obtained from the previous steps.
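The greedy loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not Lore's implementation: the `Step` structure, the access method names and all costs are hypothetical, and each access method is assumed to list the variables that must already be bound before it can be used.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    binds: set     # variables this step binds once it is executed
    methods: list  # (method name, variables it needs bound, cost)


def cheapest_access(step, bound):
    """Return (cost, method) of the cheapest method usable with `bound`."""
    usable = [(cost, name) for name, needs, cost in step.methods if needs <= bound]
    return min(usable, default=(math.inf, None))


def select_plan(steps):
    bound, plan, remaining = set(), [], list(steps)
    while remaining:
        # Pick the remaining step whose cheapest usable method costs least.
        best = min(remaining, key=lambda s: cheapest_access(s, bound)[0])
        cost, method = cheapest_access(best, bound)
        if method is None:
            raise ValueError("no usable access method for the remaining steps")
        plan.append((best.name, method))
        bound |= best.binds     # mark the step's variables as bound
        remaining.remove(best)  # remove the chosen step
    return plan


# Hypothetical steps: B becomes cheap only after x has been bound by A.
steps = [
    Step("A", {"x"}, [("scan", set(), 100)]),
    Step("B", {"y"}, [("scan", set(), 500), ("index_on_x", {"x"}, 5)]),
    Step("C", {"z"}, [("scan", set(), 50), ("index_on_y", {"y"}, 1)]),
]
plan = select_plan(steps)
```

Because the choice is greedy, step C is scanned first at cost 50 even though it would cost 1 after B; this illustrates why such a plan is cheap to compute but not guaranteed to be optimal.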
13Optimizing queries in XML structured document
databases
- PAT algebra, a set-oriented algebraic technique, defines a series of set operations and rules.
- PAT expressions are obtained by transforming input queries after their syntax has been checked for correctness.
- Based on the relationships among elements in the DTD, the PAT expressions can be normalized with the help of the PAT algebra in order to obtain a new query.
14Query optimization based on Schema
15Query optimization by pruning and rewriting
queries
16Query optimization by classification of elements
17Join Strategy Selection
18Optimal Serial Plan (in identical processors)
19Comparison of Relational Database Management
System vs. XML Database System
21Comparison of algorithms
22Conclusion and Future work
- The tree generation algorithm and some of the optimal plan selection and generation algorithms run in polynomial time, so they need to be optimized to run in linear time.
- PAT algebra is being extended to make it more suitable for query optimization. Frequency search operations make heavy use of the indexing techniques in PAT.
- Future research will also focus on the generation and use of partially correlated sub-plans, which depend on bindings passed between portions of the query plan.
- When a significant number of paths pass through a small number of objects, a transformation that introduces a group-by clause can be useful.
- Further examination is being conducted to implement the ToXin Graph and to check whether the ToXin Tree can be extended for use as an alternative to DOM for querying, updating and storing XML documents.
23Conclusion and Future work
- Value-based grouping and join techniques are being investigated, along with multi-way structural joins, new access methods for merged operators, and several structural pattern techniques.
- In addition, new optimization algorithms have to be implemented to improve caching in Web Service Management Systems, and XQuery language constructs are to be optimized.
- Cost-based decisions are to be integrated into earlier stages of the query evaluation process, and the cost model has to be refined in order to model the CPU cost precisely.