Query Optimization Techniques and Performance Issues in XML and Parallel databases

Description:

Query Optimization Techniques and Performance Issues in XML and Parallel databases. CSE 8330. Instructor: Dr. Margaret H. Dunham. Presenter: Akshaya Aradhya

Slides: 27
Provided by: lyleSmuE62
Learn more at: https://s2.smu.edu

Transcript and Presenter's Notes

1
Query Optimization Techniques and Performance
Issues in XML and Parallel databases
  • CSE 8330
  • Instructor: Dr. Margaret H. Dunham
  • Presenter: Akshaya Aradhya

2
Topics to be covered
  • Introduction
  • Query optimization in XML databases
  • Query optimization in Parallel databases
  • Comparison
  • Conclusion and Future work
  • Bibliography

3
Introduction
  • XML is an emerging standard for exchanging,
    storing, and representing data.
  • Data encoded in XML can conform to a DTD
    (Document Type Definition).
  • XML is intuitive, and its tree-like structure
    makes it easy to interpret.

4
Introduction
  • The XML data model is more complex than the
    relational model, which yields a larger search
    space for optimizing XML queries.
  • To optimize XML queries, we need to study
    equivalence with respect to both the data and
    the query, so that a query is only transformed
    into one that is known to be equivalent.

5
Introduction
  • XML query optimization techniques can be divided
    into groups based on content and structure.
  • Content-based query optimization: based on
    statistics or classification.
  • Query execution can be improved by classifying
    the elements and transforming the query based on
    constraints obtained from the data.

6
Introduction
  • Parallel database systems are applied in decision
    support systems and a wide range of modern
    database applications.
  • The machine architecture of parallel database
    systems is based on a parallel dataflow
    architecture, which makes use of a conventional,
    shared-nothing hardware design.
  • For each relation in the database, the tuples are
    declustered (partitioned) across disk storage
    units, which are attached to individual
    processors.
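
As a rough illustration of declustering, the sketch below hash-partitions the tuples of one relation across a fixed number of nodes. The function name `decluster`, the sample relation, and the choice of hash partitioning are assumptions made for illustration; the slides do not say which partitioning strategy is used.

```python
from collections import defaultdict


def decluster(tuples, key_index, num_nodes):
    """Assign each tuple to a node by hashing its partitioning attribute."""
    partitions = defaultdict(list)
    for row in tuples:
        # The node number stands in for "the disk attached to processor i".
        node = hash(row[key_index]) % num_nodes
        partitions[node].append(row)
    return dict(partitions)


# Hypothetical relation: (employee id, name), partitioned on the id.
employees = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di"), (5, "Ed")]
parts = decluster(employees, key_index=0, num_nodes=3)
for node, rows in sorted(parts.items()):
    print(node, rows)
```

Range or round-robin partitioning would fit the same interface; only the line that computes `node` would change.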

7
Introduction
  • Parallelism demonstrates two properties that make
    it very desirable.
  • The first is linear scale-up: after the number of
    processors is increased k times, the system can
    perform a task k times the size in the same span
    of time.
  • The second is linear speedup: the response time
    for a given task is reduced by a factor of k if
    the number of processors is increased k times.
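
The two properties can be written as simple ratios of elapsed times. The sketch below is only an illustration: the function names and the timings are hypothetical, chosen so that both ratios come out at their ideal linear values.

```python
def speedup(time_small_system, time_large_system):
    """Same task on both systems.

    With k times the processors, linear speedup means this ratio equals k.
    """
    return time_small_system / time_large_system


def scaleup(time_small_task_small_system, time_large_task_large_system):
    """Task and system both grown by the same factor k.

    Linear scale-up means the elapsed time is unchanged, so the ratio is 1.
    """
    return time_small_task_small_system / time_large_task_large_system


# Hypothetical timings in seconds for k = 4.
print(speedup(120.0, 30.0))
print(scaleup(120.0, 120.0))
```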

8
Introduction
  • During query processing in parallel databases,
    parallelism can be exploited in three different
    ways.
  • In independent parallelism, different processors
    execute different query operators in parallel,
    provided the operators do not depend on each
    other.
  • In pipelined (inter-operator) parallelism, two or
    more operators in a producer-consumer
    relationship run concurrently, and the output of
    the producer is passed on to the consumer as it
    is produced.
  • Finally, in intra-operator (partitioned)
    parallelism, copies of the same query operator
    run on multiple processors simultaneously, each
    operating on one partition of the data.
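
A minimal sketch of intra-operator (partitioned) parallelism follows, assuming a selection operator and data that is already partitioned. The names `select_partition` and `parallel_select` are hypothetical, and a thread pool stands in for separate processors, so the code shows the structure of the technique and not real hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def select_partition(partition, predicate):
    """One copy of the selection operator, applied to a single partition."""
    return [row for row in partition if predicate(row)]


def parallel_select(partitions, predicate, workers=4):
    """Run the same operator on every partition and merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the partial results in partition order.
        partial_results = pool.map(
            lambda part: select_partition(part, predicate), partitions
        )
        return [row for part in partial_results for row in part]


# Hypothetical partitions of a single-column relation.
partitions = [[1, 8, 3], [10, 2], [7, 12]]
result = parallel_select(partitions, lambda value: value > 5)
print(result)
```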

9
Optimization mechanism using ToXin tree
  • The ToXin indexing scheme was developed to
    overcome limitations in applying optimization to
    path query processing.
  • Its primary goal is to exploit the path structure
    of XML databases in all stages of query
    processing.
  • ToXin has two types of index structures: the
    value index and the path index.

10
Optimization mechanism using ToXin tree
  • Algorithm: ConstructIndexTree
  • Output: Tree T
  • ConstructIndexTree()
  • 1. Perform a depth-first traversal of the tree.
  • 2. For each visited edge:
  • 2.1 Check whether the corresponding index edge
    has been added.
  • 2.1.1 If it has not, for the current index edge
    of the XML element:
  • 2.1.1.1 Update the instance function in two
    redundant hash tables representing the forward
    and backward navigation tables.
  • 2.1.1.2 Add the parent node and child node.
  • 2.2 If it has been added already, skip to the
    next index edge.
  • 3. Stop
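
The steps above can be sketched roughly as follows. This is a simplified illustration and not the ToXin implementation: the function name, the use of label paths as index-tree nodes, and the integer node ids are all assumptions. Unlike the outline, the sketch records every document edge in the navigation tables, including edges whose index edge already exists, so that the tables cover all instances.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def construct_index_tree(xml_text):
    root = ET.fromstring(xml_text)
    index_edges = []   # index-tree edges (parent path, child path), in discovery order
    seen = set()
    forward = defaultdict(lambda: defaultdict(list))  # edge -> parent id -> child ids
    backward = defaultdict(dict)                      # edge -> child id -> parent id

    # Iterative depth-first traversal; the stack holds (element, node id, label path).
    stack = [(root, 0, "/" + root.tag)]
    next_id = 1
    while stack:
        parent, parent_id, parent_path = stack.pop()
        children = []
        for child in parent:
            child_id = next_id
            next_id += 1
            child_path = parent_path + "/" + child.tag
            edge = (parent_path, child_path)
            if edge not in seen:
                # First time this label path is seen: add the index edge.
                seen.add(edge)
                index_edges.append(edge)
            # Two redundant tables allow navigation in both directions.
            forward[edge][parent_id].append(child_id)
            backward[edge][child_id] = parent_id
            children.append((child, child_id, child_path))
        # Push in reverse so the leftmost child is visited first.
        stack.extend(reversed(children))
    return index_edges, forward, backward


doc = (
    "<lib><book><title>A</title></book>"
    "<book><title>B</title><author>X</author></book></lib>"
)
index_edges, forward, backward = construct_index_tree(doc)
for edge in index_edges:
    print(edge)
```

The two `book` elements share one index edge, which is why the index tree stays smaller than the document.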

11
Optimization mechanism in Lore
  • An input query is divided into a set of
    subqueries, and each operation is evaluated
    separately as a part of the query.
  • An effective execution order for these operations
    is obtained by generating and evaluating
    alternatives for the set of operations, which in
    turn helps the query execute faster.
  • The final result is obtained by joining the
    results of the subqueries together.

12
Optimization mechanism in Lore
  • Algorithm: PlanSelectionAlgorithm
  • Input: input list (for the query)
  • Output: plan P
  • PlanSelectionAlgorithm(input list)
  • 1. Create a structure to track the bound
    variables.
  • 2. While the input list is not empty:
  • 2.1 For each element in the input list:
  • 2.1.1 Based on the currently bound variables,
    find the cheapest access method for the
    remaining steps.
  • 2.1.2 If the step has the least cost, mark its
    variables as bound and add it to the plan P.
  • 2.1.3 Remove the chosen step from the input list.
  • 3. Return the final plan P obtained from the
    previous steps.
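
The greedy loop above can be sketched as follows. The `Step` record and its two cost fields are hypothetical stand-ins for a real cost model, and this is not Lore's actual optimizer; the sketch only assumes that a step is cheaper once the variable it starts from is already bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    source: str         # variable the step navigates from
    target: str         # variable the step binds
    scan_cost: float    # assumed cost when the source is not yet bound
    lookup_cost: float  # assumed cost when the source is already bound


def step_cost(step, bound):
    return step.lookup_cost if step.source in bound else step.scan_cost


def select_plan(steps):
    bound = set()            # structure tracking the bound variables
    remaining = list(steps)
    plan = []
    while remaining:
        # Pick the cheapest remaining step under the current bindings.
        best = min(remaining, key=lambda step: step_cost(step, bound))
        plan.append(best.name)
        bound.update({best.source, best.target})
        remaining.remove(best)
    return plan


steps = [
    Step("b->c", "b", "c", scan_cost=50, lookup_cost=2),
    Step("a->b", "a", "b", scan_cost=10, lookup_cost=1),
    Step("c->d", "c", "d", scan_cost=40, lookup_cost=3),
]
plan = select_plan(steps)
print(plan)
```

Each choice changes the cost of the steps that remain, which is why costs are recomputed on every iteration instead of being sorted once up front.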

13
Optimizing queries in XML structured document
databases
  • PAT algebra, a set-oriented algebraic technique,
    defines a series of set operations and rules.
  • PAT expressions are obtained by transforming
    input queries after their syntax has been
    checked for correctness.
  • Based on the relationships between elements in
    the DTD, the PAT expressions can be normalized
    with the help of the PAT algebra to obtain a new
    query.

14
Query optimization based on Schema

15
Query optimization by pruning and rewriting
queries

16
Query optimization by classification of elements

17
Join Strategy Selection

18
Optimal Serial Plan (in identical processors)

19
Comparison between Relational Database Management
Systems and XML Database Systems

20
(No Transcript)

21
Comparison of algorithms

22
Conclusion and Future work
  • The tree generation algorithm and some of the
    optimal plan selection and generation algorithms
    run in polynomial time; they still need to be
    optimized to run in linear time.
  • PAT algebra is being extended to make it more
    suitable for query optimization. Frequency search
    operations make heavy use of the indexing
    techniques in PAT.
  • Future research will also focus on the generation
    and use of partially correlated sub-plans, which
    depend on bindings passed between portions of a
    query plan.
  • When a significant number of paths pass through a
    small number of objects, a transformation that
    introduces a group-by clause can be useful.
  • Further work is under way to implement the ToXin
    Graph and to check whether the ToXin Tree can be
    extended for use as an alternative to DOM for
    querying, updating, and storing XML documents.

23
Conclusion and Future work
  • Value-based grouping and join techniques are
    being investigated, along with multi-way
    structural joins, new access methods for merged
    operators, and several structural pattern
    techniques.
  • In addition, new optimization algorithms have to
    be implemented to improve caching in Web Service
    Management Systems, and XQuery language
    constructs are to be optimized.
  • Cost-based decisions are to be integrated into
    earlier stages of the query evaluation process,
    and the cost model has to be refined in order to
    model CPU cost precisely.
