BTREE Indices - PowerPoint PPT Presentation

Slides: 18
Provided by: stephencra
Learn more at: https://sites.pitt.edu

Transcript and Presenter's Notes

Title: BTREE Indices


1
BTREE Indices
  • A little context information
  • What's the purpose of an index?
  • Example of web search engines
  • Queries do not directly search the WWW for data
  • Rather, an index is searched, and the result
    points directly to the data.
  • Example: google.com
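
To make the idea of "search the index, then go to the data" concrete, here is a minimal sketch of a word-to-page index. The page URLs and texts are made up for illustration, and a real search engine index is far more elaborate; the point is only that a query is answered from the index without rescanning any page.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, word):
    """Answer a query from the index alone; no page is rescanned."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical pages, used only for this example.
pages = {
    "a.example/1": "btree index on disk",
    "a.example/2": "binary tree in memory",
    "a.example/3": "disk index structures",
}
index = build_index(pages)
print(search(index, "index"))
```

The pages are scanned once, when the index is built. Every later query is a lookup by key, which is the job a btree index does on disk.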

2
Google
  • So, not really searching the WWW, but querying a
    representation of a pre-searched WWW.
  • Google's indices map keys (search vocabulary
    elements) to web pages.
  • Require 50 GB
  • Proposal suggests that its Document Index is
    btree based (10 GB)
  • The following image is from the Brin and Page paper:
  • http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm

3

4
Google, cont.
  • The proposal of Sergey Brin and Lawrence Page
    originally estimated that they would need to
    handle about 100 million pages.
  • Now over 1 billion pages: off by an order of
    magnitude.
  • How big is a billion pages? 4 terabytes!
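
The two figures on this slide can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 terabyte = 10^12 bytes), which the slide does not specify.

```python
ESTIMATED_PAGES = 100 * 10**6   # original estimate: 100 million pages
ACTUAL_PAGES = 10**9            # "over 1 billion pages"
TOTAL_BYTES = 4 * 10**12        # 4 terabytes, assuming decimal units

growth_factor = ACTUAL_PAGES // ESTIMATED_PAGES
avg_page_bytes = TOTAL_BYTES // ACTUAL_PAGES

print(growth_factor)    # how far off the estimate was
print(avg_page_bytes)   # implied average size of one page, in bytes
```

Under that assumption the estimate is off by a factor of 10, and the average page comes to about 4 KB.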

6
Btree - requirements
  • What does the B stand for?
  • Invented by R. Bayer in 1970
  • A Btree is a generalization of a Multiway tree
    which in turn is a generalization of a binary
    tree.
  • Requirements
  • Maintain balance
  • Minimize Disk I/O - why?

7
Btree requirements, cont.
  • Disk access speed
  • between 3 ms and 10 ms
  • Compare this to CPU speeds
  • So, although btrees are old technology, they
    remain useful!
  • Common to trees
  • RANDOM access - not direct access
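
A rough way to "compare this to CPU speeds" is to count how many clock cycles pass during one disk access. The 1 GHz clock rate below is an assumption chosen for round numbers, not a figure from the slides.

```python
CLOCK_HZ = 1_000_000_000  # assumed 1 GHz CPU; not from the slides


def cycles_per_access(access_ms, clock_hz=CLOCK_HZ):
    """Clock cycles that elapse during one disk access of access_ms."""
    return access_ms * clock_hz // 1000


for ms in (3, 10):
    print(f"{ms} ms disk access = {cycles_per_access(ms):,} CPU cycles")
```

At this assumed clock rate, one disk access costs millions of cycles, so an index structure that saves even one access per search is worth a lot of extra CPU work inside each node.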

8
Btree - definition
  • A multiway tree in which
  • All leaf nodes are on the same level
  • Every non-leaf node, except the root, has between
    M/2 and M descendents (leaf nodes
    have zero descendents)
  • The root can have 0 - M descendents
  • (All descendents are non-empty)
  • M is the order of the btree
  • What determines M?
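
The definition above can be turned into a small checker. This is a sketch under assumptions: the `Node` class is a hypothetical in-memory stand-in for a disk node, M/2 is rounded up, and only the structural rules listed on the slide are checked (key ordering is not).

```python
from dataclasses import dataclass, field
from math import ceil


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaf nodes occur."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_valid_btree(root, m):
    """Check the structural rules for a btree of order m."""
    if len(root.children) > m:          # root: 0 to M descendents
        return False
    if len(leaf_depths(root)) != 1:     # all leaves on the same level
        return False
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if node.children:               # non-leaf, non-root node
            if not ceil(m / 2) <= len(node.children) <= m:
                return False
            stack.extend(node.children)
    return True


good = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
bad = Node([10], [Node([1]), Node([20], [Node([15]), Node([25])])])
print(is_valid_btree(good, 4), is_valid_btree(bad, 4))
```

The second tree fails because its leaves sit at two different levels, which is exactly the imbalance the definition rules out.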

9
Btrees
  • The order of a binary tree is trivially 2.
  • The order (M) of a btree is set at creation. It
    is a function of:
  • Size of node - How is this determined?
  • Size of keys (or partial keys)
  • What does a node look like?
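
One plausible way to see how node size and key size determine M: a node of order M holds up to M child pointers and M - 1 keys, and all of them must fit in one node. The sketch below assumes a node is one 4096-byte disk block with 8-byte pointers and no header overhead; these numbers are illustrative, not taken from the slides.

```python
def btree_order(node_bytes, key_bytes, pointer_bytes=8):
    """Largest M such that M pointers plus M - 1 keys fit in one node.

    Solves M * pointer_bytes + (M - 1) * key_bytes <= node_bytes.
    Node header overhead is ignored (a simplifying assumption).
    """
    return (node_bytes + key_bytes) // (pointer_bytes + key_bytes)


# Assumed sizes: 4096-byte node, 8-byte pointers.
print(btree_order(4096, key_bytes=8))    # small fixed-size keys
print(btree_order(4096, key_bytes=32))   # larger keys give a smaller order
```

Smaller keys (or partial keys) mean a larger M, which in turn means a shallower tree and fewer disk accesses per search.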

10
Btree - height
  • What is the height of a btree? Does it matter?

11
Btree - height
  • The maximum height of a btree index determines
    the path length or max number of accesses for a
    search. Remember, each node represents a
    potential disk access.
  • Assume that each internal node has the minimum
    number of descendents (M/2); this results in the
    maximum depth of the tree.
  • For N elements,
  • max. height < log_{M/2}((N+1)/2)
  • So search is O(log_{M/2}(N))
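
The bound can be derived by counting nodes level by level in the sparsest possible tree. The following is a sketch under stated assumptions: d is M/2 rounded up, h is the number of edges from the root to the bottom level of nodes, and the root has at least 2 children when it is not a leaf. A tree with N keys has N + 1 empty child positions below its bottom level, and these sit at depth h + 1.

```latex
\begin{align*}
d &= \lceil M/2 \rceil \\
\text{nodes at depth } k &\ge 2\,d^{\,k-1}, \qquad k \ge 1 \\
N + 1 &\ge 2\,d^{\,h} \\
h &\le \log_{d}\!\left(\frac{N+1}{2}\right)
\end{align*}
```

Under this height convention the inequality is not strict; other conventions for counting height shift the bound by one level. Either way the search cost grows as the logarithm of N to the base M/2.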

12
Some max. height examples
  • For M = 200
  • log_{M/2}(1M) < 3
  • log_{M/2}(1G) < 5
  • log_{M/2}(1T) < 7
  • log_{M/2}(1P) < 8
  • log_{M/2}(1E) < 9
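
These figures can be reproduced numerically. The sketch below evaluates the bound in its full form, log base M/2 of (N + 1)/2, and assumes the size labels are decimal (1M = 10^6, 1G = 10^9, and so on); the slide does not say which convention it uses.

```python
import math


def height_bound(n, m):
    """Log base m/2 of (n + 1) / 2: the bound on maximum height."""
    return math.log((n + 1) / 2, m / 2)


# Decimal sizes are an assumption; the slide only gives the labels.
SIZES = {"1M": 10**6, "1G": 10**9, "1T": 10**12, "1P": 10**15, "1E": 10**18}

for label, n in SIZES.items():
    print(f"{label}: {height_bound(n, 200):.2f}")
```

Each computed value falls below the whole number on the slide, so with M = 200 even an exa-scale index needs fewer than 9 levels.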

13
Btree Improvements
  • Most Btrees today are really B+trees
  • Records (vs keys) are stored in leaf nodes
  • Leaf nodes are linked to provide sequential as
    well as random access
  • Can relax the constraint on number of elements
    for leaf nodes without affecting algorithms.
  • Variable-length keys can relax bounds (m/2, m)
    for number of descendents
  • High concurrency: multi-granular locking
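
The linked leaves are what make sequential access cheap: once a search has located a starting leaf, a range scan follows the leaf chain instead of going back up through the tree. A minimal sketch follows; the `Leaf` class, `link`, and `range_scan` are hypothetical names for this example, and the internal nodes that would locate the starting leaf are left out.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list                      # sorted keys held in this leaf
    records: list                   # records[i] belongs to keys[i]
    next: Optional["Leaf"] = None   # link to the right-hand neighbor


def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start_leaf, lo, hi):
    """Yield (key, record) pairs with lo <= key <= hi along the chain."""
    leaf = start_leaf
    while leaf is not None:
        for i in range(bisect_left(leaf.keys, lo), len(leaf.keys)):
            if leaf.keys[i] > hi:
                return
            yield leaf.keys[i], leaf.records[i]
        leaf = leaf.next


head = link([
    Leaf([1, 3], ["a", "b"]),
    Leaf([5, 7], ["c", "d"]),
    Leaf([9, 11], ["e", "f"]),
])
print(list(range_scan(head, 3, 9)))
```

The scan crosses leaf boundaries by following `next`, so reading a key range costs one access per leaf touched rather than one full root-to-leaf descent per record.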

14
Btree Access Methods
  • Create a btree
  • Destroy a btree
  • Search for a specific record (query)
  • Insert a record
  • Delete a record
  • Read a record
  • Iterator operations
  • Consistency and concurrency
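
Of the methods listed, search is the simplest to sketch. The example below assumes a classic btree in which internal nodes also hold records, uses a hypothetical in-memory `Node` class, and counts the nodes visited, since each node stands for a potential disk access. Insert, delete, iterators, and concurrency control are not shown.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted keys
    values: list                                  # values[i] belongs to keys[i]
    children: list = field(default_factory=list)  # empty for a leaf


def search(node, key):
    """Return (value, nodes_visited); value is None if key is absent."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i], visited
        # Descend into the child that covers the gap where key would be.
        node = node.children[i] if node.children else None
    return None, visited


root = Node(
    [10, 20], ["j", "t"],
    [
        Node([1, 5], ["a", "e"]),
        Node([12, 15], ["l", "o"]),
        Node([25, 30], ["y", "z"]),
    ],
)
print(search(root, 15))
print(search(root, 13))
```

The number of nodes visited never exceeds the number of levels in the tree, which is why the height bound matters for search cost.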

15
Btree Web Demo
  • http://sky.fit.qut.edu.au/maire/baobab/baobab.html

16
HPSS
  • High Performance Storage System
  • Collaboration between government and industry
  • Hierarchical storage management and services for
    very large storage environments.
  • Requirements are very demanding in terms of
    total storage capacity, file sizes, data rates,
    number of objects stored, and number of users.
  • E.g., HD real-time digitized video in a distributed
    network.

17
HPSS
  • Metadata (not user files) is stored in
    transactionally-protected BTREE files
  • Also mirrored for recovery and performance
  • BTREE indices for
  • locating user files and ACL info. Must scale to
    billions of entries.
  • Managing disk storage
  • Single btree index over 20 GB in size.
  • Web page
  • http://www.sdsc.edu/hpss/hpss1.html