XQuery and Hierarchical Naming

1
XQuery and Hierarchical Naming
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 455 / 555 Internet and Web Systems
  • February 7, 2008

2
Today
  • Reminder: Homework 1 due 2/12 @ 11:59 PM
  • XQuery and joins
  • Addressing vs. naming
  • Hierarchical names

3
XQuery's Basic Form
  • The model: bind nodes (or node sets) to
    variables; operate over each legal combination of
    bindings; produce a set of nodes
  • FLWOR statement pattern:
  • for {iterators that bind variables}
  • let {collections}
  • where {conditions}
  • order by {order-conditions}
  • return {output constructor}

4
Example XML Data
[Figure: tree view of the example dblp.xml document. Under the root
are the ?xml declaration and a dblp element, whose children include
mastersthesis, inproceedings, and university entries. Node labels in
the figure: mdate, key, author, title, year, school, crossref, ee,
name, country. Sample values shown include ms/Brown92, Kurt Brown,
wisc, 1992, 1997, 2002, USA, and Wisconsin.]

5
XQuery and Joins
  • for $i in doc("dblp.xml")/dblp/inproceedings,
    $r in $i/crossref/text(), $c in
    doc("dblp.xml")/dblp/conf, $n in $c/@name
  • where $c = $r
  • return ($i, $c)
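
To make the join semantics concrete, here is a minimal Python sketch of the same nested-loop evaluation using the standard library's xml.etree.ElementTree. The sample document is invented for illustration, and the sketch assumes the crossref text is meant to match the conf element's name attribute, which is one plausible reading of the query rather than a confirmed one.

```python
import xml.etree.ElementTree as ET

# Invented sample data; only the element and attribute names
# (inproceedings, crossref, conf, name) come from the query above.
SAMPLE = """
<dblp>
  <inproceedings key="p1">
    <title>Paper One</title>
    <crossref>sigmod-97</crossref>
  </inproceedings>
  <inproceedings key="p2">
    <title>Paper Two</title>
    <crossref>vldb-98</crossref>
  </inproceedings>
  <conf name="sigmod-97"><title>SIGMOD 1997</title></conf>
</dblp>
"""


def join_papers_to_confs(root):
    """Nested-loop join: pair each paper with the conf its crossref names."""
    pairs = []
    for paper in root.findall("inproceedings"):    # for $i in ...
        for ref in paper.findall("crossref"):      # $r in $i/crossref/text()
            for conf in root.findall("conf"):      # $c in .../dblp/conf
                # Assumed join condition: crossref text equals conf/@name.
                if conf.get("name") == (ref.text or "").strip():
                    pairs.append((paper, conf))    # return ($i, $c)
    return pairs


root = ET.fromstring(SAMPLE)
pairs = join_papers_to_confs(root)
for paper, conf in pairs:
    print(paper.findtext("title"), "->", conf.findtext("title"))
```

Each for clause in the query becomes one loop level, and the where clause becomes the filter in the innermost loop. A real XQuery engine is free to pick a better join strategy than nested loops.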

6
Some Uses for Join in XML
  • Translation between values
  • SSN → PennID
  • Joining or combining information
  • Amazon invoice info with UPS tracking info
  • Restructuring information

  • ..?
  • Here, we separate authors from books, then join
    them back in upside-down fashion

7
Changing Nesting of XML Content
  • Re-nesting XML trees is a common operation
  • Simply nest the query blocks and correlate them
    similar to join
  • for $u in doc("dblp.xml")/dblp/university, $n in
    $u/name/text(),
  • $k in $u/@key
  • where $u/country = "USA"
  • return
  • $n
  • for $mt in $u/../mastersthesis,
    $inst in $mt/school/text()
  • where $mt/year/text() = "1992" and
    _______________
  • return $mt/title

8
Collections & Aggregation in XQuery
  • Given a collection, we can compute an average,
    count, etc. of its members
  • for $paper in doc("dblp.xml")/dblp/inproceedings
  • let $pauth := $paper/author (: a collection :)
  • return ($paper/title,
  • fn:count($pauth))

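The same aggregation can be sketched in Python as a rough analogue: the let clause binds the whole collection of author elements for one paper, and the count is taken over that collection. The sample data below is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample data using the element names from the query above.
SAMPLE = """
<dblp>
  <inproceedings>
    <title>Paper One</title>
    <author>A. Author</author>
    <author>B. Author</author>
  </inproceedings>
  <inproceedings>
    <title>Paper Two</title>
    <author>C. Author</author>
  </inproceedings>
</dblp>
"""


def author_counts(root):
    """Return (title, number of authors) for each inproceedings element."""
    results = []
    for paper in root.findall("inproceedings"):   # for $paper in ...
        authors = paper.findall("author")         # let $pauth := $paper/author
        results.append((paper.findtext("title"), len(authors)))  # fn:count
    return results


counts = author_counts(ET.fromstring(SAMPLE))
for title, n in counts:
    print(title, n)
```
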
9
Sorting in XQuery
  • We can order the sequence of result tuples
    output by the return clause
  • for $x in doc("dblp.xml")/proceedings
  • order by $x/title/text()
  • return $x

10
Querying & Defining Tags
  • Can get a node's name by querying node-name()
  • for $x in doc("dblp.xml")/dblp/*
  • return node-name($x)
  • Can construct elements and attributes using
    computed names
  • for $x in doc("dblp.xml")/dblp/*,
  • $year in $x/year,
  • $title in $x/title/text()
  • return element { node-name($x) } {
  • attribute { concat("year-", $year) } { $title } }

11
XQuery Summary
  • Very flexible and powerful language for XML
  • Focus is on database-style operations like joins
  • Performs tasks that can't be done with XPath or
    XSLT and that are tedious to program in Java
  • Integrating information from multiple sources
  • Joins, based on correspondences of values
  • Computing count, average, etc.
  • Today, XQuery is available:
  • In RDBMSs (SQL Server, Oracle, DB2) and XML DBMS
    systems (MarkLogic)
  • As the basis of research prototypes for XQuery
    Full Text
  • As the basis of XQueryP, a Web Services/AJAX
    programming language based on XQuery but with
    programming language features
  • http://2006.xmlconference.org/programme/presentations/38.html
  • We will discuss data integration and middleware
    later in the course

12
Hierarchical Naming Schemes
  • Thus far, we've seen XPath as a hierarchical
    naming scheme
  • Content-based naming: describe the structure
    and values of a tree
  • Assumption: the XML tree resides in (or is being
    sent to) one place
  • But hierarchy is often used for naming and
    location

13
How Do We Find Things on the Internet?
  • Generally, using one of three means:
  • Addresses or locations specify where something
    is, assuming that we understand how to navigate
  • Just like a physical address, we may still need a
    map!
  • In the Internet, addresses are typically IP
    addresses; the routers know the map
  • Names are mapped into addresses via lookup
    services
  • Best-known example on the Internet: DNS names
  • Cell phone numbers, email addresses, etc. are
    becoming names
  • Content-based addressing/naming
  • The actual data value is somehow used to find its
    location
  • The basis of publish-subscribe systems and
    peer-to-peer architectures

14
The Simplest Way of Going from Names or Content →
Locations
  • Directory-based lookup protocols are very common
  • Examples:
  • Napster 1.0: peer-to-peer storage with central
    directory
  • Inverted index: used to look up keywords in
    information retrieval
  • DNS: distributed hierarchical directory
  • LDAP: hierarchical Directory Information Tree

15
Napster 1.0, ca. 2002
  • Hybrid of peer-to-peer storage with central
    directory showing what's currently available
  • What are the trade-offs implicit in this model?
    Why did it fail?

[Figure: a central directory at Napster.com lists jjackson-lame and
bspears-oops. Peer1 and Peer3 each hold jjackson-lame.mp3; Peer2 holds
bspears-oops.mp3.]
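
The hybrid model can be sketched in a few lines of Python: peers keep the files, while a single directory only records who has what. This is a toy illustration under stated assumptions, not Napster's actual protocol; the class and method names are hypothetical, and the peer and file names are taken from the figure.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy central directory: maps a file name to the peers that hold it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, filename):
        """A peer announces that it currently holds a file."""
        self._holders[filename].add(peer)

    def unregister_peer(self, peer):
        """Drop every listing for a peer that has gone offline."""
        for peers in self._holders.values():
            peers.discard(peer)

    def lookup(self, filename):
        """Return the peers to contact; the transfer itself is peer-to-peer."""
        return sorted(self._holders.get(filename, ()))


directory = CentralDirectory()
directory.register("Peer1", "jjackson-lame.mp3")
directory.register("Peer2", "bspears-oops.mp3")
directory.register("Peer3", "jjackson-lame.mp3")

print(directory.lookup("jjackson-lame.mp3"))
```

Every lookup goes through the one directory object, which makes the trade-off visible: search is simple and complete, but the directory is a single point of failure and control.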
16
Other Services with Similar Directory-Peer
Architectures
  • FolderShare (now owned by Microsoft)
  • Google Desktop Search with multiple machines
  • BitTorrent trackers are quite similar (we'll
    discuss BitTorrent more later)

17
Inverted Indices
  • A forward index maps documents to words
  • The inverted index maps words to word occurrences
  • The basis of most information retrieval engines,
    Google, etc.
  • Can handle positional predicates
  • But how can we reconstruct previews?
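
A minimal sketch of a positional inverted index in Python, assuming whitespace tokenization and lowercase matching (real engines do far more). The documents and the function names are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to a list of (doc_id, position) occurrences."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))
    return index


def phrase_match(index, first, second):
    """Positional predicate: documents where `second` directly follows `first`."""
    following = set(index.get(second, []))
    return sorted({doc for doc, pos in index.get(first, [])
                   if (doc, pos + 1) in following})


# The forward direction: document id to its words.
docs = {
    "d1": "hierarchical naming on the internet",
    "d2": "naming is hierarchical in dns",
}
index = build_inverted_index(docs)

print(index["naming"])
print(phrase_match(index, "hierarchical", "naming"))
```

Storing positions alongside document ids is what lets the index answer adjacency queries without rereading the documents.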

18
Naming People and Devices: LDAP
  • Lightweight Directory Access Protocol
  • Hierarchical naming system that can be
    partitioned and replicated

19
LDAP's Schema
  • LDAP information has an XML-like schema
  • A unique name in LDAP is called a Distinguished
    Name (dn) and consists of a sequence of
    attributes representing a hierarchy, from
    most-specific to least-specific (as in DNS
    names)
  • o: organization; dc: domain component
  • ou: organizational unit
  • uid: user ID
  • cn: common name
  • c: country; st: state; l: locality
  • Can also have objectClass: the type of entity
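
To illustrate the most-specific-to-least-specific ordering, here is a small Python sketch that splits a DN into its attribute-value components and derives the parent entry's DN. The example DN is hypothetical, and the parser is deliberately simplified: it assumes no escaped commas and no multi-valued components.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific first.

    Simplified: assumes no escaped commas or multi-valued components.
    """
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.strip().partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def parent_dn(dn):
    """DN of the parent entry: drop the most specific component."""
    return ",".join(f"{attr}={value}" for attr, value in parse_dn(dn)[1:])


# Hypothetical DN, invented for illustration.
example = "uid=zives,ou=people,dc=example,dc=edu"

print(parse_dn(example))
print(parent_dn(example))
```

Walking up the tree is just dropping components from the left, much as one drops labels from the left of a DNS name.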

20
LDAP Hierarchy
Brad Marshall, LDAP Tutorial,
quark.humbug.au/publications/ldap_tut.html
21
Querying LDAP
  • LDAP queries are mostly attribute-value
    predicates
  • uid=zives, ou=penn, c=usa
  • (|(cn=Susan Davidson)(cn=Zachary Ives)(cn=Val
    Tannen))
  • objectclass=posixAccount
  • (!(cn=Val Tannen))
  • How does this differ from XPath?
  • How might we process these queries?
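
One naive way to process such queries is to parse the filter into a tree and test it against each entry in turn. The Python sketch below handles only equality tests combined with &, |, and !, and assumes values contain no parentheses; the entries (including the uid "val") are invented, and the function names are hypothetical.

```python
def parse_filter(text):
    """Parse a filter such as (|(cn=A)(cn=B)) into nested tuples."""
    node, rest = _parse(text.strip())
    if rest:
        raise ValueError(f"unexpected trailing text: {rest!r}")
    return node


def _parse(s):
    if not s.startswith("("):
        raise ValueError(f"expected '(' at {s!r}")
    s = s[1:]
    if s[:1] in ("&", "|", "!"):
        op, s = s[0], s[1:]
        children = []
        while s.startswith("("):
            child, s = _parse(s)
            children.append(child)
        if not s.startswith(")") or not children:
            raise ValueError("malformed composite filter")
        if op == "!" and len(children) != 1:
            raise ValueError("'!' takes exactly one sub-filter")
        return (op, children), s[1:]
    body, close, rest = s.partition(")")
    attr, eq, value = body.partition("=")
    if not close or not eq:
        raise ValueError(f"malformed equality test: {body!r}")
    return ("=", attr.strip().lower(), value.strip()), rest


def matches(node, entry):
    """Evaluate a parsed filter against one entry (attribute -> list of values)."""
    if node[0] == "=":
        _, attr, value = node
        return value.lower() in [v.lower() for v in entry.get(attr, [])]
    op, children = node
    results = [matches(child, entry) for child in children]
    if op == "&":
        return all(results)
    if op == "|":
        return any(results)
    return not results[0]  # op == "!"


# Invented directory entries for illustration.
entries = [
    {"cn": ["Zachary Ives"], "uid": ["zives"], "objectclass": ["posixAccount"]},
    {"cn": ["Val Tannen"], "uid": ["val"], "objectclass": ["posixAccount"]},
]

query = parse_filter("(|(cn=Susan Davidson)(cn=Zachary Ives)(cn=Val Tannen))")
print([e["uid"][0] for e in entries if matches(query, e)])
```

This scans every entry for every query. A server expecting many lookups would more likely keep per-attribute indexes and use the filter tree to decide which ones to consult.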

22
The Backbone of Internet Naming: Domain Name
Service
  • A simple, hierarchical name system with a
    distributed database: each domain controls its
    own names

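[Figure: DNS hierarchy tree. Top-level domains (com, edu) sit at the
top; the next level shows columbia, upenn, berkeley, and amazon; below
those are names such as www, cis, and sas, with further www hosts at
the lowest level.]
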
23
Top-Level Domains (TLDs)
  • Mostly controlled by Network Solutions, Inc.
    today
  • .com: commercial
  • .edu: educational institution
  • .gov: US government
  • .mil: US military
  • .net: networks and ISPs (now also a number of
    other things)
  • .org: other organizations
  • 244 two-letter country suffixes, e.g., .us, .uk,
    .cz, .tv, ...
  • and a bunch of new suffixes that are not very
    common, e.g., .biz, .name, .pro, ...

24
Finding the Root
  • 13 root servers store entries for all top-level
    domains (TLDs)
  • DNS servers have a hard-coded mapping to root
    servers so they can get started
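
The idea of starting from a hard-coded root and following delegations downward can be sketched as a toy resolver in Python. Everything in the zone table is invented (server names, delegations, and the address, which comes from a documentation-only range); real DNS adds caching, multiple servers per zone, and a wire protocol.

```python
# Toy zone data: every server name, delegation, and address is invented.
ZONES = {
    "root-server": {"delegations": {"edu.": "edu-server"}, "addresses": {}},
    "edu-server": {"delegations": {"upenn.edu.": "upenn-server"}, "addresses": {}},
    "upenn-server": {
        "delegations": {},
        "addresses": {"www.cis.upenn.edu.": "192.0.2.10"},
    },
}

# The hard-coded starting point, standing in for the root server list.
ROOT_HINTS = ["root-server"]


def resolve(name, max_hops=10):
    """Follow delegations from the root until a server knows the address.

    Returns (address or None, list of servers contacted).
    """
    server = ROOT_HINTS[0]
    path = [server]
    for _ in range(max_hops):
        zone = ZONES[server]
        if name in zone["addresses"]:
            return zone["addresses"][name], path
        # Choose the longest delegated suffix that covers the name.
        candidates = [suffix for suffix in zone["delegations"]
                      if name == suffix or name.endswith("." + suffix)]
        if not candidates:
            return None, path
        server = zone["delegations"][max(candidates, key=len)]
        path.append(server)
    return None, path


address, path = resolve("www.cis.upenn.edu.")
print(address, path)
```

The only thing the resolver knows in advance is ROOT_HINTS; every other server is learned by asking the one above it.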

25
Excerpt from DNS Root Server Entries
  • ; This file is made available by InterNIC
    registration services under anonymous FTP as
  • ; file /domain/named.root
  • ; formerly NS.INTERNIC.NET
  • . 3600000 IN NS A.ROOT-SERVERS.NET.
  • A.ROOT-SERVERS.NET. 3600000 A 198.41.0.4
  • ; formerly NS1.ISI.EDU
  • . 3600000 NS B.ROOT-SERVERS.NET.
  • B.ROOT-SERVERS.NET. 3600000 A 128.9.0.107
  • ; formerly C.PSI.NET
  • . 3600000 NS C.ROOT-SERVERS.NET.
  • C.ROOT-SERVERS.NET. 3600000 A 192.33.4.12

(13 servers in total, A through M)
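
A server could load such a file into its hard-coded root mapping along the lines of the Python sketch below. It assumes each record fits on one line with fields owner, TTL, optional class (IN), type, and data, and that comments start with a semicolon; the function name is hypothetical, and the sample reuses the B and C entries from the excerpt.

```python
ROOT_HINTS_EXCERPT = """\
; formerly NS1.ISI.EDU
.                    3600000    NS  B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.  3600000    A   128.9.0.107
; formerly C.PSI.NET
.                    3600000    NS  C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.  3600000    A   192.33.4.12
"""


def parse_root_hints(text):
    """Return {root server name: IPv4 address} from hints-file text."""
    name_servers = []
    addresses = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()   # drop comments
        if not line:
            continue
        # The class field (IN) is optional, so ignore it when present.
        fields = [f for f in line.split() if f != "IN"]
        if len(fields) != 4:
            continue                          # skip lines this sketch cannot read
        owner, _ttl, rtype, data = fields
        if rtype == "NS" and owner == ".":
            name_servers.append(data)         # a server for the root zone
        elif rtype == "A":
            addresses[owner] = data           # that server's address
    return {ns: addresses[ns] for ns in name_servers if ns in addresses}


hints = parse_root_hints(ROOT_HINTS_EXCERPT)
print(hints)
```
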
26
Supposing We Were to Build DNS
  • How would we start? How is a lookup performed?
  • (Hint: what do you need to specify when you add
    a client to a network that doesn't do DHCP?)

27
Issues in DNS
  • We know that everyone wants to be my-domain.com
  • How does this mesh with the assumptions inherent
    in our hierarchical naming system?
  • What happens if things move frequently?
  • What happens if we want to provide different
    behavior to different requestors (e.g., Akamai)?

28
Next Time
  • We'll look at alternative mechanisms for finding
    things
  • Publish-subscribe models
  • Gossip protocols, such as in routers
  • Flooding
  • and soon, peer-to-peer or content-based routing