1
DNS and Content-Based Addressing
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 455 / 555 Internet and Web Systems
  • February 12, 2009

2
Reminders and Recap
  • Homework 1 Milestone 2 due 2/17
  • We have been discussing schemes for finding data
  • XPath: path queries over hierarchical XML
  • Content-based addressing: keyword search
  • Directories: Napster and LDAP
  • Today we see more schemes that build upon similar
    ideas
  • DNS: hierarchical administration, heavy caching
    at all levels
  • Gnutella
  • Can make requests based on filter conditions
  • But flooding is expensive
  • XFilter: find the data through a centralized
    crawler and XPath

3
The Backbone of Internet Naming: Domain Name
Service
  • A simple, hierarchical name system with a
    distributed database; each domain controls its
    own names

[Diagram: the DNS name hierarchy. Top-level domains (com, edu) sit below the
root; second-level domains (columbia, upenn, berkeley, amazon) below them;
subdomains such as cis and sas under upenn; and www hosts at the leaves.]
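Conceptually, a resolver walks this tree one label at a time, and each zone only needs to know about its own children. Below is a minimal sketch of that idea; the zone contents and addresses are invented purely for illustration.

    # Toy model of the DNS hierarchy: each zone maps child labels either to a
    # delegation (another dict) or to an address. All names and addresses here
    # are invented for illustration only.
    ROOT_ZONE = {
        "edu": {
            "upenn": {"www": "203.0.113.10", "cis": {"www": "203.0.113.11"}},
            "berkeley": {"www": "203.0.113.12"},
        },
        "com": {"amazon": {"www": "203.0.113.13"}},
    }

    def resolve(name, zone=ROOT_ZONE):
        """Walk right to left: www.cis.upenn.edu -> edu, then upenn, then cis, then www."""
        node = zone
        for label in reversed(name.split(".")):
            node = node[label]      # each zone controls (and serves) only its own children
        return node

    print(resolve("www.cis.upenn.edu"))   # -> '203.0.113.11'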
4
Top-Level Domains (TLDs)
  • Mostly controlled by Network Solutions, Inc.
    today
  • .com: commercial
  • .edu: educational institution
  • .gov: US government
  • .mil: US military
  • .net: networks and ISPs (now also a number of
    other things)
  • .org: other organizations
  • 244 2-letter country suffixes, e.g., .us, .uk,
    .cz, .tv, ...
  • and a bunch of new suffixes that are not very
    common, e.g., .biz, .name, .pro, ...

5
Finding the Root
  • 13 root servers store entries for all top-level
    domains (TLDs)
  • DNS servers have a hard-coded mapping to root
    servers so they can get started
  • These can be updated by UDP messages (single
    packet)
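A rough sketch of how a resolver can bootstrap from one hard-coded root address and follow referrals down the hierarchy; this assumes the third-party dnspython package, and the hypothetical iterative_lookup helper omits caching, retries, CNAMEs, and IPv6.

    import dns.message
    import dns.query
    import dns.rdatatype

    ROOT_HINT = "198.41.0.4"    # a.root-servers.net, from the hard-coded hints file

    def iterative_lookup(name, server=ROOT_HINT):
        """Ask one server; it either answers or refers us to servers one level down."""
        query = dns.message.make_query(name, dns.rdatatype.A)
        response = dns.query.udp(query, server, timeout=3)
        for rrset in response.answer:               # the server knew the answer
            for record in rrset:
                if record.rdtype == dns.rdatatype.A:
                    return record.address
        for rrset in response.additional:           # referral: follow a glue A record
            for record in rrset:
                if record.rdtype == dns.rdatatype.A:
                    return iterative_lookup(name, record.address)
        return None

    print(iterative_lookup("www.upenn.edu"))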

6
Excerpt from DNS Root Server Entries
  • This file is made available by InterNIC
    registration services under anonymous FTP as
  • file /domain/named.root
  • formerly NS.INTERNIC.NET
  • . 3600000 IN NS A.ROOT-SERVERS.NET.
  • A.ROOT-SERVERS.NET. 3600000 A 198.41.0.4
  • formerly NS1.ISI.EDU
  • . 3600000 NS B.ROOT-SERVERS.NET.
  • B.ROOT-SERVERS.NET. 3600000 A 128.9.0.107
  • formerly C.PSI.NET
  • . 3600000 NS C.ROOT-SERVERS.NET.
  • C.ROOT-SERVERS.NET. 3600000 A 192.33.4.12

(13 servers in total, A through M)
7
Supposing We Were to Build DNS
  • How would we start? How is a lookup performed?
  • (Hint: what do you need to specify when you add
    a client to a network that doesn't do DHCP?)

8
Issues in DNS
  • We know that everyone wants to be my-domain.com
  • How does this mesh with the assumptions inherent
    in our hierarchical naming system?
  • What happens if things move frequently?
  • What happens if we want to provide different
    behavior to different requestors (e.g., Akamai)?

9
Directories Summarized
  • An efficient way of finding data, assuming:
  • Data doesn't change too often, hence it can be
    replicated and distributed
  • Hierarchy is relatively wide and flat
  • Caching is present, helping with repeated queries
  • Directories generally rely on names at their core
  • Sometimes we want to search based on other means,
    e.g., predicates or filters over content

10
Pushing the Search to the Network: Flooding
Requests (Gnutella)
  • Node A wants a data item; it asks B and C
  • If B and C don't have it, they ask their
    neighbors, etc.
  • What are the implications of this model?

[Diagram: Gnutella overlay of nodes A through I; A asks its neighbors B and C,
which forward the request to their own neighbors.]
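A toy sketch of the flooding idea over an invented in-memory topology: each node only knows its neighbors, duplicate messages are suppressed, and a TTL bounds how far the request spreads. Even so, a single lookup touches much of the network.

    # Invented topology and item placement, purely for illustration.
    NEIGHBORS = {
        "A": ["B", "C"], "B": ["A", "D", "E"], "C": ["A", "F"],
        "D": ["B", "G"], "E": ["B", "H"], "F": ["C", "I"],
        "G": ["D"], "H": ["E"], "I": ["F"],
    }
    ITEMS = {"H": {"song.mp3"}}            # only H actually has the item

    def flood(node, item, ttl, seen=None):
        """Forward the request to every neighbor until the TTL runs out."""
        seen = set() if seen is None else seen
        if node in seen or ttl < 0:
            return set()
        seen.add(node)                     # Gnutella suppresses duplicates via message IDs
        hits = {node} if item in ITEMS.get(node, set()) else set()
        for neighbor in NEIGHBORS[node]:
            hits |= flood(neighbor, item, ttl - 1, seen)
        return hits

    print(flood("A", "song.mp3", ttl=4))   # {'H'}, but nearly every node saw the query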
11
Bringing the Data to the Router
Publish-Subscribe
  • Generally, too much data to store centrally, but
    perhaps we only need a central coordinator!
  • Interested parties register a profile with the
    system (often in a central server)
  • In, for instance, XPath!
  • Data gets aggregated at some sort of router or by
    a crawler, and then gets disseminated to
    individuals
  • Based on match between content and the profile
  • Data changes often, but queries don't! (see the
    sketch below)
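A bare-bones sketch of the coordinator idea, using made-up subscribers and simple keyword predicates as profiles (systems such as XFilter use XPath profiles instead, as the following slides describe): profiles are registered once, and every incoming item is matched against them and pushed to the interested parties.

    class Coordinator:
        """Central coordinator: stores profiles and matches each published item against them."""
        def __init__(self):
            self.profiles = {}                       # subscriber -> predicate over an item

        def subscribe(self, subscriber, predicate):
            self.profiles[subscriber] = predicate    # queries are registered once...

        def publish(self, item):
            # ...while data arrives continuously and is routed by content.
            return [s for s, pred in self.profiles.items() if pred(item)]

    # Hypothetical usage:
    c = Coordinator()
    c.subscribe("alice", lambda doc: "election" in doc["keywords"])
    c.subscribe("bob",   lambda doc: doc["topic"] == "sports")
    print(c.publish({"topic": "politics", "keywords": {"election", "usa"}}))   # ['alice']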


13
An Example: XML-Based Information Dissemination
  • Basic model (XFilter, YFilter, Xyleme)
  • Users are interested in data relating to a
    particular topic, and know the schema
  • /politics/usa//body
  • A crawler-aggregator reads XML files from the web
    (or gets them from data sources) and feeds them
    to interested parties

14
Engine for XFilter [Altinel & Franklin 00]
15
How Does It Work?
  • Each XPath segment is basically a subset of
    regular expressions over element tags
  • Convert into finite state automata
  • Parse data as it comes in, using the SAX API
  • Match against finite state machines
  • Most of these systems use modified FSMs because
    they want to match many patterns at the same time
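A compact sketch of this idea using Python's built-in xml.sax module: one linear path such as /politics/usa//body is checked against the stack of currently open elements on every start tag. This handles a single query without predicates; XFilter's contribution is matching many queries simultaneously with shared, modified FSMs.

    import xml.sax

    def path_matches(stack, pattern):
        """Does the open-element stack (root..current) match a pattern such as
        ['politics', 'usa', '//', 'body']?  '//' matches zero or more levels."""
        if not pattern:
            return not stack
        if pattern[0] == "//":
            return any(path_matches(stack[i:], pattern[1:]) for i in range(len(stack) + 1))
        return bool(stack) and stack[0] == pattern[0] and path_matches(stack[1:], pattern[1:])

    class StreamingMatcher(xml.sax.ContentHandler):
        """SAX handler: maintains the element stack and tests the query on each start tag."""
        def __init__(self, pattern):
            self.pattern, self.stack, self.hits = pattern, [], 0
        def startElement(self, name, attrs):
            self.stack.append(name)
            if path_matches(self.stack, self.pattern):
                self.hits += 1
        def endElement(self, name):
            self.stack.pop()

    doc = b"<politics topic='president'><usa><news><body>text</body></news></usa></politics>"
    handler = StreamingMatcher(["politics", "usa", "//", "body"])
    xml.sax.parseString(doc, handler)
    print(handler.hits)    # 1: the <body> element satisfied /politics/usa//body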

16
Path Nodes and FSMs
  • XPath parser decomposes XPath expressions into a
    set of path nodes
  • These nodes act as the states of the
    corresponding FSM
  • A node in the Candidate List denotes the current
    state
  • The rest of the states are in corresponding Wait
    Lists
  • Simple FSM for /politics[@topic="president"]/usa//body

[FSM diagram: state Q1_1 on "politics", then Q1_2 on "usa", then Q1_3 on
"body"]
17
Decomposing Into Path Nodes
Q1: /politics[@topic="president"]/usa//body
  • Query ID
  • Position in state machine
  • Relative Position (RP) in tree
  • 0 for root node if it's not preceded by //
  • -1 for any node preceded by //
  • Else 1 + (number of wildcard nodes from the
    predecessor node)
  • Level
  • If current node has fixed distance from root,
    then 1 + distance
  • Else if RP = -1, then -1, else 0
  • Finally, NextPathNodeSet points to the next node

              Q1-1   Q1-2   Q1-3
  Query ID     Q1     Q1     Q1
  Position      1      2      3
  Rel. Pos.     0      1     -1
  Level         1      2     -1

Q2: //usa/*/body/p

              Q2-1   Q2-2   Q2-3
  Query ID     Q2     Q2     Q2
  Position      1      2      3
  Rel. Pos.    -1      2      1
  Level        -1      0      0
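A hedged sketch of these decomposition rules (simplified: predicates are stripped rather than kept, and only "//" and "*" steps are handled); on the two example queries it reproduces the tables above. The decompose helper and PathNode type are illustrative names, not part of XFilter itself.

    import re
    from dataclasses import dataclass

    @dataclass
    class PathNode:
        query_id: str
        position: int     # index of this node in the query's FSM
        rel_pos: int      # 0 = root, -1 = preceded by //, else 1 + wildcards skipped
        level: int        # fixed document level if known, -1 = any, 0 = resolved at runtime

    def decompose(query_id, xpath):
        steps = re.sub(r"\[[^\]]*\]", "", xpath).split("/")[1:]   # drop predicates, leading '/'
        nodes, position = [], 0
        fixed_depth, wildcards, descendant = 0, 0, False
        for step in steps:
            if step == "":                 # the empty step inside '//'
                descendant = True
                continue
            if step == "*":                # wildcard: not a state, but counted for RP
                wildcards += 1
                fixed_depth = None if fixed_depth is None else fixed_depth + 1
                continue
            position += 1
            if descendant:
                rel_pos, fixed_depth = -1, None
            else:
                rel_pos = 0 if position == 1 else 1 + wildcards
            if fixed_depth is not None:
                fixed_depth += 1
                level = fixed_depth
            else:
                level = -1 if rel_pos == -1 else 0
            nodes.append(PathNode(query_id, position, rel_pos, level))
            wildcards, descendant = 0, False
        return nodes

    for node in decompose("Q1", '/politics[@topic="president"]/usa//body'):
        print(node)        # positions 1-3 with (RP, Level) = (0,1), (1,2), (-1,-1)
    for node in decompose("Q2", "//usa/*/body/p"):
        print(node)        # positions 1-3 with (RP, Level) = (-1,-1), (2,0), (1,0)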
18
Query Index
  • Query index entry for each XML tag
  • Two lists: Candidate List (CL) and Wait List (WL),
    divided across the nodes
  • Live queries' states are in the CL; pending
    queries' states are in the WL
  • Events that cause state transitions are generated
    by the XML parser

  Element     CL      WL
  politics    Q1-1    -
  usa         Q2-1    Q1-2
  body        -       Q1-3, Q2-2
  p           -       Q2-3
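Continuing the sketch: the initial index places the first path node of each query in the Candidate List of its element name, and every later node in that element's Wait List. The build_query_index helper is a hypothetical name, and the node list is hard-coded to match the example tables.

    from collections import defaultdict

    # (query id, position in the query, element name) for the two example queries.
    PATH_NODES = [
        ("Q1", 1, "politics"), ("Q1", 2, "usa"),  ("Q1", 3, "body"),
        ("Q2", 1, "usa"),      ("Q2", 2, "body"), ("Q2", 3, "p"),
    ]

    def build_query_index(path_nodes):
        """element name -> {'CL': [...], 'WL': [...]}; a query's first node starts out live."""
        index = defaultdict(lambda: {"CL": [], "WL": []})
        for query_id, position, element in path_nodes:
            bucket = "CL" if position == 1 else "WL"
            index[element][bucket].append(f"{query_id}-{position}")
        return dict(index)

    for element, lists in build_query_index(PATH_NODES).items():
        print(element, lists)
    # politics {'CL': ['Q1-1'], 'WL': []}
    # usa {'CL': ['Q2-1'], 'WL': ['Q1-2']}
    # body {'CL': [], 'WL': ['Q1-3', 'Q2-2']}
    # p {'CL': [], 'WL': ['Q2-3']}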
19
Encountering an Element
  • Look up the element name in the Query Index and
    examine all nodes in the associated CL
  • Validate that we actually have a match

[Example: on startElement("politics"), look up "politics" in the Query Index.
Its CL contains Q1-1, whose entry is Query ID = Q1, Position = 1,
Rel. Position = 0, Level = 1, plus a NextPathNodeSet pointer; its WL is empty.]
20
Validating a Match
  • We first check that the current XML depth matches
    the level in the user query
  • If the level in the CL node is less than 1, then
    ignore the height
  • else the level in the CL node must equal the
    height
  • This ensures we're matching at the right point in
    the tree!
  • Finally, we validate any predicates against
    attributes (e.g., @topic="president")
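A tiny sketch of the level test as stated above, where node_level is the Level field from the path-node table and depth is the current document depth reported by the parser; predicate checks against attributes would follow it. The level_ok name is illustrative only.

    def level_ok(node_level, depth):
        """Levels below 1 mean 'any depth' (or 'resolved relative to the predecessor');
        otherwise the node only matches at exactly its recorded depth."""
        return node_level < 1 or node_level == depth

    # Q1-1 (politics, Level 1) may only match the document root element:
    assert level_ok(1, 1) and not level_ok(1, 2)
    # Q1-3 (body, Level -1) may match at any depth below its predecessor:
    assert level_ok(-1, 5)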

21
Processing Further Elements
  • Queries that don't meet validation are removed
    from the Candidate Lists
  • For other queries, we advance to the next state
  • We copy the next node of the query from the WL to
    the CL, and update the RP and level
  • When we reach a final state (e.g., Q1-3), we can
    output the document to the subscriber
  • When we encounter an end element, we must remove
    that element from the CL
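A hedged sketch of the bookkeeping this step implies on the query index from the earlier sketch; the promote/demote helpers are hypothetical names, and updating the copied node's RP and level from the current depth is omitted.

    def promote(index, element, node_id):
        """Copy the query's next path node from the element's Wait List into its
        Candidate List (the Wait List keeps its copy for other document branches)."""
        assert node_id in index[element]["WL"]
        index[element]["CL"].append(node_id)

    def demote(index, element, node_id):
        """When the start element that triggered the copy closes, drop the copy again."""
        index[element]["CL"].remove(node_id)

    # Hypothetical usage, continuing the example index:
    index = {"politics": {"CL": ["Q1-1"], "WL": []},
             "usa":      {"CL": ["Q2-1"], "WL": ["Q1-2"]}}
    promote(index, "usa", "Q1-2")    # <politics> validated, so Q1 now waits for <usa>
    print(index["usa"])              # {'CL': ['Q2-1', 'Q1-2'], 'WL': ['Q1-2']}
    demote(index, "usa", "Q1-2")     # </politics> arrived before Q1 reached a final state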

22
Publish-Subscribe Model Summarized
  • Currently not commonly used
  • Partly because XML isn't that widespread
  • This may change with the adoption of an XML
    format called RSS (Rich Site Summary or Really
    Simple Syndication)
  • Many news sites, web logs, mailing lists, etc.
    use RSS to publish daily articles
  • Seems like a perfect fit for publish-subscribe
    models!

23
Finding a Happy Medium
  • We've seen two approaches:
  • Do all the work at the data stores: flood the
    network with requests
  • Do all the work via a central crawler: record
    profiles and disseminate matches
  • An alternative: a two-step process
  • Build a content index over what's out there
  • Typically limited in what kinds of queries can be
    supported
  • Most common instance: an index of document
    keywords (see the sketch below)
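In the common case, that content index is just an inverted index from keywords to documents; a tiny sketch with invented documents:

    from collections import defaultdict

    DOCS = {   # invented documents, purely for illustration
        "d1": "election results in the usa",
        "d2": "college basketball results",
        "d3": "usa election debate highlights",
    }

    def build_index(docs):
        """keyword -> set of ids of documents containing it."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for word in text.split():
                index[word].add(doc_id)
        return index

    index = build_index(DOCS)
    # Step 1: consult the index; step 2: fetch only the matching documents.
    print(index["election"] & index["usa"])    # {'d1', 'd3'}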