SCUD: Scalable Counting of Unique Data
Dmitry Kit, Prince Mahajan, Navendu Jain, Praveen Yalagandula, Mike Dahlin, and Yin Zhang
Laboratory for Advanced Systems Research, The University of Texas at Austin
Hewlett-Packard Labs, Palo Alto, CA
Our Approach
  • Push query processing into the network
  • Chaining of aggregation functions

Counting information about unique data is an important basic operation in
large-scale distributed applications.
  • Web Demo
  • Perform two operations on the system: put and get.
  • Specify the address from which requests
    originate.
  • View the top-10 list of clients who performed a
    put or a get.
  • Details
  • Uses Bamboo DHT as the storage system.
  • When Bamboo receives a put or a get request, it
    notifies SDIMS with the new access count
    (old_count + 1).
  • When this information is fully aggregated, SDIMS
    contacts Bamboo.
  • Bamboo updates its local top-10 list.
  • If there is a change, this list is inserted into
    SDIMS under a different aggregation.
  • The lists from each root are combined to form the
    final Top-10 list.
  • Location
  • http://z.cs.utexas.edu/users/dkit/bamboo_test.php
  • Use one aggregation function to aggregate
    information per attribute
  • Example: the number of accesses made by a client
    can be aggregated across multiple DHT trees.
  • Use a basic SUM aggregation for each
    client
  • Chain the output of the first aggregation
    function into a second aggregation, e.g.,
    Top-K.
  • The results of the first aggregation reside on a
    large number of nodes (e.g., 1,000,000).
  • The chaining process allows us to combine this
    distributed information.
  • Example: maintain a Top-10 list of heavy-hitter
    flows by source IP in terms of the total bytes
    sent.
  • Several Applications
  • Distributed storage, e.g., Bamboo/OpenDHT
  • Top-k users in a storage system grouped by
    activity type.
  • Put: store information.
  • Get: retrieve information.
  • Network monitoring, e.g., heavy hitters
  • Top-k flows by bytes and packets.
  • Aggregate MAX/MIN/AVG over incoming flows in an
    organization.
  • Content distribution networks, e.g., Akamai
  • Counting the number of unique accesses per
    webpage.
  • Telematics applications, e.g., BMW Assist
  • Counting the number of unique cars per highway.
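
The chaining idea described above (a per-client SUM whose output feeds a Top-K) can be sketched in a few lines. This is a minimal single-process sketch, not the SDIMS or Bamboo API: the function names, the CRC-based assignment of clients to roots, the number of roots, and the sample byte counts are all assumptions made for illustration.

```python
import heapq
import zlib
from collections import defaultdict

NUM_ROOTS = 4  # hypothetical number of aggregation-tree roots


def root_for(client, num_roots=NUM_ROOTS):
    """Assign each client to exactly one root (assumed hash-based placement)."""
    return zlib.crc32(client.encode()) % num_roots


def sum_per_client(reports, num_roots=NUM_ROOTS):
    """Stage 1 (SUM): each root accumulates totals for the clients it owns."""
    roots = defaultdict(lambda: defaultdict(int))
    for client, nbytes in reports:
        roots[root_for(client, num_roots)][client] += nbytes
    return roots


def local_top_k(totals, k):
    """Stage 2 (Top-K) at one root: the k largest (client, total) pairs."""
    return heapq.nlargest(k, totals.items(), key=lambda kv: (kv[1], kv[0]))


def merge_top_k(lists, k):
    """Combine the per-root lists into the final Top-K list."""
    merged = [entry for lst in lists for entry in lst]
    return heapq.nlargest(k, merged, key=lambda kv: (kv[1], kv[0]))


def chained_top_k(reports, k, num_roots=NUM_ROOTS):
    """Chain SUM into Top-K: sum per client, rank per root, then merge."""
    roots = sum_per_client(reports, num_roots)
    return merge_top_k([local_top_k(t, k) for t in roots.values()], k)


if __name__ == "__main__":
    # Hypothetical (client IP, bytes sent) reports from several data sources.
    reports = [
        ("10.0.0.1", 500), ("10.0.0.2", 300), ("10.0.0.1", 700),
        ("10.0.0.3", 100), ("10.0.0.2", 50),
    ]
    print(chained_top_k(reports, k=2))
```

Because each client is owned by exactly one root in this sketch, its total is already complete when the root ranks it, so merging the per-root lists yields the exact global Top-K rather than an approximation.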

Top-10 aggregation
<9.2.5.89, 4.2MB> <9.2.5.15, 2.2MB> <9.2.5.122, 1.5MB> <9.2.5.67, 1.3MB> <9.2.5.25, 900KB>

<9.2.5.230, 800KB>
<9.2.5.202, 760KB>
<9.2.5.122, 1.5MB>
<9.2.5.121, 400KB>
<9.2.5.25, 900KB>
<9.2.5.67, 1.3MB>
  • Scalability
  • Large number of unique items.
  • Large number of distributed data sources at which
    the information about these items is being
    updated.
  • Existing schemes don't scale
  • High bandwidth cost, large processing delay, high
    response latency.
  • Hierarchical aggregation
  • Root node and nodes near the root: O(#items)
    message cost and storage cost.
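
To see why a single aggregation hierarchy concentrates cost near the root, the sketch below merges per-item counts pairwise up a toy binary tree and records how many distinct items a node must send to its parent at each level. The tree shape, the function name `aggregate_up`, and the disjoint per-leaf item sets (a worst case) are assumptions for illustration, not details from the poster.

```python
from collections import Counter


def aggregate_up(leaves):
    """Merge per-item counts pairwise up a binary tree.

    Returns the root's merged Counter and, for each level below the root,
    the largest message size (distinct items a node sends to its parent).
    """
    level = [Counter(leaf) for leaf in leaves]
    max_msg_per_level = []
    while len(level) > 1:
        max_msg_per_level.append(max(len(c) for c in level))
        # Each parent merges the counters of up to two children.
        level = [
            sum(level[i:i + 2], Counter())
            for i in range(0, len(level), 2)
        ]
    return level[0], max_msg_per_level


if __name__ == "__main__":
    # 8 hypothetical leaves, each observing 4 items that no other leaf sees.
    leaves = [
        {f"item{j}": 1 for j in range(i * 4, i * 4 + 4)}
        for i in range(8)
    ]
    root, sizes = aggregate_up(leaves)
    print("distinct items at root:", len(root))
    print("largest message per level (leaves first):", sizes)
```

In this worst case the message size doubles at every level, so the root ends up storing one entry per unique item in the whole system, which is the O(#items) cost noted above.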

<9.2.5.10, 653KB>
<9.2.5.56, 257KB>
<9.2.5.240, 100KB>
<9.2.5.210, 120KB>
<9.2.5.15, 2.2MB>
<9.2.5.89, 4.2MB>
URL: http://www.cs.utexas.edu/users/ypraveen/sdims
Email: sdims@cs.utexas.edu
Estimated Network Size: 18 nodes

Top 10 Gets by client
Client IP         Number of Gets
127.0.0.1         1182
128.83.144.30     243
128.83.120.172    168
128.83.120.245    147
128.83.120.138    120
128.83.120.21     90
128.83.144.43     75
128.83.130.11     48
128.83.120.114    27
128.83.144.241    12

Top 10 Puts by client
Client IP         Number of Puts
127.0.0.1         1100
128.83.120.138    180
128.83.144.30     144