Title: Distributed Data Storage and Parallel Processing Engine
1. Distributed Data Storage and Parallel Processing Engine
Sector/Sphere
Yunhong Gu, University of Illinois at Chicago
2. What is Sector/Sphere?
- Sector: Distributed File System
- Sphere: Parallel Data Processing Engine (generic MapReduce)
- Open source software (GPL/BSD), written in C++
- Started in 2006; the current version is 1.23
- http://sector.sf.net
3. Overview
- Motivation
- Sector
- Sphere
- Experimental Results
4. Motivation
Supercomputer model: expensive, with a data I/O bottleneck.
Sector/Sphere model: inexpensive, with parallel data I/O and data locality.
5. Motivation
Parallel/distributed programming with MPI, etc.: flexible and powerful, but too complicated.
Sector/Sphere (cloud) model: the cluster appears as a single entity to the developer, with a simplified programming interface, but it is limited to certain data-parallel applications.
6. Motivation
Systems built for a single data center: require additional effort to locate and move data.
Sector/Sphere model: supports wide-area data collection and distribution.
7. Sector Distributed File System
[Architecture diagram] The security server maintains user accounts, data protection, and system security. The masters handle the metadata and scheduling and act as the service provider. Clients access the system through tools and application programming interfaces. Slaves provide storage and processing. The masters communicate with the security server and with clients over SSL; data moves between clients and slaves over UDT, with optional encryption.
8. Sector Distributed File System
- Sector stores files on the native/local file system of each slave node.
- Sector does not split files into blocks.
- Pro: simple and robust, suitable for wide-area deployment, fast and flexible data processing.
- Con: users need to manage file sizes appropriately.
- The master nodes maintain the file system metadata; no permanent metadata is needed.
- Topology aware.
9. Sector Performance
- The data channel is set up directly between a slave and a client.
- Multiple active-active masters (load balancing), starting from version 1.24.
- UDT is used for high-speed data transfer.
- UDT is a high-performance UDP-based data transfer protocol, much faster than TCP over wide-area networks.
10. UDT: UDP-based Data Transfer
- http://udt.sf.net
- Open source UDP-based data transfer protocol
- With reliability control and congestion control
- Fast, firewall friendly, easy to use
- Already used in many commercial and research software packages
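To make the API concrete, here is a minimal client-side sketch using UDT's BSD-socket-style calls (startup, socket, connect, send, close, cleanup). The server address, port, and message are placeholders, and error handling is reduced to a single check.

#include <udt.h>          // UDT SDK from http://udt.sf.net
#include <netinet/in.h>
#include <arpa/inet.h>
#include <cstring>
#include <iostream>

int main()
{
   UDT::startup();                                    // initialize the UDT library

   UDTSOCKET client = UDT::socket(AF_INET, SOCK_STREAM, 0);

   sockaddr_in serv_addr;
   std::memset(&serv_addr, 0, sizeof(serv_addr));
   serv_addr.sin_family = AF_INET;
   serv_addr.sin_port = htons(9000);                  // placeholder port
   inet_pton(AF_INET, "192.168.0.1", &serv_addr.sin_addr);  // placeholder server

   if (UDT::ERROR == UDT::connect(client, (sockaddr*)&serv_addr, sizeof(serv_addr)))
   {
      std::cerr << "connect: " << UDT::getlasterror().getErrorMessage() << std::endl;
      return 1;
   }

   const char* msg = "hello over UDT";
   UDT::send(client, msg, (int)std::strlen(msg), 0);  // reliable, congestion-controlled send

   UDT::close(client);
   UDT::cleanup();                                    // release library resources
   return 0;
}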
11. Sector Fault Tolerance
- Sector uses replication for better reliability and availability.
- Replicas can be created either at write time (instantly) or periodically.
- Sector supports multiple active-active masters for high availability.
12. Sector Security
- Sector uses a security server to maintain user accounts and IP access control for masters, slaves, and clients.
- Control messages are encrypted (not completely finished in the current version).
- Data transfer can optionally be encrypted.
- The data transfer channel is set up by rendezvous; there is no listening server.
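As an illustration of how a rendezvous data channel can avoid a listening server, the sketch below uses UDT's UDT_RENDEZVOUS socket option: both endpoints bind a local port and call connect() toward each other. The helper name, addresses, and ports are placeholders, and UDT::startup() is assumed to have been called already.

#include <udt.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <cstring>

// Rendezvous connection sketch: neither side listens; both sides bind a
// local port and connect toward the peer, and the handshake meets in the
// middle. Parameters are illustrative placeholders.
UDTSOCKET rendezvous_connect(const char* peer_ip, int peer_port, int local_port)
{
   UDTSOCKET u = UDT::socket(AF_INET, SOCK_STREAM, 0);

   bool rendezvous = true;
   UDT::setsockopt(u, 0, UDT_RENDEZVOUS, &rendezvous, sizeof(bool));

   sockaddr_in local;
   std::memset(&local, 0, sizeof(local));
   local.sin_family = AF_INET;
   local.sin_port = htons(local_port);
   local.sin_addr.s_addr = INADDR_ANY;
   UDT::bind(u, (sockaddr*)&local, sizeof(local));    // bind the local port first

   sockaddr_in peer;
   std::memset(&peer, 0, sizeof(peer));
   peer.sin_family = AF_INET;
   peer.sin_port = htons(peer_port);
   inet_pton(AF_INET, peer_ip, &peer.sin_addr);

   UDT::connect(u, (sockaddr*)&peer, sizeof(peer));   // both sides call connect()
   return u;
}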
13. Sector Tools and API
- Supported file system operations: ls, stat, mv, cp, mkdir, rm, upload, download.
- Wildcard characters are supported.
- System monitoring: sysinfo.
- C++ API: list, stat, move, copy, mkdir, remove, open, close, read, write, sysinfo (a usage sketch follows).
- FUSE.
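The sketch below only mirrors the operations listed above; the SectorClient type and its method names are illustrative stand-ins, not the verbatim Sector client API, and the stub bodies simply print what a real client call would do.

#include <iostream>
#include <string>

// Stand-in client wrapper for the sketch; real calls would go to the
// master and slave nodes.
struct SectorClient {
   bool login(const std::string& master, int port,
              const std::string& user, const std::string& pass) {
      std::cout << "login to " << master << ":" << port << " as " << user << "\n";
      return true;
   }
   bool mkdir(const std::string& path)  { std::cout << "mkdir "  << path << "\n"; return true; }
   bool upload(const std::string& local, const std::string& remote) {
      std::cout << "upload " << local << " -> " << remote << "\n"; return true;
   }
   bool stat(const std::string& path)   { std::cout << "stat "   << path << "\n"; return true; }
   void logout()                        { std::cout << "logout\n"; }
};

int main()
{
   SectorClient client;
   // Host, port, and credentials are placeholders.
   client.login("master.example.org", 6000, "user", "password");
   client.mkdir("/benchmark");
   client.upload("input.dat", "/benchmark/input.dat");
   client.stat("/benchmark/input.dat");
   client.logout();
   return 0;
}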
14. Sphere: Simplified Data Processing
- Data-parallel applications.
- Data is processed where it resides, or on the nearest possible node (locality).
- The same user-defined function (UDF) is applied to all elements (records, blocks, or files).
- Processing output can be written to Sector files or sent back to the client.
- Generalized Map/Reduce.
15. Sphere: Simplified Data Processing

The serial program:

for each file F in (SDSS datasets)
    for each image I in F
        findBrownDwarf(I, ...)

becomes, with Sphere:

SphereStream sdss;
sdss.init("sdss files");
SphereProcess* myproc;
myproc->run(sdss, "findBrownDwarf", ...);
myproc->read(result);

findBrownDwarf(char* image, int isize, char* result, int rsize)
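A minimal UDF skeleton, assuming the four-argument shape shown above (input buffer and size, output buffer and capacity). The body, the return-value convention, and the use of C linkage are illustrative placeholders, not the actual brown-dwarf detector.

#include <cstring>

// extern "C" keeps the symbol name simple, assuming the UDF is loaded from
// a shared library by name (an assumption for this sketch).
extern "C" int findBrownDwarf(char* image, int isize, char* result, int rsize)
{
   // ... scan the image bytes for candidate objects (omitted) ...

   // Write whatever was found into the caller-provided result buffer.
   const char* found = "candidate list placeholder";
   int len = (int)std::strlen(found) + 1;
   if (len > rsize)
      return -1;                  // output buffer too small
   std::memcpy(result, found, len);
   return 0;                      // 0 = segment processed successfully
}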
16. Sphere Data Movement
- Slave -> Slave (local)
- Slave -> Slaves (shuffle/hash)
- Slave -> Client
17Sphere/UDF vs. MapReduce
- Record Offset Index
- UDF
- Hashing / Bucket
- -
- UDF
- -
- Parser / Input Reader
- Map
- Partition
- Compare
- Reduce
- Output Writer
18. Sphere/UDF vs. MapReduce
- Sphere is more straightforward and flexible.
- A UDF can be applied directly to records, blocks, files, and even directories.
- Native binary data support.
- Sorting is required before Reduce, but it is optional in Sphere.
- Sphere uses a PUSH model for data movement, which is faster than the PULL model used by MapReduce.
19. Why Doesn't Sector Split Files?
- Certain applications need to process a whole file or even a whole directory.
- Certain legacy applications need a file or a directory as input.
- Certain applications need multiple inputs, e.g., everything in a directory.
- In Hadoop, all blocks would have to be moved to one node for processing, so there is no data locality benefit.
20. Load Balancing
- The number of data segments is much larger than the number of Sphere Processing Engines (SPEs). When an SPE completes a data segment, a new segment is assigned to it (see the sketch below).
- Data transfer is balanced across the system to optimize network bandwidth usage.
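The sketch below models this assignment policy conceptually: a queue holding many more segments than workers, with each simulated SPE claiming the next pending segment as soon as it finishes one. It illustrates the idea only and is not the Sphere scheduler.

#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main()
{
   std::queue<int> segments;                // pending data-segment IDs
   for (int i = 0; i < 1000; ++i)           // far more segments than workers
      segments.push(i);
   std::mutex m;

   auto spe_worker = [&](int /*spe_id*/) {
      for (;;) {
         int seg;
         {
            std::lock_guard<std::mutex> lock(m);
            if (segments.empty())
               return;                       // nothing left to process
            seg = segments.front();
            segments.pop();                  // claim the next segment
         }
         // ... run the UDF on segment `seg` (omitted) ...
         (void)seg;
      }
   };

   std::vector<std::thread> spes;
   for (int id = 0; id < 8; ++id)            // 8 simulated SPEs
      spes.emplace_back(spe_worker, id);
   for (auto& t : spes)
      t.join();

   std::cout << "all segments processed\n";
   return 0;
}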
21. Fault Tolerance
- Map failure is recoverable.
- If one SPE fails, the data segment assigned to it is re-assigned to another SPE and processed again.
- Reduce failure is unrecoverable.
- In small to medium systems, machine failure at run time is rare.
- If necessary, developers can split the input into multiple sub-tasks to reduce the cost of a reduce failure.
22. Open Cloud Testbed
- 4 racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2).
- 10 Gb/s inter-site connections over CiscoWave.
- 2 Gb/s inter-rack connections.
- Each node: two dual-core AMD CPUs, 12 GB RAM, one 1 TB disk.
- Capacity will be doubled by Sept. 2009.
23. Open Cloud Testbed
24. The TeraSort Benchmark
- Data is split into small files scattered across all slaves.
- Stage 1: on each slave, an SPE scans the local files and sends each record to a bucket file on a remote node according to its key.
- Stage 2: on each destination node, an SPE sorts all data inside each bucket.
25. TeraSort
[Diagram] Each 100-byte record consists of a 10-byte key and a 90-byte value. Stage 1: hash each record on the first 10 bits of its key into buckets 0-1023. Stage 2: sort each bucket on its local node.
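A small sketch of the Stage 1 bucket assignment described above: the first 10 bits of the 10-byte key select one of 1024 buckets. This illustrates the hashing step only and is not the benchmark code itself.

#include <cstddef>

// Map a 100-byte TeraSort record to one of 1024 buckets using the first
// 10 bits of its key, as described on the slide.
constexpr std::size_t RECORD_SIZE = 100;   // 10-byte key + 90-byte value
constexpr int NUM_BUCKETS = 1024;          // 2^10 buckets, IDs 0-1023

int bucket_for_record(const unsigned char* record)
{
   // First 10 bits = all 8 bits of key byte 0 plus the top 2 bits of byte 1.
   const int high = record[0];
   const int low  = record[1] >> 6;
   const int bucket = (high << 2) | low;   // value in [0, 1023]
   return bucket % NUM_BUCKETS;            // defensive, already in range
}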
26. Performance Results: TeraSort
Run time in seconds, Sector v1.16 vs. Hadoop 0.17.

Sites                            Data Size   Sphere   Hadoop (3 replicas)   Hadoop (1 replica)
UIC                              300 GB      1265     2889                  2252
UIC + StarLight                  600 GB      1361     2896                  2617
UIC + StarLight + Calit2         900 GB      1430     4341                  3069
UIC + StarLight + Calit2 + JHU   1.2 TB      1526     6675                  3702
27. Performance Results: TeraSort
- Sorting 1.2 TB on 120 nodes:
- Sphere: hash 981 sec + local sort 545 sec
- Hadoop: 3702/6675 seconds
- Sphere hash stage: CPU 130%, MEM 900 MB
- Sphere local sort stage: CPU 80%, MEM 1.4 GB
- Hadoop: CPU 150%, MEM 2 GB
28. The MalStone Benchmark
- Drive-by problem: visit a web site and get compromised by malware.
- MalStone-A: compute the infection ratio of each site.
- MalStone-B: compute the infection ratio of each site from the beginning to the end of every week.
- http://code.google.com/p/malgen/
29. MalStone
[Diagram] Each text record contains an Event ID, Timestamp, Site ID, Compromise Flag, and Entity ID. Records are transformed into (key, value) pairs with the Site ID as the key and (Time, Flag) as the value. Stage 1: process each record and hash it into one of 1000 buckets (site-000 through site-999) based on a 3-byte site-ID prefix. Stage 2: compute the infection rate for each merchant site.
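A sketch of the Stage 2 computation for MalStone-A as described above: within one bucket, count total and flagged visits per site and report the ratio. The Visit struct and the sample data are illustrative stand-ins for the parsed (Site ID, Time, Flag) records.

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Visit {
   std::string site_id;
   bool        compromised;   // compromise flag from the log record
};

int main()
{
   // A tiny hand-made bucket; in the benchmark this data comes from the
   // bucket files produced in Stage 1.
   std::vector<Visit> bucket = {
      {"site-001", false}, {"site-001", true},
      {"site-002", false}, {"site-001", false},
   };

   std::map<std::string, std::pair<long, long>> counts;  // site -> (total, flagged)
   for (const Visit& v : bucket) {
      counts[v.site_id].first += 1;
      if (v.compromised)
         counts[v.site_id].second += 1;
   }

   for (const auto& entry : counts) {
      double ratio = (double)entry.second.second / (double)entry.second.first;
      std::cout << entry.first << " infection ratio: " << ratio << "\n";
   }
   return 0;
}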
30. Performance Results: MalStone
Processing 10 billion records on 20 OCT nodes (local).

                          MalStone-A   MalStone-B
Hadoop                    454m 13s     840m 50s
Hadoop Streaming/Python   87m 29s      142m 32s
Sector/Sphere             33m 40s      43m 44s

Courtesy of Collin Bennet and Jonathan Seidman of Open Data Group.
31. System Monitoring (Testbed)
32. System Monitoring (Sector/Sphere)
33. For More Information
- Sector/Sphere code and docs: http://sector.sf.net
- Open Cloud Consortium: http://www.opencloudconsortium.org
- NCDM: http://www.ncdm.uic.edu