1
Timm Morten Steinbeck, Computer Science/Computer Engineering Group
Kirchhoff Institute for Physics, Ruprecht-Karls-University Heidelberg
  • A Framework for Building Distributed Data Flow Chains in Clusters

2
Requirements
  • ALICE: a relativistic heavy-ion physics experiment
  • Very large multiplicity: ≈15,000 particles/event
  • Full event size > 70 MB
  • The last trigger stage (High Level Trigger, HLT) is the first stage with complete event data
  • Data rate into the HLT: up to 25 GB/s

3
High Level Trigger
  • The HLT has to process a large volume of data to go from this (raw ADC data)...
  • ...to this (reconstructed particle tracks)
  • The HLT has to perform the reconstruction of particle tracks from the raw ADC data.

[Figure: a raw ADC data stream (1, 2, 123, 255, 100, 30, 5, 1, 4, 3, 2, 3, 4, 5, 3, 4, 60, 130, 30, 5, ...) and the particle tracks reconstructed from it]
4
High Level Trigger
  • The HLT consists of a Linux PC farm (500-1000 nodes) with a fast network
  • The nodes are arranged hierarchically
  • The first stage reads data from the detector and performs first-level processing
  • Each stage sends the data it produces to the next stage
  • Each further stage performs processing or merging on its input data

5
High Level Trigger
  • The HLT needs a software framework to transport
    data through the PC farm.
  • Requirements for the framework
  • Efficiency
  • The framework should not use too many CPU cycles
    (which are needed for the data analysis)
  • The framework should transport the data as fast
    as possible
  • Flexibility
  • The framework should consist of components which
    can be plugged together in different
    configurations
  • The framework should allow reconfiguration at
    runtime

6
The Data Flow Framework
  • Components are single processes
  • Communication is based on the publisher-subscriber principle
  • One publisher can serve multiple subscribers (a minimal sketch follows after the diagram)

[Diagram: subscribers send Subscribe requests to a publisher; the publisher then announces New Data to each of its subscribers]
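The principle can be illustrated with a minimal C++ sketch. All names here (Publisher, Subscriber, Subscribe, NewData, EventDescriptor) are placeholders chosen for this illustration and not the framework's actual API; in the real framework publisher and subscriber are separate processes communicating via named pipes or shared memory (see slide 11).

    #include <cstddef>
    #include <vector>

    // Placeholder descriptor announcing one event; the real framework's
    // descriptors are more elaborate (see slide 12).
    struct EventDescriptor {
        unsigned long fEventID;
    };

    class Subscriber {
    public:
        virtual ~Subscriber() {}
        // Called by the publisher whenever new data is announced.
        virtual void NewData(const EventDescriptor& desc) = 0;
    };

    class Publisher {
    public:
        // A subscriber registers itself once ("Subscribe" in the diagram).
        void Subscribe(Subscriber* s) { fSubscribers.push_back(s); }

        // Announce one event to every registered subscriber ("New Data").
        void AnnounceEvent(const EventDescriptor& desc) {
            for (std::size_t i = 0; i < fSubscribers.size(); ++i)
                fSubscribers[i]->NewData(desc);
        }

    private:
        std::vector<Subscriber*> fSubscribers;
    };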
7
Framework Architecture
  • The framework consists of mutually dependent packages

[Diagram: package layering, top to bottom - Data Source, Sink, and Processing Components; Data Flow Components; Data Source, Sink, and Processing Templates; Publisher-Subscriber Base Classes; Communication Classes; Utility Classes]
8
Utility Classes
  • General helper classes (e.g. timer, thread)
  • Rewritten classes
  • Thread-safe string class
  • Faster vector class

9
Communication Classes
  • Two abstract base classes:
  • For small message transfers
  • For large data blocks
  • Derived classes for each network technology
  • Currently existing: message and block classes for TCP and SCI (Scalable Coherent Interface)

[Diagram: Message Base Class with derived TCP Message Class and SCI Message Class; Block Base Class with derived TCP Block Class and SCI Block Class]
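A minimal sketch of this two-level hierarchy, assuming a simple Connect/Send/Disconnect style interface (the real classes are modelled after the socket API, see the next slide); class and method names are illustrative only:

    #include <cstddef>

    // Abstract base class for small message transfers.
    class MessageCommunication {
    public:
        virtual ~MessageCommunication() {}
        virtual int Connect(const char* address) = 0;
        virtual int Send(const void* msg, std::size_t size) = 0;
        virtual int Disconnect() = 0;
    };

    // Abstract base class for large data block transfers.
    class BlockCommunication {
    public:
        virtual ~BlockCommunication() {}
        virtual int Connect(const char* address) = 0;
        virtual int SendBlock(const void* block, std::size_t size) = 0;
        virtual int Disconnect() = 0;
    };

    // One derived pair per network technology, currently TCP and SCI
    // (implementations elided in this sketch).
    class TCPMessageCommunication : public MessageCommunication { /* ... */ };
    class TCPBlockCommunication   : public BlockCommunication   { /* ... */ };
    class SCIMessageCommunication : public MessageCommunication { /* ... */ };
    class SCIBlockCommunication   : public BlockCommunication   { /* ... */ };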
10
Communication Classes
  • Implementations are foreseen for the ATOLL network (University of Mannheim) and the Scheduled Transfer Protocol
  • The API is partly modelled after the socket API (Bind, Connect)
  • Both explicit and implicit connections are possible (see the tables and the usage sketch below)

Explicit Connection
  User Calls    System Calls
  Connect       connect
  Send          transfer data
  Send          transfer data
  Send          transfer data
  Disconnect    disconnect

Implicit Connection
  User Calls    System Calls
  Send          connect, transfer data, disconnect
  Send          connect, transfer data, disconnect
  Send          connect, transfer data, disconnect
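A hypothetical usage sketch of the MessageCommunication interface from the previous sketch, following the tables above: with an explicit connection the user calls Connect and Disconnect and each Send only transfers data; with an implicit connection each Send handles connection set-up and tear-down internally. The address string and message contents are placeholders; in the implicit case the target address would have to be supplied elsewhere, e.g. at construction time.

    // Explicit connection: the user establishes and releases the connection;
    // each Send only transfers data.
    void ExplicitConnectionExample(MessageCommunication& com) {
        com.Connect("node-b:4711");   // system: connect
        com.Send("msg", 3);           // system: transfer data
        com.Send("msg", 3);           // system: transfer data
        com.Disconnect();             // system: disconnect
    }

    // Implicit connection: the user only calls Send; the class connects,
    // transfers the data, and disconnects again for each message.
    void ImplicitConnectionExample(MessageCommunication& com) {
        com.Send("msg", 3);           // system: connect, transfer data, disconnect
        com.Send("msg", 3);           // system: connect, transfer data, disconnect
    }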
11
Publisher Subscriber Classes
  • Implement the publisher-subscriber interface
  • Abstract interface for the communication mechanism between processes/modules
  • Currently named pipes or shared memory are available
  • Multi-threaded implementation

[Diagram: publisher-subscriber interaction (Subscribe requests and New Data announcements), as on slide 6]
12
Publisher Subscriber Classes
  • Efficiency: event data is not sent to the subscriber components
  • The publisher process places the data into shared memory (ideally already during its production)
  • Descriptors holding the location of the data in shared memory are sent to the subscribers (see the sketch after the diagram)
  • This requires buffer management in the publisher

[Diagram: the publisher places event M's data blocks 0..n into shared memory and sends the subscriber a New Event message containing the event M descriptor, which references blocks 0..n]
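The descriptor idea can be sketched with plain structs. The field names, the fixed block limit, and the layout below are assumptions made for this illustration (refining the EventDescriptor placeholder used in the earlier publisher-subscriber sketch), not the framework's actual descriptor format:

    #include <cstddef>

    // Where one data block lives: which shared memory segment, at what
    // offset, and how large it is.
    struct BlockDescriptor {
        int         fShmID;    // identifier of the shared memory segment
        std::size_t fOffset;   // offset of the block within that segment
        std::size_t fSize;     // block size in bytes
    };

    const unsigned int kMaxBlocksPerEvent = 8;  // illustrative limit only

    // Descriptor sent from publisher to subscriber instead of the event data.
    struct EventDescriptor {
        unsigned long   fEventID;                     // event number M
        unsigned int    fBlockCount;                  // number of valid blocks
        BlockDescriptor fBlocks[kMaxBlocksPerEvent];  // data blocks 0..n
    };

Only this small descriptor crosses the process boundary; the event data itself stays in shared memory and is never copied between the components.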
13
Data Flow Components
Several components to shape the data flow in a
chain
  • To merge data streams belonging to one event
  • To split and rejoin a data stream, e.g. for load balancing (a round-robin sketch follows after the diagram)

[Diagrams: an EventMerger combines event N, parts 1-3, into one event N; an EventScatterer distributes events 1-3 over several Processing branches, and an EventGatherer rejoins them into one stream]
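A minimal sketch of round-robin event scattering for load balancing, reusing the Publisher and EventDescriptor placeholders from the slide 6 sketch. The real EventScatterer and EventGatherer are separate processes communicating via the publisher-subscriber interface; this is only a conceptual illustration.

    #include <cstddef>
    #include <vector>

    // Round-robin scatterer: each complete event is forwarded to exactly one
    // processing branch; a gatherer later rejoins the branches into one stream.
    class EventScatterer {
    public:
        explicit EventScatterer(const std::vector<Publisher*>& branches)
            : fBranches(branches), fNext(0) {}

        // Forward the announced event to the next branch in turn.
        void NewEvent(const EventDescriptor& desc) {
            fBranches[fNext]->AnnounceEvent(desc);
            fNext = (fNext + 1) % fBranches.size();
        }

    private:
        std::vector<Publisher*> fBranches;
        std::size_t fNext;
    };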
14
Data Flow Components
Several components to shape the data flow in a
chain
  • To transparently transport data over the network to other computers (Bridge)
  • The SubscriberBridgeHead uses a subscriber class to accept incoming data; the PublisherBridgeHead uses a publisher class to announce the data on the remote node

[Diagram: a SubscriberBridgeHead on node 1 forwards data over the network to a PublisherBridgeHead on node 2]
15
Component Template
  • Templates for user components are provided

  • To read out data from a source and insert it into a chain (Data Source Template)
  • To accept data from the chain, process it, and reinsert the result (Analysis Template; a minimal sketch follows after the diagram)
  • To accept data from the chain and process it, e.g. store it (Data Sink Template)

[Diagram: internal structure of the templates - Data Source Template: buffer management, data source addressing/handling, data announcing; Analysis Template: accepting input data, input data addressing, data analysis/processing, output buffer management, output data announcing; Data Sink Template: accepting input data, input data addressing, data sink addressing/writing]
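The template idea can be sketched as an abstract base class in which only the processing step is left to the user. The class name AnalysisComponentTemplate, the ProcessEvent signature, and the example component below are assumptions for illustration, not the framework's real templates:

    #include <cstddef>

    // The template handles accepting input, buffer management, and
    // announcing the output; the user only implements the processing step.
    class AnalysisComponentTemplate {
    public:
        virtual ~AnalysisComponentTemplate() {}

        // Called for every accepted input event. The user writes the result
        // into the provided output buffer and returns the number of bytes
        // produced; the template then announces that data to its subscribers.
        virtual std::size_t ProcessEvent(const void* input, std::size_t inputSize,
                                         void* output, std::size_t outputCapacity) = 0;
    };

    // Hypothetical user component: only the processing step is written by
    // the user, everything else is inherited from the template.
    class ADCUnpackerComponent : public AnalysisComponentTemplate {
    public:
        std::size_t ProcessEvent(const void* input, std::size_t inputSize,
                                 void* output, std::size_t outputCapacity) {
            // ... unpack run-length-encoded ADC values into the output buffer ...
            (void)input; (void)inputSize; (void)output; (void)outputCapacity;
            return 0;  // number of output bytes produced
        }
    };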
16
Benchmarks
  • Performance tests of the framework
  • Dual Pentium III, 733-800 MHz
  • Tyan Thunder 2500 or Thunder HEsl motherboard, Serverworks HE/HEsl chipset
  • 512 MB RAM, >20 GB disk space (system, swap, tmp)
  • Fast Ethernet, switch with 21 Gb/s backplane
  • SuSE Linux 7.2 with 2.4.16 kernel

17
Benchmarks
  • Benchmark of the publisher-subscriber interface
  • Publisher process:
  • 16 MB output buffer, event size 128 B
  • The publisher does the buffer management and copies the data into the buffer; the subscriber just replies to each event
  • Maximum performance: more than 12.5 kHz

18
Benchmarks
  • Benchmark of the TCP message class
  • A client sends messages to a server on another PC
  • TCP over Fast Ethernet
  • Message size: 32 B
  • Maximum message rate: more than 45 kHz

19
Benchmarks
  • Publisher-Subscriber Network Benchmark
  • Publisher on node A, subscriber on node B
  • Connected via Bridge, TCP over Fast Ethernet
  • 31 MB buffer in publisher and receiving bridge
  • Message size from 128 B to 1 MB

[Diagram: publisher and SubscriberBridgeHead on node A, connected over the network to the PublisherBridgeHead and subscriber on node B]
20
Benchmarks
  • Publisher-Subscriber Network Benchmark

[Plot of the benchmark results. Notes: dual-CPU nodes, 100% corresponds to 2 CPUs; theoretical rate = 100 Mb/s / event size]
21
Benchmarks
  • CPU load and event rate decrease with larger blocks
  • The receiver is more loaded than the sender
  • Minimum CPU load: 20% (sender), 30% (receiver)
  • Maximum CPU load at 128 B events: 90% (receiver), 80% (sender)
  • Management of > 250,000 events in the system!

22
Benchmarks
  • Publisher-Subscriber Network Benchmark

23
Benchmarks
  • From 32 kB event size on, the network bandwidth is the limit
  • From 32 kB event size on: more than 10 MB/s
  • At maximum event size: 12.3·10^6 B/s of the theoretical 12.5·10^6 B/s (100 Mb/s)

24
"Real-World" Test
  • 13-node test setup
  • Simulation of the read-out and processing of 1/36 (one slice) of the ALICE Time Projection Chamber (TPC)
  • Simulated piled-up (overlapped) proton-proton events
  • Target processing rate: 200 Hz (maximum read-out rate of the TPC)

25
"Real-World" Test
[Diagram: test setup for one TPC slice with six patches (1-6). For each patch a File Publisher feeds an ADC Unpacker and a Cluster Finder; an Event Scatterer distributes the events over two Trackers (T), and an Event Gatherer rejoins the streams. Event Mergers, a Patch Merger, and a Slice Merger combine the per-patch results.]
26
"Real-World" Test
[Diagram: the data handled by the chain - the run-length-encoded, zero-suppressed ADC stream (1, 2, 123, 255, 100, 30, 5, 40, 1, 4, 30, 4, 1, 2, 60, 130, 30, 5, ...) and the corresponding unpacked ADC values (1, 2, 123, 255, 100, 30, 5, 0, 0, 0, 0, 1, 4, 0, 0, 0, 4, 1, 2, 60, 130, 30, 5, ...)]
27
"Real-Word" Test
  • The third line of nodes connects track segments to form complete tracks (track merging)
  • The second line of nodes finds curved particle track segments going through the charge space points
  • The first line of nodes unpacks the zero-suppressed, run-length-encoded ADC values and calculates 3D space-point coordinates of the charge depositions (a minimal unpacking sketch follows below)

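As a conceptual illustration of the first processing step, here is a minimal sketch of unpacking run-length-encoded, zero-suppressed ADC values. The encoding convention assumed here (a zero value followed by the number of suppressed zero samples) is hypothetical; the actual TPC raw data format differs.

    #include <cstddef>
    #include <vector>

    // Expand a zero-suppressed, run-length-encoded ADC sequence into plain
    // ADC values, under the assumed convention that a zero marks a run of
    // suppressed samples and is followed by the run length.
    std::vector<unsigned char> UnpackADC(const std::vector<unsigned char>& packed) {
        std::vector<unsigned char> unpacked;
        for (std::size_t i = 0; i < packed.size(); ++i) {
            if (packed[i] == 0 && i + 1 < packed.size()) {
                // Re-insert the suppressed zero samples.
                for (unsigned int k = 0; k < packed[i + 1]; ++k)
                    unpacked.push_back(0);
                ++i;  // skip the run-length byte
            } else {
                unpacked.push_back(packed[i]);  // ordinary ADC value
            }
        }
        return unpacked;
    }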
28
Test Results
  • Rate: 270 Hz
  • CPU load ≈ 100%
  • Network: ≈130 kB/s event data
  • CPU load ≈ 75-100%
  • Network: ≈1.8 MB/s event data
  • CPU load ≈ 60-70%

29
Conclusion & Outlook
  • The framework allows flexible creation of data flow chains while still maintaining efficiency
  • No dependencies are created during compilation
  • Applicable to a wide range of tasks
  • Performance is already good enough for many applications using TCP on Fast Ethernet
  • Future work: use the dynamic configuration ability for fault tolerance purposes, improve performance
  • More information: http://www.ti.uni-hd.de/HLT