The Transition to Multi-Core: Is Your Software Ready? (AN170)

Provided by: QNX
1
The Transition to Multi-Core: Is Your Software
Ready? (AN170)
  • Toby Foster, Product Marketing, Freescale
  • Sebastien Marineau-Mes, Director, OS Group, QNX

2
Agenda
  • Overview
  • MPC8641D Overview
  • Asymmetric Multi-Processing
  • Symmetric Multi-Processing
  • The QNX Solution
  • The Role of Tools
  • Q&As

3
The Transition to Multi-Core
  • Overview

4
Dual Core Example Applications
  • Dual core and high integration are ideal for:
  • High-end line card: extensive processing power
    for extreme control plane activities
  • Mid-range line card: capability to support both
    control and data
  • Services card: upgrade platform with advanced
    features

[Figure: block diagrams of a high-end line card, a mid-range line card, and a services card; labels include data plane ASIC/NPU and management port]
5
Multiprocessing Configurations
  • Asymmetric Multiprocessing
  • Two separate OS or two copies of one non-SMP OS
  • Collapse two processors into one
  • Task offload or division of labor
  • Operating systems, data reside in different
    address spaces
  • Resource sharing handled by user
  • Static load balancing
  • Bound and Symmetric Multiprocessing
  • Homogeneous OS support
  • High-performance option
  • Software transparency
  • Cores share address space for OS and data
  • Resource sharing handled by OS
  • Dynamic load balancing by OS (SMP)
  • Static task partitioning (BMP)

Memory Map Overlap
6
Usage of a Dual Core Device
[Figure, built up across slides 6-10: dual-core usage options labeled A-F, grouped into High End and Mid Range, each showing Core1 and Core2]
  • One core handles data plane, one control plane
  • Task offload (alongside a data plane ASIC)
  • Each core handles a separate aspect of control
    plane (alongside a data plane ASIC)
  • Network and disk partitioning
11
MPC8641D Packed with Processing Power
  • Dual e600 PowerPC cores
  • AltiVec™
  • 36-bit addressing
  • 1MB L2 Cache w/ECC per core
  • Dual Memory Controller
  • Dual DDR2/3 SDRAM
  • 64-bit data bus w/ECC
  • Support for up to 32GB memory
  • High Speed Interconnect
  • One x8/x4/x2/x1 PCIe, and
  • One x8/x4/x2/x1 PCIe or one x4/x1 Serial RapidIO
  • Ethernet
  • 4x 10/100/1000 Ethernet Controllers w/
    Classification/Policing, 8 Rx/Tx Queues,
    Checksum Offload, QoS, Lossless Flow Control,
    and FIFO mode
  • 90nm SOI Process, 1023-pin package
  • Availability
  • Alpha Samples Q2 2006
  • Production Mid 2007

[Figure: MPC8641D block diagram; labels include MPX bus and peripheral logic bus]
12
Asymmetric Integration
[Figure: the 8641D with two e600 cores and system logic, each core running its own non-SMP OS]
  • Two OS kernel images in physical memory
  • Each core executes a separate OS kernel image
  • Non-SMP OSes must cooperate in sharing resources
  • VxWorks, OSE, Integrity, Jaluna-1, many others

13
Asymmetric MP Memory Organization
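[Figure: e600 core0 runs OS "A" and apps "A"; e600 core1 runs OS "B" and apps "B"; each core's MMU maps its OS, its apps, and shared memory into physical memory]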
  • Each OS kernel expects to control physical memory
    beginning at address 0
  • Each wants its own interrupt vectors
  • The MMU can relocate applications and shared
    memory appropriately
  • The 8641D includes a hardware translator to
    relocate physical address 0 for core1

14
Resources: Shared or Multiple Instances
[Figure: resources seen by e600 core0 and e600 core1 (labels include MPIC, local bus, SRIO), each marked as one of: multiple resource instances; shared resource; partially shared or multiple instances in some circumstances]
15
QNX and Multi-core
  • QNX has done the heavy lifting to enable
    migration to multi-core
  • Let developers focus on product differentiation
  • Reliable, proven support for multi-core
    applications
  • 1997: Industry's first to bring SMP to embedded
  • 1984: High-performance, transparent distributed
    messaging
  • Full support for asymmetric and symmetric
    multiprocessing
  • Linux and VxWorks interoperability
  • Migrate existing software base and enable new
    multi-core optimized applications
  • Multi-core capable tool suite
  • World class professional services and expert
    training
  • Active role in developing standards through
    Multi-core Exchange consortium
  • Enable portability of applications across various
    platforms
  • Derive common set of APIs that multi-core
    development tools can utilize to support
    interoperability

16
Asymmetric Processing
  • Asymmetric Model Pros
  • Only possible mode when different OSes are running
  • CPU core can be dedicated to specific
    applications
  • One possible mode for applications that cannot
    operate with parallel processing
  • Asymmetric Model Cons
  • Resource sharing / arbitration needs to be
    designed into system by developers
  • Neither OS owns the whole system
  • Memory, I/O, interrupts are shared
  • Evolution - complexity will increase as more
    cores are added
  • Static configuration, difficult to add dynamic
    resourcing
  • Time to market?
  • Contention possible during system initialization,
    during normal operation, on interrupts, on system
    error conditions. All must be dealt with by the
    designer.
  • Synchronization between cores done through
    application level messages
  • Sub-optimal performance
  • Complexity of the problem is not linear
  • Addition of more cores may require
    re-architecting application to take full
    advantage of additional CPUs

17
Homogeneous AMP: Neutrino Transparent Distributed
Processing
  • Extends message passing bus over a transport
    layer
  • Applications / services can be built in a fully
    distributed manner without special code
  • Message queues
  • File systems
  • Hardware ports
  • Seamless sharing of I/O resources between cores
    (e.g. use a serial port owned by another core)

[Figure: two microkernel nodes, Core 0 and Core 1, each with an application and services on a message-passing bus (labels: flash file system, message queues, networking stack, database, Internet), linked by a message bridge (Ethernet, RapidIO, shared memory)]
fd = open("/dev/ffs1", ...); write(fd, ...);
fd = open("/net/core0/dev/ffs1", ...); write(fd, ...);
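The two calls above differ only in the pathname. A minimal sketch of that idea follows, assuming the /net/<node> prefix convention shown on the slide; the helper names resource_path and write_resource are hypothetical and not part of any QNX API.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build the pathname of a resource. With node == NULL the resource is
 * local; otherwise it is reached through /net/<node>, as on the slide.
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int resource_path(const char *node, const char *local_path,
                         char *buf, size_t size)
{
    int n;

    assert(local_path != NULL && local_path[0] == '/');
    if (node == NULL)
        n = snprintf(buf, size, "%s", local_path);
    else
        n = snprintf(buf, size, "/net/%s%s", node, local_path);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Write len bytes to the resource at path. The code is identical for a
 * local and a remote path; only the pathname differs.
 * Returns 0 on success, -1 on error. */
static int write_resource(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    ssize_t written = write(fd, data, len);
    int rc = close(fd);
    return (written == (ssize_t)len && rc == 0) ? 0 : -1;
}
```

An application on core 1 would call resource_path("core0", "/dev/ffs1", ...) and pass the result to write_resource; the same application on core 0 would pass NULL as the node.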
22
Heterogeneous AMP
  • Asymmetric Processing with Neutrino and Linux
  • Run Carrier Grade Linux on one core with QNX RTOS
    on the other
  • Inter-process communication between OSes
  • TIPC is an emerging standard for communication
    between applications
  • http://tipc.sourceforge.net/
  • Location Transparency
  • Higher performance than TCP/IP
  • Quality of Service
  • Linux benefits
  • Wide availability of open source and commercial
    software
  • No run time licensing
  • QNX benefits
  • Real time performance
  • High availability framework
  • Memory protection
  • Market leading distributed processing capability
  • No GPL contamination issues
  • Combined benefit: best of both worlds

23
Symmetric Multiprocessing
[Figure: a single SMP OS spanning both cores]
  • One OS kernel image in physical memory
  • Both cores execute the same OS kernel image
  • SMP OS owns all of the resources
  • Linux, QNX, and BSD are the only embedded SMP OSes

24
SMP Memory Organization
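[Figure: e600 core0 runs apps "A" and e600 core1 runs apps "B" on one shared OS image; each core's MMU maps the OS, its apps, and shared memory into physical memory]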
  • The OS kernel resides at physical memory address
    0, addressable by both cores
  • The MMU relocates applications and shared memory
    appropriately

25
What is Coherency?
  • Consistent view of memory across multiple agents
  • Buffer descriptors and data buffers updated by
    processor(s) as well as external agent(s)
  • Software-managed coherency
  • Processor overhead to keep track of who owns
    what when
  • Hardware-managed coherency
  • Each processor's hardware ensures consistency of
    shared data by snooping other agents' broadcasts
    on the system bus
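To make "who owns what when" concrete, here is a minimal sketch of ownership tracking on a buffer descriptor, using C11 atomics. The descriptor layout and the names bd_submit and bd_complete are hypothetical, and the external agent is simulated by an ordinary function. On a platform without hardware coherency, cache flush and invalidate steps would also be needed at each handoff; they are platform-specific and omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

enum bd_owner { BD_OWNER_CPU = 0, BD_OWNER_AGENT = 1 };

/* Hypothetical buffer descriptor shared by a processor and an agent. */
struct buffer_descriptor {
    _Atomic int owner;          /* who may touch length/data right now */
    size_t length;              /* number of valid bytes in data */
    unsigned char data[64];
};

/* Processor side: fill the buffer, then hand it to the agent.
 * Returns 0 on success, -1 if the processor does not own the
 * descriptor or the payload does not fit. */
static int bd_submit(struct buffer_descriptor *bd, const void *src, size_t len)
{
    assert(bd != NULL && src != NULL);
    if (atomic_load_explicit(&bd->owner, memory_order_acquire) != BD_OWNER_CPU)
        return -1;
    if (len > sizeof bd->data)
        return -1;
    memcpy(bd->data, src, len);
    bd->length = len;
    /* Release: the data writes above become visible before the handoff. */
    atomic_store_explicit(&bd->owner, BD_OWNER_AGENT, memory_order_release);
    return 0;
}

/* Agent side (simulated): consume the buffer and return ownership.
 * Returns 0 on success, -1 if the agent does not own the descriptor. */
static int bd_complete(struct buffer_descriptor *bd, size_t *consumed)
{
    assert(bd != NULL && consumed != NULL);
    if (atomic_load_explicit(&bd->owner, memory_order_acquire) != BD_OWNER_AGENT)
        return -1;
    *consumed = bd->length;
    atomic_store_explicit(&bd->owner, BD_OWNER_CPU, memory_order_release);
    return 0;
}
```

Every access pays for an ownership check, which is the processor overhead the slide attributes to software-managed schemes.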

26
Performance Features of HW Coherency
  • Coherency protocol
  • MEI
  • MESI
  • Update mechanism
  • Push
  • Intervention
  • Cache Tags
  • Single-ported
  • Dual-ported

[Figure: Processor A, Processor B, memory, and an I/O device connected by the MPX bus]
27
Symmetric Processing
  • Symmetric Model Pros
  • Highly scalable. Supports multiple processing
    cores seamlessly without code modification
  • One OS sees all and handles all resource
    sharing / arbitration issues
  • Dynamic load balancing can handle processing
    bursts with OS controlled thread scheduling
  • Dynamic memory allocation means that all cores
    can draw on full pool of available memory without
    penalty.
  • High performance inter-core messaging and thread
    synchronization
  • Core-to-core application synchronization using
    POSIX OS primitives
  • System wide statistics / information gathering
    capability for performance optimizations,
    debugging, etc.
  • Symmetric Model Cons
  • Load balancing is dynamic and application may
    require dedicated CPU
  • Applications with poor synchronization among
    threads may not work properly in a true parallel
    processing environment
  • Difficult to change software
  • 3rd party software

[Figure: applications on one OS running over two CPUs, each with its own cache, joined by a system interconnect to I/O, a memory controller, and memory]
28
Multi-core Scaling Software
  • QNX conforms to POSIX (Portable Operating System
    Interface) Application Programming Interface
  • Allows straightforward porting of code from one
    OS to another that is also conformant
  • POSIX provides lightweight primitives for MP
    programming (threads, mutexes)
  • Application broken down into memory protected
    units called processes
  • Processes further divided into internal,
    schedulable units called threads
  • Threads share all of the same resources (memory
    space included)
  • PROCESSES run on individual cores concurrently in
    asymmetric mode (all threads for a process are
    tied to one core)
  • THREADS run on individual cores concurrently in
    symmetric operation
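As an illustration of the POSIX primitives named above, the following minimal sketch has several threads of one process update a shared counter under a mutex. The thread and iteration counts are arbitrary, and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

/* All threads of a process share this memory. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds INCREMENTS to the shared counter, one step at a time. */
static void *add_to_counter(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                      /* protected read-modify-write */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run NUM_THREADS threads to completion and return the final count,
 * or -1 if a thread could not be created. */
static long run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    counter = 0;
    while (started < NUM_THREADS &&
           pthread_create(&threads[started], NULL, add_to_counter, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return (started == NUM_THREADS) ? counter : -1;
}
```

Without the mutex, the increments could interleave once the threads truly run in parallel on separate cores, and the final count would be unpredictable.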

29
Scaling Applications Asymmetrically
Core-to-core IPC
  • Process per core required for full performance
  • State information maintained in shared memory or
    through IPC
  • Clustering protocols (e.g. TIPC)
  • Heavy-weight synchronization required
  • Potentially complex interaction required between
    processes to share work
  • Difficult to scale to more processors

30
Scaling Applications Symmetrically
  • Pool of POSIX worker threads
  • Dispatch work to worker threads
  • Scales very well / easily with SMP
  • Simply adjust number of worker threads to number
    of CPUs
  • No code change required
  • Very lightweight OS primitives to synchronize

[Figure: a process whose main thread dispatches work to worker threads running on CPU 0 through CPU N]
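The worker-pool pattern on this slide can be sketched as follows. This is an illustration under stated assumptions, not the presenters' code: process_item is a hypothetical stand-in for real work, and sysconf(_SC_NPROCESSORS_ONLN) is a widely available extension rather than a base POSIX guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Shared work description: items [0, count) are handed out one at a time. */
struct work_pool {
    pthread_mutex_t lock;
    int next;        /* next item not yet claimed */
    int count;       /* total number of items */
    long total;      /* sum of all results so far */
};

/* Hypothetical unit of work: here, just the square of the item index. */
static long process_item(int item)
{
    return (long)item * item;
}

static void *worker(void *arg)
{
    struct work_pool *pool = arg;

    for (;;) {
        pthread_mutex_lock(&pool->lock);
        int item = (pool->next < pool->count) ? pool->next++ : -1;
        pthread_mutex_unlock(&pool->lock);
        if (item < 0)
            return NULL;                    /* no work left */

        long result = process_item(item);   /* runs outside the lock */

        pthread_mutex_lock(&pool->lock);
        pool->total += result;
        pthread_mutex_unlock(&pool->lock);
    }
}

/* Process `count` items with `nworkers` threads; nworkers <= 0 means one
 * worker per online CPU. Returns the combined result, or -1 on error. */
static long run_pool(int count, int nworkers)
{
    assert(count >= 0);
    if (nworkers <= 0) {
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
        nworkers = (ncpu > 0) ? (int)ncpu : 1;
    }
    pthread_t *threads = malloc((size_t)nworkers * sizeof *threads);
    if (threads == NULL)
        return -1;

    struct work_pool pool;
    pthread_mutex_init(&pool.lock, NULL);
    pool.next = 0;
    pool.count = count;
    pool.total = 0;

    int started = 0;
    while (started < nworkers &&
           pthread_create(&threads[started], NULL, worker, &pool) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&pool.lock);
    free(threads);
    return (started > 0) ? pool.total : -1;
}
```

Only the nworkers argument changes when moving from one core to many; the worker code itself stays the same, which is the scaling property the slide describes.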
31
The Transition to Multi-Core
  • The QNX Solution

32
AMP or SMP?
  • Sometimes this can be a clear cut decision
  • Two operating systems: AMP
  • Application requires all available CPUs to
    maximize performance: SMP
  • Pre-selecting the operating system can force the
    decision (usually AMP support only)
  • What if the versatility of SMP is desired but the
    control of AMP is needed?

33
QNX Bound Multiprocessing
  • The Best of Both Worlds
  • Bound Multiprocessing offers an approach that
    provides benefits of both asymmetric and
    symmetric modes
  • Support existing code base and multi-core
    optimized applications
  • Supports bound and symmetric operation,
    selectable by process / thread
  • Designer has full control over applications
  • Applications and/or threads can be bound to a
    specific core
  • Load balancing
  • OS dynamic or designer controlled
  • Tools to optimize load balancing
  • Resource sharing handled by OS
  • High Performance
  • Kernel support for message passing and thread
    synchronization
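Binding a thread to a core is commonly expressed as a bitmask of permitted CPUs. The sketch below builds such masks in portable C. The helper names are hypothetical, and the QNX Neutrino call inside the #ifdef (ThreadCtl with _NTO_TCTL_RUNMASK) is included as an assumption about how a runmask would be applied there; on other systems the function only validates the mask.

```c
#include <assert.h>
#include <limits.h>

#ifdef __QNXNTO__
#include <stdint.h>
#include <sys/neutrino.h>
#endif

/* Number of CPUs that fit in one unsigned runmask word. */
#define RUNMASK_BITS ((int)(sizeof(unsigned) * CHAR_BIT))

/* Runmask with only bit `cpu` set: the thread may run on that core alone. */
static unsigned runmask_for_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < RUNMASK_BITS);
    return 1u << cpu;
}

/* Runmask with the low `ncpus` bits set: the thread may run on any core,
 * leaving placement to the scheduler (symmetric operation). */
static unsigned runmask_all(int ncpus)
{
    assert(ncpus > 0 && ncpus <= RUNMASK_BITS);
    return (ncpus == RUNMASK_BITS) ? ~0u : (1u << ncpus) - 1u;
}

/* Apply a runmask to the calling thread. Returns 0 on success, -1 on
 * error. Outside QNX Neutrino this is a stub that only checks the mask. */
static int bind_calling_thread(unsigned runmask)
{
    if (runmask == 0)
        return -1;              /* a thread must be allowed on some core */
#ifdef __QNXNTO__
    return (ThreadCtl(_NTO_TCTL_RUNMASK,
                      (void *)(uintptr_t)runmask) == -1) ? -1 : 0;
#else
    return 0;
#endif
}
```

A legacy thread that assumes a single CPU could call bind_calling_thread(runmask_for_cpu(0)), while multi-core optimized threads keep runmask_all(2) on a dual-core part.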

34
Multiprocessing Summary
35
The Transition to Multi-Core
  • The Role of Tools

36
The Role of Tools
  • The right toolset eases the transition to
    multi-core processors
  • Assess current software when moving to multi-core
  • Should processes be separated between cores?
  • Determine how closely coupled the current
    processes are
  • Where can concurrent processing help?
  • Show the current processing bottlenecks
  • Debugging in a multi-core environment
  • Characterize and debug interaction between
    threads on multiple CPUs
  • Tuning and Optimization in a multi-core
    environment
  • Move processes and threads between cores
  • Examine processing bottlenecks
  • Examine inter-process communications

37
Instrumented Kernel
  • The instrumented kernel logs events, which are
    filtered and stored in buffers, then captured
    and analyzed.

[Figure: microkernel events (system calls, interrupts, process/thread creation, state changes) pass through on/off filters, static event filters, and user-defined filters into event buffers (E1-E6), which are captured to a file or over the network for the System Profiler]
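The filter-then-buffer flow can be sketched with a small ring buffer. Everything here is hypothetical and greatly simplified: the event classes borrow the slide's labels, the buffer holds eight events, and a full buffer drops new events and counts them. It is not the QNX instrumented kernel's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event classes, named after the labels on the slide. */
enum event_class {
    EV_SYSCALL   = 1 << 0,
    EV_INTERRUPT = 1 << 1,
    EV_THREAD    = 1 << 2,    /* process/thread creation */
    EV_STATE     = 1 << 3     /* state changes */
};

struct event {
    unsigned cls;     /* one of enum event_class */
    unsigned data;    /* event-specific payload */
};

#define LOG_CAPACITY 8

struct event_log {
    unsigned enabled;                 /* on/off filter: classes to keep */
    struct event buf[LOG_CAPACITY];   /* ring buffer of accepted events */
    size_t head;                      /* index of the oldest stored event */
    size_t count;                     /* number of stored events */
    size_t dropped;                   /* events lost because buf was full */
};

/* Log one event. Returns 1 if stored, 0 if filtered out or dropped. */
static int log_event(struct event_log *log, unsigned cls, unsigned data)
{
    assert(log != NULL);
    if ((log->enabled & cls) == 0)
        return 0;                     /* filtered out */
    if (log->count == LOG_CAPACITY) {
        log->dropped++;               /* buffer full: count the loss */
        return 0;
    }
    size_t tail = (log->head + log->count) % LOG_CAPACITY;
    log->buf[tail].cls = cls;
    log->buf[tail].data = data;
    log->count++;
    return 1;
}

/* Capture up to max events, oldest first. Returns the number copied. */
static size_t log_capture(struct event_log *log, struct event *out, size_t max)
{
    size_t n = 0;

    assert(log != NULL && out != NULL);
    while (n < max && log->count > 0) {
        out[n++] = log->buf[log->head];
        log->head = (log->head + 1) % LOG_CAPACITY;
        log->count--;
    }
    return n;
}
```

Filtering before storing keeps the buffers small, so tracing disturbs the system being measured as little as possible.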
38
Thread / Process Coupling: QNX Momentics System
Profiler
Determine amount of messaging between processes.
39
Finding Processing Bottlenecks: QNX Momentics
Application Profiler
Determine which threads are busiest.
Pinpoint which source lines consume the most CPU.
Use call pairing to identify your program's
execution structure, then use the information to
make your code more efficient.
40
Load Balancing: QNX Momentics System Profiler
Measure CPU activity for all cores to
determine optimal load balancing.
41
The Transition to Multi-core
  • Software Architecture and Optimization

42
Architecting Multi-core Applications
  • Design a concurrency model (task is either a
    thread or a process)
  • Assign each external event or each peripheral a
    separate task
  • Use one task to service events that occur at
    approximately the same rate
  • Assign separate tasks to operations of widely
    differing durations
  • Perform related computations (such as
    safety-critical or multi-stage, sequential)
    within a single task
  • Isolate unrelated operations into separate tasks
  • Assign proper priorities to tasks within a CPU
  • E.g. rate monotonic analysis (RMA)
  • For asymmetric operation, partition application
    appropriately
  • AMP or BMP
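For the priority-assignment step, a minimal sketch of rate-monotonic assignment on one CPU follows. The schedulability check uses the hyperbolic bound, a sufficient test chosen here because it needs no math library; it is a swapped-in technique, not something the slides specify. The struct layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct task {
    double wcet;      /* worst-case execution time per period */
    double period;    /* activation period, same unit as wcet */
    int priority;     /* filled in below; larger means more urgent */
};

/* qsort comparator: ascending period. */
static int by_period(const void *a, const void *b)
{
    const struct task *x = a, *y = b;
    return (x->period > y->period) - (x->period < y->period);
}

/* Rate-monotonic assignment: the shorter the period, the higher the
 * priority. Sorts tasks by period and numbers them n (highest) down to 1. */
static void assign_rm_priorities(struct task *tasks, size_t n)
{
    qsort(tasks, n, sizeof *tasks, by_period);
    for (size_t i = 0; i < n; i++)
        tasks[i].priority = (int)(n - i);
}

/* Sufficient (not necessary) schedulability test for one CPU using the
 * hyperbolic bound: the product of (U_i + 1) must not exceed 2, where
 * U_i = wcet / period. Returns 1 if the set is guaranteed schedulable,
 * 0 if this test cannot guarantee it. */
static int rm_schedulable(const struct task *tasks, size_t n)
{
    double product = 1.0;

    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].period > 0.0);
        product *= tasks[i].wcet / tasks[i].period + 1.0;
    }
    return product <= 2.0;
}
```

A result of 0 does not prove that deadlines will be missed; it only means a more exact analysis is needed, or that some tasks should move to another core.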

43
Partitioning System Applications
  • Partition by functionality
  • Processes related to a particular functionality
    are grouped on a CPU
  • Data path on CPU 0, control plane on CPU 1
  • Receive path on CPU 0, transmit path on CPU 1
  • Partition by CPU load
  • Process with high (or highly variable) CPU load
    runs on its own CPU
  • Routing application: route calculation on CPU 1,
    remainder of the application on CPU 0
  • High priority, high CPU usage threads can starve
    other threads
  • Partition by information-sharing requirements
  • Applications requiring access to same data
    grouped on a CPU (reduces contention and
    resulting serialization between cores)

44
Optimizing Multi-core Applications
  • Reduce contention
  • Minimize or remove core-to-core interactions to
    maximize parallelism
  • Scale to number of available processors
  • Use system analysis tools to tune performance
  • Asymmetric operation
  • Properly partition to produce desired CPU loading
    for each core
  • Symmetric operation
  • Asymmetric application operation
  • Thread affinity
  • Bound Multiprocessing for dedicated CPU
    allocation
  • Select proper thread / process priorities to
    optimize real-time performance / CPU allocation

45
QNX Enables Multi-core Migration
  • QNX provides a complete solution
  • Proven OS support for any multi-core processing
    model
  • Full suite of development tools to characterize
    and optimize multi-core applications
  • Expert professional services and support
  • Market leading multi-core board support packages
  • Professional Training
  • Asymmetric Multiprocessing
  • Support existing software base, non-optimized
    uni-processor approach
  • Mixed OS environment

Design Needs
  • Bound Multiprocessing
  • Migrate existing software base
  • Mix existing applications with multi-core
    optimized applications
  • Transparent scaling beyond dual core
  • Symmetric Multiprocessing
  • Multi-core optimized applications
  • Transparent scaling beyond dual core

46
QNX, Freescale and Multi-core Processors
  • Freescale and QNX have collaborated on PPC for
    many years
  • QNX has extensive support of Freescale Processors
  • QNX and Freescale have existing customers
    shipping products using both multi-processing and
    distributed processing based on MPC744x
    processors
  • QNX and Freescale committed to enabling customer
    success on multi-core processors starting with
    the MPC8641D
  • See QNX Multi-core Edition running on the
    MPC8641D in the technology lab today

47
Thank You!
  • Questions and Answers