Description:

VLC Task. Main DivX Task. 18. DivX Encoder Description ... QCIF RESOLUTION, 25 frames/s. 176. 144. QCIF using ARM7 (60MHz) processors (1 for VLC) ...

1
An Efficient Architecture for the Implementation
of Message Passing Programming Model on Massive
Multiprocessor SoC
Ferid Gharsalli, Amer Baghdadi, Marius Bonaciu, Giedrius Majauskas, Wander Cesario, Ahmed A. Jerraya
TIMA Laboratory, 46 av. Felix Viallet, 38031 Grenoble Cedex (France)
Tel: (33) 476 574 759
{ferid.gharsalli, amer.baghdadi, marius.bonaciu, giedrius.majauskas, wander.cesario, ahmed.jerraya}@imag.fr
2
Outline
  • Introduction
  • Flexible and Scalable Architecture for Parallel
    Computations
  • Parallel Programming Model
  • Application: DivX Real-Time Encoder
  • Conclusions

3
Efficient MP-SoC Design Method
Massive MP-SoC
Parallel Programming Model
Parallel Application
Slow and inefficient correspondence at early
stages of design
Many specifications
Flexible and Scalable Architecture for Parallel
Computations
Architecture
5
Evolution of Embedded Applications
What is the main problem of today's embedded applications?
  • Gap between design methods and actual architectures, mainly because of:
      • Massive computation
      • Massive data transfer
      • Cost and power constraints
      • Huge design time

6
Key Issues
How can the gap be filled?
  • Concurrency
      • to cope with massive computation
  • Efficient data transfer architecture
      • to cope with massive data transfer
  • Application-specific communication/computation
      • to cope with cost and power constraints
  • Higher-level programming model
      • to cope with huge design time

7
Objective
What is our objective?
  • Multicore architecture
      • to achieve concurrency
  • Efficient network on chip
      • to achieve efficient data transfer
  • Heterogeneous communication/computation
      • specific HW/SW interfacing to achieve application-specific
        communication/computation
  • High-level parallel programming model
      • message passing to achieve higher-level design

→ Highly flexible and scalable architectures
[Diagram labels: SW Comp. (×4) with Task1 … TaskX; MP-API; Task1 … TaskN; Specific OS / HAL; Design Flow; CPU SubSystem; HW Adapt; Specific HW/SW Intrf.; API; Parallel Programming Model; NoC]
8
Design Flow
Application Specifications
Application
Algorithm Specifications
High Level SW Description
High Level MP-SoC Architecture
9
Design Flow
[Diagram labels: High Level MP-SoC Architecture with abstract modules (Abs.M1 … Abs.M4) holding Task1 … Task5, a module PPM (M.PPM) per module, and an Abstract NoC with a NoC PPM; High Level MP-SoC Architecture; Low Level MP-SoC Architecture]
11
Without Parallel Programming Model
[Diagram: Task1 … Task5 above CPU1 and CPU2; each CPU has its own API, specific OS / HAL, CPU subsystem, and specific HW/SW interface; both connect to the communication network (NoC)]
12
With Parallel Programming Model
[Diagram: Task1 … Task5 above CPU1 and CPU2; each CPU has its own specific OS / HAL, CPU subsystem, and specific HW/SW interface; both connect to the communication network (NoC)]
13
Parallel Programming Model 1/2
What is a Parallel Programming Model?
  • HW/SW INTERFACE which
      • separates the high-level properties (SW) from the
        low-level ones (HW)
  • ABSTRACT MACHINE which
      • provides certain operations to the programming
        level above (SW)
      • requires an implementation of each operation on
        the architectures below (HW)
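
The abstract-machine idea can be illustrated with a small sketch. This is not the authors' implementation: the names `Channel`, `SharedBufferChannel`, `PacketChannel`, and `application` are hypothetical, and the two backends only stand in for "different architectures below". The point is that the software is written once against the operations, while each backend supplies its own implementation of them.

```python
from abc import ABC, abstractmethod
from collections import deque


class Channel(ABC):
    """Operations the abstract machine offers to the software above it."""

    @abstractmethod
    def send(self, data: bytes) -> None:
        """Queue one message for the receiver."""

    @abstractmethod
    def recv(self) -> bytes:
        """Return the oldest queued message."""


class SharedBufferChannel(Channel):
    """Backend 1 (assumed): whole messages kept in one shared FIFO."""

    def __init__(self) -> None:
        self._fifo = deque()

    def send(self, data: bytes) -> None:
        self._fifo.append(bytes(data))

    def recv(self) -> bytes:
        return self._fifo.popleft()


class PacketChannel(Channel):
    """Backend 2 (assumed): messages cut into fixed-size packets."""

    def __init__(self, packet_size: int = 4) -> None:
        self._packet_size = packet_size
        self._packets = deque()  # entries are (is_last, payload)

    def send(self, data: bytes) -> None:
        # Split into packets; an empty message still produces one packet.
        chunks = [data[i:i + self._packet_size]
                  for i in range(0, len(data), self._packet_size)] or [b""]
        for index, chunk in enumerate(chunks):
            self._packets.append((index == len(chunks) - 1, chunk))

    def recv(self) -> bytes:
        # Reassemble packets until the one flagged as last.
        parts = []
        while True:
            is_last, chunk = self._packets.popleft()
            parts.append(chunk)
            if is_last:
                return b"".join(parts)


def application(channel: Channel) -> bytes:
    """Software written once against the interface, unaware of the backend."""
    channel.send(b"macroblock-0")
    channel.send(b"macroblock-1")
    return channel.recv() + b"|" + channel.recv()
```

Calling `application(SharedBufferChannel())` and `application(PacketChannel())` runs the same software over both backends and yields the same bytes, which is the separation of high-level and low-level properties described above.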

14
Parallel Programming Model 2/2
What are the properties of a Parallel Programming Model?
  • EASY TO PROGRAM, because it needs to conceal the
      • partitioning of a program into different modules
      • mapping of the tasks onto different types of
        modules (HW, SW)
      • communication type between the tasks
      • synchronization method between the tasks
  • SOFTWARE DEVELOPMENT TECHNOLOGY
      • needs to allow the application to be developed
        with a typical software design method
  • ARCHITECTURE INDEPENDENT
      • able to migrate from one architecture model to
        another without having to be redeveloped or
        modified in any non-trivial way
  • EASY TO UNDERSTAND
  • EFFICIENTLY IMPLEMENTABLE
      • needs to offer efficient results over a wide
        variety of parallel architectures
  • COST MEASURES
      • the ability to decide that Operation A is better
        than Operation B for a particular problem

15
Types of Parallel Programming Models
[Diagram: EXPLICIT versus IMPLICIT models, compared on two axes: building parallel applications (hard to easy) and efficient parallel applications (low to high)]
16
Hard to Debug
!!! Hard to debug when designed to be application specific !!!
[Chart: bugs found across the layers Application SW, Parallel Prog. Model, and µ-Kernel/OS]
  • Data-dependent computation
  • C library bug
  • Incorrect FIFO counter value causes deadlock
  • Context switch does not work correctly
  • Booting is not synchronized among processors
  • Lost some interrupts
  • Wrong interrupt priority levels
Observed symptoms:
  • Result of compressed video is not correct
  • Abnormal execution of a portion of C code
[Chart values as extracted, not matched to labels: 5, 5, 12, 30, 12, 13, 13, 5, 5]
Source: Mohamed Wassim YOUSSEF, Sungjoo YOO, Arif SASONGKO, Yanick PAVIOT, Ahmed JERRAYA (TIMA Laboratory), "Debugging HW/SW Interface for MP-SoC Video Encoder System Design Case", DAC04, San Diego, CA
17
Application/PPM Communication
Parallel Application

Task1
Task2
Task3
Task4
Task5
Task6
TaskN
How does the application interact with the Parallel Programming Model?
MP_Init(this, argc, argv)
MP_Finalize(this)
MP_ISend(this, buf, count, datatype, dest, tag, comm)
MP_IRecv(this, buf, count, datatype, source, tag, comm, status)
MP_IBSend(this, buf, count, datatype, dest, tag, comm)
MP_IBRecv(this, buf, count, datatype, source, tag, comm, status)
MP_ISSend(this, buf, count, datatype, dest, tag, comm)
MP_ISRecv(this, buf, count, datatype, source, tag, comm, status)
MPI_Wait(this, request, status)
MPI_Test(this, request, flag, status)
Shared memory Message passing RDMA
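
The slide lists the calls without defining their semantics. The sketch below assumes they behave like their MPI namesakes: a non-blocking send and receive that return immediately, a blocking wait, and a non-blocking test. It is a small Python model, not the MP-API itself; `Communicator`, `Request`, `isend`, `irecv`, and `run_demo` are hypothetical names, and threads stand in for tasks on separate processors.

```python
import queue
import threading


class Request:
    """Handle for a pending receive (hypothetical, MPI-like)."""

    def __init__(self, mailbox: queue.Queue) -> None:
        self._mailbox = mailbox
        self._data = None
        self._done = False

    def test(self) -> bool:
        """Poll once without blocking; True once the message has arrived."""
        if not self._done:
            try:
                self._data = self._mailbox.get_nowait()
                self._done = True
            except queue.Empty:
                pass
        return self._done

    def wait(self):
        """Block until the message has arrived, then return it."""
        if not self._done:
            self._data = self._mailbox.get()
            self._done = True
        return self._data


class Communicator:
    """Keeps one mailbox per (source, dest, tag) triple."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._mailboxes = {}

    def _mailbox(self, source, dest, tag) -> queue.Queue:
        with self._lock:
            return self._mailboxes.setdefault((source, dest, tag), queue.Queue())

    def isend(self, source, dest, tag, data) -> None:
        """Deposit the message and return immediately."""
        self._mailbox(source, dest, tag).put(data)

    def irecv(self, source, dest, tag) -> Request:
        """Post a receive and return immediately with a request handle."""
        return Request(self._mailbox(source, dest, tag))


def run_demo():
    comm = Communicator()
    request = comm.irecv(source=0, dest=1, tag=7)  # task 1 posts the receive
    arrived_early = request.test()                 # nothing has been sent yet
    sender = threading.Thread(target=comm.isend, args=(0, 1, 7, [1, 2, 3]))
    sender.start()                                 # task 0 sends concurrently
    data = request.wait()                          # blocks until the send lands
    sender.join()
    return arrived_early, data
```

Separating the posting of a receive from its completion lets a task overlap computation with communication: it can keep working and call `test()` periodically, or call `wait()` only when it actually needs the data.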
19
DivX Encoder Description
DivX: a very popular implementation of the MPEG4
standard (ISO/IEC 14496-2)
[Block diagram labels: YUV input frames (I, P) at time t; Motion Estimation; DCT; Quant.; VLC; MPEG4/ISO Bitstream; DeQuant.; IDCT; Motion Comp.; Reference image at time t-1; Motion vectors; quanta]
20
DivX Parallelization
VLC
21
DivX MP-SoC Architecture Generation


[Diagram, application level: Antenna (Video source); Splitter (Preprocessing); Main DivX1 … Main DivXn; VLC1 … VLCm; Combiner (Postprocessing); MPEG4 storage; all over the Message Passing API of the Parallel Programming Model]
[Diagram, design flow inputs: Flexible and Scalable High Level Architecture Template Model; Parameters (Parallel., Part., Mapp., Comm., Sync., etc.)]
[Diagram, generated architecture: Antenna and Storage attached through HW Adapt; Main DivX1 … Main DivXn, VLC1 … VLCn, Splitter, and Combiner on processor nodes, each with MP-API, Specific OS, CPU Subsystem, and Spec. HW/SW intrf.; all connected to the NoC]
22
Performance results 1/6
QCIF resolution (176 × 144 pixels), 25 frames/s
23
Performance results 2/6
24
Performance results 3/6
25
Performance results 4/6
CIF resolution (352 × 288 pixels), 25 frames/s
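
To put the QCIF (176 × 144) and CIF (352 × 288) settings at 25 frames/s in perspective, the input workload can be estimated with a short calculation. The sketch assumes 16 × 16 macroblocks and 8-bit YUV 4:2:0 sampling (1.5 bytes per pixel); the slides state only that the input is YUV, so the sampling format is an assumption, and `frame_stats` is a hypothetical helper name.

```python
def frame_stats(width: int, height: int, fps: int, macroblock: int = 16):
    """Return (macroblocks per frame, bytes per frame, bytes per second).

    Assumes 8-bit YUV 4:2:0 input: one luma byte per pixel plus two
    chroma planes subsampled by 2 in each direction (1.5 bytes/pixel).
    """
    macroblocks = (width // macroblock) * (height // macroblock)
    bytes_per_frame = width * height * 3 // 2
    return macroblocks, bytes_per_frame, bytes_per_frame * fps


qcif = frame_stats(176, 144, 25)  # (99, 38016, 950400)
cif = frame_stats(352, 288, 25)   # (396, 152064, 3801600)
```

Under these assumptions CIF carries four times as many macroblocks and four times the raw input rate of QCIF, which gives a rough sense of how much more work the CIF configuration has to spread across processors.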
26
Performance results 5/6
27
Performance results 6/6
29
Conclusions
  • Today's MP-SoC architectures require
      • multicore-based architectures
      • efficient data transfer architectures
      • application-specific communication/computation
      • high-level programming models
  • Both SW design and HW design are crucial for
    obtaining efficient results
  • Linking the SW designers' work with the HW
    designers' work at early stages of design is
    difficult
  • Using parallel programming models throughout the
    design flow is an efficient way to link the two
  • An example on a real-time DivX encoder application
    was presented
  • Experimental results show an efficient and
    scalable MP-SoC architecture obtained through a
    very efficient design flow
  • Future work
      • testing this approach on other applications
        (e.g., an MP3 real-time encoder)
      • fully automating the design flow

30
Thank you