Cooperative Multithreading on Embedded Multiprocessor Architectures Enables Energy-scalable Design

1
Cooperative Multithreading on Embedded Multiprocessor Architectures Enables Energy-scalable Design
  • Bo-Cheng Charles Lai 1
  • Patrick Schaumont 1
  • Wei Qin 2
  • Ingrid Verbauwhede 1,3

1. University of California, Los Angeles
2. Boston University
3. K.U.Leuven
2
Energy-Scaled Embedded Multiprocessor
[Figure: four ARM cores, each with its own instruction cache (I), data cache (D) and V/f setting, share a system clock and a bus; the bus reaches main memory through a memory interface that contains the test-and-set lock. V/f scaling is under control of the application.]
High V/f: 1.65 V / 251 MHz
Low V/f: 0.79 V / 59 MHz
V²f ratio (high/low): 18.5
GEZEL is used for system integration of the ARM ISS.
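The V²f ratio quoted on this slide follows from the first-order CMOS dynamic-power model P = C_eff · V² · f. The sketch below is only an illustration of that arithmetic, not part of the platform; the helper names `dyn_power` and `v2f_ratio` are hypothetical, and switching activity is assumed to be folded into C_eff.

```c
#include <assert.h>

/* First-order CMOS dynamic power, switching activity folded into Ceff:
 * P = Ceff * V^2 * f   (farads, volts, hertz -> watts). */
static double dyn_power(double ceff, double volts, double hertz) {
    assert(ceff > 0.0 && volts > 0.0 && hertz > 0.0);
    return ceff * volts * volts * hertz;
}

/* V^2 * f of a high operating point relative to a low one; Ceff cancels. */
static double v2f_ratio(double v_hi, double f_hi, double v_lo, double f_lo) {
    return dyn_power(1.0, v_hi, f_hi) / dyn_power(1.0, v_lo, f_lo);
}
```

With C_eff = 1 nF (the per-processor value used in the energy plot), `dyn_power(1e-9, 1.65, 251e6)` gives about 0.68 W in the high mode, and `v2f_ratio(1.65, 251e6, 0.79, 59e6)` evaluates to about 18.56, in line with the 18.5 on the slide.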
3
Outline
  • Energy-Scaled Multiprocessor System
  • Fast MPSOC simulation
  • Individual control of energy scaling on each core
  • Test-and-Set Lock
  • Cooperative Multithreading (2 Kbytes)
  • Fingerprint Minutiae Detection
  • Energy Scaling Exploration
  • Conclusions

4
Multiprocessor Architecture
  • ARM Core
    • 5-stage StrongARM microarchitecture
    • SimIt-ARM ISS of the ARM core
  • Control of Voltage/Frequency Scaling
    • Programmer selects power modes by a function call
  • Bus Architecture
    • Master/slave scheme
    • Test-and-set lock supports synchronization
    • Snooping cache coherence
  • Cycle-Accurate Performance Model
    • GEZEL integration and co-simulation
    • Fast simulation: 400K cycles/s for 4 ARM cores on a
      3GHz-PIII/512MB host
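To make the "function call" concrete, here is a minimal sketch in C of how per-core power-mode selection might look. The slides do not show the actual API: the enum, the register array `power_mode_reg`, `set_power_mode` and `core_mhz` are hypothetical names, and on real hardware the registers would be memory-mapped rather than a plain array. The two operating points are the ones listed on the Energy-Scaled Embedded Multiprocessor slide.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 4

/* The two operating points shown for the platform. */
typedef enum { POWER_HIGH = 0, POWER_LOW = 1 } power_mode_t;

typedef struct {
    double volts;
    double mhz;
} operating_point_t;

static const operating_point_t OP_TABLE[] = {
    [POWER_HIGH] = {1.65, 251.0},
    [POWER_LOW]  = {0.79, 59.0},
};

/* Hypothetical per-core mode registers (memory-mapped on real hardware).
 * Zero-initialized, so every core starts in POWER_HIGH. */
static volatile uint32_t power_mode_reg[NUM_CORES];

/* The call a program makes to change the V/f setting of one core. */
static void set_power_mode(unsigned core, power_mode_t mode) {
    assert(core < NUM_CORES);
    power_mode_reg[core] = (uint32_t)mode;
}

/* Current clock of a core, looked up from its mode register. */
static double core_mhz(unsigned core) {
    assert(core < NUM_CORES);
    return OP_TABLE[power_mode_reg[core]].mhz;
}
```

A thread entering a phase with relaxed timing could, for example, call `set_power_mode(core, POWER_LOW)` and switch back to `POWER_HIGH` before a time-critical phase, leaving the other cores untouched.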

5
Test-and-set Lock
  • A Single Hardware Lock to Support Synchronization
    • Supports atomic operations
  • Semaphores Are Located in Main Memory

[Figure: the bus connects to a memory interface containing the test-and-set hardware lock Lhw; a semaphore L1 resides in main memory.]
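The point of this slide is that one hardware lock can serve many semaphores: the semaphores live in ordinary memory, and every read-modify-write on them happens while Lhw is held. A minimal C11 sketch of that layering follows; `atomic_flag` stands in for the bus-level test-and-set lock, and `mem_lock_t` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the single hardware test-and-set lock Lhw. */
static atomic_flag hw_lock = ATOMIC_FLAG_INIT;

static void hw_lock_acquire(void) {
    /* test_and_set returns the old value: spin while another core holds it */
    while (atomic_flag_test_and_set_explicit(&hw_lock, memory_order_acquire)) {
    }
}

static void hw_lock_release(void) {
    atomic_flag_clear_explicit(&hw_lock, memory_order_release);
}

/* A semaphore-style lock stored in ordinary main memory. */
typedef struct {
    int held; /* 0 = free, 1 = taken */
} mem_lock_t;

/* One attempt: holding Lhw makes the check-then-set on the word atomic. */
static bool mem_lock_try(mem_lock_t *l) {
    bool acquired = false;
    hw_lock_acquire();
    if (!l->held) {
        l->held = 1;
        acquired = true;
    }
    hw_lock_release();
    return acquired;
}

static void mem_lock_acquire(mem_lock_t *l) {
    while (!mem_lock_try(l)) {
        /* retry; a cooperative thread could yield to another thread here */
    }
}

static void mem_lock_release(mem_lock_t *l) {
    hw_lock_acquire();
    assert(l->held); /* releasing a free lock is a bug */
    l->held = 0;
    hw_lock_release();
}
```

Because Lhw is held only for the few instructions that inspect and update one memory word, contention on the single hardware lock should stay short even when many software locks are in use.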
6
Cooperative Multithreading
  • Multiprocessor Version of the QuickThreads Library (2 KB)
  • Circular Thread Queue to Maintain Stack Pointers
  • Four Routines to Control Threads
    • Create, start, yield, and abort

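[Figure: a multi-threaded program (user_program.c). Processors proc0 to proc3 keep the stack pointers of their main threads (sp0 to sp3); user threads with stacks stack4 to stack6 wait in the shared user thread queue Q, which holds their stack pointers sp4 to sp6 and is protected by the thread lock Lq.]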
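The circular thread queue can be sketched as a ring buffer of saved stack pointers. The code below shows only the scheduling half of yield (park the caller at the tail, resume the thread at the head); the actual stack switch in QuickThreads is machine-specific assembly and is omitted. `thread_q_t`, `QCAP` and the function names are hypothetical, and in the multiprocessor version every queue operation would run while holding the thread lock Lq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 8 /* hypothetical capacity */

/* Circular queue of saved stack pointers of ready user threads. */
typedef struct {
    void  *sp[QCAP];
    size_t head;  /* slot of the next thread to run */
    size_t count; /* number of ready threads */
} thread_q_t;

/* Append a thread (by its saved stack pointer) at the tail. */
static bool q_push(thread_q_t *q, void *sp) {
    if (q->count == QCAP) return false;
    q->sp[(q->head + q->count) % QCAP] = sp;
    q->count++;
    return true;
}

/* Remove the thread at the head; NULL if the queue is empty. */
static void *q_pop(thread_q_t *q) {
    if (q->count == 0) return NULL;
    void *sp = q->sp[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return sp;
}

/* Scheduling half of yield: park the caller at the tail and return the
 * stack pointer of the thread that should run next (Lq and the actual
 * stack switch are omitted in this sketch). */
static void *q_yield(thread_q_t *q, void *current_sp) {
    if (q->count == 0) return current_sp; /* no other thread is ready */
    void *next = q_pop(q);
    bool ok = q_push(q, current_sp); /* cannot fail: q_pop just freed a slot */
    assert(ok);
    (void)ok;
    return next;
}
```

Because threads run until they call yield, no thread is preempted in the middle of a queue update; a lock such as Lq is needed only because the other processors share the same queue.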
7
Thread-parallel Minutiae Detection
[Figure: the 256 × 256 input image is divided among four threads, each running detect on a 144 × 144 region; the main thread combines their results.]
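One way to obtain four 144 × 144 regions from a 256 × 256 image is a 2 × 2 grid of tiles pinned to the image corners, so that neighbouring tiles overlap by 2 · 144 - 256 = 32 pixels; presumably such an overlap lets detect see the neighbourhood of minutiae near a tile border. The grid layout is an assumption, since the slide shows only the sizes, and `tile_t` and `make_tiles_2x2` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Axis-aligned image region: top-left corner plus width and height. */
typedef struct { int x0, y0, w, h; } tile_t;

/* Cover an n x n image with a 2 x 2 grid of t x t tiles (n/2 <= t <= n).
 * Tiles are pinned to the image corners, so neighbours overlap by 2t - n. */
static void make_tiles_2x2(int n, int t, tile_t out[4]) {
    assert(2 * t >= n && t <= n);
    const int origins[2] = {0, n - t}; /* left/top and right/bottom origins */
    size_t k = 0;
    for (int row = 0; row < 2; row++) {
        for (int col = 0; col < 2; col++) {
            out[k++] = (tile_t){origins[col], origins[row], t, t};
        }
    }
}
```

Each of the four tiles would then be handed to one detect thread, with the main thread merging the results in the combine step.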
8
Energy Scaling Exploration
  • Compared to 1H (Nominal Case)
    • 2HL: 16% faster, 12% energy reduction
    • 4LLLL: 2.2× slower, 77% energy reduction

[Figure: execution time in reference time units versus dynamic energy E_AC in J (at C_eff = 1 nF per processor) for configurations 1_H, 1_L, 2_HH, 2_HL, 2_LL, 4_HHHH and 4_LLLL, with a sample design constraint marked.]
2_HL: two processors, one high-power and one low-power.
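A back-of-the-envelope model helps read this plot: dynamic energy for N active cycles is E = N · C_eff · V², so clock frequency affects execution time but not energy per cycle. The sketch below encodes that model with the two operating points of the platform; `vf_mode_t`, `core_energy` and `core_time` are hypothetical names, and the model ignores static power, idle cycles and any extra cycles added by parallelization.

```c
#include <assert.h>

/* The two operating points of the platform. */
typedef struct { double volts; double hertz; } vf_mode_t;

static const vf_mode_t MODE_H = {1.65, 251e6};
static const vf_mode_t MODE_L = {0.79, 59e6};

/* Dynamic energy of `cycles` active cycles on one core: E = N * Ceff * V^2.
 * The clock frequency does not appear: lowering f stretches time, while
 * lowering V cuts the energy spent per cycle. */
static double core_energy(double cycles, double ceff, vf_mode_t m) {
    assert(cycles >= 0.0 && ceff > 0.0);
    return cycles * ceff * m.volts * m.volts;
}

/* Wall-clock time for `cycles` cycles at the mode's clock frequency. */
static double core_time(double cycles, vf_mode_t m) {
    return cycles / m.hertz;
}
```

Running the same cycle count in L mode instead of H mode costs (0.79 / 1.65)² ≈ 0.23 of the energy, i.e. about 77% less, which is consistent with the 4LLLL result if spreading the work over four cores adds few active cycles. Time per cycle, on the other hand, grows by 251 / 59 ≈ 4.25.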
9
Conclusion
  • This Work Proposes
    • Energy-Scaled Multiprocessor Architecture
      • Individual energy scaling on each core
    • Compact Cooperative Multithreading Programming Model
      • 2 KB thread library
    • Fast Cycle-Accurate Evaluation Platform
      • 400K cycles/s, 4 ARM cores on 3GHz-PIII/512MB
      • Performance and energy
  • Current Activities and Future Work
    • More detailed power model for MPSOC
    • Interconnect technologies and on-chip bus for MPSOC
    • Programming model evaluation on MPSOC

10
References
  • GEZEL Environment (UCLA): http://www.ee.ucla.edu/schaum/gezel
  • SimIt Instruction Set Simulator (Boston Univ.):
    http://sourceforge.net/projects/simit-arm
  • EmSec Group (UCLA): http://www.emsec.ee.ucla.edu/
  • Project is supported by
    • SRC (Semiconductor Research Corp.)
    • NSF (National Science Foundation)