Title: Cooperative Multithreading on Embedded Multiprocessor Architectures Enables Energy-scalable Design
- Bo-Cheng Charles Lai 1
- Patrick Schaumont 1
- Wei Qin 2
- Ingrid Verbauwhede 1,3
1. University of California, Los Angeles
2. Boston University
3. K.U.Leuven
Energy-Scaled Embedded Multiprocessor
[Figure: four ARM cores, each with its own instruction (I) and data (D) cache and its own V/f setting under control of the application, share a bus to the memory interface (which holds the test-and-set lock) and main memory; a common system clock feeds the chip. High V/f: 1.65 V / 251 MHz. Low V/f: 0.79 V / 59 MHz. V²f ratio: 18.5.]
GEZEL is used for system integration of the ARM ISSs.
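The V²f ratio on the slide follows from the usual dynamic-power relation P = C_eff·V²·f. A minimal sketch that reproduces it from the two operating points, assuming C_eff is identical in both modes (the type and function names are illustrative, not from the slides):

```c
/* Operating points taken from the slide:
   high = 1.65 V / 251 MHz, low = 0.79 V / 59 MHz. */
typedef struct {
    double volts;
    double mhz;
} op_point;

/* Dynamic power scales as C_eff * V^2 * f; with C_eff held fixed,
   the power ratio between two operating points is the ratio of V^2 * f. */
static double v2f(op_point p) {
    return p.volts * p.volts * p.mhz;
}

/* Ratio of dynamic power at the high point to that at the low point. */
double v2f_ratio(op_point hi, op_point lo) {
    return v2f(hi) / v2f(lo);
}
```

With the slide's values, `v2f_ratio` evaluates to about 18.56, consistent with the 18.5 shown in the figure.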
Outline
- Energy-Scaled Multiprocessor System
- Fast MPSOC simulation
- Individual control of energy scaling on each core
- Test-and-Set Lock
- Cooperative Multithreading (2 Kbytes)
- Fingerprint Minutiae Detection
- Energy Scaling Exploration
- Conclusions
Multiprocessor Architecture
- ARM Core
- 5-stage StrongARM micro-architecture
- SimIt-ARM ISS of the ARM core
- Control of Voltage/Frequency Scaling
- Programmer selects power modes through a function call
- Bus Architecture
- Master/slave scheme
- Test-and-set lock supports synchronization
- Snooping cache coherence
- Cycle-Accurate Performance Model
- GEZEL integration and co-simulation
- Fast simulation: 400K cycles/s for 4 ARM cores on a 3GHz-PIII/512MB host
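The slides state only that the programmer selects a power mode through a function call. The sketch below shows what such a call could look like; `set_power_mode`, `current_mhz`, and the `vf_ctrl` array are hypothetical, with a plain array standing in for whatever per-core V/f control the simulated platform actually exposes. Only the two operating points come from the slides.

```c
#include <stdbool.h>

#define NUM_CORES 4

/* Hypothetical power modes; the V/f values are the slide's two operating points. */
typedef enum { POWER_HIGH, POWER_LOW } power_mode;

typedef struct {
    double volts;
    int mhz;
} vf_setting;

static const vf_setting VF_TABLE[] = {
    [POWER_HIGH] = {1.65, 251},
    [POWER_LOW]  = {0.79, 59},
};

/* Hypothetical stand-in for per-core V/f control registers.
   On the real platform these would be device registers; here it is an array. */
static volatile vf_setting vf_ctrl[NUM_CORES];

/* Hypothetical API: select the power mode of one core.
   Returns false for an out-of-range core or an unknown mode. */
bool set_power_mode(int core, power_mode mode) {
    if (core < 0 || core >= NUM_CORES) return false;
    if (mode != POWER_HIGH && mode != POWER_LOW) return false;
    vf_ctrl[core] = VF_TABLE[mode];
    return true;
}

/* Report the clock frequency currently selected for a core. */
int current_mhz(int core) {
    return vf_ctrl[core].mhz;
}
```

Calling `set_power_mode(0, POWER_HIGH)` followed by `set_power_mode(1, POWER_LOW)` would set up a two-core configuration with one high-power and one low-power core, because each core's setting is independent.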
Test-and-Set Lock
- A Single Hardware Lock to Support Synchronization
- Supports atomic operations
- Semaphores Are Located in Main Memory
[Figure: the hardware test-and-set lock Lhw sits in the memory interface between the bus and main memory; a lock L1 resides in main memory.]
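One way to build semaphores in main memory on top of a single hardware test-and-set lock is to hold Lhw only for the read-modify-write of each semaphore's count. This is a minimal sketch assuming C11 atomics, with `atomic_flag_test_and_set` standing in for the hardware lock; the `msem_*` names and the spin-retry policy are illustrative, not the library's actual interface.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the single hardware lock Lhw: atomic_flag_test_and_set
   sets the flag and returns its previous value in one atomic step. */
static atomic_flag hw_lock = ATOMIC_FLAG_INIT;

static void hw_acquire(void) {
    while (atomic_flag_test_and_set(&hw_lock)) {
        /* spin until the current holder clears the flag */
    }
}

static void hw_release(void) {
    atomic_flag_clear(&hw_lock);
}

/* Counting semaphore stored in ordinary shared memory. Its count is only
   touched while Lhw is held, so one hardware lock serves every semaphore. */
typedef struct {
    int count;
} mem_semaphore;

void msem_init(mem_semaphore *s, int initial) {
    s->count = initial;
}

/* Non-blocking P(): decrement and return true if the count is positive. */
bool msem_try_wait(mem_semaphore *s) {
    bool ok = false;
    hw_acquire();
    if (s->count > 0) {
        s->count--;
        ok = true;
    }
    hw_release();
    return ok;
}

/* Blocking P(): retry, releasing Lhw between attempts so others can signal. */
void msem_wait(mem_semaphore *s) {
    while (!msem_try_wait(s)) {
        /* busy-wait; a cooperative thread could yield here instead */
    }
}

/* V(): increment the count under the hardware lock. */
void msem_signal(mem_semaphore *s) {
    hw_acquire();
    s->count++;
    hw_release();
}
```

Keeping the critical section to a few instructions matters here because every semaphore operation on every core contends for the same hardware lock.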
Cooperative Multithreading
- Multiprocessor Version of QuickThreads Library (2 KB)
- Circular Thread-Q to Maintain Stack Pointers
- Four Routines to Control Threads
- Create, start, yield, and abort
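[Figure: a multi-threaded program (user_program.c) runs on proc0 to proc3, each holding a main-thread stack pointer (sp0 to sp3); the stack pointers of user threads (sp4, sp5, sp6, pointing to stack4, stack5, stack6) are kept in the user thread queue Q, protected by the thread lock Lq.]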
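The thread queue can be sketched as a fixed-size circular buffer of saved stack pointers guarded by its own lock Lq. The capacity `QCAP`, the `tq_*` names, and the use of `atomic_exchange` as the test-and-set are assumptions. The register save/restore that QuickThreads performs on a context switch is not shown, only the scheduling decision that yield makes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 8 /* hypothetical queue capacity */

/* Circular queue Q of saved stack pointers of suspended user threads,
   guarded by its own lock Lq (a test-and-set built on atomic_exchange). */
typedef struct {
    void *sp[QCAP];
    size_t head;  /* index of the oldest entry */
    size_t count; /* number of queued entries */
    atomic_bool lq;
} thread_queue;

void tq_init(thread_queue *q) {
    q->head = 0;
    q->count = 0;
    atomic_init(&q->lq, false);
}

static void lq_acquire(thread_queue *q) {
    while (atomic_exchange(&q->lq, true)) {
        /* spin: another core holds Lq */
    }
}

static void lq_release(thread_queue *q) {
    atomic_store(&q->lq, false);
}

/* The two helpers below assume the caller already holds Lq. */
static bool push_locked(thread_queue *q, void *sp) {
    if (q->count == QCAP) return false;
    q->sp[(q->head + q->count) % QCAP] = sp;
    q->count++;
    return true;
}

static void *pop_locked(thread_queue *q) {
    if (q->count == 0) return NULL;
    void *sp = q->sp[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return sp;
}

/* Enqueue a newly created or suspended thread at the tail. */
bool tq_push(thread_queue *q, void *sp) {
    lq_acquire(q);
    bool ok = push_locked(q, sp);
    lq_release(q);
    return ok;
}

/* Dequeue the oldest thread, or NULL if Q is empty. */
void *tq_pop(thread_queue *q) {
    lq_acquire(q);
    void *sp = pop_locked(q);
    lq_release(q);
    return sp;
}

/* Scheduling decision behind yield: under a single hold of Lq, take the
   oldest waiting thread and put the caller at the tail. If no other
   thread is waiting, the caller simply keeps running. */
void *tq_yield_select(thread_queue *q, void *current_sp) {
    lq_acquire(q);
    void *next = pop_locked(q);
    if (next == NULL) {
        next = current_sp;
    } else {
        push_locked(q, current_sp); /* cannot fail: a slot was just freed */
    }
    lq_release(q);
    return next;
}
```

Doing the pop and the push under one hold of Lq keeps another processor from observing the queue between the two steps.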
Thread-parallel Minutiae Detection
[Figure: a 144×144 fingerprint image is divided among four detect threads; the main thread combines their results.]
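The split/detect/combine structure can be sketched with POSIX threads as a stand-in for the slides' cooperative thread library. The row-strip partition (four strips of 36 rows) is an assumption, and `detect_strip` is a hypothetical placeholder that counts bright pixels rather than detecting minutiae.

```c
#include <pthread.h>
#include <stdint.h>

#define IMG_W 144
#define IMG_H 144
#define NTHREADS 4

/* Hypothetical placeholder for per-strip minutiae detection:
   counts pixels at or above a threshold in rows [row_begin, row_end). */
static int detect_strip(const uint8_t *img, int row_begin, int row_end, uint8_t thresh) {
    int n = 0;
    for (int r = row_begin; r < row_end; r++)
        for (int c = 0; c < IMG_W; c++)
            if (img[r * IMG_W + c] >= thresh) n++;
    return n;
}

typedef struct {
    const uint8_t *img;
    int row_begin, row_end;
    uint8_t thresh;
    int found; /* result written by the worker thread */
} strip_job;

static void *detect_thread(void *arg) {
    strip_job *job = arg;
    job->found = detect_strip(job->img, job->row_begin, job->row_end, job->thresh);
    return NULL;
}

/* Split the image into row strips, run one detect thread per strip,
   then combine (sum) the per-strip results in the calling thread.
   Returns -1 if any thread could not be created. */
int detect_parallel(const uint8_t *img, uint8_t thresh) {
    pthread_t tid[NTHREADS];
    strip_job jobs[NTHREADS];
    int rows = IMG_H / NTHREADS;
    int started = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int end = (i == NTHREADS - 1) ? IMG_H : (i + 1) * rows;
        jobs[i] = (strip_job){ img, i * rows, end, thresh, 0 };
        if (pthread_create(&tid[i], NULL, detect_thread, &jobs[i]) != 0) break;
        started++;
    }
    int total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += jobs[i].found;
    }
    return started == NTHREADS ? total : -1;
}

/* Single-threaded reference for comparison. */
int detect_serial(const uint8_t *img, uint8_t thresh) {
    return detect_strip(img, 0, IMG_H, thresh);
}
```

A real windowed detector would need neighboring strips to overlap by its filter radius so that minutiae on strip boundaries are not missed.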
Energy Scaling Exploration
- Compared to 1H (Nominal Case)
- 2HL: 16% faster, 12% energy reduction
- 4LLLL: 2.2% slower, 77% energy reduction
[Figure: execution time (reference time units, 0 to 1200) versus dynamic energy E_AC (J) at C_eff = 1 nF per processor for configurations 1_H, 1_L, 2_HH, 2_HL, 2_LL, 4_HHHH, and 4_LLLL, with a sample design constraint marked. Each label gives the processor count and per-core mode; e.g., 2_HL means two processors, one high-power and one low-power.]
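Since dynamic energy per cycle scales with C_eff·V², running the same number of cycles at 0.79 V instead of 1.65 V reduces energy by 1 − (0.79/1.65)² ≈ 77%, in line with the 4LLLL figure above. The sketch below checks that arithmetic; it assumes the total cycle count does not change between configurations, and the function names are illustrative.

```c
/* Dynamic energy E = C_eff * V^2 * N for N cycles at supply voltage V. */
double dynamic_energy(double c_eff, double volts, double cycles) {
    return c_eff * volts * volts * cycles;
}

/* Fractional energy reduction when the same number of cycles runs at
   v_new instead of v_ref (C_eff and cycle count assumed unchanged). */
double energy_reduction(double v_ref, double v_new) {
    return 1.0 - (v_new * v_new) / (v_ref * v_ref);
}
```

For mixed configurations such as 2HL the saving depends on how the cycles split between the high- and low-voltage cores, so this single-voltage ratio does not apply directly.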
Conclusion
- This Work Proposes
- Energy-Scaled Multiprocessor Architecture
- Individual energy scaling on each core
- Compact Cooperative Multithreading Programming Model
- 2KB thread library
- Fast Cycle-Accurate Evaluation Platform
- 400K cycles/s, 4-ARM on 3GHz-PIII/512MB
- Performance and energy
- Current Activities and Future Work
- More detailed power model for MPSOC
- Interconnect technologies and on-chip bus for MPSOC
- Programming model evaluation on MPSOC
References
- GEZEL Environment (UCLA): http://www.ee.ucla.edu/schaum/gezel
- SimIt Instruction Set Simulator (Boston Univ.): http://sourceforge.net/projects/simit-arm
- EmSec Group (UCLA): http://www.emsec.ee.ucla.edu/
- Project is supported by
- SRC (Semiconductor Research Corp.)
- NSF (National Science Foundation)