1
Cache Coherency on Heterogeneous Multiprocessor
Platform with Shared Memory
Taeweon Suh
April 21, 2003
2
1. System Block Diagram
  • ARM920T D-cache
    • 16KB
    • 32-byte block
    • 64-way set associative
    • Replacement: round-robin or random
    • No cache coherency
  • PowerPC D-cache
    • 32KB
    • 32-byte block
    • 8-way set associative
    • MEI protocol

Registers: shared page register, snoop hit address register, FIQ status register
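The cache figures above fix how each data cache splits a 32-bit address into tag, set index and block offset. A minimal sketch of that arithmetic follows; the type cache_geom and the helper names are hypothetical and only illustrate the calculation, they are not taken from the presentation.

```c
#include <stdint.h>

/* Hypothetical helper type: the three figures listed per cache on the slide. */
typedef struct {
    uint32_t size_bytes;   /* total capacity */
    uint32_t block_bytes;  /* block (line) size */
    uint32_t ways;         /* associativity */
} cache_geom;

/* Values from the slide. */
static const cache_geom ARM920T_D = { 16 * 1024, 32, 64 };
static const cache_geom PPC_D     = { 32 * 1024, 32, 8 };

/* Number of sets = capacity / (block size * ways). */
static uint32_t num_sets(cache_geom g) {
    return g.size_bytes / (g.block_bytes * g.ways);
}

/* log2 of an exact power of two. */
static uint32_t log2u(uint32_t x) {
    uint32_t n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* Byte position inside the block (low address bits). */
static uint32_t block_offset(cache_geom g, uint32_t addr) {
    return addr & (g.block_bytes - 1);
}

/* Set selected by the bits just above the block offset. */
static uint32_t set_index(cache_geom g, uint32_t addr) {
    return (addr >> log2u(g.block_bytes)) & (num_sets(g) - 1);
}

/* Remaining high bits, compared against the stored tags. */
static uint32_t tag_of(cache_geom g, uint32_t addr) {
    return addr >> (log2u(g.block_bytes) + log2u(num_sets(g)));
}
```

With these figures the ARM920T D-cache has 8 sets (offset bits 4..0, index bits 7..5, tag bits 31..8) and the PowerPC D-cache has 128 sets (index bits 11..5, tag bits 31..12).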
3
2. Hardware Design: Snoop logic
[Figure: snoop logic block diagram. Labeled elements: signals to and from the ARM (FIQ, BWAIT, BGNT); address (bits 31..0); 64-way TAG CAM with valid bits; snoop hit; shared page register, snoop hit address register, FIQ status register; pseudo bus mastership logic, which initiates a retry transaction (BREQ, BGNT); ASB (Advanced System Bus).]
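The snoop logic can be pictured as a small software model. The sketch below is an illustration under stated assumptions, not the actual hardware design: it assumes the TAG CAM mirrors the ARM920T D-cache tags (8 sets of 64 ways, derived from 16KB / 32-byte blocks / 64 ways), that the shared page register holds the base of one shared page, and that a hit latches the block address and raises FIQ. The page size and all names (snoop_logic, snoop_record_fill, snoop_bus_access, snoop_ack) are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define SETS 8                    /* assumed: 16KB / (32B * 64 ways) */
#define WAYS 64
#define BLOCK_BYTES 32u
#define SHARED_PAGE_BYTES 4096u   /* assumption: page size is not given on the slide */

typedef struct {
    uint32_t tag[SETS][WAYS];     /* TAG CAM contents */
    bool     valid[SETS][WAYS];   /* valid bit per entry */
    uint32_t shared_page;         /* shared page register (page base address) */
    uint32_t snoop_hit_addr;      /* snoop hit address register */
    bool     fiq_status;          /* FIQ status register */
} snoop_logic;

/* Record that the ARM D-cache filled a block into the given way. */
static void snoop_record_fill(snoop_logic *s, uint32_t addr, unsigned way) {
    unsigned set = (addr >> 5) & (SETS - 1);
    s->tag[set][way] = addr >> 8;
    s->valid[set][way] = true;
}

/* Observe a bus access by the other master. Returns true on a snoop hit,
 * in which case the access would be retried and FIQ raised to the ARM. */
static bool snoop_bus_access(snoop_logic *s, uint32_t addr) {
    if ((addr & ~(SHARED_PAGE_BYTES - 1)) != s->shared_page)
        return false;             /* only the shared page is snooped (assumed) */
    unsigned set = (addr >> 5) & (SETS - 1);
    for (unsigned w = 0; w < WAYS; w++) {
        if (s->valid[set][w] && s->tag[set][w] == (addr >> 8)) {
            s->snoop_hit_addr = addr & ~(BLOCK_BYTES - 1);
            s->fiq_status = true;
            return true;
        }
    }
    return false;
}

/* Model the FIQ handler having dealt with the block: drop the entry, clear FIQ. */
static void snoop_ack(snoop_logic *s) {
    uint32_t addr = s->snoop_hit_addr;
    unsigned set = (addr >> 5) & (SETS - 1);
    for (unsigned w = 0; w < WAYS; w++)
        if (s->valid[set][w] && s->tag[set][w] == (addr >> 8))
            s->valid[set][w] = false;
    s->fiq_status = false;
}
```

In this model a retried access succeeds only after snoop_ack has removed the matching entry, which mirrors the invalidation-only scheme named in the conclusion.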
4
2. Hardware Design (cont.): Wrapper
[Figure: wrapper block diagram. The PowerPC755 connects to the ASB (Advanced System Bus) through protocol conversion in both directions (AMBA to PowerPC and PowerPC to AMBA); labels: GBL, ADDR, ARTRY.]
5
3. Simulation Results: Worst-Case Scenario
  • Simulation environment
    • Seamless CVE (Mentor Graphics)
    • PowerPC 100MHz, ARM920T 50MHz
    • Atalanta RTOS
    • Bakery algorithm for lock implementation
    • 1 task on each CPU
    • I-cache enabled, D-cache selectively enabled
    • Wait cycles: 6 (120ns)

for (i = 0; i < 100; i++) {
    akc_entercritical();
    // critical section
    c = buffer;
    c++;
    buffer = c;
    // critical section
    akc_exitcritical();
}
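The simulation environment lists the bakery algorithm as the lock implementation. For reference, a minimal sketch of Lamport's bakery lock for two CPUs is given below. It assumes C11 atomics with sequentially consistent ordering; the names bakery_lock and bakery_unlock are hypothetical, and the slides do not show how akc_entercritical() and akc_exitcritical() are actually built.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCPU 2   /* one task on each of the two CPUs */

/* Shared lock state; static storage starts out zeroed. */
static atomic_bool choosing[NCPU];   /* CPU is currently picking a ticket */
static atomic_uint number[NCPU];     /* ticket number, 0 = not competing */

static void bakery_lock(unsigned me) {
    /* Take a ticket one larger than any ticket currently held. */
    atomic_store(&choosing[me], true);
    unsigned max = 0;
    for (unsigned j = 0; j < NCPU; j++) {
        unsigned n = atomic_load(&number[j]);
        if (n > max) max = n;
    }
    atomic_store(&number[me], max + 1);
    atomic_store(&choosing[me], false);

    /* Wait until every other CPU is either idle or holds a later ticket. */
    for (unsigned j = 0; j < NCPU; j++) {
        if (j == me) continue;
        while (atomic_load(&choosing[j])) { /* spin while j picks a ticket */ }
        for (;;) {
            unsigned nj  = atomic_load(&number[j]);
            unsigned nme = atomic_load(&number[me]);
            /* Proceed if j is idle, or (nme, me) orders before (nj, j). */
            if (nj == 0 || nj > nme || (nj == nme && j > me)) break;
        }
    }
}

static void bakery_unlock(unsigned me) {
    atomic_store(&number[me], 0);   /* give up the ticket */
}
```

The algorithm needs only loads and stores on shared variables, with no atomic read-modify-write instruction, which suits a platform where the two processors do not share a common atomic primitive.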
6
3. Simulation Results (cont.): Typical(?) Case Scenario
  • Simulation environment
    • Seamless CVE (Mentor Graphics)
    • PowerPC 100MHz, ARM920T 50MHz
    • Atalanta RTOS
    • Bakery algorithm for lock implementation
    • 1 task on each CPU
    • I-cache enabled, D-cache selectively enabled
    • Wait cycles: 6 (120ns)

for (i = 0; i < 10; i++) {
    akc_entercritical();
    // critical section
    for (j = 1; j < 100; j++) {
        c = buffer;
        c++;
        buffer = c;
    }
    // critical section
    akc_exitcritical();
}
7
4. In the context of HPC with cache coherency
  • The best way to run your applications as fast as possible:
    split the applications so that they have no dependencies on each other.
  • However, this is not applicable in most cases. Then what?
    Avoid data manipulation that resembles the worst-case scenario (WCS).
  • Another tip for HPC:
    arrange data on cache block boundaries and try to fully use each block (cache line).
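The last tip can be sketched in C11. The example assumes the 32-byte block size of both data caches in this platform; the type and variable names (block_t, padded_counter, per_cpu, sum_block) are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK 32   /* block size of both D-caches on this platform */

/* A buffer that starts on a block boundary and fills exactly one block,
 * so one cache fill brings in only useful data. */
typedef struct {
    alignas(CACHE_BLOCK) uint32_t samples[CACHE_BLOCK / sizeof(uint32_t)];
} block_t;

/* Per-CPU data padded to a full block, so that the two CPUs never
 * write to the same cache block. */
typedef struct {
    alignas(CACHE_BLOCK) uint32_t count;
} padded_counter;

static padded_counter per_cpu[2];

/* Touch every word of the block once it is in the cache. */
static uint32_t sum_block(const block_t *b) {
    uint32_t sum = 0;
    for (size_t i = 0; i < CACHE_BLOCK / sizeof(uint32_t); i++)
        sum += b->samples[i];
    return sum;
}
```

Keeping each CPU's data in its own block avoids the situation of the worst-case scenario, where both processors repeatedly touch the same block and every access triggers coherency traffic.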
8
5. Conclusion
  • Successfully implemented cache coherency on a heterogeneous
    processor platform with shared memory
  • Only an invalidation scheme is possible
  • Can be generalized to any processor platform, no matter what
    cache coherency protocol it supports, or whether it supports one at all
  • Could be used in SoC designs, which may contain multiple
    heterogeneous processors

9
Any Questions?