A Multiprocessor System-on-Chip for Real-Time Biomedical Monitoring and Analysis: Architectural Design

1
A Multiprocessor System-on-Chip for Real-Time Biomedical Monitoring and Analysis:
Architectural Design Space Exploration
Iyad Al Khatib, IMIT, ICT, KTH Royal Institute of Technology, Stockholm, Sweden
Davide Bertozzi, ENDIF, University of Ferrara, Ferrara, Italy
Luca Benini, DEIS, University of Bologna, Bologna, Italy
Francesco Poletti, DEIS, University of Bologna, Bologna, Italy
Rustam Nabiev, Biomedical Engineering Dept., Karolinska University Hospital, Huddinge, Stockholm, Sweden
Mohamed Bechara, ECE, FEA, American University of Beirut, Beirut, Lebanon
Axel Jantsch, IMIT, ICT, KTH Royal Institute of Technology, Stockholm, Sweden
Hasan Khalifeh, ECE, FEA, American University of Beirut, Beirut, Lebanon
43rd Design Automation Conference (DAC '06)
2
Outline
  • Motivation
  • MPSoC for ECG analysis
  • ECG analysis algorithm
  • Architectural bottleneck analysis
  • Architecture exploration
  • Architecture tuning and optimization
  • Scalability analysis
  • Comparison with state-of-the-art solutions
  • Conclusions

3
Motivation
[Figure: deaths by cause, United States, 2003, for all ages, <85 and ≥85. Causes: heart disease, stroke, other CVD, cancer, COPD, Alzheimer's. Vertical axis: deaths, 0 to 1,000,000. Source: Heart Disease and Stroke Statistics, 2006 Update, American Heart Association]
50% of these deaths could be avoided with a
reliable combination of cost-effective monitoring
and analysis
World market for biomedical devices for ECG
monitoring: > 1B [Novosense05]
4
State of the art
Limited processing power and tight power budgets
of Holter devices have traditionally limited their
functionality to data acquisition
  • Remote real-time monitoring through a
    communication link involves:
  • Transmission of a huge amount of life-critical
    data
  • A 100% functional always-ON connection

5
Real-time ECG analysis
  • Real-time in-situ ECG MONITORING & ANALYSIS aims
    to:
  • Promptly react to life-threatening heart
    malfunctions
  • Relax requirements on telemedicine links
  • Challenges
  • Physiological variability of QRS complexes
  • Base-line wander
  • Muscle noise
  • Artifacts due to electrode motion
  • Power-line interference
  • Preserve patient mobility
  • Moving from 3- to 12-lead analysis
  • Larger sampling frequencies
  • Tight power budgets

Algorithm development
Scalable energy-efficient HW-SW platforms
6
Contribution of this work
  • 1. Remove HW bottlenecks
  • Scalable computational horsepower
  • Scalable communication architecture
  • 2. Remove SW bottleneck
  • Scalable algorithm for RT ECG analysis
  • Parallelization strategy
  • 4. Create a functionally and timing-accurate
    virtual platform
  • 0.13 µm industrial technology-homogeneous power
    models
  • Integrates industrial IP cores, interconnect
    fabric, IOs
  • 5. Explore the design space
  • Demonstrates 12-lead analysis at > 1 kHz
  • Performance and power analysis and tuning

7
Medical background
ECG is an electrical recording of the heart
activity
  • 1-lead ECG signal: P, Q, R, S, T, U peaks
  • Each peak and inter-peak distance is related to a
    different heart activity
  • Sampling frequencies: 250 Hz, 1 kHz
  • Higher sampling frequencies might enhance
    analysis accuracy (e.g., resolve two peaks very
    close to each other)
  • A common analysis algorithm is Pan-Tompkins
  • QRS detection
  • Cascade of 4 filters: band-pass, differentiator,
    squaring operation, and finally a moving-window
    integrator
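The four-stage cascade named above can be sketched in a few lines of Python. This is only an illustration of the stage order: the band-pass here is a crude difference of two moving averages, and the window lengths are placeholder choices, not the published Pan-Tompkins coefficients. All function names are hypothetical.

```python
# Simplified Pan-Tompkins-style cascade (illustrative only; filter choices
# and window lengths are assumptions, not the published coefficients).

def moving_average(x, width):
    """Causal moving average over the last `width` samples."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= width:
            acc -= x[i - width]
        out.append(acc / min(i + 1, width))
    return out

def band_pass(x, short_win, long_win):
    """Crude band-pass: short moving average minus a long one (baseline)."""
    smooth = moving_average(x, short_win)
    baseline = moving_average(x, long_win)
    return [s - b for s, b in zip(smooth, baseline)]

def differentiate(x):
    """First difference, which emphasises the steep QRS slopes."""
    return [0.0] + [x[i] - x[i - 1] for i in range(1, len(x))]

def square(x):
    """Point-wise squaring: makes everything positive, boosts large slopes."""
    return [v * v for v in x]

def qrs_feature(ecg, fs):
    """Run the four stages in order and return the integrated feature signal."""
    bp = band_pass(ecg, short_win=max(1, fs // 50), long_win=max(2, fs // 5))
    sq = square(differentiate(bp))
    return moving_average(sq, max(1, int(0.15 * fs)))  # roughly 150 ms window

fs = 250
ecg = [0.0] * (2 * fs)
for beat in (100, 350):          # two synthetic impulses standing in for R peaks
    ecg[beat] = 1.0
feature = qrs_feature(ecg, fs)
peak_index = max(range(len(feature)), key=feature.__getitem__)
```

The feature signal rises shortly after each synthetic beat; a real detector would follow this with adaptive thresholding, which is omitted here.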

8
Proposed ECG system
Up to 12 chan
Interconnection of up to 9 sensors
  • Commercial off-the-shelf sensors
  • Ambu Inc. silver/silver chloride Blue sensor R
    (www.ambuusa.com)
  • A/D Conversion up to 10 kHz
  • IIR Filters to eliminate sensor noise and effect
    of patient movements
  • 64 Mbyte SDRAM off-chip memory
  • ECG MPSoC based on STMicroelectronics components
  • Computation performed on chunks of 4 sec. of
    recorded data (4-beat cycles)

9
ECG analysis algorithm
  • ECG analysis starts from a reference point in the
    heart cycle
  • The R-peak is commonly used
  • Accurate detection of the R-peak of the QRS
    complex is a prerequisite for the reliable
    functionality of ECG analyzers [Bobbie2004]
  • ECG signal variability is high
  • R-peak detection might be inaccurate
    (e.g., R-T peak detection instead of
    R-R peak detection)
  • As a consequence, other QRS
    parameters will be inaccurate
  • Traditional techniques may fail
    in detecting some serious heart disorders
  • R-on-T complex (premature
    ventricular complexes)
  • Risk of ventricular fibrillation

10
Novel approach to ECG analysis
  • By autocorrelation, derive the period without
    looking for peaks
  • Accurately find peaks in a time window equal to
    the period
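The two steps above can be sketched as follows: estimate the heartbeat period from the autocorrelation function (ACF) without locating any peak, then search for the largest sample in consecutive windows of one period. This is a minimal sketch on a synthetic signal, not the authors' implementation; the heart-rate limits and all function names are assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag):
    """Direct-form ACF of the mean-removed signal for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[i] * c[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def estimate_period(x, fs, min_bpm=30, max_bpm=240):
    """Lag (in samples) of the strongest ACF value inside a plausible
    heart-rate range. The bpm limits are illustrative assumptions."""
    min_lag = int(fs * 60 / max_bpm)
    max_lag = min(int(fs * 60 / min_bpm), len(x) - 1)
    acf = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])

def peaks_per_period(x, period):
    """Index of the largest sample in each consecutive window of one period."""
    return [max(range(start, min(start + period, len(x))), key=lambda i: x[i])
            for start in range(0, len(x) - period + 1, period)]

fs = 250
true_period = 200                       # samples: 0.8 s, i.e. 75 bpm
ecg = [1.0 if i % true_period == 50 else 0.01 * math.sin(2 * math.pi * i / 37)
       for i in range(4 * fs)]          # one 4-second chunk of synthetic data
period = estimate_period(ecg, fs)       # 200 for this synthetic signal
r_peaks = peaks_per_period(ecg, period) # [50, 250, 450, 650, 850]
```

A practical version would also have to handle a peak that falls on a window boundary, which this sketch ignores.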

11
Autocorrelation analysis
For the heartbeat period, we need at least 4 secs
of ECG data in order for the ACF to give accurate
results (100% on the MIT-BIH database)
12
MPSoC architecture
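[Figure: block diagram. Processing elements PE1, PE2, ..., PEn and an INTERRUPT CONTROLLER sit on the System Global Interconnect, together with the private memories (512 kB PRI MEM 1 ... PRI MEM N), the 8 kB SHARED MEMORY, the HARDWARE SEMAPHORES and the Memory Controller, which connects to the Off-chip SDRAM Memory]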
  • We exploit industrial IP cores (200 MHz System)
  • ST220 4-issue VLIW DSPs with 32 kB instruction
    and data caches
  • STBus interconnect from STMicroelectronics
  • In-house optimized memory controller with DMA
    capability
  • Whole system modeled with the MPSIM virtual
    platform
  • Cycle accurate and bus-signal accurate
  • Simulation speed up to 200 kcycles/sec (Pentium 4
    host, 3.5 GHz clock)
  • 0.13 µm technology-homogeneous industrial power
    models

13
The memory bottleneck
[Figure: memory controller block diagram. Labels: CORE, INTERCONNECT, Memory Controller (Controller, Transfer Engine, Off-chip Memory Interface Unit), SDRAM, RAM; arrows marked "Programming" and "Data transfer"]
  • Push memory channel
  • Control Block keeps a table of objects to be
    moved
  • Table entries can be programmed by different
    cores
  • Transfer engine moves data
  • Triggers bus SDRAM transactions
  • Memory Controller handles SDRAM accesses
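The division of labour listed above (a table of pending moves, entries programmed by any core, a transfer engine that drains the table) can be modelled with a small toy class. This is a behavioural sketch only; the class, field and method names are hypothetical and do not come from the deck, and no bus timing is modelled.

```python
from collections import deque

class TransferEntry:
    """One table entry describing an object to be moved (hypothetical fields)."""
    def __init__(self, owner_core, src_addr, dst_addr, length):
        self.owner_core = owner_core  # core that programmed the entry
        self.src_addr = src_addr      # word address in the modelled SDRAM
        self.dst_addr = dst_addr      # word address in the owner's private memory
        self.length = length          # number of words to move

class PushMemoryChannel:
    """Toy model: control block table + transfer engine + SDRAM access."""
    def __init__(self, sdram, private_mems):
        self.sdram = sdram                # list of words (off-chip memory)
        self.private_mems = private_mems  # dict: core id -> list of words
        self.table = deque()              # control block: pending transfers

    def program(self, entry):
        """Any core may add an entry (models the slave-port programming)."""
        self.table.append(entry)

    def _sdram_read(self, addr, length):
        """Stands in for the memory controller handling the SDRAM access."""
        return self.sdram[addr:addr + length]

    def run(self):
        """Transfer engine: drain the table, pushing data to private memories."""
        moved = 0
        while self.table:
            e = self.table.popleft()
            data = self._sdram_read(e.src_addr, e.length)
            mem = self.private_mems[e.owner_core]
            mem[e.dst_addr:e.dst_addr + len(data)] = data
            moved += len(data)
        return moved

sdram = list(range(100))
mems = {0: [0] * 16, 1: [0] * 16}
dma = PushMemoryChannel(sdram, mems)
dma.program(TransferEntry(owner_core=0, src_addr=10, dst_addr=0, length=4))
dma.program(TransferEntry(owner_core=1, src_addr=50, dst_addr=8, length=4))
words_moved = dma.run()
```

The point of the "push" arrangement is that the cores only write table entries; the data arrives in their private memories without the cores issuing the SDRAM reads themselves.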

14
STBus interconnect
  • Advanced features with respect to widely used
    AMBA AHB

[Figure: AMBA AHB compared with STBus, the latter with a forward channel and a backward channel]
STBus
  • Split request and response channels
  • Wait states can be masked, depending on the depth
    of slave FIFO buffers
  • Multiple outstanding transactions
  • Out-of-order completion
  • Low-latency arbitration
AMBA AHB
  • Straightforward shared bus topology
  • 2 data links, but only 1 active at a time
  • In-order completion
  • Transaction pipelining
15
Flexible bus topology
  • STBus can be instantiated either as a shared bus
    or as a partial or full crossbar

[Figure: full crossbar and partial crossbar topologies]
16
Crossbar-based interconnect
[Figure: two crossbar diagrams connecting DSP 1 ... DSP N and MEM CTRL to PRI MEM 1 ... PRI MEM N, SHM MEM, SEM, IRUPT and MEM CTRL]
Each private memory is on a crossbar branch,
accessible by its DSP or by the MemCtrl master
port
MemCtrl slave port for DMA programming
Partial grouping of initiators and targets may
result in marginal performance penalties while
reducing interconnect area (partial vs. full
crossbars)
17
Data management - I
  • Each DSP programs the DMA engine to periodically
    transfer input data chunks (4 secs of ECG signal)
    to its private on-chip memory
  • With 1kHz sampling frequency and 12 processors,
    required bandwidth is 6 Mbyte/sec (DMA
    programming plus actual data transfers)
  • Negligible with respect to STBus bandwidth: with
    1-wait-state memory it exceeds 400 Mbyte/sec, so
    this traffic uses less than 1.5% of it

18
Data management - II
Cache line refills
  • Independent computation of each DSP in its
    private memory
  • High communication bandwidth requirement on the
    interconnect
  • More leads can be processed by the same DSP
  • The RTEMS OS supports multiple tasks

19
Data management - III
64 bytes output data to shared memory
  • Negligible bus bandwidth
  • When the shared memory gets filled beyond a
    certain level, stored output data can be swapped
    to the off-chip SDRAM
  • 8 hours of history can be recorded
  • Data can also be transmitted via a telemedicine
    link
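The "swap when filled beyond a certain level" policy can be sketched as a high-watermark buffer. The 64-byte record size and the 8 kB shared memory come from the slides; the 75% watermark and every name in the code are assumptions made for the example, since the deck does not state the actual level.

```python
class SharedOutputBuffer:
    """Toy model of the shared memory holding 64-byte result records."""
    RECORD_BYTES = 64                      # record size, from the slide

    def __init__(self, capacity_bytes=8 * 1024, high_watermark=0.75):
        # high_watermark is a hypothetical value; the deck only says
        # "beyond a certain level".
        self.threshold = int(capacity_bytes * high_watermark)
        self.records = []                  # contents of the on-chip shared memory
        self.sdram_history = []            # records swapped to off-chip SDRAM

    def used_bytes(self):
        return len(self.records) * self.RECORD_BYTES

    def write(self, record):
        """Store one record; swap everything out once the watermark is passed."""
        if len(record) != self.RECORD_BYTES:
            raise ValueError("records must be exactly 64 bytes")
        self.records.append(record)
        if self.used_bytes() > self.threshold:
            self.swap_out()

    def swap_out(self):
        """Move all buffered records to the SDRAM history and free the buffer."""
        self.sdram_history.extend(self.records)
        self.records.clear()

buf = SharedOutputBuffer()
for i in range(100):
    buf.write(bytes([i % 256]) * 64)
```

With these numbers the first swap happens on the 97th record, so after 100 writes the history holds 97 records and 3 remain on chip.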

20
PE efficiency
  • We compared performance of ST220 VLIW DSPs
    with respect to ARM7TDMI cores

ST220 vs. ARM7TDMI: 2.5 times more energy-efficient,
9 times faster
Setup: same cache size (32 kB), processing of 1 ECG
lead on 1 core, 250 Hz sampling frequency
  • High-quality VLIW code generation
  • ARM7 (no Thumb) executable is 1.7 times larger
  • Static IPC for the 4-issue ST220 VLIW DSP: 2.9

21
Architectural tuning
  • Let us configure the system to satisfy
    application requirements at the minimum hardware
    cost

Processing of 4 secs of input data (250 Hz
sampling frequency). 12-lead ECG
  • Execution time scales linearly
  • Communication architecture (shared STBus) is
    well tuned
  • Peak memory controller bandwidth satisfies the
    performance requirement

22
Architectural tuning
Processing of 4 secs of input data (1 kHz
sampling frequency). 12-lead ECG
  • Load increases quadratically with sampling
    frequency
  • About 3 secs for 1 DSP to process 12 leads
  • Employing more processors is more effective here
  • Smoother energy degradation
  • Larger margin for heart disorders diagnosis
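The slide does not say where the quadratic growth comes from. One plausible reading, stated here as an assumption, is a direct-form autocorrelation over a fixed 4-second window: the number of samples N grows linearly with the sampling frequency, and computing all lags costs about N(N+1)/2 multiply-accumulates. The function name below is hypothetical.

```python
def acf_mac_count(fs_hz, window_s=4):
    """Multiply-accumulates for a direct-form ACF over all lags of one window.
    Assumes the quadratic term is the ACF stage (not stated in the deck)."""
    n = int(fs_hz * window_s)
    return n * (n + 1) // 2

macs_250 = acf_mac_count(250)     # 1000 samples per window
macs_1k = acf_mac_count(1000)     # 4000 samples per window
ratio = macs_1k / macs_250        # close to 16: 4x the rate, ~16x the work
```

Under this assumption, a fourfold increase in sampling frequency costs roughly sixteen times the work per lead, which is why adding processors pays off more at 1 kHz than at 250 Hz.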

23
Looking forward
  • What is the maximum achievable sampling frequency
    while meeting real-time requirements?

12 processors running. 12-lead ECG. 3.5 sec
real-time requirement
[Figure: chart with a 3.5 s level and sampling frequencies of 2200 Hz and 4000 Hz marked]
  • Typical state-of-the-art frequency range is
    250Hz-1kHz
  • 2.2 kHz achievable with a shared bus
  • about 4 kHz with an optimized partial crossbar

24
Interconnect optimization
  • System interconnect saturation limits
    performance scalability
  • 100% busy at 2.2 kHz

Now the architecture is computation-limited
25
Comparison with research/commercial ECG SoCs
Let us compare two of our MPSoC platform
instances with similar designs in research and
on the market

Application results | Pre-filter | Leads per SoC | Real-time analysis window | Freq. (Hz) | Memory | Data bits | Solution
--------------------------------------------------------------------------------------------------------------------
Heart period; P, Q, R, S, T, U peaks; potential disease detection | IIR | 12 | < 3.5 s | 4000 | 512 kB pri. mems., 32 kB I- and 32 kB D-cache | 16 | Partial crossbar
Same as above | IIR | 12 | < 4 s | 2200 | Same as above | 16 | Shared bus
Only QRS, only decide if healthy or unhealthy | Notch | 1 | No info | 250 | 8 kB cache | 10 | [1]
Only QRS, only decide if healthy or unhealthy | IIR | 8 | No info | 800 | No info | 12 | [2]
[1] Chang, M. et al., Design of a System-on-Chip
for ECG signal processing, The 2004 IEEE
Asia-Pacific Conference on Circuits and Systems,
December 2004.
[2] Freescale™ Semiconductor,
Personal Electrocardiogram (ECG) Monitor,
http://www.freescale.com/
26
Conclusions
  • Real-time nomadic ECG analysis challenges:
  • 12-lead, multi-kHz frequency
  • Algorithmic robustness
  • Software parallelization
  • Hardware bottlenecks (computation and
    communication architecture)
  • Real-time diagnosis
  • Autocorrelation-based algorithm is a promising
    alternative to traditional techniques
  • MPSoC required to handle increased computational
    requirements
  • HW-SW platform exploration:
  • VLIW DSP more energy-efficient than RISC core
  • Bus-based interconnect limits rate to 2 kHz
  • VLIW core becomes the bottleneck at 4 kHz
  • Future work: explore DVFS power management

27
Filtering stage
  • Filters out DC offsets and signal interferences
  • Hardware-implemented order-3 IIR filter
  • Output results in 16-bit binary format
  • Facilitates peak resolution and makes heartbeat
    period computation more precise
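A software model of such a stage might look like the sketch below: a direct-form I IIR filter of order 3 whose output is rounded and saturated to signed 16 bits. The deck does not give the filter coefficients, so the ones used here are a placeholder (a first-order DC blocker zero-padded to order 3), chosen only to show DC-offset removal; the function name is hypothetical.

```python
def iir_filter_16bit(b, a, samples):
    """Direct-form I IIR filter; output rounded and saturated to int16."""
    order = len(a) - 1
    x_hist = [0.0] * (order + 1)   # x[n], x[n-1], ..., x[n-order]
    y_hist = [0.0] * order         # y[n-1], ..., y[n-order]
    out = []
    for s in samples:
        x_hist = [float(s)] + x_hist[:-1]
        y = sum(bk * xk for bk, xk in zip(b, x_hist))
        y -= sum(ak * yk for ak, yk in zip(a[1:], y_hist))
        y /= a[0]
        y_hist = [y] + y_hist[:-1]
        out.append(max(-32768, min(32767, round(y))))
    return out

# Placeholder coefficients (assumption): y[n] = x[n] - x[n-1] + 0.95*y[n-1],
# padded with zeros so that the structure is order 3.
b = [1.0, -1.0, 0.0, 0.0]
a = [1.0, -0.95, 0.0, 0.0]

dc_input = [1000] * 400            # constant offset, no signal content
filtered = iir_filter_16bit(b, a, dc_input)
clipped = iir_filter_16bit(b, a, [50000])
```

The constant input decays to zero at the output, which is the DC-offset removal the slide refers to, and an out-of-range value is clamped to the 16-bit limit.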

28
The bus bottleneck
  • Bus bandwidth saturation limits scalability of
    state-of-the-art SoCs
  • Trends
  • Evolution of communication protocols
    (AMBA AHB, STBus, CoreConnect, AMBA AXI)
  • Evolution of bus topology
    (shared bus, partial/full crossbar,
    multi-layer architecture)