Design and Test Technology for Automotive Electronic Systems


1
Design and Test Technology for Automotive Electronic Systems
Andreas Steininger, Vienna University of Technology
2
My contact data
Andreas Steininger
Vienna University of Technology
Faculty of Informatics
Institute of Computer Engineering
Embedded Computing Systems Group
Treitlstrasse 3, A-1040 Vienna, Austria
steininger_at_ecs.tuwien.ac.at
http://ti.tuwien.ac.at/ecs
3
Outline

Automotive electronics: the specific situation
Node-level view
designing cost-efficient dependable nodes
test: purpose and techniques
System-level view
communication system
test: purposes, challenges and techniques
Summary

4
Main Contributors to this Material

Dr. Thomas Kottke, R. Bosch AG / EADS
Dr. Christoph Scherrer, Alcatel / Thales
Dr. Eric Armengaud, DecomSys / VirtualVehicle
Dr. Karl Thaller, DecomSys / Elektrobit Austria
Dr. Martin Horauer, UAT Technikum Wien

5
Electronics in Cars: Some Facts

high proportion of value
up to 30 %
high development potential
more than 80 % of the innovations
high number of Electronic Control Units (ECUs)
up to 70
complex distributed system
different networks and topologies

6
Electronics in Cars - Benefits

cheap alternative to existing mechanical solutions
lighter, smaller, cheaper, more flexible, ...
enabler for further optimizations
electronic ignition, motor management, ...
key to new functionality
safety: ESP, active suspension, crash sensing
comfort: air conditioning, infotainment, ...
security: immobilizer, alarm, electronic key, GPS tracking, ...
autonomy: anticipatory braking, lane keeping, ...

7
Key Demands

Safety
Real-Time
Low Cost
Robustness
Testability

8
Key Demands: Safety

high risk potential (energy!)
high public awareness
no safe state (in general)
certification required (EN 61508, ISO 26262)
high complexity of system and application
legal issues (liability)

9
Key Demands: Real-Time

engine: 6000 rpm = 1 revolution per 10 ms
VDM: 100 km/h = 28 cm per 10 ms
need to synchronize distributed activities
real-time communication
image processing tasks
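
The two timing figures on this slide follow from plain unit conversions. A minimal sketch of the arithmetic, with function names that are illustrative only and not taken from the slides:

```python
def revolution_time_ms(rpm: float) -> float:
    """Duration of one engine revolution in milliseconds."""
    return 60_000.0 / rpm


def distance_cm(speed_kmh: float, interval_ms: float) -> float:
    """Distance in centimetres covered during `interval_ms` at `speed_kmh`."""
    metres_per_second = speed_kmh / 3.6
    return metres_per_second * (interval_ms / 1000.0) * 100.0


print(revolution_time_ms(6000))      # 10.0 (ms per revolution)
print(round(distance_cm(100, 10)))   # 28 (cm travelled in 10 ms)
```

A control loop that misses a single 10 ms deadline therefore loses a full engine revolution or more than a quarter of a metre of travel.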

10
Key Demands: Low Cost

extreme competition
high cost inhibits introduction
tailored safety concepts
minimum degree of replication
use structural redundancies
generic solutions
scalable, configurable, flexible
marginal costs beat NRE (non-recurring engineering) costs

11
Key Demands: Robustness

wide temperature range
temperature cycles
humidity
vibrations
EMI (radiated and conducted)
decreasing noise margins
service by non-experts

12
Key Demands: Testability

complex distributed system
many options and configurations
multi-vendor system
startup in less than 1 sec
high availability and reliability
diagnosis by non-expert
online testing required?

13
How to Attain Safe Operation?

fault avoidance
fault tolerance
failure mode
fault type
number of faults

14
How to Attain Safe Operation? Fault avoidance

protect system from faults
impractical
hostile environment
MTTF up to 10^9 h to be guaranteed
susceptibility of electronics is known

15
How to Attain Safe Operation? Fault tolerance

accept occurrence of faults,
BUT detect and handle them appropriately to avoid failure
introduce redundancy

16
How to Attain Safe Operation? Failure mode

determined by application
fail safe
detect error -> safe state
single channel with error detection (ED)
duplication and comparison
fail operational
detect and mask error
voting over redundant results
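
The difference between the two failure modes can be sketched in a few lines. This is an illustrative model only: `SafeStateError`, `duplicate_and_compare` and `majority_vote` are hypothetical names, and replica results are plain Python values rather than hardware signals.

```python
from collections import Counter


class SafeStateError(Exception):
    """Signals that the system must enter its safe state."""


def duplicate_and_compare(result_a, result_b):
    """Fail safe: two replicas, any mismatch leads to the safe state."""
    if result_a != result_b:
        raise SafeStateError("replica mismatch")
    return result_a


def majority_vote(results):
    """Fail operational: mask a faulty replica by voting over all results."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise SafeStateError("no majority among replicas")
    return value


print(duplicate_and_compare(42, 42))   # 42
print(majority_vote([7, 7, 9]))        # 7 (the deviating replica is masked)
```

Two replicas are enough to detect an error, but at least three are needed before a single faulty result can be outvoted.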

17
How to Attain Safe Operation? Fault type

random fault
hits one replica only -> replication sufficient
systematic fault
common cause fault -> avoid shared resources
design fault -> use diversity

18
How to Attain Safe Operation? Number of faults

single fault
usual assumption
system recovered from F1 before F2 occurs
multiple fault
fault accumulation
fault bursts

19
Current Status

fail safe functions realized
shut off upon error
mechanical fall-back system assumes control
no true by-wire functions
single-channel solutions sufficient
tolerance against random faults
avoid design faults by field experience -> no diversity
avoid common cause faults by design (?)
single fault assumption
keep faults rare (shielding, etc.)

20
Outline

Automotive electronics: the specific situation
Node-level view
designing cost-efficient dependable nodes
test: purpose and techniques
System-level view
communication system
test: purposes, challenges and techniques
Summary

21
Node-level Solution

mission: make a node (processor) fault tolerant
need to consider CPU and memory
aim is fail safe (but keep option for fail operational in mind)
simplex unit with error detection capabilities
duplication and comparison
hybrid approach
Ideas?

22
Options for the CPU Core: Single Core + ED

modify custom CPU core
parity for buses
two-rail coding for signals
self-checking implementation of simple units
duplicate and compare for complex units
careful layout

Single core + ED
Dual core + cmp
Superscalar proc. + cmp + ED

23
Options for the CPU Core: Dual Core + cmp

duplicate custom CPU core
master/checker operation
shared (safe) memory
validity check for inputs
self-checking comparator checks equality of outputs
option: clock delay
option: mode switch

24
Solution Example: Dual Core Frame

benefits
can use custom core without modifications
safety analysis valid for other cores as well
promises high ED coverage with moderate effort
CPU is hard to protect otherwise
crucial points
enable easy recovery (-> keep outage short)
eliminate single points of failure
detect common cause faults

25
Protection in the Dual Core Frame
[Block diagram: Core 1 (Master) and Core 2 (Checker) are connected to the instruction memory and the data memory. Each core has the signals Instr. Addr., Instr., Data Addr., Data in and Data out; comparison between the two cores drives Error_Sig. Protection measures shown: safe memories, parity for buses, dual-rail coding, self-checking comparators.]
26
Potential for Common Cause Faults

identical input data
identical clock (lock step)
shared clock generator
shared power supply
both processors on same die

27
Temporal Diversity

operate checker with a delay against master
same fault hits at different point of computation
therefore different effect
therefore better chance to detect by comparison
store master output for comparison
choose delay of 1.5 clock cycles
larger delay causes high effort for little gain (-> experiments)
non-integer cycle number against clock-related effects
easy to implement by clock inversion
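
Why a delay helps against common cause faults can be shown with a toy model. The sketch below is a strong simplification and rests on assumptions not in the slides: the "core" is a running 16-bit sum, the disturbance flips one bit of its state, and the delay is a whole number of steps instead of 1.5 clock cycles. All function names are hypothetical.

```python
def run_core(inputs, glitch_step=None, flip_mask=0x01):
    """Per-step outputs of a toy core; a glitch corrupts its state once."""
    acc = 0
    outputs = []
    for step, value in enumerate(inputs):
        acc = (acc + value) & 0xFFFF
        if step == glitch_step:
            acc ^= flip_mask              # transient fault hits the state
        outputs.append(acc)
    return outputs


def first_mismatch(inputs, glitch_time, delay):
    """Step at which the comparator sees a difference, or None.

    One disturbance hits both cores at wall-clock time `glitch_time`.
    The checker runs `delay` steps behind the master, so it is hit while
    executing step `glitch_time - delay`.
    """
    master = run_core(inputs, glitch_step=glitch_time)
    checker_step = glitch_time - delay
    checker = run_core(inputs, checker_step if checker_step >= 0 else None)
    for step, (m, c) in enumerate(zip(master, checker)):
        if m != c:                        # stored master output vs. checker
            return step
    return None


data = list(range(1, 11))
print(first_mismatch(data, glitch_time=5, delay=0))   # None: lock step hides it
print(first_mismatch(data, glitch_time=5, delay=2))   # 3: outputs diverge
```

In lock step the same disturbance corrupts both cores identically, so the comparator stays silent; with a delay the two cores are hit at different points of the computation and their outputs differ.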

28
Temporal Diversity Implementation
[Block diagram: the dual core frame with Core 1 (Master), Core 2 (Checker), instruction memory and data memory, extended by a delay element labelled DT; comparison between the two cores drives the Error signal.]
29
Fail Safe Dual Core Frame: Summary

safe memories for instructions and data
comparison of all core outputs
parity protection for buses (data, address)
dual rail coding for single signals (int, rst, err)
totally self-checking comparators
temporal diversity
How safe is the proposed solution?

30
Assessment of the Solution's Quality

How to measure quality? (Aim is fail safe)
error detection coverage -> detect all errors
error detection latency -> detect them quickly
Which method to choose?
theoretical analysis / modelling
experimental fault injection
field observation

31
Fault Injection Experiment 1

2 SPEAR cores in fail safe frame (= DUT)
synthesized to EDIF netlist
faults injected one by one into netlist
exhaustive list of stuck-at-1 and stuck-at-0 faults
download to FPGA, application run
golden device as reference (= REF)
upon mismatch (DUT ≠ REF) -> check comparator
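
The structure of such a campaign can be mimicked in software on a toy circuit. The sketch below is not the SPEAR/FPGA setup from the slide: it assumes a gate-level full adder as the "netlist", duplicates it as master and checker, injects every stuck-at fault one at a time, and compares against a fault-free reference. All names (`NETS`, `full_adder`, `campaign`) are hypothetical.

```python
from itertools import product

NETS = ["a", "b", "cin", "x1", "a1", "a2", "sum", "cout"]


def full_adder(a, b, cin, fault=None):
    """Evaluate a gate-level full adder; `fault` is (net, stuck_value) or None."""
    def drive(net, value):
        # A stuck-at fault overrides whatever value the net would carry.
        return fault[1] if fault is not None and fault[0] == net else value

    a, b, cin = drive("a", a), drive("b", b), drive("cin", cin)
    x1 = drive("x1", a ^ b)
    a1 = drive("a1", a & b)
    a2 = drive("a2", x1 & cin)
    return drive("sum", x1 ^ cin), drive("cout", a1 | a2)


def campaign():
    counts = {"not activated": 0, "detected": 0, "undetected failure": 0}
    for copy, net, stuck in product(("master", "checker"), NETS, (0, 1)):
        detected = undetected = False
        for a, b, cin in product((0, 1), repeat=3):      # the "application run"
            ref = full_adder(a, b, cin)                  # golden device (REF)
            master = full_adder(a, b, cin, (net, stuck) if copy == "master" else None)
            checker = full_adder(a, b, cin, (net, stuck) if copy == "checker" else None)
            if master != checker:
                detected = True                          # comparator flags an error
            elif master != ref:
                undetected = True                        # wrong output, no alarm
        if undetected:
            counts["undetected failure"] += 1
        elif detected:
            counts["detected"] += 1
        else:
            counts["not activated"] += 1
    return counts


print(campaign())
# {'not activated': 0, 'detected': 32, 'undetected failure': 0}
```

In this toy model a single fault inside one replica can never escape the comparison. The interesting cases in the real experiment are therefore faults in shared parts and common cause effects, which this sketch does not cover.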

32
Results of FI Experiment 1
[Result charts not recoverable from the transcript]
Temporal diversity causes detection latency
Recovery becomes difficult!
33
Delayed WR as a Remedy

problem status
data corruption during ED latency (due to temporal diversity)
solution approach
delay data output / memory WR until comparison complete
allow memory RD without delay (performance!)
restrictions
RD-after-WR -> data conflict
WR-after-RD -> stale data
-> compiler has to take care of this
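
The delayed-write idea can be modelled as a one-entry write buffer in front of the memory. This is a behavioural sketch under assumptions that are not in the slides: a single pending write, a boolean comparator verdict, and the hypothetical class name `DelayedWriteMemory`.

```python
class DelayedWriteMemory:
    """Memory whose writes are committed only after the comparator agrees."""

    def __init__(self, size):
        self.cells = [0] * size
        self.pending = None               # (address, value) awaiting the verdict

    def write(self, addr, value):
        if self.pending is not None:
            raise RuntimeError("previous write still awaits comparison")
        self.pending = (addr, value)      # held back, memory not touched yet

    def read(self, addr):
        return self.cells[addr]           # reads are not delayed

    def comparison_done(self, outputs_match):
        """Commit the pending write on a match, drop it on a mismatch."""
        if self.pending is not None and outputs_match:
            addr, value = self.pending
            self.cells[addr] = value
        self.pending = None


mem = DelayedWriteMemory(8)
mem.write(3, 42)
print(mem.read(3))        # 0: RD-after-WR sees the old value
mem.comparison_done(True)
print(mem.read(3))        # 42
mem.write(3, 99)
mem.comparison_done(False)
print(mem.read(3))        # 42: erroneous write never reached the memory
```

The first read shows the RD-after-WR conflict listed under restrictions: an undelayed read issued directly after a write to the same address still returns the old data, which is why the compiler has to separate such accesses.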

34
Enabling fast Recovery

error signal (dual rail)
notifies external component / memory
turns any further WR into RD (error confinement)
triggers processor interrupt
status register (memory mapped)
updated by HW
indicates source of error (data parity, address mismatch, ...)
recovery
can build on uncorrupted status
can benefit from detailed status information

35
Results of FI Experiment 2
[Result chart not recoverable from the transcript]
No change of memory contents in case of error
Erroneous read access is uncritical
36
Fail Safe Dual Core: Summary

duplicate and compare
generic approach, applicable to any core type
covers all (local) errors
need to carefully eliminate single points of failure
need to complement with protection for signals and buses
temporal diversity
mitigates (many) common cause failures
requires output delay to ensure error confinement

37
Squeezing out more Efficiency

dual core is expensive?
normally yields performance improvement
would be welcome here as well: increasing performance demand at limited clock rates
but exclusively dedicated to safety here
observation: not all tasks are safety critical
enable flexible switching between safety mode and performance mode

38
Operation in Performance Mode

cores execute different instruction streams in parallel
both cores have direct access to memory / peripherals
instruction caches introduced to minimize penalties from conflicting access
temporal diversity disabled
comparator disabled

39
Requirements on the Mode Switching

coherent operation in safety mode
internal states of cores must be aligned before switching to safety mode (register file, cache)
safe operation in safety mode
switching must not introduce safety leakage
no corruption of safety-relevant data in performance mode
low performance penalty for mode switching
slow or complicated switching would spoil the anticipated performance gain

40
Implementation of the Split Core Frame
41
Instruction RAM Control Unit (ICU)

handles all accesses to the instruction RAM (in case of cache miss)
safety mode: core 1 exclusively supplies the instruction address
performance mode: both cores request instructions independently, ICU resolves simultaneous requests

42
Data RAM Control Unit (DCU)

handles accesses to peripherals and data memory
maintains a unique identification bit for each core
provides a memory locking mechanism (for atomic RAM operations)

43
Mode Switch Detect Units

implemented as core-external units, so that standard cores can still be used
snoop the bus for the mode switch instruction
trigger the mode switch when the mode switch instruction is encountered

44
Mode Switch Safety -> Performance
LDL r1, 248       load ID reg address
LDH r1, 255
mode switching    mode switch instr -> core1 wait -> core2 wait -> clk align -> switch mode
LDW r2, r1        load and check ID bit -> cond. branch core2
BTEST r2, 1
JMPI_CT
45
Mode Switch Performance -> Safety
core1 encounters mode switch instr -> trigger MSU (core1 signal) -> halt core1 (wait1) -> interrupt core2 (message2)
core2 encounters interrupt -> save context -> jump to mode switch instr
core2 executes mode switch -> halt core2 -> switch clock -> resume core1 -> resume core2 after delay
46
Analysis Example: Clock Switching

switching controlled by core mode signal (dual rail coded)
special clock routing ensures detection of all opens
watchdog with independent time source detects clock failure
using core mode signal as trigger detects failure to switch back to safety mode

47
Fault Injection in Safety Mode
[Result chart not recoverable from the transcript]
Delayed WR still ensures error confinement
48
Fault Injection in Performance Mode

fault injected in performance mode, then switch to safety mode

No undetected effects / late detections in safety mode
Watchdog important to prevent hang-up in performance mode
49
Options for the CPU Core: Superscalar Processor + cmp + ED

duplicate pipeline, modify rest
shared register file
shared (safe) memory
validity check for inputs
self-checking comparator checks equality of outputs
option: mode switch

50
Superscalar Proc Implementation
51
Mode-Switch Principle

Mode switch from safety mode to performance mode
Mode switch from performance mode to safety mode

Legend: MS = mode switch, Px = instruction in performance mode, Sx = instruction in safe mode
52
Overall Comparison of Options
results from the PhD thesis of T. Kottke
53
We still need a Safe Memory

key parameters
128 kB / 32-bit words
0.25 µm technology, 100 MHz clock
soft error rate λ = 10^-12/h per bit
permanent error rate λ = 10^-15/h ... 10^-14/h
operating lifetime 10 h
working lifetime 10^4 h (10 years)
allowed failure rate λ_spec < 10^-10/h overall

λ_native ≈ 10^-5/h
54
Implementing a Safe Memory
Why not duplicate and compare?

detect bit flips in storage cells
parity (or EDC/ECC)
detect erroneous address decoding
special decoder logic design
protect interfaces
parity for data, address and control buses
prevent illegal WR access
provide mask input for write enable
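
Of the measures listed, parity is the easiest to illustrate. A minimal sketch with hypothetical helper names, assuming one even-parity bit per stored word:

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word contains an odd number of ones."""
    return bin(word).count("1") & 1


def store(word: int):
    """Return the word together with the parity bit written alongside it."""
    return word, parity_bit(word)


def is_consistent(word: int, stored_parity: int) -> bool:
    """Check a word read back from memory against its stored parity bit."""
    return parity_bit(word) == stored_parity


word, parity = store(0b1011_0010)
print(is_consistent(word, parity))                 # True
print(is_consistent(word ^ 0b0000_0100, parity))   # False: single bit flip
print(is_consistent(word ^ 0b0001_0100, parity))   # True: double flip escapes
```

The last line shows the limit of a single parity bit and why the slide mentions EDC/ECC as the stronger alternative.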

56
Possible Address Decoder Errors

correct behavior
any given address activates exactly one assigned memory cell
erroneous behaviors
an address activates no memory cell at all
an address activates more than one memory cell
an address activates a wrong memory cell
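
The three erroneous behaviours can be told apart by looking at the decoder's select lines. An illustrative sketch, assuming the select lines are available as a list of 0/1 values with one entry per memory cell (the function name is hypothetical):

```python
def classify_decoder_output(address: int, select_lines: list) -> str:
    """Classify the decoder behaviour for one applied address."""
    active = [index for index, line in enumerate(select_lines) if line]
    if not active:
        return "no cell activated"
    if len(active) > 1:
        return "multiple cells activated"
    if active[0] != address:
        return "wrong cell activated"
    return "correct"


print(classify_decoder_output(2, [0, 0, 1, 0]))   # correct
print(classify_decoder_output(2, [0, 0, 0, 0]))   # no cell activated
print(classify_decoder_output(2, [0, 1, 1, 0]))   # multiple cells activated
print(classify_decoder_output(2, [0, 1, 0, 0]))   # wrong cell activated
```

The first two error classes can be detected from the select lines alone, while a wrong single activation needs additional information about the address.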

57
Checking the Address Decoder
check for missing or multiple cell activations
XOR(upper half) ? XOR(lower half) ?
re-check parity behind cell array
OR over even cells ? parity ?

large decoders built from cascade of smaller ones

58
Outline

Automotive electronics: the specific situation
Node-level view
designing cost-efficient dependable nodes
test: purpose and techniques
System-level view
communication system
test: purposes, challenges and techniques
Summary

59
Node Testing: What's the Purpose?

factory test
unveil manufacturing defects
startup test
check function before starting mission
on-line test
check function during mission

60
Basic Principle of Testing
source: Agilent
61
The Complexity Problem

Functional Testing
apply all possible/relevant input patterns
number of required vectors explodes with DUT complexity
example: SW self-test of processor
Structural Testing
check function of all constituting components
if no defect in components -> no defect in DUT
need access to internal components -> scan test

62
The Scan Test Principle
circuit registers are chained to one or more shift registers
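
A behavioural sketch of one scan test cycle (shift in, capture, shift out) may help here. It is a simplified model, not a description of a particular scan implementation: a single chain, bits as Python integers, and the combinational logic passed in as a function. The class and function names are hypothetical.

```python
class ScanChain:
    """Registers of a circuit, chained into one shift register for test."""

    def __init__(self, length, logic):
        self.regs = [0] * length
        self.logic = logic                # combinational logic between registers

    def shift(self, bit_in):
        """Test mode: shift by one position, return the bit that falls out."""
        bit_out = self.regs[-1]
        self.regs = [bit_in] + self.regs[:-1]
        return bit_out

    def capture(self):
        """Functional mode for one clock: registers take the logic outputs."""
        self.regs = self.logic(self.regs)


def scan_test(chain, pattern):
    """Load `pattern`, apply one capture clock, return the captured response."""
    for bit in reversed(pattern):         # last pattern bit enters first
        chain.shift(bit)
    chain.capture()
    shifted_out = [chain.shift(0) for _ in pattern]
    return shifted_out[::-1]              # restore register order


def toy_logic(regs):
    return [regs[0] ^ regs[1], regs[1] & regs[2], regs[2]]


print(scan_test(ScanChain(3, toy_logic), [1, 1, 1]))   # [0, 1, 1]
```

Comparing the shifted-out response with the expected one checks the combinational logic without any functional access to the internal registers.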
63
Why Care for On-line Testing?

Errors are detected anyway
we have provided lots of mechanisms
What about rarely used resources?
faults will rarely get activated there
so who cares when they are faulty and unused?
What about fault accumulation?
faults may accumulate in rarely used resources
our assumption was single faults!

64
A Toy Example: Steer by Wire
65
Example Architecture
single fault assumption!
Within a Fault Tolerant Unit (FTU) two computer nodes operate in active redundancy.
66
Reliability Model (Markov)
[State diagram with transition rates labelled a, b and w]
Additional parameters:
d: rarely used resources
s: activation rate of these
67
Model Results
[Plot: log(MTTF) over log(activation rate) [1/h] and the share of rarely used resources. Annotations: "high activation rate", "no rarely used resources", "rare activation of resources impairs MTTF -> fault accumulation"]
68
Are there rarely used Resources?
[Chart of resource usage with the categories memory, interconnect, combinational logic and flip-flops; two categories are marked "irregular use"]
example: TTP/C controller prototype chip
69
Conclusion of the Analysis

irregularly used resources deserve specific attention
danger of fault accumulation
memory is often the dominant resource in a system
therefore relatively high error probability
hardware resources tend to exhibit higher and more regular activation than software tasks/memory cells
it is wise to protect memory from fault accumulation
on-line testing of memory

70
Testing versus Error Detection

concurrent error detection
checks ongoing activities for certain properties
does not perform explicit stimulation
does not cover unused resources and irrelevant errors
detects error as soon as it becomes activated
testing
applies explicit stimuli
checks for expected result
covers all resources included in the test scope
detects defect only upon test execution

71
Transparent On-line Memory Test

problem: on-line test needs to be transparent
do not destroy memory contents or degrade reaction time
solution: systematic inversion of memory contents
instead of writing 0 or 1 -> flip bit
application of standard test algorithm possible (e.g. March)
upon CPU read or write -> suspend test
keep track of inverted cells -> re-invert upon CPU read
drawbacks
need to introduce multiplexer -> access delay
increased memory activity -> power consumption
test controller not protected
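
The inversion principle can be sketched as follows. This is a deliberately reduced model and not the TOMT hardware: it runs one invert / re-invert pass per cell instead of a full March algorithm, ignores suspension on CPU accesses, and models a defect as a single stuck-at bit. The `Memory` class and `transparent_test` are hypothetical names.

```python
class Memory:
    """Toy memory with an optional stuck-at bit fault."""

    def __init__(self, contents, width=8, stuck=None):
        self.cells = list(contents)
        self.mask = (1 << width) - 1
        self.stuck = stuck                # (address, bit index, level) or None

    def write(self, addr, value):
        self.cells[addr] = value & self.mask

    def read(self, addr):
        value = self.cells[addr]
        if self.stuck is not None and self.stuck[0] == addr:
            _, bit, level = self.stuck
            value = value | (1 << bit) if level else value & ~(1 << bit)
        return value


def transparent_test(memory):
    """Return the addresses whose cells fail the invert / re-invert check."""
    failing = []
    for addr in range(len(memory.cells)):
        original = memory.read(addr)
        inverted = original ^ memory.mask
        memory.write(addr, inverted)      # flip bits instead of writing 0 or 1
        first_ok = memory.read(addr) == inverted
        memory.write(addr, original)      # re-invert: contents are restored
        second_ok = memory.read(addr) == original
        if not (first_ok and second_ok):
            failing.append(addr)
    return failing


healthy = Memory([0x12, 0xA5, 0xFF])
print(transparent_test(healthy), healthy.cells)   # [] [18, 165, 255]
faulty = Memory([0x12, 0xA5, 0xFF], stuck=(1, 0, 0))
print(transparent_test(faulty))                   # [1]
```

Because every cell is written back with its original value, the application data survives the test, which is the transparency requirement stated above.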

72
TOMT Implementation
[Block diagram showing processor and memory]
73
Outline

Automotive electronics: the specific situation
Node-level view
designing cost-efficient dependable nodes
test: purpose and techniques
System-level view
communication system
test: purposes, challenges and techniques
Summary

74
Interaction between Subsystems

is the key to today's automotive innovations
allows exchange of status
allows sharing of resources (sensors, e.g.)
allows coherent distributed activities
enables completely new types of applications
no way around that
is the nightmare of every system validator
applications become mutually dependent
further explosion of test space
thousands of options, versions etc.
products from dozens of different vendors must interact

75
The Role of the Bus System

point to point connection on demand
too inflexible
too much cabling (several km!)
one generic bus system for the whole car
demands are too different
safety issues
waste of bandwidth
mix of different bus systems (plus bridges for interconnect)
communication partly in parallel
selection of bus protocol according to demands

76
An Example Architecture
77
Time-Triggered Communication
source: G. Bauer
78
The Temporal Firewall Principle
source: G. Bauer
79
Benefits of TT Communication

temporal firewall
decouples activities of individual nodes
reduces coupling to desired data exchange only
global periodic schedule
complete temporal specification for global activities
allows isolated development (and test!) of components
enables systematic planning of resource utilisation
provides life-sign for every sender
allows masking of babbling idiots by bus guardian
builds on existence of global time
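
How a global periodic schedule yields both the bus guardian decision and the life-sign check can be sketched in a few lines. The node names, the slot length and all function names below are assumptions made for illustration; they do not describe a specific protocol such as TTP/C or FlexRay.

```python
SLOT_LENGTH_US = 250                        # hypothetical slot length
SCHEDULE = ["brake", "steer", "engine"]     # hypothetical sender per slot


def slot_owner(global_time_us: int) -> str:
    """Node that is allowed to send at the given global time."""
    slot_index = (global_time_us // SLOT_LENGTH_US) % len(SCHEDULE)
    return SCHEDULE[slot_index]


def guardian_permits(node: str, global_time_us: int) -> bool:
    """Bus guardian: open the bus for a node only during its own slot."""
    return slot_owner(global_time_us) == node


def missing_life_signs(senders_seen_in_round) -> list:
    """Nodes that stayed silent in a round, although each has a slot."""
    return [node for node in SCHEDULE if node not in senders_seen_in_round]


print(slot_owner(0), slot_owner(250), slot_owner(750))   # brake steer brake
print(guardian_permits("steer", 100))                    # False: not its slot
print(missing_life_signs({"brake", "engine"}))           # ['steer']
```

Both checks depend only on the shared schedule and the global time, which is why a babbling node can be masked without any knowledge of the application.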

80
Outline

Automotive electronics: the specific situation
Node-level view
designing cost-efficient dependable nodes
test: purpose and techniques
System-level view
communication system
test: purposes, challenges and techniques
Summary

81
System-level Test: The Concept

principle of structural testing
if all components OK -> system OK
decoupling of components by TT-approach
every component is fully specified and can be developed and tested in isolation

everything's great!
82
System-level Test: The Reality

dozens of complaints about mysterious interactions of subsystems reported
virtually all brands are affected
What happened to our system-level test concept?

83
The Root of the Problem

Recall the purpose of structural testing
identify defects
Are we actually looking for defects?
Our test concept is still very good at that!
What do we actually want to test for?
configuration errors
system design errors (systems are too complex to verify)
SOS (slightly off specification) errors
white spots in the specifications (bus protocol, ...)

We need to test the system function!
84
Solutions Ahead?

need to determine manageable and sufficient set of (functional) test cases
divide and conquer in the functional domain
hierarchic testing (requirements -> properties)
inclusion of formal tools (model driven testing)
inclusion of statistics
inclusion of field experiences
need to consider practical constraints
limited accessibility, black boxes
cost

85
Solving the Complexity Problem

decomposition into services and mechanisms
clearly defined inputs, outputs and config. parameters for each mechanism
use hierarchical structuring of mechanisms for diagnosis

[Layer model: Application, Transport, Data link, Physical]
details: http://embsys.technikum-wien.at/steacs.html
86
Solving the Accessibility Problem
details: http://www.ecs.tuwien.ac.at/armengaud/extract
87
Forcing the Clock Synchronization
details: http://www.ecs.tuwien.ac.at/armengaud/extract
88
Summary

the automotive domain has its own laws and rules
need extremely cost-effective, robust solutions for safety-critical real-time applications, versatile and custom tailored
on node level
different redundancy concepts applicable
example: dual core CPU and memory with protection mechanisms
on-line testing for memory may be required
on system level
crucial role of communication infrastructure
advantages of time triggered approach
insufficient suitability of structural testing

89
Hungry for more?

http://ti.tuwien.ac.at/ecs
steininger_at_ecs.tuwien.ac.at

90
Related publications of my group (1)

[1] T. Kottke and A. Steininger, "A Fail-Silent Memory for Automotive Applications", 9th IEEE European Test Symposium, Corsica, 2004.
[2] T. Kottke and A. Steininger, "A Generic Dual Core Architecture with Error Containment", Journal of Computing and Informatics, vol. 23, no. 5, 2004.
[3] T. Kottke and A. Steininger, "A Reconfigurable Generic Dual-Core Architecture", Int'l Conference on Dependable Systems and Networks (DSN2006), Philadelphia, 2006.
[4] T. Kottke and A. Steininger, "A Fail-Silent Reconfigurable Superscalar Processor", 13th IEEE Pacific Rim Int'l Symposium on Dependable Computing, Melbourne, 2007.
[5] C. El Salloum, A. Steininger, P. Tummeltshammer and W. Harter, "Recovery Mechanisms for Dual Core Architectures", 21st IEEE Int'l Symposium on Defect and Fault Tolerance in VLSI Systems (DFT06), Washington, 2006.
[6] A. Steininger and C. Temple, "Economic Self-Test in the Time-Triggered Architecture", IEEE Design & Test of Computers, vol. 3/1999.
[7] A. Steininger, "Testing and Built-in Self-Test: A Survey", Journal of Systems Architecture 46 (2000).

91
Related publications of my group (2)

[8] A. Steininger and C. Scherrer, "On the Necessity of BIST in Safety-Critical Applications: A Case Study", 29th Annual Int'l Symposium on Fault-Tolerant Computing (FTCS29), Madison, 1999.
[9] C. Scherrer and A. Steininger, "How does Resource Utilization Affect Fault Tolerance?", 2000 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT00), Yamanashi, 2001.
[10] C. Scherrer and A. Steininger, "How to Tune the MTTF of a Fail-Silent System", 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT01), San Francisco, 2001.
[11] C. Scherrer and A. Steininger, "Dealing with Dormant Faults in an Embedded Fault-Tolerant Computer System", IEEE Transactions on Reliability, vol. 52, no. 4, 2003.
[12] K. Thaller and A. Steininger, "A Transparent Online Memory Test for Simultaneous Detection of Functional Faults and Soft Errors in Memories", IEEE Transactions on Reliability, vol. 52, no. 4, 2003.

92
Related publications of my group (3)

[13] E. Armengaud, F. Rothensteiner, A. Steininger, R. Pallierer, M. Horauer, M. Zauner, "A Structured Approach for the Systematic Test of Embedded Automotive Communication Systems", Int'l Test Conference 2005, Austin, 2005.
[14] E. Armengaud, A. Steininger, M. Horauer, R. Pallierer, "A Layer Model for the Systematic Test of Time-Triggered Automotive Communication Systems", 5th IEEE Int'l Workshop on Factory Communication Systems, Vienna, 2005.
[15] E. Armengaud, A. Steininger and M. Horauer, "Automatic Parameter Identification in FlexRay based Automotive Communication Networks", 11th IEEE Int'l Conference on Emerging Technologies and Factory Automation, Prague, 2006.
[16] E. Armengaud and A. Steininger, "Pushing the Limits of Remote Online Diagnosis in Embedded Real-Time Networks", 6th IEEE Int'l Workshop on Factory Communication Systems, Torino, 2006.
[17] P. Milbredt, A. Steininger and M. Horauer, "Automated Testing of FlexRay Clusters for System Inconsistencies in Automotive Networks", 4th Int'l Symposium on Electronic Design, Test and Applications (DELTA 2008), Hong Kong, 2008.

93
Related Theses and Projects

T. Kottke, "Untersuchung von fehlertoleranten Prozessorarchitekturen für sicherheitsrelevante Automobilanwendungen" (Investigation of Fault-Tolerant Processor Architectures for Safety-Relevant Automotive Applications), PhD thesis, Vienna University of Technology, 2005. (German)
C. Scherrer, "Zuverlässigkeit zweifach redundanter Architekturen unter besonderer Berücksichtigung latenter Fehler" (Reliability of Dual-Redundant Architectures with Special Consideration of Latent Faults), PhD thesis, Vienna University of Technology, 2002. (German)
K. Thaller, "A Transparent Online Memory Test", PhD thesis, Vienna University of Technology, 2001.
E. Armengaud, "A Transparent Online Test Approach for Time-Triggered Communication Protocols", PhD thesis, Vienna University of Technology, 2008.
STEACS (Systematic Test of Embedded Automotive Communication Systems): http://embsys.technikum-wien.at/projects/steacs/index.html
EXTRACT (Exploiting Synchrony for Transparent Communication Services Testing): http://ti.tuwien.ac.at/ecs/research/projects/extract
