332:437 Concepts in Digital Systems Design - PowerPoint PPT Presentation

Slides: 41
Provided by: guofe

1
332:437 Concepts in Digital Systems Design
  • Instructor: Prof. Michael L. Bushnell
  • Teaching Assistants:
  • Ms. Srihitha Yerabaka
  • Mr. Raghuveer Ausoori
  • Course web site:
  • http://www.caip.rutgers.edu/bushnell
  • ECE Department, Rutgers University

2
Changes in ECE Undergrad. Hardware Courses
3
Course Structure
  • Uses Verilog
  • No more troubles with arithmetic conversions
  • Less trouble with configurations
  • Course is now a 2-term sequence
  • 1st term: Academic content
  • 2nd term: Design Project (exclusively)
  • Use automatic logic synthesis to synthesize
    project as a Field-Programmable Gate Array

4
332:437 Lecture 1: History of Fault Tolerance
  • Motivation for fault tolerance
  • Generations of Computers
  • Fault Tolerance
  • Definitions
  • Applications
  • Triple Modular Redundancy
  • Mid-Value Select Technique and Flux Summing
  • Summary

Material from Design and Analysis of Fault-Tolerant
Digital Systems, by Barry W. Johnson,
Addison-Wesley Publishers
5
Photographs from NASA
  • Made possible by fault-tolerant interplanetary
    spacecraft (unmanned) and the Hubble space
    telescope
  • Spacecraft Electronics
  • Severe radiation fields (Gamma rays)
  • Extremes of heat and cold
  • Cosmic rays
  • Extreme electromagnetic disturbance (particularly
    around Jupiter)

6
Hubble Space Telescope Rising from Space Shuttle
7
Neptune Rising Above Surface of its Moon
8
The Orion Nebula
9
Cartwheel Galaxy
10
4 Largest Moons of Jupiter -- Ganymede,
Callisto, Io, and Europa -- Voyager II
11
New Asteroid Found by Voyager II
12
Voyager II Photograph of Neptune
13
Voyager II -- Rings of Saturn
14
Voyager II -- Rings of Neptune
15
Device and System Reliability
16
Dependability-Performance Trade-off
17
Operational Times for Fault Tolerant Mobile
Computers
  • Simplex: single processor, no fault-tolerant
    hardware
  • TMR: Triple Modular Redundancy (described later)
    fault-tolerant hardware
  • Uses a lot of power

18
Generations of Computers
  • 1st -- All electronic, stored program, 1945-55
  • Manchester Mark I (Kilburn)
  • Princeton IAS (von Neumann)
  • Univac I (Mauchly, Eckert)
  • Fault-tolerance needed for computer to work at
    all
  • 2nd -- Discrete Transistor Computers, 1955-64
  • IBM 7090/7094
  • Univac 1100
  • MIT Whirlwind (Forrester, Wang, Olsen)
  • Invention of Magnetic Core Memory
  • Invention of CRT Display with light pen

19
Generations of Computers
  • 3rd -- Invention of Integrated Circuit, 1964-74
  • 1959: Kilby, Texas Instruments
  • 1960: Noyce, Fairchild
  • IBM System/360 (Blaauw, Brooks)
  • Single architecture, whole range of
    price/performance
  • 4th -- Invention of Dynamic RAM Memory, 1974-88
  • Invention of µprocessor (Intel 4004)
  • IBM System/370
  • DEC VAX 11/780
  • DRAM replaces magnetic core memory
  • Virtual Memory (segmentation and paging) used

20
Generations of Computers
  • 5th -- Parallel and Distributed Computers,
    1988-present
  • Enabled by cheap VLSI Hardware
  • Pervasive Computer Networking
  • Carnegie-Mellon C.mmp: 16-processor DEC PDP-11
  • Distributed memory, crossbar interconnect
  • IBM SP-2: 1 to 64 processors
  • Sun Work Station
  • IBM PC
  • N-Cube 10: 1024 processors

21
Current Ultra-Large Scale IC Technology
  • AMD K8 -- 233 million transistors
  • Intel Smithfield -- 230 million transistors
  • New IBM Cell Chip µprocessor -- 234 million
    transistors
  • Hardware on chip doubles every 1 ½ years
  • Currently:
  • 2.8 GHz µprocessor clock rates
  • 1 billion transistors on a chip
  • Beginning of Network-on-a-Chip Era

22
Major Changes in Systems Design
  • Fault-Tolerant Computing now affordable and
    necessary for mobile sensor-on-silicon
    applications for medicine, networking
  • Severe problems in verification and testing of
    computers
  • 60% of cost of hardware, Intel's biggest capital
    cost
  • Hardware cannot be tested unless specific design
    procedures are followed
  • Network reliability now a major headache
  • Low-Power Design: most important hardware
    problem
  • System-on-a-Chip: put several µprocessors (e.g.,
    8086, DSP), DRAM, glue logic, A/D, D/A, analog
    filters, chemical sensors, wireless
    transmitter/receiver on 1 chip

23
Definitions
  • Fault-tolerance: system can continue correct
    performance in presence of hardware/software
    faults
  • Fault: physical defect or flaw in
    hardware/software component
  • Error: manifestation of a fault
  • Failure: situation where the error resulted in
    a system incorrectly performing its function

24
3-Universe Model (Avizienis)
  • Physical Universe: semiconductor, power supply,
    printer, etc.
  • Informational Universe: where errors occur
  • Incorrect computer data words
  • Incorrect digital voice/picture image
  • External or User's Universe: where user of
    system ultimately sees effects of faults and errors
  • Fault latency: time between occurrence of fault
    and appearance of an error caused by that fault
  • Error latency: time between appearance of error
    and appearance of resulting failure

25
Fault Causes
  • Specification mistakes: wrong algorithms,
    architectures, hardware/software specifications
  • Implementation mistakes: poor design, poor
    component selection, poor construction, software
    coding errors
  • Component defects: manufacturing (main cause of
    faults)
  • Imperfections, random defects, wear-out (broken
    bonds, corrosion)
  • External disturbance: γ rays, α particles,
    electromagnetic interference, battle damage,
    environment extremes

26
Fault Description
  • Nature: hardware/software/analog/digital
  • Duration: how long fault is active
  • Permanent: in existence indefinitely
  • Transient: appears/disappears in very short time
  • Intermittent: appears/disappears/reappears
    repeatedly
  • Extent: localized to a given hardware or
    software module, or globally affects
    hardware/software/both
  • Value:
  • Determinate: status is unchanged throughout time
  • Indeterminate: status at time T may differ from
    status before or after T

27
Fault Tolerance Methods
  • Fault Tolerance: give system the ability to keep
    performing its tasks after faults occur
  • Fault Avoidance: prevent fault occurrence
  • Design reviews, component screening, testing
  • Fault Masking: prevent system faults from
    introducing errors into system informational
    structure

28
Methods to Achieve Fault Tolerance
  • Fault Masking
  • Reconfiguration: eliminate faulty module and
    restore system to operation
  • Fault Detection: recognize that fault occurred
  • Fault Location: find where fault occurred
  • Fault Containment: isolate fault and prevent it
    from spreading through system
  • Fault Recovery: remain operational or regain
    operation after faults

29
Metrics
  • Reliability R(t): conditional probability that
    system works throughout [t0, t], given that it
    worked at t0
  • Note: 0.97 vs. 0.9999999
  • Availability A(t): probability that system is
    available at time instant t to perform its function
  • A system can be unreliable but available
  • Safety S(t): probability that system will
    perform its function correctly or will
    discontinue working in a way that does not affect
    operation of other systems or endanger people
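
The reliability and availability definitions above can be made concrete with a small sketch. It assumes a constant failure rate (the exponential failure law, R(t) = exp(-lambda * t)) and the steady-state model A = MTTF / (MTTF + MTTR). Neither formula appears on the slide, and the numbers are purely illustrative.

```python
import math

def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) under an ASSUMED constant failure rate: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_per_hour * hours)

def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """ASSUMED steady-state model: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical system: fails every 100 h on average, but is repaired in 0.1 h.
mttf, mttr = 100.0, 0.1
r_one_year = reliability(1.0 / mttf, 8760.0)   # chance of a failure-free year
a = steady_state_availability(mttf, mttr)      # chance of being up at a random instant

print(f"R(1 year) = {r_one_year:.3e}")
print(f"A         = {a:.5f}")
```

Under these assumptions the system is almost certain to fail at least once within a year, yet it is up roughly 99.9% of the time because repair is fast. That is one way to read "unreliable but available".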

30
Metrics (continued)
  • Performability P(L, t): probability that system
    will be at or above level L of performance at
    time t
  • Graceful degradation
  • Maintainability M(t): probability that a failed
    system will be restored to operation within time
    period t
  • Testability: ability to test for system
    attributes (controllability and observability)
  • Design for Testability: now critical when
    constructing any digital system if it is to be
    successfully manufactured
  • Method: add hardware strictly for testing
    purposes
  • Dependability: reliability, availability,
    performability, testability
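
As a minimal sketch of the maintainability metric, the code below assumes a constant repair rate mu, which gives M(t) = 1 - exp(-mu * t). The model and the 2-hour mean repair time are assumptions for illustration, not values from the slides.

```python
import math

def maintainability(repair_rate_per_hour: float, hours: float) -> float:
    """M(t) under an ASSUMED constant repair rate mu: M(t) = 1 - exp(-mu * t)."""
    return 1.0 - math.exp(-repair_rate_per_hour * hours)

mu = 0.5  # hypothetical repair rate: mean time to repair of 2 hours
for t in (1, 2, 4, 8):
    print(f"M({t} h) = {maintainability(mu, t):.3f}")
```

The probability of having finished the repair rises toward 1 as the allowed repair window t grows.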

31
Fault-Tolerant Computing Applications
  • Long-life applications
  • P(operation at time t = 10 yr.) ≥ 0.95
  • Satellites and Unmanned Space Flight
  • AT&T Telstar Communications Satellites
  • NASA Mars Pathfinder
  • Critical Computation
  • NASA Space Shuttle
  • Foxboro Nitroglycerine Plant Controls
  • Maintenance Postponement
  • Lucent 5ESS Telephone Exchange
  • High Availability
  • New York Stock Exchange Quotron System
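
As a worked example of the long-life requirement, reading the slide's figure as a 0.95 probability of still operating at t = 10 years, the sketch below computes the largest tolerable failure rate. It assumes a constant failure rate with R(t) = exp(-lambda * t), which the slide does not state.

```python
import math

HOURS_PER_YEAR = 8760.0

def max_failure_rate(target_reliability: float, mission_hours: float) -> float:
    """Largest constant failure rate (per hour) that still gives
    R(mission) >= target. ASSUMES R(t) = exp(-lambda * t)."""
    return -math.log(target_reliability) / mission_hours

mission = 10 * HOURS_PER_YEAR
lam = max_failure_rate(0.95, mission)
mttf_years = 1.0 / lam / HOURS_PER_YEAR

print(f"lambda <= {lam:.3e} failures/hour")
print(f"MTTF   >= {mttf_years:.0f} years")
```

Under this assumption the system needs a mean time to failure of roughly 195 years to meet a 10-year, 0.95 target, which suggests why redundancy is used for such missions.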

32
Redundancy Techniques
  • Passive: fault masking (hide faults)
  • Active or dynamic: detect fault and remove broken
    hardware from system with electronic switch
  • Hybrid: combine passive and active techniques
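
The active (dynamic) approach can be sketched in software as standby sparing: detect a fault in the online module, switch it out, and switch a spare in. The class and function names here are hypothetical, and a raised exception stands in for whatever fault-detection signal real hardware would provide.

```python
from typing import Callable, List

Module = Callable[[int], int]

def good_module(x: int) -> int:
    return x * x

def broken_module(x: int) -> int:
    # Stand-in for a hardware fault flagged by a detection circuit.
    raise RuntimeError("fault detected")

class StandbySystem:
    """Active redundancy sketch: one module is online, the rest are spares."""

    def __init__(self, modules: List[Module]) -> None:
        self.modules = list(modules)
        self.active = 0  # index of the module currently switched in

    def compute(self, x: int) -> int:
        while self.active < len(self.modules):
            try:
                return self.modules[self.active](x)
            except RuntimeError:
                # Fault detected: switch the broken module out, next spare in.
                self.active += 1
        raise RuntimeError("all modules have failed")

system = StandbySystem([broken_module, good_module])
print(system.compute(7), "computed by module", system.active)
```

Unlike passive masking, the fault is visible to the system for a moment: the result is only produced after detection and switch-over.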

33
Passive Redundancy
  • Triple Modular Redundancy
  • Voter (restoring organ) is a single point of failure
  • Generalization: N-modular redundancy
  • N must be odd if majority voting is used
  • Problems:
  • Very expensive
  • 3 results may not agree: e.g., in analog control
    system, A/D converters have jitter in least
    significant bits and may disagree
  • Solution: Mid-Value Select technique
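
A minimal sketch of majority voting follows: a bitwise 2-out-of-3 voter for TMR and a word-level vote for N modules. The function names are hypothetical, and a real voter would be hardware rather than software.

```python
from collections import Counter
from typing import Sequence

def tmr_vote_bits(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (b & c) | (a & c)

def nmr_vote(results: Sequence[int]) -> int:
    """Word-level majority vote over N module outputs (N odd)."""
    if len(results) % 2 == 0:
        raise ValueError("N must be odd for majority voting")
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise ValueError("no majority: too many modules disagree")
    return value

# One of three modules has a flipped most-significant bit; the voter masks it.
print(bin(tmr_vote_bits(0b1011, 0b1011, 0b0011)))
# Two of five modules are faulty; the majority value still wins.
print(nmr_vote([42, 42, 7, 42, 9]))
```

Calling `nmr_vote([100, 101, 102])` raises `ValueError`: three healthy A/D channels that differ only in their least significant bits produce no word-level majority, which is the disagreement problem listed above.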

34
Triple Modular Redundancy (TMR)
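
A short sketch of how TMR changes reliability, assuming the three modules fail independently with equal reliability R and the system works when at least two of them work: R_TMR = R_voter * (3R^2 - 2R^3). This formula and the sample values are not on the slide.

```python
def tmr_reliability(r_module: float, r_voter: float = 1.0) -> float:
    """P(at least 2 of 3 independent modules work) times P(voter works)."""
    two_of_three = 3 * r_module**2 - 2 * r_module**3
    return r_voter * two_of_three

for r in (0.99, 0.9, 0.5, 0.4):
    print(f"module R = {r:.2f}  simplex = {r:.4f}  TMR = {tmr_reliability(r):.4f}")
```

Under these assumptions TMR helps only when the module reliability exceeds 0.5; below that, three unreliable modules are worse than one.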
35
TMR with Triplicated Voters
36
Software Voting
37
Mid-value Select Technique
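
A minimal sketch of mid-value selection, assuming three redundant analog readings: the output is the median, so small disagreements between healthy channels and one wildly wrong channel are both tolerated. The function name and sample voltages are illustrative.

```python
def mid_value_select(a: float, b: float, c: float) -> float:
    """Return the middle (median) of three redundant readings."""
    return sorted((a, b, c))[1]

print(mid_value_select(2.501, 2.499, 2.500))  # healthy channels with jitter
print(mid_value_select(2.501, 0.000, 2.500))  # one channel stuck at 0 V
```

Both calls select 2.5: no exact agreement between channels is needed, unlike majority voting.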
38
Passive Redundancy (continued)
  • Frequently, one result must be produced
  • Leads to single point of failure problem
  • Example: Motor Controller
  • Solution:
  • Flux summing: uses closed-loop control system to
    compensate for faults
  • Secondary current ∝ Σ primary currents
  • Works because flux summer transformer is
    incredibly reliable
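
The compensation idea can be illustrated with a toy discrete-time model. This is not the circuit from the lecture: it assumes three hypothetical drive channels under integral control, a proportionality constant of 1 between the secondary current and the sum of primary currents, and a failed channel that simply contributes zero current.

```python
from typing import List

def simulate(reference: float, failed_at: int, steps: int = 400,
             gain: float = 0.05, channels: int = 3) -> List[float]:
    """Toy flux-summing loop: each working channel integrates the error
    between the reference and the summed (secondary) current."""
    currents = [0.0] * channels
    alive = [True] * channels
    history = []
    for step in range(steps):
        if step == failed_at:
            alive[0] = False      # channel 0 fails open
            currents[0] = 0.0
        secondary = sum(currents)  # ASSUMED: secondary = sum of primaries
        error = reference - secondary
        for k in range(channels):
            if alive[k]:
                currents[k] += gain * error  # surviving channels compensate
        history.append(sum(currents))
    return history

history = simulate(reference=1.0, failed_at=200)
print(f"before fault: {history[199]:.4f}")
print(f"just after:   {history[200]:.4f}")
print(f"recovered:    {history[-1]:.4f}")
```

In this model the summed output dips when a channel fails and then returns to the reference, because the feedback loop drives the two remaining channels harder.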

39
Flux-Summing
40
Summary
  • Motivation for fault tolerance
  • Generations of computers
  • Fault Tolerance
  • Applications
  • Triple Modular Redundancy
  • Mid-Value Select Technique and Flux Summing
  • Fault tolerance necessary for applications:
  • Medicine
  • Transportation
  • Defense
  • Inter-Planetary Exploration