Why do so many chips fail - PowerPoint PPT Presentation

Slides: 11
Provided by: dang174
Transcript and Presenter's Notes


1
Why do so many chips fail?
  • Ira Chayut, Verification Architect
  • (opinions are my own and do not necessarily
    represent the opinion of my employer)

2
Failure rate of first silicon is rising
  • Research by Collett International revealed
    that 52% of complex application-specific
    integrated circuits (ASICs) required a respin, and
    the reason was largely functional errors.
    (http://www.techonline.com/community/ed_resource/feature_article/36655)
  • Who is to blame? (There must be someone to
    blame!)
  • Management: they didn't provide enough resources
  • HW Engineering: they created the functional
    errors
  • Verification: they didn't catch the functional
    errors
  • Architecture: they didn't focus on testability
  • Marketing: they kept changing the specs

3
People don't kill chips, complexity kills chips
  • http://www.cs.utexas.edu/users/dburger/teaching/cs395t-s99/papers/2_src.pdf
    (1999)
  • Projected numbers are a bit lower than current
    reality: a dual-core AMD Opteron has 233 million
    transistors and the Intel Itanium 2 has 592
    million transistors

4
Complexity increases exponentially
  • Chip component count increases exponentially
    over time (Moore's law)
  • Interactions increase super-exponentially
  • IP reuse and parallel design teams facilitate
    more functions with fewer HW engineers per
    function and more functions per chip
  • Verification effort gets combinatorially more
    difficult as functions are added
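
A rough numeric sketch of the growth described above. It assumes (this is an illustration, not something stated on the slide) that every pair of functions, or in the worst case every subset of two or more functions, is a potential interaction that verification must consider. The function names are hypothetical.

```python
from math import comb

def pairwise_interactions(n: int) -> int:
    """Number of distinct pairs among n functions (n choose 2)."""
    return comb(n, 2)

def multiway_interactions(n: int) -> int:
    """Number of subsets of size >= 2 among n functions: 2^n - n - 1."""
    return 2**n - n - 1

if __name__ == "__main__":
    # Doubling the function count far more than doubles the interaction count.
    for n in (2, 4, 8, 16, 32):
        print(n, pairwise_interactions(n), multiway_interactions(n))
```

Under this assumption, if the function count itself grows exponentially over time, the subset count 2^n grows faster than any exponential, which is one way to read "super-exponentially".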

5
Why verification is not able to keep up
  • Verification effort gets combinatorially more
    difficult as functions are added
  • BUT
  • Verification staffing/time cannot be made
    combinatorially larger to compensate
  • AND
  • Chip lifetimes are too short to allow for
    complete testing
  • THUS
  • Chips will continue to have an ever-increasing
    number of functional errors as they get more complex

6
Limiting the number of architectural and
functional errors
  • Thorough unit-level verification testing
  • Small simulations run faster
  • Avoids combinatorial explosion of interactions
  • Well-defined interfaces between blocks, with
    assertions and formal verification techniques to
    reduce inter-block problems
  • Emulation or FPGA prototyping to accelerate
    testing
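
As an illustration of an interface assertion, here is a minimal sketch of a checker for a hypothetical valid/ready handshake between two blocks. The rule it enforces (once valid is raised, valid and data must hold until ready is seen) and all names are assumptions for the example, not taken from the presentation. In practice such checks would normally be written as assertions in the hardware description language rather than in Python.

```python
from typing import List, NamedTuple

class Cycle(NamedTuple):
    valid: bool
    ready: bool
    data: int

def check_handshake(trace: List[Cycle]) -> List[str]:
    """Return violations of the assumed interface rule:
    once valid is raised, valid and data must hold until ready is seen."""
    violations = []
    pending = None  # data of a transfer that was offered but not yet accepted
    for i, c in enumerate(trace):
        if pending is not None:
            if not c.valid:
                violations.append(f"cycle {i}: valid dropped before ready")
            elif c.data != pending:
                violations.append(f"cycle {i}: data changed before ready")
        # A transfer stays pending only while valid is high and ready is low.
        pending = c.data if (c.valid and not c.ready) else None
    return violations
```

Because the rule is stated at the block boundary, each block can be checked against it in a small unit-level simulation, without simulating the two blocks together.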

7
How to live with functional errors
  • Successful companies have learned how to ship
    chips with functional and architectural errors:
    time-to-market pressures and chip complexity force
    the delivery of chips that are not perfect (even if
    that were possible). How can this be done
    better?
  • For a long while, DRAMs have been made with extra
    components to allow a less-than-perfect chip to
    provide full device function and to ship
  • How to do the same with architectural features?
    How can full device function exist in the
    presence of architectural or implementation
    omissions or errors?
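
The DRAM approach mentioned above can be sketched as a toy model: spare rows sit beside the main array, and a remap table built at test time redirects accesses away from defective rows. This is a simplified software illustration with hypothetical names. Real devices implement the remapping in hardware, and the details here are assumptions.

```python
class RepairableMemory:
    """Toy memory whose spare rows stand in for rows found defective at test."""

    def __init__(self, rows: int, spares: int):
        self.rows = rows
        self.storage = [0] * (rows + spares)  # spares live past the main array
        self.free_spares = list(range(rows, rows + spares))
        self.remap = {}                       # defective row -> spare row

    def mark_defective(self, row: int) -> bool:
        """Retire a bad row. Returns False if no spare is left."""
        if row in self.remap:
            return True
        if not self.free_spares:
            return False
        self.remap[row] = self.free_spares.pop(0)
        return True

    def _physical(self, row: int) -> int:
        if not 0 <= row < self.rows:
            raise IndexError(row)
        return self.remap.get(row, row)

    def write(self, row: int, value: int) -> None:
        self.storage[self._physical(row)] = value

    def read(self, row: int) -> int:
        return self.storage[self._physical(row)]
```

The user of the memory sees the full advertised row count and never learns which physical rows were retired. That is the property the slide asks for at the architectural level.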

8
Architecture support
  • Embrace Perl's motto, "There's More Than One Way
    to Do It": allow for multiple ways of
    accomplishing all critical specified functions
  • Analogous to Design for Test (DFT) and Design for
    Verification (DFV), we should start thinking
    about Architect for Verification (AFV)
  • Thanks to Dave Whipp for the AFV phrase and
    acronym
  • In some problem domains, such as networking,
    upper-layer protocols can recover from some
    silicon errors, though there is a performance
    penalty when this is used

9
Architecture support, continued
  • A programmable abstraction layer between the real
    hardware and the user's API can hide functional
    warts: hardware catches specific operations and
    either directs them to one of multiple hardware
    implementations or signals a software trap
  • Pyramid minicomputers hid the assembly language
    from users; the compiler could work around problems
  • Transmeta maps standard machine language to a
    hidden processor architecture; translation
    software can work around problems
  • Soft hardware can allow chip redesign after
    silicon is frozen (and shipped!)
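
A minimal software sketch of the programmable abstraction layer idea: each API operation may have several implementations, any of which can be disabled once an erratum is found, with a software trap as the last resort. The class and method names are hypothetical, and the model is only an analogy for what would be hardware dispatch plus a trap handler.

```python
from typing import Callable, Dict, List

class AbstractionLayer:
    """Routes each operation to the first implementation not marked broken."""

    def __init__(self) -> None:
        self.impls: Dict[str, List[Callable]] = {}
        self.traps: Dict[str, Callable] = {}
        self.broken = set()  # (op, index) pairs disabled after errata are found

    def register(self, op: str, impl: Callable) -> int:
        """Add a hardware implementation and return its index."""
        self.impls.setdefault(op, []).append(impl)
        return len(self.impls[op]) - 1

    def register_trap(self, op: str, handler: Callable) -> None:
        """Add a software fallback, typically slower than hardware."""
        self.traps[op] = handler

    def disable(self, op: str, index: int) -> None:
        """Programmable part: switch off an implementation after shipping."""
        self.broken.add((op, index))

    def call(self, op: str, *args):
        for i, impl in enumerate(self.impls.get(op, [])):
            if (op, i) not in self.broken:
                return impl(*args)
        if op in self.traps:
            return self.traps[op](*args)
        raise NotImplementedError(op)
```

For example, if the first implementation of an operation turns out to be faulty, calling `disable` for it reroutes later calls to the next implementation or to the trap handler, and the caller's API does not change.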

10
Summary
  • Ever increasing chip complexity prevents total
    testing before tape-out (or even before shipping)
  • AFV techniques can make chip verification not
    subject to combinatorial explosion
  • We have to accept that there will be
    architectural and functional failures in every
    advanced chip that is built
  • Architecture support is needed to allow failures
    to be worked around or fixed post-silicon