Title: Graphics Stability
1. Graphics Stability
- Gershon Parent
- Software Swordsman
- WGGT
- gershonp _at_ microsoft.com
- Microsoft Corporation
- Steve Morrow
- Software Design Engineer
- WGGT
- stevemor _at_ microsoft.com
- Microsoft Corporation
2. Session Outline
- Stability Benchmark History
- CRASH (Comparative Reliability Analyzer for Software and Hardware)
- The CRASH Tool
- The CRASH Plan
- The Experiments
- CDER (Customer Device Experience Rating)
- Program Background and Description
- High-level Statistics of the Program
- Factors Examined in the Crash Data
- Normalized Ratings
- Customer Experience and Loyalty
3. Stability Benchmark History
- WinHEC May 04
- CRASH 1.0 released.
- Web portal has 52 non-MS members from 16 companies
- November 04
- CRASH 1.1 released to the web. Includes DB backend
- December 04
- Stability Benchmark components ship to 8,000 customers and normalizable OCA data begins flowing in
- CRASH Lab completes first data collection pass
- Web portal has over 60 non-MS members from 17 companies
4. CRASH Tool
- CRASH is a new dynamic software loading tool designed to expose and easily reproduce reliability defects in drivers/hardware
- Answers the call from IHVs and OEMs for more reliability test tools
- Enables a wide range of endurance/load/stress testing
- Configurable load profiles
- Scheduled cycling (starting and stopping) of test applications
- Replay-ability
- Automatic failure cause determination
- Scripting for multiple passes with different scenarios
- Creation of a final score
5. CRASH Demo
6. CRASH Demo
7. CRASH Demo
(screenshots of the CRASH tool demo)
8. CRASH 4-Phase Plan
- Phase 1
- Produce CRASH documentation for review by partners
- Release 1.0 to our partners for feedback
- Phase 2
- Release 1.1 with database functionality to our partners
- Execute controlled baseline experiments on a fixed set of HW and SW to evaluate the tool's effectiveness
- Phase 3
- Execute a series of experiments and use results to increase the accuracy and usefulness of the tool
- Phase 4
- Create a CRASH-based tool for release to a larger audience
9. Experiment 1 Objectives
- Determine if the CRASH data collected is sufficient to draw meaningful conclusions about part/driver stability differences
- Determine how machine configuration affects stability
- Evaluate how the different scenarios relate to conclusions about stability
- Find the minimum data set needed to draw meaningful conclusions about part/driver stability
- Create a baseline from which to measure future experiments
- Identify other dimensions of stability not exposed in the CRASH score
10. Experiment 1 Details
- Standardize on one late-model driver/part from four IHVs
- Part/Driver A, Part/Driver B, Part/Driver C, Part/Driver D
- Test them across 12 different flavors of over-the-counter PCs from 4 OEMs
- OEM A, OEM B, OEM C, OEM D
- High End and Low End
- Include at least two motherboard types
- MB Type 1, MB Type 2
- Clean install of XP SP2 plus latest WHQL drivers
- Drivers snapped 8/16/04
- Use the 36 hr benchmark profile shipped with
CRASH 1.1
11. Important Considerations
- Results apply only to these Part/Driver/System combinations
- Extrapolation of these results to other parts, drivers, or systems is impossible with this data
12. CRASH Terminology
- Profile
- Represents a complete run of the CRASH tool against a driver
- Contains one or more scenarios
- Scenario
- Describes a session of CRASH testing
- Load intensity/profile
- What tests will be used
- How many times to run this scenario (loops)
- Score
- Always the percentage of the testing completed before a system failure (hang or kernel-break)
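As a rough illustration of that scoring definition (a minimal sketch assuming the score is simply completed loops over planned loops; CRASH's real computation may weight scenarios differently):

    def crash_score(completed_loops: int, planned_loops: int) -> float:
        """Hypothetical sketch: percentage of planned testing completed
        before the first system failure (hang or kernel-break)."""
        if planned_loops <= 0:
            raise ValueError("planned_loops must be positive")
        return 100.0 * completed_loops / planned_loops

    # A run that completes 27 of 36 planned loops before failing scores 75.0
    print(crash_score(27, 36))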
13. Profile Score Averages
14. CRASH Terminology: Failures
- Failure
- Hang
- No minidump found and loop did not complete
- Targeted Failure
- Minidump auto-analysis found failure was in the display driver
- Non-Targeted Failure
- Minidump analysis found failure was not in display driver
- Does not count against the score
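A sketch of that classification logic (the field names minidump_found and faulting_module are our own illustration, not CRASH's API):

    from enum import Enum
    from typing import Optional

    class FailureType(Enum):
        HANG = "Hang"                          # no minidump, loop did not complete
        TARGETED = "Targeted Failure"          # minidump blames the display driver
        NON_TARGETED = "Non-Targeted Failure"  # blamed elsewhere; no score penalty

    def classify_failure(minidump_found: bool,
                         faulting_module: Optional[str],
                         display_driver: str) -> FailureType:
        # Illustrative only; CRASH's minidump auto-analysis is more involved.
        if not minidump_found:
            return FailureType.HANG
        if faulting_module == display_driver:
            return FailureType.TARGETED
        return FailureType.NON_TARGETED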
15. Percentage of Results by Type
16. Average Profile Score by Machine Group
17. Average Profile Score by OEM and MB
18. Effect of MB Type on Profile Score
19. Score Distribution for Part/Driver C and D (MB Type 1)
20. Experiment 1 Test Profile
- Real Life
- Moderate load and application cycling
- 9 max and 3 min load
- Tractor Pull
- No load cycling
- Moderate application cycling
- Incrementally increasing load
- Intense
- High frequency load and application cycling
- 9 max and 0 min load
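The three profiles above can be summarized as data along these lines (field names are hypothetical; this is not the actual CRASH profile format):

    TEST_PROFILES = {
        "Real Life":    {"load_cycling": "moderate", "app_cycling": "moderate",
                         "max_load": 9, "min_load": 3},
        "Tractor Pull": {"load_cycling": "none", "app_cycling": "moderate",
                         "load_pattern": "incrementally increasing"},
        "Intense":      {"load_cycling": "high frequency", "app_cycling": "high frequency",
                         "max_load": 9, "min_load": 0},
    }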
21. Average Scenario Score by Part/Driver
22. Statistical Relevance Questions
- Question: How do I know that the difference between the averages of Result Set 1 and Result Set 2 is meaningful?
- Question: How can I find the smallest result set size that will give me 95% confidence?
- Answer: Use the Randomization Test
23. Randomization Test
(Diagram: Set 1 vs. Set 2 gives Delta 1; the two sets are pooled into a combination set and randomly re-split into Random Set 1 and Random Set 2, giving Delta 2)
- Run the random test 10,000 times. If Delta 1 is greater than Delta 2 95% of the time, you are assured the difference is meaningful.
- Try smaller sample sizes until the confidence drops below 95%. That is your minimum sample size.
- Information on the Randomization Test can be found online at http://www.uvm.edu/~dhowell/StatPages/Resampling/RandomizationTests.html
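A minimal Python sketch of this procedure (the result sets are plain lists of scenario scores):

    import random
    from statistics import mean

    def randomization_confidence(set1, set2, iterations=10_000):
        """Pool both result sets, re-split them at random `iterations`
        times, and report how often the observed delta of means beats
        the random delta. A value of 0.95 or more means the difference
        is meaningful at the 95% level."""
        observed_delta = abs(mean(set1) - mean(set2))
        pooled = list(set1) + list(set2)
        n1 = len(set1)
        wins = 0
        for _ in range(iterations):
            random.shuffle(pooled)
            random_delta = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
            if observed_delta > random_delta:
                wins += 1
        return wins / iterations

Re-running this on progressively smaller subsets of each result set until the return value drops below 0.95 yields the minimum sample size described in the second bullet.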
24. Scores and Confidence Intervals for Part/Driver/MB Combinations
25. The Experiment Matrix
- With three experiments completed, we can now compare
- One driver across two OS configurations
- Two versions of one driver across a single OS configuration
26. Old vs. New Drivers
- This table compares the profile scores for old drivers vs. new drivers on the OEM image
- New drivers were noticeably better for parts/drivers C and D
- Part/Driver A and B were unchanged
27. OEM Image vs. Clean Install
- This table compares profile scores for the OEM image vs. a clean install with old drivers
- Clean install scores were universally better than the OEM image for parts/drivers C and D
- Part/Driver A and B were unchanged
28. Future Plans
- Collate with OCA data
- CRASH failure to OCA bucket correlations
- What buckets were fixed between 1st and 2nd driver versions?
- Do our results match field data?
- Customer machines have hardware that is typically several years old
- Can we find the non-display failure discrepancy in the field?
- Begin to tweak other knobs
- Content
- Driver-versions
- HW-versions
- Windows codenamed Longhorn Test Bench
- PCIe cards
29. Suggested Future Experiments
- Include more motherboard types
- Newer drivers, or use a control-group driver. Reference Rasterizer?
- Disable AGP to isolate chipset errors from AGP errors
- Driver Verifier enabled
- Add non-graphics stress tests to the mix
- Modified Loop Times
30. IHV Feedback
- "There are definitely unique driver problems exposed through the use of CRASH and it is improving our driver stability greatly"
- "CRASH is producing real failures and identifying areas of the driver that we are improving on"
- "Thanks for a very useful tool"
31. CRASH 1.2 Features
- RunOnExit
- User-specified command run upon the completion of a CRASH profile
- More logging
- Logging to help troubleshoot problems with data flow
- More information output in XML
- More system information
- More failure details from minidumps
- More control over where files are put
- More robust handling of network issues
32. Customer Device Experience Rating (CDER) Program Background
- Started from a desire to rate display driver stability based on OCA crashes
- A controlled program addresses shortcomings of OCA data
- Unknown market share
- Unknown crash reporting habits
- Unknown info on non-crashing machines
- This allows normalization of OCA data to get an accurate crashes-per-machine stability rating
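One way to read that normalization (our formulation, not necessarily CDER's exact metric), in LaTeX:

    % Normalized rating for a part p: crashes divided by ALL panel
    % machines carrying p, whether they crashed or not.
    R_p = \frac{\text{OCA display crashes attributed to } p}
               {\text{panel machines carrying } p \text{ (crashing or non-crashing)}}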
33. CDER Program Description and Status
- Program Tools
- A panel of customers (Windows XP only)
- User opt-in allows extensive data collection, unique machine ID
- System Agent/scheduler
- System Configuration Collector
- OCA Minidump Collector
- System usage tool (not yet in the analysis)
- Status
- All tools for Windows XP in place and functioning
- First set of data collected, parsed, analyzed
34. Overall Crash Statistics of Panel
- Machines
- 8927 in panel
- 49.9% experience no crashes
- 50.1% experience crash(es)
- 8580 have valid device driver info
- 82.2% have no display crashes
- 17.8% have display crashes
- Crashes
- 16.1% of valid crashes are in display
- Note: Crashes occurred over a 4 yr period
35. Crash Analysis Factors
- Examined several factors which may have an impact on stability ratings
- Processor
- Display Resolution
- Bit Depth
- Monitor Refresh Rate
- Display Memory
- Note: Vendor part naming does not correspond to that in the CRASH presentation
- Note: Unless otherwise noted, data for these analyses were from the last 3 years
36. Display Resolution Crashes Distribution
37. Bit Depth Crashes Distribution
38. Refresh Rate Crashes Distribution
39. Display Memory Crashes Distribution
40. Display Crashes By Type (Over Last Year)
41. Normalized Crash Data
- The following data is normalized by program share
of crashing and non-crashing machines
42. Crashes per Machine Ranking by Display Vendor for Last Year (2004)
43. Vendor A Normalized Crashes by Part/ASIC Family Over Last 3 Years
44. Display Vendor B Normalized Crashes by Part/ASIC Family Over Last 3 Years
45. Display Vendor C Normalized Crashes by Part/ASIC Family Over Last 3 Years
46. Normalized Crashes Ranked by Part - 2004
47. Ranking and Rating Conclusions
- This is a first look
- Need to incorporate system usage data
- Need to continue collecting configuration data to track driver and hardware changes
- Need more panelists, and a higher proportion of newer parts
newer parts - With that said
- This is solid data
- This demonstrates our tools work as designed
- It shows the viability of a crash-based rating
program
48. Customer Experience and Loyalty
- A closer look at the segment of panelists who
- Experienced display crashes, and
- Switched or upgraded their display hardware or
driver
49. Experience and Loyalty Highlights
- 19.4% of users who experienced display crashes upgraded their drivers or hardware, or changed to a different display vendor
- 7.9% of users (nearly 41% of the 19.4%) who experienced display crashes switched to a competitor's product
- ALL users who switched to a competitor's product had the same or better experience
- Only 91.3% of those who upgraded had the same or better experience afterwards, based on crashes
- Time clustering of crashes
50. Overall Experience of Users After Changing Display System
51. Experience of Users After Upgrading
52. Experience of Users After Switching Display Vendors
53. Time-Clustering of Crashes for Users Who Experienced 3 or More Crashes
- Our data indicates a user's crashes are generally highly clustered in time
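One simple way to quantify such clustering (an illustrative metric with an assumed window length, not necessarily CDER's method):

    from datetime import timedelta

    def clustered_fraction(crash_times, window_days=7):
        """Fraction of a user's crashes (datetime values) that fall within
        `window_days` of a neighboring crash; values near 1.0 indicate
        tight time clustering."""
        if not crash_times:
            return 0.0
        window = timedelta(days=window_days)
        times = sorted(crash_times)
        near_neighbor = sum(
            1 for i, t in enumerate(times)
            if (i > 0 and t - times[i - 1] <= window)
            or (i + 1 < len(times) and times[i + 1] - t <= window)
        )
        return near_neighbor / len(times)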
54. Time-Clustering of Crashes for Users Who Experienced 6 or More Crashes
55. User Experience Caveats
- User experience here is strictly concerned with how many crashes the users experienced
- It doesn't include hardware changes/upgrades where different hardware used the same driver
- Having fewer crashes may not always mean the user experience was better, but for the vast majority we believe it does
- Having fewer crashes may be attributable to other system changes and/or other factors
- Crashes going away may mean the user gave up using whatever was causing the crashes
56. Going Forward
- Current Program (Windows XP-based)
- Normalize by usage as data becomes available
- Include periodic configuration data in analysis
- Correlate with CRASH tool results
- Continue to develop towards a rating program
- Planned for Longhorn/LDDM
- Modify tools for Longhorn and the new display driver model
- Larger set of participants for Longhorn Beta 1
- Recruit more users with newer hardware
57. Call to Action
- Create LDDM Drivers
- If you are a display vendor, leverage stability advances in the new Longhorn Display Driver Model (LDDM)
- Join the Stability Benchmark Portal
- If you are a display IHV or a System Builder, contact grphstab _at_ microsoft.com
- Get the latest tools and documents
- Join the Stability discussion on the portal
- Use the tools
- Send us feedback and suggestions
- Share ideas for new experiments
58. Community Resources
- Windows Hardware Driver Central (WHDC)
- www.microsoft.com/whdc/default.mspx
- Technical Communities
- www.microsoft.com/communities/products/default.mspx
- Non-Microsoft Community Sites
- www.microsoft.com/communities/related/default.mspx
- Microsoft Public Newsgroups
- www.microsoft.com/communities/newsgroups
- Technical Chats and Webcasts
- www.microsoft.com/communities/chats/default.mspx
- www.microsoft.com/webcasts
- Microsoft Blogs
- www.microsoft.com/communities/blogs
59. Additional Resources
- Email grphstab _at_ microsoft.com
- Related Sessions
- Graphics Stability Part 2
- WDK for Graphics: An Introduction
- Longhorn Display Driver Model Roadmap and Requirements
- Longhorn Display Driver Model Key Features