Title: Graphics Stability
1. Graphics Stability
- Gershon Parent
- Software Swordsman
- WGGT
- gershonp _at_ microsoft.com
- Microsoft Corporation
- Steve Morrow
- Software Design Engineer
- WGGT
- stevemor _at_ microsoft.com
- Microsoft Corporation
2. Session Outline
- Stability Benchmark History
- CRASH (Comparative Reliability Analyzer for Software and Hardware)
- The CRASH Tool
- The CRASH Plan
- The Experiments
- CDER (Customer Device Experience Rating)
- Program Background and Description
- High-level Statistics of the Program
- Factors Examined in the Crash Data
- Normalized Ratings
- Customer Experience and Loyalty
3. Stability Benchmark History
- WinHEC May 04
- CRASH 1.0 released.
- Web portal has 52 non-MS members from 16 companies
- November 04
- CRASH 1.1 released to the web. Includes DB backend
- December 04
- Stability Benchmark components ship to 8,000 customers and normalizable OCA data begins flowing in
- CRASH Lab completes first data collection pass
- Web portal has over 60 non-MS members from 17 companies
4. CRASH Tool
- CRASH is a new dynamic software loading tool designed to expose and easily reproduce reliability defects in drivers/hardware
- Answers the call from IHVs and OEMs for more reliability test tools
- Enables a wide range of endurance/load/stress testing
- Configurable load profiles
- Scheduled cycling (starting and stopping) of test applications
- Replay-ability
- Automatic failure cause determination
- Scripting for multiple passes with different scenarios
- Creation of a final score
5. CRASH Demo
6. CRASH Demo
7. CRASH Demo
(screenshots of the CRASH tool demo)
8. CRASH 4-Phase Plan
- Phase 1
- Produce CRASH documentation for review by partners
- Release 1.0 to our partners for feedback
- Phase 2
- Release 1.1 with database functionality to our partners
- Execute controlled baseline experiments on a fixed set of HW and SW to evaluate the tool's effectiveness
- Phase 3
- Execute a series of experiments and use results to increase the accuracy and usefulness of the tool
- Phase 4
- Create a CRASH-based tool for release to a larger audience
9. Experiment 1 Objectives
- Determine if the CRASH data collected is sufficient to draw meaningful conclusions about part/driver stability differences
- Determine how machine configuration affects stability
- Evaluate how the different scenarios relate to conclusions about stability
- Find the minimum data set needed to draw meaningful conclusions about part/driver stability
- Create a baseline from which to measure future experiments
- Identify other dimensions of stability not exposed in the CRASH score
10. Experiment 1 Details
- Standardize on one late-model driver/part from four IHVs
- Part/Driver A, Part/Driver B, Part/Driver C, Part/Driver D
- Test them across 12 different flavors of over-the-counter PCs from 4 OEMs
- OEM A, OEM B, OEM C, OEM D
- High End and Low End
- Include at least two motherboard types
- MB Type 1, MB Type 2
- Clean install of XP SP2 plus latest WHQL drivers
- Drivers snapped 8/16/04
- Use the 36 hr benchmark profile shipped with
CRASH 1.1
11. Important Considerations
- Results apply only to these Part/Driver/System combinations
- Extrapolation of these results to other parts, drivers, or systems is impossible with this data
12. CRASH Terminology
- Profile
- Represents a complete run of the CRASH tool against a driver
- Contains one or more scenarios
- Scenario
- Describes a session of CRASH testing
- Load intensity/profile
- What tests will be used
- How many times to run this scenario (loops)
- Score
- Always the percentage of the testing completed before a system failure (hang or kernel-break)
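As a rough illustration of that scoring definition (a minimal sketch assuming the score is simply completed loops over planned loops; CRASH's real computation may weight scenarios differently):

    def crash_score(completed_loops: int, planned_loops: int) -> float:
        """Hypothetical sketch: percentage of planned testing completed
        before the first system failure (hang or kernel-break)."""
        if planned_loops <= 0:
            raise ValueError("planned_loops must be positive")
        return 100.0 * completed_loops / planned_loops

    # A run that completes 27 of 36 planned loops before failing scores 75.0
    print(crash_score(27, 36))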
13. Profile Score Averages
14. CRASH Terminology: Failures
- Failure
- Hang
- No minidump found and loop did not complete
- Targeted Failure
- Minidump auto-analysis found failure was in the display driver
- Non-Targeted Failure
- Minidump analysis found failure was not in display driver
- Does not count against the score
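A sketch of that classification logic (the field names minidump_found and faulting_module are our own illustration, not CRASH's API):

    from enum import Enum
    from typing import Optional

    class FailureType(Enum):
        HANG = "Hang"                          # no minidump, loop did not complete
        TARGETED = "Targeted Failure"          # minidump blames the display driver
        NON_TARGETED = "Non-Targeted Failure"  # blamed elsewhere; no score penalty

    def classify_failure(minidump_found: bool,
                         faulting_module: Optional[str],
                         display_driver: str) -> FailureType:
        # Illustrative only; CRASH's minidump auto-analysis is more involved.
        if not minidump_found:
            return FailureType.HANG
        if faulting_module == display_driver:
            return FailureType.TARGETED
        return FailureType.NON_TARGETED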
15. Percentage of Results by Type
16. Average Profile Score by Machine Group
17. Average Profile Score by OEM and MB
18. Effect of MB Type on Profile Score
19. Score Distribution for Part/Driver C and D (MB Type 1)
20. Experiment 1 Test Profile
- Real Life
- Moderate load and application cycling
- 9 max and 3 min load
- Tractor Pull
- No load cycling
- Moderate application cycling
- Incrementally increasing load
- Intense
- High frequency load and application cycling
- 9 max and 0 min load
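The three profiles above can be summarized as data along these lines (field names are hypothetical; this is not the actual CRASH profile format):

    TEST_PROFILES = {
        "Real Life":    {"load_cycling": "moderate", "app_cycling": "moderate",
                         "max_load": 9, "min_load": 3},
        "Tractor Pull": {"load_cycling": "none", "app_cycling": "moderate",
                         "load_pattern": "incrementally increasing"},
        "Intense":      {"load_cycling": "high frequency", "app_cycling": "high frequency",
                         "max_load": 9, "min_load": 0},
    }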
21. Average Scenario Score by Part/Driver
22. Statistical Relevance Questions
- Question: How do I know that the difference between the averages of Result Set 1 and Result Set 2 is meaningful?
- Question: How can I find the smallest result set size that will give me 95% confidence?
- Answer: Use the Randomization Test
23. Randomization Test
(Diagram: Set 1 vs. Set 2 gives Delta 1; the two sets are pooled into a combination set and randomly re-split into Random Set 1 and Random Set 2, giving Delta 2)
- Run the random test 10,000 times. If Delta 1 is greater than Delta 2 95% of the time, you are assured the difference is meaningful.
- Try smaller sample sizes until the confidence drops below 95%. That is your minimum sample size.
- Information on the Randomization Test can be found online at http://www.uvm.edu/~dhowell/StatPages/Resampling/RandomizationTests.html
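A minimal Python sketch of this procedure (the result sets are plain lists of scenario scores):

    import random
    from statistics import mean

    def randomization_confidence(set1, set2, iterations=10_000):
        """Pool both result sets, re-split them at random `iterations`
        times, and report how often the observed delta of means beats
        the random delta. A value of 0.95 or more means the difference
        is meaningful at the 95% level."""
        observed_delta = abs(mean(set1) - mean(set2))
        pooled = list(set1) + list(set2)
        n1 = len(set1)
        wins = 0
        for _ in range(iterations):
            random.shuffle(pooled)
            random_delta = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
            if observed_delta > random_delta:
                wins += 1
        return wins / iterations

Re-running this on progressively smaller subsets of each result set until the return value drops below 0.95 yields the minimum sample size described in the second bullet.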
24. Scores and Confidence Intervals for Part/Driver/MB Combinations
25. The Experiment Matrix
- With three experiments completed, we can now compare
- One driver across two OS configurations
- Two versions of one driver across a single OS configuration
26. Old vs. New Drivers
- This table compares the profile scores for old drivers vs. new drivers on the OEM image
- New drivers were noticeably better for parts/drivers C and D
- Part/Driver A and B were unchanged
27. OEM Image vs. Clean Install
- This table compares profile scores for the OEM image vs. a clean install with old drivers
- Clean install scores were universally better than the OEM image for parts/drivers C and D
- Part/Driver A and B were unchanged
28. Future Plans
- Collate with OCA data
- CRASH failure to OCA bucket correlations
- What buckets were fixed between 1st and 2nd driver versions?
- Do our results match field data?
- Customer machines have hardware that is typically several years old
- Can we find the non-display failure discrepancy in the field?
- Begin to tweak other knobs
- Content
- Driver-versions
- HW-versions
- Windows codenamed Longhorn Test Bench
- PCIe cards
29. Suggested Future Experiments
- Include more motherboard types
- Newer drivers, or use a control-group driver. Reference Rasterizer?
- Disable AGP to isolate chipset errors from AGP errors
- Driver Verifier enabled
- Add non-graphics stress tests to the mix
- Modified Loop Times
30. IHV Feedback
- "There are definitely unique driver problems exposed through the use of CRASH and it is improving our driver stability greatly"
- "CRASH is producing real failures and identifying areas of the driver that we are improving on"
- "Thanks for a very useful tool"
31. CRASH 1.2 Features
- RunOnExit
- User-specified command run upon the completion of a CRASH profile
- More logging
- Logging to help troubleshoot problems with data flow
- More information output in XML
- More system information
- More failure details from minidumps
- More control over where files are put
- More robust handling of network issues
32. Customer Device Experience Rating (CDER) Program Background
- Started from a desire to rate display driver stability based on OCA crashes
- A controlled program addresses shortcomings of OCA data
- Unknown market share
- Unknown crash reporting habits
- Unknown info on non-crashing machines
- This allows normalization of OCA data to get an accurate crashes-per-machine stability rating
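One way to read that normalization (our formulation, not necessarily CDER's exact metric), in LaTeX:

    % Normalized rating for a part p: crashes divided by ALL panel
    % machines carrying p, whether they crashed or not.
    R_p = \frac{\text{OCA display crashes attributed to } p}
               {\text{panel machines carrying } p \text{ (crashing or non-crashing)}}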
33. CDER Program Description and Status
- Program Tools
- A panel of customers (Windows XP only)
- User opt-in allows extensive data collection, unique machine ID
- System Agent/scheduler
- System Configuration Collector
- OCA Minidump Collector
- System usage tool (not yet in the analysis)
- Status
- All tools for Windows XP in place and functioning
- First set of data collected, parsed, analyzed
34. Overall Crash Statistics of Panel
- Machines
- 8927 in panel
- 49.9% experience no crashes
- 50.1% experience crash(es)
- 8580 have valid device driver info
- 82.2% have no display crashes
- 17.8% have display crashes
- Crashes
- 16.1% of valid crashes are in display
- Note: Crashes occurred over a 4 yr period
35. Crash Analysis Factors
- Examined several factors which may have an impact on stability ratings
- Processor
- Display Resolution
- Bit Depth
- Monitor Refresh Rate
- Display Memory
- Note: Vendor part naming does not correspond to that in the CRASH presentation
- Note: Unless otherwise noted, data for these analyses were from the last 3 years
36. Display Resolution Crashes Distribution
37. Bit Depth Crashes Distribution
38. Refresh Rate Crashes Distribution
39. Display Memory Crashes Distribution
40. Display Crashes By Type (Over Last Year)
41. Normalized Crash Data
- The following data is normalized by program share
of crashing and non-crashing machines
42. Crashes per Machine Ranking by Display Vendor for Last Year (2004)
43. Vendor A Normalized Crashes by Part/ASIC Family Over Last 3 Years
44. Display Vendor B Normalized Crashes by Part/ASIC Family Over Last 3 Years
45. Display Vendor C Normalized Crashes by Part/ASIC Family Over Last 3 Years
46. Normalized Crashes Ranked by Part - 2004
47. Ranking and Rating Conclusions
- This is a first look
- Need to incorporate system usage data
- Need to continue collecting configuration data to track driver and hardware changes
- Need more panelists, and a higher proportion of newer parts
newer parts - With that said
- This is solid data
- This demonstrates our tools work as designed
- It shows the viability of a crash-based rating
program
48. Customer Experience and Loyalty
- A closer look at the segment of panelists who
- Experienced display crashes, and
- Switched or upgraded their display hardware or
driver
49. Experience and Loyalty Highlights
- 19.4% of users who experienced display crashes upgraded their drivers or hardware, or changed to a different display vendor
- 7.9% of users (nearly 41% of the 19.4%) who experienced display crashes switched to a competitor's product
- ALL users who switched to a competitor's product had the same or better experience
- Only 91.3% of those who upgraded had the same or better experience afterwards, based on crashes
- Time clustering of crashes
50. Overall Experience of Users After Changing Display System
51. Experience of Users After Upgrading
52. Experience of Users After Switching Display Vendors
53. Time-Clustering of Crashes for Users Who Experienced 3 or More Crashes
- Our data indicates a user's crashes are generally highly clustered in time
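One simple way to quantify such clustering (an illustrative metric with an assumed window length, not necessarily CDER's method):

    from datetime import timedelta

    def clustered_fraction(crash_times, window_days=7):
        """Fraction of a user's crashes (datetime values) that fall within
        `window_days` of a neighboring crash; values near 1.0 indicate
        tight time clustering."""
        if not crash_times:
            return 0.0
        window = timedelta(days=window_days)
        times = sorted(crash_times)
        near_neighbor = sum(
            1 for i, t in enumerate(times)
            if (i > 0 and t - times[i - 1] <= window)
            or (i + 1 < len(times) and times[i + 1] - t <= window)
        )
        return near_neighbor / len(times)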
54. Time-Clustering of Crashes for Users Who Experienced 6 or More Crashes
55. User Experience Caveats
- User experience here is strictly concerned with how many crashes the users experienced
- It doesn't include hardware changes/upgrades where different hardware used the same driver
- Having fewer crashes may not always mean the user experience was better, but for the vast majority we believe it does
- Having fewer crashes may be attributable to other system changes and/or other factors
- Crashes going away may mean the user gave up using whatever was causing the crashes
56. Going Forward
- Current Program (Windows XP-based)
- Normalize by usage as data becomes available
- Include periodic configuration data in analysis
- Correlate with CRASH tool results
- Continue to develop towards a rating program
- Planned for Longhorn/LDDM
- Modify tools for Longhorn and the new display driver model
- Larger set of participants for Longhorn Beta 1
- Recruit more users with newer hardware
57. Call to Action
- Create LDDM Drivers
- If you are a display vendor, leverage stability advances in the new Longhorn Display Driver Model (LDDM)
- Join the Stability Benchmark Portal
- If you are a display IHV or a System Builder, contact grphstab _at_ microsoft.com
- Get the latest tools and documents
- Join the Stability discussion on the portal
- Use the tools
- Send us feedback and suggestions
- Share ideas for new experiments
58. Community Resources
- Windows Hardware Driver Central (WHDC)
- www.microsoft.com/whdc/default.mspx
- Technical Communities
- www.microsoft.com/communities/products/default.mspx
- Non-Microsoft Community Sites
- www.microsoft.com/communities/related/default.mspx
- Microsoft Public Newsgroups
- www.microsoft.com/communities/newsgroups
- Technical Chats and Webcasts
- www.microsoft.com/communities/chats/default.mspx
- www.microsoft.com/webcasts
- Microsoft Blogs
- www.microsoft.com/communities/blogs
59. Additional Resources
- Email grphstab _at_ microsoft.com
- Related Sessions
- Graphics Stability Part 2
- WDK for Graphics: An Introduction
- Longhorn Display Driver Model Roadmap and Requirements
- Longhorn Display Driver Model Key Features