M Windows NT 4.0 Setup and Debugging - PowerPoint PPT Presentation

About This Presentation
Title:

M Windows NT 4.0 Setup and Debugging

Slides: 64
Provided by: asu
Learn more at: https://www.asu.edu
Transcript and Presenter's Notes


1
M Windows NT 4.0 Setup and Debugging
  • Joseph West
  • Sr Technology Specialist

2
Agenda
  • Setup (build overview)
  • Three phases of Setup
  • Character-Based Setup
  • Boot from Character-Based to GUI-Based Setup
  • GUI-Based Setup
  • Troubleshooting
  • (Blue Screens and Stop Codes)
  • Latest information for NT 4.0
  • SP4

3
Hardware Compatibility List
  • How important is it
  • Support parameters
  • http://www.microsoft.com/hwtest/
  • http://support.microsoft.com/

4
Character-Based Setup
  • Gathering of System Architecture Information
  • CPU Type
  • Motherboard Architecture
  • Hard Drive Controllers
  • File Systems
  • Disk Free Space
  • Memory

5
Info Gathered is Required for Basic System Initialization
  • Failure to Detect will lead to failure of Setup
  • Unsupported components and enhancements
  • PCI 2.1
  • Special Bus Drivers
  • Caching Chips for Burst Mode

6
Boot from Character-Based to GUI-Based Setup
  • Windows NT Kernel is loaded completely for the
    first time
  • Finds a valid Hard Drive
  • Polls Adapters and tests Bus
  • Most likely point of failure
  • Drivers are loaded into Memory and
    Multi-threading is initialized

7
GUI-Based Setup
  • Install secondary Drivers
  • Create Accounts
  • Machine and Administrator
  • Configure Network Settings
  • Build final System Tree and Registry

8
Troubleshooting Character-Based Setup
  • NTHQ Tool
  • Located in Support Directory
  • Purpose is to show all hardware peripheral
    settings
  • Works with PCI, PnP and Legacy peripherals

9
Troubleshooting Character-Based Setup
  • NTHQ Demo

10
Troubleshooting Character-Based Setup
  • Unsupported Controller and BIOS Enhancements
  • 32-bit I/O
  • Enhanced Drive Access
  • Multiple Block Access or Rapid IDE
  • Power Management Features

11
Troubleshooting Character-Based Setup
  • Setup Hangs During Initial Boot
  • Disable CD-Boot capability before installing
  • Needs to be done at both the Controller and BIOS
    levels

12
Troubleshooting Character-Based Setup
  • Setup Cannot Find Hard Drive
  • Scan System for Viruses
  • Make certain there is valid Boot Sector on the
    Hard Drive

13
Troubleshooting Character-Based Setup
  • Setup Cannot Find Hard Drive
  • If Hard Drive Controller is SCSI
  • Are devices properly terminated
  • Is SCSI BIOS enabled - first Controller (if at
    all)
  • On secondary Controllers, make certain BIOS is
    disabled
  • Partition and format using current Controller

14
Troubleshooting Character-Based Setup
  • Setup Cannot Find Hard Drive
  • If Hard Drive Controller is IDE or EIDE
  • Make certain drive is on primary Controller
    Channel
  • Make certain drive is jumpered correctly
  • (i.e., Master, Slave, or Independent)

15
Troubleshooting Character-Based Setup
  • Setup Does Not Detect Hard Drive Controller
    Correctly
  • Manually select Controller type
  • Make certain that an NT 4.0 driver is being
    loaded
  • Use NTHQ Tool to check for correct IRQ and Memory
    addressing

16
Troubleshooting Character-Based Setup
  • Setup Cannot Find a Valid Partition
  • If Windows 95 is on the system, back up and Fdisk
    the Hard Drive (no support for FAT32)
  • Recreate Partitions and Format with DOS 6.22
  • Restore Windows 95 and proceed with Windows NT
    installation
  • Make certain that correct HAL is being loaded

17
Troubleshooting Failure to Reboot From
Character-Based to GUI-Based Setup
  • Stop Messages
  • Record Hex Value, 0x1e, 0x7b, etc.
  • Record Values in parentheses
  • Record component where failure occurred
  • Note where in Boot Process error occurred
  • Call PSS (installation support)
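
The values this slide asks you to record can be pulled out of a transcribed Stop line with a few lines of Python. This is a minimal sketch: the sample line and its layout (a `STOP:` prefix, a hex code, then comma-separated hex values in parentheses) are an assumption for illustration, and `parse_stop_line` and `StopRecord` are hypothetical names, not part of any Microsoft tool.

```python
import re
from typing import NamedTuple, Optional, Tuple


class StopRecord(NamedTuple):
    code: int                # the Stop (bugcheck) code, e.g. 0x7B
    params: Tuple[int, ...]  # the values shown in parentheses


# Assumed layout: "STOP: 0x<code> (<hex>,<hex>,...)"
_STOP_RE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)")


def parse_stop_line(line: str) -> Optional[StopRecord]:
    """Return the code and parameters from a Stop line, or None if absent."""
    match = _STOP_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = tuple(
        int(field.strip(), 16)
        for field in match.group(2).split(",")
        if field.strip()
    )
    return StopRecord(code, params)


# Hypothetical transcription of a blue screen's first line.
sample = "*** STOP: 0x0000007B (0x00000004,0x00000000,0x00000000,0x00000000)"
record = parse_stop_line(sample)
print(hex(record.code), [hex(p) for p in record.params])
```

Keeping the code and the parameters as integers makes it easy to compare them against a table of known Stop codes before calling PSS.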

18
Troubleshooting Failure to Reboot
  • Stop Messages Which Can be Solved in the Field
  • 0x7b, (0x4,0,0,0), or 0x8b
  • Indicates problem with Master Boot Record
  • Scan for Viruses
  • Confirm correct Controller driver is loaded
  • Refresh Master Boot Record

19
Troubleshooting Failure to Reboot
  • After Reboot, Video Remains Black
  • Check for devices using IRQs 2, 9 or 12 (PCI)
  • Scan Hard Drive for Viruses

20
Troubleshooting Failure to Reboot
  • Stop Messages Which Can be Solved in the Field
  • 0x1e or 0xa
  • Disable any Third-party services or drivers which
    were loaded prior to Upgrade
  • Use NTHQ to confirm appropriate Memory and IRQ
    settings

21
Troubleshooting GUI-Based Setup Issues
  • Setup Will Not Read From CD-ROM Drive
  • Make certain CD is on HCL
  • Copy I386 directory to the Hard Drive and start
    again from the beginning
  • Make certain that the Controller and/or Hard
    Drive is correctly configured

22
Troubleshooting GUI-Based Setup Issues
  • If Setup Fails During Copy of Files to Hard Drive
  • Disable all external Caches in BIOS
  • Make certain Hard Drives are terminated
    correctly (Active preferred)

23
Setup Enhancements in Windows NT 4.0
  • Bootable CD-ROM
  • Supports only El Torito Specification
  • Can only be used in No Emulation Mode
  • Must be supported by both System and SCSI BIOS

24
Setup Enhancements in Windows NT 4.0
  • Winnt Character-Based Setup Logging
  • Using Winnt or Winnt32 /L
  • Logs all actions during character-based setup to
    find last successful action
  • Helps to isolate where setup halted without
    requiring special DLLs

25
Setup Enhancements in Windows NT 4.0
  • Restartable GUI-Based Setup
  • If the machine fails during GUI-mode Setup the
    problem can be fixed and setup will continue from
    reboot

26
Agenda
  • Setup (build overview)
  • Three phases of Setup
  • Character-Based Setup
  • Boot from Character-Based to GUI-Based Setup
  • GUI-Based Setup
  • Troubleshooting
  • (Blue Screens and Stop Codes)
  • Latest information for NT 4.0
  • SP4

27
You're Up and Running, But ...
28
Debugging (the connection)
  • Connect
  • Modem, Null-modem cable, LAN
  • Boot.ini
  • /debug /debugport=com1 /baudrate=19200
  • Symbols
  • Retail NT CD, in the support\debug\platform\symbols
    sub-directory
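
As an illustration of the Boot.ini step, the sketch below appends the debugger switches to an operating-system entry (the usual NT spellings are /debug, /debugport=com1 and /baudrate=19200). The entry text is a made-up example and `add_debug_switches` is a hypothetical helper. It only builds the string and does not touch a real Boot.ini file.

```python
def add_debug_switches(os_line: str, port: str = "com1", baud: int = 19200) -> str:
    """Append the kernel-debugger switches to one Boot.ini OS entry.

    Switches already present on the line are left alone, so calling
    this twice does not duplicate them.
    """
    wanted = ["/debug", f"/debugport={port}", f"/baudrate={baud}"]
    # Names of switches already on the line, compared case-insensitively.
    present = {
        token.split("=")[0].lower()
        for token in os_line.split()
        if token.startswith("/")
    }
    missing = [sw for sw in wanted if sw.split("=")[0] not in present]
    return " ".join([os_line.rstrip()] + missing)


# Hypothetical entry from the [operating systems] section.
entry = r'multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"'
debug_entry = add_debug_switches(entry)
print(debug_entry)
```

The baud rate chosen here must match the rate configured on the debugging host at the other end of the modem or null-modem cable.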

29
Debugging (the connection)
  • Debugging Demo

30
Interpreting Blue Screens
  • The error code and parameters at the top of the
    screen
  • The list of modules that have successfully loaded
    and initialized in the middle of the screen
  • The list of modules that are currently on the
    stack at the bottom of the screen

31
Stop Codes
Note: For a complete listing of stop codes, see
Windows NTW 4.0 Resource Kit, Chapter 39,
Windows NT Debugger, or the Q142657 article on
http://support.microsoft.com
32
Common Stop Codes
  • 0xA
  • 0x1E
  • 0x24
  • 0x3F
  • 0x50
  • 0x7B
  • 0x7F
  • 0xC000021A
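
For quick reference, the codes on this slide can be kept in a small lookup table. The sketch below uses only the symbolic names given on the per-code slides of this presentation. `STOP_NAMES` and `describe_stop` are hypothetical names chosen for illustration.

```python
# Stop codes and symbolic names as listed in this presentation.
STOP_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x0000003F: "NO_MORE_SYSTEM_PTES",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xC000021A: "FATAL_SYSTEM_ERROR",
}


def describe_stop(code: int) -> str:
    """Format a Stop code as an 8-digit hex value plus its name, if known."""
    name = STOP_NAMES.get(code, "(not in this table)")
    return f"0x{code:08X} {name}"


print(describe_stop(0xA))
print(describe_stop(0x7B))
```

A short code such as 0xA and its padded form 0x0000000A are the same integer, so either spelling recorded from a blue screen resolves to the same entry.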

33
0xA
  • 0x0000000A IRQL_NOT_LESS_OR_EQUAL
  • Description
  • An attempt was made to touch paged-out memory at
    a process interrupt request level (IRQL) that is
    too high. Code that runs at higher interrupt
    levels can't touch paged-out memory because
    paging would be too expensive. If a pageable
    page is not committed but its virtual address
    range is still in the translation buffer,
    high-IRQL code can get away with touching it.
    But if the system is stressed, the memory
    manager will likely have paged that page out,
    and when an in-page is attempted the bugcheck
    occurs. This is why certain bugs tend not to
    show up on developers' boxes, which are less
    stressed than production systems.
  • Typical Scenarios
  • System configuration changes, virus scanners,
    other file I/O filters.

34
0x1E
  • 0x0000001E KMODE_EXCEPTION_NOT_HANDLED
  • Description
  • Essentially, this bugcheck identifies an error
    that occurred in a section of code where no error
    detection routines were in place. Most
    exceptions are generated directly in the section
    of code that is executing. In this case, the
    error was not trapped in the middle of the code
    that was executing. Therefore, the error was
    allowed to fall through to this default error
    handler. This makes the error a very common
    exception. The actual instruction fault is
    usually similar to a STOP 0xA that is a memory
    access violation.
  • Typical Scenarios
  • Invalid or obsolete third-party driver or system
    service, Microsoft driver or system service bug,
    file I/O filter drivers.

35
0x24
  • 0x00000024 NTFS_FILE_SYSTEM
  • Description
  • A STOP 0x24 is the result of NTFS code that
    detects a problem with the structure of the NTFS
    file system. This is not a cut and dried
    exception code and debugging it is sometimes
    difficult. Disk corruption can generate a STOP
    0x23 (FAT_FILE_SYSTEM) and 0x24. However any
    processes involved in reading or writing data
    from a FAT or NTFS file system could cause the
    disk data to appear corrupted. Therefore SCSI
    and IDE drivers as well as the disk structure
    itself (hard errors, i.e. bad blocks) can be
    suspect. The file system calls this bug check in
    multiple places and this will help us identify
    the actual source line that generated the bug
    check. Also, this bugcheck can be caused by I/O
    filter drivers (resource hangs, race conditions,
    etc.). After the above is eliminated, more
    low-level constructs such as file system
    synchronization objects, scb attributes, etc.
    need to be examined by the debug engineer.
  • Typical Scenarios
  • This bugcheck is encountered when the NTFS file
    system has a corruption, or the hard drive has a
    bad block.

36
0x3F
  • 0x0000003F NO_MORE_SYSTEM_PTES
  • Description
  • This stop isn't as common as most of the others
    in this section, but a good explanation is
    warranted. A STOP 0x3F is the result of a system
    doing lots of I/O, thereby fragmenting the
    system PTEs. The bugcheck occurs not because
    the system is out of PTEs, but because a driver
    requests a huge chunk of memory that can't be
    satisfied because a contiguous block that big
    isn't available.
  • Typical Scenarios
  • Often video drivers will allocate large amounts
    of kernel memory that must succeed. Also, some
    backup programs do the same.
  • For these situations, consult a PSS engineer for
    the Registry hack that allows the increase of
    total system PTEs.
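
The distinction in the description, between running out of PTEs and having no contiguous run large enough, can be shown with a toy model. This is only a sketch of the idea of fragmentation. The free/used map and `largest_free_run` are invented for illustration and do not reflect how the NT memory manager tracks system PTEs.

```python
from typing import Sequence


def largest_free_run(free_map: Sequence[bool]) -> int:
    """Length of the longest run of consecutive free (True) slots."""
    best = run = 0
    for is_free in free_map:
        run = run + 1 if is_free else 0
        best = max(best, run)
    return best


# Toy map: two free slots, then one in use, repeated eight times.
free_map = [True, True, False] * 8
request = 4  # a driver asks for four contiguous slots

total_free = sum(free_map)
print("free slots:", total_free)
print("enough free in total:", total_free >= request)
print("contiguous request satisfiable:", largest_free_run(free_map) >= request)
```

In this model plenty of slots are free overall, yet the request fails because no run is longer than two. That matches the description above of a request that cannot be satisfied even though the system is not out of PTEs.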

37
0x50
  • 0x00000050 PAGE_FAULT_IN_NONPAGED_AREA
  • Description
  • A STOP 0x50 is caused when a memory region that
    is not supposed to be paged out (usually for
    performance reasons) is paged out. This stop can
    be caused by a variety of problems including
    corrupt NTFS volumes, bad network packet data,
    and in general kernel mode drivers that corrupt
    memory. Drivers that free an MDL but don't
    communicate it to all portions of the driver can
    also cause it. Other causes include Disk,
    Controller, and Disk Driver problems.
  • Typical Scenarios
  • Usually third-party kernel mode drivers munging
    memory, or reading beyond allowable memory.
    Also, when the file system is pushed to the
    tested limits (large Mac volumes), bugs in NTFS
    are exposed that result in this STOP. This STOP
    can occur due to interaction problems between
    SCSI Controller firmware and Hard Drive firmware.

38
0x7B
  • 0x0000007B INACCESSIBLE_BOOT_DEVICE
  • Description
  • During the initialization of the I/O system, the
    driver for the boot device may have failed to
    initialize the device that the system is
    attempting to boot from, or the file system that
    is supposed to read that device may have either
    failed its initialization or simply not
    recognized the data on the boot device as a file
    system structure.
  • If this is the initial setup of the system, this
    error may have occurred because the system was
    installed on an unsupported Hard Disk or SCSI
    Controller.
  • This error can also be caused by the installation
    of a new SCSI Adapter or Hard Disk Controller or
    by repartitioning the Hard Disk with the System
    Partition.
  • Typical Scenarios
  • VIRUS
  • LBA type problems, MBR type problems, SCSI
    Controller/Hard Drive geometry issues, etc.

39
0x7F
  • 0x0000007F UNEXPECTED_KERNEL_MODE_TRAP
  • Description
  • This error means a trap occurred in kernel mode,
    either a kind of trap that the kernel is not
    allowed to have or catch (a bound trap), or a
    kind of trap that is always instant death (double
    fault).
  • Typical Scenarios
  • Hardware, kernel mode drivers that manipulate
    critical system data in an untimely fashion.
  • This STOP is most often the result of the
    processor taking a double fault: 0x7f (8,0,0,0).
    Note that these parameters can also show up for a
    modern software issue involving Netmon (bhnt.sys).

40
0xC000021A
  • 0xC000021A FATAL_SYSTEM_ERROR
  • Description
  • This is a typical description that accompanies
    this error: "The Windows Subsystem System process
    terminated unexpectedly with a status of
    (0x6130F2B6 0x01B6FBA4). The system has been
    shutdown."
  • The failing process sometimes is listed in the
    blue screen itself.
  • This bugcheck occurs when a user-mode subsystem
    such as Winlogon or CSRSS is fatally compromised
    such that security cannot be guaranteed. The
    Operating System makes a transition into kernel
    mode and throws this exception.
  • Typical Scenarios
  • A typical cause of this crash would be an
    extensible perfmon counter that overwrites its
    Winlogon shared data buffer (Q171033), and in
    general any access violation that compromises a
    user-mode subsystem.

41
Break
42
Agenda
  • Setup (build overview)
  • Hardware Compatibility List
  • Three Phases of Setup
  • Character-Based Setup
  • Boot from Character-Based to GUI-Based Setup
  • GUI-Based Setup
  • Troubleshooting
  • (Blue Screens and Stop Codes)
  • Latest Information for NT 4.0
  • SP4

43
A Day in the Life
Video
44
NT4 Service Pack 4
  • Contents
  • Hotfixes for important customer-reported problems
  • Resource and memory leak bugfixes from NT5
  • 30 support, diagnostic and repair tools from the
    NT Resource Kit are included on the SP4 CDROM
  • Event log entries for clean and dirty shutdown
  • Process Improvements
  • Dedicated Service Pack test team
  • Beta Program for Service Packs
  • Improving the Knowledge Base, depth and ease of
    use
  • Slipstreaming Service Packs into OEM releases

45
Resource / Memory Leaks
  • Problem
  • Leaks lead to hung systems and bluescreen crashes
  • Some customers do preventive reboots
  • Difficult to stop or kill the offending process
  • Solutions
  • Fix leaks: several hundred in NT5, key fixes in
    NT4 SP4
  • Job objects in NT5, set memory limits on a
    collection of processes
  • Visual Studio adding leak checking to MFC and CRT
  • Next Work Items
  • Better leak detection
  • Logging in under low resource conditions
  • Stopping and killing processes

46
Bugchecks (Blue Screens)
  • Kernel mode code detected a serious error
  • Blue screens are still frequent and very hard to
    diagnose
  • Crash dumps take too long on large memory systems
  • Prevention
  • Find and fix bugs in our code
  • Review all calls to KeBugCheck by NT5 RTM
  • Improve diagnosis
  • Reduced clutter on the blue screen, focus on key
    data, and add hints
  • Crash dumps are now dramatically faster in NT5
  • Developing comprehensive crashdump analysis tools
    for NT4 and NT5

47
Bugchecks (Blue Screens)
48
3rd Party Drivers
  • Problem
  • One of the most common complaints from PSS
  • Source of pool corruption - difficult to diagnose
  • Solution
  • DDK driver samples and documentation are improved
    in NT5
  • Enhanced driver testing in NT4 and NT5, including
    pool corruption tests
  • NT5 will have driver signing, warning level by
    default
  • WDM drivers will drive higher quality
  • We are testing major third-party anti-virus
    software regularly

49
Unnecessary Reboots in NT5
  • Problem
  • Hardware and software configuration and
    maintenance
  • Solutions
  • Fixed 50 software configuration cases which
    required a reboot in NT4. Key fixes include:
  • Adding, removing and configuring network
    protocols; changing IP addresses
  • Reconfiguring settings on PCI and other PnP
    hardware
  • Reboots still required for some rare cases
  • Machine name change, domain membership changes,
    system locale and system font changes, service
    pack installation
  • Hardware reconfiguration by clustering solutions
    in NTS/E
  • Where possible, hotfixes will avoid requiring a
    reboot

50
Diagnosis and Recovery
  • Recovery Involves
  • Detection (hard with a hung application or
    server)
  • Diagnosis (need good tools, need parallel
    installs, bad error messages)
  • System Recovery (chkdsk, crash dump biggest time
    hits)
  • Application recovery (SQL, Exchange Store, etc)
  • We are delivering
  • 30 of the most critical support, diagnostic, and
    repair tools in SP4 and NT5 B2
  • Fixing 35 worst error messages by B230, then
    next 200 as time allows
  • NT5 Safe-mode Boot today and Floppy Boot by NT5
    RTM
  • Both support NTFS
  • Web-based trouble-shooter for most common
    bluescreens
  • Online chkdsk post NT5

51
NT Test Initiatives
  • Long duration Server stress
  • 10 Servers running stress for a month starting
    at NT5 Beta 2
  • Mix of stress including BackOffice, IIS,
    Client/Server, etc
  • Specifically watching for memory and resource
    leaks
  • Improved driver testing for NT4 and NT5
  • Catch pool corruption
  • Fault injection
  • Better integration testing of Server applications
  • BackOffice applications: Exchange, SQL Server
  • Using automated scripts from BackOffice teams
  • Testing with Oracle, SAP R/3, Lotus Notes
  • 100 Top Server Applications from Tier 1 RDP
    customers
  • Expanded tests for customer configurations
  • RDP Customer configurations, ISP

52
Resource Kit Tools
  • Network Diagnostic and Support Tools
  • nettest - quickly determine whether local user's
    network is configured properly (IDW)
  • Applications, Service Problems and Memory Leaks
  • memsnap - detection of memory and resource leaks
    over time (dump directory)
  • Disk Problems
  • fixacls - resets ACLs on system files to
    installation defaults, fixes users who hose their
    ACLs
  • Debugger Tools
  • debug wizard - easy setup of debuggers for
    customers
  • Other
  • windiff - file compare util, critical for many
    situations (reskit)

53
Event Log Analyst
  • Prototype tool for collecting and analyzing event
    log reliability data
  • Designed for collecting reliability trend data
    from an entire datacenter in a few hours
  • Collected data from 800 CDC servers in 5 hours
  • Analysis is manual with Excel, less than 3 hours
  • Provides trend analysis of reboots, bugchecks,
    and Dr Watsons

54
Event Log Analyst
55
Event Log Analyst Metrics
  • Mean time between reboots
  • Mean time between bugchecks
  • Mean time between Dr Watsons
  • Trend analysis of reboots/server-year
  • Trend analysis of bugchecks/server-year
  • Trend analysis of Dr Watsons/server-year
  • Bugcheck distribution
  • Dr Watson distribution
  • SP4 only: Availability percentage
  • SP4 only: Mean time to repair
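
The slides do not say how Event Log Analyst computes these metrics, so the following is only one plausible reading, stated as an assumption. Mean time between events is taken as the average gap between consecutive timestamps, and the per-server-year rates as event count divided by servers times years observed. The function names and the sample data are hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence


def mean_hours_between(events: Sequence[datetime]) -> Optional[float]:
    """Average gap, in hours, between consecutive events (None if fewer than 2)."""
    ordered = sorted(events)
    if len(ordered) < 2:
        return None
    span_seconds = (ordered[-1] - ordered[0]).total_seconds()
    return span_seconds / (len(ordered) - 1) / 3600.0


def events_per_server_year(event_count: int, servers: int, observed_days: float) -> float:
    """Event rate normalised to one server running for one year."""
    server_years = servers * observed_days / 365.0
    return event_count / server_years


# Hypothetical reboot timestamps for one server.
reboots = [
    datetime(1998, 1, 1),
    datetime(1998, 1, 11),
    datetime(1998, 1, 31),
]
print(mean_hours_between(reboots))            # gaps of 10 and 20 days
print(events_per_server_year(73, 10, 365.0))  # 73 reboots, 10 servers, 1 year
```

Normalising to server-years lets datacenters of different sizes and observation windows be compared on the same trend chart.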

56
Tools for NT4 SP4 and NT5
  • Network Diagnostic and Support Tools
  • browstat - only useful tool for diagnosing
    browser problems (reskit)
  • dhcpcmd - useful for fixing DHCP issues (reskit)
  • dnscmd - diagnose and repair DNS problems
    (reskit)
  • eseutil - used for WINS and DHCP database
    diagnosis and repair
  • nettest - quickly determine whether local user's
    network is configured properly (IDW)
  • winscl - diagnose and repair WINS (reskit)
  • winsadd - command line tool for batching static
    and dynamic entries in WINS
  • nltest - used for resetting secure channels,
    diagnosing and fixing trust problems (reskit)

57
Tools for NT4 SP4 and NT5
  • Applications, Service Problems and Memory Leaks
  • depends - display and troubleshoot application
    dependency problems (IDW)
  • tlist - list running processes, used in
    conjunction with kill (reskit)
  • kill - forcibly terminate processes (reskit)
  • memsnap - detection of memory and resource leaks
    over time (dump directory)
  • pmon - detection of memory and resource leaks
    over time (reskit)
  • pviewer - gather extended information about
    running processes (reskit)
  • reg - registry utility, used for diagnosis and
    repair of many types of issues
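
The idea behind detecting leaks "over time" can be sketched as a comparison of periodic snapshots: a process whose usage rises in every interval is a leak suspect. The snapshot layout below (one dictionary of process name to bytes per sample) and `suspected_leaks` are assumptions for illustration. They do not reproduce the output format or logic of memsnap or pmon.

```python
from typing import Dict, List


def suspected_leaks(snapshots: List[Dict[str, int]], min_growth: int = 0) -> List[str]:
    """Names of processes whose usage grows in every successive snapshot.

    snapshots: oldest first; each maps a process name to bytes in use.
    min_growth: total growth that must be exceeded before flagging.
    """
    if len(snapshots) < 2:
        return []
    suspects = []
    for name in snapshots[0]:
        # Only consider processes present in every snapshot.
        if not all(name in snap for snap in snapshots):
            continue
        series = [snap[name] for snap in snapshots]
        always_grows = all(later > earlier for earlier, later in zip(series, series[1:]))
        if always_grows and series[-1] - series[0] > min_growth:
            suspects.append(name)
    return sorted(suspects)


# Hypothetical samples taken at three points in time.
samples = [
    {"svc.exe": 100, "app.exe": 500},
    {"svc.exe": 140, "app.exe": 480},
    {"svc.exe": 190, "app.exe": 510},
]
print(suspected_leaks(samples))
```

Steady growth across a handful of samples is only a hint. A longer observation window and a `min_growth` threshold help separate real leaks from normal warm-up.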

58
Tools for NT4 SP4 and NT5
  • Disk Problems
  • disksave - saves and restores the MBR (reskit)
  • fixacls - resets ACLs on system files to
    installation defaults, fixes users who hose their
    ACLs
  • ftedit - used daily to help customers repair
    fault tolerant volumes (reskit)
  • Debugger Tools
  • gflags - set global flags needed for various
    kinds of debugging (IDW)
  • remote - allow remote debugging by PSS (reskit)
  • debug wizard - easy setup of debuggers for
    customers
  • all standard debuggers - already ship in
    /support dir

59
Tools for NT4 SP4 and NT5
  • Other
  • uptomp - update system from uniproc to multiproc
    (reskit)
  • robocopy - used daily by PSS during support
    calls, easiest way to move large amounts of data
    around very quickly.
  • shutdown - remote shutdown of systems (reskit)
  • ntevntlg.mdb ntmsgs.hlp - better error message
    docs (reskit)
  • windiff - file compare utility, critical for many
    situations (reskit)
  • dumpel - dump event log messages from local or
    remote systems (reskit)
  • list - used daily by PSS for reviewing
    exceedingly large log files, etc.

60
Summary
  • Best Practices matter
  • Mature, disciplined planning procedures
  • Design, Implement, Test
  • Configuration and operational control
  • Technology matters
  • OS system services
  • UPS, RAID, ECC Memory, multi-homing
  • Cluster Services
  • We can deliver availability with Windows NT today
  • Microsoft is investing heavily in availability

61
References and Resources
  • http://www.microsoft.com/ntserver/
  • http://www.microsoft.com/ntworkstation/
  • http://www.microsoft.com/windowsnt5/
  • http://www.microsoft.com/hwtest/
  • http://support.microsoft.com/
  • http://support.microsoft.com/support/kb/articles/q103/0/59.asp
  • Descriptions of Bug Codes for Windows NT

62
References and Resources
  • Inside Windows NT Second Edition, David A.
    Solomon
  • MS Press 1998
  • Windows NTW 4.0 Resource Kit
  • Chapter 19 What Happens When You Start Your
    Computer
  • Chapter 21 Troubleshooting Startup and Disk
    Problems
  • Chapter 36 General Troubleshooting
  • Chapter 39 Windows NT Debugger, or Q142657
    article
  • Supporting Windows NT Server in the Enterprise
  • MS Press 1998
  • Chapter 7 Troubleshooting Tools and Methods

63
Questions?