Title: Linux+ Guide to Linux Certification, Second Edition
- Chapter 13
- Troubleshooting and Performance
Objectives
- Describe and outline common troubleshooting procedures
- Identify good troubleshooting practices
- Effectively troubleshoot common hardware-related problems
- Effectively troubleshoot common software-related problems
Objectives (continued)
- Monitor system performance using command-line and graphical utilities
- Identify and fix common performance problems
- Understand the purpose and usage of kernel modules
- Recompile and patch the Linux kernel
Troubleshooting Methodology
Figure 13-1: The maintenance cycle
Troubleshooting Methodology (continued)
- Monitoring: Observing system areas for problems or irregularities
- Proactive maintenance: Minimizing the chance of future problems
  - e.g., performing regular system backups
Troubleshooting Methodology (continued)
- Reactive maintenance: Correcting problems when they arise
  - Documenting solutions
  - Developing better proactive maintenance methods
- Documentation: System information stored in a log book for future reference
- Troubleshooting procedures: Tasks performed when solving system problems
Troubleshooting Methodology (continued)
Figure 13-2: Common troubleshooting procedures
Troubleshooting Methodology (continued)
- Two troubleshooting golden rules
  - Prioritize problems according to severity
    - Spend a reasonable amount of time on each problem given its priority
  - Try to solve the root of the problem
    - Avoid missing the underlying cause
    - Justify why a certain solution is successful
Resolving Common System Problems
- Two categories of problems
  - Hardware-related
  - Software-related
Hardware-Related Problems
- Often involve improper hardware or software configuration
  - SCSI termination
  - Video card and monitor configuration
  - POST alerts
  - Loose hardware connections
  - IRQ or I/O address conflicts
- View the output of the dmesg command
Hardware-Related Problems (continued)
- Absence of device drivers prevents the OS from using associated devices
- Kudzu program: Detects and installs support for new hardware
- If a hardware device is not detected, its device driver must be configured manually
- HDDs are the most common devices to fail
  - Good idea to use RAID
Hardware-Related Problems (continued)
Figure 13-3: The kudzu welcome screen
Hardware-Related Problems (continued)
Figure 13-4: Configuring new hardware using kudzu
Hardware-Related Problems (continued)
- If an HDD containing partitions mounted on noncritical directories fails:
  - Power down the computer and replace the failed HDD
  - Boot the Linux system
  - Use fdisk to create partitions on the replaced HDD
  - Use mkfs to create filesystems
  - Restore the original data
  - Ensure /etc/fstab has appropriate entries to mount the filesystems
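The last step of the procedure above (confirming that /etc/fstab still has an entry for each restored filesystem) can be sketched in Python. This is a minimal illustration, not a replacement for inspecting the file by hand: the function name, the sample device names, and the sample mount points are all hypothetical, and the sketch assumes the usual whitespace-separated fstab layout in which the second field is the mount point.

```python
def missing_fstab_mounts(fstab_text, required_mount_points):
    """Return the required mount points that have no entry in fstab_text."""
    mounted = set()
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            mounted.add(fields[1])  # second field is the mount point
    return [mp for mp in required_mount_points if mp not in mounted]


# Hypothetical sample content; a real check would read /etc/fstab instead.
SAMPLE_FSTAB = """\
# device     mount point  type  options   dump  pass
/dev/hda1    /            ext3  defaults  1     1
/dev/hdb1    /data        ext3  defaults  1     2
"""

print(missing_fstab_mounts(SAMPLE_FSTAB, ["/data", "/home"]))  # ['/home']
```

On a live system the same function could be given the contents of /etc/fstab; any mount point it returns would still need an entry before the restored filesystem mounts at boot.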
Hardware-Related Problems (continued)
- If the HDD containing the / filesystem fails:
  - Power down the computer and replace the failed HDD
  - Reinstall Linux on the new HDD
  - Restore the original configuration and data files
Software-Related Problems: Application-Related Problems
- Caused by missing program libraries/files, process restrictions, or conflicting applications
- Dependencies: Prerequisite shared libraries or packages required for program execution
  - Programs usually check for them at installation
  - Package files may be removed accidentally
Software-Related Problems: Application-Related Problems (continued)
- rpm -V command: Identifies missing files in a package or package dependency
- ldd command: Displays the shared libraries used by a program
- ldconfig command: Rebuilds the /etc/ld.so.cache file from the directories listed in /etc/ld.so.conf
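As a rough illustration of how ldd output helps spot a missing dependency, the Python sketch below scans ldd-style text for libraries reported as "not found". The function name and the sample output (including the library name libexample.so.1 and the addresses) are made up for the example; on a real system the text would come from running ldd against the program in question.

```python
def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as not found."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Hypothetical ldd-style output used only for demonstration.
SAMPLE_LDD_OUTPUT = """\
\tlibexample.so.1 => not found
\tlibc.so.6 => /lib/libc.so.6 (0x00a3b000)
\t/lib/ld-linux.so.2 (0x00a1e000)
"""

print(missing_libraries(SAMPLE_LDD_OUTPUT))  # ['libexample.so.1']
```

A library listed here would typically need its package reinstalled, or its directory added to /etc/ld.so.conf followed by a run of ldconfig.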
Software-Related Problems: Application-Related Problems (continued)
- /etc/ld.so.conf file: List of directories containing shared libraries
- /etc/ld.so.cache file: Contains the locations of shared library files
- Compressor/decompressor (codec) file: Contains rules to compress or decompress multimedia information
Software-Related Problems: Application-Related Problems (continued)
- Filehandles: Connections that programs make to files
- ulimit command: Modifies process limit parameters in the current shell
  - Can also modify the maximum number of filehandles
- /var/log directory: Contains most system log files
- If applications stop functioning due to difficulty gaining resources, restart them using SIGHUP
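The filehandle limit that ulimit adjusts for a shell can also be inspected from inside a program. The sketch below uses Python's standard resource module as an analogue; it affects only the Python process that runs it, not the shell, and the helper names are hypothetical.

```python
import resource


def filehandle_limits():
    """Return the (soft, hard) limits on open filehandles for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def set_soft_filehandle_limit(new_soft):
    """Set the soft filehandle limit, capped at the hard limit; return the new value."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)  # an unprivileged process cannot exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


soft, hard = filehandle_limits()
print("soft limit:", soft, "hard limit:", hard)
```

The printed values depend on the system, so no expected output is shown. A program that fails with "too many open files" is usually hitting the soft limit reported here.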
Software-Related Problems: Operating System-Related Problems
- Most software-related problems are related to the OS
  - Boot loader, filesystem, and serial device problems
- LILO problems: Place the linear keyword in, and remove the compact keyword from, the /etc/lilo.conf file
- GRUB problems: Typically the result of missing files in the /boot directory
- mkbootdisk command: Creates a boot floppy diskette
Software-Related Problems: OS-Related Problems (continued)
- If a filesystem on a partition mounted to a noncritical directory becomes corrupted:
  - Unmount the filesystem
  - Run the fsck command with the -f option to force a full check
  - If the fsck command cannot repair the filesystem, use the mkfs command to re-create the filesystem
  - Restore the filesystem's original data
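Deciding whether fsck repaired the filesystem can be based on its exit status, which the fsck(8) manual page documents as a sum of bit flags. The Python sketch below decodes such a status. The function names are hypothetical, and treating "errors left uncorrected" as the signal to fall back to mkfs is a simplifying assumption; an administrator would normally read the fsck messages before re-creating a filesystem.

```python
# Exit status bits as documented in the fsck(8) manual page.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def describe_fsck_exit(code):
    """Translate an fsck exit status into a list of human-readable conditions."""
    if code == 0:
        return ["no errors"]
    return [msg for bit, msg in FSCK_EXIT_BITS.items() if code & bit]


def needs_recreate(code):
    """Assumption for this sketch: uncorrected errors mean mkfs and a restore are next."""
    return bool(code & 4)


print(describe_fsck_exit(5))
# ['filesystem errors corrected', 'filesystem errors left uncorrected']
print(needs_recreate(5))  # True
```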
Software-Related Problems: OS-Related Problems (continued)
- If the / filesystem is corrupted:
  - Boot from the first Red Hat Fedora installation CD
  - Type linux rescue at the welcome screen
  - Enter a shell for the Linux system on the CD
  - Create a new / filesystem via the mkfs command
  - Restore the original data to the re-created / filesystem
  - Reboot the system
Software-Related Problems: OS-Related Problems (continued)
Figure 13-5: The Red Hat Fedora Linux installation welcome screen
Software-Related Problems: OS-Related Problems (continued)
Figure 13-6: Obtaining a shell in rescue mode
Software-Related Problems: OS-Related Problems (continued)
Figure 13-7: The command-line shell used in rescue mode
Software-Related Problems: OS-Related Problems (continued)
- Knoppix Linux and BBC Linux: Bootable CD-based Linux distributions containing many filesystem repair utilities
- setserial command: Sets the IRQ, I/O address, and speed of serial devices
Software-Related Problems: OS-Related Problems (continued)
Table 13-1: Common keywords used with the setserial utility
Performance Monitoring
- Jabbering: Failing hardware components send large amounts of information to the CPU
- Other causes of poor performance
  - Software monopolizes system resources
  - Too many processes
  - Too many read/write requests to the HDD
  - Rogue processes
Performance Monitoring (continued)
- Bus mastering: Peripheral components perform tasks normally executed by the CPU
- To increase performance
  - Add RAM
  - Upgrade to faster HDDs
  - Disk striping (RAID)
  - Decrease kernel size
Performance Monitoring (continued)
- Run performance utilities on a regular basis
  - Record results in a system log book
  - Eases identification of performance problems
- Baseline: Measure of normal system activity
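One way to put a baseline to use is to compare each new measurement against the values recorded in the log book. The Python sketch below flags a reading that falls far from the recorded average. The function name, the sample CPU figures, and the choice of "more than two standard deviations" as the threshold are all assumptions made for illustration, not a rule from the text.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline_samples, new_value, tolerance=2.0):
    """Return True if new_value is more than `tolerance` standard deviations
    away from the mean of the recorded baseline samples."""
    average = mean(baseline_samples)
    spread = stdev(baseline_samples)
    return abs(new_value - average) > tolerance * spread


# Hypothetical log-book entries: percentage of CPU time in use at the same hour each day.
cpu_baseline = [12.0, 15.0, 11.0, 14.0, 13.0]

print(deviates_from_baseline(cpu_baseline, 14.0))  # False
print(deviates_from_baseline(cpu_baseline, 40.0))  # True
```

The more samples the log book holds, the more meaningful the comparison; with only a handful of entries the threshold is little more than a rough guide.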
Monitoring Performance with sysstat Utilities
- System Statistics (sysstat) package: Contains common performance monitoring utilities
- Multiple Processor Statistics (mpstat) utility: Displays CPU statistics
- Input/Output Statistics (iostat) command: Displays block device input/output statistics
- System Activity Reporter (sar) command: Displays various system statistics
Monitoring Performance with sysstat Utilities (continued)
Table 13-2: Common options to the sar command
Monitoring Performance with sysstat Utilities (continued)
- A large number of pages sent to and taken from the swap partition
  - Results in slower performance
  - Add RAM to resolve
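Swap traffic can also be estimated without sysstat by reading the pswpin and pswpout counters that the Linux kernel exposes in /proc/vmstat. The Python sketch below computes pages swapped per second from two snapshots; the function names and the sample counter values are hypothetical, and a real check would read /proc/vmstat twice with a pause in between.

```python
def swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages_per_second(before, after, seconds):
    """Return pages swapped in and out per second between two snapshots."""
    return {name: (after[name] - before[name]) / seconds for name in before}


# Hypothetical snapshots taken 10 seconds apart.
first = swap_counters("pgfault 5000\npswpin 100\npswpout 250\n")
second = swap_counters("pgfault 9000\npswpin 160\npswpout 550\n")

print(swap_pages_per_second(first, second, 10))
# {'pswpin': 6.0, 'pswpout': 30.0}
```

Sustained nonzero rates over many intervals are the pattern that points toward adding RAM.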
Other Performance Monitoring Utilities
- free command: Displays memory and swap statistics
- vmstat command: Displays memory, CPU, and swap statistics
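The memory and swap totals that free reports are also available in /proc/meminfo. As a small illustration, the Python sketch below parses text in that format and derives used memory and used swap. The function names and the sample figures are hypothetical, and the "used" value here is simply total minus free, which ignores buffers and cache.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            values[key.strip()] = int(rest.split()[0])
    return values


def memory_summary(info):
    """Return used memory and used swap in kB (total minus free)."""
    return {
        "mem_used_kb": info["MemTotal"] - info["MemFree"],
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }


# Hypothetical sample; a real check would read /proc/meminfo.
SAMPLE_MEMINFO = """\
MemTotal:       515492 kB
MemFree:         50000 kB
SwapTotal:     1048568 kB
SwapFree:      1000000 kB
"""

print(memory_summary(parse_meminfo(SAMPLE_MEMINFO)))
# {'mem_used_kb': 465492, 'swap_used_kb': 48568}
```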
Customizing the Kernel
- Options to provide additional hardware support or change existing hardware support
  - Insert modules into the kernel
  - Recompile the kernel
  - Download and compile a new kernel
Kernel Modules
- May insert device drivers and kernel features into the kernel as modules
  - Reduces kernel size
- Good form to compile standard device support into the kernel
  - Leave support for other devices and features as modules
- Modules are typically stored in subdirectories of /lib/modules/<kernel-version>
Kernel Modules (continued)
- insmod command: Inserts modules into the kernel
- modprobe command: Inserts a module and all necessary prerequisite modules into the kernel
- lsmod command: Lists modules currently used by the kernel
  - Can also show module dependencies
- rmmod command: Removes modules from the kernel
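The information lsmod displays comes from /proc/modules, where each line gives a module's name, size, use count, and the modules that depend on it. The Python sketch below parses text in that format; the function name is hypothetical and the sample lines are illustrative rather than taken from a real system.

```python
def parse_proc_modules(text):
    """Parse /proc/modules-style text into {name: {size, refcount, used_by}}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, size, refcount, used_by = fields[:4]
        # The fourth field is "-" when nothing depends on the module,
        # otherwise a comma-separated list with a trailing comma.
        users = [] if used_by == "-" else [u for u in used_by.split(",") if u]
        modules[name] = {"size": int(size), "refcount": int(refcount), "used_by": users}
    return modules


# Illustrative sample lines in /proc/modules format.
SAMPLE_MODULES = """\
ehci_hcd 40013 0 - Live 0xf8a5c000
usbcore 123122 2 ehci_hcd,usbhid, Live 0xf8a1d000
"""

mods = parse_proc_modules(SAMPLE_MODULES)
print(mods["usbcore"]["used_by"])  # ['ehci_hcd', 'usbhid']
```

A module with a nonempty used_by list generally cannot be removed with rmmod until the modules that depend on it are removed first.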
Kernel Modules (continued)
- Modules are usually inserted into the kernel automatically at boot time using modprobe
- /etc/modprobe.conf file: Used to load any alias modules at system startup
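Alias entries in /etc/modprobe.conf take the form "alias name module", mapping a generic name such as eth0 to the module that should be loaded for it. The Python sketch below extracts those mappings. The function name is hypothetical and the sample entries are examples only; the modules a given system needs depend on its hardware.

```python
def parse_aliases(conf_text):
    """Return {alias_name: module_name} for each alias line in modprobe.conf-style text."""
    aliases = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias":
            aliases[fields[1]] = fields[2]
    return aliases


# Example entries only; real contents vary by system.
SAMPLE_MODPROBE_CONF = """\
# network and SCSI drivers
alias eth0 e1000
alias scsi_hostadapter aic7xxx
options e1000 Speed=100
"""

print(parse_aliases(SAMPLE_MODPROBE_CONF))
# {'eth0': 'e1000', 'scsi_hostadapter': 'aic7xxx'}
```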
Compiling a New Linux Kernel
- Done to gain or remove hardware or kernel support
- /usr/src/<kernel-version> directory: Contains kernel source code for a specific distribution
  - Can download new kernel source code
- /usr/src/linux directory: Contains kernel source code
  - Symbolic link to /usr/src/<kernel-version>
Compiling a New Linux Kernel (continued)
- make commands: Perform compilation-related tasks
  - Many types
- make mrproper: Removes files created by previous kernel compilations
- make oldconfig: Records current kernel features and settings
- make config: Prompts the user for kernel configuration information
  - Text-based
Compiling a New Linux Kernel (continued)
- make menuconfig: Provides menus to select kernel configuration
- make xconfig or make gconfig: Provide a graphical interface to select kernel configuration
  - xconfig runs in KDE; gconfig runs in GNOME
- make clean: Removes files not required for compilation
Compiling a New Linux Kernel (continued)
Figure 13-8: The make menuconfig interface
Compiling a New Linux Kernel (continued)
Figure 13-9: The make gconfig interface
Compiling a New Linux Kernel (continued)
Figure 13-10: Configuring power options in the Linux kernel
Compiling a New Linux Kernel (continued)
- make bzImage: Compiles the kernel
  - Creates a compressed kernel image
  - Copy it to the /boot directory
  - Rename it as vmlinuz-<kernel version>
- make modules and make modules_install: Compile necessary modules and copy them to the appropriate location
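The make targets covered on these slides are normally run in sequence from the kernel source directory. The Python sketch below only builds that sequence as a list of commands for review; it does not execute anything. Several details are assumptions: the function name, the version string 2.6.5, the use of menuconfig as the configuration step, the explicit make modules step before make modules_install, and the image path arch/x86/boot/bzImage, which differs by architecture and kernel version.

```python
def kernel_build_plan(version, config_target="menuconfig",
                      image_path="arch/x86/boot/bzImage"):
    """Return the ordered commands for a kernel build as lists of arguments.

    Nothing is executed; image_path is an assumption that varies by architecture.
    """
    return [
        ["make", "mrproper"],         # remove files left by earlier compilations
        ["make", config_target],      # choose kernel features and settings
        ["make", "clean"],            # remove files not required for compilation
        ["make", "bzImage"],          # compile the compressed kernel image
        ["make", "modules"],          # compile features selected as modules
        ["make", "modules_install"],  # copy modules under /lib/modules
        ["cp", image_path, f"/boot/vmlinuz-{version}"],  # install the new kernel image
    ]


for step in kernel_build_plan("2.6.5"):
    print(" ".join(step))
```

After the image is copied, the boot loader configuration still needs an entry for the new kernel before it can be selected at boot.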
Patching the Linux Kernel
- If not changing the kernel version, can apply patches
  - Rather than downloading all of the kernel source code
- patch command: Applies a patch to the kernel source code
  - Still need to recompile the kernel
Summary
- Administrators monitor the system, perform proactive/reactive maintenance, and document system information
- Common troubleshooting procedures involve isolating and determining the cause of system problems and implementing and testing solutions that can be documented for future use
- System problems can be categorized as hardware- or software-related
- IRQ conflicts, invalid hardware settings, absence of kernel support, and hard disk failure are common hardware-related problems
Summary (continued)
- Software-related problems can be application-related or OS-related
- Absence of program dependencies or shared libraries, program limits, and resource conflicts are common application-related problems, whereas boot failure, filesystem corruption, and the misconfiguration of serial devices are common OS-related problems
- System performance is affected by a variety of hardware and software factors
Summary (continued)
- The sysstat package contains many useful performance monitoring commands
- System features and hardware support are compiled into the kernel or provided by modules
- You can compile a Linux kernel with only the necessary features and support to increase system performance