CS5226 2002 Hardware Tuning - PowerPoint PPT Presentation

1
CS5226 2002 Hardware Tuning
  • Xiaofang Zhou
  • School of Computing, NUS
  • Office: S16-08-20
  • Email: zhouxf@comp.nus.edu.sg
  • URL: www.itee.uq.edu.au/zxf

2
Outline
  • Part 1: Tuning the storage subsystem
  • RAID storage system
  • Choosing a proper RAID level
  • Part 2: Enhancing the hardware configuration

3
Modern Storage Subsystem
  • More than just a disk
  • Disks, or disk arrays
  • Connections between disks and processors
  • Software to manage and configure devices
  • A logical volume for multiple devices
  • A file system to manage data layout

4
RAID Storage System
  • Redundant Array of Inexpensive Disks
  • Combine multiple small, inexpensive disk drives
    into a group to yield performance exceeding that
    of one large, more expensive drive
  • Appear to the computer as a single virtual drive
  • Supports fault tolerance by redundantly storing
    information in various ways

5
Data Striping
  • File blocks (e.g., 8KB per block) are striped
    round-robin across the disks (Disk 1 to Disk 6)
  • Stripe units: blocks 1-6, 7-12, ...
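The round-robin placement above can be sketched in Python (a hypothetical `locate_block` helper, not from the slides), assuming one block per stripe unit and six disks:

```python
def locate_block(block_num, num_disks=6):
    """Map a 1-based file block number to (disk, stripe) under
    round-robin striping, one block per stripe unit."""
    disk = (block_num - 1) % num_disks + 1     # disks numbered 1..num_disks
    stripe = (block_num - 1) // num_disks + 1  # stripe rows numbered 1..
    return disk, stripe

# Blocks 1-6 fill stripe 1 across disks 1-6; blocks 7-12 fill stripe 2.
```

A sequential scan of twelve blocks thus touches every disk twice, which is where the parallel I/O gain comes from.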
6
Parity Check - Classical
  • An extra bit added to a byte to reveal errors in
    storage or transmission
  • Even (odd) parity means that the parity bit is
    set so that there are an even (odd) number of one
    bits in the word, including the parity bit
  • A single parity bit can only reveal single bit
    errors since if an even number of bits are wrong
    then the parity bit will not change
  • It is not possible to tell which bit is wrong
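The even-parity scheme above can be illustrated with a short Python sketch (helper names are illustrative, not from the slides):

```python
def even_parity_bit(byte):
    """Parity bit chosen so the total count of 1-bits, including
    the parity bit itself, is even."""
    return bin(byte).count("1") % 2

def passes_check(byte, parity):
    """True if no error is detected on read/receive."""
    return even_parity_bit(byte) == parity

data = 0b10110010                       # four 1-bits, so parity = 0
p = even_parity_bit(data)
assert passes_check(data, p)            # clean word passes
assert not passes_check(data ^ 0b1, p)  # single-bit flip is revealed
assert passes_check(data ^ 0b11, p)     # double-bit flip goes unnoticed
```

The last line shows the limitation stated in the slide: an even number of flipped bits leaves the parity unchanged, and even when an error is revealed, nothing says which bit is wrong.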

7
Parity Check - Checksum
  • A computed value based on the content of a block
    of data
  • Transmitted or stored along with the data to
    detect data corruption
  • Recomputed at the receiver end to compare with
    the one received
  • Detects all errors involving an odd number of
    bits, and most errors involving an even number
  • It is computed by summing the bytes of the data
    block ignoring overflow
  • Other parity check methods, such as Hamming Code,
    can also correct errors
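A minimal sketch of the sum-of-bytes checksum described above (truncating to 8 bits stands in for "ignoring overflow"; the width is an assumption, real schemes often use 16 or 32 bits):

```python
def checksum(block: bytes) -> int:
    """Sum the bytes of the block, ignoring overflow (keep 8 bits)."""
    return sum(block) & 0xFF

data = b"hardware tuning"
stored = checksum(data)                 # stored or sent along with the data

# Receiver recomputes and compares with the value it received:
assert checksum(data) == stored
corrupted = bytes([data[0] ^ 0xFF]) + data[1:]
assert checksum(corrupted) != stored    # corruption detected
```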

8
RAID Types
  • Five types of array architectures, RAID 1 to RAID 5
  • Different disk fault-tolerance
  • Different trade-offs in features and performance
  • A non-redundant array of disk drives is often
    referred to as RAID 0
  • Only RAID 1, 3 and 5 are commonly used
  • RAID 2 and 4 do not offer any significant
    advantages over these other types
  • Certain combinations are possible (10, 35, etc.)
  • RAID 10 = RAID 1 + RAID 0

9
RAID 0 - Striping
  • No redundancy
  • No fault tolerance
  • High I/O performance
  • Parallel I/O

10
RAID 1 - Mirroring
  • Provide good fault tolerance
  • Works ok if one disk in a pair is down
  • One write = a physical write on each disk
  • One read = read from either disk, or from the
    less busy one
  • Could double the read rate

11
RAID 3 - Parallel Array with Parity
  • Fast read/write

12
RAID 5 - Parity Checking
  • For error correction, rather than full redundancy
  • Each stripe unit has an extra parity stripe
  • Parity stripes are distributed

13
RAID 5 Read/Write
  • Read: parallel stripes read from multiple disks
  • Good performance
  • Write: 2 reads + 2 writes
  • Read old data stripe and parity stripe (2 reads)
  • XOR old data stripe with the replacing one
  • Take result of XOR and XOR with parity stripe
  • Write new data stripe and new parity stripe (2
    writes)
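The small-write procedure above can be traced byte-wise in Python (a sketch; stripes here are single bytes for brevity):

```python
def raid5_new_parity(old_data, new_data, old_parity):
    """Steps 2-3 above: XOR old data with the replacing data, then
    XOR that delta into the old parity to get the new parity."""
    delta = bytes(a ^ b for a, b in zip(old_data, new_data))
    return bytes(p ^ d for p, d in zip(old_parity, delta))

# Three data stripes; parity is the XOR of all of them.
d1, d2, d3 = b"\x0f", b"\xf0", b"\x33"
parity = bytes(a ^ b ^ c for a, b, c in zip(d1, d2, d3))

# Replace d2 without touching d1 and d3 (reads: old d2 and parity):
new_d2 = b"\xaa"
new_parity = raid5_new_parity(d2, new_d2, parity)

# Same result as recomputing parity from scratch over all stripes:
assert new_parity == bytes(a ^ b ^ c for a, b, c in zip(d1, new_d2, d3))
```

This is why a RAID 5 write costs only 2 reads and 2 writes no matter how many disks are in the array.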

14
RAID 10 - Striped Mirroring
  • RAID 10 = striping + mirroring
  • A striped array of RAID 1 arrays
  • High performance of RAID 0, and high tolerance of
    RAID 1 (at the cost of doubling disks)

More information about RAID disks at
http://www.acnc.com/04_01_05.html
15
Comparing RAID Levels
16
What RAID Provides
  • Fault tolerance
  • It does not prevent disk drive failures
  • It enables real-time data recovery
  • High I/O performance
  • Mass data capacity
  • Configuration flexibility
  • Lower protected storage costs
  • Easy maintenance

17
Hardware vs. Software RAID
  • Software RAID
  • Software RAID runs on the server's CPU
  • Directly dependent on server CPU performance and
    load
  • Occupies host system memory and CPU operation,
    degrading server performance
  • Hardware RAID
  • Hardware RAID runs on the RAID controller's CPU
  • Does not occupy any host system memory. Is not
    operating system dependent
  • Host CPU can execute applications while the array
    adapter's processor simultaneously executes array
    functions: true hardware multi-tasking

18
RAID Levels - Data
  • Settings
  • accounts(number, branchnum, balance)
  • create clustered index c on accounts(number)
  • 100000 rows
  • Cold Buffer
  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID
    controller from Adaptec (80Mb), 4x18Gb drives
    (10000RPM), Windows 2000.

19
RAID Levels - Transactions
  • No Concurrent Transactions
  • Read Intensive
  • select avg(balance) from accounts
  • Write Intensive, e.g. typical insert
  • insert into accounts values (690466,6840,2272.76)
  • Writes are uniformly distributed.

20
RAID Levels
  • SQL Server 7 on Windows 2000 (SoftRAID means
    striping/parity at host)
  • Read-Intensive
  • Using multiple disks (RAID 0, RAID 10, RAID 5)
    increases throughput significantly.
  • Write-Intensive
  • Without cache, RAID 5 suffers. With cache, it is
    ok.

21
Which RAID Level to Use?
  • Log File
  • RAID 1 is appropriate
  • Fault tolerance with high write throughput.
    Writes are synchronous and sequential. No
    benefits in striping.
  • Temporary Files
  • RAID 0 is appropriate.
  • No fault tolerance. High throughput.
  • Data and Index Files
  • RAID 5 is best suited for read intensive apps or
    if the RAID controller cache is effective enough.
  • RAID 10 is best suited for write intensive apps.

22
Controller Prefetching: No, Write-back: Yes
  • Read-ahead
  • Prefetching at the disk controller level.
  • No information on access pattern.
  • Better to let database management system do it.
  • Write-back vs. write through
  • Write-back: transfer is terminated as soon as
    data is written to cache
  • Batteries guarantee the write-back in case of
    power failure
  • Write-through: transfer is terminated only once
    data is written to disk

23
SCSI Controller Cache - Data
  • Settings
  • employees(ssnum, name, lat, long, hundreds1,
    hundreds2)
  • create clustered index c on employees(hundreds2)
  • Employees table partitioned over two disks; log
    on a separate disk, same controller (same
    channel)
  • 200 000 rows per table
  • Database buffer size limited to 400 Mb.
  • Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID
    controller from Adaptec (80Mb), 4x18Gb drives
    (10000RPM), Windows 2000.

24
SCSI (not disk) Controller Cache - Transactions
  • No Concurrent Transactions
  • update employees set lat = long, long = lat where
    hundreds2 = ?
  • Cache friendly: update of 20,000 rows (90Mb)
  • Cache unfriendly: update of 200,000 rows (900Mb)

25
SCSI Controller Cache
  • SQL Server 7 on Windows 2000.
  • Adaptec ServerRaid controller
  • 80 Mb RAM
  • Write-back mode
  • Updates
  • Controller cache increases throughput whether
    operation is cache friendly or not.
  • Efficient replacement policy!

26
Enhancing Hardware Config.
  • Add memory
  • Cheapest option to get better performance
  • Can be used to enlarge DB buffer pool
  • Better hit ratio
  • If used to enlarge the OS buffer (as disk cache),
    other apps benefit as well
  • Add disks
  • Add processors

27
Add Disks
  • Larger disks ≠ better performance
  • Bottleneck is disk bandwidth
  • Add disks for
  • A dedicated disk for the log
  • Switch RAID5 to RAID10 for update-intensive apps
  • Move secondary indexes to another disk for
    write-intensive apps
  • Partition read-intensive tables across many disks
  • Consider intelligent disk systems
  • Automatic replication and load balancing

28
Add Processors
  • Function parallelism
  • Use different processors for different tasks
  • GUI, Query Optimisation, TTCC, different types
    of apps, different users
  • Operation pipelines
  • E.g., scan, sort, select, join
  • Easy for RO apps, hard for update apps
  • Data partition parallelism
  • Partition the data, and thus the operations on
    the data

29
Parallel Join Processing
  • Algorithm: decompose and process in parallel
  • T = R ⋈ S
  • Let f: A → {1..n} (a hash function)
  • R = ∪i=1..n Ri, Ri = {r ∈ R | f(r.A) = i}
  • S = ∪i=1..n Si, Si = {s ∈ S | f(s.A) = i}
  • T = ∪i=1..n (Ri ⋈ Si)
  • Issues
  • Data distribution, task decomposition and load
    balancing are non-trivial
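The decomposition above can be sketched in single-process Python (the `partition` and `join_fragment` helpers are illustrative; in a real system each fragment pair would run on a different processor):

```python
from collections import defaultdict

def partition(rows, key, n):
    """f: A -> {0..n-1}; hash-partition rows on the join attribute."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def join_fragment(rs, ss, key):
    """Hash join of one (Ri, Si) fragment pair."""
    index = defaultdict(list)
    for r in rs:
        index[r[key]].append(r)
    return [(r, s) for s in ss for r in index[s[key]]]

n = 4
R = [{"a": i, "rv": i * 10} for i in range(8)]
S = [{"a": i % 4, "sv": i} for i in range(8)]
Rp, Sp = partition(R, "a", n), partition(S, "a", n)

# Matching rows always hash to the same fragment, so the fragment
# joins are independent and their union is the full join.
T = [pair for i in range(n) for pair in join_fragment(Rp[i], Sp[i], "a")]
assert all(r["a"] == s["a"] for r, s in T)
```

Because matching tuples share a hash value, the n fragment joins never need to communicate, which is what makes the algorithm parallelisable.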

30
Parallelism
  • Some tasks are easier to parallelise
  • E.g., scan, join, sum, min
  • Some tasks are not so easy
  • E.g., sorting, avg, nested-queries

31
Parallel DB Architectures
  • Shared memory
  • Tightly coupled, easy-to-use, but not scalable
    (bottlenecks when accessing shared memory and
    disks)
  • Shared nothing
  • A distributed architecture with message-passing
    as the only communication mechanism
  • Highly scalable
  • Difficult for load distribution and balancing
  • Shared disk
  • A trade-off, but towards the shared-memory end

32
Summary
  • In this module, we have covered
  • The storage subsystem
  • RAID: what is it, and which level to use?
  • Memory, disks and processors
  • When to add what?