Multiple Device Driver Linux Software RAID

1
Multiple Device Driver (Linux Software RAID)
  • Ted Baker, Andy Wang
  • CIS 4930 / COP 5641

2
The md driver
  • Provides virtual devices
  • Created from one or more independent underlying
    devices
  • The basic mechanism to support RAIDs
  • Redundant arrays of inexpensive disks

3
Common RAID levels
  • RAID0
  • Striping
  • RAID1
  • Mirroring
  • RAID4 (>= 3 disks)
  • Striped array with a parity device
  • RAID5 (>= 3 disks)
  • Striped array with distributed parity
  • RAID6 (>= 4 disks)
  • Striped array with dual redundancy information

4
Common RAID levels
  • RAID10
  • Striped array of mirrored disks
  • RAID01
  • Mirroring two RAID0s
  • RAID50
  • Striped array of RAID5s
  • RAID51
  • Mirroring two RAID5s

5
md pseudo RAID configurations
  • Linear (catenates multiple disks into a single
    one)
  • Multipath
  • A set of different interfaces to the same device
    (e.g., multiple disk controllers)
  • Faulty
  • A layer over a single device into which errors
    can be injected

6
RAID Creation
  • > mdadm --create /dev/md0 --level=1
    --raid-devices=2 /dev/hd[ac]1
  • Creates /dev/md0 as a RAID1 array
  • Consisting of /dev/hda1 and /dev/hdc1

7
RAID Status
  • To check the status of RAIDs
  • See /proc/mdstat
  • Personalities : [raid1]
  • md0 : active raid1 sda5[0] sdb5[1]
  •       979840 blocks [2/2] [UU]
  • md1 : active raid1 sda6[2] sdb6[1]
  •       159661888 blocks [2/1] [_U]
  •       [===>.................] recovery = 17.9%
          (28697920/159661888) finish=56.4min
          speed=38656K/sec
  • unused devices: <none>
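
The recovery line for md1 can be checked by hand. The sketch below assumes that the block counts are 1 KB blocks and that the speed is in the same unit per second; the function names are made up for illustration and are not part of md.

```c
#include <stdint.h>

/* Percentage of the array already rebuilt, from the "(done/total)" pair. */
static double recovery_percent(uint64_t done_blocks, uint64_t total_blocks)
{
    return 100.0 * (double)done_blocks / (double)total_blocks;
}

/* Estimated minutes left, assuming the reported speed stays constant
 * and is expressed in blocks (1 KB, by assumption) per second. */
static double recovery_minutes_left(uint64_t done_blocks,
                                    uint64_t total_blocks,
                                    uint64_t blocks_per_sec)
{
    uint64_t remaining = total_blocks - done_blocks;
    return (double)remaining / (double)blocks_per_sec / 60.0;
}
```

With the numbers above, 28697920 of 159661888 blocks is just under 18 percent, and the remaining 130963968 blocks at 38656 K/sec take a little over 56 minutes, which agrees with the displayed values.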

8
md Super Block
  • Each device in a RAID may have a superblock with
    various information
  • Level
  • UUID
  • 128 bit identifier that identifies an array

9
Some RAID Concepts
  • Personality
  • RAID level
  • Chunk size
  • Power of two >= 4KB
  • A RAID assigns chunks to disks in a round robin
    fashion
  • Stripe
  • The ith chunks of all the disks together form a
    stripe
  • Parity
  • A chunk constructed via XORing other chunks
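
A minimal user-space sketch of these concepts, assuming a plain striped layout with no parity rotation (RAID4 and RAID5 additionally reserve one chunk per stripe for parity). The names locate_chunk and xor_parity are illustrative, not md functions.

```c
#include <stddef.h>
#include <stdint.h>

struct chunk_location {
    unsigned disk;    /* which disk holds the chunk */
    uint64_t stripe;  /* index of the chunk on that disk */
};

/* Round-robin placement: logical chunk k goes to disk k mod n,
 * and the chunks with the same index on every disk form one stripe. */
static struct chunk_location locate_chunk(uint64_t chunk, unsigned ndisks)
{
    struct chunk_location loc;
    loc.disk = (unsigned)(chunk % ndisks);
    loc.stripe = chunk / ndisks;
    return loc;
}

/* Parity chunk = byte-wise XOR of the given chunks. XORing the parity
 * with the surviving chunks reproduces a missing chunk. */
static void xor_parity(const uint8_t *const *chunks, size_t nchunks,
                       size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;
        for (size_t c = 0; c < nchunks; c++)
            p ^= chunks[c][i];
        out[i] = p;
    }
}
```

Calling xor_parity on the surviving data chunks plus the parity chunk rebuilds the lost chunk, which is the basis of recovery in the parity-based levels.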

10
Synchrony
  • An update may involve both the data block and the
    parity block
  • Implications
  • A RAID may be shut down in an inconsistent state
  • Resynchronization may be required at startup, in
    the background
  • Reduced performance
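
Why a shutdown between the two writes leaves the array inconsistent can be seen in a small sketch. It assumes a two-data-disk stripe and the usual read-modify-write identity (new parity = old parity XOR old data XOR new data); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read-modify-write parity update for one rewritten data chunk. */
static void update_parity(uint8_t *parity, const uint8_t *old_data,
                          const uint8_t *new_data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        parity[i] ^= (uint8_t)(old_data[i] ^ new_data[i]);
}

/* Returns 1 if the parity chunk equals d0 XOR d1, else 0. */
static int stripe_in_sync(const uint8_t *d0, const uint8_t *d1,
                          const uint8_t *parity, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if ((uint8_t)(d0[i] ^ d1[i]) != parity[i])
            return 0;
    return 1;
}
```

If the new data reaches the disk but update_parity has not yet been applied, stripe_in_sync reports 0; that is the state resynchronization has to repair at startup.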

11
Recovery
  • If the md driver detects a write error, it
    immediately disables that device
  • Continues operation on the remaining devices
  • Starts recreating the content if there is a spare
    drive

12
Recovery
  • If the md driver detects a read error, it
  • Overwrites the bad block with the correct data
    recovered from the other devices
  • Reads the block again
  • If the write or the re-read fails, treats it as a
    write error
  • Recovery is a background process
  • Can be configured via
  • /proc/sys/dev/raid/speed_limit_min
  • /proc/sys/dev/raid/speed_limit_max
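
The read-error steps above can be sketched as a small decision function. This is a toy model and not md code: struct toy_dev and its helpers are invented for illustration, and the rewrite is assumed to use data recovered from the other devices.

```c
#include <stdbool.h>

enum read_outcome { READ_OK, READ_CORRECTED, DEVICE_FAILED };

/* Toy device: the next bad_reads reads fail; rewriting works only
 * if writable is true. */
struct toy_dev {
    int bad_reads;
    bool writable;
};

static bool toy_read(struct toy_dev *d)
{
    if (d->bad_reads > 0) {
        d->bad_reads--;
        return false;
    }
    return true;
}

static bool toy_rewrite(struct toy_dev *d)
{
    return d->writable;
}

/* Read; on error overwrite the block and read it back; if either
 * step fails, handle it like a write error (disable the device). */
static enum read_outcome handle_read(struct toy_dev *d)
{
    if (toy_read(d))
        return READ_OK;
    if (!toy_rewrite(d))
        return DEVICE_FAILED;
    if (!toy_read(d))
        return DEVICE_FAILED;
    return READ_CORRECTED;
}
```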

13
Bitmap Write-Intent Logging
  • Records which blocks of the array may be out of
    sync
  • Speeds up resynchronization
  • Allows a disk to be temporarily removed and
    reinserted without causing an enormous recovery
    cost
  • Can spin down disks for power savings

14
Bitmap Write-Intent Logging
  • Can be stored on a separate device
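
A minimal sketch of the write-intent idea: one bit per region is set before a write is issued and cleared once all devices hold the data, so after a crash only the regions whose bits are still set need resynchronizing. The region size, array size, and function names are assumptions for illustration; the real md bitmap format is more involved.

```c
#include <stdint.h>
#include <string.h>

#define REGION_SECTORS 2048u  /* assumed: one bit covers 2048 sectors */
#define MAX_REGIONS    1024u  /* assumed toy array size */

struct intent_bitmap {
    uint8_t bits[MAX_REGIONS / 8];
};

static void bitmap_init(struct intent_bitmap *bm)
{
    memset(bm->bits, 0, sizeof bm->bits);
}

/* Call before issuing the data write. Returns 0, or -1 if the sector
 * lies beyond the toy array. */
static int mark_dirty(struct intent_bitmap *bm, uint64_t sector)
{
    uint64_t region = sector / REGION_SECTORS;
    if (region >= MAX_REGIONS)
        return -1;
    bm->bits[region / 8] |= (uint8_t)(1u << (region % 8));
    return 0;
}

/* Call once every device has completed the write to this region. */
static void mark_clean(struct intent_bitmap *bm, unsigned region)
{
    if (region < MAX_REGIONS)
        bm->bits[region / 8] &= (uint8_t)~(1u << (region % 8));
}

static int is_dirty(const struct intent_bitmap *bm, unsigned region)
{
    if (region >= MAX_REGIONS)
        return 0;
    return (bm->bits[region / 8] >> (region % 8)) & 1u;
}

/* Number of regions a resync would have to visit. */
static unsigned regions_to_resync(const struct intent_bitmap *bm)
{
    unsigned n = 0;
    for (unsigned r = 0; r < MAX_REGIONS; r++)
        n += (unsigned)is_dirty(bm, r);
    return n;
}
```

The same bookkeeping explains the cheap re-insertion of a temporarily removed disk: only regions written while it was absent remain marked.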

15
Write-Behind
  • Certain devices in the array can be flagged as
    write-mostly
  • md will not wait for writes to write-behind
    devices to complete before returning to the file
    system

16
Restriping (Reshaping)
  • Change the number of disks
  • Change the RAID levels
  • Not robust against failures

17
Faulty.c
  • static int __init raid_init(void) {
  •   return register_md_personality(&faulty_personality);
  • }
  • static void raid_exit(void) {
  •   unregister_md_personality(&faulty_personality);
  • }
  • module_init(raid_init);
  • module_exit(raid_exit);

18
Faulty.c
  • static struct mdk_personality faulty_personality = {
  •   .name = "faulty",
  •   .level = LEVEL_FAULTY,
  •   .owner = THIS_MODULE,
  •   .make_request = make_request,
  •   .run = run,
  •   .stop = stop,
  •   .status = status,
  •   .reconfig = reconfig,
  • };

19
Faulty.c
typedef struct faulty_conf {
  int period[Modes];
  atomic_t counters[Modes];
  sector_t faults[MaxFault];
  int modes[MaxFault];
  int nfaults;
  mdk_rdev_t *rdev;
} conf_t;
  • static int run(mddev_t *mddev) {
  •   mdk_rdev_t *rdev;
  •   struct list_head *tmp;
  •   int i;
  •   conf_t *conf = kmalloc(sizeof(*conf), GFP_KERNEL);
  •   ... /* zero out conf */
  •   ITERATE_RDEV(mddev, rdev, tmp) conf->rdev = rdev;
  •   mddev->array_size = mddev->size;
  •   mddev->private = conf;
  •   reconfig(mddev, mddev->layout, -1);
  •   return 0;
  • }

20
Faulty.c
  • static int reconfig(mddev_t *mddev, int layout,
  •                     int chunk_size) {
  •   int mode = layout & ModeMask;
  •   int count = layout >> ModeShift;
  •   conf_t *conf = mddev->private;
  •   ... /* error checks */
  •   if (mode == /* clear something */)
  •     /* clear various counters */
  •   else if (mode < Modes) {
  •     conf->period[mode] = count;
  •     if (!count) count++;
  •     atomic_set(&conf->counters[mode], count);
  •   } else ...
  •   return 0;
  • }
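
The first two lines of reconfig() unpack one integer into a fault mode and a period. A small user-space sketch of that packing follows; the shift of 5 and mask of 0x1f are assumptions chosen for illustration and should be checked against the ModeShift and ModeMask definitions in the driver source. The helper names are hypothetical.

```c
/* Assumed values for illustration only. */
#define MODE_SHIFT 5
#define MODE_MASK  0x1f

struct fault_request {
    int mode;   /* which failure mode to configure */
    int count;  /* period: fail once every count requests */
};

/* Pack a mode and a period into a single layout word. */
static int encode_layout(int mode, int count)
{
    return (count << MODE_SHIFT) | (mode & MODE_MASK);
}

/* Mirror of the two assignments at the top of reconfig(). */
static struct fault_request decode_layout(int layout)
{
    struct fault_request r;
    r.mode = layout & MODE_MASK;
    r.count = layout >> MODE_SHIFT;
    return r;
}
```

Reusing the existing layout field this way lets the personality be reconfigured through the normal md layout-change path without a new interface.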

21
Faulty.c
  • static int stop(mddev_t *mddev) {
  •   conf_t *conf = (conf_t *)mddev->private;
  •   kfree(conf);
  •   mddev->private = NULL;
  •   return 0;
  • }

22
Faulty.c
  • static int make_request(request_queue_t *q,
    struct bio *bio) {
  •   mddev_t *mddev = q->queuedata;
  •   conf_t *conf = (conf_t *)mddev->private;
  •   int failit = 0;
  •   if (bio_data_dir(bio) == WRITE) { /* data direction */
  •     ... /* misc cases */
  •     if (check_sector(conf, bio->bi_sector,
  •                      bio->bi_sector + (bio->bi_size >> 9), WRITE))
  •       failit = 1; /* if a sector failed before, fail again */
  •     if (check_mode(conf, WritePersistent)) {
  •       /* if the period is reached for a sector, record the sector and fail it */
  •       add_sector(conf, bio->bi_sector, WritePersistent);
  •       failit = 1;
  •     }
  •     ...

23
Faulty.c
  •   } else { /* failure cases for reads */
  •     ...
  •   }
  •   if (failit) {
  •     struct bio *b = bio_clone(bio, GFP_NOIO);
  •     b->bi_bdev = conf->rdev->bdev;
  •     b->bi_private = bio;
  •     b->bi_end_io = faulty_fail;
  •     generic_make_request(b);
  •     return 0;
  •   } else {
  •     bio->bi_bdev = conf->rdev->bdev;
  •     return 1;
  •   }
  • }

To the queue of this device, initialized in md.c
from the disk device inode.
Let the main block layer submit the IO and
resolve the recursion.

24
ll_rw_blk.c
  • A file system eventually calls
    generic_make_request()
  • void generic_make_request(struct bio *bio) {
  •   ...
  •   do {
  •     q = bdev_get_queue(bio->bi_bdev);
  •     ... /* check errors */
  •     ret = q->make_request_fn(q, bio);
  •   } while (ret);
  • }
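
The do/while loop is what lets a stacking driver return 1 instead of calling itself recursively: the driver only redirects the request, and the loop hands it to the next device. A user-space sketch of that convention, with invented struct and function names standing in for the kernel types:

```c
#include <stddef.h>
#include <stdint.h>

struct toy_bio;

struct toy_device {
    const char *name;
    int (*make_request)(struct toy_device *self, struct toy_bio *bio);
    struct toy_device *lower;  /* device underneath, if any */
    uint64_t offset;           /* sector offset applied when remapping */
    unsigned completed;        /* requests handled by this device */
};

struct toy_bio {
    struct toy_device *dev;    /* plays the role of bi_bdev */
    uint64_t sector;           /* plays the role of bi_sector */
};

/* Stacking driver: redirect the bio and return 1 so the caller
 * resubmits it to the lower device. */
static int remap_make_request(struct toy_device *self, struct toy_bio *bio)
{
    bio->dev = self->lower;
    bio->sector += self->offset;
    return 1;
}

/* Bottom-level driver: handle the request and return 0. */
static int disk_make_request(struct toy_device *self, struct toy_bio *bio)
{
    (void)bio;
    self->completed++;
    return 0;
}

/* Same shape as the loop in generic_make_request(). */
static void toy_submit(struct toy_bio *bio)
{
    int ret;
    do {
        struct toy_device *dev = bio->dev;
        ret = dev->make_request(dev, bio);
    } while (ret);
}
```

Because each level returns to the loop before the next level runs, the stack depth stays constant no matter how many virtual devices are layered.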

25
Faulty.c
  • static int faulty_fail(struct bio *bio,
  •                        unsigned int bytes_done,
                           int error) {
  •   struct bio *b = bio->bi_private;
  •   b->bi_size = bio->bi_size;
  •   b->bi_sector = bio->bi_sector;
  •   if (bio->bi_size == 0)
  •     bio_put(bio);
  •   clear_bit(BIO_UPTODATE, &b->bi_flags);
  •   return (b->bi_end_io)(b, bytes_done, -EIO);
  • }

26
Linear.c
  • static int __init linear_init(void) {
  •   return register_md_personality(&linear_personality);
  • }
  • static void linear_exit(void) {
  •   unregister_md_personality(&linear_personality);
  • }
  • module_init(linear_init);
  • module_exit(linear_exit);

27
Linear.c
  • static struct mdk_personality linear_personality = {
  •   .name = "linear",
  •   .level = LEVEL_LINEAR,
  •   .owner = THIS_MODULE,
  •   .make_request = linear_make_request,
  •   .run = linear_run,
  •   .stop = linear_stop,
  •   .status = linear_status, /* for proc */
  •   .hot_add_disk = linear_add,
  • };

28
Linear.c
  • static int linear_run(mddev_t *mddev) {
  •   linear_conf_t *conf;
  •   /* initialize */
  •   conf = linear_conf(mddev, mddev->raid_disks);
  •   if (!conf) return 1;
  •   mddev->private = conf;
  •   mddev->array_size = conf->array_size; /* in 1 KB blocks */
  •   ...

typedef struct linear_private_data {
  struct linear_private_data *prev;
  dev_info_t **hash_table; /* to track disk boundaries */
  sector_t hash_spacing;
  sector_t array_size;
  int preshift;
  dev_info_t disks[0];
} linear_conf_t;

29
Linear.c
  •   ...
  •   /* determines whether two bios can be merged */
  •   /* overrides the default merge_bvec function */
  •   blk_queue_merge_bvec(mddev->queue,
        linear_mergeable_bvec);
  •   /* queues are first plugged to build up the
        queue length, then unplugged to release requests
        to devices */
  •   mddev->queue->unplug_fn = linear_unplug;
  •   mddev->queue->issue_flush_fn = linear_issue_flush;
  •   /* disable prefetching when the device is
        congested */
  •   mddev->queue->backing_dev_info.congested_fn =
        linear_congested;
  •   mddev->queue->backing_dev_info.congested_data =
        mddev;
  •   return 0;
  • }

30
Linear.c
  • static int linear_stop(mddev_t *mddev) {
  •   linear_conf_t *conf = mddev_to_conf(mddev);
  •   /* the unplug fn references 'conf' */
  •   blk_sync_queue(mddev->queue);
  •   do {
  •     linear_conf_t *t = conf->prev;
  •     kfree(conf->hash_table);
  •     kfree(conf);
  •     conf = t;
  •   } while (conf);
  •   return 0;
  • }

31
Linear.c
  • static int linear_make_request(request_queue_t *q,
  •                                struct bio *bio) {
  •   const int rw = bio_data_dir(bio);
  •   mddev_t *mddev = q->queuedata;
  •   dev_info_t *tmp_dev;
  •   sector_t block;
  •   ... /* check for errors and update statistics */
  •   tmp_dev = which_dev(mddev, bio->bi_sector);
  •   block = bio->bi_sector >> 1;
  •   ... /* more error checks */

32
Linear.c
  •   if (unlikely(bio->bi_sector + (bio->bi_size >> 9) >
  •                (tmp_dev->offset + tmp_dev->size) << 1)) {
  •     /* This bio crosses a device boundary, so we
          have to split it. */
  •     struct bio_pair *bp;
  •     bp = bio_split(bio, bio_split_pool,
  •                    ((tmp_dev->offset + tmp_dev->size) << 1)
  •                    - bio->bi_sector);
  •     if (linear_make_request(q, &bp->bio1)) /* recursion!? */
  •       generic_make_request(&bp->bio1);
  •     if (linear_make_request(q, &bp->bio2)) /* recursion! */
  •       generic_make_request(&bp->bio2);
  •     bio_pair_release(bp); /* remove bio hazard */
  •     return 0;
  •   }

33
Linear.c
Points to the specific device.
  •   bio->bi_bdev = tmp_dev->rdev->bdev;
  •   bio->bi_sector = bio->bi_sector -
        (tmp_dev->offset << 1)
  •     + tmp_dev->rdev->data_offset;
  •   return 1;
  • }

Translates the virtual sector number to the
physical sector number for the specific device.
Again, let the main block layer submit the IO and
resolve the recursion.
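
The address arithmetic of the linear personality can be tried out in user space. The sketch below assumes, as the shifts by 1 in the code suggest, that offset and size are counted in 1 KB blocks while bio sectors are 512 bytes. It also swaps the driver's hash-table lookup for a plain linear scan, and all names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_dev_info {
    uint64_t offset_kb;    /* start of this device within the array */
    uint64_t size_kb;      /* usable size of this device */
    uint64_t data_offset;  /* first data sector on the device */
};

/* Linear scan standing in for the driver's hash-table lookup. */
static const struct toy_dev_info *toy_which_dev(const struct toy_dev_info *devs,
                                                size_t ndevs, uint64_t sector)
{
    uint64_t block = sector >> 1;  /* 512-byte sectors to 1 KB blocks */
    for (size_t i = 0; i < ndevs; i++)
        if (block >= devs[i].offset_kb &&
            block < devs[i].offset_kb + devs[i].size_kb)
            return &devs[i];
    return NULL;
}

/* Virtual sector to physical sector on the chosen device. */
static uint64_t to_physical(const struct toy_dev_info *d, uint64_t sector)
{
    return sector - (d->offset_kb << 1) + d->data_offset;
}

/* Sectors that still fit on d; 0 means the request does not cross
 * the device boundary and needs no split. */
static uint64_t split_point(const struct toy_dev_info *d, uint64_t sector,
                            uint64_t nsectors)
{
    uint64_t end = (d->offset_kb + d->size_kb) << 1;
    if (sector + nsectors > end)
        return end - sector;
    return 0;
}
```

For two devices of 100 KB and 50 KB with a data offset of 8 sectors, virtual sector 210 falls on the second device at physical sector 18, and an 8-sector request starting at sector 196 is split after 4 sectors.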