Linux Training : m. RAID

RAID

Redundant Array of Inexpensive Disks (or Really Awful In Datacenter) is a technique for combining multiple drives to provide larger, more redundant, and/or faster data storage than a single drive can provide alone.

  • Hardware RAID vs. Software RAID

    Hardware RAID uses controller cards or system chipsets to manage the drives. Because the controller does the work itself, hardware RAID does well when checksums (parity) have to be computed. The better controllers have a battery backup on the card so the cache is retained during a power failure.
    Software RAID is extremely flexible, at the (slight) cost of being a tiny bit slower. Drives from a software RAID array can be moved into a new system and the array "Just Works". There is no battery-backed cache, so surviving a power failure requires battery backup for the entire system, with automatic shutdown.
  • RAID levels 0, 1, 5, 6, 10

    The levels differ in how they store data and in how much usable storage they provide. Storage sizes below are given in units of one drive's capacity, where n is the number of drives in the array. The levels are distinguished by how they use disk mirroring, data striping, and parity blocks.
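    To make the idea of a parity block concrete, here is a minimal Python sketch, assuming the parity block is the byte-wise XOR of the data blocks in one stripe (the usual scheme for single parity). The block contents and the helper name xor_blocks are made up for illustration; a real array works on much larger blocks and rotates parity across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data drive in a single stripe.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild its block from the
# surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

    XOR-ing the survivors cancels every block that is still present and leaves only the missing one, which is why one drive's worth of parity can cover the loss of any single drive.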
    • 0 Striping across all drives in the array. Storage size is n.
      • Very high speed read and write
      • Lose any one drive and lose all the data on all the drives (mostly - some recovery tools exist)
    • 1 Mirroring across pairs of drives. Storage size is n/2 for mirror pairs (n/3 for triplets)
      • Very high speed read (sequential blocks can be pulled from multiple drives)
      • Writes are only as fast as the slowest drive
      • High data reliability - lose any one drive and the other drive(s) still have the data.
      • Recovery times are decent, but performance during recovery is reduced
    • 5 Striped with parity. Storage size is n-1
      • Parity calculation adds overhead.
      • Write speed is roughly that of the slowest drive.
      • Read speed is up to (n-1) times the slowest drive's speed
      • Lose one drive = can recover
      • Lose 2 drives = data loss
      • Recovery time is long (longer with software RAID) and performance during recovery can be very bad
    • 6 Striped with dual parity, using two drives' worth of parity. Storage size is n-2
      • Same as RAID 5, except that the speed loss of one more drive is offset by the ability to lose 2 drives without data loss.
      • Recovery time is better than RAID 5, and so is performance during recovery
    • 10 Mirroring with striping. Drives are mirrored in pairs and data is striped across the mirror pairs. Storage size is still n/2
      • Write speed is still single drive speed
      • Read speed is very fast
      • Can lose 1 disk per mirror pair with no data loss
      • Recovery time is decent and performance during recovery is slightly impacted
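    The storage sizes listed for each level can be collected into one small calculation. This is a sketch only, assuming equal-sized drives and using the formulas given in the list above; the function name usable_drives, the copies parameter, and the 4 TB drive size are hypothetical choices for the example.

```python
def usable_drives(level, n, copies=2):
    """Return usable capacity, in units of one drive, for an n-drive array.

    copies applies to RAID 1 only: 2 for mirror pairs, 3 for triplets.
    """
    formulas = {
        0: n,             # striping: every drive holds data
        1: n // copies,   # mirroring: n/2 for pairs, n/3 for triplets
        5: n - 1,         # one drive's worth of parity
        6: n - 2,         # two drives' worth of parity
        10: n // 2,       # striped mirror pairs
    }
    if level not in formulas:
        raise ValueError(f"unsupported RAID level: {level}")
    return formulas[level]


drive_tb = 4  # hypothetical drive size in TB
for level in (0, 1, 5, 6, 10):
    usable_tb = usable_drives(level, 4) * drive_tb
    print(f"RAID {level:>2}: 4 x {drive_tb} TB drives -> {usable_tb} TB usable")
```

    With four drives, RAID 6 and RAID 10 give the same usable space, so the choice between them comes down to the speed and failure-tolerance trade-offs described above.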
  • Linux RAID tools

    mdadm is the management tool for Linux software RAID. The RAID wiki is the best place for full details on how it works and on best practices.
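    As a rough illustration of what an mdadm invocation looks like, the Python sketch below only builds and prints an array-creation command; it does not run it. The helper name mdadm_create_command and the device names are hypothetical. The --create, --level, and --raid-devices options are standard mdadm options, but check the man page for your version before using them.

```python
import shlex


def mdadm_create_command(md_device, level, members):
    """Build (but do not run) an `mdadm --create` argument list."""
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        *members,
    ]


# Hypothetical device names; substitute the real ones on your system.
cmd = mdadm_create_command("/dev/md0", 5, ["/dev/sdb", "/dev/sdc", "/dev/sdd"])

# Print the command for review instead of executing it.
print(shlex.join(cmd))
```

    Creating an array overwrites whatever is on the member drives, which is why the sketch stops at printing the command so it can be reviewed first.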