I woke up this morning to find 2 disks in my RAID5 array had been marked as failed!

md1 : active raid5 sda1[0] sdd1[4](F) sdc1[2](F) sdb1[1]
  2930276352 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [UU__]

Jun 12 01:00:59 cube kernel: [12609415.780056] ata4: lost interrupt (Status 0x50)
Jun 12 01:00:59 cube kernel: [12609415.843792] end_request: I/O error, dev sdd, sector 71
Jun 12 01:00:59 cube kernel: [12609415.843831] md: super_written gets error=-5, uptodate=0
Jun 12 01:00:59 cube kernel: [12609415.843840] raid5: Disk failure on sdd1, disabling device.
Jun 12 01:00:59 cube kernel: [12609415.843843] raid5: Operation continuing on 3 devices.
Jun 12 01:21:52 cube kernel: [12610668.816042] ata4: lost interrupt (Status 0x50)
Jun 12 01:21:52 cube kernel: [12610668.816084] end_request: I/O error, dev sdc, sector 71
Jun 12 01:21:52 cube kernel: [12610668.816121] md: super_written gets error=-5, uptodate=0
Jun 12 01:21:52 cube kernel: [12610668.816130] raid5: Disk failure on sdc1, disabling device.
Jun 12 01:21:52 cube kernel: [12610668.816134] raid5: Operation continuing on 2 devices.

RAID5 can only keep running with 1 failed disk, so at this point the array was unusable.
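
To make that rule concrete, here is a minimal sketch that classifies a RAID5 array from the `[4/2]` field in the mdstat line above. It assumes the field reads as devices expected / devices currently working. `raid5_state` is a hypothetical helper name for illustration, not part of mdadm.

```shell
# Hypothetical helper: classify a RAID5 array from the "[expected/working]"
# field of its /proc/mdstat line, e.g. "[4/2]".
raid5_state() {
    counts=${1#\[}          # strip the leading "["
    counts=${counts%\]}     # strip the trailing "]"
    expected=${counts%/*}   # devices the array should have
    working=${counts#*/}    # devices currently working
    missing=$((expected - working))

    if [ "$missing" -eq 0 ]; then
        echo clean
    elif [ "$missing" -eq 1 ]; then
        echo degraded       # RAID5 tolerates exactly one missing member
    else
        echo failed         # two or more missing: the array cannot run
    fi
}

raid5_state "[4/2]"   # the array in this post
```

With 4 devices expected and only 2 working, the array falls into the last branch.
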
A short SMART test on the disks does not show any problems, and I can happily read the mdadm metadata off all of them, which makes me believe it was a SATA controller blip that made mdadm mark the disks as faulty. The result is that I have 4 disks in 3 distinct states (compare the Events and Update Time fields):

/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
      Array UUID : b11307f6:64ac80f8:87328348:bdf9ddc4
            Name : cube:0
  Creation Time : Thu Dec 22 20:18:43 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 1953517954 (931.51 GiB 1000.20 GB)
      Array Size : 5860552704 (2794.53 GiB 3000.60 GB)
  Used Dev Size : 1953517568 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
          State : clean
    Device UUID : 59119dcd:719c04cb:9d5d0373:b6e7bc90

    Update Time : Tue Jun 12 09:20:31 2012
        Checksum : 2e841bd5 - correct
          Events : 3128

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AA.. ('A' == active, '.' == missing)
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
      Array UUID : b11307f6:64ac80f8:87328348:bdf9ddc4
            Name : cube:0
  Creation Time : Thu Dec 22 20:18:43 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 1953517954 (931.51 GiB 1000.20 GB)
      Array Size : 5860552704 (2794.53 GiB 3000.60 GB)
  Used Dev Size : 1953517568 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
          State : clean
    Device UUID : fdefac50:23e46ed9:efe52d06:7a05d375

    Update Time : Tue Jun 12 09:20:31 2012
        Checksum : 383ee841 - correct
          Events : 3128

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AA.. ('A' == active, '.' == missing)

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
      Array UUID : b11307f6:64ac80f8:87328348:bdf9ddc4
            Name : cube:0
  Creation Time : Thu Dec 22 20:18:43 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 1953517954 (931.51 GiB 1000.20 GB)
      Array Size : 5860552704 (2794.53 GiB 3000.60 GB)
  Used Dev Size : 1953517568 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
          State : active
    Device UUID : 72b8f327:1f411a85:87144199:4497444c

    Update Time : Tue Jun 12 01:21:22 2012
        Checksum : 24b45db9 - correct
          Events : 3109

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAA. ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
      Array UUID : b11307f6:64ac80f8:87328348:bdf9ddc4
            Name : cube:0
  Creation Time : Thu Dec 22 20:18:43 2011
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 1953517954 (931.51 GiB 1000.20 GB)
      Array Size : 5860552704 (2794.53 GiB 3000.60 GB)
  Used Dev Size : 1953517568 (931.51 GiB 1000.20 GB)
    Data Offset : 2048 sectors
    Super Offset : 8 sectors
          State : clean
    Device UUID : 5cd317fe:018dead2:939ee134:ec97baba

    Update Time : Tue Jun 12 01:00:29 2012
        Checksum : 52be4913 - correct
          Events : 2780

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAA ('A' == active, '.' == missing)
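
The field that most clearly separates the three states is `Events`. The following is a small sketch, assuming `mdadm --examine` output in the layout shown above, that prints each member's event count and how far it lags behind the newest one. `event_lag` is a hypothetical helper name, and it only reads text on stdin, so it changes nothing on the disks.

```shell
# Hypothetical helper: read `mdadm --examine` output on stdin and print
# "<device> <events> <lag>" per member, where lag is the distance from
# the highest event count seen.
event_lag() {
    awk '
        # A line such as "/dev/sda1:" starts a new member section.
        /^\/dev\// { dev = $1; sub(/:$/, "", dev); order[++n] = dev }
        # "Events : 3128" splits into the fields "Events", ":", "3128".
        $1 == "Events" {
            ev[dev] = $3 + 0
            if (ev[dev] > max) max = ev[dev]
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i], ev[order[i]], max - ev[order[i]]
        }
    '
}

# Typical use (needs root and the real member devices):
#   mdadm --examine /dev/sd[abcd]1 | event_lag
```

For the four superblocks above, this would report sda1 and sdb1 level at 3128, sdc1 19 events behind, and sdd1 348 events behind.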

Attempting to reassemble the array doesn’t work out of the box. Instead, it marks all the disks as spares:

# mdadm --assemble /dev/md1 --uuid=b11307f6:64ac80f8:87328348:bdf9ddc4

Personalities : [raid6] [raid5] [raid4]
md1 : inactive sda1[0](S) sdd1[4](S) sdc1[2](S) sdb1[1](S)
      3907035908 blocks super 1.2

unused devices: <none>

I'm not sure where to go from here…