When a hard disk is faulty, a few things will happen:
- The Raidix software sends an email alert. (If you find a faulty disk in the GUI but did not receive an email alert, check the server's internet connection and the notification configuration in Raidix. Receiving these alerts is very important.)
- The disk is shown as "Faulty" in the Raidix interface (drives section), with a red icon on it. An orange icon means a different type of warning: disks in the orange state should not be replaced without analysing the cause and the disk health (contact support if necessary).
Note that the interface also shows the serial number of the faulty disk, which you will need both to confirm that you are replacing the right disk and to start the RMA process.
- In the Raidix GUI, the raid is shown in orange and labeled "Degraded / Online" (the raid, not the faulty disk, which is shown in red). A "Degraded / Online" status means that the raid still works, but the parity protection is degraded. You need to replace the faulty disk; the raid then rebuilds the parity information onto the new disk to restore full protection.
Note that in raid-6, if more than 2 disks have problems at the same time, the whole raid turns red in the GUI and is labeled "Degraded / Offline". This means that the raid no longer works. If 3 disks are really faulty, all the data will be lost. Even if they are not faulty, removing the wrong disk counts as an additional disk failure, so please follow the instructions below to replace the disk safely.
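The status rules above can be sketched as a small function. This is only an illustration of the raid-6 logic described in this article, not Raidix code: the function name `raid6_status` and the "Online" label for a healthy raid are assumptions made for the example.

```python
def raid6_status(unavailable_disks: int) -> str:
    """Return the raid-6 status label for a number of failed or wrongly removed disks."""
    if unavailable_disks < 0:
        raise ValueError("unavailable_disks cannot be negative")
    if unavailable_disks == 0:
        return "Online"  # assumed label for a healthy raid
    if unavailable_disks <= 2:
        return "Degraded / Online"  # still working, parity protection reduced
    return "Degraded / Offline"  # more than 2 disks missing: the raid stops working


# One faulty disk plus one healthy disk removed by mistake counts as two failures.
print(raid6_status(1 + 1))  # Degraded / Online
print(raid6_status(3))      # Degraded / Offline
```

With two disks unavailable the raid is still online, but any further problem takes it offline, which is why removing the wrong disk is so dangerous.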
To remove a Faulty disk:
Activate its LED icon in the Raidix interface (the icon will start blinking), and check that no other disk has the LED icon activated (to avoid confusion).
Then go to the enclosure to locate the disk. The faulty disk must show a blue LED blinking about once per second, and it should be the only LED doing that. Ignore red lights and any other solid lights: the LEDs used to locate disks always blink blue.
In particular, do not be confused by red lights activated by the enclosure itself. Some enclosure models turn on their own red lights on one or more disks after earlier events such as a system crash, or a controller reset that happened while reading from other disks (disks other than the faulty one that caused the controller reset in the first place). These red lights usually stay solid until the enclosure is power cycled. They should not be confused with the identification LED that we activate in the Raidix interface to locate a faulty disk, which is always a blue light blinking about once per second.
For these reasons, before removing the disk we recommend double-checking that you have located the right disk in the enclosure, as follows:
- Go to the enclosure and locate the disk with the blinking blue light, and write down the slot position (or take a picture) to compare later.
- Now, in the Raidix interface, deactivate the LED icon of the faulty disk. Go back to the enclosure and check that the light has stopped blinking.
- Now reactivate the LED of the faulty disk in the Raidix interface, and check that the expected LED has started blinking again. In this way you can be sure you are going to remove the right disk.
- Pull the disk lever and remove the disk. Once it is removed, check that its serial number matches the serial number shown in the Raidix interface. Do not insert a new disk until you have checked the serial number. (If you realise that you have made a mistake, put the original disk back immediately and contact support. Never install a new disk in the wrong slot: doing that starts a rebuild while one disk is faulty and another has been wrongly removed (two disk failures), which is only one step away from disaster.)
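The serial number comparison in the last step can be sketched as follows, for example if you type in the serial printed on the disk label and the one copied from the Raidix interface. This is a minimal sketch, not part of Raidix: the function name `serials_match` and the serial numbers used are hypothetical, and ignoring case, spaces and dashes is an assumption about how labels may differ from what the GUI shows.

```python
def serials_match(gui_serial: str, label_serial: str) -> bool:
    """Compare two serial numbers, ignoring case, spaces and dashes."""
    def normalize(serial: str) -> str:
        # Keep only letters and digits, in upper case.
        return "".join(ch for ch in serial.upper() if ch.isalnum())

    return normalize(gui_serial) == normalize(label_serial)


# Hypothetical serial numbers, for illustration only.
print(serials_match("ZA1B2C3D", " za1b-2c3d "))  # True
print(serials_match("ZA1B2C3D", "ZA1B2C3E"))     # False
```

If the comparison fails, put the original disk back and contact support before doing anything else.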
Alternatively, another safe way to replace a disk is to power off the server, which allows you to remove any of the disks and read their serial numbers as needed. However, this causes downtime for user access, while a hot-swap replacement does not.
Inserting the new disk for replacement:
Once the faulty disk has been removed and its serial number confirmed, insert the new disk.
In the Raidix GUI, check that the rebuild process starts. It may take up to a couple of minutes to start. In the drives section, the replaced disk will show a "rebuilding" label including the progress percentage. If the rebuild does not start, check the spare disk policies in the Raidix interface (or contact support if you are not sure).
Depending on the disk size and usage, the rebuild can take many hours (typically a full day for 8TB disks).
In the Raidix interface you can also control the rebuild priority in the advanced options of the raid menu. In general, a good practice is to set the priority to 50%, so that the other 50% of the performance remains available for normal work. You can adjust this value to your needs, but make sure that the rebuild completes in a reasonable time (no more than 2 days).
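To get a rough idea of whether a rebuild will fit within the 2-day limit, the arithmetic can be sketched as below. This is a simple estimate, not a Raidix tool: the function name `rebuild_hours` is hypothetical, and the sustained rebuild rate of 100 MB/s is an assumed figure. The real rate depends on the disks, the rebuild priority and the workload.

```python
def rebuild_hours(disk_tb: float, rate_mb_per_s: float) -> float:
    """Estimate the rebuild time in hours for a disk of disk_tb (decimal) terabytes."""
    if disk_tb <= 0 or rate_mb_per_s <= 0:
        raise ValueError("disk size and rebuild rate must be positive")
    total_mb = disk_tb * 1_000_000  # 1 TB = 1,000,000 MB
    return total_mb / rate_mb_per_s / 3600  # seconds to hours


hours = rebuild_hours(8, 100)  # 8 TB disk at an assumed 100 MB/s
print(f"{hours:.1f} h")        # 22.2 h
print(hours <= 48)             # True: within the 2-day limit
```

At the assumed rate an 8TB disk takes roughly a day, which is in line with the typical figure given above. At half that rate the estimate doubles, which is already close to the limit.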
Make sure to process the RMA for the faulty disk as soon as possible. It is highly recommended to always have spare disks at hand so that you do not need to wait for RMA processes to complete.
And please remember: Raid protection is not a backup. It just makes the storage hardware available for more time, but it can still fail (and will fail when given enough time). You always need to have independent backups of critical files.