Bad RAID controller


If the VMs on a UCS chassis go offline and you can’t reach them through vSphere, you may have to reboot the chassis from the IMC web interface. When you reboot it and watch the boot sequence on the IP KVM console, this is what you don’t want to see:

Version 3.20.00 (Build November 19, 2010)
Copyright(c) 2010 LSI Corporation
HA -0 (Bus 14 Dev 0) LSI MegaRAID SAS 9261-8i
FW package: 12.12.0-0038
Multibit ECC errors were detected on the RAID controller.
The DIMM on the controller needs replacement.
Please contact technical support to resolve this issue.
If you continue, data corruption can occur.
Press 'X' to continue or else power off the system and replace the
DIMM module and reboot. If you have replaced the DIMM press 'X' to continue.
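
If you save the KVM console text to a file or paste it into a ticket, a quick check for this warning can be sketched as follows. This is a minimal sketch, not a Cisco or LSI tool: the function name `find_ecc_warnings` and the `ECC_MARKERS` tuple are made up for illustration, and the phrases are copied from the banner above, so other firmware versions may word the message differently.

```python
import re

# Phrases copied from the MegaRAID boot banner shown above.
# Assumption: other firmware builds may use different wording.
ECC_MARKERS = (
    "Multibit ECC errors were detected on the RAID controller",
    "The DIMM on the controller needs replacement",
)


def find_ecc_warnings(console_text):
    """Return the marker phrases that appear in the captured console text."""
    # Collapse line wraps and repeated spaces so a phrase split across
    # two console lines still matches.
    normalized = re.sub(r"\s+", " ", console_text)
    return [marker for marker in ECC_MARKERS if marker in normalized]


if __name__ == "__main__":
    sample = (
        "HA -0 (Bus 14 Dev 0) LSI MegaRAID SAS 9261-8i\n"
        "FW package: 12.12.0-0038\n"
        "Multibit ECC errors were detected on the RAID controller.\n"
        "The DIMM on the controller needs replacement.\n"
    )
    for hit in find_ecc_warnings(sample):
        print("WARNING:", hit)
```

An empty list means none of the known phrases were found. It does not prove the controller is healthy.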

OK, so get TAC on the phone and have them send you a replacement controller.  Hopefully, you have a 4-hour response contract.  Here is the link to replace the RAID controller in the UCS chassis:

After replacing the RAID controller, make sure you copy the drive configuration to the new controller.  This step is not covered in the card replacement documentation.  Here is the link to do this.  Follow the instructions EXACTLY:

Verify that the virtual devices show up on the RAID controller during bootup.  Things still might go south.  The vSphere client may show CUCM and UCXN as missing because the datastores are not there.  TAC will need to re-add the datastores for you, possibly via the CLI.  More than likely, the file systems on the VMs will have errors.  It may be so bad that they won’t boot, in which case you will need to mount the Recovery CD, boot to it, and choose the option to fix the file system errors.  After that, grab a beer.  The servers should come up just fine.