Painless Hardware Upgrades with Cisco UCS

 

One of the huge advantages of Cisco UCS is its approach to “statelessness”. If you are not familiar with this concept, just know that anything that ties an operating system, hypervisor, or application to a specific piece of hardware is considered “stateful” and is not desirable in datacenter servers. Because UCS avoids this, a customer can upgrade from one server model to the next without having to re-install anything. To be more specific, I took various operating systems and hypervisors that were running in a service profile assigned to a B200 M2 and moved the profile to the brand new architecture of a B200 M3. The UCS portion of this migration is really (really) easy: you simply disassociate the profile from the M2 and associate it with the M3. The OS or hypervisor takes care of the rest. This article covers the details of how this migration worked and what steps I took to make sure it was a success.

Disclaimer: everything you’re about to read is totally unsupported by Cisco TAC. As a company, we have neither tested nor certified this process. I am simply reporting here what I, myself, have tested and seen work. So don’t call Cisco if this doesn’t work. Feel free to leave a comment and I’ll look into it when I can find time.

I tested a few different installations of various hypervisors and OSes throughout this process and hope to test more as I have time. These are my results thus far:

NOTE: All migration testing was done booting from an XIO Emprise 5000 FC SAN using a Cisco VIC CNA.

Environment:

B200 M2: Palo (2.0.2q)

B200 M3: VIC 1240 (2.0.2q). A 1280 should work identically as they are based on the same ASIC.

UCS Manager: 2.0.2R

I installed each of the below onto the M2 and then migrated it to the M3. I was mainly looking for the instance to still boot after migrating. I did not test every aspect of the installation once it was on the M3. For instance, if you added or removed vNICs or vHBAs and caused the PCI bus to re-enumerate, you may have some cleanup to do. I was simply verifying that you could actually get to the point of being able to clean up at all.

  1. VMware ESXi 4.1 Build 348481 migrated without issue. I was running fnic version 1.1.0.113.2-4vmw (according to vmkload_mod -s fnic).
  2. VMware ESXi 5.0 Build 441354 migrated without issue. I was running fnic version 1.2.0.3-1vmw (according to vmkload_mod -s fnic).
  3. Windows Server 2008 R2 migrated without issue in one test I ran, but hit the dreaded STOP 0x7B in a separate attempt. The outcome depends on some factors that we’ll cover in a minute. If you are just keeping score, mark one more down for UCS, provided you installed Windows 2008 using the Cisco-provided Palo drivers version 2.0 or later. If you installed using a version of the drivers prior to 2.0, you will likely see the STOP 0x7B trap screen. The good news is you can still get it to work, but it requires some manual work on your part. The technical part follows, so you can skip it if you are not in this situation.

There are certain types of devices whose drivers Windows must load before the normal plug-n-play process is available. Disk controllers fall into this category (as do the various bridges that sometimes lead to disk controllers). Because the plug-n-play manager is not yet running, Windows starts these drivers directly. The portion of the registry that stores all of this is known as the Critical Device Database (CDDB). You can locate it at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase. When Windows is starting up, it loads the drivers in the CDDB. When it’s done, it should have access to the boot volume because the boot volume’s controller driver would be in the CDDB. But what if it isn’t? Well, in that case you get STOP 0x7B. Windows is trying to tell you that it has loaded all the drivers it has and that the boot volume is still not accessible.

On a normal computer, this is quite unpleasant because you can’t always get back into Windows to fix it. If it happens after a BIOS upgrade, it’s likely that the bridge was updated and you needed to install a new driver before the upgrade. Check out this blog for more detailed info on the CDDB if you are interested. Luckily, with UCS, you can move the profile back to an M2 with Palo and boot it right back up. Once you are back into Windows, you can fix this problem pretty easily. Here is the problem and solution:

When you install the OS from scratch and feed it the 2.0 or higher drivers during installation on the M2 with Palo, it builds the following tree in the CDDB:

However, if you installed with an earlier version of the drivers, the CDDB tree is not built the same way. It does not contain the entries for the 12x0/mLOM, and it looks like this:

This behavior is expected, of course, because the VIC 1240/1280 did not exist at the time those older drivers were released. However, the new Palo (M81KR) drivers contain all the CDDB tree info for both old and new VIC cards. To fix the problem, all you have to do is:

  1. Boot Windows on the M2.
  2. Open the Registry Editor and select (highlight) HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase\PCI#VEN_1137&DEV_0045&SUBSYS_00481137
  3. Click File->Export and export this key to a file.
  4. Then open the reg file in Notepad.
  5. Change the string 00481137 to 00841137 (you are just swapping the “4” and the “8”).
  6. Save the changes and close the file.
  7. Double click the reg file to import it into the registry.
  8. Shut down Windows and move the profile to the M3.
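
Steps 4 through 6 can also be scripted if you would rather not edit the file by hand. The following is a minimal Python sketch, not something I used in the testing above. The function names are made up for illustration, and it assumes the export is in the Registry Editor's default UTF-16 format. It writes a modified copy and leaves the original export untouched.

```python
from pathlib import Path

OLD_SUBSYS = "00481137"  # string found in the key exported in step 3
NEW_SUBSYS = "00841137"  # replacement string from step 5


def retarget_reg_text(reg_text: str, old: str = OLD_SUBSYS, new: str = NEW_SUBSYS) -> str:
    """Return a copy of exported .reg text with every `old` string changed to `new`."""
    if old not in reg_text:
        # Refuse to produce an unchanged file that would import as a no-op.
        raise ValueError(f"{old!r} not found; is this the right export?")
    return reg_text.replace(old, new)


def retarget_reg_file(src: Path, dst: Path, encoding: str = "utf-16") -> None:
    """Write a modified copy of `src` to `dst` without touching `src`.

    Assumption: the export uses UTF-16 with a byte order mark. Pass a
    different `encoding` if your file was saved another way.
    """
    text = src.read_text(encoding=encoding)
    dst.write_text(retarget_reg_text(text), encoding=encoding)
```

For example, `retarget_reg_file(Path("palo.reg"), Path("vic1240.reg"))` (both file names hypothetical) produces the file to import in step 7.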

I would not suggest editing the string in the registry directly, as this would overwrite the only working entry you have. Last Known Good would help here, and will also help if you mess anything up while doing the above steps. Just remember, Last Known Good is only valid until you successfully log in.
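
Before moving the profile, it may be worth confirming that both CDDB keys are now present. Here is a rough Python sketch of such a check, using the standard-library winreg module (Windows only). The helper names are my own, and the key name pattern is simply the one from the steps above with the subsystem ID swapped in.

```python
CDDB_PATH = r"SYSTEM\CurrentControlSet\Control\CriticalDeviceDatabase"


def cddb_subkey(subsys: str) -> str:
    """Build the CDDB key name from the steps above for a given subsystem ID."""
    return f"PCI#VEN_1137&DEV_0045&SUBSYS_{subsys}"


def cddb_entry_exists(subsys: str) -> bool:
    """Return True if the CDDB holds a key for this subsystem ID (Windows only)."""
    import winreg  # standard library, but only available on Windows

    path = CDDB_PATH + "\\" + cddb_subkey(subsys)
    try:
        # Opening the key read-only is enough to prove that it exists.
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path):
            return True
    except FileNotFoundError:
        return False
```

Calling `cddb_entry_exists("00481137")` and `cddb_entry_exists("00841137")` on the M2 after the import should both come back true. If the second one does not, the import in step 7 did not take.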

I hope this article is helpful to you. Let me know if anything isn’t clear.

Side note: If you want more information on statelessness and how it benefits you, see this blog article. But briefly, take a NIC, for example, that typically has a MAC address burned in at the factory. If you install an application and it licenses itself to the MAC address, you will find it painful to replace the NIC if it fails because the MAC address will change. Cisco UCS makes this simple by creating a portable wrapper (called the Service Profile) that contains the MAC address and that the application (and OS) run in. The Service Profile contains a lot more than the MAC address (over 100 items of server identity can be stored in it), and because it’s portable, I can move it from physical server 1 to physical server 2 and the application sees no changes. Think of it as virtual machine technology for the physical side of the datacenter. This capability is not unique to Cisco (HP, Dell, and IBM all have some aspect of it), but only UCS offers you complete control over every desired aspect of server identity.

 

-Jeff

 

7 thoughts on “Painless Hardware Upgrades with Cisco UCS”

  1. Hey! I know this is somewhat off topic but I was wondering if you knew where I could get a captcha plugin for my comment form? I’m using the same blog platform as yours and I’m having difficulty finding one? Thanks a lot!

  2. Hi Jeff

    Thanks for the info, but I have one question.

    Currently I have Cisco UCS B200 M1 with M71KR CNA installed and configured with 2 vHBA and 2 vNIC along with Boot from SAN loaded with Windows server 2008 OS.

    We have planned to associate this service profile to a newer version of the blade which is B200 M3 with mLOM 1240 installed on it.

    One point needs clarification: will we be able to load the OS without installing any additional drivers or modifying the registry?
    Also, what I was planning to do is install the mLOM 1240 drivers on the existing server OS while it is associated with the M1 server and, once that is done, remove the association with the M1 and associate the profile with the M3, assuming it works. Please advise.

    Thanks
    Raghu

    • You’re going to have an issue because you are moving HBAs from Emulex or Qlogic to Cisco. Windows won’t let you easily add a driver for a device that isn’t present. There is a way, but it will not add the device to the Windows Critical Device Database (CDDB) in the registry, which is needed to boot from it. It can be done, but it requires manually adding registry entries and copying files to the right place. It’s not a huge deal – I’ve done it several times with different vendors over the years (so if I can do it – ANYONE can!). However, if you have never done it, it could be quite frustrating.

  3. GREAT BIG THX Jeff, this saved me from looking dumb and made my customer very happy. My customer wanted to do a hardware upgrade by moving a profile for a Win2k8R2 SQL Server from a B200 M2 to a B420 M3. I asked Cisco and they said “it should work”, then it didn’t and Cisco said “not supported” but no reason why. After reading your post I was able to get it working and save many hours building new SQL Servers, migrating the database, rebuilding SQL replication, renaming and re-IPing. What would have taken days, probably weeks only took a couple of hours.

  4. This would have been useful when we attempted this unsuccessfully last fall. Unfortunately TAC was not as helpful as your article is. Currently I am attempting an M2 to M3 migration booting from the local LSI adapter and running into the same roadblocks. Apparently there is enough incompatibility between the two, and TAC is again of minimal or no help here.
