Boot from SAN 101 with Cisco UCS

So my Dad is selling Viagra now – or at least that’s what his latest email claimed. Thinking this was quite a shift from his real estate job, I called him to inquire. It turns out that while Dad was sleeping, his Yahoo email had been working overtime to provide this medical benefit to everyone in his address book. The emails are only a minor nuisance; what I really could do without is the job of helping him remove the malware from his PC, and it happens time and again.

I’m sure this sounds all too familiar to most of you. After all, you’re the “computer person” in the family, so it’s either your dad who fell for the “You’re infected – Click Here now to clean your system!” trick, or your sister’s roommate who doesn’t understand why her Facebook keeps posting links to inappropriate sites on everyone’s wall. When I come across things like this, I’m the type of guy who wants to understand not just how to fix it, but how the infection works and what makes it spread. That way, the next time I see the problem, I don’t have to resort to the drastic measure of re-installing the operating system and/or formatting the drive like PC repair shops do. I end up with a simple fix that is faster and easier next time, and I’m not forced to repeat menial tasks like reloading the system every time it comes up. I take this same approach in the data center when I see a repetitive task on the horizon. Today, I want to help you understand how Boot from SAN works in a legacy blade environment and how UCS makes the task 100 times faster (literally).

I should let you know right up front that I’m not a storage guru, but I’ve set up servers to Boot from SAN on so many different arrays now that I’ve lost track. This should be of some comfort if you also don’t know storage that well – meaning that this isn’t rocket science (which is probably equally easy since it’s just fuel and a match, right?). And while this discussion centers on Boot from SAN, I could make a case for it being just as useful to any SAN admin who wants to allocate storage before the OS is installed on the local HDD. Sometimes it just comes down to timing and schedules, and for those situations this information will prove valuable.

Now, for those who don’t boot from SAN today, I’ll briefly review the general tasks that legacy servers and blades normally require to do so.

  1. Server Tasks
    1. Power on the server.
    2. Press the appropriate hot key to enter the server BIOS/EFI and change the Boot Order to boot from the server’s SAN storage controller. Note: HP calls this “Boot Controller Order” which is different than the standard Boot Order.
    3. Save settings and reboot.
    4. Again, press the appropriate hot key during POST when the HBA option ROM is loading (CTRL+Q for QLogic and ALT+E for Emulex). Entering this utility forces the HBA to log in to the upstream switch. Leave it at the intro screen.
  2. Switch Tasks
    1. Add the server WWPN to a zone that includes the storage array controller’s WWPN.
    2. Zone the second fabric switch as well. Note: For some operating systems (Windows for sure), you need to zone only a single path during OS installation, so consider this step optional until the OS is installed.
  3. Array Tasks
    1. On the array, create a LUN and allow the server WWPNs to have access to the LUN.
    2. Present the LUN to the host using the desired LUN number (typically zero, but this step is optional and not available on all array models).

The steps are done in this order for a reason. Without the Server Tasks, the HBAs won’t be logged in to the fabric, so their WWPNs won’t show up on the switch for zoning. Without the Switch Tasks, the array can’t see the HBAs, so the Array Tasks won’t work. At this point you can reboot the server and install the OS right onto the SAN LUN. To the OS, the LUN looks just like a local disk – pretty cool!

Well, it’s pretty cool until you have a bunch of servers to repeat this on. When deploying a few servers, most people wouldn’t mind repeating a task or two on each server in order to get the job done. Sometimes it’s just faster to do this than to try to automate if you can see the light at the end of the tunnel (and it isn’t an oncoming train). But what if the task is beyond “just a few servers”, like 100 or more? That’s a lot of servers you need to catch during POST to force them to log in. I’ve used every major vendor’s Remote Console out there and none of them are perfect (yes, Cisco’s CIMC too). That’s at least 200 reboots to attend to (two or more per server). Following the instructions below will show you how to force any number of Cisco UCS blades to all log in at one time with no need to ever enter the HBA BIOS utility or Server BIOS.

I need to get one item out of the way before we proceed – pre-provisioning. Some might argue that the servers don’t need to log in to the fabric in order to complete these steps because, in UCS, all the WWPNs are already known ahead of time – before the blades are even purchased. While this is true, and certainly possible, it just hasn’t been my personal experience that storage people like to do it that way. Feel free to comment otherwise.

Using UCS, the tasks above change just slightly. But instead of repeating the 3 basic task sets 100 times, we do the task list below just once.

  1. UCS Manager Tasks
    1. Create a Service Profile Template with x number of vHBAs.
    2. Create a Boot Policy that includes SAN Boot as the first device and link it to the Template
    3. Create x number of Service Profiles from the Template
    4. Use Server Pools, or associate servers to the profiles
    5. Let all servers attempt to boot and sit at the “Non-System Disk” style message that UCS servers return
  2. Switch Tasks
    1. Add the server WWPN to a zone that includes the storage array controller’s WWPN.
    2. Zone the second fabric switch as well. Note: For some operating systems (Windows for sure), you need to zone only a single path during OS installation, so consider this step optional until the OS is installed.
  3. Array Tasks
    1. On the array, create a LUN and allow the server WWPNs to have access to the LUN.
    2. Present the LUN to the host using the desired LUN number (typically zero, but this step is optional and not available on all array models).
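
With 100 servers, the Switch Tasks turn into a repetitive chore of their own. Here is a minimal Python sketch (plain text generation, nothing that talks to a switch or to UCS Manager) that turns a list of server WWPNs into Cisco MDS-style zoning commands you could review before pasting. The array WWPN, VSAN number, and zone and zoneset names are made-up placeholders, and the one-zone-per-server layout is an assumption, not a requirement.

```python
def zoning_commands(server_wwpns, array_wwpn, vsan, zoneset="bfs_zoneset"):
    """Build MDS-style config lines: one zone per server vHBA plus the array port."""
    lines = []
    zone_names = []
    for index, wwpn in enumerate(server_wwpns, start=1):
        zone = f"bfs_server{index:03d}"  # hypothetical naming scheme
        zone_names.append(zone)
        lines.append(f"zone name {zone} vsan {vsan}")
        lines.append(f"  member pwwn {wwpn.lower()}")
        lines.append(f"  member pwwn {array_wwpn.lower()}")
    # Collect every zone into a single zoneset and activate it.
    lines.append(f"zoneset name {zoneset} vsan {vsan}")
    for zone in zone_names:
        lines.append(f"  member {zone}")
    lines.append(f"zoneset activate name {zoneset} vsan {vsan}")
    return lines


# The first WWPN is the example used later in this article; the rest is invented.
SERVER_WWPNS = ["20:00:00:25:B5:00:A1:BF", "20:00:00:25:B5:00:A1:CF"]
ARRAY_WWPN = "50:00:00:00:00:00:00:01"  # hypothetical array controller port

print("\n".join(zoning_commands(SERVER_WWPNS, ARRAY_WWPN, vsan=100)))
```

Run it once per fabric with that fabric’s WWPNs and VSAN, and have your storage team check the output against their own naming standards before applying anything.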

Remember, the above tasks are done one time only, and all 100 servers (or however many you have) will log in to the SAN switch. As you are probably aware, Cisco UCS manages SAN, LAN, and the servers from one “unified” console. What makes this task (and lots of others) so easy can be traced back to the concept of Service Profiles and Templates. UCS manages the server facet by separating the server’s identity (MAC, UUID, WWNN, WWPN, VLANs, BIOS settings, firmware, etc.) from the physical hardware itself. I’m not covering Service Profiles in this article, but you can read up on them on Sean’s blog here: http://www.mseanmcgee.com/2010/04/the-state-of-statelessness-cisco-ucs-vs-hp-virtual-connect/.

Part of the server’s identity is the BIOS settings and boot order. Additionally, because any number of Service Profiles can be instantly created from and linked to a Template, I can create 100 or more “servers” and manage everything about them from one template. If I add a VLAN to the template, I just added a VLAN to all servers using the template. If I change the boot order of the template, I change the boot order of all servers. And yes, you can unlink profiles from templates (and link them back) if you don’t want this behavior at some point.

Because I assume some of you may want to go and try this, I am including the actual step-by-step below. If you have any questions, please let me know.

So let’s take a look at the steps required to get this trick to work.

  1. Create a Service Profile Template.
    1. Choose the “Updating” template type from the first screen in the wizard. An updating template allows us to make changes in just one place, the template itself, when we want to change multiple servers built using that template.
  2. Create the Boot Policy
    1. While creating the template, choose the option to “Create Boot Policy”.
  3. Add only SAN Boot to the policy (I always name my vHBAs “fc0” and “fc1”). In this example, I am only adding a Primary path. A Secondary can also be created if you wish to have both vHBAs log in via this method.
  4. Add a SAN Boot Target to the Primary vHBA (fc0).
  5. Use ANY valid WWPN as the Boot Target WWPN. Since the purpose of this policy is simply to force the HBA to log in, you can use one from your actual array or just make one up. I always use 20:00:00:00:00:00:00:00.
  6. Finish the Policy and Template
    1. Save the Boot Policy and be sure to select it from the list before you continue with the rest of the Service Profile Template creation. Finish the Template at this point.
  7. Create Profiles
    1. With the template highlighted, click “Create Service Profile from Template” and create the desired number of Service Profiles from the Template. Use names that make sense to you for this deployment. For 100 servers, you probably want to use Server Pools to automate the Service Profile association to physical hardware (I intend to cover Server Pools as a future topic). Regardless, once the Service Profiles are associated to physical servers, we can proceed to the next step.
  8. Boot the Servers
    1. By default, this will happen for you automatically. The servers will power on and try to boot from the WWPN you designated as a target, which obviously won’t actually boot the server, but it will accomplish our primary goal of the server vHBA successfully logging in to the SAN Fabric! That’s right, we now have 100 servers logged into the SAN Fabric and normal switch zoning can be done. Here’s an example of one successful login. First, look in the Service Profile to see the assigned WWPN for the server in question. In our example, it’s 20:00:00:25:B5:00:A1:BF.
  9. Verify Successful Login
    1. So now that we know what we’re looking for, let’s go see if it shows up in the local switch’s FLOGI Database. A simple “show flogi database” command at the switch lists the WWPNs logged in across all VSANs.

Bingo – there it is! Along with the other 100 vHBAs I needed to zone.
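
Eyeballing one WWPN is easy; checking 100 of them is not. Below is a rough Python sketch that compares the WWPNs you expect against pasted “show flogi database” output and reports any that have not logged in. The sample text is a hand-written approximation of the output, not a capture from a real switch; the interface, VSAN, FCID, node name, and the second expected WWPN are all invented. It assumes the port name (WWPN) is the first WWN on each row.

```python
import re

# Matches an 8-byte WWN written as colon-separated hex pairs.
WWN_RE = re.compile(r"(?:[0-9a-f]{2}:){7}[0-9a-f]{2}", re.IGNORECASE)

# Hand-written sample; only the port name comes from this article's example.
SAMPLE_OUTPUT = """\
INTERFACE  VSAN  FCID      PORT NAME                NODE NAME
fc1/1      100   0x010001  20:00:00:25:b5:00:a1:bf  20:00:00:25:b5:00:00:0f
"""


def logged_in_wwpns(flogi_output):
    """Return the first WWN found on each row (assumed to be the port name)."""
    found = set()
    for line in flogi_output.splitlines():
        wwns = WWN_RE.findall(line)
        if wwns:
            found.add(wwns[0].lower())
    return found


def missing_logins(expected_wwpns, flogi_output):
    """Return the expected WWPNs that do not appear as a port name."""
    seen = logged_in_wwpns(flogi_output)
    return sorted(w.lower() for w in expected_wwpns if w.lower() not in seen)


expected = ["20:00:00:25:B5:00:A1:BF", "20:00:00:25:B5:00:A1:CF"]
print("Not logged in yet:", missing_logins(expected, SAMPLE_OUTPUT))
```

An empty list means every vHBA you care about is in the FLOGI database and ready to be zoned.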

Once zoning is complete, simply change the Boot Policy in the template to point to the actual WWPN of the array. You can edit the Policy or just create a new one. This brings up another huge UCS advantage. If you can easily change the boot target for all of the servers, that means you can do some pretty cool Disaster Recovery and/or SAN upgrades. Imagine if your datacenter upgrades from array vendor-X to array vendor-Y, or simply upgrades to the next model array from the same vendor. Replication software exists to handle the actual cloning of the LUNs, but nothing exists to go out and change all the servers to point to the new array to find their boot LUN. Well, until now that is.
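
If the “change it once, change it everywhere” idea feels abstract, here is a toy Python model of it. These classes are purely illustrative and have nothing to do with the real UCS Manager object model or any Cisco SDK; the profile names and the array WWPN are invented. The point is only that linked profiles hold a reference to the template rather than a copy of its settings, so swapping the boot policy on the template retargets every profile at once.

```python
import re
from dataclasses import dataclass

# A WWPN is 8 bytes written as colon-separated hex pairs.
WWPN_PATTERN = re.compile(r"^(?:[0-9A-Fa-f]{2}:){7}[0-9A-Fa-f]{2}$")


@dataclass
class BootPolicy:
    name: str
    target_wwpn: str
    lun: int = 0

    def __post_init__(self):
        # Reject anything that is not shaped like a WWPN.
        if not WWPN_PATTERN.match(self.target_wwpn):
            raise ValueError(f"not a well-formed WWPN: {self.target_wwpn!r}")


@dataclass
class Template:
    name: str
    boot_policy: BootPolicy


@dataclass
class ServiceProfile:
    name: str
    template: Template  # linked: the profile reads settings through the template

    @property
    def boot_target(self):
        return self.template.boot_policy.target_wwpn


# Start with the dummy target whose only job is to force the fabric login.
template = Template("bfs-template", BootPolicy("force-login", "20:00:00:00:00:00:00:00"))
profiles = [ServiceProfile(f"server-{n:03d}", template) for n in range(1, 101)]

# After zoning, point the template at the (hypothetical) real array port.
template.boot_policy = BootPolicy("array-boot", "50:00:00:00:00:00:00:01")

print(len(profiles), "profiles now boot from", profiles[0].boot_target)
```

The same single assignment is what an array migration would look like in this model: one change on the template, and all 100 profiles follow.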

So there you have it. With UCS I can take a task that would have taken a single admin more than a day and a ton of keystrokes and shorten it to just a few simple steps. I’d like to plug an article written by Jeremy Waldrop (who’s done more than his fair share of UCS). He shows an interesting method of pre-provisioning using UCS service profiles that you could combine with the information presented above to pre-provision your storage before your physical blades arrive – with no outage! Great work Jeremy – http://jeremywaldrop.wordpress.com/2010/11/11/cisco-ucs-service-profile-coolness/

Now you’ve heard what I have to say, and I hope you found this useful and that it will save you some time in deploying your servers. If you have anything you’d like to add, please feel free.

For more info on how Fibre Channel is implemented on Cisco UCS, see this link: https://supportforums.cisco.com/docs/DOC-6186

Thanks for stopping by.

-Jeff Allen
