Boot from SAN 101 with Cisco UCS

So my Dad is selling Viagra now – or at least that’s what his latest email claimed. Thinking this was quite a shift from his real estate job, I called him to ask. It turns out that while Dad was sleeping, his Yahoo email account had been working overtime to provide this medical benefit to everyone in his address book. The emails themselves are a minor nuisance, but helping him remove the malware from his PC is a job I could really do without, and it happens time and again.

I’m sure this sounds all too familiar to most of you. After all, you’re the “computer person” in the family, so it’s either your dad who fell for the “You’re infected – Click Here now to clean your system!” trick, or your sister’s roommate who doesn’t understand why her Facebook account keeps posting links to inappropriate sites on everyone’s wall. When I come across things like this, I want to understand not just how to fix it, but how the infection works and what makes it spread. That way, the next time I see the problem, I understand it completely and don’t have to resort to the drastic measure of reinstalling the operating system and/or formatting the drive like PC repair shops do. Understanding the problem gives me a simple fix that is faster and easier the next time, instead of repeating menial tasks like reloading the system every time it comes up. I take this same approach in the data center when I see a repetitive task on the horizon. Today, I want to help you understand how Boot from SAN works in a legacy blade environment and how UCS makes the task 100 times faster (literally).

I should let you know right up front that I’m not a storage guru, but I’ve set up servers to Boot from SAN on so many different arrays that I’ve lost track. This should be of some comfort if you also don’t know storage that well – it means this isn’t rocket science (which is probably equally easy, since it’s just fuel and a match, right?). And while this discussion centers on Boot from SAN, I could make a case that it is just as useful to any SAN admin who wants to allocate storage before the OS is installed on the local HDD. Sometimes it just comes down to timing and schedules, and in those situations this information will prove valuable.

Now, for those who don’t boot from SAN today, I’ll briefly review the general tasks normally required for legacy servers and blades to do so.

  1. Server Tasks
    1. Power on the server.
    2. Press the appropriate hot key to enter the server BIOS/EFI and change the Boot Order to boot from the server’s SAN storage controller. Note: HP calls this “Boot Controller Order”, which is different from the standard Boot Order.
    3. Save settings and reboot.
    4. Again, press the appropriate hot key during POST when the HBA option ROM is loading (CTRL+Q for QLogic and ALT+E for Emulex). Entering this utility forces the HBA to log in to the upstream switch. Leave it at the intro screen.
  2. Switch Tasks
    1. Add the server WWPN to a zone that includes the storage array controller’s WWPN.
    2. Zone the second fabric switch as well. Note: For some operating systems (Windows for sure), you need to zone just a single path during OS installation, so consider this step optional.
  3. Array Tasks
    1. On the array, create a LUN and grant the server WWPNs access to it.
    2. Present the LUN to the host using the desired LUN number (typically zero; this step is optional and not available on all array models).

The steps are done in this order for a reason. Without the Server Tasks, the HBAs won’t be logged in, so zoning can’t happen. Without the Switch Tasks, the array can’t see the HBAs, so the Array Tasks won’t work. At this point you can reboot the server and install the OS right onto the SAN LUN. To the OS, the LUN looks just like a local disk – pretty cool!

Well, it’s pretty cool until you have a bunch of servers to repeat this on. When deploying a few servers, most people don’t mind repeating a task or two on each server to get the job done. Sometimes it’s just faster to do this than to automate, if you can see the light at the end of the tunnel (and it isn’t an oncoming train). But what if the task is beyond “just a few servers”... like 100 or more? That’s a lot of servers you need to catch during POST to force them to log in. I’ve used every major vendor’s Remote Console out there and none of them are perfect (yes, Cisco’s CIMC too). At two attended boots per server, that’s at least 200 reboots to babysit. The instructions below show how to force any number of Cisco UCS blades to log in at the same time, without ever entering the HBA BIOS utility or the server BIOS.

I need to get one item out of the way before we proceed: pre-provisioning. Some might argue that the servers don’t need to log in to the fabric to complete these steps because, in UCS, all the WWPNs are known ahead of time, before the blades are even purchased. While this is true, and certainly possible, in my experience storage people just don’t like to work that way. Feel free to comment otherwise.
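The dependency chain described above (fabric login before zoning, zoning before LUN masking) and the reboot math can be sketched in a few lines of Python using the standard library's `graphlib` (Python 3.9+). This is only an illustration: the task names are paraphrased from the legacy list, and `task_order` and `attended_reboots` are hypothetical helpers, not part of any vendor tool.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the legacy, per-server workflow: each task maps to the
# set of tasks that must finish before it can start.
LEGACY_TASKS: dict[str, set[str]] = {
    "server: force HBA fabric login": set(),
    "switch: zone server WWPN with array WWPN": {"server: force HBA fabric login"},
    "array: create LUN and mask it to server WWPNs": {
        "switch: zone server WWPN with array WWPN"
    },
}

# Attended boots per server in the legacy flow: one to enter the server BIOS
# and change the boot order, one more to catch POST and open the HBA option ROM.
BOOTS_PER_SERVER = 2


def task_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order that runs every task after all of its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def attended_reboots(server_count: int, per_server: int = BOOTS_PER_SERVER) -> int:
    """Boots an admin must babysit when every server is configured by hand."""
    return server_count * per_server


if __name__ == "__main__":
    for step, task in enumerate(task_order(LEGACY_TASKS), start=1):
        print(f"{step}. {task}")
    print("Attended reboots for 100 servers:", attended_reboots(100))
```

The point of the sketch is that the chain is strictly sequential per server, so the hand-work scales linearly with the server count; the UCS approach below removes the first link from the per-server loop.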

With UCS, the tasks above change only slightly. But instead of repeating the three basic task sets 100 times, we do the task list below just once.

  1. UCS Manager Tasks
    1. Create a Service Profile Template with x number of vHBAs.
    2. Create a Boot Policy that includes SAN Boot as the first device and link it to the Template.
    3. Create x number of Service Profiles from the Template.
    4. Use Server Pools, or manually associate servers with the profiles.
    5. Let all servers attempt to boot and sit at the “Non-System Disk”-style message that UCS servers return.
  2. Switch Tasks
    1. Add the server WWPN to a zone that includes the storage array controller’s WWPN.
    2. Zone the second fabric switch as well. Note: For some operating systems (Windows for sure), you need to zone just a single path during OS installation, so consider this step optional.
  3. Array Tasks
    1. On the array, create a LUN and grant the server WWPNs access to it.
    2. Present the LUN to the host using the desired LUN number (typically zero; this step is optional and not available on all array models).

Remember, the tasks above are done only once, and all 100 servers (or however many you have) will log in to the SAN switch. As you are probably aware, Cisco UCS manages SAN, LAN, and servers from one “unified” console. What makes this task (and lots of others) so easy can be traced back to the concept of Service Profiles and Templates. On the server side, UCS separates the server’s identity (MAC, UUID, WWNN, WWPN, VLANs, BIOS settings, firmware, etc.) from the physical hardware itself. I’m not covering Service Profiles in this article, but you can read up on them on Sean’s blog here: http://www.mseanmcgee.com/2010/04/the-state-of-statelessness-cisco-ucs-vs-hp-virtual-connect/.

Part of the server’s identity is its BIOS settings and boot order. And because any number of Service Profiles can be instantly created from and linked to a Template, I can create 100 or more “servers” and manage everything about them from one template. If I add a VLAN to the template, I’ve just added a VLAN to every server using the template. If I change the boot order in the template, I change the boot order of all those servers. And yes, you can unlink profiles from templates (and link them back) if you don’t want this behavior at some point.

Because I assume some of you may want to try this yourselves, I am including the actual step-by-step below. If you have any questions, please let me know.
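To make the updating-template behavior concrete, here is a small Python model of it. It is a hypothetical sketch, not the UCS Manager API: `ServiceProfileTemplate`, `ServiceProfile`, and `profiles_from_template` are made-up names, and the model ignores real-world details such as when a disruptive change like a new boot order actually takes effect on a running blade. It only shows the linking idea: bound profiles read the live template, and unbinding freezes a snapshot.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceProfileTemplate:
    """Hypothetical stand-in for a UCS updating template (not the UCSM API)."""
    name: str
    vlans: list[str] = field(default_factory=list)
    boot_order: list[str] = field(default_factory=list)


@dataclass
class ServiceProfile:
    """Hypothetical profile that either follows a template or keeps a snapshot."""
    name: str
    template: Optional[ServiceProfileTemplate] = None
    local_copy: Optional[ServiceProfileTemplate] = None

    def effective_config(self) -> ServiceProfileTemplate:
        # A bound profile reads the live template, so template edits apply at once.
        if self.template is not None:
            return self.template
        if self.local_copy is None:
            raise ValueError(f"{self.name} has no configuration")
        return self.local_copy

    def unbind(self) -> None:
        """Stop following the template but keep its current settings."""
        if self.template is not None:
            self.local_copy = copy.deepcopy(self.template)
            self.template = None

    def bind(self, template: ServiceProfileTemplate) -> None:
        """Follow a template again; the local snapshot is dropped."""
        self.template = template
        self.local_copy = None


def profiles_from_template(
    template: ServiceProfileTemplate, prefix: str, count: int
) -> list[ServiceProfile]:
    """Create `count` bound profiles named prefix-001, prefix-002, ..."""
    return [ServiceProfile(f"{prefix}-{i:03d}", template) for i in range(1, count + 1)]


if __name__ == "__main__":
    tmpl = ServiceProfileTemplate("esx-template", vlans=["vlan10"], boot_order=["san"])
    profiles = profiles_from_template(tmpl, "esx", 100)
    tmpl.vlans.append("vlan20")  # one edit to the template...
    print(all("vlan20" in p.effective_config().vlans for p in profiles))  # ...prints True
```

Sharing one template object, rather than copying it into each profile, is what makes a single edit reach all 100 profiles; `unbind` is the escape hatch for a profile that should stop following it.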

So let’s take a look at the steps required to get this trick to work.

  1. Create a Service Profile Template.
    1. Choose the “Updating” template type on the first screen of the wizard. An updating template lets us make changes in just one place, the template itself, when we want to change all the servers built from that template.

  2. Create the Boot Policy.
    1. While creating the template, choose the “Create Boot Policy” option.
    2. Add only SAN Boot to the policy (I always name my vHBAs “fc0” and “fc1”). In this example, I am adding only a Primary path. You can also create a Secondary if you want both vHBAs to log in via this method.
    3. Add a SAN Boot Target to the Primary vHBA (fc0).
    4. Use ANY valid WWPN as the Boot Target WWPN. Since the purpose of this policy is simply to force the HBA to log in, you can use one from your actual array or just make one up. I always use 20:00:00:00:00:00:00:00.

  3. Finish the Policy and Template.
    1. Save the Boot Policy and be sure to select it from the list before you continue with the rest of the Service Profile Template creation. Then finish the Template.
  4. Create Profiles.
    1. With the template highlighted, click “Create Service Profile from Template” and create the desired number of Service Profiles from the Template. Use names that make sense to you for this deployment. For 100 servers, you probably want to use Server Pools to automate associating the Service Profiles with physical hardware (I intend to cover Server Pools in a future post). Either way, once the Service Profiles are associated with physical servers, we can proceed to the next step.
  5. Boot the Servers.
    1. By default, this happens automatically. The servers power on and try to boot from the array WWPN you designated as the target. That obviously won’t boot the server, but it accomplishes our primary goal: the server’s vHBA successfully logs in to the SAN fabric! That’s right, we now have 100 servers logged in to the SAN fabric, and normal switch zoning can be done. Here’s an example of one successful login. First, look in the Service Profile to find the WWPN assigned to the server in question. In our example, it’s 20:00:00:25:B5:00:A1:BF.

  6. Verify Successful Login.
    1. Now that we know what we’re looking for, let’s see whether it shows up in the local switch’s FLOGI database. A simple “show flogi database” command on the switch lists the logged-in WWPNs for all VSANs.

Bingo – there it is, along with all the other vHBAs I needed to zone.
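Checking 100 logins by eye is tedious, so here is a minimal Python sketch that compares a list of expected WWPNs against saved `show flogi database` output. It assumes you have captured the switch output to a text file, and it hypothetically assumes the WWPNs were handed out in order from the start of a pool block; `POOL_START`, `SERVER_COUNT`, and the helper names are made up for illustration, and in practice you may prefer to copy the assigned WWPNs from each Service Profile. The parser does not depend on the exact column layout: it simply looks for WWPN-shaped strings anywhere in the text.

```python
import re
import sys

# One WWN: eight colon-separated hex bytes, e.g. 20:00:00:25:b5:00:a1:bf.
WWN = r"[0-9a-f]{2}(?::[0-9a-f]{2}){7}"
WWN_EXACT = re.compile(WWN, re.IGNORECASE)
# Same pattern inside free text, not glued to neighboring hex digits or colons.
WWN_IN_TEXT = re.compile(rf"(?<![0-9a-f:]){WWN}(?![0-9a-f:])", re.IGNORECASE)

# Hypothetical values: replace with your own WWPN pool block and server count.
POOL_START = "20:00:00:25:b5:00:a1:00"
SERVER_COUNT = 100


def normalize_wwpn(wwpn: str) -> str:
    """Validate the syntax of a WWPN and return it in lowercase."""
    candidate = wwpn.strip()
    if not WWN_EXACT.fullmatch(candidate):
        raise ValueError(f"not a valid WWPN: {wwpn!r}")
    return candidate.lower()


def wwpn_from_pool(pool_start: str, index: int) -> str:
    """Return the address `index` positions after the start of a pool block."""
    value = int(normalize_wwpn(pool_start).replace(":", ""), 16) + index
    if not 0 <= value < 1 << 64:
        raise ValueError("address falls outside the 64-bit WWN space")
    digits = f"{value:016x}"
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


def missing_logins(expected: list[str], flogi_output: str) -> list[str]:
    """Return the expected WWPNs that do not appear anywhere in the output."""
    seen = {match.lower() for match in WWN_IN_TEXT.findall(flogi_output)}
    return [w for w in map(normalize_wwpn, expected) if w not in seen]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_flogi.py <file with saved 'show flogi database' output>")
    else:
        with open(sys.argv[1], encoding="utf-8") as f:
            output = f.read()
        expected = [wwpn_from_pool(POOL_START, i) for i in range(SERVER_COUNT)]
        missing = missing_logins(expected, output)
        print(f"{SERVER_COUNT - len(missing)} of {SERVER_COUNT} expected WWPNs found")
        for wwpn in missing:
            print("not logged in:", wwpn)
```

Run it as `python check_flogi.py flogi.txt`. Because it matches WWPNs anywhere in the text, it is a quick presence check rather than a strict parser: an expected WWPN that happened to appear in another column (for example as some device's node name) would also count as found.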

Once zoning is complete, simply change the Boot Policy in the template to point to the actual WWPN of the array. You can edit the Policy or just create a new one. This brings up another huge UCS advantage: if you can easily change the boot target for all of the servers, you can do some pretty cool Disaster Recovery and/or SAN upgrades. Imagine your data center upgrading from array vendor X to array vendor Y, or simply to the next model array from the same vendor. Replication software exists to handle the actual cloning of the LUNs, but nothing exists to go out and change all the servers to point to the new array to find their boot LUN. Well, until now, that is.
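Here is a minimal, hypothetical Python model of that boot-target swap: a boot policy holds SAN boot entries (vHBA, primary or secondary, target WWPN, LUN), and retargeting replaces the placeholder 20:00:00:00:00:00:00:00 with the array's real port WWPN. The class and field names are illustrative, not UCS Manager objects, and `50:00:00:00:00:00:00:01` is a made-up array WWPN. The same function covers the array-migration case: point every target that references the old array's WWPN at the new one.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

WWPN_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){7}", re.IGNORECASE)

# The made-up target used only to force the vHBA to log in.
PLACEHOLDER_TARGET = "20:00:00:00:00:00:00:00"


@dataclass(frozen=True)
class SanBootTarget:
    """Hypothetical SAN boot entry: which vHBA boots from which array port."""
    vhba: str          # e.g. "fc0"
    role: str          # "primary" or "secondary"
    target_wwpn: str   # array port WWPN the vHBA tries to boot from
    lun: int = 0


@dataclass(frozen=True)
class BootPolicy:
    name: str
    targets: tuple[SanBootTarget, ...]


def retarget(policy: BootPolicy, old_wwpn: str, new_wwpn: str,
             new_name: Optional[str] = None) -> BootPolicy:
    """Return a copy of `policy` with every `old_wwpn` target pointed at `new_wwpn`.

    Passing `new_name` models creating a new policy instead of editing in place.
    """
    if not WWPN_RE.fullmatch(new_wwpn):
        raise ValueError(f"not a valid WWPN: {new_wwpn!r}")
    old, new = old_wwpn.lower(), new_wwpn.lower()
    targets = tuple(
        replace(t, target_wwpn=new) if t.target_wwpn.lower() == old else t
        for t in policy.targets
    )
    return BootPolicy(new_name or policy.name, targets)


if __name__ == "__main__":
    login_only = BootPolicy("force-login", (SanBootTarget("fc0", "primary", PLACEHOLDER_TARGET),))
    # Hypothetical array port WWPN; use the real one from your array.
    real = retarget(login_only, PLACEHOLDER_TARGET, "50:00:00:00:00:00:00:01", new_name="boot-array-x")
    print(real)
```

Returning a new frozen policy instead of mutating the old one mirrors the choice in the text between editing the Policy and creating a new one: the original login-only policy stays intact in case you need to fall back to it.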

So there you have it. With UCS, I can take a task that would have cost a single admin more than a day and a ton of keystrokes and shorten it to just a few simple steps. I’d like to give a plug to an article written by Jeremy Waldrop (who’s done more than his fair share of UCS). He shows an interesting pre-provisioning method using UCS service profiles that you could combine with the information presented above to pre-provision your storage before your physical blades arrive – with no outage! Great work, Jeremy – http://jeremywaldrop.wordpress.com/2010/11/11/cisco-ucs-service-profile-coolness/

Now you’ve heard what I have to say, and I hope you found this useful and that it will save you some time in deploying your servers. If you have anything you’d like to add, please feel free.

For more info on how Fibre Channel is implemented on Cisco UCS, see this link: https://supportforums.cisco.com/docs/DOC-6186

Thanks for stopping by.

-Jeff Allen

41 thoughts on “Boot from SAN 101 with Cisco UCS”

  1. Pingback: Think Meta » Links and Whatnot, Take #1

  2. Pingback: Technology Short Take #7 - blog.scottlowe.org - The weblog of an IT pro specializing in virtualization, storage, and servers

  3. Jeff,
    Great article, and I can’t thank you enough for the spoon feeding. I’m new to UCS and trying to grab hold of as much knowledge as I can. Keep up the nice step-by-step posts; I enjoyed reading this a lot.

    Kendrick

  4. Pingback: UCS BIOS Policies | Jeff Said So

  5. Hey Jeff,

    We’re about to receive a UCS with 16 blades and, bang, I come across this article. Talk about timing.

    A big big big THANK YOU for this article. I don’t know how many hours you have saved me! :)
    Cheers!

  6. Wow, I wish Cisco had this in their documentation! One of the biggest issues an admin has is knowing what to expect to see and the best order in which to run tasks.

    This is what I’ve been searching for to get SAN booting set up correctly. MUCH appreciated!

  7. Thanks for the informative article!

    A Cisco SE consultant told me you have to remove/disconnect the local hard drives in order to boot-from-SAN, if the said hard drives already have bootable OS installed on them. Is that true?

    • This is no longer a requirement. If the boot order does not include local storage, the option ROM for the local controller will not load.

  8. Pingback: Citrix XenDesktop on Cisco UCS - cliff davies cliff davies

  9. Jeff,

    Great article! I have a question about something I’ve been getting conflicting advice on. Some people are telling me I should configure the server’s WWNN and WWPN the same, while I’ve also come across some old guidelines that say the Node Name should be 20:10:00:25:B5:XX:XX:XX and the Port Name should be 20:00:00:25:B5:XX:XX:XX.

    What do you recommend and why?

    Thanks!
    SF

  10. Remember, I’m no storage guru, but IMO, they should be different. Why? Simple – when you’re troubleshooting something, you want to know what you are actually looking at. Not to mention that UCS will not allow you to make them the same (you can use the same pool for both, but not the same actual address). Then there’s the other side of the connection too – I seriously doubt the SAN switch will allow them to be the same, since it uses the two differently.
    Now, the actual format of the address is not that important, so long as it’s considered valid by the switch and the array. Some are more strict than others. Some older MDS and NX-OS code didn’t allow certain OUIs, so Cisco came out with some immediate guidelines on what address ranges to use. We probably weren’t clear that this was our issue and that it was going to be fixed soon (it is fixed now).
    Thanks for posting – let me know if this didn’t answer your question.

  11. Hi, Jeff

    I need to know whether we can skip the SAN switch and still achieve Boot from SAN. Direct connection of the storage is the option I would like to explore for this. I am using 1.4j for UCS Manager. I have already implemented direct-attached storage and it works wonderfully, so I was curious about exploring Boot from SAN without a SAN switch, as it would be an added advantage to the customer and also strengthen the case for UCS.

  12. For the moment, Cisco doesn’t support direct attaching of the FC array to the Fabric Interconnect without a SAN switch somewhere on the northbound SAN fabric (the array can be plugged directly into the FI, but a switch would be required somewhere to inherit zoning from). As you have discovered, the FI supports switch mode as of 1.4, and direct-attach “works”, but you have no zoning support. Booting from direct attached would also “work”, but is not supported as of this writing w/o a switch for zoning. I will try to get an article published when this feature becomes fully supported.

    • Thanks, Jeff, you have been a great help in accelerating the Cisco UCS momentum. My takeaway from this discussion is that I may try the Boot from SAN feature on a single server in a chassis for a demo and wait for Cisco documentation on it.

  13. I would really like to see Auto Deploy in action on Cisco UCS in vSphere 5. That would be really awesome. We use this in production, but 25GB of SAN per host is a waste of space considering it costs us 2k plus for that SAN space; multiply that by the number of hosts and it adds up quickly. We use 25GB LUNs…

  14. Pingback: Cisco UCS Boot-from-SAN Troubleshooting with the Cisco VIC (Part 2) | Jeff Said So

  15. Pingback: UCS Boot-from-SAN Troubleshooting with the Cisco VIC (Part 2) | Jeff Said So

  16. Jeff,
    Do you know if Cisco is going to support iSCSI SAN boot for XenServer on the UCS blades?

    Thank you,
    Warren

    • I assume you’re asking this because of LeftHand or EqualLogic storage. I’ve seen iSCSI boot done, but I’m not a fan. Actually, I’m becoming less and less of a fan of iSCSI on XenServer with every update. If you’re looking at a SAN at the same time as the UCS, I highly recommend looking at an EMC VNX. You can boot from Fibre Channel. It supports MPIO with 8Gb FC (which moves more data than 10Gb iSCSI), or you can do NFS or a mix of both. Just my two cents. iSCSI can give you some really tense moments during migrations or upgrades with XenServer lately.

  17. Dear Jeff,
    You simply rock. Magnificent article. I am currently implementing Boot from SAN for my Cisco UCS blades, so it goes without saying that this article is a boon for me. A very big thanks to you.
    Question: In HP Virtual Connect Manager, when we go to the “Interconnect” page and select the Fibre Channel interconnect, it shows whether each port is logged in or logged out (from the SAN switch). Similarly, in UCSM, is there a way to see whether the vHBAs are logged in (without accessing the SAN switch)?

    • Thanks for the kind words and I’m glad this article helped you. As far as being able to see what is logged into the upstream switch, we track that info and can display it, but only in the CLI. I wish it was nice and easy like it is in VC, and you’ve even inspired me to open an enhancement request for it :)
      For now, you need to ssh into the FI:
      #connect nxos [a | b]
      #show npv flogi-table
      SERVER                                                                       EXTERNAL
      INTERFACE  VSAN  FCID      PORT NAME                NODE NAME                INTERFACE
      --------------------------------------------------------------------------------------
      vfc2996    10    0xe5000f  20:00:00:25:b5:aa:00:0f  20:00:00:25:b5:ff:00:0f  fc2/13
      vfc3010    10    0xe50108  20:00:00:25:b5:aa:00:0e  20:00:00:25:b5:ff:00:0e  fc2/14

      (you will have to copy/paste the above into a text editor to get a decent reading on it)

      Any WWPN listed with an FCID is logged into the upstream switch (because that is what generated the FCID).
      Hope that helps.

  18. Jeff,
    You obviously know your stuff. I’m in a bit of a bind trying to get my environment configured to boot from SAN, and I’m getting nowhere with my SE and techs. Is there any chance you could exchange some time for a beer or dinner?

    • Jeff,

      Is there any document or procedure for moving a SAN boot LUN to a new LUN?

      I have a requirement to replicate a RHEL SAN boot disk on UCS and move it to a new storage array, which means a new LUN. How can I achieve this?

      Thanks,
      Govinda.

      • I think you mean that you have a RHEL server booting from SAN and now, to save the hassle of building new servers, you just want to copy this LUN to new LUNs and build servers in a flash.

        I’m not sure whether it can be done in UCS, but you can use the LUN copy feature in the storage array. Of course, you need to use something like Sysprep in Windows to keep each OS identity unique. We built some 300 servers this way in a single day :) :)

        hope it helps

        Regards
        Sushil

      • You forgot that I’m not a storage genius :)
        However, most storage vendors provide a “cloning” capability that will do what you ask.

  19. Great post. One question. How do you handle things like unique server names, licensing, and IP addresses? Once you build one server and create a service profile, is there a way to have a script deal with the variables? Most prod servers are not DHCP, I’m assuming.

    Thanks
    Mark

    • DHCP is becoming commonplace in data centers on staging/build VLANs. They are usually isolated. As far as IP uniqueness and licensing, these items are beyond the scope of UCS itself. Cisco has solutions for this in our CIAC (Cisco Intelligent Automation for Cloud). Other vendors such as Symantec (Altiris) and Microsoft offer solutions around this as well.

  20. So as far as UCS blades go, is it more common for them to host a dedicated server, or to install ESXi on them and go crazy with VMs? With UCS service profiles, is there some sort of HA capability where a blade that has hardware issues has its contents automatically migrated to another free blade? I see some overlap between what vSphere is capable of and what UCS Manager does as failures are detected. Just trying to get my head around this. Any docs you can recommend? Thanks!

  21. Hi Jeff,

    I would like to ask about booting from SAN. We connected the two HBAs directly to the storage without an FC switch and then installed VMware 5.1. During the installation we encountered the error “Cannot format to VMFS”. Do you have any idea what our problem might be?

    Thanks

  22. Hi Jeff,

    We are using a Cisco UCS C200 with an Emulex FC HBA. The error when installing ESXi 5.1 is resolved, but another problem occurred: after the installation finished, we rebooted the server and saw a black screen with a blinking cursor.

    What do we need to do?

  23. This is extremely common. You are most likely pointing the HBA to a WWPN target on the array that is a) not on the same fabric or b) on the same fabric, but not the “owning” controller for the LUN (very common in active/passive arrays).
    Cisco TAC should be able to help you troubleshoot this – also, please see my other post on troubleshooting problems like this:
    http://jeffsaidso.com/2012/02/ucs-boot-from-san-troubleshooting-with-the-cisco-vic-part-2/

  24. Here’s a question for you: how do you deal with SAN Boot configurations if the blade is a Windows server and you’re connected to a VNX array? I can’t get PowerPath to work on UCS blades with any version installed. Yes, the host can see and boot from the LUN, but when the rest of the paths are added in (because with boot from SAN on EMC storage you present only one path to the disk, not all of them), PowerPath still doesn’t see the array or the paths. I’ll probably have to open a support ticket for this.

  25. Hi All,

    I am facing an issue with Cisco UCS B200 M3 blades, described below.

    I have five chassis. I installed Solaris 11 on two of them (16 blades) and Windows 2012 on the remaining three.

    I configured branded zones in Solaris. If I reboot one Fabric Interconnect, neither my Solaris base servers nor their virtual machines can ping, but my Windows machines are able to ping again after two request timeouts.

    That means my FI cluster configuration is working fine, because my Windows machines work after the FI reboot. The problem is only on the Solaris servers.

    Please help; I have tried everything with no luck.

    • Are you using Fabric Failover in the profile? I believe the UI calls it “enable failover” with a checkbox. Assuming you are using this, the problem would be in the driver. You could simulate the same thing by simply forcing all the uplinks down on a single FI. Unless you have altered the Network policy, the server links connected to that FI would go down as well. You will not see “link down” on the solaris interfaces if you are using fabric failover (we handle the failover), but when you are in this state, look at the Solaris interface configuration and see what it sees.

      Also, what are you pinging? It sounds like it should work because the windows servers work, but the profiles for the two could have very different vlan configurations. Make sure the network portion of the profiles match between the windows and solaris machines if you are pinging the same device in both cases.

  26. Well, the comments on this post certainly took a turn for the spam-bots!

    Great article, though. Very helpful.

    • You don’t like hearing about the Academy Awards, iPhone apps and Low bank rates? ;)
      I’ve been a little bit busy lately so I haven’t been keeping up. I’m glad you mentioned it though because that caught my eye and I went and cleaned things up.

      thanks Zach
