ENM Source Pinning Failed – A Lesson in Disjoint Layer 2

So, today’s article is on VLAN separation, the problems it solves, and the problems it sometimes creates. Not all networks are cut from the same cloth. Some are simple and some are complex. Some are physical and some are virtual. Some are clean while others are quite messy. The one thing that they all have in common is that Cisco UCS works with all of them.

A Look Back

In UCS, we have a topology that we call “disjoint Layer 2” (DJL2), which simply means that there are networks upstream from UCS that are separated from one another and cannot all be accessed by the same UCS uplink port (or port channel). For instance, you might have upstream VLANs 10, 20, and 30 on UCS uplink port 1 and VLANs 40, 50, and 60 on UCS uplink port 2. Prior to UCS 2.0, this configuration was not supported in End Host Mode (EHM). The main reason is that prior to 2.0, when VLANs were created, they were instantly available on ALL defined uplink ports and you could not assign certain VLANs to certain uplink ports. In addition, UCS uses the concept of a “designated receiver” (DR) port, which is the single port (or port channel) chosen by UCSM to receive all multicast and broadcast traffic for all VLANs defined on the Fabric Interconnect (FI). To make this clear, UCS receives all multicast/broadcast traffic on this port only and drops broadcast/multicast traffic received on all other ports.

Unless you have DJL2, this method works really well. If you do have DJL2, it leads to a problem if you define the VLAN configuration above and plug it into pre-2.0 UCS (in EHM). In that situation, UCS would choose a single designated receiver port for ALL VLANs (10-60) and assign it to one of the available uplinks. Let’s say the system chose port 1 (VLANs 10, 20, and 30) for the DR. Those networks (10, 20, 30) would work correctly, but VLANs 40, 50, and 60 (plugged into port 2) would not receive any broadcast or multicast traffic at all. The FI will still learn the MAC addresses of the destinations on port 2 for 40, 50, and 60, but necessary protocols like ARP, PXE, and DHCP (just to name a few) would be broken for those networks. In case you’re wondering, pin groups do not solve this problem, so don’t waste your time.

Instead, you need UCS 2.0+ and DJL2, which allows specific VLANs to be pinned to specific uplink ports. In addition, you now have a DR port for each defined VLAN as opposed to one globally for each FI. If you want to know more about the DR port, how it works, and how you can see which ports are the current DR in your own domain, please see the Cisco whitepaper entitled “Deploy Layer 2 Disjoint Networks Upstream in End Host Mode” located here: http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns944/white_paper_c11-692008.html
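If you want a quick way to see which uplink is currently acting as the DR for a given VLAN in your own domain, you can drop into the NX-OS shell on the FI and query the ENM VLAN database. This is the same command I use in the update at the end of this post; the VLAN ID and interface below are just example values, so expect your output to differ:

UCS-A# connect nxos a
FI-A (nxos)# show platform software enm internal info vlandb id 10
vlan_id 10
-------------
Designated receiver: Eth1/1
Membership:
Eth1/1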

The Rules

You’ve probably figured out that if this were super easy, I wouldn’t be writing about it. Well, yes and no. It’s easy to turn on the DJL2 feature, but there are some lesser-known rules around making it work. There is no “enable DJL2” button and you won’t find it by that name in UCSM. You simply enable it when you assign specific VLANs to specific uplink ports; it’s then automatically on. But many people make a mistake here. Staying with the above example, you want port 1 to carry VLANs 10-30 and port 2 to carry VLANs 40-60. When you first enter VLAN Manager, you will see VLANs 10-60 defined and carried on ports 1 and 2. You might think to just take VLANs 40-60 and assign them to port 2. Well, that does remove 40-60 from port 1, but it also leaves 10-30 on port 2 (along with 40-60). So you must isolate VLANs to their respective uplink ports. Furthermore, if you define a new VLAN, you need to go into VLAN Manager, pin it to the port(s) you intend, and remove it from the ports it should not be on. The main thing to remember here is that the original UCS rule on VLAN creation has not changed: a newly created VLAN is always available on all uplink ports. That still happens even when you have DJL2 set up, because UCS Manager has no idea where to put that new VLAN unless you tell it – so it follows the original rule. I recommend looking at your VLAN config in NX-OS (show vlan) before you run it in production. This will verify that the changes you wanted to make are truly the changes you made in the GUI.
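As a quick sketch of that check (the fabric and VLAN ID here are only examples), connect to the NX-OS shell and confirm which ports each VLAN actually lives on:

UCS-A# connect nxos a
FI-A (nxos)# show vlan id 40

If the DJL2 assignment took, VLAN 40 should list only the uplink you pinned it to (port 2 in our example) plus the server-facing vEthernet interfaces – and it should no longer show up on port 1.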

ENM Source Pinning Failed

So now we have DJL2 set up properly on our uplinks. Let’s look at the server side, as it is often an area of confusion. It’s probably also the way most of you found this blog entry, because you googled the term “ENM Source Pinning Failed”. Let me explain why. When you create vNICs on a service profile using the config we had above (10-30 on port 1 and 40-60 on port 2), you are not able to trunk/tag VLANs from BOTH port 1 and port 2 to the same vNIC. For example, you can have a single vNIC with VLANs 10, 20, and 30 and another vNIC with VLANs 40, 50, and 60, and both vNICs can be on the same server. But you CANNOT have a single vNIC with VLANs 10 and 40. If you do, the vNIC will go into an error state and will lose link until one of the VLANs is removed. The picture below might help – keep in mind that this diagram is very simplistic and that you can also get an ENM source pin failure with just a single FI:

The above illustration shows a configuration where the user wants VLANs 10-50 to reach a single server. This will not work in a DJL2 configuration and will result in an ENM Source Pin Failure. Instead, the illustration below achieves the desired result of VLANs 10-50 reaching the same server, but does not violate the DJL2 rules and would work fine.
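If you suspect a vNIC has already hit this condition, one hedged way to confirm it from the FI (exact output varies by version) is to look at how the server-facing vEthernet interfaces are pinned in the NX-OS shell:

FI-A (nxos)# show pinning server-interfaces

A healthy vNIC’s vEthernet is pinned to an uplink port or port channel; a vNIC stuck in the ENM source pin failed state shows no pinned border interface, which lines up with the loss of link described above.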

Hopefully this helped explain DJL2 a little better and maybe alleviate the ENM Source Pinning error you might be getting.

Thanks for stopping by.

-Jeff

Update: I am running UCSM version 2.2(1d) in the lab at present and came across a scenario I need to share. I had VLAN 34 on FI-A and VLAN 35 on FI-B. I did not need failover for this, so each VLAN was isolated to a single FI. I set up disjoint L2 correctly, and “show vlan” in NX-OS mode showed me that it was indeed set up the way I intended. However, any profile that used VLAN 34 on FI-A would throw the dreaded ENM source pin failed error in the profile. I spent several hours verifying and double-checking everything, but no joy. I then ran this command:
FI-A (nxos)# show platform software enm internal info vlandb id 34

I got nothing back. Nada, zilch, nothing.
Running this on FI-B, I got what I expected:
FI-B (nxos)# show platform software enm internal info vlandb id 35
vlan_id 35
-------------
Designated receiver: Eth1/1
Membership:
Eth1/1

Assuming something was really confused here, I rebooted FI-A and all was well. If you encounter this, I don’t suggest you reboot the FI (unless, like me, it’s in a lab); I would call TAC instead and let them find a non-disruptive method of correcting the issue. I just want to make a note of it here in case you feel like you got it all right and it still has an issue.

2nd update:

You can run into this problem from the opposite direction if you implement the uplinks first and then create the profiles later. If that’s the case, the profiles will throw an error before being applied, saying “not enough resources overall” and “Failed to find any operational uplink port that carries all vlans of the vNICs”.
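In that situation, it helps to confirm which VLANs each uplink actually carries before associating the profiles, so every VLAN on a given vNIC is available on at least one common uplink. A minimal check from the NX-OS shell looks something like this:

FI-A (nxos)# show interface trunk

Compare the allowed-VLAN list on each uplink against the VLANs you put on each vNIC before you push the profiles.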

12 thoughts on “ENM Source Pinning Failed – A Lesson in Disjoint Layer 2”

  1. Jeff,

    Thank you for clearing this up – I was trying to solve this problem for the last few hours. We need to separate 2 VLANs from the rest of the network, so I decided to try VLAN grouping.
    Of course, I did get the error, and now that I created the setup as you explained, everything works.
    BUT one question remains.
    Let’s say VLANs 10, 20, 30 are allowed on vnic0, and 40, 50 are allowed on vnic1, and they are vNICs on my ESX host.
    If vnic0 fails, or let’s say the fabric extender fails (the one attached to vnic0), will my virtual machines that are running on the host and configured in VLANs 10, 20, or 30 lose connectivity? Will the “enable failover” option in the vNIC setup understand to fail over the configured VLANs?

    I hope you understand my question.

    Thank you, really great article

    Igor

    • This is a really late reply and I’m sorry I missed it. You could avoid this situation by providing each FI with an uplink into each set of VLANs. Fabric Failover would require this in order to function. Look at the graphics in the article and notice how the VLAN clouds are cross-connected to each FI. If you do it that way, you should be good.

    • I do understand your question and yes, you want Fabric Failover enabled. However, you probably want to test this, because if you have thousands of VMs attached to dozens of hosts on the same FI, the remaining FI in an outage will take a while to notify upstream switches that the path has changed. If there are lots of VMs, you can create two additional vNICs in the profile that carry the same VLANs as the first two vNICs, but have them attach to the alternate FI. Then you can use normal ESX teaming of the two adapters – just don’t use IP hash and you’ll be fine.

  2. Furthermore, in this setup, will VLANs 10, 20, and 30 use only interconnect A, and VLANs 40 and 50 use only interconnect B (presuming vnic0 is attached to FI A, and vnic1 to FI B)?

    • Not if I could help it. You still don’t want an FI outage to cause a downstream outage of the VLAN(s). Notice in the graphic that each set of VLANs goes to each FI.

  3. Jeff, I have a question I can’t seem to answer regarding the native VLAN feature. In our application of UCS we have only 2 VLANs (VLAN 1400 and VLAN 780): one goes upstream and the other is for failover traffic that doesn’t go above the fabric, a private network internal to the configuration. It is consistent across all firmware versions from 1.3 to 2.2(1d) that when I create the vNIC, I have to choose the upstream VLAN, 1400, as native. If I don’t do this, I won’t get outbound traffic to the upstream switch. I’ve also done this with multiple VLANs with none chosen as native, and until I chose VLAN 1400 (my VLAN native to the first upstream switch, even though the trunk allows more than one VLAN) I get no outbound traffic. I’ve read article after article and haven’t found anything that explains this. To further the issue, my software team will go into their ESXi layer and add the VLAN ID on the vSwitch, and where I did have outbound traffic, it now stops until I remove it at the ESXi layer. Can you expound on what my issue may be?

    Thanks!

    • Marko,

      In UCS, all VLANs exiting the FIs are tagged except for the native. I don’t know what your background is, so this may not be news to you, but I don’t want to assume…
      On a trunk port (all uplinks in UCS are trunked), many VLANs are carried and each packet on the trunk has a “tag” that identifies its VLAN to the receiving switch. The only exception is the native VLAN. The native VLAN sends untagged (normal) packets – what VLAN they belong to is not identifiable by anything inside the packets. The native VLAN should match on both ends of the link or things get really fubar.

      For this discussion, assume the FIs are just switches. The vNICs inside the servers are connected to the “downlinks” and the upstream switches are connected to the “uplinks”. If the vNICs in the servers are set to use VLAN 99 only, but you do not mark it native, then traffic on VLAN 99 is tagged when it reaches the FI. If it needs to leave the FI upstream, the FI looks at the uplink port to see if packets on VLAN 99 are tagged or untagged (native) and sends them accordingly. In your situation, you are choosing VLAN 1400 to be native in the service profile – so all packets on VLAN 1400 are untagged when they reach the FI. When they leave the FI, it sounds like VLAN 1400 is the native on the upstream switch – so you need to mark 1400 as the native in the VLAN manager in UCS. This is displayed in a screenshot from my good friend Scott – http://ciscoservergeek.blogspot.com/2012/10/quick-and-dirty-disjoint-layer-2.html. You would right-click 1400 and mark it as native.
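      Just as a hedged illustration of the matching piece on the upstream side (assuming a Cisco switch – your interface numbers will differ, and you would add whatever other VLANs actually go upstream to the allowed list), the trunk facing the FI would look something like this, with 1400 untagged as the native:

      interface Ethernet1/10
        switchport mode trunk
        switchport trunk native vlan 1400
        switchport trunk allowed vlan 1400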

      Now, on ESX, you have to view it as a switch as well (because it is). If you fill in that VLAN field for the mgmt traffic (I don’t remember all the exact terms they use) with 1400, then you are tagging 1400 for mgmt traffic to the vSwitch. So the vNIC traffic in UCS would be expecting 1400 to be untagged, and communication will fail when it comes into the FI.

      I’m sure if I were sitting there, I’d be able to actually fix it, but hopefully this has pointed you in the right direction to get it working the way you like. I can tell you almost for certain that it doesn’t sound like you need disjoint Layer 2 at all. It would likely break your configuration – I only mention it because you commented on this article, which deals with DJL2.

      Good luck!

  4. Appreciating the hard work you put into your blog and the detailed information you provide. It’s nice to come across a blog every once in a while that isn’t the same outdated rehashed information. Excellent read! I’ve saved your site and I’m including your RSS feeds to my Google account.
