So… what we’re going to discuss here is a method for implementing change management, version control, and rollback in UCS through service profile templates. But first, I’d like to give a little background.
My name is Loy Evans, and I’m a Cisco Data Center Consulting Systems Engineer, like Jeff. In my past, I’ve held a number of varied jobs in the IT industry, from Programmer to Router Jockey to Data Center Master Architect. For the past few years, I’ve been consulting on UCS for customers in the Southeastern US. I pride myself on understanding not just what customers ask for, but also the question behind the question being asked. This typically leads me to one of two things: a business need or a technical issue. OK, usually some of both, but there’s always a tendency in one direction or the other. In my opinion, it’s very important to understand the root of the question, as there will likely be subtle differences in how you approach the answer, and maybe even more subtle differences in how you present the solution. When I talked to Jeff about some stuff I was doing, he thought it would be “in the wheelhouse” of what he considered core content for his blog. “So”… here we go.
Back to the lesson at hand… In this case, the customer brought up an issue that had recently happened: several changes had been made (a BIOS configuration change, a Firmware update, and some operational network setting changes), and because of the way they had been implemented, the customer had no idea when the changes were made or how to track their impact on the service profiles. The changes were made by modifying policies that were already referenced by the service profiles, which made change management difficult, if not impossible, and left no way to monitor the magnitude and rate of change. On top of that, they had no process for implementing the changes in an orderly fashion. In short, they had a great tool in UCS Manager but were not using it for efficient operational control.
I decided to step back and look at the problem from a higher vantage point. My take on it was: first, WHAT problem are you trying to solve, and then, HOW are you solving it? The answer to the former was simple: we have to adjust the environment to address a business need (adding/removing VLANs to a cluster) or to fix a technical issue (a Firmware upgrade to support a new feature, or a BIOS configuration change to address a hypervisor bug). The answer to the latter was not so simple. In this case, they had not really worked out a system, and the fixes were implemented in bad form: by modifying the configuration of a policy already in place. I’d say that’s probably a worst practice. There is a bit of a gotcha here: just because UCS Manager is flexible enough to let you edit a policy at will doesn’t mean you should. The good news is you have options; the bad news is… you have options.
So, my suggestion was to begin a practice of version control based on Policies and Templates. What follows is a description of a set of concepts and practices that we put into place, and that I now recommend to all customers as they look to operationalize UCS in their organizations. For this discussion, I’m going to use Firmware Management for UCS Blades as the change we are implementing.
Keep this in mind: this is not the only change you can manage through this process; it extends to almost any change you might want to put in place on UCS.
Instead of Modifying, Try Creating and Adding a Date/Time Stamp
In this example we are going to create a new Firmware Management Policy (previous version was 2.1.1f, new version is 2.1.2a). To keep with the date-stamp theme, we create a firmware management policy named 20130901_hv_fw, which references the blade firmware package of version 2.1.2a, as shown below.
For the example documented here, we have previously created one (named 20130801_hv_fw), and we created a new one as mentioned above. I will reference these for the rest of this post.
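If you want to script the naming convention (or just keep everyone honest about it), here is a minimal Python sketch. It assumes every versioned object follows the `YYYYMMDD_<role>_<kind>` pattern used in this post; the helper names `versioned_name` and `latest_version` are hypothetical and not part of UCS Manager. Because the date stamp is zero-padded, plain string ordering is also chronological, so finding the newest version is just a `max()`.

```python
from datetime import date
from typing import Iterable, Optional


def versioned_name(stamp: date, role: str, kind: str) -> str:
    """Build a date-stamped name such as 20130901_hv_fw (hypothetical helper)."""
    return f"{stamp:%Y%m%d}_{role}_{kind}"


def latest_version(names: Iterable[str], role: str, kind: str) -> Optional[str]:
    """Return the newest name of the form YYYYMMDD_<role>_<kind>, or None."""
    suffix = f"_{role}_{kind}"
    matches = [
        n for n in names
        # 8-digit date prefix followed by exactly the expected suffix
        if len(n) == 8 + len(suffix) and n[:8].isdigit() and n.endswith(suffix)
    ]
    # Zero-padded YYYYMMDD prefixes sort lexically in chronological order
    return max(matches, default=None)


print(versioned_name(date(2013, 9, 1), "hv", "fw"))
print(latest_version(["20130801_hv_fw", "20130901_hv_fw", "20130801_hv_gold"], "hv", "fw"))
```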
Most people would simply change the service profile or updating template to reference the new policy and move on. However, that would apply control at only one level (the policy) and not at the root level of the workload, which is where configuration management yields its most useful benefits. We would gain low-level control but fail to maintain high-level control. Let’s not stop there with version control.
Templates Can Be Your Friend
Now we will take a service profile that is currently affected by the business or technical issue, right-click it, and create an updating service profile template.
Side note: for this and all other select-and-click actions, you can either right-click in the navigation pane on the left side, or use one of the context links in the content pane on the right-hand side of UCS Manager.
In our example I’ll use a service profile named hv_0, created for a Hyper-V workload, as our primer. This primer is the workload we used to test the configuration; once it is tested and verified, we can use it as the model for the rest of the Service Profiles. We can make experimental changes to this Service Profile in our test environment, including to the firmware policy, test them, and then use the result as a reference. You can see here that we have used the Firmware Policy labeled 20130801_hv_fw.
Once we have done this, it’s very easy to create replicas. First we create a Service Profile Template by right clicking and selecting “Create Service Profile Template”.
We will configure it as an Updating Template, functionality that we will use later.
This action takes only a few seconds, and once we have that Template, we can right click it to create the directly associated Service Profiles. In this example we will create 3 more Hyper-V host workloads, all with identical configurations, BIOS configurations, Firmware Versions, etc. as shown below, using the same naming convention we employed on the first (hv_0).
Now that we have created these new Service Profiles, you will notice something different from the original, as shown below. These service profiles are not directly modifiable, but rather are bound to the Template and must be either unbound or configured indirectly through the template.
If we look at hv_0, however, we see that this is not the case: that Service Profile is directly modifiable because it’s not bound to a template. To maintain consistency, we can bind it to the Template we created by right-clicking the Service Profile hv_0, clicking “Bind to a Template”, and then choosing the existing template (20130801_hv_gold).
Now we have a complete set of bound Service Profiles that provide us with a solid base for consistent configuration.
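To make the binding rules concrete, here is a toy Python model of what we just set up. It is a simplification under stated assumptions: it tracks only a firmware policy name, the `Template` and `ServiceProfile` classes are hypothetical stand-ins rather than UCS Manager objects, and it assumes an unbound profile keeps the settings it inherited. A bound profile reports its template’s policy and refuses direct edits, which mirrors the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Template:
    name: str
    firmware_policy: str


@dataclass
class ServiceProfile:
    name: str
    firmware_policy: str
    template: Optional[Template] = None

    def effective_firmware(self) -> str:
        # A bound profile follows whatever its (updating) template references
        if self.template is not None:
            return self.template.firmware_policy
        return self.firmware_policy

    def set_firmware(self, policy: str) -> None:
        # Bound profiles are not directly modifiable
        if self.template is not None:
            raise PermissionError(
                f"{self.name} is bound to {self.template.name}; unbind it or edit the template"
            )
        self.firmware_policy = policy

    def bind(self, template: Template) -> None:
        self.template = template

    def unbind(self) -> None:
        # In this model the profile keeps its inherited settings as local values
        if self.template is not None:
            self.firmware_policy = self.template.firmware_policy
        self.template = None


gold = Template("20130801_hv_gold", "20130801_hv_fw")
profiles = [ServiceProfile(f"hv_{i}", gold.firmware_policy, gold) for i in range(4)]
print([p.effective_firmware() for p in profiles])
```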
Now Comes the Change
We have built out the base model, but now comes the need for the configuration change. As mentioned before, in this example we are changing the Firmware versions. Let’s create a new Firmware Policy by choosing the Firmware Management Package from the Servers Tab in the UCS Manager GUI.
We now have a new Firmware Policy that we can use for our new image testing. In this example, it’s been a month since we first created our versioning system, so we’re going to label our new Firmware policy as 20130901_hv_fw.
The first thing we should do is test this out, and the best way to do that is to grab one of our Service Profiles and make the changes to that one. To begin this process, we take one host out of production, then we unbind that Service Profile from the Template as shown here.
Now we can directly modify that Service Profile for our process, using the new Firmware Policy, 20130901_hv_fw, which references the new Firmware version.
Then we can modify the Service Profile to reference that Firmware Policy.
Since this is a modification of an existing Service Profile, we have to commit the changes by clicking “Save Changes” at the bottom right.
When we make this change, be aware that the Service Profile will need to reboot the server to update the Firmware, which UCS considers a “Maintenance Activity”. We have our Service Profile (and thus our Service Profile Template) using a “user-acknowledged maintenance policy”. This means when a maintenance activity is required, it will queue and UCS Manager will wait for a user to acknowledge the activity before rebooting the Service Profile. We will get notified of this with something similar to this message:
If we click Yes, we will also get some other messages indicating that there are pending maintenance activities. On a Windows machine you may see something like this in the system tray:
On any other OS you won’t see a pop-up, but you will notice the Pending Activities indicator at the top of the UCS Manager window start flashing red-to-yellow (this happens on Windows as well, but on Windows you also get multiple notifications).
If we click that, we will then see the following Pending Activities list:
By checking the “Reboot Now” box as indicated above, we reboot the server and the Firmware update takes place. You can watch this happen by clicking the FSM (Finite State Machine) tab and following the steps as they take place.
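The user-acknowledged maintenance policy behaves like a queue that only drains when someone says so. The following Python sketch models that idea; `PendingActivities`, `request`, and `reboot_now` are hypothetical names for illustration, not UCS Manager calls.

```python
from typing import Dict, List


class PendingActivities:
    """Toy model of a user-acknowledged maintenance policy (not a UCS API)."""

    def __init__(self) -> None:
        self._queue: Dict[str, str] = {}  # profile name -> reason for the reboot
        self.rebooted: List[str] = []

    def request(self, profile: str, reason: str) -> None:
        # A disruptive change is queued instead of rebooting the server immediately
        self._queue[profile] = reason

    def pending(self) -> List[str]:
        return sorted(self._queue)

    def reboot_now(self, profile: str) -> None:
        # Equivalent of ticking "Reboot Now" for a single profile
        if profile not in self._queue:
            raise KeyError(f"no pending activity for {profile}")
        del self._queue[profile]
        self.rebooted.append(profile)


activities = PendingActivities()
activities.request("hv_0", "firmware policy changed to 20130901_hv_fw")
print(activities.pending())
activities.reboot_now("hv_0")
print(activities.pending(), activities.rebooted)
```

Nothing reboots until `reboot_now` is called for that specific profile, which is exactly the safety net the user-acknowledged policy gives you.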
Templates Are Your Friend, Again
We can now take the newly modified, rebooted, and tested Service Profile and create a new known-good template. Once again, right-click and select “Create Service Profile Template”. In our example, we’re creating an updating template named 20130901_hv_gold.
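The heart of this practice is “create, don’t modify”: each known-good configuration becomes a new, dated template, and old versions stay untouched for rollback. A minimal Python sketch of that rule, assuming a template is just a mapping of policy names (the `TemplateRegistry` class, the policy keys, and the `user-ack` value are illustrative assumptions, not UCS Manager objects), might look like this:

```python
from types import MappingProxyType
from typing import Dict, List, Mapping


class TemplateRegistry:
    """Append-only store of template versions: create new names, never edit old ones."""

    def __init__(self) -> None:
        self._versions: Dict[str, Mapping[str, str]] = {}

    def create_from_profile(self, name: str, policies: Mapping[str, str]) -> Mapping[str, str]:
        if name in self._versions:
            raise ValueError(f"{name} already exists; use a new date-stamped name instead")
        # Copy and freeze, so later edits to the profile cannot leak into this version
        snapshot = MappingProxyType(dict(policies))
        self._versions[name] = snapshot
        return snapshot

    def get(self, name: str) -> Mapping[str, str]:
        return self._versions[name]

    def names(self) -> List[str]:
        return sorted(self._versions)


hv_0 = {"firmware": "20130801_hv_fw", "maintenance": "user-ack"}
registry = TemplateRegistry()
registry.create_from_profile("20130801_hv_gold", hv_0)
hv_0["firmware"] = "20130901_hv_fw"  # the tested change on the unbound host
registry.create_from_profile("20130901_hv_gold", hv_0)
print(registry.names())
```

Refusing to reuse a name is the whole point: if 20130801_hv_gold could be edited in place, it would no longer be a trustworthy rollback target.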
And you can now see we have very quickly created a second Service Profile Template.
My Kingdom for a Trouble-Free Maintenance Window
We now have our existing template, plus our test machine, which we used to verify proper operation and then moved back to production. We also have our newly minted template, and now we need to apply it to the production workloads. An important question to consider is when and how to do this. My suggestion would be to roll these out during a maintenance window; the impact of such a maintenance window will obviously depend on the workload you’re managing. Bare metal, non-clustered servers are more heavily impacted than virtualized hosts. You should be able to determine the possible impact and plan accordingly.
Let’s assume that we have procured the necessary maintenance window and it’s time to roll our new Firmware into the rest of the environment. We can now highlight all of the affected Service Profiles by shift-selecting every Service Profile in our set (hv_0, hv_1, hv_2, and hv_3), right-clicking the set, and choosing “Bind to a Template”.
Choose the new Template (20130901_hv_gold).
This will then give us the message informing us of our maintenance policy.
Confirming it yields the new Pending Activities list.
Something to note here is that hv_0 is not in this list. Since we already went through the process during testing, its binding to the new Template does not require any maintenance activities. A suggestion here is to choose a host, start maintenance mode, and wait for all VMs to migrate off. Once that is done, come to this window, select the host’s Service Profile, check the “Reboot Now” box, then hit Apply (or OK). This kicks off the maintenance activity required to update the Firmware. Once that host is finished, stop maintenance mode on it, then move to the next host: lather, rinse, repeat, and so on until you are done with the cluster.
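The host-by-host procedure lends itself to automation. Here is a minimal Python sketch of the loop, assuming you supply three hypothetical hooks (`enter_maintenance`, `reboot_now`, `exit_maintenance`) that wrap your hypervisor manager and UCS Manager tooling; they are placeholders, not real SDK calls, and `reboot_now` is assumed to block until the firmware update finishes. The sketch also skips any profile already running the target policy, which is why hv_0 drops out.

```python
from typing import Callable, Dict, List, Tuple

Hook = Callable[[str], None]


def rolling_firmware_update(
    running: Dict[str, str],
    target_policy: str,
    enter_maintenance: Hook,
    reboot_now: Hook,
    exit_maintenance: Hook,
) -> List[str]:
    """Work through pending reboots one host at a time; return the order used.

    running maps profile name -> firmware policy it currently runs.
    The hooks are hypothetical placeholders; reboot_now is assumed to block
    until the FSM finishes.
    """
    # Profiles already on the target policy (e.g. the test host) have nothing pending
    todo = sorted(name for name, policy in running.items() if policy != target_policy)
    for name in todo:
        enter_maintenance(name)  # evacuate VMs before touching the host
        reboot_now(name)         # acknowledge the pending activity
        exit_maintenance(name)   # return the host to the cluster
        running[name] = target_policy
    return todo


events: List[Tuple[str, str]] = []
running = {
    "hv_0": "20130901_hv_fw",  # already updated during testing
    "hv_1": "20130801_hv_fw",
    "hv_2": "20130801_hv_fw",
    "hv_3": "20130801_hv_fw",
}
order = rolling_firmware_update(
    running,
    "20130901_hv_fw",
    enter_maintenance=lambda h: events.append(("enter", h)),
    reboot_now=lambda h: events.append(("reboot", h)),
    exit_maintenance=lambda h: events.append(("exit", h)),
)
print(order)
```

An exception raised by any hook propagates and stops the loop, which is the behavior you want: if one host fails to come back, the next one stays in production.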
As a side note: if you want to automate these maintenance activities, check out some of the awesome work done by Eric Williams, a slammin’ good coding dude at Cisco, as evidenced by his posts on the developer.cisco.com community forums.
What About the “Oh Snap” Factor?
Yeah, well, I think we know exactly what I really meant there, but it’s a good and important question, no matter how badly phrased my PG-rated version is. This is where we can use our previously versioned Templates for Configuration Rollback. Let’s say we went through all of this and there just so happened to be a service-impacting problem that we didn’t catch in our testing (blame QA, they’re used to it). While this is certainly not something we want to deal with, it’s something we can easily handle.
Let’s follow the same procedure we used to bring all of the Service Profiles up to the new version, just in reverse. If we want to roll back to our last known good (in our example), we can shift-select all of the Service Profiles, right-click, and select “Bind to a Template” again, choosing our old stand-by, 20130801_hv_gold.
Of course, we will be prompted with our notice of what maintenance activity this will entail.
Then we come back to our Pending Activities list, this time with all of our affected hosts in it. Depending on the maintenance window you worked out, you can follow your maintenance schedule as before, using host maintenance mode to move workloads around and rebooting one host at a time.
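Picking the rollback target can also be mechanical if you stick to the naming convention. This Python sketch assumes `YYYYMMDD_<rest>` template names and treats templates sharing the same suffix as one version family; `rollback_target` is a hypothetical helper, and the actual rebinding still happens in UCS Manager as described above.

```python
from typing import Iterable


def rollback_target(templates: Iterable[str], current: str) -> str:
    """Return the template version immediately older than `current`.

    Assumes YYYYMMDD_<rest> names, so lexical order within a family is chronological.
    """
    family_suffix = current[8:]  # e.g. "_hv_gold"
    family = sorted(t for t in templates if t[8:] == family_suffix and t[:8].isdigit())
    position = family.index(current)  # raises ValueError if current is not in the list
    if position == 0:
        raise ValueError(f"{current} is the oldest version; nothing to roll back to")
    return family[position - 1]


known = ["20130801_hv_gold", "20130901_hv_gold", "20130815_db_gold"]
print(rollback_target(known, "20130901_hv_gold"))
```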
Once again, you can also utilize an automation script, or just say the hell with it and reboot them all at once. If you choose this last one, please clear your browser history, pretend you never heard of me and freshen up the resume. Don’t say I didn’t warn you.
If you want to monitor the status of the changes (in this step, or any other time the server is in the throes of a maintenance activity), you can click on the “FSM” tab and watch the progress, as well as the step-by-step details, as the process runs. If you have reached step 38 in the list below (as of version 2.1.1), the Firmware updates have begun, starting at step 38 with the BIOS Image Update.
On the Usefulness of Firmware Policies
So, as a footnote, I am a HUGE fan of Firmware Policies and consider their usefulness self-evident; however, I commonly have to field the question, “why bother?” One simple question I like to fall back on, drawn from many years of experience: when have you EVER received a replacement server after a hardware failure that had the EXACT same firmware as the server it replaced?
Yeah. Exactly.
Thanks for reading. See you next time.
— Loy
Follow me on Twitter @loyevans