vSphere Integrated Containers 0.4 – Inspecting VCH and ContainerVM

Last week, I had an interesting conversation with my friend Michael on vSphere Integrated Containers (VIC) in its current version 0.4. We discussed some of the key concepts and how they relate to other container implementations out there. I decided to summarize the key observations in a little more detail here, as I expect this information to be interesting for operations teams once they start running VIC.
Please note: this is based on the currently available Open Source VIC project in version 0.4 running on vSphere 6.0 in my homelab.
For simplicity, I decided to go with a “standalone ESXi” installation of my Virtual Container Host (VCH) in this example.

First, I created a new container host called “VCH001” on my ESXi host from my PhotonOS based worker VM:
root@photonbox [ /workspace/vic ]# ./vic-machine-linux create --bridge-network 'VCH Bridge' --external-network 'VM Network' --image-datastore mydatastore --target 'root@esxi01.think-v.com' --name VCH001

The output of this command shows the necessary details:
INFO[2016-08-06T19:51:56Z] Please enter ESX or vCenter password:
INFO[2016-08-06T19:52:00Z] ### Installing VCH ####
INFO[2016-08-06T19:52:00Z] Generating certificate/key pair - private key in ./VCH001-key.pem
INFO[2016-08-06T19:52:02Z] Validating supplied configuration
INFO[2016-08-06T19:52:05Z] Firewall status: DISABLED on /ha-datacenter/host/esxi01.think-v.com/esxi01.think-v.com
INFO[2016-08-06T19:52:05Z] Firewall configuration OK on hosts:
INFO[2016-08-06T19:52:05Z] /ha-datacenter/host/esxi01.think-v.com/esxi01.think-v.com
INFO[2016-08-06T19:52:05Z] License check OK
INFO[2016-08-06T19:52:05Z] DRS check SKIPPED - target is standalone host
INFO[2016-08-06T19:52:07Z] Creating Resource Pool VCH001
INFO[2016-08-06T19:52:07Z] Creating appliance on target
INFO[2016-08-06T19:52:07Z] Network role client is sharing NIC with external
INFO[2016-08-06T19:52:07Z] Network role management is sharing NIC with external
INFO[2016-08-06T19:52:09Z] Uploading images for container
INFO[2016-08-06T19:52:09Z] bootstrap.iso
INFO[2016-08-06T19:52:09Z] appliance.iso
INFO[2016-08-06T19:52:22Z] Waiting for IP information
INFO[2016-08-06T19:52:42Z] Waiting for major appliance components to launch
INFO[2016-08-06T19:52:44Z] Initialization of appliance successful
INFO[2016-08-06T19:52:44Z]
INFO[2016-08-06T19:52:44Z] Log server:
INFO[2016-08-06T19:52:44Z] https://VCH_IP:2378
INFO[2016-08-06T19:52:44Z]
INFO[2016-08-06T19:52:44Z] DOCKER_HOST=VCH_IP:2376
INFO[2016-08-06T19:52:44Z]
INFO[2016-08-06T19:52:44Z] Connect to docker:
INFO[2016-08-06T19:52:44Z] docker -H VCH_IP:2376 --tls info
INFO[2016-08-06T19:52:44Z] Installer completed successfully
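
If you want to script the next steps, the connection details can be pulled out of the installer output. The following is only a minimal sketch and not part of VIC: the helper name find_docker_host is my own, and it assumes the log keeps the DOCKER_HOST=... line format shown above (VCH_IP is the same placeholder as in the output).

```python
import re

# Matches the "DOCKER_HOST=<host>:<port>" token anywhere in a log line.
DOCKER_HOST = re.compile(r"DOCKER_HOST=(\S+)")

# A few lines copied from the installer output above (VCH_IP is a placeholder).
SAMPLE_LOG = """\
INFO[2016-08-06T19:52:44Z] Log server:
INFO[2016-08-06T19:52:44Z] https://VCH_IP:2378
INFO[2016-08-06T19:52:44Z] DOCKER_HOST=VCH_IP:2376
INFO[2016-08-06T19:52:44Z] Installer completed successfully
"""


def find_docker_host(log_text):
    """Return the value of the first DOCKER_HOST=... entry, or None."""
    for line in log_text.splitlines():
        match = DOCKER_HOST.search(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    host = find_docker_host(SAMPLE_LOG)
    # Prints a line that could be pasted into a shell session.
    print(f"export DOCKER_HOST={host}")
```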

More details about the inner workings can be found in the VIC 0.4 blogposts by Cormac that are also listed in the link section below. In this post I’d like to focus more on the topic of state information and how this is handled in VIC 0.4.

First of all, it is important to understand how VCHs in VIC differ from other (in this case Linux-based) container solutions. While each container in an N:1 model (containers:Linux) has its private namespace, the underlying shared kernel provides the container control plane to look into containers and perform process-related actions (start, stop, …). Runtime environment and control plane are directly coupled.

In VIC, the runtime/execution environment of the container is a so-called containerVM (based on PhotonOS), which is decoupled from its “control plane”, the Virtual Container Host itself. This creates a new layer of abstraction where communication flow as well as state information needs to be captured and made available.

To establish a secure communications path between these two components, VIC also introduces the concept of a Tether to connect into the actual containerVM. This concept is part of the Port Layer Abstractions that allows VIC to be extensible. More details are described on the VIC Container Abstractions documentation page.

Let me share a summary of what the VCH and containerVMs actually look like on the infrastructure – and where information on state is actually stored. First, let me go into the VMX file of the VCH. As expected, there are two vNICs attached:

ethernet0.virtualDev = "vmxnet3"
ethernet0.networkName = "VM Network"
ethernet0.pciSlotNumber = "192"
ethernet0.uptCompatibility = "TRUE"
ethernet0.present = "TRUE"
ethernet1.virtualDev = "vmxnet3"
ethernet1.networkName = "VCH Bridge"
ethernet1.pciSlotNumber = "224"
ethernet1.uptCompatibility = "TRUE"
ethernet1.present = "TRUE"
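
To check this kind of wiring without reading the VMX file by eye, the ethernetN.* keys can be grouped per adapter. This is just a rough sketch under the assumption that every relevant line has the key = "value" form shown above; the function name nic_networks is made up for illustration.

```python
from collections import defaultdict

# The vNIC section of the VCH's VMX file, copied from above.
VCH_VMX = '''\
ethernet0.virtualDev = "vmxnet3"
ethernet0.networkName = "VM Network"
ethernet0.pciSlotNumber = "192"
ethernet0.uptCompatibility = "TRUE"
ethernet0.present = "TRUE"
ethernet1.virtualDev = "vmxnet3"
ethernet1.networkName = "VCH Bridge"
ethernet1.pciSlotNumber = "224"
ethernet1.uptCompatibility = "TRUE"
ethernet1.present = "TRUE"
'''


def nic_networks(vmx_text):
    """Map each present ethernetN adapter to its port group name."""
    nics = defaultdict(dict)
    for line in vmx_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        key = key.strip()
        if not key.startswith("ethernet"):
            continue
        device, _, attribute = key.partition(".")
        nics[device][attribute] = value.strip().strip('"')
    return {
        device: attrs.get("networkName")
        for device, attrs in sorted(nics.items())
        if attrs.get("present") == "TRUE"
    }


if __name__ == "__main__":
    for device, network in nic_networks(VCH_VMX).items():
        print(f"{device} -> {network}")
```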

Here, we also find the boot disk that got transferred with the deployment of the VCH:

ide0:0.deviceType = "cdrom-image"
ide0:0.fileName = "appliance.iso"
ide0:0.present = "TRUE"

The general approach for storing state information is described in the Configuration persistence mechanism overview documentation. According to this, VIC actually makes use of the vSphere extraConfig and guestinfo mechanisms to store relevant information. But where do extraConfig and guestinfo actually reside? In a normal vSphere VM, this information is stored in the VMX file of the VM (and remember, a container in VIC actually is a VM – the containerVM).

Starting a simple “hello-world” container should trigger the whole workflow that also creates a new VM. But let’s go through it step by step:

root@photonbox [ /workspace/vic ]# docker -H VCH_IP:2376 --tls run -it hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
a3ed95caeb02: Pull complete
c04b14da8d14: Pull complete
Digest: sha256:548e9719abe62684ac7f01eea38cb5b0cf467cfe67c58b83fe87ba96674a4cdd
Status: Downloaded newer image for library/hello-world:latest

Looking at the recently executed containers from my worker VM, we can see the following reference:

root@photonbox [ /workspace/vic ]# docker -H VCH_IP:2376 --tls ps -a
CONTAINER ID   IMAGE         COMMAND    CREATED                  STATUS    PORTS   NAMES
2cf7f483bf6e   hello-world   "/hello"   Less than a second ago   Stopped           jolly_panini

So our container ran as ID 2cf7f483bf6e. How does that containerVM actually look on our standalone ESXi host and even more interestingly, where does the information about the container (from docker ps -a) come from?

First of all, there is a newly created VM named 2cf7f483bf6e7f32daa53f51ca388d5fb153f78d3a74d313318099086638ad58 – just as expected. Looking at the VMX file, we’ll find a lot of session information that we already found in docker ps -a:

guestinfo./common/name = "jolly_panini"
guestinfo./sessions|2cf7f483bf6e7f32daa53f51ca388d5fb153f78d3a74d313318099086638ad58/common/name = "jolly_panini"
guestinfo./sessions|2cf7f483bf6e7f32daa53f51ca388d5fb153f78d3a74d313318099086638ad58/cmd/Path = "/hello"
guestinfo./repo = "hello-world"
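
To make the link between these VMX entries and the docker ps -a output more tangible, here is a small sketch that reads the guestinfo keys back out of VMX text. It is my own illustration and not how VIC itself reads its configuration: the function names are made up, and the parsing assumes the key = "value" layout and the /sessions|<id>/ key pattern exactly as shown above.

```python
import re

# One `key = "value"` entry per line, as found in a VMX file.
VMX_LINE = re.compile(r'^\s*([^=\s][^=]*?)\s*=\s*"(.*)"\s*$')
# Session keys as they appear above: /sessions|<64 hex characters>/...
SESSION_KEY = re.compile(r"^/sessions\|([0-9a-f]+)/")

FULL_ID = "2cf7f483bf6e7f32daa53f51ca388d5fb153f78d3a74d313318099086638ad58"

SAMPLE_VMX = f'''\
guestinfo./common/name = "jolly_panini"
guestinfo./sessions|{FULL_ID}/common/name = "jolly_panini"
guestinfo./sessions|{FULL_ID}/cmd/Path = "/hello"
guestinfo./repo = "hello-world"
ide0:0.deviceType = "cdrom-image"
'''


def parse_vmx(text):
    """Parse VMX text into a dict of key -> value."""
    entries = {}
    for line in text.splitlines():
        match = VMX_LINE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def guestinfo_entries(entries):
    """Keep only guestinfo.* keys and drop the 'guestinfo.' prefix."""
    prefix = "guestinfo."
    return {k[len(prefix):]: v for k, v in entries.items() if k.startswith(prefix)}


def session_ids(info):
    """Collect the distinct session IDs referenced by /sessions|<id>/ keys."""
    ids = set()
    for key in info:
        match = SESSION_KEY.match(key)
        if match:
            ids.add(match.group(1))
    return ids


if __name__ == "__main__":
    info = guestinfo_entries(parse_vmx(SAMPLE_VMX))
    for full_id in session_ids(info):
        # The first 12 characters are what `docker ps -a` displayed above.
        print(full_id[:12], info["/common/name"], info["/repo"])
```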

The containerVM mounts the “bootstrap.iso” from the VCH001’s VM folder (that also got deployed via the vic-machine installer):

ide0:0.deviceType = "cdrom-image"
ide0:0.fileName = "/vmfs/volumes/d37f7a1b-0ab13c48/VCH001/bootstrap.iso"
ide0:0.present = "TRUE"

The containerVM also has a serial connection to the VCH (explanation):

serial0.allowGuestConnectionControl = "FALSE"
serial0.fileType = "network"
serial0.fileName = "tcp://VCH_IP:8080"
serial0.network.endPoint = "client"
serial0.yieldOnMsrRead = "TRUE"
serial0.present = "TRUE"
serial0.hardwareFlowControl = "TRUE"

The containerVM’s network adapter is connected to the “VCH Bridge” portgroup and therefore only talks to the VCH. This is where the container traffic flows, while management and control plane traffic goes via serial0.

ethernet0.virtualDev = "vmxnet3"
ethernet0.networkName = "VCH Bridge"
ethernet0.pciSlotNumber = "192"
ethernet0.uptCompatibility = "TRUE"
ethernet0.present = "TRUE"

The containerVM also has its own hard disk (attached VMDK):

scsi0.virtualDev = "pvscsi"
scsi0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "2cf7f483bf6e7f32daa53f51ca388d5fb153f78d3a74d313318099086638ad58.vmdk"
scsi0:0.present = "TRUE"

To delete the VCH and the containerVMs, vic-machine-linux is called with the “delete” option:

root@photonbox [ /workspace/vic ]# ./vic-machine-linux delete --target esxi01.think-v.com --user root --name VCH001
INFO[0000] Please enter ESX or vCenter password:
INFO[2016-08-06T20:50:24Z] ### Removing VCH ####
INFO[2016-08-06T20:50:28Z] Removing VMs
INFO[2016-08-06T20:50:33Z] Removing images
INFO[2016-08-06T20:50:34Z] Removing volumes
INFO[2016-08-06T20:50:36Z] Removing appliance VM network devices
INFO[2016-08-06T20:50:38Z] Bridge network was not created during VCH deployment, leaving it there
INFO[2016-08-06T20:50:40Z] Removing Resource Pool VCH001
INFO[2016-08-06T20:50:40Z] Completed successfully

 

In summary, all container state information is kept close to the containerVM, stored in the VMX file. VCH and containerVM use the ISO files that are transferred during the vic-machine install process. VIC also introduces a new level of abstraction between control plane and execution environment that allows VIC to be extensible for future use cases.

 

Additional links around VIC 0.4 – most of them by Cormac:

Reset to Standard vSwitch from Distributed vSwitch on homelab Intel NUC

I just had to reset my homelab Intel NUC’s ESXi 6.0 network configuration because I wanted to test a specific setting in vSphere Integrated Containers. Unfortunately, the Intel NUC only has one physical uplink and that uplink (and VMkernel Portgroup) was configured on a Distributed vSwitch – I needed it on a Standard vSwitch for the test. Migrating the VMkernel Portgroup from the Distributed to a Standard vSwitch was a little challenging and I didn’t want to set up an external monitor to use the Direct Console User Interface (DCUI). But with the help of William’s ESXi virtual appliance and some hints in the vSphere documentation, I was able to reproduce the necessary keyboard inputs and perform it with only a USB keyboard attached to the NUC. Instead of summarizing it only for myself, I thought I’d share it here as I couldn’t find similar instructions on Google.

Please don’t do this in a production environment; blindly configuring a system isn’t a good idea.

tl;dr: the steps are: F2 – TAB – <root_password> – ENTER – DOWN – DOWN – DOWN – DOWN – ENTER – DOWN – ENTER – F11

 

What would you actually see if you could view the DCUI? First, you need to press F2 (and potentially “fn” or similar) to get into ESXi’s DCUI system management:

Screenshot 2016-08-01 at 07.58.16

It will ask you to authenticate first (pressing TAB – <root_password> – ENTER):

Screenshot 2016-08-01 at 08.15.31

Then, you need to go to “Network Restore Options” in the System Customization menu (pressing DOWN – DOWN – DOWN – DOWN – ENTER):

Screenshot 2016-08-01 at 07.58.48

And in the “Network Restore Options”, you’ll have the option to “Restore Standard Switch” (pressing DOWN – ENTER – F11):

Screenshot 2016-08-01 at 07.59.11

After selecting “Standard Switch”, you’ll need to confirm a new dialog with “F11” and then a new vSwitch will be created on your host. Mine worked like a charm: I found a new Standard vSwitch with vmk0 using my “old” management IP address for ESXi.

Tech Links – CW19

Events

Open Votings

EMC World 2016

Data Center Virtualization

Software-Defined Storage

Automation & Orchestration

Operations Management

End-User Computing

Software-Defined Networking & Security

Network Functions Virtualization

  • Live 4G Mobile Applications Deployment Demonstration on the VMware vCloud NFV Platform at MWC – https://t.co/0o1gMpck3G
  • Comment: “On the multi-dimensional evolution of platforms and applications” – https://t.co/VyDxNLyRWC

Cloud-Native Applications

Various VMware news

Websites, Blogs, Podcasts, Social Media

My VMworld 2016 session proposals (NFV, Containers/Docker, …)

VMworld session voting time has come and I just wanted to share my session proposals – your votes would be appreciated:

On the multi-dimensional evolution of platforms and applications

I’d like to touch on a topic that I am seeing in several of my areas of interest right now. In general, it’s related a lot to the overall topic of “First, Second and Third Platform” but I’d like to focus more on the individual implications for multiple domains. Over the course of the last few months, I have been involved in several discussions around different platforms and applications as well as their individual evolution and maturity. My personal observation is that both don’t necessarily evolve synchronously. Therefore, it is important to not only identify the phase that you are currently in but also to understand the operational implications of the “generation disconnect” between app and platform.

 

Evolution of Platforms

As mentioned above, I’d like to relate my observations to the three platform generations below. I’d like to point out that these three generations are subdivided into several different technologies and can look and feel different in specific use cases or fields of application. The common understanding of the generations is:

platform_evolution

 

In addition to these phases, I see different “implementations” of the respective platform generation. Take Client-Server as one example – this can be a physical-server-only model, this also stretches to server virtualization and potentially even to “VM-oriented” hosting or even cloud services. My friend Massimo also wrote a nice piece on this.

 

Evolution of Applications

One of my key observations is that there is no simple 1:1 connection between applications and platforms. With the rise of second-generation platforms, not all applications from the first platform were dropped, nor did they immediately become available for the next-generation platform. Instead, applications that are still business-relevant go through an evolution, because it makes sense to optimize them for the next-generation platform. And here comes the important observation: I believe there are (at least) three phases in an application evolution cycle that happens for each platform generation – or potentially even for each concrete implementation of the platform generation. I’ll call these phases “Unchanged”, “Optimized” and “Purpose-built” for now:

application_evolution

But how does that fit in the overall platform picture? I’ll try to merge the previous two pictures into one. It also shows a potential application evolution path across platform generations. As you can see, there can be a slight overlap between “purpose-built” of the previous and the “unchanged” phase of the next-generation platform.

evolution

But let’s move on to two concrete examples that I see applicable.

 

Example 1: Network Functions Virtualization

I’ll start with Network Functions Virtualization (NFV). NFV is a Telco industry movement that is supposed to provide hardware independence, new ways of agility, reduced time to market for carrier applications & services, cost reduction and much more – basically, it’s about the delivery of promises of Cloud Computing (and Third Platform) for the Telco industry (read more about it here). The famous architectural overview as described by ETSI can be seen below:

ETSI NFV

NFV differentiates between several functional components such as the actual platform (NFVI = Network Functions Virtualization Infrastructure), the application (VNF = Virtual Network Function), the application manager (VNF manager) and e.g. the orchestration engine.

So how could this look in reality? Let’s assume the VNF manager detects a certain usage pattern of its VNF and that VNF is reaching its potential maximum scale for the currently deployed number of VNF instances. The VNF manager then talks to the Orchestrator, which could then trigger e.g. the scale-out of the VNF by deploying additional worker instances on the underlying infrastructure/platform resources. The worker instances of the VNF could then automatically be included in the load distribution and have instant integration into necessary backend services where applicable. All of that happens via open APIs and standardized interfaces – it looks and feels a lot like a typical implementation example for a “third platform” including the “purpose-built” app.
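
To illustrate the kind of decision described here, the following is a deliberately simplified sketch of how a scale-out request could be sized. Everything in it is hypothetical: the function names, the notion of a fixed capacity per worker instance and the 20% headroom are my own assumptions and are not taken from ETSI or any real VNF manager or orchestrator API.

```python
def instances_needed(current_load, capacity_per_instance, headroom_percent=20):
    """Number of worker instances required to serve current_load.

    Each instance is only planned up to (100 - headroom_percent) percent
    of its capacity. Integer arithmetic avoids rounding surprises.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    usable = capacity_per_instance * (100 - headroom_percent)
    # Ceiling division: -(-a // b) rounds up for positive integers.
    return max(1, -(-(current_load * 100) // usable))


def scale_out_request(current_load, deployed_instances, capacity_per_instance,
                      headroom_percent=20):
    """How many additional instances the VNF manager would ask for (never negative)."""
    needed = instances_needed(current_load, capacity_per_instance, headroom_percent)
    return max(0, needed - deployed_instances)


if __name__ == "__main__":
    # Hypothetical numbers: 900 sessions, 10 workers, 100 sessions per worker.
    extra = scale_out_request(current_load=900, deployed_instances=10,
                              capacity_per_instance=100)
    print(f"request {extra} additional worker instance(s)")
```

In a real NFV environment this logic would sit behind the standardized interfaces between VNF manager and Orchestrator; the sketch only shows the arithmetic of the trigger.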

Now for a quick reality check. ETSI’s initial NFV whitepaper is from October 2012. It basically describes the destination that the industry is aiming for. And while there might be some examples where VNFs, NFVI and Orchestration are already working hand in hand, there is still a lot of work to do. Some of the “NFV applications” (or VNFs) might have been just “P2V’ed” (1:1 physical to virtual conversion) onto a virtualization platform and basically have the same configuration, same identity and are kept as close to their physical origins as possible. This allows a VNF provider/vendor to keep existing support procedures and organizations while offering their customers an “NFV 1.0 product” that provides some early benefits of NFV (hardware independence, faster time to market, …). But this also implies that you transfer some of the configurations that made perfect sense in the physical world over to the virtual world – where they only make questionable sense. In this case, I’d actually talk about a move from a “purpose-built” app on the first platform to an “unchanged” app on the second platform.

One example: a physical server in a telco application had 30 × 300 GB hard disks, 2 × 4-core CPUs and 128 GB RAM. It never used more than 1 TB of storage, and average utilization stayed below 4 CPUs and 32 GB RAM. The “unchanged” version of this app would be a 1:1 conversion with all (unnecessary) resource overhead provided in a virtual machine. The “optimized” version of this app is a right-sized application (so only 1 TB storage, 4 CPUs and 32 GB RAM) that also leverages easy configuration files for installation as well as crash-consistent and persistent data management to allow backup & restore as a VM. But a “purpose-built” version of that app would leverage the underlying NFVI APIs, would allow scale-out deployment options based on actual demand, and would include optimizations such as encryption at every layer of the application to ensure global deployment models even in the face of lawful interception relevance, etc.
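
The overhead of the “unchanged” variant in this example can be put into numbers quickly. The sketch below only redoes the arithmetic from the paragraph above; the Sizing class and the field names are made up for illustration, and 1 TB is taken as 1000 GB.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Sizing:
    storage_gb: int
    cpu_cores: int
    ram_gb: int


# "Unchanged": 1:1 conversion of the physical server (30 x 300 GB, 2 x 4 cores).
unchanged = Sizing(storage_gb=30 * 300, cpu_cores=2 * 4, ram_gb=128)
# "Optimized": right-sized to the observed usage (1 TB taken as 1000 GB).
optimized = Sizing(storage_gb=1000, cpu_cores=4, ram_gb=32)


def overprovisioning(before, after):
    """Factor by which each resource of `before` exceeds `after`."""
    return {
        f.name: getattr(before, f.name) / getattr(after, f.name)
        for f in fields(Sizing)
    }


if __name__ == "__main__":
    for resource, factor in overprovisioning(unchanged, optimized).items():
        print(f"{resource}: {factor:.0f}x")
```

So the 1:1 conversion carries roughly nine times the storage, twice the CPU cores and four times the RAM that the application was actually using.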

 

Example 2: Microservices, Containers & Docker

My next example is microservices and their close friends, containers. They promise a new generation of application architecture and are drivers for the “3rd platform” architecture. One of this movement’s famous poster children is Docker. Docker is a great (new) way to package and distribute applications in “containers” that contain applications or just pieces of a larger application architecture. Newly developed applications usually follow a scale-out design, and some might be written with something like the “12 factor app” manifesto in mind (or the 15 factors according to Pivotal). Coming back to the pictures above: a 12 factor app could be considered “purpose-built” for the “third platform”.

But how many applications have been built for this? There are many great examples of microservices-oriented applications by the “cloud-native” companies such as Google, Amazon, Facebook and the like. Adrian Cockcroft also gives inspirational talks about these topics around the globe. But I actually expect many applications to stay mainly unchanged as they are optimized for their current platform. At the same time, some of them might become available as (Docker) containers as part of their next release. But again – if you look into the details, you’ll find the same application in a different wrapper. RAR is now ZIP (for my German readers: “Aus Raider wird nun Twix…”). But will these potentially “single-container-applications” run well on a Cloud-Native/third platform architecture? They might not! To put it in a picture:

So in this case, it is actually important to understand these application limitations and expectations towards the platform (what about data persistence, security, platform resilience, networking, …) to make sure it runs smoothly in production. Coming back to Massimo’s blogpost – you can run your old Windows NT4 on a Public Cloud, but does it make sense?

Summary

Just like the continuous evolution of platforms that expose new characteristics and capabilities, there is also an ongoing evolution of applications. It is important to understand the key aspects of the application architecture and its deployment model before making a platform decision. The word “VNF” does not necessarily imply alignment with NFV, and the word “Docker” does not automatically describe a Cloud-Native or microservices-oriented application.

 

Edits:

18.05.2016: added picture (containerizing legacy applications)