Sharing vCloud Director’s -Snapshot functionality

Sharing vCloud Director’s -Snapshot functionality. This slide deck is based on vCAT 3.1 and is part of my preparation for VCAP-CID. Please refer to the version and date; they are there to show whether anything has changed and when those changes were made.

Initial Sizing for vCloud Consumer Resources

In the previous post I discussed various sizing approaches and how they are influenced by allocation models. Let me continue from that post. As we have already decided on the allocation models and on how much each Organization vDC will use and in what way, let’s move further.

image_thumb131

The previous post missed one very important aspect: the relation between the catalog and the VM offering. From the above table it is not clear how many VMs, and of which sizes, are available to a particular BU. It leaves the consumer to do this calculation. To address this problem you can state how many small, medium and large VMs can be created from the allocation, and then help them fix the VM templates accordingly. There is a screen (shown below) in vCloud Director which shows this status, but only the vCloud Administrator (the only user who can create an Organization vDC) can see it. It is of little use to the Organization Administrator.

image

So how do we help an organization decide which VMs and sizes it can offer, and get the most out of its capacity? Let’s try to answer the question below.

“How many VMs can we fit in the HR BU, and of which sizes?”

I have created a tiny Excel sheet with a simple formula. The lower part has the Virtual Machine catalog defined. The goal is to fit VMs within either 256 GB RAM or 256 GHz. You can change the number of VMs, or change the VM catalog itself, to consume as much as possible of both compute resources, i.e. CPU and memory.

 

VM Sizing Sheet

 

In the above figure I have populated the Virtual Machine catalog using the standard T-shirt size naming convention. It is a very simple Excel sheet which picks up the vCPU and memory defined in the Virtual Machine Catalog table.

For example, as highlighted in the table above, the M size has 2 vCPU and 4 GB RAM configured. So for 5 VMs we will need 10 vCPU and 20 GB RAM. The other VM sizes follow the same formula. In the end you get a total of 44 VMs with 128 vCPU and 256 GB RAM.

To fit the compute (256 GHz or 256 GB RAM) you can keep changing the VM catalog sizes until either of the limits is reached. In the above table we have reached the 256 GB RAM limit, but you can just as well try to reach the CPU limit by changing the VM sizes in the catalog.
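The spreadsheet logic can be sketched in a few lines of Python. This is only an illustration: the M size (2 vCPU, 4 GB) comes from the table above, while the other catalog entries, the VM counts and the 1 GHz vCPU speed are assumptions, not the values from the actual sheet.

```python
# Hypothetical catalog: only the "M" row (2 vCPU, 4 GB) is taken from the post.
CATALOG = {
    "S": {"vcpu": 1, "ram_gb": 2},
    "M": {"vcpu": 2, "ram_gb": 4},
    "L": {"vcpu": 4, "ram_gb": 8},
}


def totals(vm_counts, catalog=CATALOG):
    """Return (total VMs, total vCPU, total RAM in GB) for a mix of sizes."""
    vms = sum(vm_counts.values())
    vcpu = sum(n * catalog[size]["vcpu"] for size, n in vm_counts.items())
    ram = sum(n * catalog[size]["ram_gb"] for size, n in vm_counts.items())
    return vms, vcpu, ram


def fits(vm_counts, cpu_ghz_limit, ram_gb_limit, vcpu_speed_ghz=1.0):
    """True if the mix stays within both the CPU and the memory limit."""
    _, vcpu, ram = totals(vm_counts)
    return vcpu * vcpu_speed_ghz <= cpu_ghz_limit and ram <= ram_gb_limit


mix = {"S": 10, "M": 5, "L": 8}   # assumed VM counts per size
print(totals(mix))
print(fits(mix, cpu_ghz_limit=256, ram_gb_limit=256))
```

Changing the counts in `mix` or the sizes in `CATALOG` is the same permutation exercise as editing the sheet until one of the two limits is hit.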

I have uploaded the Excel sheet to my Google Drive; it is available on request. I’m just doing this to know how popular it is getting ;-)

Once you are done with these permutations and combinations you will come up with some standard sizes. Just ensure you create catalog entries of only those sizes. This will allow you to control and monitor the capacity of the organization.

So the key takeaway is that you can definitely create VMs of your own sizes to make the most of the capacity you are buying. The Excel-based formula I have provided should make things a bit easier.

Capacity Planning and Allocation Models in vCloud Director

Capacity planning for vCloud is extremely important. As you will see in this post, it is quite complicated, especially if you are moving from a vSphere stack. Let me explain how it differs from vSphere capacity planning, and then I aim to explain ways to make it simple. Capacity planning is the foundation of the service offerings and QoS you provide as a service provider.

Let’s look at the vSphere side first. At a broad level, sizing in vSphere can be put as:

  • Find Total CPU, Memory capacity requirement
  • Size the host
  • Put some headroom for any peaks/abnormal utilization
  • Select the number of host failures a cluster can tolerate
  • Size the cluster
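As a rough sketch, the steps above can be written as a small calculation. The function name, the 20% headroom and every number in the example call are illustrative assumptions, not figures from any real design.

```python
import math


def size_cluster(total_ghz, total_gb, host_ghz, host_gb,
                 headroom=0.20, host_failures_tolerated=1):
    """Return the number of hosts needed for a cluster.

    headroom is the fraction of each host kept free for peaks;
    host_failures_tolerated adds spare hosts (N+1, N+2, ...).
    """
    usable_ghz = host_ghz * (1 - headroom)
    usable_gb = host_gb * (1 - headroom)
    hosts_for_cpu = math.ceil(total_ghz / usable_ghz)
    hosts_for_mem = math.ceil(total_gb / usable_gb)
    # The tighter of the two resources decides the host count.
    return max(hosts_for_cpu, hosts_for_mem) + host_failures_tolerated


# Assumed requirement: 200 GHz and 1000 GB on hosts of 40 GHz / 256 GB each.
print(size_cluster(total_ghz=200, total_gb=1000, host_ghz=40, host_gb=256))
```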

    With this approach we get a good consolidation ratio and over-subscription as well. It works reliably because you don’t have to worry about how many VMs will be powered on: CPU and memory consumption alone is not always what capacity planning considers; it is active memory, peaks and overall utilization. vSphere resource pools are rarely used to set CPU and memory limits and reservations; they are used for prioritizing workloads. This is what changes in vCloud. When an organization vDC is created, a resource pool is created behind the scenes. It gets reservations and limits set according to the allocation model, which in turn limits the total capacity available for new VMs and resource pools.

    vCloud Capacity Planning and Allocation Model

    The resource allocation model is selected while creating an organization vDC. It cannot be changed once selected. So the choice of resource allocation model has a significant impact on vCloud sizing, in terms of what kind of VM sizes we can offer.

    Now, when you create an organization vDC, a resource pool is created behind the scenes in the vSphere cluster. This resource pool’s CPU and memory reservations and limits are set based on your allocation model. The vSphere cluster where these resource pools are created is associated with the Provider vDC. If you have been with me so far, you will have understood that it is the allocation model which influences how the capacity of the Provider vDC is distributed. It is like a pie being cut out of the Provider vDC’s resources as you keep creating organization vDCs, and how each slice is consumed is influenced by the allocation model. So the selection of the allocation model has a significant impact on the vCloud offering.

    There are two ways you can do sizing of vCloud compute resource.

  • By standardizing offering
  • Driven by Customer Requirement (rarely suitable for cloud business model)

    Case:01 Standardized Offering

    In this case we go ahead and choose what the service offering for our customers will be, and carve out Organization vDCs accordingly. For the sake of discussion we choose only two allocation models, to keep things simple.

    We select the Pay-as-you-go allocation model and plan to reserve 50% of the Provider vDC capacity for it. Please let me remind you, we are only planning. We are not carving this out into an organization and organization vDC yet, as an organization cannot be created until we know who the customer is. We will only do that when we have a client, and we will provision an organization vDC with the PAYG model only if the client asks for it. To put it simply, we can offer services to the client based on what we have planned: “We have 50% of the capacity reserved for the PAYG model. Please tell us if you have use cases for PAYG and, if so, how much you wish to buy from it.” We can offer a standard offering influenced by two parameters of PAYG, as explained in the previous post:

    1. CPU Speed
    2. CPU resource Guaranteed
    3. In addition, the CPU and memory quota parameters determine what percentage of the 50% the customer needs.

      You can do a similar thing for the reservation pool model. It is a complete block of resources (CPU, memory) dedicated to the customer. Here the customer has full freedom in how to carve up the block.

      Case:02 Customer’s custom requirement

      The customer comes to you and requests that 50% of the resources be reserved for the PAYG model, as they have a lot of transient workload. The remaining 50% should be allocated to dedicated business units as mentioned below:

      1. HR (6.25%)
      2. Marketing (25 %)
      3. Sales (18.75%)

      The customer has the freedom to use these resources the way he wants. He can internally standardize the VM offering in terms of CPU count and speed. This standardization can be enforced by using catalogs. There is a relation between allocation model, catalog and VM sizing, which I explain in the next post.

      Below is the block-level view of how the organization would look based on the customer’s requirement.

      CapacityPlanning

      The Provider vDC is divided into 4 organization vDCs as per the customer’s requirement.

      image

      The idea here is that you can split a Provider vDC into organization vDCs.
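To make the split concrete, here is a minimal sketch that turns the percentages from the customer’s requirement into blocks of capacity. The total Provider vDC capacity of 4096 GHz and 4096 GB is an assumption for illustration only; the post does not state it.

```python
# Shares from the customer's requirement above (percent of the Provider vDC).
SHARES = {"PAYG": 50.0, "HR": 6.25, "Marketing": 25.0, "Sales": 18.75}


def carve(provider_ghz, provider_gb, shares=SHARES):
    """Split Provider vDC capacity into per-Organization-vDC blocks."""
    if sum(shares.values()) > 100:
        raise ValueError("shares exceed 100% of the Provider vDC")
    return {
        name: (provider_ghz * pct / 100, provider_gb * pct / 100)
        for name, pct in shares.items()
    }


# Assumed Provider vDC capacity; not stated in the post.
for name, (ghz, gb) in carve(4096, 4096).items():
    print(f"{name:10s} {ghz:7.1f} GHz {gb:7.1f} GB")
```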

      Conclusion

      1. Capacity planning is influenced by allocation model
      2. Allocation model further influences VM sizing
      3. It is strongly recommended to plan the allocation model per Provider vDC. It gives good insight into how the resources will be consumed and how much will remain available, and it also helps you tackle over-subscription
      4. If a Provider vDC is going to host all three allocation models, it becomes more complicated to see how resources are being used and when the capacity will need to be replenished
      5. In my opinion, plan in advance; do not simply start filling a Provider vDC with organization vDCs of any allocation model

      Availability considerations for Business Critical Application

      There are various high availability options available, but deciding which one is the best fit for your application is something we as architects always have to do.

      To make this decision, you must understand the business requirement.

      Case:01 Zero Downtime

      It means you cannot take a downtime during

      1. OS Upgrade and maintenance
      2. DB upgrade and maintenance
      3. Application Update and maintenance

      So we need to design for failure at the OS, DB and application level, and in the virtualization environment even for ESXi host level failure:

      1. vSphere HA will protect against ESXi host failure
      2. OS and application level failure will be protected using in-guest clustering, e.g. MSCS, Veritas Cluster, or FT (only for applications which can be scaled horizontally on a single vCPU)
      3. Protection against application level failure only can be achieved using vSphere HA application protection, introduced in 5.5.
      4. You can use Symantec Application HA. You can find the list of applications which are supported here.

      Note: In some environments, the operations team may not want automatic restart of the application in the event of only an application error. Instead, immediate notification and manual intervention may be preferred to determine the root cause of the problem.

       

      image

      Scenario: 01 Suppose a vSphere HA event is triggered by an ESXi host failure, which in turn fails the OS. The MSCS/Veritas cluster detects that the OS has failed and moves the application to the other node. During this failover the services won’t be available. The failover generally completes within seconds; no data is lost, nor does the user experience any appreciable outage.

      Scenario: 02 Suppose the OS inside the VM fails. MSCS/Veritas detects it and fails the application over to another node. During this failover the services won’t be available. The failover generally completes within minutes; no data is lost and the user does not experience any appreciable outage, but this is clearly governed by the failover time. So calling it zero downtime is not the right term.

      Scenario: 03 Suppose the application service stops. MSCS/Veritas detects it and tries to restart it; if the restart fails, it moves the application to another node. During this failover the services won’t be available. The failover generally completes within minutes; no data is lost and the user does not experience any appreciable outage, but this is clearly governed by the failover time. So calling it zero downtime is not the right term.

      Scenario: 04 Suppose the OS or application needs a maintenance window. Simply fail the application over to another node using MSCS/Veritas.

      In scenarios 02, 03 and 04 discussed above, the downtime is the time needed to fail over the services; it can vary from a few seconds to a few minutes. So here complete protection is provided at the hardware, OS and application level. If any of the layers fails, it won’t impact end users. The biggest reason people on vSphere prefer an in-guest agent is to get rolling upgrades. We will discuss this in more detail below.

      Hosting Business Critical Application in Cloud

      In-guest clustering is very complicated to configure. This complexity increases further when you wish to host such an application inside vCloud Director.

      Support Challenge

      As of 5.1 (I haven’t seen anything for 5.5 yet):

      1. There is no support for clustering inside vCloud Director
      2. There is no support for RDM in vCloud Director

      So you can configure a cluster using vSphere, but the moment vCloud Director comes into the picture we face technical limitations. Please note that in vCloud Director a VM is created via the vCloud portal, not via vCenter.

      Update: One of the experts in vCloud Director contacted me and explained why vCD doesn’t support RDM and why there is no plan for it. Bringing RDM into the cloud breaks the principle of portability in case a customer wishes to move workloads between clouds.

      Licenses

      Business critical applications which use Oracle as their database bring another challenge with them. Oracle’s license policy is most inflexible. Hosting an Oracle database inside the cloud means dedicating hosts to Oracle, which simply does not meet the economies of scale. We can use VM-Host affinity rules to do so, but this is neither explicitly accepted nor denied by Oracle. You need to read a lot into the license agreement, as mentioned by Michael Webster here

      Source

       

      Case of In-Guest Clustering (Why?)

      Rolling upgrades are the only use case for which I would recommend an in-guest cluster agent. During a rolling upgrade the application remains online, i.e. from the application’s perspective there is zero downtime. I always ask my customers: don’t we have a scheduled maintenance window? If the answer is yes, then in 9 out of 10 cases I have not recommended an in-guest cluster, as all planned upgrades and changes to the OS and database can be done during this window.

      Over and above that, the following points make my case against in-guest clustering even stronger:

      1. We can use snapshot technology when upgrading the OS or database, which gives us a rollback point.

      2. vSphere 5.0 has improved HA functionality which smartly detects whether a host is isolated or has gone down. At most, a VM comes up within 15 minutes (along with its applications, what I refer to as “Ready to Serve”) when an HA event occurs. So just to avoid 15 minutes of downtime (an extremely conservative estimate, refer here), I don’t want the operations team to carry the overhead of configuring in-guest clustering and the complexity it brings on a virtualized platform.

      3. And how many times in a year has the application failed such that it needs monitoring? If the failure rate is close to zero, this again makes the case for no in-guest clustering.

      Conclusion

      The final design choice will be ruled by how much downtime a business is ready to tolerate, and the cost they are willing to invest in the extra resources and skills to install and operate software that provides application monitoring. It is a trade-off.

      Trade-off

      Reservation Pool Model–Behind the scenes

      This is the last allocation model we will be discussing in this series. The previous two resource allocation models discussed were Pay-as-you-go and Allocation Pool. The Reservation Pool model is perhaps the simplest to understand and to implement. That being said, from a resource allocation point of view it is also very costly. A reservation pool, as the name says, reserves resources. These resources are reserved even if the VMs are powered off.

      image

      Reservations in this model are guaranteed and are set to 100%

       

       

       

       

      Let’s see what happens at resource pool level. An organization vDC is created with the following values

      • CPU allocation = 4 GHz
      • Memory allocation = 2 GB

      These are the only two settings you need to configure for a reservation pool, as can be seen below.

      Reservation Pool allocation model and Resource Pool Settings

      Figure:01 Reservation Pool allocation model and Resource Pool Settings

      On the right-hand side of the image is a screen showing the settings of the resource pool that was created. We can see that a reservation of 4000 MHz and 2048 MB is applied, and the same values are used to set the limits. So not only reservations but also limits are applied to the resource pool.

      It is like cutting a slice of pie from the available resources and making it 100% available to the Organization vDC up front. As this is guaranteed, it is costly. Resources are reserved irrespective of whether they are used or not.
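The mapping from the two allocation values to the resource pool settings can be sketched as follows. The function and key names are made up for illustration; the point is simply that both the reservation and the limit equal the full allocation.

```python
def reservation_pool_settings(cpu_alloc_ghz, mem_alloc_gb):
    """Resource pool values for a Reservation Pool organization vDC.

    Reservation and limit are both set to the full allocation,
    matching the 4000 MHz / 2048 MB values described above.
    """
    cpu_mhz = int(cpu_alloc_ghz * 1000)
    mem_mb = int(mem_alloc_gb * 1024)
    return {
        "cpu_reservation_mhz": cpu_mhz,
        "cpu_limit_mhz": cpu_mhz,
        "mem_reservation_mb": mem_mb,
        "mem_limit_mb": mem_mb,
    }


print(reservation_pool_settings(cpu_alloc_ghz=4, mem_alloc_gb=2))
```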

      SNAGHTML1848b00c

       

      Comparison

      If we compare this model with the other two models, a number of options are missing, which makes it easier to configure.

      1. You don’t have the option to choose the vCPU speed up front
      2. You don’t have the option to reserve a percentage of the resources. It is always 100%
      3. CPU and memory are both allocated up front and are charged by vCenter Chargeback Manager based on these values
      4. Most importantly, the consumer has the option to choose reservations per VM. This gives the user complete freedom to prioritize resources for each workload.

      Per VM reservation options for Consumers

      Figure:02 Per VM reservation options for Consumers

      Conclusion

      1. Simple to understand and explain to the consumer
      2. 100% resources are guaranteed to organization vDC. In other words these resources are not available for other organization vDC to use.
      3. Consumers get the option to configure resources on a per-VM basis, which provides the same flexibility that vSphere administrators get

      Allocation Pool–Behind the scenes

      In the last post I discussed the Pay-as-you-go model. Let’s discuss the next widely used model in vCloud Director. The Allocation Pool model is generally used for production workloads. In this model you can not only allocate resources but also guarantee some percentage of them. Below is an example of a customer requirement where 50% of the resources are to be guaranteed, leaving room for a burst of 50% in case of unexpected peaks.

      SNAGHTMLd91345f

      Figure:01 – Allocation Pool model

      The above requirement can be translated into the allocation pool model as shown below. Use CPU allocation and Memory allocation to define how much CPU and memory you wish to allocate to this vDC. In the screen below it is 4 GHz and 2 GB RAM. These are hard limits; you can’t provision anything beyond them. It is worth noting that vCenter Chargeback Manager uses the 4 GHz and 2 GB RAM values and will charge based on them.

      clip_image001

      Figure:02 – Allocation Pool Resource allocation model

      The screen below is an example where none of the VMs are powered on, so the reservation is set to zero. However, the CPU limit on the resource pool is set.

      Resource Pool created by allocation pool model

      Figure:03 Resource Pool settings before powering ON VM

      It is worth noting that there is NO limit set on the memory resource in spite of configuring it in the allocation screen shown above. In fact, expandable reservation is selected. It means resources from the parent resource pool can be drawn on to meet the cap of 2 GB and the 50% reservation, i.e. 1 GB.

      Below is the resource pool properties screen after three VMs are started.

      Resource Pool Setting after VM's are Powered ON

      Figure:04 Resource Pool settings After powering ON VM

      Referring to the allocation model in figure:02, the spreadsheet below can be used to tally the resource pool settings.

      Table explaining allocation pool calculation

      Table:01 VM Resource calculation

      The relationship between the allocation pool model, the resource pool and the calculation in the spreadsheet is shown below. You can see that all the numbers match.

      Resource Pool numbers matching with allocation pool

      Figure:05 Relationship between resource pool and allocation model

      Memory configured per VM

      Figure:06 Virtual machine configured memory size

      In table:01 you will notice that the total configured resources are 1536 MB RAM and 3000 MHz CPU, which is still below the allocated resources. Now let’s try to exceed one of the limits. Let’s bump up the configured memory of VM27 to 1.5 GB; the total then becomes 2.5 GB RAM (512 + 512 + 1536 = 2560 MB, which is greater than the 2 GB allocation).

      VM27 memory is changed to 1536

      Figure:07 Configured memory size of VM27 changed to 1536 MB

      Error message invoked by Admission Control

      Figure:08 VM27 cannot be powered ON because of memory resources not available

      VM27 cannot be powered on because the memory allocation is exceeded. Admission control kicks in and stops VM27 from powering on.

      Similarly, we can prove that admission control also takes effect if you configure more CPU than is allocated. In the screen below I have changed the vCPU count of VM27 to 3. The total vCPU count becomes 5, and since we have restricted the speed of a single vCPU to 1000 MHz, the total CPU demand becomes 5000 MHz, but we have allocated only 4000 MHz. Admission control kicked in and didn’t allow the VM to power on.

      CPU Count increased

      Figure:09 CPU Count of VM27 changed to 3

      Error message invoked by Admission control

      Figure:10 VM27 cannot be powered ON because of CPU resources not available
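The two experiments above can be mimicked with a simplified admission check. This is only a sketch of the behaviour as demonstrated in the screenshots, not vCloud Director’s actual algorithm; the VM names other than VM27 are placeholders, and one vCPU per VM is inferred from the 3000 MHz total in table:01.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    mem_mb: int


def can_power_on_all(vms, cpu_alloc_mhz, mem_alloc_mb, vcpu_speed_mhz=1000):
    """Simplified admission check on configured (not consumed) resources."""
    cpu_demand = sum(vm.vcpus for vm in vms) * vcpu_speed_mhz
    mem_demand = sum(vm.mem_mb for vm in vms)
    return cpu_demand <= cpu_alloc_mhz and mem_demand <= mem_alloc_mb


# VM25 and VM26 are placeholder names; only VM27 is named in the post.
baseline = [VM("VM25", 1, 512), VM("VM26", 1, 512), VM("VM27", 1, 512)]
more_memory = baseline[:2] + [VM("VM27", 1, 1536)]
more_cpu = baseline[:2] + [VM("VM27", 3, 512)]

for label, vms in [("baseline", baseline),
                   ("VM27 at 1536 MB", more_memory),
                   ("VM27 with 3 vCPU", more_cpu)]:
    print(label, can_power_on_all(vms, cpu_alloc_mhz=4000, mem_alloc_mb=2048))
```

Only the baseline passes; the other two cases exceed the 2048 MB and 4000 MHz allocations respectively.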

      Conclusion

      1. In the Allocation Pool model, reservations and limits are set at the resource pool level, as shown in figure 03 and figure 04. There are no VM-level limits or reservations as we saw in the Pay-as-you-go model.
      2. Because the reservation is set at the resource pool level, resources within the pool are consumed on a first come, first served basis.
      3. Memory and CPU reservations are configured on the resource pool dynamically, i.e. only when a VM is powered on, as shown in figure 04. If another organization vDC is using more resources, there is no guarantee that resources will be available to power on VMs. So the underlying availability of resources is a must, not only to power on VMs but also to guarantee their reservations

      Pay-as-you-go Resource Allocation Model behind the scene

      When an organization is created it doesn’t have any resources. We must assign resources to the organization. When you allocate resources to an organization you end up creating an organization vDC. These resources come from the Provider vDC, as explained in an earlier blog. We need a resource allocation model to ensure the right kind of service can be delivered to the consumers/BUs. Only CPU and memory are part of the resource allocation model.

      vCloud provides three different types of resource allocation model.

      1. Pay-as-you-go

      The Pay-as-you-go model is primarily used for varying workloads and is most suitable for developer and QA workloads. This model is unique in many ways. Resources are applied on a per-VM basis: memory and CPU limits and reservations are applied per VM and not at the resource pool level. Resources are allocated only when workloads are powered on. This model assumes there is an infinite pool of resources. As cloud administrator you have to ensure this assumption is not taken literally by the organization administrator. You can control it using various options: the CPU quota, the memory quota and the maximum number of VMs that can be started. You can use any of these parameters to control resource usage. Over and above that, you can also provide some guarantee (reservation) on CPU and memory when a VM is powered on.

      In below screen organization vDC is created with the following values

      1. CPU quota = 6 GHz (read this as a limit)
      2. CPU Resources Guaranteed = 50%
      3. vCPU Speed = 1 GHz (read this as a limit per vCPU at VM level)
      4. Memory quota = 3.33 GB (read this as a limit)
      5. Memory Resources Guaranteed=50%

      image

      Let’s create three VMs, each with 2 vCPU and 512 MB memory. The table below explains what the VM-level reservation and limit configuration will be.

       image

      From the above table it can be read that the total CPU configured per VM will be 2000 MHz (see below, it is configured as the limit) and that 50% of the total resources are guaranteed (see below, out of 2000 MHz, 1000 MHz are reserved).

      image
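The per-VM numbers follow directly from the organization vDC settings, which the sketch below works through. The function and key names are illustrative, 512 MB of memory per VM is assumed, and treating memory the same way as CPU (limit equal to the configured size, reservation equal to the guaranteed percentage) is an assumption based on the description above.

```python
def payg_vm_settings(vcpus, mem_mb, vcpu_speed_mhz=1000,
                     cpu_guarantee=0.5, mem_guarantee=0.5):
    """Per-VM limits and reservations in a Pay-as-you-go organization vDC."""
    cpu_limit = vcpus * vcpu_speed_mhz          # limit = vCPU count x vCPU speed
    return {
        "cpu_limit_mhz": cpu_limit,
        "cpu_reservation_mhz": cpu_limit * cpu_guarantee,
        "mem_limit_mb": mem_mb,
        "mem_reservation_mb": mem_mb * mem_guarantee,
    }


# One of the three VMs: 2 vCPU at 1 GHz per vCPU, 50% guaranteed.
print(payg_vm_settings(vcpus=2, mem_mb=512))
```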

      One thing you must note is that 6 GHz and 3.33 GB RAM is a cumulative limit. It means you can have 1 VM of 6 GHz and 3.33 GB RAM, or you can have any number of VMs as long as the 6 GHz and 3.33 GB RAM are not exceeded.

      To prove it, I have configured two VMs with 3 vCPU each, which amounts to 6 GHz. I left the 3rd VM untouched at 2 vCPU. Since we have allocated only 6 GHz, the 3rd VM cannot be powered on.

      image

       

       

      image

      The same can be proved for memory as well.
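The CPU quota experiment can be expressed as a small check. It is a simplified model of the behaviour shown in the screenshots, with an illustrative function name, not the actual vCloud Director logic.

```python
def payg_can_power_on(running_vcpus, new_vm_vcpus,
                      cpu_quota_mhz=6000, vcpu_speed_mhz=1000):
    """True if powering on the new VM keeps allocated CPU within the quota."""
    allocated = sum(running_vcpus) * vcpu_speed_mhz
    return allocated + new_vm_vcpus * vcpu_speed_mhz <= cpu_quota_mhz


# Two 3-vCPU VMs already running, third VM has 2 vCPU.
print(payg_can_power_on([3, 3], 2))
# The original layout: three 2-vCPU VMs.
print(payg_can_power_on([2, 2], 2))
```

The first call returns False because the two running VMs already use the full 6 GHz; the second returns True because three 2-vCPU VMs add up to exactly 6 GHz.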

      Conclusion

      The quota you configure in this allocation model is the maximum that a VM, or all VMs together, in this Organization vDC can use. The CPU and memory quota is another way to limit the resources allocated in the Pay-as-you-go model. If you do not configure any quota, there is a high risk that the Provider vDC’s entire resources get consumed without warning. So please always use these controls.

      Reservations and limits are set at the VM level and not at the resource pool level.

      Another point to note: admission control considers the allocation of resources to VMs, not their consumption, before powering on an additional VM. For example, if you allocate 6 GHz to this organization vDC and configure 1 VM with 6 vCPU (assuming vCPU speed = 1 GHz), you cannot power on another VM, as the 6 GHz limit is reached. Admission control is not concerned with how much of the 6 GHz this VM is actually using. That is monitoring the consumer must do, and then decide whether to resize the VM. Of course, the provider can supply these utilization numbers if requested.

      If you look at it, this is a different kind of admission control compared with the one we know from the vSphere HA cluster. Therefore design the Pay-as-you-go model accordingly.

      In the next blog post I will discuss the allocation pool model.

      Organization, Organization vDC and Provider vDC

      In vCloud Director the organization is the authentication and security boundary. The authentication boundary can be controlled using LDAP, which can be internal or external. Security comes from the Role Based Access Control (RBAC) built into vCloud Director. The RBAC model can be used to control who can manage the organization, i.e. the organization admin, and who can deploy vApps.

      By analogy, an organization can be seen as a business unit if you think of a private cloud, and as a completely separate company if it is a public cloud. In the example below you can think of HR, Marketing and Sales as business units. They represent organizations in vCloud Director. Here the authentication boundary would be, for example, AD, and security would be controlled by assigning roles to the groups created in AD for each organization.

      For example, HRUsers, HRAdmins and HRvApp Owner will be groups in AD, and these groups can then be given appropriate permissions through the roles defined/customized in vCloud Director.

       

      image

      Provider vDC

      The Provider vDC is the IT organization if we look at it from a private cloud perspective, but it can be a provider (VMware, Savvis, CSCS) if we look at it from a public cloud perspective. The Provider vDC provides resources to organizations. These resources are compute (vSphere hosts), storage (SDRS) and network.

      Organization vDC

      Creating an organization is the first step, but an organization is simply a boundary. We need to populate the organization with resources which can be consumed by the organization or business unit. These resources are carved out from the Provider vDC, and this carving of resources ends up creating an organization vDC. An organization can have more than one organization vDC. You might need more than one if you have different environments within the organization. For example, HR might have development, production and QA workloads. Based on the requirements you can have more than one vDC. These organization vDCs can be separated/secured from each other using the various networking options available in vCloud. However, you must understand that to offer organization vDCs from different tiers, more than one Provider vDC must be available. The following diagram depicts a Provider vDC with three different clusters, GOLD, SILVER and BRONZE, which are carved into organization vDCs according to their needs.

      image

      A more detailed explanation of what GOLD, SILVER and BRONZE are, and how they are reflected in the organization vDC, will follow in upcoming blogs.

      Pod like Architecture for vCloud Director

      VMware strongly recommends a pod-based design for any cloud architecture. Pods help you scale your design horizontally without impacting any core functionality of the architecture. Two major components form a pod design. On the left-hand side of the image below is the management pod, and on the right-hand side is the resource pod. The management pod consists of:

      1. vCenter
      2. Orchestrator
      3. vCloud Director Cell
      4. vCNS Manager (vShield Manager)
      5. Active Directory
      6. DNS
      7. vSphere Update Manager
      8. NTP Server
      9. Database for vCenter, vCloud Director, VUM database, Orchestrator, vCenter Charge back
      10. vCenter Chargeback manager
      11. vCloud Connector (for Hybrid solution)
      12. vCAC for automation manager

       

      POD Architecture for vCloud Director

       

      On the right-hand side we have the resource pod. The resource pod is the one which provides resources to vCloud Director. These resources are generally organized into clusters. A cluster provides a storage and vMotion boundary. In the above figure the resource pod is further divided into three clusters, i.e. GOLD, SILVER and BRONZE. These clusters depict the various tiers available, though I have not defined here how the tiers differ from each other. The tiers form the basis of the provider virtual datacenter (vDC). So each tier translates into a Provider vDC as far as vCloud Director is concerned, i.e. vCloud Director now has three different tiers of service. As your customer base grows you can scale these tiers horizontally. Scaling of a tier is limited by the underlying vSphere limitations, e.g. the maximum number of nodes you can have per cluster is 32. As you reach this number you will need to add another cluster.

      Without a management pod in place, DR of the resource workloads becomes extremely complicated.

      The management pod manages the resource pod, i.e. the vCenter which manages these clusters lives in the management pod, and similarly the vCloud Director cell lives in the management tier.
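As a small illustration of the horizontal scaling described above, the number of clusters a tier needs can be derived from its host count. The 32-node limit is the one quoted in the text; the function name and the host counts are made-up examples.

```python
import math

MAX_HOSTS_PER_CLUSTER = 32  # vSphere per-cluster limit quoted in the text


def clusters_needed(total_hosts, max_hosts=MAX_HOSTS_PER_CLUSTER):
    """Number of clusters a tier needs as its host count grows."""
    return max(1, math.ceil(total_hosts / max_hosts))


for hosts in (8, 32, 33, 70):
    print(hosts, "hosts ->", clusters_needed(hosts), "cluster(s)")
```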