VMware vSphere 6.0 is the platform businesses depend on to deploy, manage and run their virtualized Windows and Linux workloads.
In this course you will learn how to connect to and use shared SAN storage (including Fibre Channel and iSCSI storage), how to format and expand VMFS volumes, how to resource-tune virtual machines, how to create and tune Resource Pools, and how to perform cold, vMotion and Storage vMotion migrations.
Learn Storage, VMFS, Resource Management and VM Migrations
This course covers four major topics that all vSphere 6 vCenter administrators must know: storage, VMFS, resource management and VM migrations.
The skills you will acquire in this course will help make you a more effective vSphere 6 administrator.
SANs, or Storage Area Networks, are specialized shared storage devices that usually include dedicated shared storage networks. The idea behind a SAN is to centralize the provisioning, access, management and backup of storage resources. Furthermore, SANs simplify storage tasks that can be difficult or impossible with local, fixed storage resources (e.g.: local SCSI/SAS RAID cards and disks), such as:
Redundant Pathways. SANs can be designed with multiple storage pathways. By using two or more paths through a storage network to a storage device, you gain both performance (I/Os can complete on either path) and redundancy (if one path fails, I/Os can be retried on a surviving path).
Improved Performance. SANs usually contain multiple interface processors (Storage Processors) and include powerful onboard CPUs and memory to reduce I/O latency and minimize RAID overhead.
Capacity Management. SANs usually allow administrators to grow SAN volumes dynamically (by allocating unused physical storage to an existing RAID set). In this way, storage administrators can expand full volumes without having to provision more storage or copy data from one SAN volume to the next.
Snapshotting. Most SANs support volume snapshotting. A volume snapshot is a point-in-time copy of a SAN volume that can be backed up to nearline or offline storage.
Shadowing. High-end SANs support volume shadowing. This feature lets SAN administrators replicate I/Os on a local volume over to a volume on a remote SAN, which helps with disaster recovery in that the remote SAN always has an up-to-the-minute copy of critical production SAN volumes.
SANs solve many problems associated with local PC server storage including:
Capacity. Physical servers have limits on the number and size of physical disks that can be connected to the server.
Over-provisioning. PC servers are often over-provisioned with storage at deployment time because it can be so difficult to expand local RAID volumes later on. Much of this storage goes unused, resulting in wasted excess capacity. The problem is exacerbated by the fact that local RAID cards often do not let you easily expand RAID arrays onto new storage volumes.
Cost. Purchasing high-end RAID cards and enterprise-class disks for each PC server quickly adds up to a sizable investment in storage. Trade-offs made to reduce costs often result in lower performance or redundancy than is required or desired.
Backups. Local disk-image backups are a good defense against catastrophic data loss, but it can be challenging to perform image backups of PC server RAID sets. SANs usually provide LUN snapshot capabilities so that image backups of SAN LUNs can be performed at any time.
Shadowing. SAN shadowing is the replication of updates to a LUN on a production SAN to a corresponding LUN on a Disaster Recovery SAN. By replicating all I/Os on the DR SAN you are protected from data loss due to a facility or server failure. Local RAID cards normally do not provide volume shadowing capability.
Fibre SAN deployments include a shared Storage Appliance (the SAN), a Storage Network and Fibre Host Bus Adapters (HBAs). Most components in a Fibre SAN can be duplicated (HBAs, SAN Switches, SAN Storage Processors). Duplication provides two benefits:
Redundancy. If one component fails, the ESXi host can find an alternative surviving path through which it can complete I/Os
Performance. If all paths are healthy, ESXi can use different paths to different LUNs to distribute the overall I/O load. This reduces contention and results in overall greater performance.
SANs include management tools that let SAN administrators create uniquely numbered SAN LUNs (addressable storage volumes) from RAID sets of physical drives. RAID sets can be created for both capacity and storage efficiency purposes.
Modern SANs support SATA, Serial Attached SCSI (SAS) and solid state (SSD) drives. SATA drives are used to create SAN LUNs that offer high storage density with reasonable performance at low cost. Enterprise SAS drives are less storage dense but perform 3-5x faster. These devices are used to create SAN LUNs for workloads that demand the highest overall performance.
Don't discount SATA storage. Seagate and Western Digital both make 'Enterprise' SATA drives with 5 year warranties. SATA drives are relatively slow (7,200 rpm, 8-12ms average seek) compared to 15,000 rpm enterprise SAS drives. But they are cheap and can be provisioned in large numbers at low cost – making it possible to create highly redundant RAID sets across many spindles. This divide-and-conquer approach often yields as good as or better performance per dollar than small RAID sets of 15k rpm disks.
SANs also include hot-spare capability. Usually SANs are provisioned with one or more drives that are not (immediately) put into service. If an active drive fails, the SAN will remove the failed drive from an active SAN LUN and replace it with an available hot-spare. This minimizes the total time the SAN LUN spends in a non-redundant state.
Every visible node in a Fibre SAN deployment is addressed by a unique hardware address called the World Wide Name (WWN). WWNs are 8-byte addresses made up of a vendor-assigned prefix followed by additional bytes that uniquely identify the device. All components in a Fibre SAN have WWNs, including Fibre HBAs, Fibre Switches, Storage Processors and SAN LUNs. When new SAN LUNs are created, the SAN management tool assigns a unique WWN to the LUN so it can be distinguished from other LUNs on the SAN.
Storage traffic on the SAN is delivered by WWN (source and destination WWN). SAN administrators can use Zoning to restrict which nodes can exchange traffic. Zoning rules specify pairs of WWNs that are allowed to exchange data. Normally a SAN administrator would set up zoning rules that specify which Fibre HBAs can talk to Storage Processors (to defend against ad-hoc deployment of new servers). Administrators would also create LUN visibility rules on the SAN that specify, on a LUN by LUN basis, which Fibre HBAs can see which LUNs (by associating the WWN of the Fibre HBA with the WWN of the LUN). When an ESXi server scans the SAN for storage volumes, the SAN consults its visibility rules and exposes only authorized LUNs to the ESXi host.
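As a rough illustration, the LUN-masking check above can be sketched in Python. The WWN strings and the rule table here are hypothetical; real SANs store and evaluate visibility rules in vendor-specific ways:

```python
# Sketch of SAN LUN masking (visibility rules). The SAN keeps a
# per-initiator access list and answers a scan with only the LUNs
# that the requesting HBA's WWN is authorized to see.

visibility_rules = {
    # HBA WWN (initiator)      -> LUN WWNs it may see
    "10:00:00:00:c9:6e:aa:01": {"60:06:01:60:41:02:10:01", "60:06:01:60:41:02:10:02"},
    "10:00:00:00:c9:6e:aa:02": {"60:06:01:60:41:02:10:01"},
}

def scan_san(hba_wwn, all_luns):
    """Return only the LUNs visible to this initiator, per the SAN's rules."""
    allowed = visibility_rules.get(hba_wwn, set())
    return sorted(lun for lun in all_luns if lun in allowed)

all_luns = {"60:06:01:60:41:02:10:01", "60:06:01:60:41:02:10:02"}
print(scan_san("10:00:00:00:c9:6e:aa:02", all_luns))
# only the single authorized LUN is returned
```

An unknown initiator simply gets an empty LUN list back, which is why an ad-hoc server plugged into the fabric sees no storage until the SAN administrator adds rules for it.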
There are four different types of network isolation available in Fibre networks – Port Zoning, WWN Zoning, Soft Zoning and Hard Zoning. The purpose of all Zoning is to create and enforce device access control lists (to prevent unauthorized device access).
Different Zoning strategies are implemented by different vendors; so you should consult with your Fibre Switch and Fibre SAN configuration guides to find out which Zoning strategies your hardware supports.
Each Zoning strategy has pros and cons relating to ease of configuration, ease of modification and level of device isolation/protection offered. Generally, WWN and Hard Zoning are the most secure, followed by Port Zoning and finally Soft Zoning (which is viewed as very insecure).
ESXi uses hardware addresses to uniquely identify a SAN LUN. Hardware addresses are constructed as follows:
vmhba# – a generic name for a storage controller, followed by the storage controller's unique number
C# – the Channel number; usually 0. On some SANs, it is the Storage Processor number
T# – the Storage Processor/Target number used to deliver I/Os to the SAN
L# – the target SAN LUN to receive I/O requests
An example of a complete hardware path would be vmhba1:C0:T1:L2 which references Fibre Controller 1, Channel 0, Storage Processor/Target 1, LUN 2.
ESXi maps vmhba# to specific device drivers for storage controllers. That way, administrators do not need to know anything about the make or model of storage controller in use.
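The runtime path format can be made concrete with a small parser. This helper is hypothetical (not part of any VMware tool); it simply splits the path into the four components described above:

```python
import re

def parse_runtime_name(name):
    """Split an ESXi runtime path (e.g. vmhba1:C0:T1:L2) into its
    adapter, channel, target and LUN components."""
    m = re.fullmatch(r"vmhba(\d+):C(\d+):T(\d+):L(\d+)", name)
    if not m:
        raise ValueError("not a runtime path: " + name)
    adapter, channel, target, lun = map(int, m.groups())
    return {"adapter": adapter, "channel": channel, "target": target, "lun": lun}

print(parse_runtime_name("vmhba1:C0:T1:L2"))
# {'adapter': 1, 'channel': 0, 'target': 1, 'lun': 2}
```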
Different SANs identify LUNs in different ways. Many SANs assign a unique LUN number to each LUN, while others assign all LUNs 0 and differentiate LUNs by their Target number.
There is no single standard for this, so check your SAN vendor's documentation to see which scheme applies (the ESXLab remote-access lab uses a SAN that identifies LUNs by different Target numbers).
When an ESXi server boots, it scans its PCI bus for storage controllers. Storage controllers are assigned vmhba# numbers in the order they are found during a bus scan.
If ESXi finds a Fibre controller, it instructs the controller to scan its storage bus for Storage Processors (SPs). Then, for each SP, ESXi scans the HBA/SP pair for visible LUNs. ESXi enters each visible LUN, along with the LUN's WWN, into a storage volume roster for that HBA. This way, when ESXi finds the same WWN for a storage volume through multiple paths, it knows that the additional paths are alternate routes to the same SAN LUN.
ESXi can use a maximum of 256 SAN LUNs per SAN. LUNs do not need to be sequentially numbered, as ESXi scans all possible LUN numbers (range of 0-255) and records all LUNs it finds. You can click the Rescan link in the Storage Adapters view at any time to scan for new LUNs. Rescans are safe to perform while ESXi is actively running VMs; they complete quickly and pose no risk to ESXi or running VMs. This way, the SAN administrator can provision new storage (new SAN LUNs) at any time, and ESXi administrators can scan for new LUNs, partition them, format them with VMFS and put them into service while the ESXi host is active.
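The multipath bookkeeping described above (the same WWN seen through several paths means one LUN with alternate paths) can be sketched as follows. The path names and WWN strings are made up for illustration:

```python
from collections import defaultdict

# (runtime path, WWN reported by the LUN at that path) as discovered
# during a rescan; two paths report the same WWN, so they are
# alternate paths to one SAN LUN.
discovered = [
    ("vmhba1:C0:T0:L1", "wwn-aaaa"),
    ("vmhba1:C0:T1:L1", "wwn-aaaa"),   # second path, same LUN
    ("vmhba1:C0:T0:L2", "wwn-bbbb"),
]

devices = defaultdict(list)            # WWN -> list of paths to that device
for path, wwn in discovered:
    devices[wwn].append(path)

for wwn, paths in sorted(devices.items()):
    print(wwn, "->", len(paths), "path(s)")
```

Three paths collapse into two devices, one of which is reachable two ways; this is exactly the roster ESXi builds so it can retry I/Os on a surviving path.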
iSCSI's major advantage is that it uses commodity Ethernet networking rather than specialized (i.e.: expensive) Fibre networking. iSCSI networks can be simple flat LAN segments, VLAN-isolated segments, redundant segments (e.g.: HBA 1 and SP 1 on one switch, HBA 2 and SP 2 on a second switch with the switches uplinked) or separate routed LAN segments. Leveraging TCP/IP's inherent reliability and routing makes it easy to provision a reliable iSCSI network at reasonable cost.
Gb Ethernet has a theoretical maximum of about 125MB/second (more with bonding). However, with latency and protocol overhead it is unlikely that you will ever achieve this speed on a single copper link. Speeds of 70MB/s to 90MB/s are attainable, assuming the storage array can keep up.
10Gb Ethernet is fully supported by ESXi 6.0 as is 40Gb Ethernet using high end Mellanox 40Gb Ethernet controllers.
With bonding and multiple Storage Processors it may be possible to deliver 2x-4x the speed of a single path (about 125MB/sec of peak throughput on a 1Gb Ethernet path and about 700-800MB/s on a 10Gb Ethernet path). This is not in the same league as Fibre SANs (which signal at 4, 8 or 16Gb/sec) but should be sufficient for light to medium storage I/O workloads.
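The bandwidth arithmetic behind these figures is simple: divide the line rate by 8 bits per byte, then discount for protocol and latency overhead. A quick sketch (the efficiency figures are assumptions chosen to match the ranges quoted above):

```python
def usable_mb_per_s(gbps, efficiency):
    """Rough usable throughput: line rate in Gb/s -> MB/s, discounted
    for protocol/latency overhead (efficiency is an assumed fraction)."""
    theoretical = gbps * 1000 / 8      # MB/s at line rate (1 Gb/s = 125 MB/s)
    return theoretical * efficiency

print(round(usable_mb_per_s(1, 0.72), 1))   # roughly 90 MB/s on 1Gb Ethernet
print(round(usable_mb_per_s(10, 0.70), 1))  # roughly 875 MB/s on 10Gb Ethernet
```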
ESXi can boot from LUNs on an iSCSI SAN only if it is configured with an iSCSI hardware initiator and that controller has been configured as a boot controller with an assigned boot LUN.
iSCSI SANS are completely suitable for production use. You can purchase iSCSI SANs that support:
- Active/active multipathing
- Multiple 10Gb Ethernet targets
- LUN thin provisioning
- LUN replication
- LUN compression
- LUN access controls
- High capacity and performance scalability
- Hot swapping of most hardware components with zero downtime
The list above is more oriented toward non-enterprise production uses of iSCSI. The point is that while iSCSI can offer nearly all of the features of enterprise Fibre SAN solutions, you can also buy down-market solutions that trade off features, scalability and redundancy for a much lower cost of entry than Fibre can match.
In the diagram above, the Ethernet TCP/IP network is drawn as a cloud because the network could be any of:
- A simple flat segment (e.g.: a single switch)
- A more complex flat segment (e.g.: two switches stacked or uplinked). This configuration gives you some protection from a switch failure.
- A routed network of two or more segments. This configuration also provides protection from a switch failure.
For best performance it is recommended that the ESXi boxes and the iSCSI SAN reside on the same local segment (so as to avoid congestion and latency at a router). Also, iSCSI traffic should be on an isolated segment so that disk I/Os over iSCSI do not have to compete with other network traffic.
iSCSI traffic is not encrypted, so anyone connecting to the iSCSI storage LAN segment could sniff traffic and potentially capture data. For this reason, it is best to isolate iSCSI traffic on a private physical or virtual LAN segment.
Network redundancy improves network reliability and performance, and several strategies are available (e.g.: switch stacking, NIC teaming, routed redundant segments):
Your best redundancy option may be to combine one or more of the above strategies. For instance, Stacking two Ethernet switches combined with NIC Teaming would give you a very high degree of redundancy without the added complexity of separate, routed LAN segments.
iSCSI uses a qualified naming scheme that differs from standard fully qualified domain names. iSCSI Qualified Names (IQNs) use this format:
iqn – must be present; stands for iSCSI Qualified Name
yyyy-mm – the year and month the vendor's domain name was registered
com.vendor – the vendor's domain name in reverse
alias – the local alias assigned to the node
An IQN such as iqn.1998-01.com.vmware:<alias> indicates that the host is a VMware host and that the domain vmware.com was registered in January, 1998.
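The IQN layout can be demonstrated with a small parser (a hypothetical helper, shown only to make the fields concrete; the alias in the example is made up):

```python
def parse_iqn(iqn):
    """Split an iSCSI Qualified Name of the form
    iqn.<yyyy-mm>.<reversed-domain>:<alias> into its fields."""
    prefix, rest = iqn.split(".", 1)
    if prefix != "iqn":
        raise ValueError("IQNs must start with 'iqn'")
    date, rest = rest.split(".", 1)
    year, month = date.split("-")
    domain_rev, alias = rest.split(":", 1)
    domain = ".".join(reversed(domain_rev.split(".")))  # undo the reversal
    return {"registered": year + "-" + month, "domain": domain, "alias": alias}

print(parse_iqn("iqn.1998-01.com.vmware:esx-host-01"))
# {'registered': '1998-01', 'domain': 'vmware.com', 'alias': 'esx-host-01'}
```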
Your ESXi host must know which LUNs are available to it on your iSCSI SAN. There are two ways to update the ESXi storage roster for iSCSI based volumes.
Static configuration allows ESXi administrators to type in the LUN properties directly into the ESXi host, essentially transposing information from the SAN configuration display to the ESXi host. This approach should be avoided because there is no way to dynamically update the ESXi host whenever the iSCSI SAN's configuration changes.
Send Targets is a special request built into the iSCSI protocol. The ESXi host can issue a Send Targets request to the iSCSI SAN at any time. The SAN responds by reviewing the LUNs visible to the requesting ESXi host and then returning a list of visible LUNs and their properties. The ESXi host would then use this information to populate its roster of available LUNs for the iSCSI controller.
ESXi can use a limited number of iSCSI hardware initiators (iSCSI controller cards).
An iSCSI hardware initiator appears to the ESXi host as a storage controller. It includes an onboard CPU, memory and firmware that implement the TCP/IP protocol (for network traffic) and the iSCSI Initiator stack (to act as a storage controller). At the back end, the iSCSI hardware initiator provides one or two RJ45 jacks for connectivity to an Ethernet LAN segment. iSCSI hardware initiators have the following advantages:
- All network and iSCSI Initiator overhead is off loaded onto the card
- The card may implement Jumbo Frames (up to 9,000 byte packet payloads).
Jumbo Frames are an enhancement to Ethernet that allows a single packet to carry a much larger payload, thereby eliminating a lot of TCP/IP protocol overhead. Traditionally, Ethernet frames have a maximum transfer unit (maximum payload size) of 1,500 bytes. This is insufficient for block-oriented traffic, where 3 frames (2 x 1,500 bytes and 1 x 1,096 bytes) would be needed to carry a 4K disk block. With Jumbo Frames, the entire disk block can be transferred in one packet, reducing protocol overhead (only one packet needs to be sent and acknowledged vs. 3).
Note: Modern operating systems (Windows Vista, Windows Server 2008, RedHat Enterprise Linux 5.4+ and 6.0+, etc.) do I/Os in 4k (4096 byte) blocks. Jumbo frame enabled Ethernet storage networks could easily carry the disk I/O request in a single frame.
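The frame-count arithmetic is easy to verify with a simplified sketch (it ignores per-frame header bytes and just divides the block by the payload size):

```python
import math

def frames_needed(block_bytes, mtu):
    """Number of Ethernet frames needed to carry one disk block
    at the given MTU (payload size), ignoring header overhead."""
    return math.ceil(block_bytes / mtu)

print(frames_needed(4096, 1500))  # 3 standard frames for a 4K block
print(frames_needed(4096, 9000))  # 1 jumbo frame for the same block
```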
ESXi also supports iSCSI Software Initiators. The VMkernel supports iSCSI Software Initiator through a dynamically loadable VMkernel driver module that implements the iSCSI Software Initiator stack and the TCP/IP stack within the VMkernel. All that's needed to complete the picture is a VMkernel Port on a Virtual Switch (connected to the same LAN segment as the iSCSI SAN). With iSCSI Software Initiators, the ESXi host can act as an iSCSI client without the need to invest in expensive iSCSI controller card(s).
iSCSI Software Initiator stacks look to the VMkernel like an iSCSI controller is present. The ESXi host CPU runs the iSCSI Software Initiator stack which places modest overhead on the ESXi host. iSCSI I/Os flow through a VMkernel port to a physical NIC.
You are now ready to edit the iSCSI Software Initiator's properties. The Initiator is disabled by default. To turn it on, click the Add... link and then click OK. This instructs the VMkernel to load the Software iSCSI Initiator stack. It will take a few seconds before this step completes.
Next, we must configure Dynamic (LUN) Discovery. Click the tab and then click Add. When the Add Send Targets Server window pops up, enter in the IP address and port number of the first Storage Processor on the iSCSI SAN, and click OK. If your iSCSI SAN has multiple Storage Processors, repeat this step for each additional Storage Processor.
Note the CHAP button in the pop-up window. Click this button if you need to enter CHAP authentication information for this Storage Processor. ESXi can keep separate CHAP information for each Storage Processor.
Be very careful that you get the IP address and port number correct. If you make a mistake, you can remove the incorrect entry, but the removal won't take effect until you reboot the ESXi server.
CHAP authentication can be employed to ensure that only authorized ESXi hosts can access storage on your iSCSI SAN. CHAP support has been greatly improved since ESXi 3.5; ESXi now supports:
1-way CHAP – where ESXi authenticates to the iSCSI SAN
2-way CHAP – where ESXi authenticates to the SAN and then the SAN authenticates back to ESXi
iSCSI uses Challenge Handshake Authentication Protocol (CHAP) whenever authentication is required. CHAP is an authentication protocol that was popular during the MS Windows RAS (Remote Access Services) days. CHAP is a simple shared secret (password) protocol where the ESXi client and the iSCSI SAN both have the same user name and password account information. CHAP is simple, low overhead and does not expose any sensitive information. This is critical because CHAP does not use encryption.
Step 1 – ESXi sends a log-in request with the pre-assigned login ID
Step 2 – The SAN looks up the login ID and password. It generates a large, one-time Hash (H) code and sends the code to the ESXi host
Step 3 – The ESXi host uses the stored Password (PW) and the one-time hash (H) in a mangling algorithm to produce a one-time result (R)
Step 4 – The SAN uses the password on file (PW), the same one-time hash (H) and the same algorithm to produce a one-time result (R)
Step 5 – The ESXi host transmits its one-time result (R) to the SAN
Step 6 – The SAN compares its local R with the ESXi R. If they match, the ESXi host is authenticated and the SAN will handle its I/Os
Anyone sniffing the LAN gets the Hash code (H) and the Result (R). However, there is no known way to derive the password (PW) from these two values other than a dictionary attack – and that would take years (if you have a well chosen password).
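The six-step exchange can be sketched in Python. This assumes an RFC 1994-style response (a hash over the shared secret and the challenge); your SAN's exact mangling algorithm may differ:

```python
import hashlib
import os

# Shared secret (PW), configured on both the host and the SAN.
secret = b"well-chosen-password"

# Step 2: the SAN generates a one-time challenge (H) and sends it out.
challenge = os.urandom(16)

def chap_response(secret, challenge):
    """Steps 3/4: mangle the secret and the one-time challenge into a
    one-time result (R). The secret itself never crosses the wire."""
    return hashlib.md5(secret + challenge).hexdigest()

host_r = chap_response(secret, challenge)   # Step 5: host computes and sends R
san_r = chap_response(secret, challenge)    # Step 6: SAN computes its own R

print(host_r == san_r)   # True -> host is authenticated
```

A sniffer sees only the challenge and the result; recovering the password from them requires a brute-force/dictionary attack, which is why a well-chosen secret matters.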
Once you have completed the CHAP Authentication tab, you have completed your iSCSI Software Initiator configuration chores. The next step is to scan your iSCSI SAN for available LUNs (assuming of course that your SAN administrator has already created some LUNs for your use).
To do this, go to the Storage Adapters view, (Configuration > Storage Adapters), click the iSCSI Software Adapter (usually vmhba33) and then click the Rescan... link. The above dialog will pop up and ask you if you want to scan for new empty LUNs, LUNs already partitioned and formatted VMFS or both. Normally you would just click OK so you can discover both types of LUNs.
It can take 5-60+ seconds for the scan to complete, and a further 30 seconds for the LUN roster to populate with the list of newly discovered LUNs.
In the screen grab (above), the iSCSI scan has completed and new storage volumes have been discovered (in our case, the SAN identifies LUNs by unique Target numbers 0, 1, 2, 3...).
The Storage view displays a roster of all usable storage volumes available to the ESXi host. The storage view will display accessible storage volumes that are either partitioned and formatted VMFS or NAS storage volumes (NFS shares).
VMFS volumes have hardware paths under the Device column header and the type vmfs3 under the Type column header.
NFS volumes have IP:/path under the Device column header and NFS under the Type column header.
The Details window displays properties of a selected storage volume including the total volume size, storage used, the Path Selection Policy (Fixed or MRU), the total number of paths to the volume, the number of broken and disabled paths, the block size for the VMFS file system and the number of LUNs that provide raw storage space for the VMFS.
The Storage Views tab lets you review storage consumption by VM and also storage maps. Storage reports (storage consumption by VM) displays very useful information such as:
- Whether the VM benefits from path redundancy to storage (path redundancy is critical to availability)
- The amount of storage used by a VM
- The amount of space used by snapshots active on the VM
- The number of virtual disks the VM has
Examples of Free and Open Source iSCSI SAN solutions
Openfiler – Free iSCSI Target (SAN) software. Not VMware certified. Not suitable for high transaction volumes or high I/O loads
FreeNAS – NFS/iSCSI open source OS turns a PC or server into a storage appliance
NexentaStor Community Edition – NAS/iSCSI storage OS load for PCs/PC servers
StarWind – free iSCSI Target (SAN) software for Windows
StorMagic - Virtual Storage Appliance for VMware
QUADstor – Open source iSCSI SAN for Linux
DataCore – Virtual SAN software
TrueNAS – Commercial version of the Open Source FreeNAS project
Additional commercial SAN solutions: Tintri, SimpliVity, VMware vSAN, Exablox...
Windows Server 2008, Server 2012 iSCSI Target... Microsoft added block-mode storage via iSCSI Target software as a free download for Windows Server 2008 and built this feature into Server 2012. This allows Windows Server to share block storage with remote iSCSI initiators.
Hyperlinks for these products can be found in the supplemental material attached to this lecture.
The vSphere Client will display very little in the way of useful diagnostics if things should go wrong. The first thing to do when troubleshooting iSCSI connection problems is to double check everything including IP addresses, configuration settings, firewall setup, etc.
If all else fails, review log files via the ESXi console.
ESXi 5.0, 5.1 and 5.5 support NFS v3 client connections only. These hosts can use all of the above named features (ESXi + NFS v3 column)
ESXi 6.0 can connect to an NFS server using either NFS v3 or NFS v4.1 connections... If you use vSphere Client to create an NFS connection, you automatically connect using NFS v3 only. If you use vCenter plus vSphere Web Client to create an NFS connection, you can choose to connect using either NFS v3 or NFS v4.1. If you connect to NFS via NFS v4.1, you can only use features identified in the ESXi + NFS v4.1 column.
Many of the features that do not work with NFS v4.1 are high end features available only on VMware's most expensive licenses. Features that do not work on datastores provided by NFS v4.1 connections include Storage DRS (load balancing) clusters and Storage I/O Control (for storage bandwidth management).
Additional features that are not supported when using NFS v4.1 connections are Site Recovery Manager (disaster recovery tool for virtual environments) and Virtual Volumes (virtual disk containers introduced in vSphere 6.0).
If you need any of the unsupported features or if you need to administer an ESXi host with vSphere Client, do not use NFS 4.1 client connections.
Many NFS NAS devices do not yet support NFS v4.1. A partial list of NAS devices that DO NOT support NFS 4.1 includes:
● FreeNAS / TrueNAS
● NexentaStor Community Edition
● NetApp’s OnTAP Simulator
● Synology DSM 5.1
Some NFS servers do correctly support NFS v4.1. Check VMware's hardware compatibility portal to verify your NAS server supports NFS 4.1 (and is running the correct firmware version).
When in doubt, stick with NFS v3
(or risk data corruption on your NAS)
Traditional SAN storage provisioning involves allocating a private SAN LUN to each physical server. In this case, the PC server benefits from using the SAN.
The problem is that legacy operating systems (Windows, Linux, UNIX, etc.) require exclusive access to any SAN LUNs. That is, none of these operating systems permits two or more physical machines to use the same LUN at exactly the same time.
So, even though a PC server may use a SAN LUN, that SAN LUN is effectively held captive by that PC server. If the PC server were to fail, then no other physical machine would be able to use that LUN (for recovery purposes) unless the SAN administrator reconfigured the SAN to make the LUN visible to a new PC server.
VMware File System (VMFS) volumes were designed for safe, concurrent access by ESXi hosts. This means that, unlike traditional operating systems, many ESXi hosts can connect to, mount and concurrently use files on the same VMFS volume at the same time.
VMFS file systems are general-purpose, hierarchical file systems that can be used to hold files needed for your virtualization initiatives. VMFS was designed to be an efficient (very low overhead) file system suitable for files of all sizes. It is especially important that VMFS remain efficient on extremely large files because virtual disk files (.vmdks) can be as large as 62TB in vSphere 6.
VMFS volumes are often used to hold other files useful to virtualization, such as operating system, utility and application install images. If you rip (using Roxio, Nero or your favorite CD/DVD ripping tool) install media to files on a VMFS, then any VM can mount those files on its virtual CD/DVD device and use the image as if it were physical media. This eliminates many problems normally encountered with physical media.
VMFS volumes are designed to safely handle concurrent I/O activity by multiple ESXi hosts. This is accomplished by clever use of LUN and file locks. For example, when an ESXi host is told to power on a VM, it must assert a file lock on the VM (so that no other ESXi host can manipulate the VM's virtual disk files). The virtual disk file lock is established as follows:
- The ESXi host asserts a non-persistent SCSI reservation (LUN lock) on the entire VMFS volume. This gives the ESXi host temporary exclusive access to the LUN. I/Os from other ESXi hosts will queue at the host while the non-persistent SCSI reservation is present
- The ESXi host then places a file lock on the .vmdk file of the VM to be powered on
- The ESXi host updates the file system structure to indicate that the VM has been powered on
- The ESXi host then releases the non-persistent SCSI reservation against the LUN. This allows I/Os from other ESXi hosts to other files on the LUN to proceed
- The ESXi host then proceeds to power on and run the VM
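The locking sequence above can be modeled with a toy data structure. The names here are hypothetical, and real VMFS metadata and SCSI reservations are far more involved; this only illustrates the reserve / lock / release ordering:

```python
class VmfsVolume:
    """Toy model of a shared VMFS volume protected by a short-lived,
    LUN-wide reservation plus per-file locks."""

    def __init__(self):
        self.reserved_by = None     # holder of the non-persistent SCSI reservation
        self.file_locks = {}        # vmdk path -> owning host

    def power_on(self, host, vmdk):
        if self.reserved_by is not None:
            raise RuntimeError("LUN reserved; I/O queues until it is released")
        self.reserved_by = host                 # 1. reserve the whole LUN
        try:
            if vmdk in self.file_locks:         # 2. try to lock the .vmdk
                raise RuntimeError("VM already locked by " + self.file_locks[vmdk])
            self.file_locks[vmdk] = host        # 3. record the file lock (metadata)
        finally:
            self.reserved_by = None             # 4. always release the reservation
        return True                             # 5. VM powers on

vol = VmfsVolume()
vol.power_on("esxi-01", "vm1/vm1.vmdk")
print(vol.file_locks)
```

Note that the reservation is held only for the metadata update; once it is released, other hosts' I/O proceeds, but a second host trying to power on the same VM fails at the file-lock step.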
ESXi hosts scan for storage volumes on boot or whenever the Rescan link (Configuration Tab > Storage Adapters > Rescan...) is clicked. The ESXi host will update its available storage roster with the properties of all visible LUNs found during a rescan.
If a LUN is partitioned and formatted VMFS, the ESXi host will add the VMFS volume to the available storage view (Configuration Tab > Storage). Any volume in this view is immediately available for use by the ESXi host either to access existing files on that VMFS or to create new files on the VMFS.
VMFS volumes can be referenced by either their Runtime path (e.g.: vmhba32:C0:T1:L1) or their label (e.g.: Production, Test, etc.). The vSphere Client displays VMFS volumes by their label.
If you log in to the Local/Remote Tech Support command line, you can navigate to the top directory of a specific VMFS volume with the command:
# cd /vmfs/volumes/<VMFS-Label>
where <VMFS-Label> is the name of the VMFS volume.
It is easy to create a new VMFS volume on an available storage volume. You do this by invoking the Add Storage Wizard (Configuration tab > Storage > Add Storage...).
The first step of this wizard asks you if you want to add either a NAS/NFS resource or a Disk/LUN resource. Click Disk/LUN.
Next, the wizard will display a list of all volumes visible to the ESXi host that contain non-VMFS partitions or volumes that have no partition table at all (e.g.: new SAN LUNs or local physical/RAID volumes).
Whenever physical LUN properties are displayed, the following should be kept in mind:
Capacity – is the actual reported size of the LUN in GB or MB
Available – is the amount of unpartitioned space available on the LUN
A disk is unpartitioned when the Available space almost matches the reported capacity (as some space is held back for the MBR and partition table). If available space reports as zero, then the disk is fully partitioned with non-VMFS partitions. If you delete these partitions, then non-VMFS data will be irretrievably lost.
Above is a roster of available storage volumes on an ESXi host. To complete the Add Storage wizard, select one of the volumes. All column headers are sortable – and LUNs are added in the order found – so click a column header to re-order the LUNs into something that makes sense for you (e.g. Click LUN header to see LUNs sorted by their LUN ID value).
Once the Add Storage Wizard completes, your new VMFS volume is ready for use (and is added to the Storage roster).
ESXi uses VMFS labels as a way to defend against SAN LUN renumbering and/or changes in the LUN name (e.g.: vmhba1:C0:T1:L1), which can happen across boots. Fibre SANs may sometimes renumber (reassign LUN volume numbers) as SAN administrators add/remove volumes from the SAN. By using the VMFS volume name rather than its number, ESXi administrators can continue to use a mnemonic name rather than having to track the currently active LUN number.
VMFS volumes are very efficient but do impose both capacity and performance costs on a storage volume.
Capacity - VMFS volumes lose about 3-6% of their overall capacity to VMFS file system overhead. Smaller VMFS LUNs lose more capacity to overhead than do larger LUNs
Performance – A VM's virtual disk is represented as a file in a VMFS. When a VM does I/Os against its virtual disk, those I/Os are completed by the VMFS file system driver. As a result, the VM not only has its normal disk I/O overhead (e.g.: NTFS overhead) but also a modest amount of VMFS overhead.
VMFS volumes now support capacity growth through dynamic LUN expansion. This means that your SAN administrator can grow a storage volume and you can grow a VMFS partition and file system onto the newly allocated space.
There is another strategy for growing VMFS volumes. VMFS supports capacity expansion through LUN spanning – the joining together of a VMFS volume with additional empty SAN volumes.
Suppose our Production VMFS volume were full (or nearly full). If a VMFS volume fills, there can be undesirable results, including:
- You cannot power on a VM because there is no room to create the VMkernel swap file that must be present to handle VM paging to disk
- You cannot snapshot a VM because there is no space left to hold the file that accumulates the changes to a virtual disk that occur after the snapshot is taken
- You cannot make new VMs on the LUN because there is no room left to allocate space to the VM's virtual disk and other constituent files
- You cannot increase the size of existing virtual disk files
- VMs with snapshots will freeze when there is no more space to record virtual disk changes
LUN spanning is a capacity management technique that lets ESXi administrators increase the size of a VMFS by gluing together (spanning) an existing VMFS with an empty volume. Once the Span is complete, the VMFS will be able to use free space on the original volume allocated to the VMFS and the new volume (that was added to the VMFS).
The other advantage to LUN Spanning is that spans can be created while the VMFS is in use so that capacity issues can be dealt with immediately rather than having to wait for the next maintenance window.
VMFS LUN spanning can cross multiple volumes. In the example above, the Production VMFS has been spanned across two additional volumes. In this case the Production VMFS will report, as its capacity, the sum of the sizes of all three volumes assigned.
LUN Spans are not a form of RAID. That is, LUN Spans do not mirror or stripe across the allocated volumes. As storage is requested, the Span will allocate space from the first volume until it fills. Once the first volume has filled, additional storage needs will be met by allocating free space from the second LUN. And, when that LUN fills, storage will be allocated from the third LUN (and so on). Files on the Span get free space from whichever volume has it to give. So, there is no way to know (on a file by file basis) which volumes contribute storage to a file.
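The fill-first behaviour described above can be sketched in a few lines of Python. This is purely illustrative – it is not VMFS code, and the function and variable names are ours – but it captures the key point: a span drains each extent in order, with no striping or mirroring.

```python
def allocate_from_span(extents, request_gb):
    """Allocate space fill-first across span extents: drain extent 1,
    then extent 2, and so on (no striping or mirroring)."""
    grants = []
    remaining = request_gb
    for i, free in enumerate(extents):
        if remaining <= 0:
            break
        take = min(free, remaining)
        extents[i] -= take
        remaining -= take
        if take:
            grants.append((i, take))
    if remaining > 0:
        raise RuntimeError("span is full")
    return grants

extents = [10, 100, 100]   # free GB per extent
print(allocate_from_span(extents, 30))  # [(0, 10), (1, 20)]
```

Note how a single 30GB request straddles two extents – which is exactly why there is no way to know, file by file, which volumes back which data.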
Once a volume is assigned to a VMFS (either as the first or subsequent volume), it is considered in use and that volume is removed from the available storage roster.
You can add a volume to a VMFS at any time by completing these steps:
Configuration tab > Storage > click VMFS > Properties > Increase button
The LUN Properties window lets you review the currently assigned storage volume(s) for a VMFS and also lets you add additional volume(s) to the VMFS through the Increase button.
When you click the Increase button, you invoke the Increase Datastore Capacity Wizard. This wizard starts by showing you a roster of all volumes visible to the ESXi host that contain either unpartitioned space (Capacity nearly equals Available space) or volumes with non-VMFS partitions (Available space is either substantially less than Capacity or reports as None).
Note that the Datastore Capacity Wizard will automatically assume you want to create a Span if there is no free space on the LUN whose properties you are editing. If there is free space on the LUN, the wizard will assume you wish to grow the LUN.
In the example above, we select a volume and assign it as LUN span to the VMFS.
You need to rerun the Increase Datastore Capacity wizard for each additional LUN you wish to add to a VMFS.
Be aware that ESXi does not judge or second guess the suitability of the LUN(s) you select for use as extent volumes for a VMFS. Poor choices for extent candidates are any LUNs that do not match the performance, visibility and redundancy characteristics of the original LUN in the VMFS. If the additional LUNs are not as fast as the first LUN (because of a different RAID strategy or different SAN acceleration settings), then some of your I/Os to the LUN will take longer to complete than others – probably leaving you scratching your head wondering why some VMs run quickly and others don't.
If you use additional LUNs that are visible to you but not other ESXi hosts then any VMs on the span would not be able to use VMotion, DRS or HA.
And, if you span redundant LUNs with non-redundant LUNs, then you risk data loss across all files on the LUN if the non-redundant volume were to fail.
Exercise care when selecting LUN span candidate volumes for a VMFS
As of ESXi 4.0, you can now increase VMFS space by:
1. Having your SAN administrator grow the SAN LUN on which a VMFS partition lives
2. Growing the VMFS partition and file system on the newly expanded LUN
This process can be performed hot – while ESXi is up and running and while VMs are actively using the VMFS that is being grown.
When growing a VMFS it is really important to record the LUN Name (vmhba#:C#:T#:L#) before you attempt to grow the VMFS. You will need this information so you can select the correct volume in the Extent Device screen (above). You get this information from the Storage view.
With the hardware path (vmhba#...) in hand, review the Extent Device roster looking for a volume Name that has a Yes in the Expandable column. That will be your newly (physically) extended volume. Select this volume and continue with the Wizard.
The Increase Datastore Capacity wizard validates your selected volume - to ensure it is the same volume on which the VMFS lives and that it has free space available. If both conditions are met, then the wizard will allow you to grow the VMFS.
By default, the wizard will grow the VMFS onto all free space. You can grow the VMFS to less than all free space – but there is really no benefit to having unallocated free space on a volume.
At boot or on rescan, ESXi learns all healthy paths to each LUN. So, if a path were to fail, ESXi can easily reroute I/Os around the failed component to the desired LUN. For example, if Storage Processor 1 were to fail, ESXi would:
- Immediately detect the loss of SP1
- Select a healthy path that does not include SP1. There may be a short lag in I/Os while ESXi tests the health of the path and selects an alternative
- Re-issue I/Os that did not complete on the failed path over the new active path so that no VM I/Os are lost or handled out of order
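The steps above can be sketched as a small failover routine. This is an illustrative model, not VMkernel code (the names and data shapes are ours): mark the failed path dead, pick a survivor, and re-issue the incomplete I/Os in their original order.

```python
def reroute(paths, failed, pending_ios):
    """On path failure: mark it dead, pick a surviving healthy path, and
    re-issue incomplete I/Os in order so none are lost or reordered."""
    paths[failed] = "dead"
    survivors = [p for p, state in paths.items() if state == "healthy"]
    if not survivors:
        raise RuntimeError("all paths down")
    new_active = survivors[0]
    return new_active, [(io, new_active) for io in pending_ios]

paths = {"SP1": "healthy", "SP2": "healthy"}
active, reissued = reroute(paths, "SP1", ["io-17", "io-18"])
print(active, reissued)  # SP2 [('io-17', 'SP2'), ('io-18', 'SP2')]
```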
In the above case, ESXi would direct I/Os around the failed SP1 and through SP2. When SP1 reports that it is healthy (either because SP1 was replaced or the fault was cleared, e.g.: a Fibre cable was plugged back in), ESXi must decide how to respond. It can either continue to use the known healthy (but more congested) SP2 or it can swing back to using SP1. Fail back policies determine how ESXi responds:
Most Recently Used - ESXi continues to use SP2. We have high confidence in the health of the path, but I/Os might take longer to complete due to path congestion at SP2.
Fixed Path - ESXi swings I/Os back to SP1. I/Os will complete more quickly, but there is a risk that SP1 could fail again if the fault was not completely cleared.
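The two fail back policies reduce to one decision when the preferred path recovers. A minimal sketch (our own function, not a VMware API):

```python
def path_after_recovery(policy, preferred, current):
    """Decide the active path once the failed (preferred) path reports
    healthy again. MRU stays put; Fixed fails back to the preferred path."""
    if policy == "fixed":
        return preferred     # swing I/Os back to the recovered path
    return current           # MRU: keep using the known-good path

print(path_after_recovery("mru", "SP1", "SP2"))    # SP2
print(path_after_recovery("fixed", "SP1", "SP2"))  # SP1
```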
Multipathing on an iSCSI storage network is a function of how the TCP/IP network was provisioned. If you have two iSCSI hardware initiators or two ports on a single iSCSI HBA and/or two iSCSI Storage Processors, then ESXi will automatically discover all paths through the HBAs and SPs to visible SAN LUNs. As with Fibre SANs, having multiple HBAs and SPs contributes to iSCSI reliability and performance.
If you are using iSCSI software initiators, you can further enhance reliability and performance by NIC Teaming the vSwitch that is carrying iSCSI traffic. Through NIC Teaming, the vSwitch can assign more NICs to handle iSCSI traffic. And, if the assigned NIC fails, the Team will re-balance so that iSCSI I/Os will complete through a healthy NIC.
ESXi has multipathing capability built right into the VMkernel. As a result, there is no need for 3rd party multipathing tools. New to ESXi 4 is the ability to add a limited number of 3rd party multipath solutions to ESXi.
On boot or rescan, ESXi scans Fibre and iSCSI SANs and discovers all available paths to each LUN. Here is how ESXi uses hardware paths to reference LUNs:
The Canonical Path is the generic name used by ESXi when referencing a LUN. By default, this is the first path found to a LUN. The Canonical path remains the same regardless of any changes in underlying path usage due to path failures or active path re-assignments.
Fixed path and MRU multipathing are now considered Legacy I/O strategies and would not normally be selected unless you had just one physical path between your ESXi host and your SAN storage device or your SAN does not support active/active multipathing.
By default, ESXi uses Fixed (VMware) multipathing. You should switch to Round Robin (VMware) – active/active multipathing – if you have more than one I/O path to a LUN and your SAN supports active/active multipathing. Round Robin multipathing distributes storage I/Os across all healthy paths, which substantially improves VM disk I/O performance.
In the past, VMware only supported one I/O path per LUN. This made virtualization unsuitable for workloads that required high storage bandwidth (e.g.: workloads that need more than one path of I/O bandwidth). With Round Robin multipathing, these workloads can now be virtualized because their disk I/O demands can (finally) be met.
Active / Active Multipathing
All storage paths are used simultaneously to transmit I/O requests between the ESXi host and the SAN
Active / Stand-by Multipathing
This is where all traffic between an ESXi host and the SAN flows through one Active path. All remaining paths are in Stand-by mode. Should the Active path fail, a surviving (healthy) stand-by path is selected to be the new Active path. In this mode, only one path (the Active path) can carry I/Os.
A variation of Active / Stand-by multipathing. In this mode, an administrator declares different active paths on a LUN by LUN basis. While I/Os may only flow through one path to a LUN at a time, different Active paths are declared for different LUNs, allowing I/Os to flow through multiple paths concurrently. There is no attempt to dynamically load balance across paths. Performance is improved as I/Os are statically distributed across available healthy paths.
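The three modes differ mainly in how each I/O picks a path. A simplified, illustrative selector (not VMware's implementation – the function names are ours, and real Round Robin issues a batch of I/Os per path turn rather than one):

```python
import itertools

def make_path_selector(policy, paths, preferred=None):
    """Return a function that picks the path for each I/O under the
    given policy (simplified sketch of the three modes above)."""
    if policy == "round-robin":          # active/active: cycle all paths
        cycler = itertools.cycle(paths)
        return lambda: next(cycler)
    if policy == "fixed" and preferred:  # static per-LUN active path
        return lambda: preferred
    return lambda: paths[0]              # active/stand-by: one active path

pick = make_path_selector("round-robin", ["P1", "P2"])
print([pick() for _ in range(4)])  # ['P1', 'P2', 'P1', 'P2']
```

Declaring a different `preferred` path per LUN under the "fixed" policy is exactly the static load-distribution variation described above.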
For a good discussion on multipath options, please see the attached document...
It is usually more important to allocate scarce resources such as CPU and RAM in a predictable manner than in a fair manner.
Fairness usually implies that all VMs get fair (or equal) access to host resources. While this sounds nice, the reality is that some VMs (e.g.: Production VMs) are likely much more important to us than other VMs (e.g.: test, development, quality assurance and training VMs).
Predictable resource allocation implies that you know and have control over how resources are allocated. There are two aspects to predictable resource allocation:
If resources are not fully committed (i.e.: there are more physical resources available than all VMs demand) then the VMkernel will ensure that VMs get all the resources (either CPU or memory) that they request – and perhaps resources for idling (allocated but unneeded CPU, RAM).
If resources are over committed (i.e.: VMs currently demand more memory or CPU cycles than the ESXi host can deliver) then the VMkernel allocates scarce resources to the most important VMs.
The VMkernel has a number of strategies it uses to determine who is most important when experiencing resource contention. Some of these strategies are built into the VMkernel and others are under your control. We will explore these in this chapter.
Whenever the VMkernel has more CPU resources than VMs demand, the VMkernel gives all running VMs all of the CPU cycles they require.
On its own, the VMkernel has no way of knowing what the VM guest OS is doing with the CPU cycles it gets. If the guest OS wastes these cycles running its idle task (because it has nothing to do) then those cycles accomplish nothing in the VM and are not available for use by VMs that have real work to do.
If you install VMware Tools into your VMs (a best practice), then VMware Tools will report back to the VMkernel whenever the guest OS in a VM runs its idle task. By doing this, the VMkernel always knows which VMs truly need CPU service and which VMs would waste CPU by idling.
The VMkernel CPU scheduler automatically treats VMs that need to run as high priority VMs and VMs that want to idle as lower priority VMs. In this way, the VMkernel allocates CPU resources to where they are needed.
If the VMkernel has more physical CPU resources than are needed to run all non-idling VMs, then the VMkernel CPU scheduler will allow idling VMs to accumulate idle time.
Physical CPUs always cycle at full (rated) frequency. However, because (potentially many) VMs compete for a physical CPU resource (socket or core), a VM may not receive a full core of cycles in any given second of time.
If the host is over-provisioned with CPU resources, ESXi will allow a VM to use all of the CPU it wants. In this situation, the maximum number of cycles a uni-processor VM can use is the number of cycles a single CPU core can deliver (2.6GHz in the above example).
If the host is severely CPU over-committed, then the VMkernel must select which VMs run and which VMs wait. Under severe CPU stress, a low value VM could lose its turn at the CPU. It could receive as few as zero MHz in a given second in time.
It is more likely that a VM will receive at least some cycles each second. How much depends on many factors including:
When the VMkernel determines that it is time to run a VM, the VMkernel allocates physical CPU resources (usually CPU cores) to the VM equal to the number of virtual CPUs in the VM. That is, if the VM has 1 vCPU it will run with one core of resources. A dual vCPU VM runs with two CPU cores and a 4 vCPU VM runs with 4 CPU cores.
Each vCPU can be no faster than the frequency of the physical CPU core that runs the vCPU. So if you have a physical CPU that runs at 2.6GHz, then a vCPU cannot run any faster than 2.6GHz. This is the absolute upper limit of CPU cycles that can be allocated to the VM (on a per-vCPU basis).
If you like, you can lower this limit to a lesser value by setting a CPU limit to some number of MHz less than the frequency of the physical CPU core. For example, you can set a limit of 1GHz for very low value VMs. If you did this, the VMkernel CPU scheduler would never allocate more than 1GHz of cycles to the VM, even if there were spare CPU resources available.
One good example for the use of CPU limits is legacy NT4 applications. Some old NT4 based applications waste CPU by polling the keyboard rather than giving up the CPU when they are idle. If you migrated this workload to a VM, it would try to burn a full physical CPU core (of a modern, high speed CPU, not the 300-1,000MHz that an old Pentium 3 CPU could deliver). By setting a limit, you could control how much CPU this badly behaving application could consume – perhaps limiting it to no more cycles than it had when it was physically deployed.
You can also assign CPU reservations. A reservation is a guaranteed allocation of CPU cycles (in MHz) to a VM. This allocation is delivered to the VM every second by the VMkernel CPU scheduler and is provided regardless of whether the VM needs the cycles or would waste the cycles running its idle task.
An example of a VM that could benefit from a CPU reservation is a busy VM that runs an interactive network application – such as Microsoft Terminal Services or Citrix servers.
Normally, under CPU load, the interactive VM may lose its CPU to other VMs. If that were to happen, then users working with the VM might experience lag or jerkiness in their interactive sessions. If you assign a CPU reservation, then the VM will hold onto the CPU even if it starts to idle. This would allow the interactive VM to appear more responsive (smoother) under load – as the VM can respond instantly to any keyboard or mouse events from the client.
Reservations are guaranteed commitments of resources. Once you declare a reservation, the VMkernel will honor it, even if it means penalizing other VMs. Excessive use of reservations could lead to artificial contention as the VMkernel is no longer free to pull CPU away from idling VMs and redirect it to busy VMs.
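Reservations and limits bound the scheduler's per-second allocation from below and above. A minimal model of that clamping (our own sketch – the real scheduler weighs shares, co-scheduling and more):

```python
def cpu_entitlement_mhz(demand, reservation, limit, available):
    """Clamp a VM's per-second CPU allocation: never below its reservation
    (delivered even when idling), never above its limit, and never more
    than the host has available."""
    want = max(demand, reservation)
    return min(want, limit, available)

# A 1GHz limit caps a busy VM even on an idle 2.6GHz host.
print(cpu_entitlement_mhz(demand=1800, reservation=500,
                          limit=1000, available=2600))  # 1000
# A 500MHz reservation is delivered even when the VM only wants 200MHz.
print(cpu_entitlement_mhz(demand=200, reservation=500,
                          limit=2600, available=2600))  # 500
```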
When a VM boots, the VM's BIOS reports the declared amount of RAM to the VM's guest OS. The guest OS will then treat this declaration as the total physical RAM available to the VM. If the VM needs more RAM than it was provisioned with, the VM will use its native memory management capabilities (paging). Paging transfers less important memory pages to disk to free up memory for more important pages.
The VMkernel allocates RAM to a VM as the VM attempts to use memory, not on boot. So, if a VM boots with an 8GB memory declaration but only loads 4GB of pages into RAM, the VMkernel will only provide the VM with 4GB. In this way, the VMkernel prevents memory waste by not allocating RAM to VMs that don't need it.
If a VM clearly demonstrates an ongoing need for more RAM than it was given (through persistent guest OS paging), you should increase the declared memory for the VM the next time you can power cycle it (power down, dial up RAM, power on).
It is possible that the VM could spike on memory (thereby gaining more RAM from the VMkernel) and then later have the application that needed the memory release it. When this happens, the VM ends up with an over allocation of physical RAM. The VMkernel will learn about this over allocation through VMware Tools (which reports unused memory back to the VMkernel) and can steal back any over allocation through the Ballooning memory management technique (more later).
When a VM runs, it believes it has a full memory allocation (as set in the Edit Settings... > Memory). In reality, the VMkernel assigns the VM physical memory only when the VM attempts to read/write a given page.
The VM can never receive more physical RAM than it was assigned (Edit Settings > Memory). It will only receive a full allocation if it actually tries to use each and every page.
Under memory contention, the VMkernel employs various memory management techniques to ensure that memory is used efficiently. The technique of last resort is VMkernel paging. Under extreme memory stress, the VMkernel will page out some/all of the VM's memory and re-assign that memory to other (more important) VMs. However, the VMkernel will never page out any declared reserved memory or the memory used to hold the VM's virtual hardware.
Memory shares are used to determine which VMs hold on to memory and which VMs have their memory paged (which VMs win or lose memory competitions). In a nutshell, the more memory shares held by a VM, the more likely it is to hold on to its memory under memory stress.
The standard formula used to calculate memory shares is 10 shares for each MB of RAM declared by the VM.
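That formula is a one-liner. A trivial sketch (function name is ours) showing the default share count for a 4GB VM:

```python
def default_memory_shares(declared_mb):
    """Default ('Normal') memory share allocation: 10 shares per MB of
    declared VM memory."""
    return 10 * declared_mb

print(default_memory_shares(4096))  # 40960 shares for a 4GB VM
```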
Memory is a critical ESXi resource and the one most likely to become exhausted (although this is changing as memory prices for PC servers continue to fall).
Managing ESXi server memory starts with a change in attitude toward memory. In physical PC server deployment, the practice was to over provision the PC server with RAM because physically opening the PC server to add RAM in the future meant acquiring memory, finding a maintenance window and then powering down the server to add memory. Over provisioning is one of the habits we want to break. With VMs, changing the memory allocation is simple. Find a 5 minute maintenance window, shutdown the VM and increase the memory setting. Then, simply power on the VM and it has more RAM. The guest OS will automatically see and use the larger allocation.
So, the first best-practice you should adopt to conserve memory is to not over provision your VMs with RAM.
If you cannot take a VM down, you can reduce its memory demands by reducing the Memory Limit assigned to the VM. Normally the memory limit is the amount of RAM assigned. However, you can drop this value, and the VM will be forced to give up any allocation over this new limit. If the VM needs more RAM than this new, lower limit, the shortfall will be made up through paging.
Under memory stress ESXi can page some or all of a VM's memory to disk (a VMkernel swap file is allocated to the VM for this possibility). If too much of the VM's memory is paged to disk, the VM's performance will suffer noticeably. If the VMkernel continues to page more and more of the VM to disk, the VM's performance will continue to degrade until it becomes unacceptable (or non-existent if 100% of the VM is paged to disk).
You use memory reservations to guarantee that high value VMs retain an acceptable minimum allocation of RAM. This helps to ensure that critical VMs maintain a tolerable level of performance even during periods of extreme memory starvation.
The ESXi memory manager guarantees that a VM always gets its reservation and never receives more than its limit. If memory is scarce, the VMkernel will dynamically shuffle memory between VMs in real time using a number of techniques (transparent page sharing, ballooning and VMkernel swapping). The VMkernel uses Share allocations to decide which VMs retain memory (or receive RAM) and which VMs are forced to give up memory (under stress). By default, a VM receives 10 memory shares for each 1MB of declared RAM. That way VMs with more declared memory are more likely to retain RAM than smaller VMs (if a big VM didn't need the RAM why was it assigned?).
The VMkernel uses a weighted scheduling algorithm called Shares to decide how to hand out scarce resources. Shares apply to the delegation of CPU cycles, memory and disk/LUN I/O bandwidth.
The idea behind share based allocation is very simple. In a nutshell, the more shares you have (relative to other VMs) the proportionally greater service you receive for that resource from the VMkernel. Shares are handled as a magnitude or weight and share assignments are always relative to the outstanding total.
For example, suppose you had 4 VMs with 1,000 shares each. The total number of outstanding shares is 4,000 and each VM holds ¼ of these shares so under CPU contention each VM would get 25% of available CPU resources.
If a VM were very important, you could increase its share assignment; say to 3,000 shares. Now there is a total of 6,000 outstanding shares, and one VM holds ½ of the total. Consequently, it would receive 50% of all CPU and the remaining three VMs would compete for the remaining half.
If two more VMs were to power on with 1,000 CPU shares each, then the total shares would increase to 8,000 and our important VM would hold 3/8 of all shares. The remaining 5 VMs would compete for the remaining 5/8.
Note: A VM can never receive more than its configured maximum for any resource. E.g.: a uniprocessor VM with ½ of all CPU shares on a 4 processor machine can only receive one physical CPU core of service.
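The worked examples above are straightforward proportions. A small sketch (our own helper, not VMkernel code) that reproduces them:

```python
def share_fractions(shares):
    """Each VM's fraction of a contended resource is its shares divided
    by the total outstanding shares of all powered-on competitors."""
    total = sum(shares.values())
    return {vm: s / total for vm, s in shares.items()}

# Four VMs at 1,000 shares each: 25% apiece under contention.
print(share_fractions({"A": 1000, "B": 1000, "C": 1000, "D": 1000})["A"])  # 0.25

# Bump one VM to 3,000 shares: it now holds 3000/6000 = 50%.
print(share_fractions({"A": 3000, "B": 1000, "C": 1000, "D": 1000})["A"])  # 0.5
```

Power on two more 1,000-share VMs and the big VM's fraction drops to 3000/8000 = 3/8 – share entitlements are always relative to the outstanding total, never absolute.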
A Resource Pool is an inventory item that is only available under the Hosts and Clusters view. A Resource Pool functions nominally like a folder in that it can contain sub-items including VMs and sub-Resource Pools.
Unlike a folder, a Resource Pool supports CPU and Memory assignments. The Resource Pool draws CPU and Memory resources from its parent host, resource pool, or cluster and then divides whatever it receives amongst its members according to their individual Share, Reservation and Limit settings.
In the example above, the Production Resource Pool competes with other Resource Pools at the same level for an allocation of ESXi CPU and RAM from the esx2.esxlab.com ESXi host. Whatever resources it gets are then delegated to the Resource Pool members (the three powered on VMs).
Resource Pools act as resource containers and make it easy for ESXi administrators to assign specific resource settings to a Resource Pool as required.
Resource delegations make it easy to sub-divide the resources of your ESXi host or cluster without having to worry about how the Resource Pool owner will use the resources. For example, suppose you have an (internal) customer who, in exchange for budget, requests a guaranteed allocation of exactly 4GHz of CPU and exactly 4GB of RAM. Here is how you meet this request:
- Create a new Resource Pool for the customer
- Edit the Resource Pool's properties. Assign a 4GHz CPU reservation and a 4GB memory reservation
- If the allocation is strict (e.g. the customer can have 4GHz/4GB and no more), assign a Limit of 4GHz CPU and 4GB of RAM
- Assign the customer the role of Resource Pool Administrator (see the Permissions chapter on how to do this)
Now, the customer 'owns' their own Resource Pool. They can create one VM or many VMs in their Resource Pool. They can set individual VM resource assignments (Reservations, Shares, Limits) for both CPU and Memory.
Because of the Resource Pool reservations, the customer receives a guaranteed allocation of CPU and RAM. If you set Limits, the customer also is constrained with a hard cap on the amount of resources their VMs can collectively consume.
One option available to VMs and Resource Pools is Expandable Reservations... A Reservation is a guaranteed allocation of resources. That is, once a VM or Resource Pool claims a Reservation, there is no way (other than de-tuning the VM or RP, or powering off the VM) to reclaim the resource.
ESXi will honor all Reservation requests. On VM boot, if ESXi cannot meet a VM's reservation request, then that VM will not be allowed to power on. This could happen if, for example, a Resource Pool had a total declared reservation of 2GB and this reservation was fully allocated to other powered on VMs (in the RP). Any attempt to power on a new VM that wanted its own memory reservation would fail.
Normally the RP Administrator would have to reduce the reservation settings of other VMs to free up memory for the VM to boot. Another alternative is to use Expandable Reservations. Expandable Reservations give a VM or sub-RP permission to borrow resources directly from the Resource Pool's parent – an ESXi host or cluster – rather than from just the Resource Pool itself.
The good news with Expandable Reservations is that the VM can get its reservation and boot. The bad news is that there is no way to force a VM to give back any expandable reservation it has acquired. Also, once Expandable Reservations are enabled, there is no way to cap how much a RP borrows from its parent.
USE EXPANDABLE RESERVATIONS WITH CAUTION!
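The admission-control decision above can be modelled in a few lines. This is an illustrative sketch of the power-on check, not vCenter's actual algorithm (which walks the whole pool hierarchy); the function name and parameters are ours:

```python
def can_power_on(request_mb, pool_free_mb, parent_free_mb, expandable):
    """Admission check for a VM's memory reservation: satisfy it from the
    pool's own unreserved capacity first, then (only if the pool is
    expandable) borrow from the parent's unreserved capacity."""
    if request_mb <= pool_free_mb:
        return True
    if expandable:
        return request_mb <= pool_free_mb + parent_free_mb
    return False   # reservation cannot be met: power-on fails

# A 2GB pool that is fully reserved: a 512MB reservation fails
# unless the pool is expandable and its parent has free capacity.
print(can_power_on(512, 0, 8192, expandable=False))  # False
print(can_power_on(512, 0, 8192, expandable=True))   # True
```

The second outcome is exactly the risk flagged above: the borrow succeeds silently, and nothing caps how much the pool eventually draws from its parent.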
As previously discussed, Resource Pools are a great way to meet any resource commitments you have with your user community. They allow ESXi administrators to delegate a fraction of host CPU/Memory without having to worry about how those resources will be used.
If the customer powers on too many VMs or if the VMs attempt to use more resources than their Resource Pool has, then the customer's VMs will exhibit poor performance even though the ESXi host may have uncommitted resources.
If the customer wishes to increase their resource allocation, then they would need to negotiate a higher resource delegation, presumably in exchange for more budget or other consideration.
vCenter has a Scheduled Tasks feature that lets you complete a wizard (to perform some function) and then set a future time/date for that wizard to run. New to ESXi 4/vCenter 4 is the ability to schedule resource changes to VMs and Resource Pools.
To invoke this function, please do the following from the vSphere Client:
- Click View > Management > Scheduled Tasks
- On the background click New Scheduled Task
- Select Change Resource Settings from a Resource Pool or Virtual Machine
- Give the task a name (so you can use it again)
- Set the task frequency (it can auto-repeat)
- Set the tasks date/time to run
- Optionally send a notification e-mail when the task completes
- Complete the wizard making the resource changes you require
The Resource Allocation tab gives you a tabular view of the resource settings and allocations for your VMs and Resource pools under a datacenter, folder, cluster or host.
In this tab you can easily see how the resources of an inventory object are allocated to VMs and Resource Pools. You can also edit some settings (Reservations, Limits, Shares) directly in this view by simply placing your mouse over the item you'd like to change, clicking it to invoke a drop down or an edit box and then making the necessary adjustments.
Click the Memory button to see details of Resource Pool and VM memory resource allocations. Note that Resource Pools set to Normal shares get substantially more shares than VMs that are also set to Normal share allocations.
Again, you can edit key fields simply by clicking on the Reservations, Shares or Limits values for a VM or Resource Pool.
VMware virtual machines run on top of virtual hardware – software that behaves, to the guest OS, exactly like physical hardware and is indistinguishable from it.
ESXi uses memory to hold the data structures that make up virtual hardware. A VM with one vCPU, one NIC, one SCSI HBA and one virtual SCSI disk consumes about 50MB of RAM (depending on virtual hardware version in use). As you add more hardware, the amount of memory consumed to support the VM's virtual hardware grows. In the extreme case, a VM with 32 vCPUs, 8 NICs, multiple virtual SCSI HBAs and many virtual SCSI disks may consume up to 900MB or more of RAM before any physical memory is allocated for virtual machine operating system and program use.
Memory allocated for VM virtual hardware is never paged out to disk – and so is best viewed as an implied memory reservation over and above any reservations set for the VM.
So, to ensure the most efficient memory use possible, don't oversize a VM's virtual hardware.
The Resource Pool Summary tab contains useful information on the resource settings and resource consumption of your pool. You should periodically review the resource settings of all of your resource pools individually, and then collectively, to ensure that resources continue to be allocated in a way that makes sense for your system.
Migrating physical Windows and/or Linux machines to new hosts or storage is a complex, risky and time-consuming task. Fortunately, moving VMs to a new ESXi host or datastore (or both) is easy.
vCenter provides VM migration capabilities as an inherent part of the product. vCenter can move a VM to a new:
ESXi Host – This reassigns ownership of the VM to the new ESXi host. Depending on your configuration and licensing, this could be either a hot or cold migration
Datastore – Moving a VM to a new datastore entails copying the VM's constituent files from one datastore to the next. Cold datastore migration can be done at any time and on any VM. Hot datastore migration requires Storage VMotion
Host & Datastore – vCenter can move a VM to both a new ESXi host and a new datastore at the same time. To do this, launch the vSphere Web Client and log in.
Your VM must be Hardware Version 9 (note that the vSphere Client can only configure VMs up to Hardware Version 8 – so you will have to power off your VM and upgrade the virtual hardware to V9). Then, launch the Migration Wizard on your powered on VM, select the new ESXi host and the new Datastore and watch the migration happen!
Cold migration is the act of moving a powered off VM from one host to the next. What actions are taken when a cold migration is requested depends on VM, ESXi host and storage configuration.
If the VM lives on a shared storage volume that is visible to both the source and target ESXi host, then cold migration is nothing more than the transfer of ownership of the VM from the source to the target ESXi host. In this case, the cold migration request completes in seconds.
If the VM resides on a storage volume that is not visible to the target ESXi host then vCenter has more work to do. In this case, vCenter must:
- File transfer the VM's constituent files from the source ESXi host to the target ESXi host. This is done using an encrypted connection between the two hosts' Service Consoles and will likely occur at no more than 10-40MB/s.
- Once the files have been transferred, the target machine will take on ownership of the VM
- Then, the source machine will remove the VM's constituent files from its own datastore(s) and remove the VM from its VM roster
There are many scenarios where it makes sense to cold migrate a VM. The most popular reasons are identified in the slide (above).
Cold migration can be time consuming because virtual disk files can be very large and the VM's files are transferred over Ethernet via TCP/IP. The actual file transfer is conducted between the management interfaces of the source and target ESXi hosts.
Note: All network connections between ESXi hosts and VMware clients are encrypted. This ensures data privacy but adds overhead, which makes the transfer take longer.
You should expect no more than 20-80 megabytes/second of file-transfer speed even on a Gigabit NIC (your mileage may vary). Make sure you have an appropriately sized maintenance window before you attempt a cold migration.
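As a rough sanity check before scheduling that maintenance window, you can estimate the transfer time from the VM's on-disk size and the expected throughput. This is a sketch using the 20-80 MB/s range quoted above; real throughput varies:

```python
def estimate_transfer_minutes(vm_size_gb, throughput_mb_s):
    """Rough cold-migration window: VM size divided by sustained throughput."""
    size_mb = vm_size_gb * 1024          # GB on disk -> MB to transfer
    seconds = size_mb / throughput_mb_s  # time at the sustained rate
    return seconds / 60

# A 100 GB VM at the low and high ends of the quoted range:
worst = estimate_transfer_minutes(100, 20)   # about 85 minutes
best = estimate_transfer_minutes(100, 80)    # about 21 minutes
```

Planning for the low end of the range is the safer bet, since the copy shares the management network with other traffic.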
VMotion is the act of migrating a hot or running VM from one ESXi host to another. To be effective, hot migration must be capable of completing this task without introducing undesirable side effects such as OS instability, application instability or guest OS/application unresponsiveness.
VMware has offered VMotion since 2003, and its implementation is widely regarded as the most robust, SAN-friendly, production-ready hot VM migration available.
VMware protects its products with patents. Because VMware conceived, implemented and refined VMotion technology, its patent claims on this technology are likely solid. This may prevent or restrict the hot migration capabilities of competing products, because VMware can always claim patent infringement.
For all of the above reasons, VMware's hot migration technology is the preferred hot migration technology being actively deployed in production at this time.
VMotion provides two key benefits: dynamic VM load balancing and the ability to do ESXi host maintenance without the need to take VMs down.
Resource balancing with VMotion is critical to controlling PC server costs. Before VMotion, servers would need to be over-provisioned to meet any possible future resource demands. With VMotion, an IT department can deploy the right number of PC servers to meet its needs (right-sizing) and then monitor resource load growth over time. As resource demands begin to exceed available resources, IT departments can provision and deploy new ESXi hosts and then use VMotion to balance VM load across all ESXi hosts. In this way, IT departments can, for the first time, conduct proper PC server capacity planning and deployment.
The second advantage of VMotion is that it enables, for the first time, PC server hardware and software maintenance during production hours. This results in cost savings for IT departments because hardware maintenance, patching, configuration updates, etc. can be conducted during normal business hours rather than during off hours (when consultants cost more and when employees may be accumulating time off for working overtime).
VMotion is a separately licensed feature that is included in many vSphere Editions. Currently, VMotion is included in these vSphere 6 Editions:
- Enterprise Plus
- Small Business Essentials+
VMotion completes in a number of discrete steps. One step not shown here is that your VMotion request must pass a number of validation checks before vCenter will even attempt the VMotion. You will see these validation steps in the upcoming labs.
The first step is that the target host creates virtual hardware for the new VM. This hardware is an exact duplicate of the VM's hardware on the source ESXi host.
VMotion works by copying the VM's memory image over the VMotion LAN segment to the target ESXi host. Since a VM could have many MB or GB of RAM (VMs can have up to 1 TB of RAM), it is not practical to pause the VM and copy all of its RAM through the VMotion network. Instead, the source and target ESXi hosts cooperate by pre-copying the running VM's memory image over the VMotion LAN segment while the VM is running.
Pre-copy progress is tracked through a memory bitmap that the source ESXi host creates at the beginning of a VMotion request. The bitmap allocates one bit for each page of VM memory. When a page is successfully copied to the target ESXi host, its bit is set. If a page is changed after being copied (because the VM is still running), the corresponding bit is reset to indicate that the page on the target system is no longer valid.
Once it is determined that no more benefit can be had from copying additional pages, the source ESXi host deschedules (stops running) the VM. This is no different than if the VM were to lose its turn at the CPU and be forced to wait while other VMs get a chance to run.
Next, the memory bitmap (which identifies successfully copied memory pages) is transferred through the VMotion network to the target ESXi host. This lets the target ESXi host know which pages of the VM are valid and which still need to be copied.
Since the bitmap is very small (one bit for every 4 kB of VM memory), it can be transferred to the target host very quickly (in milliseconds).
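The bitmap sizing and the pre-copy rounds described above can be sketched in a few lines of Python. The numbers and the dirty-page model here are illustrative, not measurements of the real VMkernel:

```python
import random

PAGE = 4 * 1024  # bytes per VM memory page; one bitmap bit per page

def bitmap_bytes(vm_ram_bytes):
    """Size of the pre-copy bitmap: one bit per 4 kB page, 8 bits per byte."""
    return (vm_ram_bytes // PAGE) // 8

# Even a 16 GB VM needs only a 524288-byte (512 kB) bitmap,
# which is why it transfers in milliseconds.

def precopy_rounds(total_pages, dirtied_per_round, rounds):
    """Each round copies every invalid page to the target; the still-running
    VM then dirties some pages, resetting their bits. Returns how many pages
    remain invalid for the post-switch demand-paging phase."""
    valid = [False] * total_pages
    for _ in range(rounds):
        for p in range(total_pages):
            valid[p] = True                       # page copied; bit set
        for p in random.sample(range(total_pages), dirtied_per_round):
            valid[p] = False                      # page changed; bit reset
    return valid.count(False)
```

Note how each round leaves only the most recently dirtied pages invalid, which is why pre-copy reaches a point of diminishing returns rather than ever finishing on a busy VM.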
When the VM is descheduled (loses its turn at the CPU), the VMkernel performs a context save: a save of the VM's CPU register contents, status bits and other CPU state information needed to restore the VM's CPU to the exact state it was in when the VM stopped. Since the VM will be resumed on the target ESXi host, this context must be transferred as well.
VMotion cannot wait until the VM is completely quiet before performing the migration. Busy VMs may have to deal with a steady stream of network, screen, keyboard, mouse, CD/DVD, floppy and/or disk I/O events. The VMkernel tries to judge the most opportune moment to perform the actual migration, but there may be pending I/Os on the source machine that have not yet completed. If so, these incomplete I/Os are transferred to the target ESXi host, where they can be completed once the VM is given a chance to run.
Once the VM's memory (or a substantial portion thereof), context, pending I/Os and bitmap have been transferred to the target ESXi host, we are ready to perform the actual transfer of ownership.
There are two final steps before that transfer can be completed:
The source ESXi host must remove its exclusive lock on the VM. When an ESXi host powers on a VM, it places an exclusive file lock on the VM. The source host must release this lock so that the target ESXi host can assert its own lock on the VM.
The target ESXi host must inform the physical switch that handles the VM's network traffic that the VM's MAC address has moved to a new port on the physical switch. The target ESXi host does this by sending a Reverse Address Resolution Protocol (RARP) packet to the switch. RARP packets inform the switch which MAC addresses are behind which ports (as many VM MACs sit behind the limited number of physical NICs used to uplink vSwitches to the physical network). RARP updates allow the physical switch to update its MAC-to-port tables. This is needed so that the physical switch can forward any network packets destined for the VM to the VMotion target ESXi host.
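A toy model of the switch-side effect of that RARP update helps make it concrete. The class, MAC address and port numbers below are invented for illustration; a real switch does this learning in hardware:

```python
class PhysicalSwitch:
    """Minimal model of a switch's MAC-to-port forwarding table."""

    def __init__(self):
        self.mac_table = {}  # MAC address -> switch port

    def learn(self, mac, port):
        # Any frame (including a RARP broadcast) from `mac` arriving on
        # `port` re-points the table entry at that port.
        self.mac_table[mac] = port

    def forward_port(self, mac):
        return self.mac_table.get(mac)

switch = PhysicalSwitch()
switch.learn("00:50:56:aa:bb:cc", 3)   # VM behind the source host's uplink
# VMotion completes; the target host sends a RARP on its uplink (port 7)
switch.learn("00:50:56:aa:bb:cc", 7)
# Frames for the VM now leave via port 7, toward the target ESXi host
```

Until that second `learn` happens, frames for the VM still go to the source host's port, which is exactly the brief window the TCP retransmission discussion below covers.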
The VM is now set to run on the target machine. Its virtual hardware exists exactly as it did on the source machine, it has its complete CPU context, its pending I/Os are waiting to be processed, and the physical switch knows where to deliver any inbound network packets... The only problem is that some needed memory pages are still on the source ESXi host.
The VMkernel makes very clever use of demand-paged virtual memory management to solve this problem. While the VM is running on the target ESXi host:
- Memory references to valid pages complete as normal
- Memory references to not-yet-copied pages force a page fetch from the source host
The two ESXi hosts cooperate to bring the remaining pages from the source to the target ESXi host. The two machines:
- Use low priority, background page copies through the VMotion network to push needed (but not currently referenced) pages to the target ESXi host
- Use high priority page copies to immediately transfer any memory pages currently being referenced by the running VM
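The cooperation described above can be sketched as follows. Here `fetch` stands in for a network page copy, and the page layout and access pattern are invented for illustration:

```python
def run_on_target(valid, accesses):
    """Demand paging after the switch-over: references to valid pages
    proceed normally; references to not-yet-copied pages trigger a
    high-priority fetch from the source host. A background task then
    pushes the remaining pages at low priority."""
    fetched = []                     # order in which pages cross the wire

    def fetch(page):                 # stand-in for a VMotion-network copy
        fetched.append(page)
        valid[page] = True

    for page in accesses:            # the running VM touches these pages
        if not valid[page]:
            fetch(page)              # high-priority, blocking fetch
    for page, ok in enumerate(valid):
        if not ok:
            fetch(page)              # low-priority background push
    return fetched

# Pages 1 and 3 were still on the source when the VM switched over:
valid = [True, False, True, False]
order = run_on_target(valid, accesses=[3, 0, 2])
# Page 3 is fetched immediately because the VM touched it;
# page 1 follows via the background copy.
```

The key property the sketch shows: pages the VM actually touches jump the queue, so the guest never waits on the whole remaining memory image.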
VMware's own objectives for VMotion are very aggressive. For example, a VMotion has only a 1.5-second transition window (the time from when the VM stops running on the source ESXi host to when it must run on the target ESXi host). This ensures that any latency experienced by applications, network connections and users of the VM is small enough not to cause stress or failures in the guest OS or running applications.
Fail-safe behavior for VMotion is very simple: if the VM takes too long to transfer, or if anything goes wrong between the cooperating ESXi hosts, the VMotion request is abandoned. In this case, the source ESXi host simply reschedules the VM to run while the target ESXi host tears down all traces of the VM.
TCP/IP provides an elegant solution to lost network packets. Packet loss can occur if the VM transmits a packet to a physical peer and is immediately moved to the target ESXi host. If the physical peer replies quickly, it is possible that the physical switch has not (yet) received the RARP update (informing the switch that the VM has moved) and will try to forward the packet to the source ESXi host. Since the source ESXi host no longer owns the VM, it can do nothing with the packet (it cannot reject it, cannot deliver it, etc.). The originating machine detects the packet timeout and retransmits. By the time this happens, the physical switch knows the new location of the VM and forwards the packet to the new ESXi host, which then delivers it to the freshly VMotioned VM.
VMotion requests go through an extensive validation process before any action is taken. In essence, the validation process attempts to determine if there is any reason that VMotion will fail or will leave the VM in an unsafe state.
Running VMs can access removable media (floppies/CDs), connect their virtual NICs to vSwitch Port Groups, etc., and the OS or applications running in the VM can use these resources. For VMotion to succeed, the currently active configuration must be completely reproducible on the target ESXi host. If VMotion were attempted while a VM was using a local resource (such as local floppy or CD/DVD media), the loss of this resource after VMotion would leave the VM or application in an indeterminate state, risking data loss, application instability or even a VM crash. For this reason, VMotion validation will fail if the VM is actively using any resource that is available on the source ESXi host but not on the target ESXi host.
VMs with CPU affinity cannot be VMotioned because there is no guarantee that the affinity setting can be honored on the target ESXi host. Any differences in vSwitch security settings will also prevent VMotion.
VMotion validation will generate warnings on references to (but not active use of) host-specific resources. For example, if a VM is configured to use a CD ISO image that is available to the source ESXi host but not to the target ESXi host, and the ISO file is disconnected (defined but not in use), then a warning is generated and VMotion is allowed to proceed. In this case vCenter is advising you that once VMotion completes, the VM won't have access to the media.
VMotion validation also warns you if the VM has snapshots. The presence of snapshots will not directly cause VMotion to fail. However, VMs with snapshots do incur extra disk I/O overhead and this overhead may result in more time needed to complete VMotion. Any delay in completing VMotion increases the risk of a VMotion failure, which is the reason for the warning.
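The error and warning rules described in this section can be summarized in a hedged sketch. The dictionary fields (`cpu_affinity`, `in_use`, `configured`, `snapshots`) are invented for illustration, and the real vCenter validation covers many more conditions:

```python
def validate_vmotion(vm, target_resources):
    """Returns (errors, warnings). Any error blocks the migration;
    warnings merely advise and let VMotion proceed."""
    errors, warnings = [], []
    if vm["cpu_affinity"]:
        errors.append("CPU affinity set; cannot be guaranteed on target")
    for res in vm["in_use"]:            # e.g. a connected local CD/DVD
        if res not in target_resources:
            errors.append(f"in-use resource missing on target: {res}")
    for res in vm["configured"]:        # defined but disconnected media
        if res not in target_resources:
            warnings.append(f"configured resource missing on target: {res}")
    if vm["snapshots"]:
        warnings.append("snapshots add I/O overhead; VMotion may take longer")
    return errors, warnings

vm = {"cpu_affinity": False, "in_use": ["Production-PG"],
      "configured": ["local-datastore/tools.iso"], "snapshots": True}
errs, warns = validate_vmotion(vm, target_resources={"Production-PG"})
# errs is empty (migration allowed); warns notes the ISO and the snapshots
```

The asymmetry is the point: active use of a missing resource is fatal, a mere reference is only a warning.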
It is important to configure ESXi hosts that will act as VMotion peers consistently, so that the same resources are available on both systems. The cooperating ESXi hosts should see:
- The same datastores including common VMFS volumes and NFS datastores
- The same vSwitch Port Groups (case sensitive). While these port groups do not need to be on the same vSwitches or have the same NIC Team configuration, they do need to connect to the same virtual or physical LAN segments (no router between the source/target ESXi hosts), and
- VMotion must be configured correctly on both systems. This includes:
- Physical connectivity to an isolated VMotion network
- The network should be Gigabit, not 10/100 Mb
- A vSwitch uplinked to this VMotion network
- A VMkernel port defined on the vSwitch with VMotion enabled
Surprisingly, the make or model of the machines involved is not important. That is, you can VMotion to/from HP/Dell/IBM/Sun/white-box servers with no issues as long as all other VMotion requirements are met.
VMware provides a CPU compatibility tool in the form of an ISO image. Simply download this image and burn it to a CD. Then boot your physical server off the CD to have the tool report specific CPU features such as:
Latest Virtualization Assist technology
For more information, please download the vSphere Best Practices guide in the attached PDF document
The bootable CPU reporting tool identifies key features of your host CPUs. Download the image, burn it to CD (it's only 3MB in size) and boot the physical host from the CD. Properties that must match across CPUs include:
The NX/XD flag is a hardware feature that marks data pages (such as the stack) non-executable, so a virus that overwrites a subroutine's return address cannot execute injected code (and thereby take control of the CPU). In Microsoft Windows, this feature is called Data Execution Prevention, or DEP.
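A minimal sketch of the flag-comparison idea, using feature-flag names as they appear in Linux's /proc/cpuinfo. The set of "critical" flags below is illustrative only, not VMware's actual compatibility list:

```python
def vmotion_cpu_compatible(source_flags, target_flags,
                           critical=("nx", "vmx", "sse2")):
    """Hosts pass this check only when every critical CPU feature flag
    is present on both hosts, or absent on both hosts."""
    return all((flag in source_flags) == (flag in target_flags)
               for flag in critical)

host_a = {"fpu", "sse2", "nx", "vmx"}   # flags as listed in /proc/cpuinfo
host_b = {"fpu", "sse2", "vmx"}         # NX/XD disabled in the BIOS
# host_a and host_b fail the check: the NX flag differs, so a VM could
# lose DEP protection mid-flight if migration were allowed.
```

Note that a flag disabled in the BIOS fails the check just as surely as a missing hardware feature, which is why the tool is booted directly on the host.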
Note: It doesn't matter who manufactured your server as long as all VMotion CPU properties match! Check the attached PDF document for a download link.
This is a screen shot from an Intel Xeon 2620 6 core CPU (with 2 CPUs installed on the host). Note:
Note that this screen shot was made with the same tool that made the screen shot on the previous slide
Storage VMotion lets you move a VM's constituent files from one datastore to another while the VM is live. Storage VMotion works as follows:
- Storage VMotion is integrated into the Migration Wizard
- Select the Move Storage option when the wizard launches
- Select the target datastore and complete the wizard
The ESXi host's VMkernel then creates the appropriate directory on the target datastore and begins to copy the VM's files. Special action is needed when copying the VM's virtual disk (VMDK) files...
- The VMkernel begins a VMDK copy
- If the VM performs an I/O to a part of the VMDK that has not (yet) been copied, the VMkernel completes the I/O on the source volume
- If the VM performs an I/O to a part of the VMDK that has been copied, the VMkernel completes the I/O on the target volume
In this way, I/Os are completed in the appropriate location and all VM data remains consistent.
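The copy-with-redirect rule in the list above can be modeled in a few lines. This is a simplified sketch of the idea as described here, not the VMkernel's actual data mover:

```python
class MigratingDisk:
    """I/O to a not-yet-copied region completes on the source volume;
    I/O to an already-copied region completes on the target volume."""

    def __init__(self, regions):
        self.copied = [False] * regions  # per-region "has moved" flag
        self.source = {}                 # region -> data on source VMFS
        self.target = {}                 # region -> data on target VMFS

    def copy_region(self, r):
        """Background VMDK copy moves one region to the target."""
        if r in self.source:
            self.target[r] = self.source.pop(r)
        self.copied[r] = True

    def write(self, r, data):
        """VM I/O issued while the migration is in flight."""
        (self.target if self.copied[r] else self.source)[r] = data

    def read(self, r):
        return self.target[r] if self.copied[r] else self.source[r]

disk = MigratingDisk(4)
disk.write(0, "a")       # region 0 not yet copied: lands on the source
disk.copy_region(0)      # background copy moves region 0 across
disk.write(0, "b")       # region 0 now copied: lands on the target
```

Because every I/O is routed by the region's copied flag, no write can land on a stale copy, which is why the VM's data stays consistent throughout.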
Once the Storage VMotion request completes, the source files/directory are removed.
Storage VMotion has many practical uses as outlined in the slide above. The key advantage to Storage VMotion is that it eliminates the downtime normally associated with cold migration.
VMotion stress tests are suggested so that you can verify for yourself that VMotion is not fragile. If you have extra time in this lab, please take some time and run any of the above stress-test suggestions. Feel free to think of additional stress tests and give them a try.
Get VMware vSphere and View trained here... on Udemy!
What do you do if you need to learn VMware but can't afford the $4,000 - $6,000 charged for authorized training? Now you can enroll in my equivalent VMware training here on Udemy!
I have created six courses that together offer over 32 hours of VMware vSphere 6 lectures (about 8 days of instructor-led training at 4 hours of lecture per day). With Udemy, I can provide more insight and detail without the time constraints that a normal instructor-led training class would impose. My goal is to give you a similar or better training experience at about 10% of the cost of classroom training.
I am an IT consultant / trainer with over 25 years of experience. I worked for 10 years as a UNIX programmer and administrator before moving to Linux in 1995. I've been working with VMware products since 2001 and now focus exclusively on VMware. I earned my first VMware Certified Professional (VCP) designation on ESX 2.0 in 2004 (VCP #: 993). I have also earned VCP in ESX 3, and in vSphere 4 and 5.
I have been providing VMware consulting and training for more than 10 years. I have led literally hundreds of classes and taught thousands of people how to use VMware. I teach both introductory and advanced VMware classes.
I even worked for VMware as a VMware Certified Instructor (VCI) for almost five years. After leaving VMware, I decided to launch my own training business focused on VMware virtualization. Prior to working for VMware, I worked as a contract consultant and trainer for RedHat, Global Knowledge and Learning Tree.
I hold a Bachelor of Science in Computer Science and Math from the University of Toronto. I also hold numerous industry certifications including VMware Certified Professional on VMware Infrastructure 2 & 3 and vSphere 4 & 5 (ret.), VMware Certified Instructor (ret.), RedHat Certified Engineer (RHCE), RedHat Certified Instructor (RHCI) and RedHat Certified Examiner (RHCX) as well as certifications from LPI, HP, SCO and others.
I hope to see you in one of my Udemy VMware classes... If you have questions, please contact me directly.