Here is the slide deck from my presentation at the North Central WIVMUG Super VMUG meeting on 11/19/2015. A full recap will follow shortly.
Thursday, November 19, 2015
Wednesday, November 4, 2015
Virtualization & SMB Series: Virtualization Architecture for Small Business
I had debated whether hardware or licensing should be the first topic covered in this series. Because cost drives so many small business decisions, and hardware costs can vary greatly, hardware seemed like the best place to start. As a reminder, this series is aimed at VMware admins in environments with 20 or fewer VMs, the small end of SMB; the solutions discussed below will not work for large environments. I will cover larger VMware environments in a future series.
Alright, let's begin!
The Shared Storage Conundrum
Shared storage is often cost prohibitive in smaller environments, yet it is an important requirement for VMware High Availability (HA) and Fault Tolerance (FT). A small business must weigh the value of uptime, and the productivity lost when a host fails, against the cost of this additional hardware.
If 99.99% uptime, or the ~30-second reboot window that HA provides, is critical, then shared storage is a necessity and you won't find much value in this post. But if a short outage while failed hardware is replaced is acceptable, let's continue.
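To put the 99.99% figure in perspective, the short Python sketch below converts an availability target into a yearly downtime budget. It is plain arithmetic and assumes a 365-day year; it says nothing about what any particular hardware setup will actually achieve.

```python
# Convert an availability target into the downtime it allows per year.
# Assumes a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the maximum minutes of downtime per year for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows {minutes:,.1f} min ({minutes / 60:.1f} h) of downtime per year")
```

At 99.99%, the entire yearly budget is roughly 53 minutes, so a single multi-hour hardware replacement on a host without shared storage would exceed it on its own.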
Local storage is coming back into the limelight with the advent of Virtual SAN (vSAN) and similar products, and it remains a cost-effective storage option for smaller ESXi environments. I will discuss using vSAN in the small business in a future post; for the example below, simple local storage will suffice.
Number of Hosts
The number of hosts is dictated by the number and purpose of the VMs. Does the environment call for a small number of VMs running simple network services such as AD, DNS, DHCP, print, file, and Exchange? Then a single, properly configured ESXi host can be deployed. However, if the environment adds more complexity on top of those services, such as a resource-intensive line-of-business application, a second host will save future headaches and provide additional benefits.
A second host provides flexibility, scalability, and, to an extent, disaster recovery: the flexibility to migrate VMs for host maintenance or to load balance across both hosts, albeit manually (we will discuss VMware licensing and features in the next post); the scalability to add new VMs should the need arise; and, if a host fails, a platform on which to restore that host's VMs. How much scalability and disaster recovery you actually get depends on the resources each host is configured with, which is discussed next.
Configuring the Hosts
Now that we know the environment will run one or two ESXi hosts without shared storage, how do we configure them properly? This is where budget and some forecasting come into play. My approach is to look at the company's current needs together with its potential growth over the next one to three years. Is the company projected to grow enough to need additional VMs? Is the ship holding steady? Is the future uncertain? Doing your best Magic 8-Ball impression now will help your design hold up over the next few years.
Let's use the following example to explore this concept (I will stay vendor agnostic in my examples).
ABC Co. is a family-owned company with 20 employees, 5 of whom use an ERP program daily, and it hosts its email internally. The "server room" is a climate-controlled closet off the main office space with no racks. Sales have grown by an average of 15% in 3 of the last 5 years. You've been put in charge of the hardware refresh project. As hard as you've tried to demonstrate the benefits of shared storage, the budget simply isn't there, and the CEO is comfortable with ERP and email services being down for up to half a day due to a hardware failure.
Knowing that ABC Co. is projected to grow, the solution needs room for expansion. Assuming the budget will not allow you to purchase all resources up front, there are two things I do not skimp on in these scenarios: CPU and drive bays in the chassis. Dual 10-core hyper-threaded processors in each host may seem like overkill now, but compared with the cost of purchasing 4 new processors in the future, spending the capital today is the better option fiscally. The same is true of the chassis: the price difference between an 8-bay and a 16-bay chassis is generally negligible. Even if you need only 8 drives of local storage today, it's easier to fill the open bays as needed than to replace the entire chassis to gain enough storage.
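To see why the extra bays matter, here is a rough Python projection of how many drives ABC Co. might need if storage demand grew at the same 15% per year as sales. Tying storage growth to sales growth is an assumption made purely for illustration, as is keeping the same drive size; drive counts are rounded up to an even number so they can be mirrored in RAID 10.

```python
import math

BAYS = 16          # the 16-bay chassis from the example
DRIVES_TODAY = 8   # drives needed today, per the example
GROWTH = 0.15      # assumption: storage demand grows at the same rate as sales


def drives_needed(drives_today: int, annual_growth: float, years: int) -> int:
    """Project the drive count with compound growth, rounded up to an even number for RAID 10 pairs."""
    drives = math.ceil(drives_today * (1 + annual_growth) ** years)
    return drives + drives % 2


for year in range(6):
    need = drives_needed(DRIVES_TODAY, GROWTH, year)
    status = "fits" if need <= BAYS else f"exceeds {BAYS} bays"
    print(f"Year {year}: {need} drives ({status})")
```

Under these assumptions an 8-bay chassis would already be full after the first year (10 drives), while a 16-bay chassis covers the whole one-to-three-year planning window and only runs out in year 5.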
RAM and local storage are where you can be more creative. Ideally, I would configure each host with enough RAM to run all VMs on a single host, but that may not fit within the budget. Adding RAM later is cheap and easy, so there's no need to max out your hosts today. At minimum, I suggest enough RAM to run all current VMs without constraint, plus additional capacity to power one or two more VMs should the environment suddenly need another server. This headroom also gives a host the resources to bring up a critical VM if the other host in the cluster fails.
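The RAM guideline above can be expressed as a quick Python calculation. The VM names, their RAM sizes, the size of a future VM, and the allowance reserved for ESXi itself are all hypothetical numbers chosen for illustration; substitute your own inventory.

```python
# Hypothetical VM inventory for ABC Co.: VM name -> configured RAM in GB.
VM_RAM_GB = {
    "dc01": 4,         # AD, DNS, DHCP
    "file-print": 4,
    "exchange": 16,
    "erp-app": 8,
    "erp-db": 16,
}

FUTURE_VMS = 2         # headroom for one or two additional VMs
FUTURE_VM_GB = 8       # assumed RAM for each future VM
HYPERVISOR_GB = 4      # assumed allowance for ESXi itself


def minimum_host_ram_gb(vms: dict[str, int]) -> int:
    """Run every current VM without constraint, plus room for future VMs and the hypervisor."""
    return sum(vms.values()) + FUTURE_VMS * FUTURE_VM_GB + HYPERVISOR_GB


def can_absorb(host_ram_gb: int, local_vms: list[str], failed_host_vms: list[str]) -> bool:
    """Check whether a host can also power on critical VMs from a failed peer."""
    needed = sum(VM_RAM_GB[name] for name in local_vms + failed_host_vms) + HYPERVISOR_GB
    return needed <= host_ram_gb


print(minimum_host_ram_gb(VM_RAM_GB))  # 68
print(can_absorb(64, ["dc01", "file-print", "exchange"], ["erp-app", "erp-db"]))  # True
```

The second check models the failover case from the paragraph above: a 64 GB host running its own three VMs still has room to bring up both ERP VMs from a failed peer.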
In a perfect world, local storage, much like RAM, would have enough capacity to house all VMs in the environment. Because storage, both shared and local, is expensive, this is usually not possible. In these scenarios, I calculate how much storage is needed today, assuming every thin-provisioned VM grows to its maximum provisioned size, and, as with RAM, add extra capacity for an additional VM or two. Again, it's easy to add hard drives later and create a new virtual disk group to meet future storage needs.
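Here is the same approach applied to storage, again as a Python sketch with hypothetical numbers. Each thin-provisioned disk is counted at its full provisioned size, two future VMs of an assumed size are added, and the total is converted into a RAID 10 drive count for an assumed 600 GB drive. Real datastores also need free space for VM swap files, snapshots, and logs, which this sketch leaves out.

```python
import math

# Hypothetical provisioned (maximum) size of each thin-provisioned disk, in GB.
THIN_DISK_MAX_GB = {
    "dc01": 60,
    "file-print": 500,
    "exchange": 400,
    "erp-app": 100,
    "erp-db": 300,
}

FUTURE_VMS = 2        # headroom for an additional VM or two
FUTURE_VM_GB = 200    # assumed provisioned size of each future VM
DRIVE_GB = 600        # assumed drive size


def required_capacity_gb(disks: dict[str, int]) -> int:
    """Capacity needed if every thin disk fills to its provisioned maximum, plus headroom."""
    return sum(disks.values()) + FUTURE_VMS * FUTURE_VM_GB


def raid10_drive_count(capacity_gb: int, drive_gb: int) -> int:
    """RAID 10 mirrors every block, so each pair of drives contributes one drive of capacity."""
    pairs = max(2, math.ceil(capacity_gb / drive_gb))  # RAID 10 needs at least two pairs
    return pairs * 2


need = required_capacity_gb(THIN_DISK_MAX_GB)
print(need, raid10_drive_count(need, DRIVE_GB))  # 1760 6
```

Formatted capacity will come in somewhat below the drive's labeled size, so in practice it is worth rounding the result up rather than sizing to the exact figure.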
In these smaller environments, high IOPS are not always necessary to meet business needs. While solid state drives are necessary in some cases, spinning disks in RAID arrays can deliver adequate performance. I've had success implementing RAID arrays of varying sizes to meet storage and budget needs, including pools of 300-600GB 10K, 1.2TB 10K/15K, and 300-600GB 15K spinning disks.
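To judge whether spinning disks are enough, a common back-of-the-envelope estimate multiplies per-disk IOPS by the number of disks and then discounts writes by the RAID write penalty: 2 for RAID 10, which writes every block to both mirrors, and 4 for RAID 5, which reads and rewrites both data and parity. The per-disk figures below are rough rule-of-thumb assumptions rather than vendor specifications, and the 70/30 read/write mix is hypothetical.

```python
# Rough rule-of-thumb random IOPS per spinning disk (assumptions, not vendor specs).
DISK_IOPS = {"10K": 140, "15K": 180}

# Back-end I/Os generated by each front-end write.
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4}


def effective_iops(disks: int, speed: str, raid: str, read_fraction: float) -> float:
    """Estimate the front-end IOPS an array can sustain for a given read/write mix."""
    raw = disks * DISK_IOPS[speed]
    write_fraction = 1 - read_fraction
    return raw * read_fraction + raw * write_fraction / WRITE_PENALTY[raid]


speed = "10K"
for raid in ("RAID 10", "RAID 5"):
    print(f"8 x {speed} in {raid}: ~{effective_iops(8, speed, raid, 0.7):.0f} IOPS")
```

With these assumptions, eight 10K disks come out around 952 IOPS in RAID 10 versus about 868 in RAID 5; the gap widens as the workload becomes more write-heavy.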
Which RAID option is right for your environment? In almost every scenario I have deployed RAID 10, although the number of drives it requires may deter some admins. RAID 10 is a good choice because it provides both read and write gains and can survive more than one drive failure, as long as the failed drives belong to different mirrored sets. However, if the budget is truly tight, RAID 5 is a decent option: it tolerates a single drive failure and offers good read performance, at the cost of slower writes due to parity calculations.
Because all of your VMs may end up running on the same host and a single RAID array, I recommend investing in RAID 10. A bit of capital spent now can save your company's data, and a lot of headaches, in the future.
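The trade-off between RAID 10 and RAID 5 can be summarized with a little arithmetic. This Python sketch, using a hypothetical 8 x 600 GB configuration, reports usable capacity, the number of drive failures each level is guaranteed to survive, and the best case for RAID 10, where every failed drive sits in a different mirrored pair.

```python
from typing import NamedTuple


class RaidSummary(NamedTuple):
    usable_gb: int
    guaranteed_failures: int  # failures survived no matter which drives fail
    best_case_failures: int   # failures survived if each one lands in a different set


def summarize(level: str, drives: int, drive_gb: int) -> RaidSummary:
    if level == "RAID 10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        pairs = drives // 2
        # Half the raw capacity holds mirror copies; one failure per pair is survivable.
        return RaidSummary(pairs * drive_gb, 1, pairs)
    if level == "RAID 5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of capacity holds parity; only a single failure is survivable.
        return RaidSummary((drives - 1) * drive_gb, 1, 1)
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("RAID 10", "RAID 5"):
    print(level, summarize(level, 8, 600))
```

RAID 5 yields more usable space from the same drives (4,200 GB versus 2,400 GB here), which is why it is tempting on a tight budget; RAID 10 gives up that capacity in exchange for better write performance and a chance of surviving more than one failure.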
Protecting Against Downtime
Downtime is inevitable, but admins can take steps to protect their environment from unplanned outages. The first step is building a nearly fully redundant hardware solution: equip your host(s) with dual power supplies and dual NICs. An uninterruptible power supply (UPS) not only protects your hardware from surges but also provides a clean flow of electricity to the host. If possible, power should also be split across two different electrical circuits, keeping things running in the event of a tripped breaker.
As discussed, RAID provides some protection against drive failures, but what if a second drive fails before you can replace the first? A hot spare drive that automatically takes over when a drive fails provides a fail-safe: the array starts rebuilding onto the spare right away, restoring redundancy until you can replace the failed drive. Regular monitoring of the environment can also help detect and predict issues, but we will discuss that in a future post.
We've Got the Hardware. What's Next?
In the next post, I will go over VMware licensing for the small business and which options offer the best value.
In the meantime, do you have any questions, comments, or would you like me to dive deeper into a topic covered above? I'd love to hear from you! You can leave your thoughts in the comments section below, contact me on Google +Paul Woodward Jr, or reach out to me on Twitter @ExploreVM.