Many community members are trying to decide between hyper-converged and traditional server infrastructure.
How do you decide between traditional IT infrastructure, Converged Infrastructure (CI), and Hyperconverged Infrastructure (HCI)?
What advice do you have for your peers?
It's really hard to cut through all of the vendor hype about these solutions.
Thanks for sharing and helping others make the right decision.
This is truly a TCO decision, but not TCO as some calculate it: a comprehensive TCO that includes the cost of the new CI/HCI, plus installation, training, and staffing, and the difference in operational costs over the life of the solution. It should also weigh the points from Scott and Werner about consistency of support and compatibility of the components.
The best opportunity I see is an environment with substantial IT debt, or one where you can align the refresh cycles of the various components. That helps the TCO conversation dramatically.
Keep away from the "shiny object" argument. Just because "everyone" is doing it is not the right reason. Nor does it make sense just because all your vendors are pushing it, or your technical team is pushing it. Again, steer clear of the "shiny object".
I recommend getting demos from 3-5 HCI vendors and capturing the capabilities they provide. Then spend some time with your business and technical teams to understand what their requirements really are in terms of capabilities. Separate the must-haves from the should-haves and the nice-to-haves. Then build a cost model of what you are currently spending to support your environments (understand your current TCO), and build your requirements document accordingly. Release this as an RFP to find a solution that can meet your TCO requirements. Your goal should be a better TCO than the status quo, unless there are specific business benefits that outweigh a simple TCO. In that case, you may need to talk with the business to fund the difference so they can have that specific business value proposition.
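To make the "understand your current TCO" step concrete, here is a minimal sketch of the kind of comparison I mean. Every number in it is a hypothetical placeholder, not a quote; plug in your own figures for hardware, installation, training, and operations:

```python
# Rough 5-year TCO comparison sketch: status quo vs. a candidate HCI proposal.
# All figures are hypothetical placeholders; substitute your own quotes and
# operational cost estimates.

YEARS = 5

def five_year_tco(capex, install, training, annual_ops, annual_support):
    """Total cost of ownership over the evaluation period."""
    return capex + install + training + YEARS * (annual_ops + annual_support)

status_quo = five_year_tco(
    capex=250_000,        # refresh of existing servers, SAN, and switches
    install=20_000,
    training=0,           # staff already know the environment
    annual_ops=180_000,   # admin time across server/storage/network teams
    annual_support=60_000,
)

hci_proposal = five_year_tco(
    capex=320_000,        # HCI nodes plus licensing
    install=15_000,
    training=25_000,      # new platform skills
    annual_ops=120_000,   # consolidated administration
    annual_support=55_000,
)

print(f"Status quo 5-year TCO  : ${status_quo:,}")
print(f"HCI proposal 5-year TCO: ${hci_proposal:,}")
print(f"Difference             : ${status_quo - hci_proposal:,}")
```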
Should I or Shouldn’t I – that is the HCI question.
Not to wax poetic over a simple engineering decision, but HCI is about understanding the size and scale of your application space. Currently, most HCI implementations are limited to 32 nodes. This can make for a very powerful platform, assuming you make the right choices at the outset.
What does HCI do that is fundamentally different from the traditional client-server architecture? HCI manages the entire site as a single cluster – whether it has 4 nodes, 14, 24 or 32. It does this by trading ultimate granularity for well-defined Lego blocks of Storage, Network and Compute. When seeking to modify a traditional architecture you need to coordinate between 3 separate teams, Storage, Server and Network to add, move, update or change anything. With HCI, if you need more capacity, you add another Block of storage, compute and network. You trade the ultimate “flexibility” of managing every detail of disk size, CPU type and how you network in a traditional architecture for a standard module in an HCI environment that will be replicated as you scale. At later dates, you can scale by adding a standard block or one that is compute or storage-centric. With that constraint, you don’t have to worry about the complexity of managing, scaling or increasing systems performance. But you do need to pick the right sized module – which means that for multiple needs, you may end up with different HCI clusters – all based on different starting blocks.
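If it helps to see the "Lego block" idea in numbers, here is a minimal sketch under the assumption of one standard node spec (the figures are made up) and the 32-node ceiling mentioned above:

```python
# Sketch of the "standard block" idea: once you pick a node spec, the cluster
# scales by adding identical (or compute-/storage-heavy variants of) that block.
# The node spec below is a hypothetical example, not a vendor figure.

from dataclasses import dataclass

@dataclass
class NodeBlock:
    cores: int
    ram_gb: int
    raw_tb: float

STANDARD = NodeBlock(cores=32, ram_gb=384, raw_tb=20.0)
MAX_NODES = 32  # typical cluster ceiling for many HCI platforms today

def cluster_totals(block: NodeBlock, nodes: int) -> dict:
    """Resources of a cluster built from 'nodes' copies of the same block."""
    if nodes > MAX_NODES:
        raise ValueError("beyond this size the workload must be split across clusters")
    return {
        "nodes": nodes,
        "cores": block.cores * nodes,
        "ram_gb": block.ram_gb * nodes,
        "raw_tb": block.raw_tb * nodes,
    }

for n in (4, 14, 24, 32):
    print(cluster_totals(STANDARD, n))
```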
The decision to use HCI or traditional servers comes down to scale. For most needs today – general virtualization, DevOps, and many legacy apps that have been virtualized – HCI is more cost-effective. To handle EPIC for a large hospital chain or a global SAP implementation for a major multinational operation, you probably need a traditional architecture. If you need more than 32 servers to run an application today, that application will need to be cut up to fit in most HCI platforms. The trend is to use containers and virtualization to parse legacy applications into discrete modules that fit on smaller, cheaper platforms, but any time you even think of starting a conversation with "We just need the software dogs to ..." you're barking up the wrong tree.
Let’s look at three examples of HCI clusters: replacing a legacy application platform where maintenance is killing you, building out a new virtualization cluster for general application use, and a DevOps environment for remote and local developer teams.
A legacy application – say, running logistics for a distribution or manufacturing operation – is typically pretty static. It does what it does, but it is constrained by the size of the physical plant, so it will probably not grow much beyond current sizing. In that case, I would scope the size of the requirements and, if it is under 100TB, probably opt for a hybrid HCI solution where each node has 2 or 4 SSDs that act as a data cache for a block of hard drives – typically 2-10 2.4TB 2.5” 10Krpm units – in a 2U chassis. You need to decide how many cores the application needs, and then add the number of cores the HCI software requires for its management overhead, which can range from 16-32. You start with dual-socket 8-core CPUs and move up as needed: 12, 16, 18, 22, 24, etc. Most systems use Intel CPUs, which are well characterized for performance, so CPU selection is no different from today other than the need to accommodate the incremental cores for the HCI software overhead. Most IT groups have standardized on CPUs, either because core selection is constrained by software license costs tied to core counts, or because they go full meal deal on core counts and frequencies to get the most from their VMware ELAs. For HCI networking, since the network is how disk and inter-process communications are handled, you go with commoditized 10Gb switching. Most server nodes in HCI platforms will have 4-port 10Gb cards to provide up to 40Gb of bandwidth per node, tied together by a pair of 10Gb commodity switches.
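As a back-of-the-envelope illustration of that sizing exercise, here is a rough sketch using the drive and core figures above. The overhead, drive count, and usable ratio are assumptions to replace with your vendor's actual numbers:

```python
import math

# Back-of-the-envelope hybrid HCI node sizing, using the rough figures above.
# HCI management overhead, drive counts, and usable ratios vary by product;
# treat every constant here as an assumption to replace with vendor numbers.

HDD_TB = 2.4             # 2.5" 10K rpm data drives
HDDS_PER_NODE = 8        # somewhere in the 2-10 range discussed above
HCI_OVERHEAD_CORES = 24  # per node, within the 16-32 range mentioned
USABLE_RATIO = 0.5       # crude allowance for replication and rebuild space

def size_cluster(app_cores: int, app_capacity_tb: float,
                 cores_per_node: int = 2 * 16) -> dict:
    """Very rough node-count estimate from compute and capacity needs."""
    usable_cores_per_node = cores_per_node - HCI_OVERHEAD_CORES
    usable_tb_per_node = HDD_TB * HDDS_PER_NODE * USABLE_RATIO

    nodes_for_cpu = math.ceil(app_cores / usable_cores_per_node)
    nodes_for_capacity = math.ceil(app_capacity_tb / usable_tb_per_node)
    return {
        "nodes_for_cpu": nodes_for_cpu,
        "nodes_for_capacity": nodes_for_capacity,
        # many platforms need at least 3 nodes for a supported cluster
        "nodes_needed": max(nodes_for_cpu, nodes_for_capacity, 3),
    }

# Example: a logistics app needing ~64 application cores and ~60 TB of data.
print(size_cluster(app_cores=64, app_capacity_tb=60))
```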
If your legacy application is a moderate-sized transaction processing engine, then simply move from a hybrid system using a flash cache and spinning rust to an all-flash environment. You trade ~120 IOPS HDDs for 4,000 IOPS SSDs, and then the network becomes your limiting factor.
If I were to build out a new virtualization platform for general applications, I would focus on the types of applications and look at their IOPS requirements, but in general I would propose all-flash, just as we are doing today in traditional disk arrays. As noted earlier, the base performance of an HCI cluster is tied to the disk IOPS of the core building blocks, then network latency and bandwidth. The current trend is for flash density and performance to keep growing while HDDs have plateaued. While spinning-disk costs are fairly stable, flash prices are falling. If I build a cluster today for my virtualization environment that needs 5,000 IOPS now and 10,000 IOPS next year as it doubles in size, I will get better performance from the system today, and the falling SSD price will let me increase performance by adding nodes while the price per node drops. Don't forget that as the utility of a compute service increases, so does the number of users, and maintaining user response times as more are added is about lower latency. Read: SSD.
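To put the hybrid versus all-flash argument in rough numbers, here is a small sketch using the ballpark per-device figures from above (~120 IOPS per 10K HDD, ~4,000 IOPS per SSD). Real results depend heavily on caching, block size, and the network:

```python
# Rough IOPS ceiling per node and per cluster, hybrid vs. all-flash, using the
# ballpark per-device figures mentioned above. Cache hits, block size, and
# network latency will move real numbers substantially.

HDD_IOPS = 120
SSD_IOPS = 4_000

def node_iops(drives: int, iops_per_drive: int) -> int:
    """Crude per-node ceiling: drives multiplied by per-drive IOPS."""
    return drives * iops_per_drive

hybrid_node = node_iops(drives=8, iops_per_drive=HDD_IOPS)     # cache hits excluded
allflash_node = node_iops(drives=8, iops_per_drive=SSD_IOPS)

for nodes in (4, 8, 16):
    print(f"{nodes:2d} nodes  hybrid ~{hybrid_node * nodes:>8,} IOPS   "
          f"all-flash ~{allflash_node * nodes:>9,} IOPS")
```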
For a DevOps environment, I want those teams on an all-flash, high-core-count CPU design limited to 6-8 nodes, depending on how many developers there are and the size of the applications they work on. I would insist on a separate dev/test environment that mimics production for them to deploy and test against, to verify performance and application response times before anything is released to the tender mercies of the user community.
Obviously, I have made extensive generalizations in these recommendations, but like the Pirate Code, they are just guidelines. Traditional IT architectures are okay, but HCI is better for appropriately sized applications. Push for all-flash HCI; there will be fewer issues from users down the road.
When the budget dogs come busting in to disturb the serenity of the IT tower, go hybrid. Measure twice, scale over and over, and document carefully. Have the user and budget dogs sign off, because they will be the first to complain when the system is suddenly asked to scale rapidly and cannot deliver incremental performance as user counts grow. User experience is tied to IOPS, and when your core building block is 120 IOPS per drive, you have roughly 30 times less potential than when you use SSDs.
Once the software dogs catch up and applications all live in modular containers that can be combined over high-speed networks, there will be no "if HCI". Just do.
HCI lowers operating and capital costs because it relies on the integration of commoditized hardware with certified/validated software, such as Dell Intel-based x86 servers with SANsymphony from DataCore, or vSAN from VMware, etc. The idea is to reduce cost and complexity by converging networking, compute, storage, and software in one system, thereby avoiding technology silos and providing a "cloud-like" cost model and deployment/operation/maintenance/support experience for admins, developers, and tenants/end users.
HCI platforms can also help with scalability, flexibility, and reduction of single points of failure as well as HA, BURA, security, compliance, and more.
HCI provides a unified resource pool for running applications more efficiently and with better performance, due to higher rack density and because the different technologies are converged into one solution.
Placing the technology inside the same platform is beneficial if for no other reason than the physical one: data and electronic signals have less distance to travel. For instance, using internal flash to support a memory/disk read-write-intensive database is a good idea, since a separate external array is no longer necessary.
It is no different than how far water from a glacier has to travel down the river after it melts in order to reach the ocean. The more rapid and closer it is between the original source and the final destination, the better.
The above is a simple answer without knowing the current needs/use case of the business and is only intended to provide a very simple perspective on WHY HCI. You must consider your IT organization, goals, and business needs very carefully before choosing any solution. I always advise considering the following:
1. Is "your world" changing?
2. Why?
3. What benefits do you expect to achieve from making incremental changes?
4. What happens if you do nothing?
As for me, the traditional approach is simpler to administer and support. Converged Infrastructure (CI) is mostly the same as the traditional approach; the difference is mostly a flood of marketing.
Hyper-converged Infrastructure is more expensive in CAPEX and more complex to support (especially when diagnosing failures and their causes) than the traditional approach.
We are on the verge of new generation applications – with the web-scale-out approach. And for these types of applications, HCI will rule.
For traditional applications, HCI is not more suitable than the traditional approach.
There are some ground rules for moving to HCI.
1. All servers are virtualized
2. There is a balance between the compute, working memory and storage capacities and requirements.
3. Financially, it makes sense if the compute nodes and storage systems are at a similar point in their lifecycle. If the storage was recently acquired and the compute is older, it likely makes more sense to upgrade only the required compute nodes.
4. As for which HCI system you should choose, consider whether you want a highly flexible system (which is normally the opposite of moving to simplified systems) or an appliance that is set up for your needs for the next five years. The whole idea of HCI is to simplify and consolidate your systems; keeping plenty of units open to adding new units and devices on a continuous basis is basically keeping your traditional DC.
So to sum it up, if you have an all virtualized system and expect it to grow “naturally” over the next five years, go for HCI. If you have different needs including bare metal, consider composable. If the only thing that matters is the purchase price and no focus on simplifying and lowering the TCO, go for traditional DC.
Aspects to consider:
- Capability: Most HCI storage is highly available, and a single node failure won't affect production. Some regulatory requirements, e.g. Singapore MAS, mandate HA for critical system storage; this is a good reason to consider HA at the storage level.
- Cost: There is actually no difference in the long run, though HCI may be a bit more expensive.
- Built-in functions: Backup? Restore? Replication? Minimum replication interval (affecting RPO)? Any DR automation (affecting RTO)?
- Redundancy level (a rough sketch follows below):
  * Disk redundancy: how many disks can fail at the same time
  * Node redundancy: how many nodes can fail at the same time
  * Node and disk: how many node failures plus disk failures on other nodes the HCI can survive before it goes down
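Here is a rough sketch of those redundancy questions, assuming a simple mirroring-style policy where each piece of data is kept as FTT+1 copies on different nodes. Each product has its own placement and rebuild rules, so treat this as an illustration rather than a vendor specification:

```python
# Sketch of the redundancy questions above, using a simple mirroring model:
# each piece of data is kept as (FTT + 1) copies on different nodes.
# Real HCI products have their own placement and rebuild rules.

def min_nodes(ftt: int) -> int:
    """Common rule of thumb: 2*FTT + 1 nodes to tolerate FTT node failures."""
    return 2 * ftt + 1

def survives(ftt: int, failed_nodes: int, failed_disks_on_other_nodes: int) -> bool:
    """Worst-case view: a failed disk on an otherwise healthy node can take out
    that node's copy of some data, so it counts as one lost failure domain."""
    return failed_nodes + failed_disks_on_other_nodes <= ftt

print(min_nodes(1))                                                 # 3 nodes for FTT=1
print(survives(1, failed_nodes=1, failed_disks_on_other_nodes=0))   # True
print(survives(1, failed_nodes=1, failed_disks_on_other_nodes=1))   # False: may lose data
```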
For me, it's nice to have everything in one box, or in my case, two boxes in one rack. I went hyper-converged a little over two years ago and I haven't looked back since. I realize that many factors go into deciding whether you go hyper-converged or converged infrastructure. It was the small amount of data that we have that ultimately made me decide on hyper-converged, and yet today one can put many terabytes in a single machine. It's a hard decision, but I'm glad I went with hyper-converged.
HCI is easy to administer and the performance is good, but managing the stability of the services, firmware, and software, and the dependency on the vendor and VMware, is not easy. Upgrading to a newer version also needs much more planning.
Today, instead of traditional physical servers, many IT users are shifting to virtual machines. They are easier to manage and make better use of the physical hardware.
Traditional IT infrastructure will persist to some extent where applications are not ready for virtualization or are heavily tied to external devices, but even that is changing with IoT and edge computing for the collection of sensor data.
Today's applications demand 24/7 uptime, no data loss, odd-hours maintenance windows for the IT team, and ever-growing data; CI and HCI are the way to go to help overcome these challenges.
There is a thin line between Converged Infrastructure (CI) and Hyperconverged Infrastructure (HCI) from a solution point of view, but cost, support and service, and vendor lock-in all differ between them.
CI is somewhat like a blade-server setup: you are tied to a vendor, hardware service can at times be unavailable, and you need special after-sales support.
HCI offers more flexibility. You can mix and match a lot, such as reusing some of your existing hardware in the HCI setup, resulting in a lower TCO.
Go for HCI, for example Proxmox HCI, which is open source and low on software cost. You can plan to reuse your existing hardware to an extent, so you get HCI on a limited budget and peace of mind as an IT team.
Below are items to consider:
Daily Operation:
HCI: Easy to deploy and manage; administrators don't need to learn new storage technology. (Each brand of traditional storage system, by contrast, requires its own skills and configuration.)
Cost:
HCI: It is more expensive during the initial phase.
Hardware and Software Upgrade:
HCI is more flexible during hardware upgrades, and no service interruption is required.
Protection level:
If you plan to deploy an active-active cluster across two data centers, HCI is the best option. With traditional IT infrastructure, an active-active cluster costs much more than with HCI.
Storage sizing:
Traditional IT infrastructure has limits on storage size, e.g. the number of disks and storage controllers.
HCI offers more scalability and flexibility when planning usable storage capacity (a rough capacity sketch follows after this list).
The types of HCI:
Each brand of HCI uses a different software-defined storage technology, e.g. VMware vSAN versus designs based on a storage-controller virtual machine.
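On the storage-sizing point, here is a rough sketch of how usable capacity grows as nodes are added, assuming a simple replication factor. The per-node raw capacity and the slack reserve are placeholders; use your vendor's sizing guidance:

```python
# Rough usable-capacity planning for an HCI cluster as nodes are added,
# assuming a simple replication factor (RF). The raw-per-node figure and
# the 30% slack reserve are placeholders, not vendor numbers.

RAW_TB_PER_NODE = 20.0
SLACK = 0.30            # space kept free for rebuilds and snapshots

def usable_tb(nodes: int, replication_factor: int = 2) -> float:
    """Usable capacity = raw capacity / data copies, minus slack reserve."""
    raw = nodes * RAW_TB_PER_NODE
    return raw / replication_factor * (1 - SLACK)

for n in (4, 8, 16, 32):
    print(f"{n:2d} nodes: ~{usable_tb(n):.0f} TB usable (RF2)")
```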
If you need to refresh your servers and storage, then yes. You pay more, but it is much easier to handle; otherwise, I wouldn't do it. I picked Cisco HyperFlex because we have Cisco UCS servers and it integrates well with our current system.
Most companies running IT infrastructure who also want to reach the cloud as part of their strategy, but are struggling because of their application base, should strongly consider HCI solutions that offer multi-hypervisor and multicloud support as key adoption features.
Hybrid cloud solutions that lower OPEX by 60% or more also help companies get the most out of their IT spend and tightly manage cloud costs with tightly integrated tools like CALM and Beam.
FRAME for VDI sessions, a la Citrix, is also offered as a rich, tightly cloud-integrated feature set. Getting out of the DR-site business is another major money saver with Xi LEAP and concurrent multi-public-cloud strategies, which also lower OPEX by a significant margin, and the CAPEX savings from a cloud presence speak for themselves.
HCI consolidates several silos into just three. HCI itself collapses cloud, storage, compute, virtualization, and application management into a single easily managed silo, leaving networking and specific application silos as a simpler IT chunk to manage and control. HCI also allows you to either retain VMware or replace it with AHV, a hypervisor that is starting to exceed the capabilities of vSphere ESXi; its price is included with the HCI platform, allowing further cost savings to be realized.
HCI is not the panacea for all IT challenges, but it does closely follow the 80/20 rule: it will elegantly solve 80% of IT infrastructure and cloud challenges, with security levels built in out of the box. For most corporations, the leading HCI solution will handle 95% of the tasks presented to it, with massive OPEX savings, and the time to deploy HCI is often condensed from 120 days to a 12-16 day cycle to deploy, test, and hand over to operations.
The ease of use and simplicity is so slick that my 8-year-old grandchildren fully administer and run the 8-node cluster at home that my wife runs her businesses on. It took me a few weekends to show them the ropes, and they took to it like ducks to water. Now we are all learning to extend into the AWS and Azure public clouds, replace ESXi with AHV, and integrate Cohesity backup into the equation.
My grandkids will probably be the most experienced HCI and cloud resources by the time they finish college (with 15 years of experience to boast about by that time as well). They will probably pay for college through remote administration of the HCI and cloud environments we offer as a service.
HCI infrastructure with cloud as a service: who can say no to a sublimely easy-to-run, low-cost-to-manage experience with full cloud extension into their domain?