PeerSpot user
Architect at a tech services company with 51-200 employees
Consultant
SRM - standard disaster recovery for VMware

Most VMware administrators have heard of Site Recovery Manager (SRM). SRM has been the standard in disaster recovery for some time. It plays into the product line of VMware’s parent company (EMC), traditionally leveraging storage-based replication. This architecture leverages the write journaling technology we discussed in the first article in this series, so Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) can be very aggressive.

The downside to this architecture is that the customer has to have similar storage arrays at both the production and disaster recovery sites. If, for example, the customer had a Fibre Channel array on the production side and a lower-grade NFS array from a different vendor on the other side, SRM was not compatible. Bummer…

VMware, however, released vSphere replication in the vSphere 5 family and allowed administrators to replicate their virtual machines without common storage subsystems. This means you could have your traditional Fibre Channel SAN on the production side and NFS or internal storage at your disaster recovery site. The underlying storage type is completely irrelevant as long as the workload is supported. This is a gift for DR budgets everywhere. Additionally, you can recover to previous points in time using snapshots at the recovery site, much the same as you would use a traditional snapshot.

SRM in this configuration sits on top of vSphere replication instead of the RPAs that are common in array-to-array architectures. The vSphere replication appliances are Linux virtual machines that are deployed in the VMware environment. I will give VMware a large amount of credit here: where some competing technologies are cumbersome to install, vSphere replication installation takes only a few mouse clicks. Your vSphere replication appliances are functional in just a few minutes. Replication can be configured through the VMware fat client or the web client.

So what’s the catch? vSphere replication falls into the snap-and-replicate category. This means that RTOs and RPOs won’t be as aggressive as with array-to-array replication or hypervisor technologies that use write journaling. The tightest RTOs and RPOs currently achievable with SRM over vSphere replication are 15 minutes. There are rumors that this will come down to 5 minutes in the future, but it’s only a rumor at this point. Also, if you are trying to move to the web client, you will be dismayed to learn that SRM can still only be managed through the VMware fat client. I don’t know too many administrators who are excited about the web client, but it’s a relevant piece of information for your day-to-day work.

So what about the licensing and additional costs? There are pros and cons to the vSphere replication / SRM model.

The virtual appliances are Linux based – pro

This means no additional Windows licenses are required to operate the environment. Some of the other products use Windows-based virtual appliances. When you have to stand up more Windows servers, you have to patch and manage them, which adds to the cost of the solution. SRM can generally be installed on the Windows system that vCenter runs on. If you’re using the Linux-based vCenter appliance, SRM isn’t compatible. I would expect this to be resolved soon, as VMware is trying to eliminate the need for Windows systems in the environment.

The base vSphere replication is free – pro

Yes, you heard that correctly: vSphere replication is free. If you have lower-priority virtual machines, you don’t have to buy SRM licenses for them. This means you can save money and buy SRM licenses (sold in packs of 25) only for your mission-critical VMs.
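
As a rough illustration of the licensing arithmetic, the sketch below rounds the number of SRM-protected VMs up to whole packs. The pack size of 25 comes from the text above; the function name and the example VM count are made up for illustration and say nothing about actual VMware pricing.

```python
import math

PACK_SIZE = 25  # SRM licenses are sold in packs of 25, per the review


def srm_packs_needed(protected_vms: int, pack_size: int = PACK_SIZE) -> int:
    """Return how many license packs cover the SRM-protected VMs (hypothetical helper)."""
    if protected_vms < 0:
        raise ValueError("protected_vms must be non-negative")
    return math.ceil(protected_vms / pack_size)


# Hypothetical estate: only 30 mission-critical VMs need SRM orchestration;
# everything else is covered by free vSphere replication alone.
mission_critical_vms = 30
print(srm_packs_needed(mission_critical_vms))  # 2
```

Because packs cannot be split, a count just over a multiple of 25 costs a whole extra pack, which is worth keeping in mind when deciding which VMs really need SRM.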

SRM is the orchestration tool on top of vSphere replication – neutral

SRM and all of its power can be scoped down to only the systems you need it for. I personally like the flexibility and choice; most companies don’t need to replicate all of their virtual machines with very tight RTOs and RPOs. If you are trying to replicate your entire VMware environment, you may be better off with a solution that licenses by socket, as it may be more cost effective.

Snap and replicate technology – con

At the end of the day, snap-and-replicate technologies are limited. Because the recovered virtual machine ends up with snapshots, scalability can be an issue. Let’s look at an example.

VMware recommends a maximum of 21 snapshots when using vSphere replication. More snapshots than this can lead to snapshot consolidation issues. If you wanted to have a recovery point every hour, you wouldn’t be able to recover your virtual machine to a point further back than 21 hours. This is a limitation of any snapshot-based replication technology, not a deficiency within SRM or vSphere replication.
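
The trade-off in that example can be written as a one-line calculation: the furthest point you can roll back to is the snapshot interval multiplied by the number of snapshots kept. The sketch below is illustrative only; the limit of 21 is the figure quoted in this article, and the function name is an assumption.

```python
from datetime import timedelta

MAX_SNAPSHOTS = 21  # figure quoted in the article, not verified against VMware documentation


def oldest_recovery_point(interval: timedelta, max_snapshots: int = MAX_SNAPSHOTS) -> timedelta:
    """How far back the oldest retained recovery point reaches (hypothetical helper)."""
    return interval * max_snapshots


print(oldest_recovery_point(timedelta(hours=1)))  # 21:00:00
print(oldest_recovery_point(timedelta(hours=4)))  # 3 days, 12:00:00
```

Spacing the recovery points further apart extends the window but makes each restore point coarser, so the interval has to be chosen per workload.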

Scalability – neutral

The upper limit for SRM with vSphere replication is 1000 virtual machines. This will suit most enterprises; however, for very large-scale deployments this may not be enough. SRM with storage array replication, for example, can support up to 1500 virtual machines. The vSphere replication limit is roughly what you would get with any other snap-and-replicate technology. In my personal experience, Veeam starts to have problems after 300 virtual machines in a single instance.

Speaking of Veeam, this is the next technology that we will discuss. Veeam is a good product that provides not only DR capabilities but also a very mature backup solution. Join us for our next article in the series.

Originally published here: https://simplecontinuity.com/dr-for-vmware-srm-on-vsphere-replication/

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Roger Nurse - PeerSpot reviewer
VMware NSX T/V Consulting Engineer / Solutions Architect at Onebox Solutions
Real User

Nice article - I have recently been looking at vSAN as a viable option for a lab POC. Some DRaaS customers need to replicate/recover specific workloads outside of the SRM protection groups so they can control failover testing. In the real world I do not see many customers using native vSphere replication in conjunction with SRA SAN-layer replication. vSAN requires a minimum of 3 hosts and works with vSphere replication.

vSphere replication being free to use is a nice pro for sure, but it has limited use cases for enterprise production recovery. Perhaps the combination of vSphere replication and vSAN is the low-cost future of DRaaS?

reviewer1261665 - PeerSpot reviewer
VMware Software Engineer at an insurance company with 10,001+ employees
Real User
Reliable, easy to use, and the interface is simple
Pros and Cons
  • "It's easy to use and the interface is quite simple."
  • "Cost is definitely an area where the product could be improved."

What is our primary use case?

Our primary use case is for the end-users. I work as a VMware software engineer and we have about 50 people using the solution. 

What is most valuable?

Some of its valuable features are that it's easy to use and the interface is quite simple as well. It's really a reliable and good product.

What needs improvement?

Cost is definitely an area where the product could be improved, I'd definitely say it should have cheaper pricing.

Definitely the product could be faster and of course in IT everything is about pricing. 

For how long have I used the solution?

We've been using the product for the past year. 

What do I think about the stability of the solution?

VMware SRM is very reliable and stable. 

What do I think about the scalability of the solution?

The product is very scalable. 

How are customer service and technical support?

I'm satisfied with the technical support we've received.

How was the initial setup?

The setup is relatively complex because of how we use it so the setup can take some time.

What's my experience with pricing, setup cost, and licensing?

We pay an annual licensing fee for the product. 

What other advice do I have?

I would recommend the product to anyone requiring a disaster recovery process. 

I would rate this product an eight out of 10. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
VMware Live Recovery
October 2024
Learn what your peers think about VMware Live Recovery. Get advice and tips from experienced pros sharing their opinions. Updated: October 2024.
816,406 professionals have used our research since 2012.
SoheylNorozi - PeerSpot reviewer
IT Consultant at a tech services company with 51-200 employees
Real User
Top 5
Good integration and community support to help with mission-critical projects and services
Pros and Cons
  • "The most valuable feature is the integration with our Nutanix environment."
  • "You need a lot of knowledge to work with the interface because it is not really easy to use, and it would be great if the dashboard were simplified."

What is our primary use case?

This disaster recovery solution helps our clients with their mission-critical projects. They are able to maintain business continuity and increase the reliability of their services.

What is most valuable?

The most valuable feature is the integration with our Nutanix environment. If it didn't have this ability then we may not be using it.

What needs improvement?

There are sometimes performance issues when working with outside links, and it would be better if this were improved.

You need a lot of knowledge to work with the interface because it is not really easy to use, and it would be great if the dashboard were simplified.

For how long have I used the solution?

I have been using VMware SRM for approximately two years.

What do I think about the stability of the solution?

I think that this product is very stable.

What do I think about the scalability of the solution?

I have no issues in mind with respect to scalability or flexibility.

How are customer service and technical support?

I have not been in contact with technical support. Usually, I get help from the community. The forums and websites are great for getting help.

Which solution did I use previously and why did I switch?

I don't think that there is another product that serves the purpose that this VMware SRM does. Integration is very important to me, to have the whole environment integrated together in one place. Because of this, I went straight to VMware.

How was the initial setup?

The complexity of the setup depends on the scenario, but some experience is needed for deployment and installation.

What about the implementation team?

Using a consultant to assist with the deployment is common and this is what I recommend to my clients.

Which other solutions did I evaluate?

I did not evaluate other solutions.

What other advice do I have?

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Project Manager at Shriram Value Services
Real User
Its array-based integration is the most valuable feature

What is our primary use case?

Our primary use case is VMware failover for DC-DR automation. We are integrated with array-based replication.

How has it helped my organization?

We had not been able to achieve our RTO before the SRM implementation. Now we achieve our RTO in DR drills.

What is most valuable?

  • Array-based integration is the most valuable feature.
  • It has a user-friendly dashboard and uses the same vCenter console.

What needs improvement?

The DR drill report is good but needs to be improved, and a replication monitoring feature is not available.

For how long have I used the solution?

One to three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
IT Administrator and Sr. VMware Engineer at a retailer with 501-1,000 employees
Real User
It has a detailed and comprehensive policy-based control.

Originally published in Spanish at https://www.rhpware.com/2015/09/vmware-site-recovery-manager-61

It is well known that VMware Site Recovery Manager is a high availability solution for applications and data transfer in private cloud environments. This is accomplished using isolation and encapsulation of virtual machines, resulting in simplified automation of the processes involved in replication to remote sites. Thus, SRM reduces the costs associated with obtaining efficient Recovery Time Objectives (RTO), providing a robust and standardized solution for business continuity and dramatically reducing the risk of data loss in our VMware virtualized data centers.

Among the features offered by SRM is the ability to create and maintain disaster recovery plans more effectively, without written procedures and the maintenance costs they entail, as well as automated maintenance and testing processes, which allow our environment to be thoroughly tested before a disaster occurs.

But these are general capabilities of VMware Site Recovery Manager that we already know. Now it is time to see what the brand new version 6.1 of the product brings. We are going to look at each of the new features in more detail.

Storage Profile Based Protection

SRM 6.1 incorporates a new type of policy-based protection group. These groups use the Storage Profiles provided by vSphere to identify and protect datastores and virtual machines. This fully automates the process of adding or removing protection for VMs and datastores, and allows these tasks to be handled from tools such as vRealize Automation, for example.

Protection groups based on storage policies use vSphere tags (the ability to attach metadata to vSphere inventory objects) together with policies, allowing vSphere administrators to automate the provisioning of virtual machines that meet the requirements for performance, availability and protection.


The way to do this is:

• Create a tag and associate it with the datastores in each protection group

• Then, create a storage policy for each protection group using this tag

• Finally, create the protection group and associate it with the storage policy created in the previous step

Thus, when a virtual machine is associated with this policy, it will automatically be protected by SRM. It's that simple.
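
The relationship between tags, storage policies and protection groups can be sketched as a small data model. This is purely a conceptual illustration in Python: none of the class or field names below come from the SRM or vSphere APIs, and the tag, policy and VM names are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    required_tag: str  # datastores carrying this tag satisfy the policy


@dataclass
class VM:
    name: str
    policy: Optional[StoragePolicy] = None


@dataclass
class ProtectionGroup:
    name: str
    policy: StoragePolicy

    def protects(self, vm: VM) -> bool:
        # A VM is protected as soon as it is assigned the group's storage policy.
        return vm.policy == self.policy


# Step 1: tag the replicated datastores (tag -> datastore names).
datastore_tags = {"dr-tier1": {"ds-prod-01", "ds-prod-02"}}
# Step 2: create a storage policy that uses the tag.
tier1 = StoragePolicy("Tier1-Replicated", required_tag="dr-tier1")
# Step 3: create the protection group and associate it with the policy.
group = ProtectionGroup("PG-Tier1", tier1)

vms = [VM("db01", tier1), VM("test01")]
print([vm.name for vm in vms if group.protects(vm)])  # ['db01']
```

The point of the model is that protection becomes a side effect of choosing a storage policy at provisioning time, rather than a separate manual step.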

Stretched Storage and vMotion orchestration

Site Recovery Manager 6.1 is now a complete solution optimized both for stretched storage across sites and for migrating workloads from one site to another, while still fulfilling its disaster recovery function. In previous versions this was not possible in a single product. SRM 6.1 supports vMotion between remote vCenters with stretched storage, with the benefits this brings.

This allows SRM to be combined with stretched storage, which previously could only be achieved using vSphere Metro Storage Clusters. The advantages of this new approach are:

• Maintenance downtime is eliminated. Recovery plans and orchestration between sites allow vMotion migration of workloads that is completely transparent to end users and applications

• Disaster downtime is eliminated. Live migration using vMotion between remote sites allows Site Recovery Manager 6.1 to eliminate the downtime associated with recovery

Adding stretched storage to a Site Recovery Manager deployment greatly reduces recovery time in the event of a disaster, because workloads are migrated live and without interruption: synchronous replication presents the same storage at both sites, allowing registered, powered-on VMs to be moved transparently.

Improved integration with VMware NSX

It is no surprise that VMware is driving the integration of NSX network virtualization into all of its products, and SRM is no exception. But let's see why.

As in any disaster recovery event, the specifics of the network must be taken into account and fine-tuned, such as keeping IP addresses consistent, preserving previously configured firewall and routing rules, opening ports and other vital aspects. On top of this, using vMotion between remote vCenters requires a Layer 2 network, which increases complexity significantly.

Now, with the newly released NSX 6.2 and its many new features, Site Recovery Manager benefits greatly. You can use both products together to maintain consistent, efficient networking between sites and perform the migration automatically without worrying about network specifics, since NSX has already taken care of them.

NSX 6.2 lets you create Universal Logical Switches. These switches can create Layer 2 networks that span vCenter boundaries, which means that when they are used, NSX creates port groups at both sites that are connected to the same Layer 2 network.

Thus, when virtual machines are connected to the port groups of a Universal Logical Switch, SRM 6.1 recognizes this automatically and no manual mapping of networks between the protected and recovery sites is required. Site Recovery Manager intelligently recognizes that the same logical network connects both sites and maintains cohesion by treating it as a single protected network.
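
To make the idea of implicit network mapping concrete, here is a minimal conceptual sketch. It assumes each port group can be described by the identifier of the universal logical switch backing it (or `None` if there is none); the function, the data layout and all names are hypothetical and are not how SRM or NSX expose this information.

```python
from typing import Dict, List, Optional, Tuple


def map_networks(
    protected: Dict[str, Optional[str]],
    recovery: Dict[str, Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Pair port groups that sit on the same universal logical switch.

    Both arguments map a port group name to the ID of its universal switch
    (None when the port group is not backed by one). Returns the automatic
    mappings and the protected-site port groups that still need manual mapping.
    """
    by_switch = {sw: pg for pg, sw in recovery.items() if sw is not None}
    automatic: Dict[str, str] = {}
    manual: List[str] = []
    for pg, sw in protected.items():
        if sw is not None and sw in by_switch:
            automatic[pg] = by_switch[sw]
        else:
            manual.append(pg)
    return automatic, manual


# Hypothetical inventory at each site.
site_a = {"pg-web-siteA": "uls-5001", "pg-legacy": None}
site_b = {"pg-web-siteB": "uls-5001", "pg-dr-misc": None}

automatic, manual = map_networks(site_a, site_b)
print(automatic)  # {'pg-web-siteA': 'pg-web-siteB'}
print(manual)     # ['pg-legacy']
```

Networks that are not stretched across both sites still fall back to explicit mapping, which is why the sketch returns them separately.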

This ability to create a Layer 2 network that spans vCenter boundaries eliminates the need to reconfigure IP addresses on failover, reducing recovery time by more than 40%. In addition, security policies and security groups, firewall rules and edge configurations are preserved on the recovered virtual machines, saving even more time after a recovery event.

NSX 6.2 also supports synchronization of firewall rules as well as routing information. This makes it easy to keep the configurations of the production and recovery networks synchronized, and makes it much easier to create safe isolation between sites for non-disruptive testing of recovery plans.

The implicit mapping of network resources, the extended Layer 2 capabilities and the testing capacity provided by NSX in conjunction with Site Recovery Manager, together with policy-based protection groups, radically simplify administration and operation, lower the associated operating costs, increase testing capabilities and dramatically reduce recovery times.

Conclusion

As you can see, Site Recovery Manager 6.1 introduces fundamental features that achieve levels of automation never seen before on the platform, as well as detailed and comprehensive policy-based control. Its seamless integration with NSX offers really impressive capabilities for disaster recovery events, allowing everything to be done in half the time it took before. We must also not forget the support for vMotion over stretched storage, which also significantly reduces recovery time and makes a much lower RTO achievable.

Thanks for reading the article. If you wish, you can help by sharing it on your social networks.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1126809 - PeerSpot reviewer
Founder at a comms service provider with 11-50 employees
Real User
Quick to deploy, stable, and works without issues
Pros and Cons
  • "If you want to do failover, it works without any problem."
  • "Sometimes it can cause a bit of downtime during switchovers."

What is our primary use case?

SRM is implemented as part of VMware.

We use it for failover between geographical locations, from one location to another. Sometimes, for maintenance purposes, we fail services over to another location so that we can carry out maintenance on the technology.

What is most valuable?

The solution works well. If you want to do failover, it works without any problem.

The product is stable.

The deployment process is fast.

What needs improvement?

The challenge it has is with the speed of failing over. Sometimes it can cause a bit of downtime during switchovers. Sometimes you realize that when you are failing over you can have downtime due to the fact that you're stepping down on one side and powering up on another side.

For how long have I used the solution?

We've used the solution since 2012. It's been almost ten years.

What do I think about the stability of the solution?

The stability is great. There are no bugs or glitches. It doesn't crash or freeze. It's reliable.

What do I think about the scalability of the solution?

Scalability is just a function of your resources. Whether you can scale or not depends on your bandwidth and on the infrastructure the solution is running on.

We have three people who work with the solution directly.

How was the initial setup?

The deployment doesn't take long. It takes just a couple of hours, as long as you have prepared and have a design.

For the implementation process we had three people; however, what you really need for implementation is a trained engineer. Just one engineer is fine.

What about the implementation team?

If you have trained people who are certified, you can do it yourself. The most important thing is to make sure that it is done by a trained engineer.

What's my experience with pricing, setup cost, and licensing?

We do have to pay a licensing fee in order to use the solution. We have an annual subscription. 

What other advice do I have?

I don't remember the version of SRM that we're using now.

I'd recommend the solution to others.

I would rate it at an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user335949 - PeerSpot reviewer
Principal Analyst at a pharma/biotech company with 1,001-5,000 employees
Video Review
Vendor
It helps execute a playbook to bring up your DR site after failing over a group of VMs to it, although I'd like more tools to help with editing the embedded databases.

What is most valuable?

Site Recovery Manager is valuable because it helps with the difficult problem of failing a group of virtual machines over to your DR site and bringing them up. Because there's things that must be changed in a machine in order to bring it up somewhere else like maybe its IP address or, you know, any slew of other things, the port groups or whatever it needs to be connected in, and you can either manually do all that by hand or you can program your recovery plan in Site Recovery Manager and it's pretty much, you know, menu driven because it's common things that you would have to do to a server in order to bring it up somewhere else, and you can go in there and you can actually have it prompt you to say oh, by the way, you need to turn on the database server before you turn on the next server. And it pauses and waits there so you can go over here and turn on your database server and then you click dismiss and it goes to the next step. Which you wrote all these steps into the Site Recovery Manager so that's what it does. Really helps execute a playbook for you to be able to help bring up your disaster recovery site.

How has it helped my organization?

You know, I've gone to a lot of Site Recovery Manager training here and stuff. One of the things that I think that they minimize is that normally you'll never use your DR site. But what you have to do every year is test your DR site.

What needs improvement?

Yeah, I would like more tools to help with editing the embedded databases. I have run into some issues where human error, not something that VMware themselves would have ever planned for, but human error, has caused the system to get out of sync. And the only way to correct that would be to actually manually edit the database, which you could do if Site Recovery Manager were on a Windows server, but now that everything's gone to this Linux appliance, this sealed-up appliance, it's very difficult to actually edit the database. Or maybe just have a reset button to be able to put everything back to a normal state. Maybe that's all they would need to do.

What do I think about the stability of the solution?

It's a very stable product. It is as scalable as VMware is itself.

What do I think about the scalability of the solution?

It's really just an add on to the virtual center. It used to be responsible for replicating. It is no longer responsible for replicating. The replication portion of Site Recovery Manager has been moved to vSphere itself. A lot of people may not know this. So you do not need to buy Site Recovery Manager in order to replicate VMs around. You can do that for free. But the automation piece that I'm telling you about and the playbook and stuff is what you buy Site Recovery Manager for now.

Which solution did I use previously and why did I switch?

I was responsible for designing and implementing a DR solution for my company, and being that we're on a VMware environment it seemed only logical to go to VMware first. Because all the machines that I need to put at my disaster recovery site are virtual servers, I was like, well, I'm sure VMware has a solution.

Being able to test the environment, being able to make the changes to the virtual servers so they could come up on a different network. I needed to be able to go in there and change things like the IP address, the DNS settings and stuff like that to be able for them to come up at a different location.

How was the initial setup?

Least favorite thing about Site Recovery Manager: it is a little bit difficult to get it set up the first time you ever do it, just because it is so different.

What about the implementation team?

I actually paid a consultant to come out and train me on how to install it the very first time I installed it, three versions ago, but I've done it enough now to where I'm comfortable with it.

Which other solutions did I evaluate?

No, there weren't at the time I did it. I've been using Site Recovery Manager for several years so.

What other advice do I have?

I always think there's room for improvement. They would seriously need to sit down and take a machine. I want to bring this machine up over here on a different network at a different location. And write down all the steps that they would manually do if they were going to do this process by hand. And like I said the replication is free. So they could technically replicate that over there right now today, make a copy of it and go oh, okay, go bring it up over there and write down all the steps that you have to manually do and then multiply that times the number of machines that you have to do for your DR site. 

In my case it's about one hundred. I need to bring up about one hundred servers. Then you sit there and think to yourself okay, so, and you could just, you know, take your watch and say okay, I'm going to start now. I'm going to go over there and see what it takes to get this server up at the DR site. Oh, that took me about 20 minutes. Okay, well, then, you multiply that times a hundred and you're at 2000 minutes, okay. So would you have 2000 minutes’ worth of time to go through and, you know, work on all these servers in the case of a DR scenario? And if the answer's no, then you probably should look at something to help you out. Some tool to help you out with that, and that's what Site Recovery Manager brings.
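
That back-of-the-envelope estimate is easy to reproduce. The sketch below uses the reviewer's figures of one hundred servers and about 20 minutes each, and assumes one person working through the servers one after another; the function name is just for illustration.

```python
def manual_recovery_minutes(servers: int, minutes_per_server: int) -> int:
    """Total hands-on time if every server is brought up manually, one at a time."""
    return servers * minutes_per_server


total = manual_recovery_minutes(servers=100, minutes_per_server=20)
print(total)                 # 2000
print(round(total / 60, 1))  # 33.3
```

In other words, 2000 minutes is more than 33 hours of uninterrupted work for a single engineer, which is the gap an orchestration tool is meant to close.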

Everybody looks at reviews, and I look at the negative reviews as well because I feel like sometimes some of the positive reviews may not have been real but, uh, people will always complain about something they don't like. They're the most vocal, so for Site Recovery Manager I would probably type Site Recovery Manager reviews into a search engine.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user240054 - PeerSpot reviewer
Technical Architect at a tech services company with 51-200 employees
Consultant
It has done some rearranging of the recovery plans so that you can get better visibility into what is going on during a failure.

Originally posted https://theithollow.com/2015/08/31/vmware-site-recovery-manager-6-1-annouced/

VMware announced Site Recovery Manager version 6.1 this week at VMworld in San Francisco, California. Several new features were unveiled for VMware’s flagship Disaster Recovery product.

Storage Profile Protection Groups

Remember back in the old days (prior to today), when deploying a new virtual machine we had to ensure the datastore we were putting the virtual machine on was replicated? Not only that, but if this new VM was part of a group of similar VMs that needed to fail over together, we needed to make sure it was in the same protection group? Well VMware decided this was a cumbersome process and added “Storage Profile Protection Groups”.

In SRM 6.1 we will use storage profiles to map datastores with protection groups. Now we’ll be able to deploy a VM and select a storage profile to automatically place the VM in the correct datastore and even better, configure protection for the virtual machine.

Orchestrated vMotion in Active-Active Datacenters

Yeah, you kind of expected something like this right? VMware announced long distance vMotion and cross vCenter vMotions with vSphere 6.0 last VMworld. We can now start doing live migrations between physical locations so why not add this to the disaster recovery orchestration engine?

I think this new feature might be very useful for some companies that routinely deal with disasters where there is some warning, like a hurricane. Prior to SRM 6.1 you would have been able to do a planned failover through a previous version of SRM, but it would have required a small amount of downtime. You might also have been able to do a long distance vMotion but this would have been some manual or scripted work. With SRM 6.1 the planned failover could be done in an orchestrated fashion with zero downtime!

OK, you’ve probably got some questions about this, let’s see if I can knock out a few of them.

Question 1: What if my virtual machine has a lot of RAM and vMotions could take a very long time? Do I have to vMotion them for planned migrations?

Answer 1: Nope! If you have certain VMs that you know you never want to vMotion during your planned migration, you’ll have the option to select the VM and disable the vMotion option during protection.

Question 2: What about the network?

Answer 2: Yeah, the network needs to be the same on both vCenters or your VM won’t be able to communicate with the rest of the network anymore. This is the same as a normal vMotion. SRM will be able to change IP Addresses like it always has, but this requires a small amount of downtime as you might guess.

Question 3: Do I have two different planned recovery options then?

Answer 3: There is one planned recovery still, but now there is an option to enable the vMotion of eligible VMs.
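
The answers above boil down to a simple decision per VM during a planned migration. The following sketch models that decision; it is a conceptual illustration only, the class and field names are invented, and treating a mismatched network as a reason to fall back to a regular failover is my reading of Answer 2 rather than documented SRM behavior.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class PlannedVM:
    name: str
    vmotion_enabled: bool = True         # per-VM opt-out described in Answer 1
    same_network_at_target: bool = True  # Answer 2: vMotion needs the same network on both sides


def plan_migration(vms: Iterable[PlannedVM], use_vmotion: bool) -> Tuple[List[str], List[str]]:
    """Split VMs into those moved live and those recovered with brief downtime."""
    live: List[str] = []
    with_downtime: List[str] = []
    for vm in vms:
        if use_vmotion and vm.vmotion_enabled and vm.same_network_at_target:
            live.append(vm.name)
        else:
            with_downtime.append(vm.name)
    return live, with_downtime


# Hypothetical recovery plan contents.
plan_vms = [
    PlannedVM("web01"),
    PlannedVM("bigmem-db", vmotion_enabled=False),       # lots of RAM, opted out
    PlannedVM("legacy01", same_network_at_target=False),  # needs an IP change
]

print(plan_migration(plan_vms, use_vmotion=True))   # (['web01'], ['bigmem-db', 'legacy01'])
print(plan_migration(plan_vms, use_vmotion=False))  # ([], ['web01', 'bigmem-db', 'legacy01'])
```

There is still a single planned recovery workflow here, with vMotion acting as an option on top of it, which mirrors Answer 3.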

vCenter Spanned NSX Integration

The last main feature of the product is its integration with the NSX product. You used to have to explicitly map each VM with a recovery network. Now in SRM 6.1 if you’re using NSX on both vCenters and the NSX networks are the same on each, SRM will map these networks for you. (yes, you can override this mapping if you need).

Other Notes

SRM 6.1 has also done some rearranging of the recovery plans so that you can get better visibility into what is going on during a failure. If you’ve ever had to troubleshoot a failover this is a great addition to help narrow down the problem. It also provides more places to put scripts into your failover, which is welcome.


Disclosure: I am a real user, and this review is based on my own experience and opinions.
Roger Nurse - PeerSpot reviewer
VMware NSX T/V Consulting Engineer / Solutions Architect at Onebox Solutions
Real User

Eric - I really like this post. Right to the point. I already see the technical value in the new SRM features.
