What is the best strategy for developing an up-to-date BCRS?
Building a Business Continuity Recovery Solution (BCRS) is a tough job. There are many details to take care of, and in a large organization many technology experts are involved. Is there a best strategy for developing the BCRS and keeping it up to date?
Hi,
Let me begin with this. BCRS has two parts to it - Biz Continuity & Recovery Solutions.
Every Organization has got the following at the least:
1. Biz space
2. Technologies
3. Process
4. Departments or Units
Now, to approach it:
1. First, prepare a business case study (sites, headcounts, business types, units, etc.).
2. Prepare a risk chart and categorize the assets (people, tools, hardware, software, etc.) against it.
3. RPO and RTO are not limited to technology; they have to be applied to the entire business to achieve the best results relative to the business.
4. Put together a dependency list and map the dependencies.
5. Map them especially from a technology perspective (imaging, block-level backups, application replication, etc.).
6. Consider the people side. For example, if you have one location with 100 seats and a second location with 400 seats, define optimization techniques to achieve business continuity across them, then prepare the recovery solutions accordingly.
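The dependency list in step 4 can be turned into a recovery order mechanically. The following is only a minimal sketch, assuming Python 3.9+ (for the standard-library `graphlib` module); the asset names and the dependencies between them are hypothetical and not taken from any real environment.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each asset lists what must be running before it.
DEPENDENCIES = {
    "storage": set(),
    "network": set(),
    "database": {"storage", "network"},
    "erp_app": {"database"},
    "call_center": {"erp_app", "network"},
}


def recovery_order(dependencies):
    """Return the assets in an order where every prerequisite comes first.

    TopologicalSorter raises graphlib.CycleError if the map contains a
    circular dependency, which is itself worth knowing before a disaster.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = recovery_order(DEPENDENCIES)
print(order)
```

Several valid orders can exist for the same map; what matters is that no asset is restored before the things it depends on.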
Summary:
There is no one-size-fits-all solution for any organization, regardless of its headcount. Several other factors need to be counted in as well. To keep the BCRS up to date, start from the hire-to-exit workflows and the asset assignment-to-retirement workflows; with those in place, the BCRS is revisited at each touchpoint to see whether updates are required.
In the interest of readers' time, for further queries feel free to write to komandooru.arun@gmail.com
Channel Development - Business Continuity at a tech services company with 501-1,000 employees
Consultant
2015-10-14T12:21:01Z
Oct 14, 2015
Hi Tomas,
With BC&RS you will have as many possible solutions as you have experts. To keep some semblance of sanity around this, consider the following:-
Step 1 - What have you got and what do you need
1. Complete or review your Business Impact Analysis for core systems/applications and data
2. Identify critical data availability needs at a data set level (RPO)
3. Identify critical system availability needs at both an application and system level (RTO)
4. Clarify the actions from system availability to having the organisation running business as usual.
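As a rough illustration of Step 1, the RPO and RTO needs can be recorded per data set or system and compared against what is in place today. This is a sketch under stated assumptions: the `Requirement` class, the entry names and all figures are hypothetical, and the worst-case data loss is approximated by the interval between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Requirement:
    name: str
    rpo: timedelta              # most data (measured in time) the business can lose
    rto: timedelta              # longest the system may stay unavailable
    backup_interval: timedelta  # how often a recoverable copy is taken today


def rpo_at_risk(requirements):
    """Names of entries whose backup interval exceeds their RPO.

    Worst case, everything since the last copy is lost, so the interval
    between copies is the data loss to compare against the RPO.
    """
    return [r.name for r in requirements if r.backup_interval > r.rpo]


# Hypothetical catalogue; none of these figures come from the thread.
catalogue = [
    Requirement("file_share", timedelta(hours=24), timedelta(hours=8), timedelta(hours=24)),
    Requirement("orders_db", timedelta(minutes=15), timedelta(hours=1), timedelta(hours=24)),
]

# Tightest RTO first gives a first cut of the recovery priority list.
by_priority = sorted(catalogue, key=lambda r: r.rto)

print(rpo_at_risk(catalogue))
print([r.name for r in by_priority])
```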
Step 2 - Does it do what I want it to do
1. Recovery / Disaster Tests
2. Process Tests (Combination of system outages to check for weaknesses)
Step 3 - Get everyone into good habits, users and IT alike
Users - Like a building fire drill, get everyone into a good set of BC&RS habits.
I appreciate this is vastly simplified; however, it gives me a simple checklist to make sure I'm on the right track when helping a customer.
Storage Analyst III at a energy/utilities company with 1,001-5,000 employees
Vendor
2015-10-14T14:53:21Z
Oct 14, 2015
How big is the environment?
What are the SLAs for the critical apps?
What is the business requesting in the way of RPO and RTO?
What are you currently using and what are the biggest pain points?
Are you looking to go from a tape backup environment to disk?
Is there any technology/solution that you or your business favour?
What is your budget?
That’s just to start with.
Once you have all these questions answered you can start to look at available solutions within your budget.
Take into account what you already have on the floor and see whether any of it can be re-tasked for the new environment at an affordable maintenance cost.
We are just starting a project to replace tape as part of our long-term backup/DR plans.
We started with a number of vendors, narrowed them down to two, and ran an onsite POC for both to determine which was best for our environment, SLAs, RPO and RTO.
This gave us a great insight to both solutions as well as the local technical skills from each vendor, and we were able to gauge customer reactions to both solutions and choose the best fit with confidence.
Technical Consultant at a tech services company with 51-200 employees
Consultant
2015-10-14T12:27:57Z
Oct 14, 2015
1) It is always best practice to keep critical backups on a system (disk) in addition to tape. At least the most recent backup should be somewhere we can access immediately to restore in a crisis. Otherwise the time to recovery will be very high, because the tape has to be fetched and restored, and tape restoration is sequential and slow.
2) Preferably have a DR solution with synchronous data replication.
3) Have proactive support.
4) Have a good monitoring system so that a failure raises an alert immediately.
What a great topic - I'll add in a disaster recovery plan template and guide that we've created and made available in an editable format which may be helpful to you or other visitors; it's available here: www.armadacloud.com
An IT DR plan is a big part of business continuity planning; we'll be releasing a BC plan template and guide shortly that includes the disaster recovery plan aspect. Good luck!
Michael, how have you utilized the need to maintain precise documentation in order to choose the right software/hardware? Have you been pleased with your decisions?
Technical Consultant at a tech services company with 51-200 employees
Consultant
2016-10-18T13:10:29Z
Oct 18, 2016
You need very precise documentation describing the environment, and the right people who fully understand the process/workflow (including the possible ways to recover the environment to a working state). Then you can choose the software/hardware and all other resources. Let's call that strategy: do it properly!
In my opinion, a defined and documented business policy for BCRS/BCDR is the important first step. Understanding the applications/workloads will also help you choose the best technology, whether a product or a service. I would recommend looking at cloud solutions: they are innovating constantly, which drives costs down, and they follow an opex model.
Technology aspects are important, but there are solutions that can help, like replication (at storage or VM or application level), orchestrator (like VMware Site Recovery Manager), ...
Most important are the organization and procedure aspects that are usually the biggest part of a Business Continuity Plan.
Account Manager at a tech services company with 501-1,000 employees
Consultant
2015-10-16T22:12:55Z
Oct 16, 2015
First, we need to keep up with the evolution of technology.
There's no easy or best strategy to develop the BCRS and keep it up to date.
First we need to think about the hardware in each data center (assuming that we are going to have a disaster recovery site).
We have to think about virtualization, which helps us achieve almost zero downtime.
Then we can use replication to keep all the systems up in case one site fails.
Today, virtualization can help us in many ways, including the DMZ, and by bringing the operational model of a virtual environment to the data center network, where we can transform the economics of network and security operations.
Basically, virtualization is a big help in these cases: we can replicate production environments between sites and avoid downtime in case of a site failure.
We can do the same using hardware-based replication, but the costs will be huge.
Customer Support Engineer at a computer software company with 51-200 employees
Vendor
2015-10-16T10:20:39Z
Oct 16, 2015
Management view (planning phase)
1) Finance plays an important role in every business. Budget should be allocated properly according to business needs; there should be no compromise while planning the BCDRP.
2) Understand the organization's business risk and its criticality.
3) Decide how to secure the organization's information (data) in case of disaster, hacking, etc.
4) Define the organization's loss in case of disaster, unavailability, etc. in terms of RPO, RTO and MTPOD.
5) Prioritize the applications/information from most critical to least critical (keeping the budget in mind).
6) Plan and design the security policy and the continuity policy.
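Points 4 and 5 above can be sketched as a small planning helper: check that each application's RTO fits within its MTPOD (maximum tolerable period of disruption), then fund protection from most to least critical until the budget runs out. The class, the application names, the costs and the greedy funding rule are all assumptions made for illustration, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Application:
    name: str
    criticality: int        # 1 = most critical
    rto_hours: float        # target time to restore the application
    mtpod_hours: float      # maximum tolerable period of disruption
    protection_cost: float  # hypothetical cost to protect it


def plan(applications, budget):
    """Fund applications in criticality order; skip any that no longer fit."""
    funded, unfunded = [], []
    remaining = budget
    for app in sorted(applications, key=lambda a: a.criticality):
        # An RTO longer than the MTPOD means the target itself is unacceptable.
        if app.rto_hours > app.mtpod_hours:
            raise ValueError(f"{app.name}: RTO exceeds MTPOD")
        if app.protection_cost <= remaining:
            funded.append(app.name)
            remaining -= app.protection_cost
        else:
            unfunded.append(app.name)
    return funded, unfunded


# Hypothetical portfolio and budget.
apps = [
    Application("intranet", 3, 48, 72, 20),
    Application("billing", 1, 2, 4, 60),
    Application("email", 2, 8, 24, 30),
]
funded, unfunded = plan(apps, budget=100)
print(funded, unfunded)
```

The unfunded list is the useful output here: it is the set of risks management is accepting, and it should be stated explicitly rather than discovered during an incident.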
Technical view (implementation phase)
1) Make sure information is secured as per the policy (for example, anti-virus should be installed on all machines).
2) Network availability is the foundation of IT, so the network should be implemented with high-availability links.
3) Storage, databases and servers should be highly available (VM, cluster, whichever suits the environment) and properly monitored.
4) High availability aims at continuous availability (minimal downtime).
5) People availability: the team should be available as the business needs.
6) Prepare a remote site (disaster recovery site, DR) and make sure the required DR resources are in sync with the DC.
7) Prepare clear documentation for each and every environment.
8) Human error also counts as a disaster, so automate all data center processes.
9) Test the DR site at regular intervals; this process should also be automated.
Auditing view (testing phase)
1) Documents should be tested.
2) Reports should be validated.
3) Conduct DR drills at regular intervals.
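Checking that drills really happen at regular intervals is easy to automate. The sketch below flags environments whose last DR drill is older than a chosen maximum age; the 180-day default, the environment names and the dates are assumptions for illustration only.

```python
from datetime import date, timedelta


def overdue_drills(last_drills, today, max_age=timedelta(days=180)):
    """Return (sorted) the environments whose last DR drill is older than max_age."""
    return sorted(
        name for name, last in last_drills.items() if today - last > max_age
    )


# Hypothetical drill log: environment name -> date of the last completed drill.
last_drills = {"erp": date(2015, 3, 1), "email": date(2015, 9, 20)}

overdue = overdue_drills(last_drills, today=date(2015, 10, 16))
print(overdue)
```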
Cloud Services Administrator at a tech services company with 51-200 employees
Consultant
2015-10-15T15:06:01Z
Oct 15, 2015
My best advice would be high availability, VM replication and a good backup program, such as Asigra. You want to minimize downtime as much as possible, of course. For example, the program can save an encrypted local copy on-site as well as off-site. Having a local copy minimizes downtime when you need to recover, and that time saved could be money saved.
Head of Fasset Technologies at a tech services company with 501-1,000 employees
Consultant
2015-10-15T13:41:04Z
Oct 15, 2015
Hi Thomas,
I agree with the comments made by Matthew Harris
Senior Automation Engineer at a construction company with 1,001-5,000 employees
Vendor
2015-10-15T11:39:34Z
Oct 15, 2015
Another point to note is that while everyone could identify a perfect BCRS scenario and a pick list of the best available technology, it is rare that a company ever has sufficient funds to implement a solution fully.
If you are a seasoned practitioner/realist rather than a "preacher"/consultant, most often you will be asked to implement or left to work with a system that is just adequate rather than more than enough. Any issues this may create you will encounter over the first 12 months, hopefully before a full disaster strikes.
Is there any best strategy to develop and maintain the Business continuity Recovery solution?
Business practices and processes designed in the name of implementing a BCRS vary. While some organizations are comfortable with periodic backups of their transaction servers, firms that maintain aggressive, detailed risk management tend to have a more rigorous BCRS that looks at all aspects of the business, human resources included. In the latter case it is not unusual to find a cross-functional team of senior managers maintaining a BCRS dashboard on a daily basis.
Between these two extremes are many variations: some respond to audit findings or the specific needs of a line of business, while others implement a roadmap established by top management. The variations may be an indicator of the business's maturity in risk management, but this is best left for another forum.
To the question above, the answer is yes, there is a best strategy for a Business Continuity Recovery Solution. When the BCRS is owned at top management/board level, chances are that BCRS best practices are not only followed but also subjected to testing, thus enabling continuous improvement. Where lines of business initiate the BCRS, we are likely to find overlaps, omissions and duplications, not to mention a poor response in case of a business-crippling incident.
So what do we find in a best-practice-led BCRS?
• BCRS objectives sponsored and monitored by top management.
• A cross-functional business team responsible for business risk assessment and for a drafted/updated BCRS policy aligned with the BCRS objectives. The policy should spell out key roles including third-party dependencies, trigger points, records management and remedial measures.
• An implementation team that represents all owners of key assets in the BCRS. This team is the operational component of the BCRS: it executes the day-to-day BCRS tasks on non-incident days and is usually on call 24/7. Its members will be the key hands in a business recovery and should demonstrate adequate hands-on knowledge of recovering critical business systems.
• Updated audit reports on the BCRS team's compliance with a) the BCRS objectives, b) coverage of all identified BCRS processes and c) testing of all practices, including the time taken to restore after a business-crippling incident.
From the foregoing we realize that it is not just a matter of taking backups and keeping them safe. Based on a matrix of objectives, the organization will determine the right mix of strategies to maintain its competitive advantage (private sector) or operational efficiency (public sector).
Do not be surprised, therefore, when you find both cold and warm backup solutions, data warehouses for high-end transaction processing systems, off-site backups, versioning to track changes over time, etc.
Customer Support Engineer at a computer software company with 51-200 employees
Vendor
2015-10-14T18:47:28Z
Oct 14, 2015
Hi,
Properly analysing and automating the end-to-end business continuity process, and testing/auditing it at regular intervals, is the best strategy to develop the BCRS and keep it up to date.
Inovative and CreativeThinker at a tech services company
Consultant
2015-10-14T18:18:09Z
Oct 14, 2015
First of all, I would like to qualify myself. I graduated this June from Colorado Technical University (CTU) (Online) with a BSIT degree with 2 concentrations - Systems Security and Project Management. My final GPA was 3.95 and I was awarded the distinct honor of Summa Cum Laude. Prior to CTU, I attended the University of Cincinnati for 2 years pursuing a BSIT degree with a concentration in Networking.
I have some initial questions to throw out there. Is there an existing BCDRP? Was it practiced or simulated in any way to test its effectiveness? Have there been major changes since it was implemented (assuming there was an existing one)? What framework templates have been applied? Have you used the NIST guidelines, since you are referencing a government organization? How have you been applying updates and performing change management? Is there an existing backup site, and if so, is it a cold, hot, or contracted-out site? Are you ISO certified?
If your Technology Bureau Deputy Director lives under a rock, I can see the incompetence of the situation. However, having a title doesn't necessarily make one an expert - but seriously, a Technology Bureau Deputy Director should already know the answer, and if not - you mean to tell me that they don't know how to use Google?
Let me throw this Holy Grail concept out there for everyone! A VM cloud computing architecture utilizing solid-state flash drives, with a StorageTek SL8500 tape library for backup, an off-site storage and backup agreement with someone like Iron Mountain, SAN and NAS storage technology with appropriate redundant HBAs and redundant A & B power connections, and a strong UPS should make any data center proud...
Disaster recovery is a plan that should be designed at the same time as the business continuity plan because of the overlap in incident impact, whether natural or man-made. There are hot cut-over sites that are fully operational, with almost the same redundant H/W and S/W applications and the same or similar process functionality. There are cold cut-over sites that have the facility space to store the critical components of a skeleton data center and the capability to bring in the other items as needed; however, a cold site is not in a continuously operational state and would have to be brought to life, so to speak...
Then there are the partial-need scenarios. Maybe all that happened was a major power outage; a pre-signed agreement for a third-party contractor to arrive on site with mobile generators and supply temporary power might suffice. Maybe the PBX system had a major meltdown; the same principle applies: contract with a communications vendor to supply mobile PBX capabilities until the permanent onsite PBX system is repaired and functional again.
Whatever has been planned and approved should be exercised to practice the many different roles performed as assigned during a business crisis or disaster impact - many of which will probably be performed by the Incident Team. There are workstation/conference exercises where the role-playing can be practiced through simulated scenarios right at the organization site. Then, there should be actual mock exercises to make the practice more realistic along with after-action reviews and critiques to make continuous improvements. The more you practice, the more proficient you become...
The very first goal is to perform a risk assessment (RA) and a business impact analysis (BIA) to determine risks, value, and impact as a threat. General rule of thumb - e.g., if the value is $250 with minimal impact - don't spend $2500 for a Business Continuity Plan. Earmark $250 for a reserve contingency fund to replace the like item.
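The rule of thumb above can be written down as a tiny decision helper. This is a sketch only: the function name and the two outcome labels are made up for illustration, and a real risk assessment would weigh likelihood and impact far more carefully than a single comparison.

```python
def treatment(asset_value, plan_cost, minimal_impact):
    """Apply the rule of thumb: for a low-impact asset, do not spend more on
    a continuity plan than the asset is worth; earmark its replacement cost."""
    if minimal_impact and plan_cost > asset_value:
        return ("reserve", asset_value)  # set aside money to replace the item
    return ("plan", plan_cost)           # a dedicated plan is justified


# The figures from the post: a $250 item versus a $2500 plan.
decision = treatment(asset_value=250, plan_cost=2500, minimal_impact=True)
print(decision)
```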
The second goal is to make your plans realistic and doable! You are located in NY state, and you have a BCDRP set up with a similar organization in California. This may look good on paper; however, it's not at all practical nor realistic.
Third, as I mentioned briefly already, you need a good liability insurance policy, an approved, earmarked reserve contingency budget that is reasonable enough to cover replacement costs not covered by insurance, and some common-sense planning for anticipated risks. If someone had used some common sense, the 911 emergency call center would not have been shut down because of a flooded basement where the operations call center was located... This critical emergency asset should clearly have been located on the 2nd floor or higher. Another point: your data center draws 50,000 kW to satisfactorily power and run the technical load. So why would anyone in their right mind design only one backup generator, rated at 20,000 kW, to supply backup power? Believe it! These two scenarios were real-life situations that I have actually seen and reviewed for my own research.
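The generator example is simple arithmetic, and a check like the following could sit in a design review. It is a deliberately naive sketch: the function name is hypothetical, and it ignores derating, redundancy (N+1) and load growth, all of which a real power design would have to include.

```python
def backup_power_shortfall_kw(load_kw, generator_ratings_kw):
    """Return how many kW of load the backup generators cannot carry (0 if covered)."""
    return max(0, load_kw - sum(generator_ratings_kw))


# Figures from the post: a 50,000 kW load and a single 20,000 kW generator.
shortfall = backup_power_shortfall_kw(50_000, [20_000])
print(shortfall)
```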
Lastly, you don't just stop with the DRP... To be prepared for the real scenario of a disaster, you will also need a disaster follow-up plan. Especially when there may be fatalities, you'll need to be sympathetic to the survivors and family members, and you need to coordinate with the local authorities in advance, e.g., the local fire station, including the establishment of an "Incident Command Post" as the primary effort toward some form of organized order: who's in charge and who isn't. Where is the rally point for the employees to assemble in order to be accounted for and, if needed, seen and treated by the EMTs or other medical personnel? Who's in charge of notifying employees' family members should an employee be injured? When do you notify the public media and stakeholders of the situation? Are you obligated by any federal regulations to perform certain mandated actions?
If you have all of the answers and are on the correct path for your BCDRP to be successful and effective, then you, my friend, are way ahead of 87% of your cohorts. Many times, BCDRPs are just given lip service: something that's never going to work is drawn up and thrown into a policy book and you're done, right? Wrong answer! If that kind of BCDRP has been approved by your upper management, they are willing to take huge risks and put the lives of their employees in jeopardy. I for one would hate to be employed at an organization whose culture is that lax and careless. My advice to employees who ever work in an environment like that: RUN! Run, Forrest, run!!!
Sr. Account Manager at a tech services company with 51-200 employees
Real User
2015-10-14T15:54:07Z
Oct 14, 2015
There is no single best solution. While there are many good options, the solution that fits your data protection needs depends on multiple variables. Discussing your compliance requirements and current infrastructure environment in a whiteboarding session would help narrow it down a bit.
Senior Storage & Virtualisation Engineer (Managed Services) at a tech services company with 1,001-5,000 employees
Consultant
2015-10-14T13:56:50Z
Oct 14, 2015
In order to implement a good BCRS plan, there are a lot of things to consider before selecting the solution or strategy that suits your environment. What kind of infrastructure are you trying to protect? Is it a physical or a virtual infrastructure? What are your expected RTOs and defined RPOs? What is your backup plan? And what is your disaster recovery strategy?
It is usually a good recommendation to implement the 3-2-1 backup strategy, which means having at least three total copies of your data, two of which are local but stored on different media (e.g., disks or LTO tapes), and at least one copy offsite. Having backups locally helps you recover data quickly when the need arises; however, if the onsite backups are destroyed, the 3-2-1 strategy allows you to recover the data from a remote site and restore it. There are many products out there that can be used to implement this strategy, once again depending on whether your environment is virtual or physical. One good example for a virtual environment is Veeam Backup & Replication, and another good solution for physical environments is Backup Exec from Symantec. These are just a few solutions among many.
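The 3-2-1 rule described above is easy to express as a check over an inventory of copies. This is a minimal sketch: the `BackupCopy` class and the media labels are hypothetical, and it assumes the inventory lists every copy you count toward the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str     # e.g. "disk" or "lto_tape"
    offsite: bool  # True if the copy is stored away from the primary site


def satisfies_3_2_1(copies):
    """Check the 3-2-1 rule: at least 3 copies, on at least 2 different
    media types, with at least 1 copy offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


# Hypothetical inventories.
good = [BackupCopy("disk", False), BackupCopy("lto_tape", False), BackupCopy("disk", True)]
all_onsite = [BackupCopy("disk", False), BackupCopy("lto_tape", False), BackupCopy("disk", False)]

print(satisfies_3_2_1(good), satisfies_3_2_1(all_onsite))
```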
In addition to implementing a robust backup strategy for your environment, you will also need a disaster recovery plan. This is important because it allows you to return to production if your protected site goes down. In this strategy you have to define what you need to protect at your main site and then replicate that to the recovery site. This allows you to fail over to the recovery site when the protected site is down, within a shorter period of time depending on the configuration you have made. VMware Site Recovery Manager is one good product that allows you to implement such a strategy within your organization and gives you business continuity when you need it. Certain advanced backup solutions like Veeam can also replicate strategic servers to a recovery site, which allows you to bring those systems back online when there is a disaster; these solutions, however, are used within virtual environments. In addition, many storage vendors offer replication technologies within their arrays that let you consistently replicate data between your protected site and recovery site, so that you can easily fail over to your recovery site when the need arises.
This strategy can be expensive, however, so you must really define what you need to protect, so that you will not have to purchase a lot of infrastructure to run your core services when there is a disaster.
Taking the above into consideration, you can tailor it to suit your environment, depending on how it has been configured and how you want to implement your business continuity strategy.
Head of IT unit at a financial services firm with 1,001-5,000 employees
Vendor
2015-10-14T13:55:00Z
Oct 14, 2015
I'll only add that, I think, the tough job is creating the BC plan the first time. After that, regularly testing and updating the BCP is the best practice. We do this once a year, for example.
My best advice: keep it simple. As an organization grows, the complexity of people and areas grows too, and the technology mix will push you to take decisions per product, service or area, which would probably lead you to a bunch of specific solutions for each. So stay organized, and gather tools that are easy to maintain: maybe hard to implement, but able to be monitored and serviced.
Also, as with every IT-related project, a planning stage and a good group of functional project administrators are a must.
In short, BCRS is an ongoing process. It will depend on company size, critical data, resources and budget. To quote CenturyLink: "The shelf-life of a business continuity plan is like a buying produce—only good for a short while. As a result, business continuity isn’t just about putting a plan in place; it’s about putting a plan in place and then making sure it still meets your needs over time. After all, companies change. Business requirements shift over time. A good plan a year ago might not cover you today.
The bottom line: If you're not thinking about business continuity as a continual process, then your strategy could be outdated. And if you're not building validation into your disaster recovery strategy, you're not nearly as protected as you think."
Also important to note is: "An effective business continuity strategy balances cost and risk, or determining which workloads must operate continuously and which don’t. As part of this, ask what an acceptable level of downtime is. For instance, do you have workloads or business processes that won’t impact operations if they go down for an hour? Or whose impact will be minimal if they go down for a day?"
Resource: "http://app.centurylink.com/businesscontinuity/ebook/six_ways_business_continuity_can_cost_your_business/welcome-54G6-1157W3.html"
Key points in a recovery solution: High Availability, Data Protection and Data Recovery.
You will need to understand the acceptable downtime for that specific company. You need to know your RTO and RPO capabilities. On smaller budgets, an N+1 solution is the most basic to plan around. Bigger budgets can accommodate more equipment and higher redundancy. Thereafter, backups have to be tested regularly. The BCRS will change as the requirements and needs of the company or its data change. Data grows, and better technology becomes available. It is all about being prepared for the big WHAT IF.
It is important to understand that you will never have 100% uptime and that human error must factor into the calculation. It will take a lot of planning and continued implementation.
The best practice would be to not skimp on the planning, to use trusted vendors and technology, and to stay involved with the Business Continuity Recovery Solution.
Replication is needed, but only for the deduplicated data: replicate the full backup just the first time, then send only the incremental changes from A to B and have them deduplicated at B, keeping in mind that the MPLS link can be controlled so that replication won't choke the entire network. As for going to the cloud, some people are starting to host VMs in the cloud itself on top of the data they are dumping there. I am not talking about dumping data into a bucket and fetching it back; I am talking about dumping the data to the cloud and hosting a VM over it. Then again, if you send the data in deduplicated form you cannot host a VM on it, because you cannot deduplicate a .vmdk file.
Software & Services Advisor at a tech company with 10,001+ employees
Integrator
2015-10-14T12:09:39Z
Oct 14, 2015
It is a tough job as you said. The hardest part is understanding from the top down, what the business requirements are. To get people outside of IT to answer the questions "how much downtime is acceptable for file and apps" and "if there is an outage, how much data loss is acceptable". In government, that will be tough to get answers to, in other verticals it can be slightly easier... e-commerce for example, downtime can be tied directly to revenue.
I would start by getting user feedback from the top down, develop a questionnaire to identify real RPO and RTO targets, and send out via survey-monkey (or any like email questionnaire). Once you have a solid idea of RPO/RTO targets, I would then do a "DR gap analysis". Review your stack from the inside out, identify how long it would take to recover from everything from a disk failure all the way up to a smoking hole in the ground where your datacenter was.
So at this point you know your RPO/RTO targets, and you also know what your current RPO/RTO capabilities are. Then the hunt for technologies begins; a good technology partner (VAR or consultant) can help you sift through the marketing weeds and figure out what makes the most sense. Show the budget holders two or three options (silver, gold, platinum) and associate each cost with its RPO/RTO capability. This way they see that if they want 0 downtime, they'll pay platinum money; if they are OK with 4 hours, the cost is reasonable; and if they are OK with you working all weekend for 48 hours of downtime, they'll get out of it cheap but risk losing a good employee and business-critical data.
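The silver/gold/platinum comparison can be laid out as data so that budget holders see which tier is the cheapest one that still meets their stated targets. This is a sketch under assumptions: the `Option` class, the RPO/RTO capabilities per tier and the costs are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    tier: str
    rpo_hours: float    # data loss the option can guarantee
    rto_hours: float    # downtime the option can guarantee
    annual_cost: float  # hypothetical cost


def cheapest_meeting(options, target_rpo_hours, target_rto_hours):
    """Return the lowest-cost option meeting both targets, or None if none does."""
    candidates = [
        o for o in options
        if o.rpo_hours <= target_rpo_hours and o.rto_hours <= target_rto_hours
    ]
    return min(candidates, key=lambda o: o.annual_cost, default=None)


# Hypothetical tiers; the 0 / 4 / 48 hour downtime figures echo the post.
options = [
    Option("silver", rpo_hours=24, rto_hours=48, annual_cost=10_000),
    Option("gold", rpo_hours=1, rto_hours=4, annual_cost=50_000),
    Option("platinum", rpo_hours=0, rto_hours=0, annual_cost=200_000),
]

choice = cheapest_meeting(options, target_rpo_hours=4, target_rto_hours=4)
print(choice.tier if choice else "no option meets the targets")
```

A result of `None` is informative too: it means the targets gathered from the business cannot be met by anything on the table, so either the targets or the budget has to move.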
Director - Data Protection and Availability with 201-500 employees
Vendor
2015-10-14T12:05:11Z
Oct 14, 2015
The question you asked is something I hear all the time. However, we can't mix a business continuity scenario with backup and recovery. They are two different things. Although both are part of the overall data protection strategy an organization should implement, they should be addressed separately. Business continuity is all about RTO/RPO and usually a strategy around DR; backup and recovery, not so much.
Manny Punzo
Principal Architect
Data Protection Solutions
TECHNOLOGENT
(C) 469-215-9454
Senior Automation Engineer at a construction company with 1,001-5,000 employees
Vendor
2015-10-14T12:03:45Z
Oct 14, 2015
Whatever recovery solution you use must be exercised to ensure it is capable of doing the job it was originally designed for. That means regularly taking some pain in testing that it actually works before it is really needed. Whether this is done by recovery to a test system or to a remote site, I recommend going to the next step of promoting it into production to take over from your prime site; otherwise you will never discover all the holes left by an incomplete solution.
Over time your well-crafted solution can also become stale. Application and software additions to the system can reduce the overall effectiveness of the solution. Hardware changes could render your BCRS useless, as can failing to keep your hardware up to date. There is nothing so frustrating as finding your hardware dead and a backup that cannot be recovered to anything other than the same vintage of equipment that failed. The same is true of license-locked software that cannot run on replacement hardware.
Data backup involves copying and moving data from its primary location to a secondary location from which it can later be retrieved in case the primary data storage location experiences some kind of failure or disaster.
Hi Tomas,
With BC&RS you will have as many possible solutions as you have experts. To keep some semblance of sanity around this, consider the following:-
Step 1 - What have you got and what do you need
1. Complete or review your Business Impact Analysis for core systems/applications and data
2. Identify critical data availability needs at a data set level (RPO)
3. Identify critical system availability needs at both an application and system level (RTO)
4. Clarify the actions from system availability to having the organisation running business as usual.
Step 2 - Does it do what I want it to do
1. Recovery / Disaster Tests
2. Process Tests (Combination of system outages to check for weaknesses)
Step 3 - Get everyone into good habits, Users and IT alike
Users - Like a building fire drill, get everyone into a good set of BC&RS habits.
I appreciate this is vastly simplified; however, it gives me a simple checklist to make sure I'm on the right track when helping a customer.
Warm regards
Matthew
How big is the environment?
What are the critical apps' SLAs?
What is the business requesting in the way of RPO and RTO?
What are you currently using and what are the biggest pain points?
Are you looking to go from a tape backup environment to disk?
Is there any technology/solution that you or your business favour?
What is your budget?
That’s just to start with.
Once you have all these questions answered you can start to look at available solutions within your budget.
Take into account what you already have on the floor and see whether any of it can be re-tasked for the new environment at an affordable maintenance cost.
We are just starting a project to replace tape as part of our Backup/DR long term plans.
We started with a number of vendors, narrowed them down to two, and ran an on-site POC for both to determine which was the best fit for our environment, SLAs, RPO and RTO.
This gave us great insight into both solutions as well as the local technical skills of each vendor, and we were able to gauge customer reactions to both solutions and choose the best fit with confidence.
Hope this gives you a good starting point.
Will
1) It is always a best practice to keep critical backups on an online system in addition to tape. At least the latest backup should be available where it can be accessed immediately for a restore in a crisis. Otherwise the time to recovery will be very high, because the tape has to be fetched and then restored. Tape restoration is sequential and slow.
2) It is better to have a DR solution with synchronous data replication.
3) Proactive support.
4) A good monitoring system, so that any failure raises an alert immediately.
What a great topic - I'll add in a disaster recovery plan template and guide that we've created and made available in an editable format which may be helpful to you or other visitors; it's available here: www.armadacloud.com
An IT DR plan is a big part of business continuity planning; we'll be releasing a BC plan template and guide shortly that includes the disaster recovery plan aspect. Good luck!
Michael, how have you utilized the need to maintain precise documentation in order to choose the right software/hardware? Have you been pleased with your decisions?
Very precise documentation describing the environment, and the right people who fully understand the process/workflow (including possible ways to recover the environment to a working state). Then you can choose software/hardware and all other resources. Let's call that strategy: do it properly!
In my opinion, a defined and documented business policy for BCRS/BCDR is the important first step. Understanding the applications/workloads will also help you choose the best technology, whether product or service. I would recommend looking at cloud solutions, as the cloud is constantly innovating, drives cost reductions, and follows an opex model.
Technology aspects are important, but there are solutions that can help, like replication (at storage or VM or application level), orchestrator (like VMware Site Recovery Manager), ...
Most important are the organization and procedure aspects that are usually the biggest part of a Business Continuity Plan.
First, we need to keep up with the evolution of technology.
There's no easy or best strategy to develop and maintain the BCRS up to
date.
First we need to think about the hardware in each data center (assuming
that we are going to have a disaster recovery site).
We have to think about virtualization, which helps us get to almost zero
downtime.
Then we can use replication to keep all the systems up in case one site
fails.
Today, virtualization can help us in many ways, including the DMZ and by
bringing the operational model of a virtual environment to the data center
network, where we can transform the economics of network and security
operations.
Basically, virtualization is a big help in these cases: we can replicate
production environments between sites and avoid downtime in case of a site
failure.
We can do the same using hardware-based replication, but the costs will be
huge.
This is my opinion on this matter.
Cheers
Rui T. Alves
In Management View (Planning Phase)
1) Finance plays an important role in every business. Budget should be allocated properly according to business needs, and there should be no compromise while planning the BCDRP.
2) Understand the organization's business risks and their criticality.
3) Decide how to secure the organization's information (data) in case of disaster, hacking, etc.
4) Define the organization's loss in case of disaster, unavailability, etc. in terms of RPO, RTO and MTPOD.
5) Prioritize applications/information from most critical to least critical (keeping budget in mind).
6) Plan and design the security policy and the continuity policy.
In Technical View (Implementing Phase)
1) Make sure information is secured as per the policy (for example, anti-virus should be installed on all machines).
2) Network availability is the foundation of IT, so it should be implemented with highly available links.
3) Storage, databases and servers should be highly available (VM, cluster, whichever suits the environment) and monitored properly.
4) High availability aims to provide continuous availability (no downtime).
5) People availability: the team should always be available as per business need.
6) Prepare a remote site (Disaster Recovery site, DR) and make sure the required DR resources are in sync with the DC.
7) Clear documentation should be prepared for each and every environment.
8) Human error also counts as a disaster, so automate all data center processes.
9) Test the DR site at regular intervals; this process should also be automated.
In Auditing View (Testing Phase)
1) Documents should be tested.
2) Reports should be validated.
3) Conduct DR drills at regular intervals.
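The points above about testing the DR site at regular intervals, and automating that process, can be made concrete with a small check that flags systems whose last successful DR drill is older than an agreed interval. This is only a minimal Python sketch: the system names, the dates and the 180-day interval are hypothetical and do not come from this thread.

```python
from datetime import date, timedelta

# Hypothetical policy: every system must pass a DR drill at least this often.
MAX_DRILL_AGE = timedelta(days=180)


def overdue_drills(last_drill, today, max_age=MAX_DRILL_AGE):
    """Return the names of systems whose last successful drill is too old.

    last_drill maps a system name to the date of its last successful
    DR drill, or None if the system has never been tested.
    """
    overdue = []
    for name, tested_on in sorted(last_drill.items()):
        if tested_on is None or today - tested_on > max_age:
            overdue.append(name)
    return overdue


# Illustrative data only.
drills = {
    "billing-db": date(2015, 9, 1),
    "file-server": date(2015, 1, 15),
    "intranet": None,
}
print(overdue_drills(drills, today=date(2015, 10, 14)))
```

A check like this could run from a scheduler and feed the audit reports, so that a drill that was skipped shows up without anyone having to remember it.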
My best advice would be high availability, VM replication and a good backup program, such as Asigra. You want to minimize downtime as much as possible, of course. For example, the program can save an encrypted local copy on-site as well as a copy off-site. A local copy minimizes downtime during recovery, and that time could save money.
Hi Thomas,
I agree with the comments made by Matthew Harris
With BC&RS you will have as many possible solutions as you have experts. To keep some semblance of sanity around this, consider the following:-
Step 1 - What have you got and what do you need
1. Complete or review your Business Impact Analysis for core systems/applications and data
2. Identify critical data availability needs at a data set level (RPO)
3. Identify critical system availability needs at both an application and system level (RTO)
4. Clarify the actions from system availability to having the organisation running business as usual.
Step 2 - Does it do what I want it to do
1. Recovery / Disaster Tests
2. Process Tests (Combination of system outages to check for weaknesses)
Step 3 - Get everyone into good habits, Users and IT alike
Users - Like a building fire drill, get everyone into a good set of BC&RS habits.
I appreciate this is vastly simplified; however, it gives me a simple checklist to make sure I'm on the right track when helping a customer.
Another point to note is that while everyone could identify a perfect BCRS scenario and a pick list of the best available technology, it is rare that a company ever has sufficient funds to implement a solution fully.
If you are a seasoned practitioner/realist rather than a "preacher"/consultant, most often you will be asked to implement or left to work with a system that is just adequate rather than more than enough. Any issues this may create you will encounter over the first 12 months, hopefully before a full disaster strikes.
Is there any best strategy to develop and maintain the Business continuity Recovery solution?
There are variations in the business practices or processes designed in the name of implementing BCRS. While some organizations are comfortable with periodic backups of transaction servers, firms that maintain aggressive risk management tend to have a more rigorous and detailed BCRS that looks at all aspects of the business, human resources included. In the latter case it is not unusual to find a cross-functional team of senior managers maintaining a dashboard of the BCRS on a daily basis.
In between these two extremes are many variations, some responding to audit findings or the specific needs of a line of business, while others implement a roadmap established by top management. The variations may be an indicator of the business's maturity in risk management, but this is best left for another forum.
To the question above, the answer is YES, there is a best strategy for business continuity recovery. When BCRS is owned at top management/board level, chances are that BCRS best practices are not only followed but also subjected to testing, thus enabling continuous improvement. Where lines of business initiate BCRS, we are likely to find overlaps, omissions and duplications, not to mention a poor response in case of a business-crippling incident.
So then, what do we find in a best-practice-led BCRS?
• Top-management-sponsored and monitored BCRS objectives.
• A cross-functional business team responsible for business risk assessment and for a drafted/updated BCRS policy aligned with the BCRS objectives. The policy should spell out key roles, including third-party dependencies, trigger points, records management and remedial measures.
• An implementation team that represents all owners of key assets in the BCRS. This team is the operational component of BCRS and will execute the day-to-day tasks of the BCRS during non-incident days, usually on call 24/7. They will be the key hands in case of a business recovery and should demonstrate adequate hands-on knowledge of recovering critical business systems.
• Updated audit reports on the compliance of the BCRS team with (a) BCRS objectives, (b) coverage of all identified BCRS processes, and (c) testing of all practices, including the time taken to restore in case of a business-crippling incident.
From the foregoing we realize that BCRS is not just taking backups and keeping them safe. The organization, based on a matrix of objectives, will determine the right mix of strategies to enable it to maintain its competitive advantage (private sector) or operational efficiency (public sector).
Do not be surprised, therefore, when you find both cold and warm backup solutions, data warehouses for high-end transaction processing systems, off-site backups, versioning to track changes over time, etc.
Hi,
Properly analysing and automating the end-to-end business continuity
process, and testing/auditing it at regular intervals, is the best strategy
to develop and maintain the BCRS up to date.
Regards,
Suren
First of all, I would like to qualify myself. I graduated this June from Colorado Technical University (CTU) (Online) with a BSIT degree with 2 concentrations - Systems Security and Project Management. My final GPA was 3.95 and I was awarded the distinct honor of Summa Cum Laude. Prior to CTU, I attended the University of Cincinnati for 2 years pursuing a BSIT degree with a concentration in Networking.
I have some initial thoughts to throw out there, and some questions for more input. Is there an existing BCDRP? Was it practiced or simulated in any way to test its effectiveness? Have there been major changes since it was implemented (assuming there was an existing one)? What framework templates have been applied? Have you used the NIST guidelines, since you are referencing a government organization? How have you been applying updates and performing change management? Is there an existing backup site, and if so, is it a cold, hot, or contracted-out site? Are you ISO certified?
If your Technology Bureau Deputy Director lives under a rock, I can see the incompetence of the situation. However, having a title doesn't necessarily make one an expert - but seriously, a Technology Bureau Deputy Director should already know the answer, and if not - you mean to tell me that they don't know how to use Google?
Let me throw this Holy Grail concept out there for everyone! The VM Cloud Computing Architecture utilizing solid state flash drives with a StorageTek SL8500 Tape library for backup, an off-site storage and backup agreement with someone like Iron Mountain, the use of SAN and NAS storage technology with appropriate redundant HBAs and redundant A & B power connections, and a strong UPS should make any Data Center proud... Disaster Recovery is a plan that should be designed at the same time as the Business Continuity plan because of the overlap of Incident Impact - whether natural or manmade. There are hot cut-over sites that have almost the same redundant H/W and S/W applications and same to similar process functionality that is fully operational. There are cold cut-over sites that have the facility space to store the critical components of a skeleton Data Center and the capability to bring in the other items as needed; however, this cold site would not be in a continuously operational state and would have to be brought to life - so to speak... There are the partial needs scenarios - maybe all that happened was a major power outage. By having a pre-signed agreement to have a third party contractor arrive on site with mobile generators to supply temporary power might suffice. Maybe the PBX system had a major meltdown - same principle, contract with a communications vendor to supply mobile PBX capabilities until the permanent onsite PBX system can be repaired and functional again.
Whatever has been planned and approved should be exercised to practice the many different roles performed as assigned during a business crisis or disaster impact - many of which will probably be performed by the Incident Team. There are workstation/conference exercises where the role-playing can be practiced through simulated scenarios right at the organization site. Then, there should be actual mock exercises to make the practice more realistic along with after-action reviews and critiques to make continuous improvements. The more you practice, the more proficient you become...
The very first goal is to perform a risk assessment (RA) and a business impact analysis (BIA) to determine risks, value, and impact as a threat. General rule of thumb - e.g., if the value is $250 with minimal impact - don't spend $2500 for a Business Continuity Plan. Earmark $250 for a reserve contingency fund to replace the like item.
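The rule of thumb above can be written down as a tiny decision helper. This is a deliberately simplified sketch, not a real risk model: the function name and the single "minimal impact" flag are assumptions made for illustration, and only the $250 and $2500 figures come from the example above.

```python
def recommend(asset_value, plan_cost, impact_is_minimal):
    """Apply the rule of thumb: do not spend more on a continuity plan
    than a low-impact asset is worth; set aside a replacement reserve.

    Returns a (decision, amount) tuple.
    """
    if impact_is_minimal and plan_cost > asset_value:
        # Cheaper to replace the item than to plan around losing it.
        return ("contingency reserve", asset_value)
    return ("continuity plan", plan_cost)


# The example from the text: a $250 item with minimal impact and a $2500 plan.
print(recommend(asset_value=250, plan_cost=2500, impact_is_minimal=True))
```

In practice the BIA would supply the impact rating, and a high-impact asset would still get a plan even when the plan costs more than the item itself.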
The second goal is to make your plans realistic and doable! You are located in NY state, and you have a BCDRP set up with a similar organization in California. This may look good on paper; however, it's not at all practical nor realistic.
Third, and I touched on this briefly already, you need a good liability insurance policy, an approved earmarked reserve contingency budget that's reasonable to cover replacement costs not covered by insurance, and some common sense planning for anticipated risks. If someone had used some common sense, the 911 emergency call center would not have been shut down because of a flooded basement where the operations call center was located... This critical emergency asset should clearly have been located on the 2nd floor or higher. Another point - your Data Center draws 50,000 kW to satisfactorily power and run the technical load. So, why would anyone in their right mind design only one backup generator rated at 20,000 kW to supply backup power? Believe it! These 2 scenarios were real live situations that I have actually seen and reviewed for my own research.
Lastly, you don't just stop with DRP... To be prepared for the real scenario of a disaster - you will also need a Disaster Follow-up Plan. Especially when there may be fatalities - you'll need to be sympathetic to the survivors and family members, and you need to coordinate with the local authorities in advance, e.g., the local fire station and the establishment of an "Incident Command Post" in the primary effort of establishing some form of organized order - who's in charge and who isn't. Where is the rally point for the employees to assemble in order to be accounted for and be seen and treated if needed by the EMTs or other medical personnel? Who's in charge of notifying employees' family members should an employee be injured? When do you notify the public media and stakeholders of the situation? Are you obligated by any Federal Regulations to perform certain mandated actions?
If you have all of the answers and are on the correct path for your BCDRP to be successful and effective - then you, my friend, are way ahead of 87% of the rest of your cohorts. Many times, BCDRPs are just given lip service for something that's never going to work and is drawn up and thrown into a policy book and you're done - right? Wrong answer! If that kind of BCDRP has been approved by your upper management - they are willing to take huge risks and put the lives of their employees in jeopardy. I for one would hate to be employed at an organization whose culture is that lax and careless. My advice to employees that should ever work in an environment like that - RUN! Run, Forrest - Run!!!
There is no single best solution. While there are many good options, a solution that fits your data protection needs will depend on multiple variables. Discussing your compliance requirements and current infrastructure environment in a whiteboarding session would help narrow it down a bit.
In order to implement a good BCRS plan, there are a lot of things to
consider before selecting the right solution or strategy for your
environment. Some things to consider: What kind of infrastructure are you
trying to protect? Is it a physical or a virtual infrastructure? What are
your expected RTOs and defined RPOs? What is your backup plan, and what is
your disaster recovery strategy?
It is usually a good recommendation to implement the 3-2-1 backup strategy,
which means having at least 3 total copies of your data, two of which are
local but stored on different media (e.g., disks or LTO tapes), and at
least one copy offsite. Having backups locally helps you recover data
quickly when the need arises; however, in the event that the onsite backups
are destroyed, the 3-2-1 strategy allows you to recover data from a remote
site and restore it. There are many products out there that can be used to
implement this strategy, once again depending on the kind of environment
you have, virtual or physical. One good example for a virtual environment
is Veeam Backup & Replication software, and another good solution for
physical environments is Backup Exec from Symantec. These are just a few
solutions among many.
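To make the 3-2-1 rule above easy to audit, here is a minimal Python sketch that checks a list of backup copies against it. The `BackupCopy` type and the media labels are made up for illustration, and the check uses the general form of the rule (at least three copies, at least two kinds of media, at least one offsite) rather than any particular product's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str     # hypothetical label, e.g. "disk", "lto-tape", "cloud"
    offsite: bool  # True if the copy is stored away from the primary site


def satisfies_3_2_1(copies):
    """Return True if the copies meet the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Illustrative inventories only.
good = [
    BackupCopy("disk", offsite=False),
    BackupCopy("lto-tape", offsite=False),
    BackupCopy("disk", offsite=True),
]
bad = [BackupCopy("disk", offsite=False)] * 3

print(satisfies_3_2_1(good), satisfies_3_2_1(bad))
```

The second inventory fails twice over: all three copies sit on the same kind of media, and none of them is offsite.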
In addition to implementing a robust backup strategy for your environment,
you will also need a disaster recovery plan. This strategy is important
because it allows you to return to production if your protected site goes
down. In this strategy you have to define what you need to protect from
your main site. Then you can replicate that to the recovery site, which
allows you to fail over to the recovery site when the protected site is
down, within a shorter period of time depending on the configuration you
have made. VMware Site Recovery Manager is one good product that allows you
to implement such a strategy within your organization to provide business
continuity when you need it. Certain advanced backup solutions like Veeam
can also replicate strategic servers to a recovery site, which allows you
to bring those systems back online when there is a disaster; these
solutions, however, are used within virtual environments. In addition,
many storage vendors offer replication technologies within their arrays
that allow you to consistently replicate data between your protected site
and recovery site, so that you can easily fail over to your recovery site
when the need arises.
This strategy can be expensive, however, so you must really define what you
need to protect, so that you will not have to purchase a lot of
infrastructure to run your core services when there is a disaster.
Considering the above, you can tailor it to suit your environment,
depending on how it has been configured and how you want to implement your
business continuity strategy.
I'll only add that creating the BC plan once is the tough job, I think. After that, regularly testing and updating the BCP is the best practice. We do this once a year, for example.
My best advice: keep it simple. As an organization grows, the complexity of people and areas, and also the technology mix, will push you to make decisions product by product, service by service and area by area, which would probably lead you to a bunch of specific solutions for each. So stay organized, and gather tools that can be easily maintained; they may be hard to implement, but they should be easy to monitor and service.
Also, as with every IT-related project, a planning stage and a good group of functional project administrators are a must.
It depends so much on what they're trying to protect, what their RPO and
RTO objectives are, what they have for infrastructure etc.
There are so many resources available for deciding such things. I'm not
their best option, but I will help if I can.
In short, BCRS is an ongoing process. It will depend on company size, critical data, resources and budget. To quote CenturyLink: "The shelf-life of a business continuity plan is like a buying produce—only good for a short while. As a result, business continuity isn’t just about putting a plan in place; it’s about putting a plan in place and then making sure it still meets your needs over time. After all, companies change. Business requirements shift over time. A good plan a year ago might not cover you today.
The bottom line: If you're not thinking about business continuity as a continual process, then your strategy could be outdated. And if you're not building validation into your disaster recovery strategy, you're not nearly as protected as you think."
Also important to note is: "An effective business continuity strategy balances cost and risk, or determining which workloads must operate continuously and which don’t. As part of this, ask what an acceptable level of downtime is. For instance, do you have workloads or business processes that won’t impact operations if they go down for an hour? Or whose impact will be minimal if they go down for a day?"
Resource: "http://app.centurylink.com/businesscontinuity/ebook/six_ways_business_continuity_can_cost_your_business/welcome-54G6-1157W3.html"
Key points in a recovery solution: High Availability, Data Protection and Data Recovery.
You will need to understand the acceptable downtime for that specific company. You need to know your RTO and RPO capabilities. On smaller budgets, an N+1 solution is the most basic to plan around. Bigger budgets can accommodate more equipment and higher redundancy. Thereafter, backups have to be tested regularly. The BCRS will change as the requirements and needs of the company or its data change. Data grows, and better technology becomes available. It is all about being prepared for the big WHAT IF.
It is important to understand that you will never have 100% uptime and that human error must factor into the calculation. It will take a lot of planning and continued implementation.
The best practice would be to not skimp on the planning, to use trusted vendors and technology, and to stay involved with the Business Continuity Recovery Solution.
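The N+1 idea mentioned above (enough units to carry the load, plus one spare) can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function name, the unit sizes and the load figures are hypothetical, and it treats the loss of the largest unit as the worst single failure.

```python
def survives_single_failure(unit_capacities, required_load):
    """Return True if the remaining units can still carry the load
    after the single largest unit has failed (an N+1 check)."""
    if not unit_capacities:
        return False
    remaining = sum(unit_capacities) - max(unit_capacities)
    return remaining >= required_load


# Hypothetical example: UPS modules rated in kW against a 1000 kW load.
print(survives_single_failure([500, 500, 500], required_load=1000))
print(survives_single_failure([500, 500], required_load=1000))
```

The same check applies to generators, cluster nodes or network links, as long as capacity and load are expressed in the same unit.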
Oh, and by the way, do check out a hybrid solution. Don't put all the
mission-critical data in the cloud.
PS: Deduplication and compression are already there with everyone;
replication is what is needed.
Replication is needed, but just for the deduplicated data: only the first
full backup is replicated in full, and then the incremental changes go from
A to B and get deduplicated at B, keeping in mind that MPLS usage can be
controlled so it won't choke the entire network. As for going to the cloud,
some people are starting to host VMs in the cloud itself on top of the data
they are dumping there. I am not talking about bucket data and fetching it;
I am talking about the data being dumped to the cloud and hosting a VM over
it. (Then again, if you send the data in a deduplicated form, you cannot
host a VM, because you cannot deduplicate a .vmdk file.)
It is a tough job as you said. The hardest part is understanding from the top down, what the business requirements are. To get people outside of IT to answer the questions "how much downtime is acceptable for file and apps" and "if there is an outage, how much data loss is acceptable". In government, that will be tough to get answers to, in other verticals it can be slightly easier... e-commerce for example, downtime can be tied directly to revenue.
I would start by getting user feedback from the top down: develop a questionnaire to identify real RPO and RTO targets, and send it out via SurveyMonkey (or any similar email questionnaire). Once you have a solid idea of RPO/RTO targets, I would then do a "DR gap analysis". Review your stack from the inside out, and identify how long it would take to recover from everything from a disk failure all the way up to a smoking hole in the ground where your datacenter was.
So at this point you know your RPO/RTO targets, and you also know what your current RPO/RTO capabilities are. Then the hunt begins for technologies; a good technology partner (VAR or consultant) can help you sift through the marketing weeds and figure out what makes the most sense. Show the budget holders two or three options, silver-gold-platinum, and associate each cost with its RPO/RTO capability. This way they see that if they want 0 downtime, they'll pay platinum money... if they are OK with 4hrs, the cost is reasonable; if they are OK with you working all weekend for 48hrs of downtime, they'll get out of it cheap but risk losing a good employee and business-critical data.
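The gap analysis and the silver-gold-platinum comparison described above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the system names, the RPO/RTO figures and the prices are hypothetical, with only the 0, 4 and 48 hour downtime levels taken from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    rpo_hours: float  # maximum tolerable data loss
    rto_hours: float  # maximum tolerable downtime


@dataclass(frozen=True)
class Option:
    name: str
    delivers: Objective
    cost: float  # hypothetical price


def gaps(targets, capabilities):
    """Return {system: (rpo_gap, rto_gap)} for systems that miss their target."""
    result = {}
    for name, target in targets.items():
        current = capabilities[name]
        rpo_gap = max(0.0, current.rpo_hours - target.rpo_hours)
        rto_gap = max(0.0, current.rto_hours - target.rto_hours)
        if rpo_gap or rto_gap:
            result[name] = (rpo_gap, rto_gap)
    return result


def cheapest_option(options, target):
    """Return the lowest-cost option meeting both objectives, or None."""
    viable = [o for o in options
              if o.delivers.rpo_hours <= target.rpo_hours
              and o.delivers.rto_hours <= target.rto_hours]
    return min(viable, key=lambda o: o.cost) if viable else None


# Illustrative figures only.
targets = {"email": Objective(4.0, 8.0), "erp": Objective(1.0, 4.0)}
capabilities = {"email": Objective(24.0, 8.0), "erp": Objective(1.0, 2.0)}
options = [
    Option("silver", Objective(24.0, 48.0), 10_000),
    Option("gold", Objective(1.0, 4.0), 50_000),
    Option("platinum", Objective(0.0, 0.0), 250_000),
]

print(gaps(targets, capabilities))
print(cheapest_option(options, targets["erp"]).name)
```

Laying the options out this way lets the budget holders see directly which tier a given target forces them into, and what relaxing the target would save.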
You have a remote site for this. If you have not considered WAN optimization, there are software and appliance options for doing it.