Big Data is characterized by the four V’s: a large Volume of data, delivered with the Velocity of a frequent arrival rate, containing a Variety of data sources, formats and quality, and raising Veracity questions about how far the data can be trusted. But for all the Value that Big Data can bring, it also creates real challenges for any IT department’s disaster recovery plan. Here are some of the bigger ones.
Challenges of Data Volume and Velocity
Larry Simon had this to say about the impact of Big Data on DR: “Big Data poses unique challenges because you’re not only dealing with huge volumes of data, but potentially a rapid, always-on stream of new data that you want to lose as little as possible during an outage”.1
Disaster recovery is all about the speed with which you can restore critical operations and the data they require. When large volumes of critical data are loading constantly, you have that much more to lose, and that much less tolerance for gaps in access caused by, say, a storm dropping communications lines. You need to ensure that your data accumulates somewhere safe and sound, and can be made available again as soon as possible.
The value of your captured data, and your dependence on it, may mean that your Recovery Point Objective (RPO) is close to zero. In other words, you will need to create DR facilities that can protect against data loss even while large volumes of data arrive continuously.
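As a minimal sketch of what a near-zero RPO implies, the snippet below acknowledges a record only after it has been written to both sites, so no acknowledged data is lost to a primary-site failure. The in-memory "stores" are hypothetical stand-ins for whatever database or object store you actually run.

```python
# Minimal sketch of a near-zero-RPO ingest path: each incoming record is
# persisted at both the primary store and a DR replica before it is
# acknowledged. The DurableLog class is a hypothetical stand-in for a real
# data store and its DR copy.

class DurableLog:
    """Hypothetical append-only store standing in for a real data store."""
    def __init__(self, name):
        self.name = name
        self.records = []

    def append(self, record):
        self.records.append(record)   # a real store would commit/fsync here


def ingest(record, primary, replica):
    """Acknowledge a record only after both sites have persisted it."""
    primary.append(record)
    replica.append(record)            # synchronous copy to the DR site
    return "ack"                      # caller sees success only once both writes land


if __name__ == "__main__":
    primary = DurableLog("primary-site")
    replica = DurableLog("dr-site")
    for event in ({"id": 1, "value": 42}, {"id": 2, "value": 7}):
        ingest(event, primary, replica)
    # Both sites now hold identical data, so the recovery point is the
    # last acknowledged record rather than last night's backup.
    assert primary.records == replica.records
```

The trade-off is latency: a synchronous copy slows every write, while asynchronous replication is faster but leaves a small, non-zero RPO. Which way you lean depends on how much arriving data you can afford to lose.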
Challenges of Data Veracity
Mr. Simon had this to say about data integrity in a Big Data world: “The ecosystem is changing so rapidly that the pieces are always moving. Making sure you restore the right versions of things can be a nightmare”.2
Big Data systems are complex and built on emerging, rapidly-evolving technology. If you are going to set up a recovery site for a Big Data application, you need to keep that location current: the right versions of the right database software and tools, as well as the matching replication software, analysis tools and so on.
The specialized tools and software that handle Big Data are not the ones that have been in use for the past 20 or 30 years in data processing. They tend to be newly developed, and the challenge is to keep them in sync with any older, established legacy products. Typically this also means you need more than just an “end-of-day” off-site back-up, and probably a “warm” or “hot” recovery site, to ensure integrity following a Big Data disruption.
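One way to stay on top of version drift, sketched below with illustrative component names and made-up version strings, is to compare a software manifest of the primary site against the recovery site and flag anything missing or out of date before you ever need to fail over.

```python
# Minimal sketch of a version-drift check between the primary site and the
# recovery site. The component names and version strings are illustrative
# placeholders; in practice the manifests would come from your deployment
# tooling or package inventory.

primary_manifest = {
    "hadoop": "3.3.6",
    "kafka": "3.7.0",
    "spark": "3.5.1",
    "replication-agent": "2.4.0",
}

dr_manifest = {
    "hadoop": "3.3.6",
    "kafka": "3.6.1",          # lags the primary: a restore here may misbehave
    "spark": "3.5.1",
    # "replication-agent" missing entirely at the DR site
}

def version_drift(primary, dr):
    """Return components whose DR version is missing or differs from primary."""
    drift = {}
    for component, version in primary.items():
        dr_version = dr.get(component)
        if dr_version != version:
            drift[component] = (version, dr_version)
    return drift

if __name__ == "__main__":
    for component, (want, have) in version_drift(primary_manifest, dr_manifest).items():
        print(f"{component}: primary={want}, dr={have or 'MISSING'}")
```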
Challenges of Data Throughput
Again we refer to Mr. Simon: “In a Big Data system, the relative speeds of all of the components become important. Systems that run fine in one configuration may have crippling bottlenecks in a temporary DR configuration”.3
If you’re doing tons of searches in Google, you want to make sure that if you have a failure in your primary site, it will fail over to the secondary site quickly, and that the secondary site has sufficient capacity. No matter what function you’re performing, if your failover site is short on bandwidth or “burst” capacity, you will have a backlog of demand to work through.
Since Big Data involves a constant stream of data capture, processing and feedback to users, the size of any work pileup depends on the length of the outage you incur. Your important data may never disappear, but it will all still need to be queued and processed, which can be a daunting proposition when you are just recovering from a long outage.
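To see why, consider a rough back-of-the-envelope calculation; the ingest rate, DR throughput and outage length below are assumptions chosen purely for illustration.

```python
# Rough catch-up arithmetic for a streaming workload after an outage.
# The rates below are illustrative assumptions, not measurements: data keeps
# arriving during the outage, so the failover site must process both the
# backlog and the live stream, and it only catches up if its throughput
# exceeds the arrival rate.

arrival_rate = 50_000          # records per second, assumed ingest rate
dr_throughput = 60_000         # records per second the DR site can process
outage_hours = 4               # assumed length of the outage

backlog = arrival_rate * outage_hours * 3600          # records queued during the outage
surplus = dr_throughput - arrival_rate                # spare capacity to drain the backlog

catch_up_hours = backlog / surplus / 3600
print(f"Backlog: {backlog:,} records")
print(f"Catch-up time at DR capacity: {catch_up_hours:.1f} hours")
# With only 20% headroom, a 4-hour outage takes about 20 hours to work off;
# a DR site sized exactly at the arrival rate never catches up at all.
```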
Big Data changes the game for disaster recovery, presenting new challenges for any DR plan. Next week we will look at some strategies to help meet these new challenges.
1, 2, 3. Larry Simon is Managing Director of Inflection Group Inc. and co-creator of the University of Toronto School of Continuing Studies’ certificate program in Management of Enterprise Analytics and Big Data.