Bridging the data migration gap

In this special guest article, Tony Velcich, Senior Director of Product Marketing at WANdisco, discusses what he calls the “data migration gap” and how that gap has become even larger and more acute given recent events. Tony is an accomplished leader in product management and marketing with over 25 years of experience in the software industry. At WANdisco he helps drive strategy, content and go-to-market activities. Tony has a strong background in data management, having worked at leading database companies including Oracle, Informix and TimesTen, where he led strategy in areas such as big data analytics for the telecommunications industry, sales force automation, and sales and customer experience analysis.

Even as a recent study found that cloud migration remains a top priority for businesses in 2020 and beyond, big data stakeholders still face a significant gap between what they want to do and what they can do.

Merriam-Webster defines “gap” as “an incomplete or deficient area” or “a problem caused by some disparity.” In the case of enterprise data lakes, this disparity is the difference between what big data professionals want to migrate to the cloud and what they can migrate without negatively impacting business continuity.

I call this the data migration gap. And this gap has become even larger and more acute given recent events. Migration to the cloud has never garnered so much interest, from the executive suite to the work-from-home trenches. The COVID-19 pandemic has made everyone realize that cloud migration is crucial for remote productivity. But even as businesses move critical applications and data to the cloud, much of the data accumulated in recent years is being left behind in on-premises legacy data lakes.

Data Lakes: Left Behind

The on-premises data lake was designed and adopted as a cost-effective way to store petabytes of data at a lower price than a traditional data warehouse. Yet companies quickly realized that storing data and using it were two entirely different challenges. Organizations weren’t able to match the performance, security, or business integration of their data warehouses, which were more expensive but easier to manage.

Today, data lakes survive in their original form in industries where time-sensitive, information-rich analytics are less important, and where costs outweigh efficiency. But more dynamic businesses are moving from on-premises storage and billions of batch queries to real-time analysis on massive cloud-based data sets. For these companies, the question is no longer whether to move petabytes of critical and ever-changing customer data, but how to do so without disrupting business, while minimizing the time, costs and risks associated with legacy data migration approaches.

Current methods: advantages and disadvantages

What strategies are used to bridge the data migration gap? How are companies currently migrating their active data? There are three common approaches, each with its relative advantages (and pitfalls):

  • Lift and shift – A lift-and-shift approach is used to migrate applications and data from one environment to another with zero or minimal modifications. However, there is a danger in assuming that what worked on premises will work as-is in the cloud. Lift-and-shift migrations don’t always take full advantage of the efficiency and improved capabilities of the cloud. Often, gaps in existing implementations move with data and applications to the new cloud environment, making this approach acceptable only for simple or static datasets.
  • Incremental copy – An incremental copy approach periodically copies new and changed data from the source to the target environment over multiple passes. The original source data is migrated to the target first, and then incremental changes are processed on each subsequent pass. The main challenge of this approach arises with a large volume of changing data: the passes may never catch up with the rate of change, making it impossible to complete the migration without requiring downtime.
  • Dual pipeline / dual ingestion – A dual pipeline or dual ingestion approach ingests new data simultaneously into the source and target environments. This approach requires considerable effort to develop, test, operate and maintain multiple pipelines. It also requires that applications be modified to always update both the source and target environments on any data change, which demands significant development effort.
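To make the incremental-copy trade-off concrete, here is a minimal sketch in Python using in-memory dictionaries to stand in for the source and target stores. All names here (migrate_incremental, get_changed_keys, etc.) are illustrative, not part of any real migration tool; a real system would read change logs from a distributed file system or database.

```python
def migrate_incremental(source, target, get_changed_keys, max_passes=5):
    """Full initial copy, then repeated delta passes.

    Returns the number of passes used. Raises if the deltas never
    converge -- the case where downtime would be needed to finish.
    """
    # Pass 1: bulk-copy everything currently in the source.
    target.update(source)

    # Subsequent passes: copy only keys changed since the last pass.
    for pass_num in range(2, max_passes + 1):
        changed = get_changed_keys()
        if not changed:
            return pass_num - 1  # caught up: migration complete
        for key in changed:
            target[key] = source[key]

    raise RuntimeError("deltas never converged; downtime required")


# Hypothetical usage: one round of changes arrives mid-migration,
# then the source goes quiescent and the passes catch up.
source = {"a": 1, "b": 2}
target = {}
deltas = iter([["b"], []])
passes = migrate_incremental(source, target, lambda: next(deltas))
```

The key observation is the failure mode: if `get_changed_keys` keeps returning non-empty sets faster than passes complete, the loop exhausts `max_passes` and the migration cannot finish without pausing writes, which is exactly the downtime problem described above.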

A fourth path: bridging the data migration gap

A different strategy, perhaps better suited to the dynamic data environments of most data-intensive businesses, is to enable migrations without changing applications or disrupting business, even while datasets are actively being modified. This paradigm enables migrations of any scale with a single pass over the source data, while supporting continuous replication of ongoing changes from source to target.

While existing methodologies have their validity and use cases, new technologies allow big data players to bridge the data migration gap more cost-effectively and efficiently. Choosing the right option can make cloud migration faster and more accessible for any business.

Sign up for the free insideBIGDATA newsletter.

Sean N. Ayres