The advantages of cloud data warehouses (DWHs) over traditional DWHs are a widely discussed topic. For an organization, the benefits of potential new features have to outweigh the effort a migration causes.
But the migration effort itself is not the only frequently cited argument against a transformation. Many companies still have concerns about data security in the cloud. To bring more clarity to this topic, we summarize the advantages, discuss the challenges and share best practices.
Let’s start with the question: What advantages can be expected from a migration, and which have often been experienced in practice?
- Speed and performance: A cloud DWH enables organizations to achieve a higher operational speed through improved infrastructure. Thanks to advanced database technologies and algorithms, cloud DWHs offer enhanced query performance.
- On-demand scalability: With a cloud DWH, companies can increase capacity as needed. Cloud DWHs can simultaneously serve multiple areas of the business.
- Cost efficiency: Customers pay only for the resources they use and save money on storage hardware. Moreover, a SaaS model shifts infrastructure maintenance to the provider, which largely removes in-house maintenance costs. Keep in mind, however, that with excessive usage the cost can be even higher than before.
- Easier recovery approach: In a cloud DWH, backup and storage are already covered, which saves time and cuts spending. The provider takes care of these tasks.
- Ease of use: Due to an improved user interface, a cloud DWH is often simpler to use. Cloud DWHs usually offer a variety of connectable tools, which makes working with data applications very convenient.
Our recommendation for every organization thinking about migrating to the cloud is to start with an investigation of all relevant factors. For a well-planned and smooth transformation, it is essential to be aware of all the following aspects:
- What current infrastructure do we use?
- What is our staff capacity? Do our people have the skills for such a transformation? Do they have the necessary time?
- What are we trying to achieve with this initiative?
If these prerequisites are not clear, organizations will face various challenges.
Challenges of a cloud DWH migration
Some of the most typical challenges that occur when migrating from one or more on-premises DWHs to a cloud DWH are:
- Incompatibility: During the migration, unforeseen gaps between the cloud architecture and legacy systems occur and cause delays.
- Lack of transparency: Gaining transparency on data assets as a first step is critical for an efficient migration. On the one hand, searching for missing information on data structures and repositories during the migration is time-consuming, whereas understanding dependencies and data ownership up front keeps the process efficient. On the other hand, “just” migrating everything without respecting structures and dependencies can block data utilization in the long term and “clog” the new DWH. Communicating the migration pipeline transparently helps people experience a seamless transformation and continue working as usual.
- Lack of expertise: Meeting the technical challenges requires sufficient experience with and knowledge of both the on-premises DWH and the cloud DWH. Typical problem areas are APIs, security features, testing and legacy queries.
- Lack of control over cloud migration stages: Companies may face cloud sprawl if they fail to set up centralized monitoring. This is a major roadblock for troubleshooting.
Note: Depending on the data maturity of a company, it might pay off financially to hire consultants. With broad experience and a neutral view, they can support an efficient migration process. The cost of consulting can - again depending on the organization (!) - be lower than the cost of a drawn-out, frustrating and unsuccessful self-performed transformation. In addition, you benefit from the experience that a consultant brings in.
The first best practice we want to share with you is the old proverb: Good planning is half the job.
Planning a cloud migration
Let’s assume that, after a holistic analysis of all relevant aspects, an organization decides in favor of a cloud DWH. The next question is which provider best fits the company’s requirements.
The cloud DWH market is dominated by a few big providers, including:
- BigQuery from Google
- Redshift from AWS
- Synapse from Microsoft Azure
But there are of course also specialized and smaller providers in the market. We have chosen the following key criteria for an exemplary comparison. This can serve as a guideline or orientation on how to perform an evaluation.
The key criteria are:
- Available connectors and APIs
- Specific functions
And here are the results summarized for you:
BigQuery by Google
Redshift by AWS
Synapse by Microsoft Azure
Note: One of the most relevant criteria is missing here. Performance is not included in the charts because it depends on factors like query type, query volume and the surrounding ecosystem. Our recommendation is to use your most frequent query as the basis for an individual comparison. You can usually get limited free access to these cloud DWHs for a short trial period, which helps to identify performance differences.
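Such an individual comparison can be sketched in a few lines of Python. This is a minimal sketch, not a full benchmark: the names `benchmark` and `compare` are our own, and each candidate is assumed to be a zero-argument function that runs your most frequent query through the respective vendor's client. The lambdas at the bottom are placeholders only.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: executes your most frequent query
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(candidates: Dict[str, Callable[[], object]],
            repeats: int = 5) -> Dict[str, float]:
    """Benchmark every candidate and return the results, fastest first."""
    results = {name: benchmark(fn, repeats) for name, fn in candidates.items()}
    return dict(sorted(results.items(), key=lambda item: item[1]))


if __name__ == "__main__":
    # Placeholders: replace each lambda with a real call to a trial account.
    placeholders = {
        "warehouse_a": lambda: time.sleep(0.02),
        "warehouse_b": lambda: time.sleep(0.01),
    }
    for name, seconds in compare(placeholders, repeats=3).items():
        print(f"{name}: {seconds:.3f}s")
```

Using the median rather than the mean makes the result less sensitive to a single slow run, for example a cold cache on the first execution.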
Further criteria which are worth having a look at are:
- Ingestion of streaming data
Preparing a cloud migration
The next step on the road to a cloud DWH is to prepare the migration process.
At this point, it is necessary to understand in depth the dependencies between data assets, the usage and status of elements, and the responsibilities and applications involved. A data catalog is a powerful tool to achieve this: it helps to create transparency across all assets such as tables, views, ETLs, reports and more. Furthermore, the insights gained are made available to the whole team involved. Based on this rich information, the migration can be scheduled optimally.
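To illustrate how dependency information can drive the schedule, here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The asset names and the `dependencies` mapping are hypothetical; in practice this information would come from your data catalog.

```python
from graphlib import TopologicalSorter

# Hypothetical catalog export: each asset maps to the assets it depends on.
dependencies = {
    "sales_report": {"sales_view"},
    "sales_view": {"orders", "customers"},
    "orders": set(),
    "customers": set(),
}


def migration_waves(deps):
    """Group assets into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # assets whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(migration_waves(dependencies))
# [['customers', 'orders'], ['sales_view'], ['sales_report']]
```

Assets within one wave do not depend on each other, so they can be migrated in parallel, while a report is only moved once the views and tables it reads from are in place.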
Tip: An agile, iterative approach has proven itself in practice. Companies migrate step by step, one organizational unit at a time. In this way, the experience gained from the pilot projects can be applied to the follow-up projects.
As mentioned above, unused and outdated data can be left out of the migration to increase efficiency. For good data governance, a catalog provides the option to neatly document where this data is archived, so that nothing is irrevocably lost.
Keeping an overview of the assets already migrated and of the upcoming schedule is important, and not only for troubleshooting. We highly recommend running automated checks on job success. Unexpected gaps between the old infrastructure and the new one are very common and can - if discovered in time - be solved easily.
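As a rough illustration, unused assets could be separated from active ones based on their last access date. This is a sketch under assumptions: the `last_accessed` field, the asset names and the 365-day threshold are hypothetical and would need to match what your catalog or query logs actually provide.

```python
from datetime import date, timedelta


def split_by_usage(assets, today, max_idle_days=365):
    """Split assets into (migrate, archive) lists by last access date."""
    cutoff = today - timedelta(days=max_idle_days)
    migrate = [a for a in assets if a["last_accessed"] >= cutoff]
    archive = [a for a in assets if a["last_accessed"] < cutoff]
    return migrate, archive


# Hypothetical catalog entries.
assets = [
    {"name": "orders", "last_accessed": date(2024, 5, 1)},
    {"name": "legacy_export_2015", "last_accessed": date(2016, 1, 10)},
]

migrate, archive = split_by_usage(assets, today=date(2024, 6, 1))
print([a["name"] for a in migrate])  # ['orders']
print([a["name"] for a in archive])  # ['legacy_export_2015']
```

The archive list is the part worth documenting in the catalog, together with the location where each asset is stored.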
Again, a central point for documentation will facilitate the joint work of the team.
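One simple form of the automated checks recommended above is a comparison of row counts between the old and the new system after each migration job. The sketch below assumes the counts have already been collected into two dictionaries; the function name `find_gaps` and the table names are hypothetical, and matching row counts are only a first sanity check, not proof of identical data.

```python
def find_gaps(source_counts, target_counts):
    """Return a dict of table -> problem for every mismatch found."""
    gaps = {}
    for table, expected in source_counts.items():
        actual = target_counts.get(table)
        if actual is None:
            gaps[table] = "missing in target"
        elif actual != expected:
            gaps[table] = f"row count {actual} != {expected}"
    return gaps


# Hypothetical counts gathered from the on-premises DWH and the cloud DWH.
source = {"orders": 1200, "customers": 300, "invoices": 950}
target = {"orders": 1200, "customers": 298}

print(find_gaps(source, target))
# {'customers': 'row count 298 != 300', 'invoices': 'missing in target'}
```

An empty result means no gaps were detected for the checked tables; anything else can be fed into the central documentation or an alerting channel.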
Migrating to a cloud DWH is an investment in the future. It immediately enables more analysis power and supports a data-driven business in the long term. However, migration also means an investment of time and energy. That is why it is very important that everyone involved (from the data engineer to the management level) stands behind the project and is committed to it. As described above, good preparation is key to a successful initiative. Tools such as a data catalog and, for example, data quality tools (if needed) can greatly facilitate and accelerate the entire process.
After a successful migration to the cloud, the next step is to exploit the full potential of a cloud DWH. From the optimization of existing processes to the implementation of new use cases, the aim is to create added value through powerful analyses.
If you would like to discover more about how Contiamo can support your migration and open up new analytics opportunities, check out our DWH migration best practices.