1. Make all data explorable
A suitable data catalog lets data experts search all available assets and data structures in seconds. Automation was the key to success here: it keeps the catalog up to date without enormous manual effort. Analysts can also use the catalog to check at any time whether the data is current and suitable for the question at hand.
In addition to clear documentation and business context, a powerful search helps users find the data they need efficiently. The search function was a core criterion when selecting a suitable catalog tool. Because the available data volume was very large, diverse filtering and sorting functions were important for the user group. In general, a catalog's features should always fit its actual user group.
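As an illustration of the kind of search, filtering and sorting described above, here is a minimal sketch in Python. It is not the catalog tool used in the project: the `CatalogEntry` fields, the `search_catalog` function and the sample entries are all hypothetical and only show the idea of a keyword search with an optional tag filter and a sort order.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogEntry:
    # Hypothetical metadata fields; a real catalog tool defines its own schema.
    name: str
    description: str
    owner: str
    tags: tuple[str, ...]
    last_updated: date


def search_catalog(entries, query, tag=None, sort_by="last_updated", descending=True):
    """Return entries whose name or description contains `query`
    (case-insensitive), optionally restricted to one tag, sorted by `sort_by`."""
    needle = query.lower()
    hits = [
        e for e in entries
        if (needle in e.name.lower() or needle in e.description.lower())
        and (tag is None or tag in e.tags)
    ]
    return sorted(hits, key=lambda e: getattr(e, sort_by), reverse=descending)


# Invented sample entries, for illustration only.
CATALOG = [
    CatalogEntry("sales_orders", "Daily order lines from the web shop",
                 "sales", ("orders", "finance"), date(2024, 3, 1)),
    CatalogEntry("customer_master", "Customer records, one row per customer",
                 "crm", ("customers",), date(2024, 2, 15)),
    CatalogEntry("order_returns", "Returned order items with reason codes",
                 "logistics", ("orders",), date(2024, 3, 5)),
]

results = search_catalog(CATALOG, "order", tag="orders")
for entry in results:
    print(entry.name, entry.last_updated)
```

Sorting by `last_updated` by default puts the most recently refreshed assets first, which also gives analysts a quick hint about how current each data set is.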
2. Analyze data
Data profiling tools and functions allow data experts to quickly gain a good first impression of the identified data. Statistics, real data distributions and properties of the data set are useful information, as are sample values.
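A first impression of this kind can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not the profiling function of any particular tool: `profile_column` and the sample columns are hypothetical, and the sketch only uses the standard library to compute counts, missing values, distinct values, basic statistics and sample values.

```python
import statistics
from collections import Counter


def profile_column(values, sample_size=3):
    """Build a small profile for one column given as a list of values.
    None is treated as a missing value."""
    present = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "samples": present[:sample_size],
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric column: summary statistics describe the distribution.
        profile.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),
            median=statistics.median(present),
        )
    else:
        # Non-numeric column: the most frequent values are more informative.
        profile["top_values"] = Counter(present).most_common(3)
    return profile


# Invented sample data, for illustration only.
order_amounts = [19.5, 42.0, None, 42.0, 7.25]
countries = ["DE", "DE", "AT", None, "CH"]

amount_profile = profile_column(order_amounts)
country_profile = profile_column(countries)
print(amount_profile)
print(country_profile)
```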
Depending on the question and the team, a direct connection between the data catalog and the BI tool is very helpful. Many companies rely on Tableau, Power BI or Redash to provide deeper insights. A central overview was also created to give the data teams a clear view of all other tools used for analysis. The goal was to make the jump from the data directly to the tool of choice as simple and fast as possible.
3. Curate data
After the data for a use case had been identified and analyzed quickly and successfully, the typical steps of data preparation followed: cleansing, enrichment and combination with other data.
SQL offers a well-suited way to further process the data. The modern data architecture implemented in this project allows flexible querying of data from different sources.
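The preparation steps named above (cleansing, enrichment and combination) can be illustrated with a small SQL example run from Python. This is only a sketch under stated assumptions: an in-memory SQLite database stands in for the project's actual architecture, and the tables `raw_orders` and `customers`, their columns and their contents are invented, with the two tables playing the role of two different sources.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real query layer (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL, country TEXT);
CREATE TABLE customers (customer_id INTEGER, segment TEXT);

INSERT INTO raw_orders VALUES
  (1, 10, 19.5, ' de'),
  (2, 11, NULL, 'AT'),
  (3, 10, 42.0, 'DE '),
  (4, 12, 7.25, 'ch');

INSERT INTO customers VALUES (10, 'retail'), (11, 'wholesale'), (12, 'retail');
""")

curated = conn.execute("""
SELECT o.order_id,
       UPPER(TRIM(o.country)) AS country,   -- cleansing: normalize country codes
       o.amount,
       c.segment                            -- enrichment: add the customer segment
FROM raw_orders AS o
JOIN customers AS c                         -- combination with a second source
  ON c.customer_id = o.customer_id
WHERE o.amount IS NOT NULL                  -- cleansing: drop rows without an amount
ORDER BY o.order_id
""").fetchall()

for row in curated:
    print(row)
```

Keeping all three steps in one declarative query makes the curation logic easy to document in the catalog next to the data it produces.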
Overall, providing all employees with the most suitable tools was of central importance for efficient data curation. To ensure this, several interfaces were developed and the data catalog was established as the central place for documentation. In this way, flexibility and transparency can be combined in day-to-day operations.
