Data quality describes how fit data is for its intended use, typically measured along six dimensions: accuracy, uniqueness, consistency, completeness, validity, and timeliness. Data quality is an aspect of data governance.
Consistency means that the data format is uniform and correct for each user. A consistent database enables increased confidence in analysis results and simplified execution of database transactions. Inconsistent data, on the other hand, can lead to serious analysis errors and, as a result, incorrect business decisions.
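Two of the dimensions above, uniqueness and completeness, can be measured with very little code. The following is a minimal sketch in plain Python; the customer records and field names are illustrative assumptions, not part of any standard.

```python
# Sketch: measuring two data quality dimensions on hypothetical records.

def uniqueness(records, key):
    """Share of records whose key value appears exactly once."""
    values = [r[key] for r in records]
    unique = [v for v in values if values.count(v) == 1]
    return len(unique) / len(values)

def completeness(records, field):
    """Share of records with a non-empty value in the given field."""
    filled = [r for r in records if r.get(field) not in (None, "")]
    return len(filled) / len(records)

customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},                # incomplete email
    {"id": 2, "email": "c@example.com"},   # duplicate id
]

print(uniqueness(customers, "id"))       # 1 of 3 ids is unique
print(completeness(customers, "email"))  # 2 of 3 emails are filled
```

In practice such checks run continuously against production tables, and the resulting scores feed into data quality dashboards and alerts.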
Data management software is a tool that enables the effective organization of data of different origins and characteristics. Comprehensive management of even large amounts of data includes consolidation, contextualization, harnessing for analysis and reporting, and enabling technologies such as artificial intelligence.
Data governance defines the processes and responsibilities that ensure the quality and security of data. The following measures are typical tasks of data governance: assignment and definition of user rights, task and responsibility management, control and monitoring of data.
Master data management is a combination of processes and technologies for synchronizing and harmonizing master data in an organization. It enables the creation of a unified foundation that provides accurate and complete master data to all employees in the organization.
Data quality management encompasses measures and practices that ensure the suitability of data for analysis and decision making and enable the highest level of data quality. The better an organization's data quality, the more accurate its analysis results can be.
A data dictionary, also known as a metadata repository, is a central location where metadata, including information about data and its origin, format, meaning, and relationships, is stored. The data dictionary ensures data consistency when data is processed in different departments of the company.
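A data dictionary can be sketched as a simple lookup structure. The entry below is a hypothetical example; the table, column, and field names are assumptions chosen to illustrate the kinds of metadata (origin, format, relationships) the definition mentions.

```python
# Sketch of a data dictionary: metadata about data, keyed by column name.

data_dictionary = {
    "customers.email": {
        "description": "Primary contact e-mail address",
        "data_type": "VARCHAR(255)",
        "source_system": "CRM",                    # origin
        "format": "RFC 5322 address",              # format
        "related_to": ["orders.customer_email"],   # relationships
    },
}

def describe(qualified_name):
    """Look up a column's metadata, or report that it is undocumented."""
    entry = data_dictionary.get(qualified_name)
    if entry is None:
        return f"{qualified_name}: not documented"
    return f"{qualified_name}: {entry['description']} ({entry['data_type']})"

print(describe("customers.email"))
```

Real metadata repositories store such entries in a database and expose them through search interfaces, but the structure of an entry is essentially the same.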
A data repository is a unit in which data sets are collected and stored for a specific purpose. A data repository provides users with a consolidated space for storing data, in particular data relevant to ongoing operations.
A metadata repository, also known as a data dictionary, is a central location where metadata, including information about data and its origin, format, meaning, and relationships, is stored. The metadata repository ensures data consistency when data is processed in different departments of the company.
Data maintenance includes measures whose goal is to improve database performance. Maintenance measures help to improve speed, optimize capacity, find and fix errors and hardware failures. Good data maintenance ensures constant access and usability of the data.
A data model describes the arrangement and classification of data and separates structural information from its attributes. The advantages of a clean data model are flexible integration options, easier scaling, the possibility to check historical data, and faster adaptation to changing data landscapes.
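The separation of structure and attributes can be illustrated with a tiny model. The entities below (Customer, Order) and their fields are illustrative assumptions; the point is that the structure (entities and their relationship) is declared explicitly and each attribute is typed.

```python
# Sketch of a small data model as Python dataclasses.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Order:
    order_id: int
    total: float

@dataclass
class Customer:
    customer_id: int
    name: str
    orders: List[Order] = field(default_factory=list)  # 1:n relationship

c = Customer(customer_id=1, name="Example Corp")
c.orders.append(Order(order_id=100, total=49.90))
print(len(c.orders))  # 1
```

The same model could equally be expressed as an entity-relationship diagram or a relational schema; the notation changes, the structure does not.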
The tasks of metadata management consist of defining and designing strategies and measures that enable the access, maintenance, analysis and integration of data in the company. The role of metadata management is becoming increasingly important as the amount and variety of data increases.
Big data management describes the management, organization and control of large volumes of unstructured and structured data. The central goals are to ensure good data accessibility and a high level of data quality.
A data catalog is an inventory of all data sets in an organization. The catalog helps to understand the origin of the data, its use, quality, relationships and business context. An important application area is the search for data for analytical or business purposes.
An AI-driven data catalog is a data catalog that provides a high level of automation based on artificial intelligence (AI) or machine learning (ML). The catalog helps understand the origin of data, its usage, quality, relationships, and business context. Next-generation data catalogs leverage AI to enable automated metadata tagging, data provenance tracking, and data consumption analysis.
Data stewards are persons who are responsible for data sets within the data catalog. They keep these data sets up to date and maintain the state of their metadata. They take on the role of an intermediary between data owners and users.
Data integration tools are tools that implement the merging of data from different sources. Data integration tools act as a link between source systems and analytics platforms. They usually include optimized native connectors for batch loading from various common data sources.
Data virtualization is a data management technique that allows an application to retrieve and manipulate data without needing to know technical details such as where or how it is stored. Data virtualization does not create a physical copy of the data, but creates a virtual layer to mask the complexity of the underlying data landscape. This approach offers a fast, flexible alternative to classic integration.
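The core idea of the virtual layer can be sketched in a few lines: a view function answers queries by joining two source systems on demand, without ever copying their data into a central store. The source names (crm, billing) and fields are assumptions for illustration.

```python
# Sketch of data virtualization: two in-memory "source systems" and a
# virtual view that joins them on demand, creating no physical copy.

crm = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
billing = [{"customer_id": 1, "balance": 120.0}]

def virtual_customer_view(customer_id):
    """Resolve a customer across both sources at query time."""
    person = next((r for r in crm if r["id"] == customer_id), None)
    account = next((r for r in billing if r["customer_id"] == customer_id), None)
    if person is None:
        return None
    return {"name": person["name"],
            "balance": account["balance"] if account else 0.0}

print(virtual_customer_view(1))  # {'name': 'Alice', 'balance': 120.0}
```

A production virtualization platform does the same thing against databases and APIs, adding query optimization and caching, but the consuming application still sees only the unified view.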
The field of business intelligence (BI) comprises strategies and technologies used by companies to analyze current and historical business data. The central task of business intelligence is the data-based answering of strategic questions.
A business intelligence platform is a tool that supports the BI area in extracting, analyzing and visualizing a company's data. Business decisions are then made based on the results generated. Business intelligence platforms typically include three main areas: Analysis, Information Delivery and Platform Integration.
Data profiling describes procedures for analyzing and evaluating data in terms of its key features such as content, characteristics, links and dependencies. Data profiling provides a critical understanding of data that organizations can use as an advantage.
A data profiler is a tool that enables the analysis and testing of data sets and allows the early detection of problems. By applying descriptive statistics to the analysis of the content and structure, the effort of data preparation can be better estimated.
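A minimal profiling pass looks like the following sketch: descriptive statistics over one column of a hypothetical data set, plus a simple anomaly count. The column, values, and the 0-120 plausibility range are illustrative assumptions.

```python
# Sketch of data profiling: descriptive statistics for one column.
import statistics

ages = [34, 29, 41, 29, None, 310, 37]  # 310 and None look suspicious

valid = [a for a in ages if a is not None]
profile = {
    "rows": len(ages),
    "missing": ages.count(None),
    "min": min(valid),
    "max": max(valid),
    "mean": round(statistics.mean(valid), 1),
    "out_of_range": sum(1 for a in valid if not 0 <= a <= 120),
}
print(profile)
```

Even this crude profile immediately surfaces the missing value and the implausible age, which is exactly the early detection of problems the definition describes.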
The General Data Protection Regulation (GDPR) is a regulation of EU law that focuses on data protection and privacy in the European Union. External data transfers are also covered by the regulation.
Data silos are data sets that are controlled and managed solely by a limited group of users in an organization. These datasets cause problems when the information they contain is needed by other departments because they do not have access.
Big Data refers to data sets whose volume is large and growing at ever-increasing speed. There are five main characteristics of Big Data, the five Vs: volume, velocity, variety, veracity, and value. Big Data makes it possible to gain more valuable insights, but it also leads to new technological challenges.