Glossary

From Competence Center
Revision as of 16:03, 29 June 2022 by Bryan

Data management glossary

A

Analytical data

"Classical" analytical data is a particular type of enterprise data. It is derived from business operations and transaction data and is mainly used to meet standard reporting and analytics requirements by applying descriptive analytics.

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021


Advanced analytical data

Advanced analytical data is a particular type of enterprise data. It is created by applying methods of data science that go beyond purely descriptive analytics (i.e., towards predictive/prescriptive analytics). It is used to identify patterns or correlations in complex (structured and unstructured) data sets (including text and geospatial data, for instance).

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021




B

Big Data

Big data are data that are so large and diverse that they require cost-effective, innovative forms of data collection, storage, management, analysis, and visualization. Big data are typically characterized by three V's. Velocity is the speed at which data is created and the speed at which it should be analyzed and used. Volume refers to the size of the data, which is typically in the range of terabytes to exabytes. Variety refers to the range of data types in scope, from more traditional structured sources (spreadsheets, SQL database tables) to semi-structured data (XML, JSON, Semantic Web data) and unstructured data (images, texts, files). In recent years, three more V's have been added to the traditional framework to characterize big data: variability, veracity, and value. Veracity refers to the differing levels of reliability and truthfulness of big data sources, while variability describes the high frequency of changes within a data source. Finally, value describes the fact that, while single data points may not be of high value, the value of big data comes from analyzing huge amounts of data and the trends within and between datasets.

Source: Amir Gandomi, Murtaza Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management, Volume 35, Issue 2, 2015, Pages 137-144, https://doi.org/10.1016/j.ijinfomgt.2014.10.007.


Business Analytics

Business analytics is defined as the exploration and investigation of past business data to gain valuable insights and drive business planning. These activities depend on a sufficient volume of data as well as on a sufficient level of data quality. This requires data managers to integrate and reconcile data across various sources (i.e. from various business units, divisions, departments, branches, and information systems), with the goal of compiling a complete picture of the company’s past and current state for deriving future scenarios.

Source: Legner, Christine; Pentek, Tobias; Ofner, Martin; Labadie, Clément: CDQ Trend Study: Trends in corporate data management, 2017 (https://cc-cdq.ch/cdq-trend-study)


Business Capabilities

Business capabilities define a set of data-based skills, routines, and resources a company needs to have in order to achieve its business goals through data monetization. The Business Capability design area specifies what data-related business capabilities are required, which of these are already in place to some extent and need to be enhanced, and which ones need to be established from scratch.

Source: Dissertation Tobias



Business Engineering

The method-oriented, model-based theory of construction for companies in the Information Age.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)


Business Object

A business object represents a real or imagined object of value generation that can be used, changed, or analyzed in business processes. It describes a recurring set of information used in multiple business contexts and at least one data domain. It is specified by attributes.

Source: Schmidt, Alexander (dissertation)



Business rule

A business rule is a statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business. Business rules may be defined as business definitions for business use (to represent policies, practices and procedures), or defined as executable business rule statements for use in rule-driven systems, or both.

Source: The Business Rules Group (BRG) and the Object Management Group (OMG)



Business value

Refers to the impact of data management on the business with regard to financials, business processes, customers, and organizational growth.

Source: Legner, Christine; Pentek, Tobias: Data Excellence Model: Short Description and Basic Terminology, 2017 (https://cc-cdq.ch/data-excellence-model)


C

CAD

Computer-aided design: designing a product with the help of information technology.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)


Cloud services

Data-related services delivered via the Internet in an on-demand model.

Source: Legner, Christine; Pentek, Tobias; Ofner, Martin; Labadie, Clément: CDQ Trend Study: Trends in corporate data management, 2017 (https://cc-cdq.ch/cdq-trend-study)


Core business object

The central actors (business partners, customers, suppliers, and employees), products (incl. materials), and operating materials (systems, etc.) of a company and its ecosystem. These objects are represented as master data for purposes of IT.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)


D

Data analyst

The Data Analyst is a core data & analytics role in the CC CDQ Reference Model for Data & Analytics Governance. (S)he is responsible for the implementation (development and deployment) and maintenance of reports and ad hoc analyses.

Source: Reference Model for Data & Analytics Governance



Data applications

In the CDQ Data Excellence Model, the data applications design area is about planning, implementing, and maintaining software that is designed to manage data and data products in order to achieve and maintain data excellence. The design area specifies the applications for managing (master) data, managing data quality, and cataloging/curating data.

Source: Pentek, T; Legner, C. & Otto, B. (2020). Data Excellence Model – Reference Model for Managing Data Assets. CC CDQ Working Report.



Data architect

The Data Architect is a core data & analytics role in the CC CDQ Reference Model for Data & Analytics Governance. (S)he is responsible for designing, creating, deploying and managing conceptual and logical data models and for the mapping to physical data models. (S)he is also accountable for the implementation and maintenance of data pipelines.

Source: Reference Model for Data & Analytics Governance



Data catalog

A Data Catalog is an integrated platform for data curation, matching data supply and demand. It offers users functions to register data; to retrieve and use data; and to assess and analyze data. A Data Catalog therefore should provide a data inventory (for data supply) and features for data discovery (for data demand) as key components. Additional features should support data governance, data assessment, and data analytics, along with appropriate features for catalog administration and data collaboration.

Source: Fadler, Martin; Korte, Tobias; Legner, Christine; Otto, Boris; Spiekermann, Markus: Data Catalogs: integrated platforms for matching data supply and demand, 2018



Data citizen

Data citizens represent employees who rely on data for their daily work but are not data specialists.

Source: Reference Model for Data & Analytics Governance



Data democratization

The enterprise’s capability to motivate and empower a wider range of employees—not just data experts—to understand, find, access, use, and share data in a secure and compliant way.

Source: Lefebvre et al. (2021)


Data excellence

Data excellence is an umbrella term that defines properties of data, comprising data quality (defined as “fitness for purpose”) as well as additional dimensions, such as regulatory compliance, data security, or data privacy.

Source: Pentek, T; Legner, C. & Otto, B. (2020). Data Excellence Model – Reference Model for Managing Data Assets. CC CDQ Working Report.



Data governance

A company-wide framework that determines which decisions must be made and who should make them. This includes the definition of roles, responsibilities, obligations, and rights in handling data as a company resource. In doing so, data governance pursues the goal of maximizing the value of the data in the company. While data governance determines how decisions should be made, data management makes the actual decisions and implements them.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)



Data integration

Data integration is "the task of presenting a unified view of data owned by heterogeneous and distributed data sources". The need for data integration may stem from (1) technological heterogeneities (different database technologies), (2) schema heterogeneities (different data models and data representations), and (3) instance-level heterogeneities (conflicting values in different sources for the same data object). Data can be integrated physically or virtually; in the latter case, the data remains in the source systems but is accessed through a uniform view.

Source: Data and Information Quality (2016), Carlo Batini, Monica Scannapieco
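
As an illustration of virtual integration, the following minimal Python sketch maps two sources with different schemas to a common view at access time and reports instance-level conflicts. The source systems, field names, and the helper functions `from_crm`, `from_erp`, `unified_view`, and `conflicts` are hypothetical and are not taken from the cited source.

```python
from typing import Iterator

# Two hypothetical source systems that describe customers with different
# schemas (schema heterogeneity).
crm_rows = [
    {"cust_id": "C1", "cust_name": "Acme AG", "ctry": "CH"},
]
erp_rows = [
    {"partner_no": "C1", "name": "ACME AG", "country_code": "CH"},
    {"partner_no": "C2", "name": "Beta GmbH", "country_code": "DE"},
]


def from_crm(row: dict) -> dict:
    """Map a CRM row to the common schema."""
    return {"id": row["cust_id"], "name": row["cust_name"],
            "country": row["ctry"], "source": "crm"}


def from_erp(row: dict) -> dict:
    """Map an ERP row to the common schema."""
    return {"id": row["partner_no"], "name": row["name"],
            "country": row["country_code"], "source": "erp"}


def unified_view() -> Iterator[dict]:
    """Virtual integration: rows stay in the sources and are mapped on access."""
    for row in crm_rows:
        yield from_crm(row)
    for row in erp_rows:
        yield from_erp(row)


def conflicts(attr: str) -> dict[str, set]:
    """Instance-level heterogeneity: ids whose values for attr differ."""
    values_by_id: dict[str, set] = {}
    for row in unified_view():
        values_by_id.setdefault(row["id"], set()).add(row[attr])
    return {key: values for key, values in values_by_id.items()
            if len(values) > 1}


for record in unified_view():
    print(record)
print(conflicts("name"))
```

In this sketch, customer C1 appears in both sources with differently spelled names, so `conflicts("name")` reports it, whereas the country values agree. A physical integration would instead copy the mapped rows into a separate store.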



Data lifecycle

In the CDQ Data Excellence Model, the data lifecycle comprises all processes regarding the creation, acquisition, storage, maintenance, use, archiving, and deletion of data. For a given data object, it defines and documents the data sources, data supply chains, data consumers, and data use contexts.

Source: Pentek, T; Legner, C. & Otto, B. (2020). Data Excellence Model – Reference Model for Managing Data Assets. CC CDQ Working Report.



Data literacy

The continuous learning of core skills, knowledge, attitude and values required to interpret data in a critical manner, and derive meaningful and actionable business insights.





Data management

Data management aims at the efficient usage of data in companies. It makes decisions and executes measures that affect the company-wide handling of data (whereas data governance creates the framework for such through the definition of responsibilities and so forth). It comprises all tasks related to the data lifecycle on a strategic, governing, and technical level: the formulation of a data strategy, the definition of data management processes, standards, and measures, the assignment of roles and responsibilities, the description of the data lifecycle and architecture – covering data models and data modeling standards –, and the management of applications and systems.

Source: Pentek, T., Legner, C. and Otto, B. 2017. 'Towards a Reference Model for Data Management in the Digital Economy'. In: Maedche, A., vom Brocke, J., Hevner, A. (eds.) Designing the Digital Transformation: DESRIST 2017 Research in Progress Proceedings of the 12th International Conference on Design Science Research in Information Systems and Technology. Karlsruhe, Germany. 30 May - 1 Jun. Karlsruhe: Karlsruher Institut für Technologie (KIT), pp. 51-66



Data management capabilities

In the CDQ Data Excellence Model, the data management capabilities design area defines a set of skills, routines, and resources a company needs to have in order to accomplish data excellence that results in business value.

Source: Pentek, T; Legner, C. & Otto, B. (2020). Data Excellence Model – Reference Model for Managing Data Assets. CC CDQ Working Report.



Data owner

The data owner is a core data & analytics role in the CC CDQ Reference Model for Data & Analytics Governance. Two types of data owner are usually distinguished in practice: the data definition owner and the data content owner.

The data definition owner is a decentralized data governance role that is typically assigned to senior business executives with global outreach (e.g. the global head of sales). (S)he is accountable for the data definition in specific areas of responsibility (e.g. a specific data domain like product or customer). Here, (s)he ensures that business requirements are fulfilled and that data is accessed and used compliantly. Her/his tasks include collecting/defining data requirements and delegating the detailing of a data definition to a data steward.

The data content owner is a decentralized data governance role that is assigned to local business executives/team leaders with operational responsibilities. (S)he is accountable for data creation and maintenance (data lifecycle) according to the data definition for a specific area of responsibility. (S)he coordinates the creation and maintenance of data by data editors.

Source: Reference Model for Data & Analytics Governance



Data quality

Data quality is a multi-dimensional, context-dependent concept that cannot be described and measured by a single characteristic, but rather by various data quality dimensions. The desired level of data quality depends on the requirements of the business processes and functions that use the data, such as Purchasing, Sales, or Reporting. A low level of data quality reduces the value of the company's data assets because it limits their usability. Companies therefore strive to achieve the quality of data required by the business strategy by means of data quality management.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)



Data quality dimensions

The most important dimensions along which data quality can be assessed are:
- Correctness: factual agreement of the data with the properties of the real-world object that it represents.
- Consistency: agreement of several versions of the data related to the same real objects, which are stored in various information systems.
- Completeness: complete existence of all values or attributes of a record that are necessary.
- Actuality: agreement of the data at all times with the current status of the real object and adjustment of the data in a timely manner as soon as the real object has been changed.
- Availability: the ability of the data user to access the data at the desired point in time.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)
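
Two of these dimensions, completeness and consistency, lend themselves to simple ratio measures. The Python sketch below shows one possible way to compute them; the records, the list of required attributes, and the function names `completeness` and `consistency` are assumptions made for illustration and do not come from the cited source.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Acme AG", "country": "CH", "email": "info@acme.example"},
    {"id": 2, "name": "Beta GmbH", "country": None, "email": "sales@beta.example"},
    {"id": 3, "name": "Gamma SA", "country": "FR", "email": None},
]

REQUIRED = ("name", "country", "email")  # assumed mandatory attributes


def completeness(rows: list[dict], required: tuple[str, ...]) -> float:
    """Share of required attribute values that are present."""
    total = len(rows) * len(required)
    if total == 0:
        return 1.0
    filled = sum(1 for row in rows for attr in required
                 if row.get(attr) not in (None, ""))
    return filled / total


def consistency(system_a: dict, system_b: dict, attr: str) -> float:
    """Share of objects stored in both systems whose values for attr agree."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 1.0
    agreeing = sum(1 for key in shared
                   if system_a[key].get(attr) == system_b[key].get(attr))
    return agreeing / len(shared)


# The same customers as stored in two hypothetical information systems.
crm = {row["id"]: row for row in records}
erp = {1: {"country": "CH"}, 2: {"country": "DE"}, 3: {"country": "FR"}}

completeness_score = completeness(records, REQUIRED)  # 7 of 9 values present
consistency_score = consistency(crm, erp, "country")  # 2 of 3 objects agree
print(f"Completeness: {completeness_score:.2f}")
print(f"Consistency: {consistency_score:.2f}")
```

Correctness, actuality, and availability usually require a comparison with the real-world object or with access logs, so they are left out of this sketch.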



Data quality Key Performance Indicators (Data quality KPIs)

A quantitative measure of data quality. A data quality measurement system measures the values for the quality of data at measurement points at a certain frequency of measurement. Data quality key performance indicators operationalize data quality dimensions. One example is the validation of a data element based on business rules.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015
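
The example of validating a data element against a business rule can be sketched in Python as follows. The rule (a Swiss postal code consists of exactly four digits), the sample values, and the function names are assumptions chosen for illustration; they are not part of the cited source.

```python
import re

# Hypothetical business rule: a Swiss postal code has exactly four digits.
SWISS_POSTAL_CODE = re.compile(r"[0-9]{4}")


def is_valid_postal_code(value) -> bool:
    """Return True if the data element satisfies the assumed business rule."""
    return isinstance(value, str) and SWISS_POSTAL_CODE.fullmatch(value) is not None


def dq_kpi(values) -> float:
    """Data quality KPI: percentage of data elements that pass the rule."""
    values = list(values)
    if not values:
        return 100.0
    passed = sum(1 for value in values if is_valid_postal_code(value))
    return 100.0 * passed / len(values)


# One measurement taken at one measurement point (e.g. a nightly extract).
postal_codes = ["8000", "1015", "901", None, "9000"]
kpi_value = dq_kpi(postal_codes)
print(f"Postal code validity: {kpi_value:.1f}%")
```

Repeating the measurement at a fixed frequency and recording the values over time would turn this single figure into the kind of measurement series the definition describes.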



E

External data

External data refers to any type of data that is captured, processed, and provided from outside the company. The major external data types include open, paid, shared, and web data. External data can be used to complement internal data and help to improve advanced analysis, optimize business processes (e.g. with geolocation, weather, or traffic data), reduce internal data maintenance efforts (e.g. to enrich or validate internal data), and create new services. However, despite their increasing relevance, external data remain an untapped resource for most companies.

Source: Krasikov, Pavel; Eurich, Markus; Legner, Christine: External Data CC CDQ Working Report, 2020



F

First time right

A principle of preventive data quality management according to which data should be acquired by an information system as correctly as possible in order to avoid retroactive correction (at generally higher levels of expenditure).

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)



G

H

I

Internet of Things (IoT)

The "Internet of Things" refers to the idea of an extended Internet that, in addition to classic computers and mobile devices, also integrates any physical objects into its infrastructure by means of sensors and actuators, thus turning them into providers or consumers of a wide variety of digital services.

Source: Fleisch, E. & Thiesse, F. Enzyklopaedie der Wirtschaftsinformatik: https://www.enzyklopaedie-der-wirtschaftsinformatik.de/wi-enzyklopaedie/lexikon/technologien-methoden/Rechnernetz/Internet/Internet-der-Dinge




J

K

L

Linked open data

Linked Open Data defines a vision of globally accessible and linked data on the internet based on the RDF standards of the semantic web. This structured web data is interlinked with other data and can be accessed through semantic queries. Linked open data is released under an open license that does not impede its free reuse.

Source: W3C, Tim Berners-Lee



M

Master data

Master data is the most fundamental enterprise data type. It represents core business objects (e.g. customers, suppliers, or products) that are agreed upon and shared across the enterprise. It remains largely unaltered and is often referenced and reused in business documents and data analyses. It must be unambiguously identifiable and interpretable across the entire organization (i.e., across organizational departments, divisions, and units).

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021



Master Data management

All of the activities, methods and (IT) tools for modeling, managing and providing master data as well as its data quality management. The goal is to provide and ensure a company-wide truth about the core business objects (single source of truth) and thereby to support data users in various business processes throughout the company.

Source: Otto, Boris; Österle, Hubert: Corporate Data Quality: Prerequisite for Successful Business Models, 2015 (http://www.cdq-buch.de/)



Media data

Media data is unstructured content that represents documents, digital images, geospatial data, and multimedia (video/audio) files.

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021



Metadata

Metadata is »data about data«. This means that metadata describes the properties of other data. Typically, metadata enables retrieval and maintenance of »data containers« (e.g., documents or files) by means of identifying, classifying, or descriptive attributes. Metadata helps an organization understand its data and contributes to the ability to process, maintain, integrate, secure, audit, and govern it. Common metadata attributes include context (i.e. the environment in which the data lives), terminology (i.e. definitions and descriptions), administrative information (i.e. when the data was created and by whom), and governance (i.e. ownership and level of confidentiality).

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021
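
The attribute groups named above can be pictured as a small record type. The following Python sketch is only an illustration: the class `Metadata`, its field names, the sample entries, and the helper `find` are hypothetical and do not reflect any particular metadata standard or the cited source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Metadata:
    """Metadata for one data container; field names are illustrative."""
    container: str        # identifying attribute, e.g. a file or table name
    context: str          # environment in which the data lives
    definition: str       # terminology: what the data means
    created_on: date      # administrative information
    created_by: str       # administrative information
    owner: str            # governance: ownership
    confidentiality: str  # governance: level of confidentiality


catalog = [
    Metadata("customers.csv", "CRM export", "Active business customers",
             date(2022, 6, 29), "jdoe", "Head of Sales", "internal"),
    Metadata("salaries.xlsx", "HR system", "Monthly payroll per employee",
             date(2022, 1, 31), "asmith", "Head of HR", "confidential"),
]


def find(entries, **criteria):
    """Retrieve data containers whose metadata matches all given attributes."""
    return [entry.container for entry in entries
            if all(getattr(entry, key) == value
                   for key, value in criteria.items())]


confidential_containers = find(catalog, confidentiality="confidential")
print(confidential_containers)
```

The lookup works purely on the metadata, without opening the containers themselves, which is the sense in which metadata enables retrieval.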



N

O

Observational data

Observational data are generated by humans or things. They capture experiences and behavior at a very detailed, fine-grained level. Observational data include IoT/sensor data from connected devices (often in the form of data streams), web data generated by user activities on social media platforms or commercial websites, as well as survey data from questionnaires.

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021



Open data

Open data can be defined as “data that is freely available, and can be used as well as republished by everyone without restrictions from copyright or patents”. As a specific type of external data, open data holds great business potential and is expected to fuel advanced analytics, optimize business processes, enrich data management, or even enable new services.

Source: Krasikov, P., Legner, C., & Eurich, M. (2021). Sourcing the Right Open Data: A Design Science Research Approach for the Enterprise Context.

Braunschweig, K., Eberius, J., Thiele, M., & Lehner, W. (2012). The State of Open Data. Limits of Current Open Data Platforms.



P

Paid data

Paid data, also known as commercially available data, refers to the datasets available directly from specialized data providers (or brokers) and data marketplaces, and offered at a certain cost. It is a specific type of external data and is typically coupled with specific services which facilitate its use, such as identification and classification of data by categories, description of the intended use, metadata documentation, and integration services.

Source: Krasikov, Pavel; Eurich, Markus; Legner, Christine: External Data CC CDQ Working Report, 2020



People, roles and responsibilities

In the CDQ Data Excellence Model, the people, roles, and responsibilities design area defines the culture, organization, roles, boards, and interactions for data management. As data is generated, managed, and used in many different parts of an organization, a dedicated data management organization supports the orchestration and alignment of enterprise-wide data management activities. Consequently, data can only be managed consistently if ownership and responsibilities are assigned, the people concerned are trained, and all employees have a data-driven mindset.

Source: Pentek, T; Legner, C. & Otto, B. (2020). Data Excellence Model – Reference Model for Managing Data Assets. CC CDQ Working Report.



Performance management

In the CDQ Data Excellence Model, the performance management design area defines how to plan, implement, and control all activities for measuring, assessing, improving, and ensuring data management performance, data excellence, and business value.

Source: Pentek, T; Legner, C. & Otto, B. (2020). Data Excellence Model – Reference Model for Managing Data Assets. CC CDQ Working Report.



Personal data

From a regulatory perspective, personal data can be defined as “data enabling direct or indirect identification of a single physical person, data that is specific to a single physical person without enabling identification, data that can be linked to a physical person, data regarding which anonymization techniques cannot completely mitigate the risk of re-identification” (Debet et al. 2015). From a practical perspective, most companies collect personal data about their customers, employees, suppliers, and vendors. Customer data is typically a particular area of concern; it can be defined as “a set of data that represents and is associated with the identity, activities and service offering associated with a unique individual” (Tapsell et al. 2018).

Source: Debet, A., Massot, J., Metallinos, N., Danis-Fantôme, A., Lesobre, O.: Informatique et libertés. La protection des données à caractère personnel en droit français et européen (2015). Tapsell, J., Akram, R.N., Markantonakis, K.: Consumer-Centric Data Control, Tracking and Transparency (2018).



Processes and methods

In the CDQ Data Excellence Model, the processes and methods design area defines relevant data management procedures on a strategic, governance, and operational level and specifies which tasks are to be executed by whom and in what order.

Source: Pentek, T; Legner, C. & Otto, B. (2020). Data Excellence Model – Reference Model for Managing Data Assets. CC CDQ Working Report.



Q

R

Reference data

Reference data is used to characterize, categorize, validate or constrain other data. The most basic reference data are codes or key value lists, but they can also be more complex and incorporate hierarchies or vocabularies. Reference data can be defined and created internally (e.g., customer classifications, product groups) or received from external sources (e.g., country or currency codes defined by ISO standards, product classifications defined by e-commerce standards).

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021
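
A minimal Python sketch of how code lists can validate and characterize other data is given below. The country codes are a small excerpt of ISO 3166-1 alpha-2; the customer classification, the sample records, and the functions `validate` and `describe` are hypothetical and serve only as an illustration.

```python
# External reference data: a small excerpt of ISO 3166-1 alpha-2 country codes.
COUNTRY_CODES = {"CH": "Switzerland", "DE": "Germany", "FR": "France"}

# Internal reference data: a hypothetical customer classification.
CUSTOMER_CLASSES = {"A": "Key account", "B": "Standard", "C": "Occasional"}


def validate(record: dict) -> list[str]:
    """Return the attributes of a record whose values are not in the code lists."""
    invalid = []
    if record.get("country") not in COUNTRY_CODES:
        invalid.append("country")
    if record.get("customer_class") not in CUSTOMER_CLASSES:
        invalid.append("customer_class")
    return invalid


def describe(record: dict) -> str:
    """Characterize a valid record by resolving its codes to readable labels."""
    country = COUNTRY_CODES[record["country"]]
    customer_class = CUSTOMER_CLASSES[record["customer_class"]]
    return f"{record['name']}: {customer_class} customer in {country}"


good = {"name": "Acme AG", "country": "CH", "customer_class": "A"}
bad = {"name": "Beta GmbH", "country": "XX", "customer_class": "D"}

print(validate(good))
print(validate(bad))
print(describe(good))
```

Here the code lists both constrain which values are allowed and supply the labels used to categorize a record. More complex reference data, such as hierarchies, would need a richer structure than a flat dictionary.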



Regulation

A regulation is a document written in natural language containing a set of guidelines specifying constraints and preferences pertaining to the desired structure and behavior of an enterprise. Examples of regulations are a law (e.g., the General Data Protection Regulation - GDPR), a standardization document, a contract, etc. A regulation specifies the domain elements it applies to and oftentimes has implications for data management.

Source: El Kharbili, M.: Business Process Regulatory Compliance Management Solution Frameworks: A Comparative Evaluation (2012).



Regulatory compliance management (RCM)

Regulatory Compliance Management (RCM) is the problem of ensuring that enterprises (data, processes, organization, etc.) are structured and behave in accordance with the regulations that apply, i.e., with the guidelines specified in the regulations.

Source: El Kharbili, M.: Business Process Regulatory Compliance Management Solution Frameworks: A Comparative Evaluation (2012).



Regulatory guideline

A regulatory guideline specifies the expected behavior and structure of enterprise domain elements. It additionally defines tolerated and non-tolerated deviations from the ideal behavior and structure, and also defines the possible exceptional cases. A regulation may also specify how the enterprise ought to or may react to deviations from ideal behavior and structure.

Source: El Kharbili, M.: Business Process Regulatory Compliance Management Solution Frameworks: A Comparative Evaluation (2012).



S

Shared data

Shared data refers to external data that is shared between companies within dedicated business ecosystems. Examples of sharing and exchange environments include the Global Data Synchronization Network (GDSN) provided by GS1 and the CDQ Data Sharing Community.

Source: Krasikov, Pavel; Eurich, Markus; Legner, Christine: External Data CC CDQ Working Report, 2020



Social media data

Web data refers to the data made available on the Web (e.g., online sources, websites) and also shared by users (e.g., user-generated content, reactions, comments) of social media platforms, including the metadata (e.g. location, time, language, biographical data). Web data is one of the subtypes of external data.

Source: Krasikov, Pavel; Eurich, Markus; Legner, Christine: External Data CC CDQ Working Report, 2021



T

Transactional data

Transactional data is created by business processes. It documents an important business event or the result of a business transaction. Transactional data often references master data, but in contrast to master data, it naturally changes during its lifecycle (e.g. status changes). Furthermore, the volume of transactional data (e.g. number of sales orders) increases with ongoing business activity. Transactional data typically occurs in invoices, purchase orders, or delivery notes.

Source: Fadler, Martin; Walter, Valérianne; Hasan, Redwan; Legner, Christine: Data Quality Handbook, 2021



U

V

W

X

Y

Z