Enterprise-Wide Metadata Management
Formal Metadata
Title:
Subtitle:
Title of Series:
Number of Parts: 30
Author:
License: CC Attribution 4.0 International: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/53708 (DOI)
Publisher:
Release Date:
Language:
Content Metadata
Subject Area:
Genre:
Abstract:
Keywords:
Transcript: English (auto-generated)
00:00
Okay, so hello, everyone. My name is Rebecca Eichler, and I am a PhD student at the University of Stuttgart. Today I will be presenting the content of my paper on enterprise-wide metadata management. Specifically, we wrote about the current state and challenges in metadata management, all based on
00:26
an industry case. I will start with a quick motivation for why we need enterprise-wide metadata management and what it is actually about. What you see here is a variety of data sources that you would have inside a company, and its
00:45
employees. Today, as you all know, companies are collecting all kinds of data, and they do this because they hope to derive some kind of value from it. They could be doing this, for example, through an employee like Bob
01:01
here, who performs data analysis on a few of the data sources and thereby derives some kind of insight. The problem today is that employees only know about very few of the data sources that are available to
01:20
them. This limits the value that we can actually extract from our data. That is why, today, we are trying to make the data we have available to all of our employees, so far as this is compliant, obviously. If we do this, we have more
01:40
people working on more of our data, and thereby, for example, we can generate more insights. Now, to share data in this way, we need enterprise-wide metadata management. Right now, metadata is collected within the scope of single systems. For example, in the data lake context, you could have metadata on
02:04
what data is in the lake, who is allowed to access it, what the quality is, what the lineage information is, and so on. Now, to make the data available, we need to make the metadata available first. So we need to bring this metadata together in some
02:23
kind of centralized approach. This is what we mean when we talk about enterprise-wide metadata management: it encompasses the metadata management initiatives of the single systems and brings all of it together. We went ahead and interviewed a globally active manufacturer. This
02:45
is a company that is active in various sectors, such as mobility and the industrial sector. They have a global manufacturing network, with all kinds of data coming out of this context, and they are currently striving to become a
03:02
data-driven Industry 4.0 company. We interviewed them to find out what metadata management they are conducting, what their goals are, what their challenges are, and so on. That is what I am going to talk about today. First, I will tell you about the goals that companies have today, how they are put into
03:22
practice, and what challenges they face. We also conducted a literature review and a tool review to see whether these challenges are covered, and we found a few research gaps, which I will present later. I will close this talk with a quick summary and conclusion. Okay, so:
03:46
the current state in metadata management. The first thing we need to understand is that metadata management is an enabler for data management, so we first need to know the data management goal. As I explained earlier, one of
04:05
the very big topics in enterprises today is the sharing of data, because this promotes the extraction of data value. The company we interviewed explicitly said: we want to share our data freely and
04:22
efficiently. Freely means with as many employees as possible, and efficiently means with as little effort as possible. What we also have to consider when talking about data sharing is that it always has two sides: one side provisions the data, and the other side accesses and uses it. Okay, now, to enable data sharing, we need data
04:49
transparency, and this is our big data management goal. Data transparency is all about the ability to actually find the data that you have, to then be able to
05:02
understand it. And once you have understood your data, you want to be able to access it. That is what data transparency is about. But data transparency is still somewhat vague, so how do you achieve it? The company we interviewed defined four
05:24
metadata management sub-goals for this. The first one is the creation of a data inventory: we need to know what data we have in our company. The second one is about creating coherent and shared semantics. If we have
05:43
department A sharing data with department B, department B must be able to understand the terms used; they must be able to understand the data, to avoid misunderstandings, false insights, and so on. For that, we need shared semantics. The third sub-goal is about creating a common structural
06:05
description of data. Today, it is very difficult to understand data based only on its models and to reuse it; this is the issue addressed with sub-goal three. The fourth sub-goal is about creating a
06:20
common data asset description, that is, a general description of the data that we have. Okay, those were the goals, and now it is interesting to see how they can actually be put into practice. Sub-goal one can be implemented through a data catalog. This is a metadata
06:44
management tool whose main feature is an inventory: we can register all, or many, of the data sources that we have and collect additional metadata on them. Examples of this are Alation, Collibra Data Catalog, Informatica Data
07:05
Catalog, and so on. The second sub-goal can be achieved through a business glossary. This is also a metadata management tool; often, catalogs also contain such a component. And this is basically
07:21
just a list of terms, abbreviations, synonyms, and term relations; this is what we define in a business glossary. Now, the third sub-goal is more complex, and the company that we interviewed went ahead and
07:43
addressed it through semantic modeling: they introduced a meta model. To create this common structural description, they created a meta model with various abstraction levels. On the first abstraction level, for
08:02
example, they defined a business object. On the second, lower level, they would say: we have a machine, and machine is an instance of a business object. On the next lower level, they would say: a drill bench is an instance of a machine, and they would connect these instances. Now, if we have data models, and in these models we are using a drill bench or
08:24
something of that sort, or a machine, we can interconnect them with these definition instances. By doing this, they bring together all of their data models, add this layer of understanding to them, and create a kind of knowledge graph. This is their approach for
08:41
creating a common structural description of the data. And the last sub-goal can be obtained through a metadata standard like Dublin Core; you can think of this as a list of attributes that you collect on your data sets. However, the company
09:02
that we interviewed had not actually implemented this part yet, and therefore we did not go into detail on it. Okay, those were the goals and how they can be put into practice. Now I would like to tell you more about the challenges that companies face today. There are several challenges, but we looked
09:21
at three in particular. The first one relates to metadata management in data lakes. There is a lot of literature on metadata management in data lakes, for example on metadata management systems like Constance, GEMMS, GOODS, and so on. Still, there are open questions. For
09:46
example, data lakes are known for turning into data swamps. A data swamp is a lake in which the contained data is unfit for use, usually because of missing metadata. Now, it is still unclear what tasks
10:04
exactly are required to prevent the creation of a data swamp. An example task would be data quality management, or collecting lineage information, or something of that sort. Literature talks about this,
10:20
but it varies, and it is not clear what you actually have to do: what are the minimum necessary tasks? Also, for these single tasks there are open questions, like: what metadata do I actually need to collect in this context? They say you need to do data quality management, but we do not know what metadata we
10:42
need to collect within the data lake for this quality information. Also, which tools, protocols, and standards work well in data lakes? And how can I then take the metadata that I collected and later integrate it into my enterprise-wide landscape?
11:01
These are open questions. Similarly: how do these tasks actually differ in data lakes? Data lakes are different from the other source systems that we have. So the question is: how do I actually perform metadata management in the lake? And how does, for
11:20
example, data quality management differ in it? I need to know this to collect the right metadata. The second challenge that we looked at was the selection and composition of metadata management tools. There is also a lot of literature on various tool types. I already
11:40
mentioned data catalogs and business glossaries, but there are more: data marketplaces, data hubs, data dictionaries, and so on. Now, there are very few comparisons of these tool types. We have a few blog articles that compare subgroups of these tools, and we also
12:01
have lists of actual tool instances, like compiled lists of tools and vendors. What we do not have is a comprehensive overview of the tool types, along with a categorization and differentiation of them. For example, how is a data hub different from a
12:22
data marketplace? Is it a synonym? Is it a subtype? How do they actually relate, and how do they work together? This is what we need to know when building an enterprise-wide tool landscape. Also, what are the building blocks of these tools? For example, a data marketplace often contains a data catalog, and a data
12:42
catalog often contains a business glossary. So they contain each other, and we need to know what the building blocks are and how they work together in order to know which tools to select. Likewise, it would be really interesting to know which building blocks are actually needed to conduct comprehensive metadata
13:04
management. These are also questions that are not sufficiently addressed in the literature. The third challenge we looked at concerns data marketplaces for internal
13:21
use. As I said earlier, we are using data catalogs and business glossaries in companies today. This helps us to find the data and understand the data, but it does not actually help us to access or provision data. This is why we need a data marketplace. Data marketplaces
13:43
are platforms for the exchange of data, but currently they are mainly used for the exchange of data between companies. We are looking at data marketplaces in the internal context.
14:00
There is a lot of work on data marketplaces, but usually in the external context; we hardly have anything on how data marketplaces can be used within an enterprise. This is why we are missing detailed concepts and solutions, with architectural proposals, definitions of the functional scope, and so on, for the
14:24
internal context. Just to name an example of how they could differ: compliance is a huge issue in the internal context, more so than externally. If I am a company sharing one specific data set with another company,
14:40
then I have checked that this is all compliant. But if I am trying to share all of the data I have with all of my employees, I need much more refined compliance regulations within this marketplace. Okay, so these were a few of the challenges, and we see that there are issues in the context of metadata management.
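[Editor's note: to make the compliance point concrete, here is a minimal sketch of the kind of rule check an internal data marketplace might apply before granting an employee access to a data set. It is purely illustrative; the classes, clearance levels, and rules are assumptions, not the interviewed company's actual system.]

```python
# Hypothetical sketch: a compliance gate for an internal data marketplace.
# All names and rules are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    classification: str          # e.g. "public", "internal", "confidential"
    contains_personal_data: bool

@dataclass
class Employee:
    name: str
    department: str
    clearance: str               # same scale as dataset classification

CLEARANCE_LEVELS = ["public", "internal", "confidential"]

def may_access(employee: Employee, dataset: Dataset) -> bool:
    """Return True if the request passes the (illustrative) compliance rules."""
    # Rule 1: the employee's clearance must cover the data classification.
    if CLEARANCE_LEVELS.index(employee.clearance) < CLEARANCE_LEVELS.index(dataset.classification):
        return False
    # Rule 2: personal data is only shared with departments that need it.
    if dataset.contains_personal_data and employee.department not in {"HR", "Legal"}:
        return False
    return True

machine_logs = Dataset("drill-bench-logs", "internal", contains_personal_data=False)
bob = Employee("Bob", "Analytics", clearance="internal")
print(may_access(bob, machine_logs))  # True: internal data, no personal data
```

Sharing a single vetted data set externally needs one such check; sharing everything with everyone internally requires rules like these to run for every employee/data-set pair, which is why refined compliance machinery matters in the internal case.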
15:03
I am going to sum up quickly what we learned today. I talked about the current data and metadata management goals, which are mainly about data sharing and data transparency, and I gave you an insight into the challenges. Data marketplaces are
15:23
shown on the right here, because we need them in addition to the other sub-goals. The other two challenges are more cross-sectional: metadata management for data lakes, and the selection and composition of metadata management tools. These are shown in blue because we conducted a literature and tool review,
15:43
and we found that they are not sufficiently covered and therefore constitute research gaps. This is what I wrote about in the paper. Thank you for your attention.
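[Editor's note: as an illustration of the meta model with abstraction levels described in the talk (business object, machine, drill bench), here is a tiny hypothetical sketch of the "instance of" chain as graph edges. The data structure and function names are assumptions for illustration, not the company's implementation.]

```python
# Hypothetical sketch: the talk's abstraction levels as "instance of" edges.
# Linking a data model element (e.g. a "drill bench" column) to this chain
# adds the shared layer of understanding, forming a small knowledge graph.
instance_of = {
    "machine": "business object",   # level 2 is an instance of level 1
    "drill bench": "machine",       # level 3 is an instance of level 2
}

def definition_chain(term: str) -> list[str]:
    """Walk the "instance of" edges from a term up to its top-level definition."""
    chain = [term]
    while chain[-1] in instance_of:
        chain.append(instance_of[chain[-1]])
    return chain

print(definition_chain("drill bench"))  # ['drill bench', 'machine', 'business object']
```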