Enterprise-Wide Metadata Management

Formal Metadata

Title
Enterprise-Wide Metadata Management
Subtitle
An Industry Case on the Current State and Challenges
Title of Series
Number of Parts
30
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Metadata management is a crucial success factor for companies today, as for example, it enables exploiting data value fully or enables legal compliance. With the emergence of new concepts, such as the data lake, and new objectives, such as the enterprise-wide sharing of data, metadata management has evolved and now poses a renewed challenge for companies. In this context, we interviewed a globally active manufacturer to reveal how metadata management is implemented in practice today and what challenges companies are faced with and whether these constitute research gaps. As an outcome, we present the company’s metadata management goals and their corresponding solution approaches and challenges. An evaluation of the challenges through a literature and tool review yields three research gaps, which are concerned with the topics: (1) metadata management for data lakes, (2) categorizations and compositions of metadata management tools for comprehensive metadata management, and (3) the use of data marketplaces as metadata-driven exchange platforms within an enterprise. The gaps lay the groundwork for further research activities in the field of metadata management and the industry case represents a starting point for research to realign with real-world industry needs.
Keywords
Computer animation
Meeting/Interview
Transcript: English (auto-generated)
Okay, so hello, everyone. My name is Rebecca Eichler, and I'm a PhD student at the University of Stuttgart. Today I will be presenting the content of my paper on enterprise-wide metadata management. Specifically, we wrote about the current state and challenges in metadata management, and this is all based on an
industry case. I'm going to start real quick with a motivation: why we need enterprise-wide metadata management and what it's actually about. So what you see here is a variety of data sources that you would have inside of a company and its
employees. So today, as you all know, companies are collecting all kinds of data. And they are doing this because they hope to derive some kind of value from the data. Now they could be doing this, for example, through an employee like Bob
here, who is performing data analysis on a few of the data sources and thereby deriving some kind of insight. Now, the problem today or an issue today is that the employees only know about very few of the data sources that are available to
them. Now this limits the value that we can actually extract from our data. So this is why today, we are trying to make the data that we have available to all of our employees, so far as this is compliant, obviously. And if we do this, we have more
people working on more of our data. And thereby, for example, we can generate more insights. Now, to share the data in this way, we need enterprise-wide metadata management. So right now, metadata is collected in the scope of the single systems. For example, in the data lake context here, you could have metadata on
what data is in the lake, who's allowed to access it, what's the quality, what is the lineage information, and so on. Now, to make the data available, we need to make the metadata available first. So we need to bring this metadata together in some
kind of centralized approach. And this is what we mean when we talk about enterprise-wide metadata management. So basically, this encompasses the metadata management initiatives of single systems and brings all of it together. So yeah, now we went ahead, and we interviewed a globally active manufacturer. And this
is a company that is active in various sectors, such as mobility, industrial sector, and so on. They have a global manufacturing network, and there's all kinds of data out of this context. And they are currently striving to become a
data-driven industry 4.0 company. Now, we interviewed them to find out what metadata management they're conducting, what their goals are, challenges, and so on. And that's what I'm going to be talking about today. So first, I'm going to tell you about goals that companies have today, how they are put into
practice, and what challenges they are faced with. We also went ahead and we did a literature review and a tool review to see if these challenges do have coverage. And we found a few research gaps, which I will be presenting later. And I'm going to close this talk with a quick summary and conclusion. Okay, so
the current state in metadata management. The first thing we need to understand is that metadata management is an enabler for data management. So the first thing we need to know is the data management goal. And as I explained earlier, one of
the very big topics in enterprises today is the sharing of their data, because this promotes the extraction of data value. Now, the company that we interviewed explicitly said, we want to share our data freely and
efficiently. Freely means with as many employees as possible, and efficiently means with as little effort as possible. What we also have to consider when we're talking about data sharing is that it always has two sides: one side provisioning the data and the other side accessing and using it. Okay, now, to enable data sharing, we need data
transparency. And this is our big data management goal. Data transparency is all about the ability to find the data that you have, to then be able to
understand it, and, once you have understood your data, to be able to access it. So this is what data transparency is about. But data transparency is still kind of vague. So how do you achieve it? The company we interviewed defined four
metadata management sub goals to do this. The first one is the creation of a data inventory. So we need to know what data we have in our company. The second one is about creating coherent and shared semantics. Now, if we have
department A sharing data with department B, department B must be able to understand the terms they use, and so on, they must be able to understand the data to avoid misunderstandings, false insights, and so on. And for that, we need shared semantics. The third sub goal is about creating a common structural
description of data. So today, it's very difficult to understand data based on only their models and to reuse it. So this is an issue they're approaching with the sub goal three. And the fourth sub goal is about creating a
common data asset description. So a general description of the data that we have. Okay, now, those were the goals. And now it's interesting to see how we can actually put them into practice. How can we implement these? Sub goal one can be implemented through a data catalog. This is a metadata
management tool whose main feature is an inventory: we can register all, or at least many, of the data sources that we have, and we can collect additional metadata on them. Examples of this are Alation, Collibra Data Catalog, Informatica data
catalog, and so on. The second goal can be achieved through a business glossary. So this is also a metadata management tool. Often, catalogs also contain such a component. And this is basically
just a list of terms, abbreviations, synonyms, and term relations; this is what we define in a business glossary. An example here would be erwin Data Literacy. Now, the third sub goal is a little bit more complex, and the company that we interviewed went ahead and
did this through semantic modeling. They introduced a meta model: to create this common structural description, they created a meta model with various abstraction levels. So on the first abstraction level, for
example, they defined a business object. On the second lower level, they would say, okay, we have a machine, and machine is an instance of a business object. And on the next lower level, they would say, okay, a drill bench is an instance of a machine, and they would connect these instances. Now, if we have data models, and in these models, we are using a drill bench or
something of that sort, or a machine, we can interconnect them with these defined instances. By doing this, they bring together all of their data models, add this layer of understanding to them, and create a kind of knowledge graph. So this is their approach for
creating a common structural description of the data that exists. And the last sub goal can be obtained through a metadata standard like Dublin Core; you can think of this as a list of attributes that you're collecting on your data sets. I'll keep this brief, because the company
that we interviewed, they had not actually implemented this part yet, and therefore, we did not go into detail on it. Okay, now, those were the goals and how we can put them into practice. And now I would like to tell you more about the challenges that companies are faced with today. So there are several challenges, but we looked
at three in particular. The first one relates to metadata management in data lakes. Now, there is a lot of literature on metadata management in data lakes, for example, on metadata management systems like Constance, GEMMS, GOODS, and so on. Still, there are open questions. For
example, data lakes are known for turning into data swamps. A data swamp is a lake in which the data that is contained is unfit for use, usually because of missing metadata. Now, it is still unclear what tasks
exactly I need to perform to prevent the creation of a data swamp. An example task would be data quality management or collecting lineage information or something of this sort. The literature does talk about this,
but it varies, and it is not clear what the minimum necessary tasks are. Also, for these single tasks, there are open questions like: what metadata do I actually need to collect in this context? So they say you need to do data quality management, but we don't know what metadata we
need to collect within the data lake for this quality information. Also, what tools, protocols, and standards are well suited to data lakes? And how can I then take the metadata that I collected and later integrate it into my enterprise-wide landscape? So
these are questions that are open. The same goes for this one: how do these tasks actually differ in data lakes? Data lakes are different from the other source systems that we have. So the question is, how do I actually perform metadata management in this lake? And how does, for
example, data quality management differ in it? I need to know this to collect the right metadata. The second challenge that we looked at was the selection and composition of metadata management tools. Now, there's also a lot of literature on various tool types. I already
mentioned data catalogs and business glossaries, but there are more: data marketplaces, data hubs, data dictionaries, and so on. Now, there are very few comparisons of these tool types. We have a few blog articles that compare subgroups of these tools. And we also
have lists of actual tool instances, like compiled lists of tools and vendors. Now, what we don't have is an actual comprehensive overview of the tool types, and also a categorization and differentiation of these tool types. For example, how is a data hub different from a
data marketplace? Is it a synonym? Is it a subtype? How do they actually relate? How do they work together? This is what we need to know when we're building an enterprise-wide tool landscape. Also, what are the building blocks of these tools? For example, a data marketplace often contains a data catalog, and a data
catalog often contains a business glossary. So they contain each other, and we need to know what the building blocks are and how they work together in order to know what tools we need to select. On the same topic, it would be really interesting to know what building blocks I actually need to conduct comprehensive metadata
management. Now, these are also questions that are not addressed sufficiently in the literature. The third challenge we looked at is on data marketplaces for internal
use. Now, for example, we said earlier, we're using data catalogs and business glossaries in companies today. And this helps us to find the data and understand the data, but it does not actually help us to access or provision data. So this is why we need a data marketplace. Data marketplaces
are platforms for the exchange of data. But currently, they are mainly used for the exchange of data between companies. Now we're looking at data marketplaces for the internal context. And we have
a lot of literature on data marketplaces, but usually in the external context. We hardly have anything on how we can use data marketplaces within an enterprise, which is why we are basically missing detailed concepts and solutions, with architectural proposals, definitions of the functional scope, and so on, in the
internal context. Now, just to name an example of how they could differ: compliance is a huge issue in this context internally, but not externally. If I'm a company, and I'm sharing a specific data set with another company,
then I have checked that this one data set is compliant. But if I'm trying to share all of the data I have with all of my employees, I need much more refined compliance regulations within this marketplace. Okay, so these were a few of the challenges. And we see that there are issues in the context of metadata management.
And I'm going to sum up real quick what we learned today. So I talked about the current data and metadata management goals, which are mainly about data sharing and data transparency. I gave you an insight into the challenges. One of them is data marketplaces. This
is on the right here, because we need this in addition to the other sub goals. The other two challenges are more cross-sectional: metadata management for data lakes, and the selection and composition of metadata management tools. Now, these are blue, because we conducted a literature and tool review,
and we found that these are not sufficiently covered, and therefore constitute research gaps. This is what I wrote about in the paper. Thank you for your attention.
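
Editorial sketch: the meta model described in the talk (a business object at the top level, a machine as an instance of it, a drill bench as an instance of a machine, and data models linked to these definitions) could look roughly like the following. This is only a minimal illustration under stated assumptions, not the interviewed company's implementation. The class `Concept`, the function `models_sharing`, and the data model names `maintenance_db` and `production_log` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    """One node of the meta model; instance_of points one abstraction level up."""
    name: str
    instance_of: Optional["Concept"] = None

    def lineage(self) -> list:
        """Walk up the instance-of chain to the top abstraction level."""
        chain, node = [], self
        while node is not None:
            chain.append(node.name)
            node = node.instance_of
        return chain


# Abstraction levels as named in the talk.
business_object = Concept("BusinessObject")
machine = Concept("Machine", instance_of=business_object)
drill_bench = Concept("DrillBench", instance_of=machine)

# Hypothetical data models: (model name, element name) -> linked concept.
links = {
    ("maintenance_db", "bench_id"): drill_bench,
    ("production_log", "machine_no"): machine,
}


def models_sharing(concept: Concept) -> list:
    """Names of data models with an element linked to the concept or to anything below it."""
    return sorted({model for (model, _), c in links.items()
                   if concept.name in c.lineage()})


print(drill_bench.lineage())
print(models_sharing(machine))
```

Linking model elements to shared concepts is what turns separate data models into a small knowledge graph: a query for `machine` also reaches the model that only mentions a drill bench, because the drill bench sits below the machine in the hierarchy.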