
Enterprise Knowledge Graphs

Formal Metadata

Title
Enterprise Knowledge Graphs
Title of Series
Number of Parts
3
Author
License
CC Attribution 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Computer animation
Transcript: English(auto-generated)
I am going to take it more from an application view of what you can do with knowledge graphs and how we utilize knowledge graphs across the human capital domain within NASA. I've had the opportunity to work with knowledge graphs for the last eight to ten years in various domains, from engineering to International Space Station and Space Shuttle work.
Over the last couple of years, I've been looking primarily at how our workforce utilizes their skills, how we can identify the skills and competencies across our workforce, and how we can make sure we have the correct competencies and skills that'll take us back to the Moon and on to Mars eventually. So this presentation is more about how you do that,
how you can extract these types of information and how you can actually apply this to various different types of domains. One of the first questions I get when people first start looking at knowledge graphs is: what is a knowledge graph? How do we use it? What does it actually do? And knowledge graphs, while they've been around for several years now,
with Google's first introduction of the term knowledge graph many years ago, have really started gaining prominence in the last few years with the development of graph databases, the increased use of graph databases and how you might be able to create a knowledge graph. The simplest way that I explain knowledge graphs to anybody is as an interconnected group of domains
that find common relationships across those domains. That allows us to find patterns, look at different capabilities, and search faster across those various domains to understand how one group or one domain may be similar to another, or how we can extract information from that.
And there are many different ways you can do that. Most of the work that I do is around natural language processing: looking at textual information across my domain, extracting entities from it, and seeing how those are related through many different concepts. I'm gonna show one very simple concept on how we did that right now,
utilizing the Doc2Vec algorithm, but we're also looking at other types of algorithms that I can talk about in a little bit. So one of the first things I always get asked is what type of graph database should I be looking for? And there are various types, of course, out there. There's gonna be a comparison between RDF and LPG.
This example here basically shows the differences between an LPG and an RDF type of graph. You're gonna have some very spirited arguments
and discussions around which one to use. As you can see here, with RDF you generally get more nodes and edges for the same information because of the way it's structured, as opposed to a labeled property graph, where you introduce the concept of labels that get associated to particular nodes. I prefer to use the LPG model myself.
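To make the RDF versus LPG difference concrete, here is a minimal sketch in plain Python that models the same two facts both ways and counts the resulting nodes and edges. The identifiers (`ex:emp1`, `HAS_SKILL`, the name "Alex") are made up for illustration and are not taken from the slide.

```python
# Same facts modeled two ways. All names and values are hypothetical.

# RDF style: everything is a (subject, predicate, object) triple,
# so types and literal values show up as extra nodes and edges.
rdf_triples = [
    ("ex:emp1", "rdf:type", "ex:Employee"),
    ("ex:emp1", "ex:name", '"Alex"'),
    ("ex:emp1", "ex:hasSkill", "ex:skill1"),
    ("ex:skill1", "rdf:type", "ex:Skill"),
    ("ex:skill1", "ex:name", '"Text similarity"'),
]

# LPG style: labels and properties live on the node itself,
# so only the employee-to-skill relationship needs an edge.
lpg_nodes = {
    "emp1": {"labels": {"Employee"}, "props": {"name": "Alex"}},
    "skill1": {"labels": {"Skill"}, "props": {"name": "Text similarity"}},
}
lpg_edges = [("emp1", "HAS_SKILL", "skill1")]

# In the RDF graph every distinct subject or object is a node.
rdf_nodes = {s for s, _, _ in rdf_triples} | {o for _, _, o in rdf_triples}

rdf_node_count, rdf_edge_count = len(rdf_nodes), len(rdf_triples)
lpg_node_count, lpg_edge_count = len(lpg_nodes), len(lpg_edges)

print(f"RDF: {rdf_node_count} nodes, {rdf_edge_count} edges")
print(f"LPG: {lpg_node_count} nodes, {lpg_edge_count} edges")
```

Neither count says one model is better; it only illustrates why the RDF picture tends to look denser for the same information.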
It's just what's been more intuitive to me. It's easy to work with, but again, it's the right tool for the right job, what you're comfortable with and what you're able to utilize, so definitely take a look at the various types of graph databases out there. I'm gonna show my work here utilizing LPGs and the graph database Neo4j
that's been out there for several years now. So let's talk a little bit about the model first. How does that model work and what am I looking for? This is what I call my KSAT knowledge graph model. KSAT stands for knowledge, skills, abilities, tasks, and technologies. The reason I call it that is because what I'm looking to do here is connect my occupation domain to my employee domain.
I have to start off somewhere. So I'll start off with my occupation domain, and my occupation domain is going to be the base for my knowledge graph. And the reason for that is because
there's a ton of information out there around occupations from various authoritative sources. In the United States, you've got the Department of Labor's O*NET database, which tracks about 2000 different occupations and has various elements associated to those occupations. Those elements are, again, knowledge, skills, abilities, tasks, and so much more. And those become your nodes
and they're related to each other based on their connections, for example how knowledge may be applied to a particular occupation. And you have various attributes or properties that are associated to those nodes, based on domain experts looking at the importance, the validity, and the educational requirements for those. All of that information can be ingested
into a graph database. So I use all of that textual information to create a dictionary that I train the Doc2Vec model on, which I can then utilize to infer against my people, my employees. I grab all the information I can find about my employees: their job descriptions, their performance evaluations, their LinkedIn profiles,
social media, publications they may have written. And I use the Doc2Vec model to infer against those employees to understand what common knowledge, skills, abilities, or tasks they may also have that are related back to an occupation. All of that gets dumped back into a graph database, Neo4j in this case. And then I start utilizing various graph algorithms
to try to extract information out of that graph database to look at things such as skill gaps. Do I have the right skills in a particular work role, in a particular project or at a particular center? Am I losing certain skills due to attrition? Those are things that I can start adding back into the graph database.
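The train-then-infer flow described above can be sketched roughly as follows. To keep the example runnable with only the standard library, it substitutes a plain bag-of-words cosine similarity for Doc2Vec; the element texts, the employee text and the 0.2 threshold are all assumptions for illustration, not values from the talk.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical occupation elements and their descriptive text
# (stand-in for the dictionary built from occupation data).
occupation_elements = {
    "knowledge: natural language processing": "natural language processing of text documents",
    "skill: text similarity": "compare text documents for similarity",
    "skill: data visualization": "design charts and dashboards",
}
element_vectors = {
    name: Counter(tokenize(text)) for name, text in occupation_elements.items()
}

def infer_elements(employee_text, threshold=0.2):
    """Return the elements whose text is similar enough to the employee text."""
    vec = Counter(tokenize(employee_text))
    scores = {name: cosine(vec, ev) for name, ev in element_vectors.items()}
    return {name: score for name, score in scores.items() if score >= threshold}

# Hypothetical text gathered about one employee.
employee_text = "Built natural language processing pipelines to compare text documents"
matched = infer_elements(employee_text)

# Skill gap: elements the occupation lists that were not inferred for the employee.
skill_gap = sorted(set(occupation_elements) - set(matched))

print("matched:", sorted(matched))
print("gap:", skill_gap)
```

In a real pipeline the matched elements would become relationships between the employee node and the element nodes in the graph database, and a learned embedding model would replace the word-overlap score.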
I can also look at career paths or succession planning. Who are the right people that could fill the next generation of engineers or research scientists or leaders within an organization, based on the training they've had and the path that they've taken within this occupation? All of that can be stored within the graph database and we can start sharing that kind of information,
showing what we can get out of it and really applying this knowledge to our workforce to make it more agile and make sure we're applying the right workforce in the right locations. So this is a very, very high-level look at the graph database itself.
I didn't want to show the full graph, because once you start looking at a graph itself, it can get really spread out. But this gives you a quick idea of how I take each domain, the occupation domain and the employee domain, and I can now add a training domain and infer out of that and make relationships connected to each of those domains based on their knowledge, skills, abilities,
tasks and technologies. And I'll give you a quick example of how that works. So from an occupation standpoint, as a data scientist, I have to have knowledge of natural language processing. And to apply that knowledge, I have to have a skill of doing text similarity,
making sure I can compare texts against each other. And in order to do that text similarity, I have to apply a task of running the Doc2Vec model. So that's the occupation. As an employee, as a person, I have that knowledge of natural language processing. I have that skill of being able to do text similarity, and I have the experience and practice
of running the Doc2Vec model. So now I'm able to relate myself to this occupation through those common nodes or relationships of the different elements associated to that occupation. But then I can throw in things such as training, because within training, you also have the same type of information. You have an educational class
that may provide you knowledge around natural language processing. And within that, you'll learn a skill of how to do text similarity by doing various exercises around Doc2Vec and training a Doc2Vec model. So again, I've taken another domain, utilized those same common nodes, related them back, and now I can apply that into a knowledge graph.
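The way the three domains connect through shared element nodes can be sketched with a small in-memory edge list. The node names and relationship types (`REQUIRES`, `HAS`, `PROVIDES`) are assumptions for illustration, and the hypothetical employee here lacks the Doc2Vec task so that the training link has something to show.

```python
from collections import defaultdict

# (source node, relationship, element node); all names are hypothetical.
edges = [
    ("Occupation:Data Scientist", "REQUIRES", "Knowledge:Natural language processing"),
    ("Occupation:Data Scientist", "REQUIRES", "Skill:Text similarity"),
    ("Occupation:Data Scientist", "REQUIRES", "Task:Run a Doc2Vec model"),
    ("Employee:E1", "HAS", "Knowledge:Natural language processing"),
    ("Employee:E1", "HAS", "Skill:Text similarity"),
    ("Training:NLP course", "PROVIDES", "Knowledge:Natural language processing"),
    ("Training:NLP course", "PROVIDES", "Skill:Text similarity"),
    ("Training:NLP course", "PROVIDES", "Task:Run a Doc2Vec model"),
]

# Collect the element nodes each domain node is connected to.
elements = defaultdict(set)
for source, _relationship, element in edges:
    elements[source].add(element)

required = elements["Occupation:Data Scientist"]
held = elements["Employee:E1"]

shared = required & held            # common nodes linking employee and occupation
missing = required - held           # elements the employee would still need
match_ratio = len(shared) / len(required)

# Training nodes that touch at least one missing element.
recommended = sorted(
    source
    for source, provided in elements.items()
    if source.startswith("Training:") and provided & missing
)

print(f"match: {match_ratio:.0%}, missing: {sorted(missing)}, training: {recommended}")
```

Adding another domain, such as projects, would only mean adding more edges that point at the same element nodes.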
So the knowledge graph is growing as I add more information. And I can continue to add to this information based on various types of relationships. I can look at projects, you know, what type of skills or requirements are necessary within a project, and relate that back to jobs, to training, to employees, and so forth. The more domains you can look at, the more your knowledge graph grows
and the more value you get out of your knowledge graph. So this is, again, still a very high-level look at the knowledge graph and what you can do with it as I've added more and more information to it. Now from the occupations and the elements that I have across the occupations, I can create things such as job profiles. These job profiles allow me to really fine-tune
what types of jobs I'm looking for within my organization across NASA, across the various centers, and then start looking at how we evaluate those jobs based on the competencies that are required for that job. Similarly, you can do the employee profile, finding the right employees for a particular job
through a talent marketplace concept. So now we can have managers that are looking for someone to fill a particular work role that can go through the employee profiles and find those individuals. Maybe they don't find somebody that's 100% match, maybe they find somebody that's 75% match, but because we also have a training profile around,
we can upskill that employee to reach the necessary requirements in order to do the job they're looking for. And then on top of that, I could add where the employee is charging to, the different types of mission, theme, programs, projects, or cost centers. And now I can start identifying what skills, particular skills or competencies I may have within an organization,
all the way down to the lowest level within that project so that I can really make sure that I have the right competencies and skills across the organization with the right people. And we do that by visualizing and taking a look across our database. And this is a simple visualization straight out of Neo4j running a Cypher query,
just to pull out the individuals that are associated to the Earth science theme and the different job roles they may be doing within Earth science. This was a particular request a few months ago from our recruiting group, which wanted to show during Earth Day what types of jobs are available within NASA around Earth sciences.
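A query of that kind might look roughly like the sketch below. The node labels, relationship types and property names (`Employee`, `CHARGES_TO`, `Project`, `PART_OF`, `Theme`, `WORKS_AS`, `Occupation`, `title`) are assumptions for illustration; the actual schema is not shown in the talk.

```python
# Hypothetical schema: the labels, relationship types and property names
# below are assumptions for illustration, not the model used in the talk.
EARTH_SCIENCE_JOBS = """
MATCH (e:Employee)-[:CHARGES_TO]->(:Project)-[:PART_OF]->(:Theme {name: $theme}),
      (e)-[:WORKS_AS]->(o:Occupation)
RETURN o.title AS job, count(DISTINCT e) AS employees
ORDER BY employees DESC
"""

def jobs_for_theme(session, theme):
    """Run the query on an open session and return (job, employees) pairs.

    `session` is expected to behave like a Neo4j Python driver session:
    `run(query, **parameters)` returns an iterable of records that can be
    indexed by column name.
    """
    result = session.run(EARTH_SCIENCE_JOBS, theme=theme)
    return [(record["job"], record["employees"]) for record in result]
```

With the official `neo4j` Python driver, the session would typically come from `GraphDatabase.driver(uri, auth=(user, password)).session()`. Passing the theme as a `$theme` parameter rather than formatting it into the string keeps the query reusable for other themes.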
Utilizing the knowledge graph that we created, we can pull out the various jobs. There were about 200 jobs that are actually associated with Earth science, along with the individual people in them. And if you start thinking about how you can utilize knowledge graphs, since I have the job and I have the people, I can now break out and aggregate the various competencies and skills within that job,
or really start fine-tuning what skills I may need that may be missing, given the number of competencies I may need within that occupation. So there are various things you can do within the graph database in order to really answer some of your workforce questions. So another way of visualizing,
this is another built-in product within Neo4j called Bloom that allows you to look at your graph database and pull out information. You can create queries that are parameterized, allowing you to provide information very easily back to your end users. In this particular case, we're looking at one individual and the various skills or competencies they may have.
In this case, it's cross-functional skills from the occupation that they're related to and how they charge to that. So by taking this, again, you can aggregate across that entire group, the different types of skills or competencies you may have finding various patterns that may be evolving, how somebody may have changed their competencies
or skills over time, their career path, and you can then model these individuals to really fine-tune your workforce and train the workforce for the next generation. A couple of last things you can look at. This is a simple dashboard, like with any other database, where you can pull out information
and show that in the dashboard. This is the simple taxonomy of the elements associated within the occupation. Again, it's broken out at a very high level, but this shows you the different elements associated to an occupation, showing the knowledge, the skills, your abilities,
and how it breaks down all the way down to the occupation. And you can still see in the top left-hand corner there are some that are still kind of orphaned or off to the side, for which we're still trying to develop different types of relationships. And some of the ways we're doing that is trying to develop new ways of extracting these different entities from the information we may have
about our occupations or our employees using a very specific entity extraction model we're developing to look at NASA competencies, how they're defined and how to extract that from various information we may have about the individual, again, their research papers, the work that may be done, their resumes or CVs,
anything we can glean from that. That model is still in the works and I hope to be able to share it in the next several months, but it's really just taking a look at NASA-specific information to find NASA-specific competencies and jobs across our workforce or out there within the job opportunities that are worldwide. One last visualization, again, from a dashboard:
everything does not have to be shown in a graph as you have on the left-hand side, again, cross-functional skills by individuals, the A's that are active employees and the various projects that they may be associated with. Again, you can show specific cross-functional skills or knowledge or abilities,
or even down to the task levels you may be looking for across your workforce to really identify the things you're looking for, but it could also be shown as a table and pulling out in this particular case, since we do have information in the cross-functional skills as to their importance to a particular job, this is just a printout of particular occupations
in this case from a government OPM classification of 2210, which is information technology. We have various occupations, from computer programmers to web developers down to information technology project managers, listing the most important cross-functional skill
to that particular occupation. As you can see, the most important here is writing computer programs for various needs. That is one of the most important things identified by domain experts in that field. All of that information can easily be visualized in various types of dashboards. The domain space for graph databases and visualization has grown over the years.
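Producing such a table from importance ratings amounts to picking the highest-rated skill per occupation. The sketch below does that in plain Python; the occupation titles echo the ones mentioned above, but the skill names and importance scores are made-up placeholders, not values from the dashboard.

```python
# (occupation, cross-functional skill, importance); scores are hypothetical.
ratings = [
    ("Computer Programmers", "Programming", 4.5),
    ("Computer Programmers", "Critical Thinking", 3.9),
    ("Web Developers", "Programming", 4.2),
    ("Web Developers", "Active Listening", 3.4),
    ("Information Technology Project Managers", "Coordination", 4.1),
    ("Information Technology Project Managers", "Programming", 2.8),
]

# Keep the highest-rated skill seen so far for each occupation.
top_skill = {}
for occupation, skill, importance in ratings:
    if occupation not in top_skill or importance > top_skill[occupation][1]:
        top_skill[occupation] = (skill, importance)

# Print a simple table, one row per occupation.
for occupation in sorted(top_skill):
    skill, importance = top_skill[occupation]
    print(f"{occupation:<42} {skill:<18} {importance:.1f}")
```

In the graph database the same result would come from an aggregation over the relationship that carries the importance property, with the table being just one way to present it.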
You're starting to see a lot more companies trying to develop ways of visualizing these. There are many open-source tools, such as D3 and Neovis, so there are a lot of different ways of looking at visualizations that can showcase your knowledge graphs and their capability. So hopefully I've landed the talk here.
I can stick the landing, and if anybody wants to reach out and talk, there's my contact information. I'd be more than happy to answer any questions at this time.