Connecting the Dots with PowerShell

Formal Metadata

Title: Connecting the Dots with PowerShell
Number of Parts: 60
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Production Year: 2018

Content Metadata

Abstract

One of PowerShell's greatest strengths is its ability to glue pretty much any technologies together. We'll use that strength to pull data from a number of services, connecting the resulting dots with a graph database that could be used as a lightweight CMDB. We'll talk about:

* Different interfaces PowerShell can use, from modules to .NET libraries
* Graph databases like Neo4j, and how these can be useful for sysadmins
* A practical (janky) CMDB, and why these can be useful

## Why the topic:

I'm a fan of CMDBs that have useful data. They can drive automation, monitoring and alerting, reporting, and anything else that benefits from visibility. It just so happens that:

* This is a great way to illustrate the various ways to talk to things in PowerShell (modules, web APIs, .NET libraries, binaries, etc.)
* Graph databases are awesome, and map to real life systems more easily than the cumbersome fun of primary keys, foreign keys, and strict schemas
* Neo4j has a free, cross platform community edition, and there's a simple PowerShell module to work with it
* We can instill other important lessons, e.g. modules/abstraction, community/sharing
* We can provide a practical example that folks without a reasonable CMDB could borrow and extend
* Heavy weight, expensive, actual CMDBs are a poor fit for shops adopting DevOps practices and principles
Transcript: English (auto-generated)
All right, so today we're gonna be talking Connecting the Dots with PowerShell and Neo4j, a graph database. I'm Warren Frame, I'm an infrastructure engineer at Harvard's Research Computing Group.
If you're in the PowerShell community, you might have seen or used some of the modules that I work on or help with, like PSSlack, PSDeploy, and Invoke-Parallel. And like I mentioned, we have some swag; I'll mention how we can hopefully deal with that later. So we're just gonna take some duct tape, some PowerShell and a graph database
and kind of build a mini CMDB. I know you might actually have real CMDBs, but someone gave me feedback when I did a pre-session, they said, well, what is a CMDB? So I'm not going with any formal ITSM definition. For me, just having a list of things that are maybe useful in automation or monitoring
or various things like servers or the disks that run on those servers, maybe data stores, what disks or volumes live on those data stores, pretty much anything that you can think of that could be valuable. And then some people will be like, oh, well, doesn't everyone have a CMDB already? No, there are a lot of people that don't. Maybe they have a spreadsheet, which is not really functional in any way.
Maybe they've spent like millions of dollars on some very expensive CMDB and they spent all the money so now they don't have money for resources to dedicate to putting stuff in it. Or maybe they saved the CMDB for after the implementation and yeah, that's smart. And other people might just have an inflexible CMDB,
like something that's difficult to extend or maybe they need some of the functionality and sort of the graph traversal that comes with a graph database. Other people, when I've mentioned this, have said, oh, wait, I can just query Active Directory or vCenter whenever I need to. That data is stored there already. Ignoring the performance cost
of actually spinning up modules and connecting to all those things in every single script where you get that data, it's just nice to have a single spot where you have a standard way to correlate all that data. Like what computer name from AD is tied to a VM in vCenter? Having that logic replicated everywhere is kind of inefficient.
Having one data source or database with a common interface, whether it's a REST API or just a standard SQL connection that you can hit, is pretty valuable. And finally, all of this stuff can be used for automation, monitoring, reporting, and just ad hoc commands. It's really valuable. So pretend you want a CMDB and you don't have one.
And I've mentioned a graph database a couple times, I guess we have to talk about what that is. So long story short, it's a database engine that stores things in terms of nodes and their relationships. So in this example, we have a server that's a domain controller and it's related to Active Directory. To go into that, the nodes are sort of similar
to a relational database table. They have a label in Neo4j parlance. So this is a server that might be like the table name and it's got the row of data for this particular entry. And this one's Active Directory, it's a service, yada, yada, yada. So they have a label and properties, that's it.
Now, when you don't have a relational database or when you have a graph database, you also have relationships. And these are really valuable because you can actually tag that relationship with a particular type or name for that relationship. You can add properties to it and it has a direction. So DC01 is part of Active Directory.
It's not that Active Directory is part of DC01, which is something that can come up with a relational database. There are more technical definitions behind this. If you wanna learn what a graph database or graph theory is, feel free to explore. Don't have time for it here. So some people might be like, I know relational databases, I really wanna use a relational database.
Or everyone where I work knows relational databases. Now, for me, that's a big part of the problem. I've written something like this before when I was first learning PowerShell, just as a way to learn databases in PowerShell. You have to learn the basics of database design, normalization, primary keys, foreign keys, and join tables, because more than one server
can be in a service and more than one service can live on a server. Going through all of that design process might not work if you just wanna quickly spin up a CMDB and make it really easy to extend and work with.
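To make the many-to-many point concrete, here is a minimal Python sketch (not code from the talk) of the graph model described above: nodes carry a label and a name, and every relationship is a typed, directed edge, so no join table is needed. The host names, service names, and the IsPartOf relationship type are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # roughly the "table name", e.g. Server or Service
    name: str   # identifying property for this sketch


@dataclass
class Graph:
    # Each edge is (start node, relationship type, end node); direction matters.
    edges: list = field(default_factory=list)

    def relate(self, start: Node, rel_type: str, end: Node) -> None:
        self.edges.append((start, rel_type, end))

    def outgoing(self, start: Node, rel_type: str) -> list:
        """Nodes that `start` points to through `rel_type`."""
        return [e for s, t, e in self.edges if s == start and t == rel_type]

    def incoming(self, end: Node, rel_type: str) -> list:
        """Nodes that point to `end` through `rel_type`."""
        return [s for s, t, e in self.edges if e == end and t == rel_type]


# Hypothetical inventory
dc01, dc02 = Node("Server", "dc01"), Node("Server", "dc02")
ad, dns = Node("Service", "Active Directory"), Node("Service", "DNS")

g = Graph()
g.relate(dc01, "IsPartOf", ad)
g.relate(dc02, "IsPartOf", ad)
g.relate(dc01, "IsPartOf", dns)

servers_in_ad = [n.name for n in g.incoming(ad, "IsPartOf")]
services_on_dc01 = [n.name for n in g.outgoing(dc01, "IsPartOf")]
print(servers_in_ad, services_on_dc01)
```

Both directions of the many-to-many question (which servers are in a service, which services live on a server) are answered from the same edge list.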
Yes, I mean, so you might keep track of the actual win32, like a service on the box versus a service like Active Directory. In this case, yes, yeah. That's probably a better name than you don't get when you just, thank you. Another question is, so let's say maybe you accidentally used a relational database or maybe you like them.
I do like relational databases. So some things that might come up that might point you towards a graph database, maybe you start having some recursive relationships, like a service can depend on another service. That's an indication that you might wanna consider a graph database. If you start having just a whole bunch of many-to-many relationships,
that just can become more and more of a pain in terms of the schema and your join statements and all that. And finally, if you wanna do interesting queries, and this is unfortunately not a very interesting query, but you can find some fun ones if you look up how people use BloodHound. I'll talk about that in a bit. A lot of queries that sort of traverse a graph
are really performance intensive on a relational database or just not even possible. So graph databases are a good fit if you don't necessarily know how you're gonna explore that data, or if you're gonna be traversing your data to do typical queries. Another reason, so Karen Lopez, I think, has a decent blog on how your data might be telling you
that it should fit in a graph database. And one of them is if it uses the word network. If you look up a network or an architecture diagram, they kind of look like graphs. They don't look like tabular data. So let's pretend I've convinced you to use a graph database. Now you have to pick one.
There are a whole bunch of graph databases out there: Arango, Titan, Amazon Neptune, Cosmos in Azure. I think SQL Server even has one in 2017. I ended up with Neo4j. It's pretty cool because it's open source and runs on Windows. There's plenty of PowerShell examples out there. Thank you to Glenn Sarti and the folks
that wrote BloodHound. And let's see, what else? I'm actually just kidding because Glenn Sarti mentioned it last year. That's the whole reason I ended up playing with it. The community's really helpful. So Glenn, I mentioned, Rohan who wrote BloodHound, and this guy Michael Hunger at Neo4j. He's sort of like the David Wilson of Neo4j.
If there's answers out there or questions out there, chances are he's answering them and really helpful. So you've picked the database. Now you actually have to do it. So today we're gonna draw a couple circles. I don't have time to explain how you can get this up and running in a really resilient manner.
Again, Glenn has some awesome examples out there of how to use this with Chocolatey. He has another one using Docker clustering with Neo4j Enterprise. So there are many examples, but we're not going to go too deep into it. Kind of rushed. Any questions so far? Before we actually dive into demos, sorry for spending so much time on slides.
No, no. Yes, all right. So I have a couple examples I'll include in the code. You could do Chocolatey, just install it on like a VM somewhere. I'm just gonna spin up an ad hoc Docker container.
I'm using the old Docker Machine. So I still have to do things like set the environment variables. But realistically, if you're just using native Docker, you can probably run something like this, where you basically define a directory where you might want your data and your logs. You open a couple of ports and you say, hey, in this particular directory, I want my data on my computer
to point to the container's data and log paths. So I'm just going to spin that up. Oh, of course I'm not in PowerShell and I used PowerShell syntax. Oh no. Did I not turn on the Docker?
What? It's running. Sorry about this. My session must have restarted. Oh, starting PowerShell. Demo fail. I'm sorry. I thought that I had this all set up, so let's just kill everything.
Copy and paste should work. So I don't have the Docker environment variables in that session, so I'm just gonna copy them over here. Oh, that will kill the database or the container. Okay, now we're running. Whew, sorry about that.
Okay, so now we have a container up and running. I'm using Docker Machine, so I'm gonna be pointing to like a 192 address. You'd probably use 127.0.0.1 for yourself. I'm gonna be working with PSNeo4j. I've already installed it here, so I'm not gonna go through this and potentially risk more demo fails. I am going to point it to that particular base URI.
So this module is out there. I'll include a link to it, but basically I'm gonna say, hey, I wanna use this particular address as my default address. I'm going to set a default password, and the default credentials are neo4j, neo4j. So I'll use this and change it to a very secure password.
I could update the Neo4j password. This will take what PSNeo4j knows as the password and use that to write the new password. I've already done this, so I'm just gonna set the configuration, and now I've set my password. I should be able to just maybe, okay. So I've connected. I can see that I am the Neo4j user,
and I can do things like, you know, get the actual database configuration if I need. So that is with PSNeo4j, and cool, I think we're good to go for that, but we're not gonna do Chocolatey. We're done with Docker. Cypher.
Okay, so you probably can't even see that. I'm not gonna really go through Cypher, the language. Think of it, it's kind of like their version of T-SQL. It's inspired by SQL, and you basically use a little ASCII art to kind of describe the patterns and graphs that you wanna retrieve. It's kind of interesting. It's kind of, when you first work with it, it might look a little odd and scary, but there's lots of examples of how to use it,
and they have some pretty solid documentation out there. I'll include a bunch of links to that. Long story short, parens are gonna represent a node. If you see a colon and then something, this is saying, hey, I want all nodes that are servers. This one is saying, I want all nodes that are servers, and assign those to the variable S that you can use later on in the query.
Brackets are relationships. So this one is saying, hey, get me all relationships that are of the depends-on type, assign that to R, and for any node that has a depends-on pointing to it, also set that as S. It's kind of a weird way to say it, but when you Google for these things,
there's usually lots of examples out there. It's gonna be like movie stuff and social network type stuff, not necessarily IT CMDB examples. All right, so I'm importing my PSNeo4j module again. So I'm gonna start just using some raw Cypher queries. I also wrapped some stuff, which we'll see later. So if I just wanna return all the nodes,
it's just gonna return a bunch of data of different types, right, like a scheduled task. This is probably a service. This looks like some kind of group, a security group maybe. Of course, it doesn't tell you the label or basically the table name for this, so you can do different query methods to do that.
So to give you an example of a relationship, like what a typical query might look like in Neo4j, I wanna get all servers that are a part of a service and return each of those different types of nodes and this little dash greater than is saying where the direction is pointing to the service.
It's gonna be similar where it's gonna return a bunch of data that's sort of, I don't know what's what but if I actually go to the top, I'll see that the first item is a server, the second item is a relationship and the third item is a service. So you could like work with your queries to kind of boil it down to a more friendly output. These are just examples, they're not particularly friendly.
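As a compact recap of the pattern syntax walked through above, here is a small Python sketch that assembles the kind of Cypher query being described. It only builds the query string; the label names and the IsPartOf relationship type are assumptions for illustration, not necessarily the ones used in the demo.

```python
def node(var: str, label: str) -> str:
    """Parentheses are a node; ':Label' filters by label; var names it for later use."""
    return f"({var}:{label})"


def rel(var: str, rel_type: str) -> str:
    """Square brackets are a relationship; the '->' arrow gives it a direction."""
    return f"-[{var}:{rel_type}]->"


# Label and relationship type names are assumptions for this sketch.
pattern = node("s", "Server") + rel("r", "IsPartOf") + node("svc", "Service")
query = f"MATCH {pattern} RETURN s, r, svc"
print(query)
# MATCH (s:Server)-[r:IsPartOf]->(svc:Service) RETURN s, r, svc
```

The three returned variables line up with the three items seen in each result: a server, a relationship, and a service.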
So in this case, get the same data except for I want the host name of the server to show up as computer name, the type of the relationship and the name of the service. This looks kind of ugly, you're probably not gonna have end users running this, you're probably gonna have something in front of it, whether it's a PowerShell module or something that's gonna abstract it out
so you don't just run raw Cypher queries, unless you really wanna do that. Okay, so that's the basics. There is lots of information on running Cypher queries and if you work with Active Directory, BloodHound examples are particularly helpful because it's sort of a context that we tend to know about
so I'm just gonna browse to something called the Neo4j browser. It's like a built-in piece of Neo4j that just kind of lets you visually explore the data so I can just pop up like all the nodes that I just created and Neo4j is really helpful and doesn't necessarily give you the name of the thing inside the dot so there's a way to say hey, for servers, you don't have to,
I'm just gonna just do one of these and not actually do the rest. I want this to be, I don't know, the AD DNS name or the dot host name. So now all the servers, I can see what they are. I don't know if that's actually readable for you guys. Anyways, so the whole getting all of the information back is probably not particularly useful. You might actually have canned queries
which you can actually do if you wanna save some scripts here. It also includes some built-in ones like hey, I want to see what is related and how. It's just basically looking at the schema and of course they do the same thing where they don't actually. Okay, so this is just a quick example of what the actual schema behind the data
that's in the database at the moment looks like. So servers can be a part, or a server can be a part of a service, a service can depend on a server, a task runs on the server, a user can own a service, a group can admin a service or use a service, yada, yada, yada. So that's just up to what you include in your database. So it's kind of just a handy way to explore the data
before you actually write the PowerShell queries to pull it. All right, so that is that. You can just delete all the nodes, or if you have indexes or constraints, you can just run Clear-Neo4j, which runs this code up here and then programmatically lists indexes
and constraints and deletes them. So I'll just run that. That clears out the data. So theoretically, so before we go into the actual CMDB parts,
just one more thing. So I've been just running Invoke-Neo4jQuery. That's using Cypher. I've also wrapped this in easier-to-use functions. So I'm gonna mock out Active Directory because I'm running on a MacBook, not on Windows, so I can't run this. I don't have the implicit remoting thing that they demoed.
So now whenever I'm getting a computer, it's just gonna have some mock data come back to me. Can you guys read that reasonably well? Okay, sweet. All right, so, come on, Will. All right, so I'm just gonna pipe all of those Active Directory computer objects to New-Neo4jNode and I'm gonna say they are servers.
And so that's all there is to it. Any sort of data, you just send it to New-Neo4jNode and give it a label and it will just add data to the database. And if you use pass-through, it'll spit back the actual stuff so you know it was added, or you could validate it if you have some sort of validation logic. So I could just send an arbitrary object to the database.
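For a sense of what a wrapper like this has to do under the hood, here is a hedged Python sketch that turns an arbitrary record into a parameterized Cypher CREATE and wraps it in the statements/parameters shape accepted by Neo4j's transactional HTTP API. This is not PSNeo4j's actual implementation; the function name, the property names, and the mock values are all assumptions.

```python
import json
import re


def create_node_statement(label: str, props: dict) -> dict:
    """Build a parameterized Cypher CREATE for one node with the given label."""
    # Labels cannot be passed as query parameters, so validate before interpolating.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", label):
        raise ValueError(f"unsafe label: {label!r}")
    return {
        # RETURN n hands the created node back, similar in spirit to pass-through.
        "statement": f"CREATE (n:{label}) SET n = $props RETURN n",
        "parameters": {"props": props},
    }


# Mock computer record standing in for Active Directory output (made-up values).
computer = {"ComputerName": "dc01", "OperatingSystem": "Windows Server 2016"}
body = {"statements": [create_node_statement("Server", computer)]}
print(json.dumps(body, indent=2))
```

Keeping the property values in parameters rather than in the query text avoids quoting problems when the source data contains odd characters.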
Blah, blah, blah. Maybe I've made a mistake. I wanna change the OU for this. You probably wouldn't do this without actually querying AD, but in this case, I'm gonna do that. And I've added some nonsensical data for the example and changed the canonical name.
All right, so I think that's pretty much it. You can do things like create relationships. So the format that I used here, I guess I didn't really explain it. So I was basically picking what node to update. So there's a couple ways you can do it, but the simplest way is to say I wanna update a node that is a server that has some hash table
that happens to match the data within it. So a server that has the name ps.01. Same deal with a relationship. I used the words left and right before I realized I should be using start and end. So left label, this is the start node. So I'm gonna say this server that has this DNS host name is a part of this service, Active Directory.
So it's, I don't know, a little difficult to read through, but really it's just pointing to which nodes it's gonna connect and giving it a little data. And it created that relationship with the output there. I'll create this one and then we should be able to just browse to the,
so now if I go back to the browser, I should just see that stuff will pop right up. I could just query in PowerShell, but yeah. So now it just has that basic information I added and of course it's overwritten the default things for the nodes. So the browser is somewhat handy,
but it's not exactly a perfect thing. There's actually a bunch of other examples out there, things that can wrap Neo4j for presentation. Oh, I'm sorry. So Neo4j is a, it's a database system.
It's a graph database. So you could use it for pretty much anything. And then I'm going to be showing a really janky CMDB that, yeah, yeah, yeah. So we're actually gonna go to that right now. Good timing. I'll cover this at the end. So there's a community edition, there's an enterprise edition, and I'll cover some of the challenges
if you're using the community edition towards the end of this. But there is a free edition that is pretty solid. It's pretty much most of the database features are there apart from one of the constraints that generally you might not need. Yeah, oh, I didn't mention that. And I wasn't gonna mention that. That is very important.
Case sensitivity. You're used to PowerShell, which is not case sensitive. Neo4j is. So when you're pumping data into it, be wary. It might be useful to do like a ToLower or something so that you have really consistent casing. Yeah, that is kind of a pain. Or use, I think the regex might not be case sensitive. I can't remember. The equals-tilde operator.
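A minimal Python sketch of the two workarounds mentioned here: normalizing identifiers before they are written or matched, and building a case-insensitive pattern for Cypher's `=~` regex operator. As far as I know, `=~` is case sensitive by default and needs the `(?i)` flag to ignore case. The host names are made up, and the assumption is that Python's `re.escape` output is acceptable to the Java-style regex engine Cypher uses, which should hold for simple host names.

```python
import re


def normalize_key(value: str) -> str:
    """Lower-case and trim an identifier before it is written or matched."""
    return value.strip().lower()


def case_insensitive_pattern(value: str) -> str:
    """Regex for Cypher's =~ operator; (?i) turns on case-insensitive matching."""
    return "(?i)" + re.escape(value)


# Hypothetical host names as they might arrive from different sources.
raw_names = ["DC01.Contoso.Local", "dc01.contoso.local ", "Dc01.CONTOSO.local"]
unique = sorted({normalize_key(n) for n in raw_names})
print(unique)  # ['dc01.contoso.local']

# Would be passed as a parameter, e.g.
#   MATCH (s:Server) WHERE s.name =~ $pattern RETURN s
pattern = case_insensitive_pattern("dc01.contoso.local")
```

Normalizing on the way in is usually the simpler option, since it keeps lookups as plain equality matches.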
Okay, where are we? All right, gonna jump back into the slides. Sorry, a couple more slides. So we're gonna, again, duct tape some stuff. So I wrote this really janky thing. Basically, I wanted everything in source control. I know that I'm gonna be pulling some stuff from like Active Directory or external sources. I don't wanna replicate that data in source control.
But some stuff, like for us, we don't really have definitions of services or the dependencies or like that manual information that you might have, like maybe Consul or service discovery things will work for you. But in a lot of organizations, that's not gonna cover everything. So for us, we have this stored in YAML, which could actually also pull from Consul.
And then all you do for each of these sources, like Active Directory or service definitions or dependencies, you just have a script that knows how to extract the data, transform it, and then push it right up into Neo4j. And you can really execute it however you'd like. You could run it at the command line. Ideally, maybe you have a release pipeline for the source control data, the dots data.
So every time you make a change to the YAML, it goes to your source control system, runs a build, pushes it to Neo4j, maybe you do some data validation. And for everything else, like pulling Active Directory, SQL, vCenter, Hyper-V, whatever you're working with, you could just have a scheduled task that runs that.
I think it's really ugly. Just to kind of visualize this from a PowerShell perspective. So I'm gonna run connect the dots. Connect the dots is gonna be like, hey, what are the different scripts I have available? It's gonna run them, so there is an opportunity for injection. Make sure that this path is locked down, or someone could just put random scripts in there. So this is gonna say, hey, run this service
depends-on script, and that's gonna say, hey, get me all this YAML data and then push it into Neo4j. And then it will say, hey, AD computers, run that script, that will get AD stuff, and push it into Neo4j. Sorry, that's kind of a repeat. I mean, the YAML, I just did some really basic examples. So we use PoshBot, so I gave it some keywords
if I want to look for it later, a description. I don't really care if it goes out, so I'm gonna say the outage tier is three. GitLab is very important for us. So security tier is zero. It has some code that runs on domain controllers. And its outage tier is zero because if that goes down, we're in big trouble.
And then a dependency, basically saying, hey, PSBot depends on GitLab. And the impact, if that goes down, is there's no plugin updates to PSBot, which is not the end of the world. So in the actual script that parses this, it just goes through, reads the YAML, gets the data, pushes it up into Neo4j.
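To make the YAML-to-graph step concrete, here is a small Python sketch under stated assumptions. The dictionaries mimic what a YAML parser might return for the two services and the one dependency just described; the field names, the `Service` label, the `DependsOn` relationship type, and the description text are all invented for illustration and are not the real dots schema. The functions only build parameterized Cypher statements and do not talk to a database.

```python
# Hypothetical result of parsing the service and dependency YAML files.
services = [
    {"name": "psbot", "description": "Chat bot (mock text)",
     "keywords": ["chatops"], "outage_tier": 3},
    {"name": "gitlab", "description": "Source control (mock text)",
     "security_tier": 0, "outage_tier": 0},
]
dependencies = [
    {"service": "psbot", "depends_on": "gitlab",
     "impact": "No plugin updates to PSBot"},
]


def service_statement(svc):
    # MERGE creates the node if it is missing; SET += adds or overwrites
    # the listed properties without removing any others.
    props = {k: v for k, v in svc.items() if k != "name"}
    query = "MERGE (s:Service {name: $name}) SET s += $props"
    return query, {"name": svc["name"], "props": props}


def dependency_statement(dep):
    # Both endpoint nodes must already exist, so service statements
    # have to run before dependency statements.
    query = (
        "MATCH (a:Service {name: $service}), (b:Service {name: $depends_on}) "
        "MERGE (a)-[r:DependsOn]->(b) SET r.impact = $impact"
    )
    return query, dict(dep)


statements = [service_statement(s) for s in services]
statements += [dependency_statement(d) for d in dependencies]
```

Storing the impact on the relationship rather than on either node keeps the "what happens if this goes down" text attached to the specific dependency it describes.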
On the other side, so this was the YAML data, stuff where dots is the source of truth. For stuff like AD, I'm literally just gonna run Get-ADComputer, maybe pick out a few properties, filter some things, transform the data how I'd like. I always like to add an update date, the last time I pulled from AD. So if it gets deleted, the data will still stay in this database,
but I can sort of track that it's been missing. And then just set the node. So this will either create the node or update an existing node. And I'll have a note on that later. So in this example, basically, I don't care about computers that are older than a variable, cruft date, yeah.
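The filtering and date stamping described above might look roughly like the following Python sketch. The row fields (`name`, `os`, `last_logon`), the 90-day default, and the mock records are assumptions made for the example; the real script works on Active Directory computer objects.

```python
from datetime import datetime, timedelta


def transform_computers(rows, now, cruft_days=90):
    # Anything that has not logged on since the cruft date is ignored.
    cruft_date = now - timedelta(days=cruft_days)
    nodes = []
    for row in rows:
        if row["last_logon"] < cruft_date:
            continue
        nodes.append({
            "name": row["name"].lower(),
            "os": row["os"],
            # Stamp every record with the time of this pull, so nodes
            # that stop being refreshed can be spotted later.
            "update_date": now.isoformat(),
        })
    return nodes


def stale_nodes(nodes, last_run):
    # Nodes whose stamp is older than the latest run were not seen in it.
    return [n["name"] for n in nodes if n["update_date"] < last_run.isoformat()]


now = datetime(2018, 4, 10, 14, 0, 0)
rows = [
    {"name": "WEB01", "os": "Windows Server 2016",
     "last_logon": datetime(2018, 4, 1)},
    {"name": "OLD99", "os": "Windows Server 2003",
     "last_logon": datetime(2015, 1, 1)},
]
nodes = transform_computers(rows, now)
```

Because the timestamps are ISO 8601 strings in the same format, comparing them as strings gives the same ordering as comparing the dates themselves.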
Like crufty, like old, yeah. Stuff I don't care about. 'Cause they never cleaned up anything. I've already mentioned execution. So if the source of truth is dots, in source control, you don't have to have a release pipeline to do this, but realistically having somewhere where you store your data, a build system that can run every single time
that you update that data, maybe some testing, validation of data, and then a release that pushes it into Neo4j, that would be nice. For the stuff that queries Active Directory, just a scheduled task, or however you manage your tasks, would work. All right. So before we go into the demo of that, any questions?
Yes, all right. Again, cool. So where are we? 2:20, on time, cool. Why am I pulling up Slack? So dots is just a PowerShell module.
It's like maybe almost good enough to give a demo on. It is absolutely not something that you should rely on. So I've already installed it. I think I've imported it. I could just import it again, what the heck.
And I can just check the config for this. So this is, dots is the name because I was very creative and took the logo of Neo4j and just said, hey, this is called dots. It's a mini CMDB that I just described. And some of the config, I think the important bits are the scripts path.
This is where it processes, like what are the scripts that process the YAML versus AD and different things. That lives under the module by default. So if you're gonna update the module, you maybe don't want it to live under the module. So you can update that here. Same deal with the data path. That's where the YAML lives. Script order: if you're creating relationships,
you need the nodes that they connect to exist first. So this is just a way to say, hey, make sure the actual node creation happens before you create all these relationships. And I have test mode set. It's off by default, but this basically pulls in that mock data and it will use that. So theoretically, once I add tests to this thing,
I can use that to run integration tests. So a lot of the explanations for these are just in the help. Yeah, that's it. So you can run Get-Help to see what the different configs do. We'll pretend that it was set to true. We'll just change the config.
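The script order setting mentioned a moment ago exists so that node-creating scripts run before relationship-creating ones. One simple way to get that behaviour is sketched below in Python; the script names are invented and this is not how the dots module necessarily implements it.

```python
def order_scripts(found, script_order):
    # Scripts named in script_order run first, in the order given;
    # everything else runs afterwards, alphabetically.
    rank = {name: i for i, name in enumerate(script_order)}
    return sorted(found, key=lambda name: (rank.get(name, len(rank)), name))


# Hypothetical script names discovered in the scripts path.
found = ["ADGroupMembership", "ServiceDependsOn", "ADComputers", "Service"]

# Node-creating scripts are listed explicitly so they run first.
ordered = order_scripts(found, ["ADComputers", "Service"])
```

With this input, `ordered` starts with the two node scripts and ends with the two relationship scripts.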
I already mentioned that. So I'm gonna clear the database so there's not gonna be any data in it. So I mean, realistically, all this is, is just a command, Connect-TheDots, to populate that database. By default, it will just run all of the scripts that it finds. So computers, maybe Puppet things, services, group memberships in AD.
If you want to troubleshoot, you could run a particular one, although in a lot of cases, so with the YAML data sources, I delete stuff before putting it back in because I assume source control is the source of truth and no one should be updating it outside of that. So I blast away everything and then pull it in from the YAML data. So if you were to run an ad hoc thing
on the YAML data, it would delete it. So in your build system, you might say, hey, Connect-TheDots and only include the things where dots is the source of truth. So we'll just run it, and excuse the horrible output. This is just for our GitLab build system's purposes. I'll make it nicer output at some point.
So that just ran, took a couple seconds. Depending on the size of your infrastructure, it could take a little longer. It's not horrible, but it just depends. Where am I going? So now if I go back to the browser, I should have a bunch of data, theoretically. Yay, okay.
So that populated the data. And we can go through. So really all that did, Connect-TheDots, is it just picked out what scripts to run based on what was in the scripts path. So it looked and it ran like this service thing. And this goes through, I don't know, it does some stuff, picks directories,
runs mock data if it needs to. Deletes everything. So delete everything that's a service, delete all the relationships that point to services, because I'm going to create them all here. Read the YAML data, do some things, do some things, add the data to Neo4j. Realistically, it's just reading YAML, transforming it and pushing it into Neo4j.
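The delete-then-reload approach for data where source control is the source of truth can be sketched as follows. This Python example uses plain dictionaries and lists in place of Neo4j; the node keys, relationship tuples, and mock records are assumptions for illustration only.

```python
def reload_services(nodes, rels, services, dependencies):
    # nodes: {(label, name): properties}
    # rels:  [(from_key, type, to_key, properties), ...]
    # Step 1: remove every Service node and every relationship that
    # touches one, since source control is treated as authoritative.
    for key in [k for k in nodes if k[0] == "Service"]:
        del nodes[key]
    rels[:] = [r for r in rels
               if r[0][0] != "Service" and r[2][0] != "Service"]
    # Step 2: recreate nodes first, then the relationships between them.
    for svc in services:
        nodes[("Service", svc["name"])] = dict(svc)
    for dep in dependencies:
        rels.append((("Service", dep["service"]), "DependsOn",
                     ("Service", dep["depends_on"]),
                     {"impact": dep["impact"]}))


# Mock starting state: one server from another source, one outdated service.
nodes = {
    ("Server", "web01"): {"name": "web01"},
    ("Service", "oldsvc"): {"name": "oldsvc"},
}
rels = [(("User", "alice"), "Uses", ("Service", "oldsvc"), {})]

reload_services(
    nodes, rels,
    services=[{"name": "psbot"}, {"name": "gitlab"}],
    dependencies=[{"service": "psbot", "depends_on": "gitlab",
                   "impact": "No plugin updates to PSBot"}],
)
```

Nodes from other sources, such as the server here, are left alone, which is why running only the YAML-backed scripts in a build is safe for everything else in the database.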
Sorry, the code's kind of ugly, so I'm not even going to look at that. The AD computer one we already looked at, it's pretty straightforward. It just runs Get-ADComputer, and then goes through some logic and adds it to Neo4j. One thing I should probably note, if you're building a CMDB: with AD, I trust the data there,
so I allow it to create new nodes if they don't exist. If I had something like, for us it's RackTables, it's an untrusted data source, I don't know what format people are using in their name field, so I could do -NoCreate, and then I can parse the name field and match up stuff, but not create new stuff,
because I know there's bad data in there. So -NoCreate basically says, if you find an existing node, update it, otherwise, just forget it. So that's for Set-Neo4jNode. All right, so I think that was dots. So that's the, I don't know, a really janky database.
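The difference between a trusted source that may create nodes and an untrusted source that may only update them can be shown with a short Python sketch. The `set_node` function and its `no_create` flag are hypothetical stand-ins that mirror the behaviour described, using a dictionary instead of Neo4j.

```python
def set_node(graph, label, name, props, no_create=False):
    # graph: {label: {name: properties}}
    nodes = graph.setdefault(label, {})
    if name in nodes:
        # Existing node: merge in the new properties.
        nodes[name].update(props)
        return "updated"
    if no_create:
        # Untrusted source and no match: drop the record.
        return "skipped"
    nodes[name] = dict(props)
    return "created"


graph = {}
# Trusted source (for example a directory) is allowed to create nodes.
first = set_node(graph, "Server", "web01", {"os": "Windows Server 2016"})
# Untrusted source may only enrich nodes that already exist.
second = set_node(graph, "Server", "web01", {"rack": "R12"}, no_create=True)
third = set_node(graph, "Server", "wbe01-typo", {"rack": "R13"}, no_create=True)
```

The misspelled name from the untrusted source never becomes a node, so bad data can at worst be ignored rather than pollute the database.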
Any questions on that so far? That is, again, I think it's just gonna be pulling from, where am I? I think it's just Get-ADGroup. He was wondering, sorry about that.
So he was asking, what do you do for Get-ADGroup, and it's a very similar syntax, so I kind of just had these as little templates, and I have common ways to specify parameters for what properties you wanna pull, what transforms you wanna do, that sort of thing, like calculated properties.
So if I go back in here, so these are the ones that I didn't actually, here, hold on, I'll just really quickly say that I want the sAMAccountName, and users are going to be sAMAccountName. Okay, so in here, I can see that, come on, how do I get rid of that?
Oh, maybe? Can you, is that better? Okay, cool. I mean, so it really depends on the, I have mock data in here, so it's not particularly useful, but I basically pull ManagedBy, so I see that this group is managed by wframe, memberships, in these service definitions,
I have the concept of whether a group or a user uses a service, because for us, it's kind of useful to know who's using or who owns something or who's the data owner, or these different things that might not be recorded in certain places. Yeah, so that's groups and group memberships
and all that. Where am I? So the basic idea for us is, so it's kind of like an overlay database, so we have all our stuff in Active Directory or vCenter,
and that has the actual source of truth data, and then we have a few YAML files like for services and their dependencies, and this just, it puts it all in this database, and someone that would actually want to query this data would just use, I'll show you. Okay, so that's a good segue. So now that we have this database, what are we gonna do?
So there's, I'm just gonna give you like three super simple examples, so alerting systems, if you want to send a notification, something actionable, it's really valuable to know who should you send that to. Maybe only people that really care about it and can do something about it. So if you have data on who uses the service, and you have an alerting system that can run a script like a SCOM command channel,
or there's plenty of other more modern monitoring systems that can do this, you can basically add context to the alert data or decide where to route it based on the alert data. So this is like a really old bad example, but basically I decided what groups to send this to based on a database, included some data,
I don't know, is it a production server, things like that. So it's very useful to add context to alerts because you can get things more quickly to the people that care about them and they can fix them, and you can include all the context they need to fix the thing, rather than get stalled out debugging it. Another option, so Thursday I think,
Brandon's talking about PoshBot. It's this PowerShell-based bot framework that's really cool. So you could do like a PoshBot plugin that goes and talks to Neo4j. So some server is down. Why is it down? What is this thing? I have no idea what the server is, it's not in any documentation. I can go into this database and basically run "what is this server",
and it will tell me all the AD data, the Puppet data, RackTables, Cobbler, and whatever else we have pushing to Neo4j. Just ad hoc queries like that can actually be pretty helpful. I don't have an example here, but because Puppet stores what classes are on a particular node, you could also be like,
I don't know what command I have, but basically tell me what servers run a particular class. So if I'm working on a Puppet class and my code base is like an octopus or spaghetti of nonsense and I have no idea how to tie it back, I can at least find out what nodes are running a particular class with that.
This one right here? I don't know if you could do it in the actual Neo4j browser, but this is just PowerShell. So PoshBot is just a thing where you type something in Slack, it has some security things and validation and so forth, and it will ultimately just run PowerShell.
Pretty much anything you can do in PowerShell, you can do in PoshBot. So it's a really cool thing and you should definitely check it out if you use Slack. Okay, and finally I was gonna show one more tool. Yay, I got time. So ultimately, I mean, people aren't just gonna be querying Neo4j, you might want to write like a little CMDB module
or something that pulls from this data source. So this is a very horrible function, it's just an example to show you all what this could look like. So now I can say get all the servers and get .server. And it will just list all of that from the database just in PowerShell objects. This one's not part of AD, so it's gonna be jagged objects,
whatever you write should deal with that. But that's pretty much it. You could do the same thing, list all the services. And in these examples, I don't think I showed it before, but I'm using parameterized queries to, when I got the brackets here, this is basically saying replace name with the parameter that's declared here.
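Two ideas from this part of the demo, parameterized queries and jagged result objects, can be illustrated with a short Python sketch. The query text is ordinary Cypher using the `$name` parameter form; the function names and mock rows are assumptions for the example and nothing here connects to a database.

```python
def service_query(name):
    # The value travels separately as a parameter; it is never pasted
    # into the query text, so odd input cannot change the query.
    return "MATCH (s:Service {name: $name}) RETURN s", {"name": name}


def normalise(rows):
    # Jagged rows: not every node has every property. Collect all
    # property names seen, then fill the gaps with None.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return [{c: row.get(c) for c in columns} for row in rows]


query, params = service_query("gitlab' OR 1=1")

# Mock results: the second server is not in AD, so it has no "os".
servers = normalise([
    {"name": "web01", "os": "Windows Server 2016"},
    {"name": "switch01"},
])
```

Filling in missing properties up front means downstream code, such as table formatting or CSV export, sees the same columns on every row.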
You could also do things like, hey, tell me all the services that depend on a particular service. So AD is probably a bad example because if that goes down, you're really not gonna care about looking for what's depending on it, depending on how you're architected. But just for this example, I use it because I didn't wanna come up with too much mock data. So if AD goes down in this environment,
DSC is gonna be down. I think my use case was AD relies on some of the config settings in DSC, so that's a problem. Or DSC, I don't even remember. It's in the documentation. It's a circular dependency, so AD is gonna list itself. GitLab will have a problem because now I can't actually,
if I have SSH keys, I can push to GitLab, but now I can't actually use my identity in GitLab to do pull requests or merge things and so forth. PSBot is gonna be a problem. That's relying on AD for a bunch of the commands that PSBot uses. So basically, I could write commands that say, hey, if AD goes down,
what's the problem? Or you might have more obscure services where you wanna know what depends on those things. You could also say, hey, what's the outage impact if a particular service goes down? The relationships each describe what the impact is if the service goes down. So you could write functions that do that.
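A "what breaks if this goes down" function could be sketched like this in Python. The edge list loosely follows the demo's mock environment, including the circular dependency between AD and DSC, but the impact strings other than the PSBot and GitLab one are invented, and the traversal runs over a plain list rather than Neo4j. The Cypher string is only a rough equivalent, assuming a `Service` label and a `DependsOn` relationship type.

```python
from collections import deque

# Rough Cypher equivalent, under the assumed schema (not executed here).
CYPHER = ("MATCH (s:Service)-[:DependsOn*]->(t:Service {name: $name}) "
          "RETURN DISTINCT s.name")

# (service, depends_on, impact); mock data only.
edges = [
    ("dsc", "ad", "mock impact text"),
    ("ad", "dsc", "mock impact text"),
    ("gitlab", "ad", "Cannot use AD identity in GitLab"),
    ("psbot", "ad", "AD-backed bot commands fail"),
    ("psbot", "gitlab", "No plugin updates to PSBot"),
]


def impacted_by(service, edges):
    # Breadth-first walk against the direction of DependsOn. The "seen"
    # dictionary stops the walk from looping on circular dependencies.
    seen = {}
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for src, dst, impact in edges:
            if dst == current and src not in seen:
                seen[src] = impact
                queue.append(src)
    return seen


ad_outage = impacted_by("ad", edges)
gitlab_outage = impacted_by("gitlab", edges)
```

Because of the circular dependency, AD shows up in its own outage list, which matches what the demo output did.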
If you store all your scheduled task data here, it's really useful. You can have a central spot where you see, hey, where are all my scheduled tasks? Where are all the scheduled tasks that run as a domain user or that run as an actual user account? And you can do some interesting stuff if you actually have that in a single data source. And you can probably imagine a bunch of other examples for just querying data in the CMDB.
All right, so that's just a few of the different tools that you could have if you just make a simple CMDB. So I showed Neo4j. It's got an Enterprise edition and a Community edition. You might absolutely run into trouble with some of the constraints of the Community edition.
So there's no access controls. If you have a user, they have full access. You can't have like a read-only user. The answer that you might come up with is you could just proxy the access, whether it's through JEA, some API, PoshBot, or maybe since all this data is in source control and Active Directory, if the services pulling from this data
and doing things aren't particularly critical, maybe you don't care if someone has access because all that data will just populate anyways. Proxy access, yeah, yeah, yeah. No clustering. So everything I've run so far has been reasonably performant. I think maybe some of the very large organizations
out there might run into issues. I don't know, Glenn, do you know scalability-wise, like, have you run into any trouble? Yeah, it's very resilient. I wouldn't really worry about this too much, and if you do, you could probably afford to pay for it if you're at that scale. Same deal with HA, if you're gonna rely on this, you'll have to construct your own HA thing or pay for it.
That's just, there's no way around that. Backups for this particular use case, I don't really care because my data is stored in YAML in source control, so I can always go back and I have, like, who's changed things, the full history of it, and the external sources are, you know, I'm just pulling data from those, so that data is still gonna live
in those external services, so I really don't care. The only thing that's gonna be missing is someone deletes an AD computer, and I had previously pulled it into this database, now I'm just gonna miss that stale data, which, you know, for me, it doesn't really matter, but maybe I would take a backup every month or at some, you know, interval that I'm okay with.
More or less Neo4j specific things, so what if you want more data samples, or data sources? You would have to write it. It would be cool if anyone wants to submit pull requests. This is really janky, so I don't know if you would actually wanna use it. I'll probably be polishing it up to use internally for some things, so it will probably get better than it is now, but one thing that's really helpful
about just sort of collecting data from different systems is you'll learn a lot about PowerShell, you'll probably learn about the technologies that you have to pull information from, and you'll learn a lot about the different interfaces that those systems talk to, like you might have to learn a REST API to talk to Neo4j,
or learn how to use the Active Directory module to talk to AD, you know, all these different things, it's really valuable to kind of use as a learning opportunity. That's really how I got into PowerShell, so I've always found it helpful to know how to query different systems. Now you also have to maybe get other people to do this, because you don't wanna be stuck doing it yourself, and I don't really have any suggestions on that.
That could be tougher. Yes, you could absolutely do that. So this is, I mean, this might be over the top for it. You could put that into Neo4j,
but Kevin Marquette, I think, has a cool module called PSGraph that just uses Graphviz, yeah, and I think he might even have that as an example, showing different services and their dependencies and so forth. Graphviz is just like a way to describe data, and then it will create a graph.
This is- You can visualize what you show in the browser, but you can put some CSS on it, or whatever you call it, some styling on it. It's like, it's a data, it's not a database, like it's just, you have like a flat file, and it will create that visualization. But it might be useful for something like that, yeah. Yeah, so, any questions?
So this initial version, one of the things you mentioned is that you're missing stale data, so it kinda sounds like you're overwriting, so you're not gonna be able to get historical. So if things are changing about a particular node set, this iteration doesn't capture that. Is that one of the features of enterprise, to be able to see node history, or is it just kind of destructive and you shouldn't care?
I don't know on the Neo4j side. I can look that up and get back to you. But my thought process was, for the YAML data, I'll have source control, so I'll know who changed things, and when everything changed. For the external services, I'm sort of assuming those have their own mechanism for auditing, like AD has all the event logs and things that you can use for change tracking. So- I'm thinking for like, trend analysis.
Oh, gotcha. I'd have to get back to you, so I'm not sure if that's a thing, yeah. Does it scale to a forest, root domain, and child domains? So, the code out there right now is very simple. We could probably parameterize it to support,
you know, more complex environments. Yeah, thankfully I don't have to deal with that. Our central IT, I think, does, but I don't. Sweet. Anyone else?
What was I gonna say? I think that might actually be it. So, I published this on GitHub, or no, I haven't published it on GitHub. It's on the gallery. I will publish it to GitHub shortly. If you have any questions, ping me on Slack. If you're not in Slack, please get in Slack. It's really useful.
There's the Slack for this summit. It's this one. If you're not in there, just ping me, and I'll ping whoever has admin access to it. And there's also the PowerShell Slack that Joel Bennett manages, and that is, you can get an invite from this bit.ly link. There are lots of really helpful people in there. So, I don't know.
It's kinda handy to be in there. Any other questions? Glen. Swag, yeah, yeah, crap. Okay. How do you do this? All right, so, I don't know if it'd be fair.
So, I don't know. If you message me in Slack, now, first come, first served. So, basically, I have a small, medium, large, extra large in men's and in women's sizes.
There's like a mug, a water bottle, all sorts of stuff, I don't know, and a bunch of stickers. The stickers and the pens, I'm just gonna bring tomorrow morning at lunch, or at breakfast, and put out on the, I don't know if, I think there's a swag table down there somewhere. So, that will be tomorrow morning.
Oh my gosh, okay. Excellent. All right. So, thanks everyone.