
ICS Village - Building a Cyber-Physical Testbed


Formal Metadata

Title
ICS Village - Building a Cyber-Physical Testbed
Subtitle
Blackstart Restoration Under Cyber Fire
Number of Parts
374
License
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Transcript: English (automatically generated)
Hi, so my name is Tim Yardley. I'm here to talk about some work that I've done under the DARPA RADICS program.
Whoa, I'm getting audio from somewhere, hold on. Sorry about that. I was getting feedback from the live channel. So I am here to talk about building
a cyber-physical test bed and how we've used that to support black start restoration under cyber fire. So this has been part of the DARPA RADICS program. The RADICS program is Rapid Attack Detection, Isolation and Characterization Systems. And I'll talk a little bit about what that is here in a minute.
So just for sort of coverage and statement, the test bed work that I'm gonna talk about was funded under DARPA's Rapid Attack Detection, Isolation and Characterization Systems program. I work for the University of Illinois. I'm the principal investigator of the University of Illinois effort
under there. We also had some government SME support, subject matter expert support from Idaho National Lab. And then the program evaluator is a company called Provatec. And all three of those organizations have been critical to the work that I'm talking about
in terms of enabling the black start restoration efforts and the validation of the work itself. So let's talk a little bit, give you a little background on me. So my name's Tim Yardley. I'm a principal research scientist at the Information Trust Institute at the University of Illinois. And I am also a father, a husband
and broadly a researcher. I've been doing work in industrial control systems for about 14 years now. And I've been doing work in security for probably almost 30 years now across the board. My background came from sort of the think tank IRC eras
back in the early days of EFnet groups that, let's say, liked to explore and liked to figure out things but also caused some havoc in the process. One of the original members of w00w00
and a variety of other security think tanks over the years. And I've sort of been in the realm of computer security for quite some time. I've been working in academia for about the past 12 years and prior to that, I worked in industry supporting a variety of different efforts.
So the DARPA RADICS effort itself, let's talk a little bit about what it is. So the program is designed to build technologies that fill a gap. And that gap really is the notion that if we are attacked by nation states,
we are ill-prepared to be able to determine exactly what happened, where it happened and how it happened and get rid of it as fast as possible. So DARPA stood up this program to build technology that advanced that state of let's say preparedness and then to evaluate that technology
on a realistic facility. And so our role in that program was to build that realistic facility. We'll talk in more detail about sort of the objectives of the program itself. So these are all public slides from DARPA that are here, but the key objective of the program itself
is to enable black start recovery of the power grid amidst a cyber attack on the energy sector's critical infrastructure. In this particular case, the critical infrastructure is the electric power grid. So in a prevention sort of state, you would say, okay, well, let's defend and detect what's going on.
But this program starts from the adversary has been successful. So you have a complete blackout, no power anywhere. You have to figure out exactly what happened, how it happened, et cetera. So the devices that are involved, you have to figure out what devices can be trusted
because you don't know what's compromised and what is safe or what is operating per norm. And many of those physical infrastructures are effectively controlled by intelligent devices. So these are the ICS devices that you have to explore on let's say the cyber side of the equation
that are controlling the physical aspects of the grid. And these assets are spread throughout the country. So how can you do this in a scalable way? How can you deeply and forensically poke at these embedded devices? How can you figure out exactly what is trustable,
what is not trustable, what was attacked, how it was attacked, and then get rid of it as fast as possible? The goal is to do so across the entire United States within seven days. And that's to isolate, characterize and restore any crank paths necessary to bring the grid for the United States back online.
So let's talk about the manifestation of this. So in the first year or so, year and a half of the program, we built exercise environments that were run at the University of Illinois that were getting people's feet wet and getting the technology ready to explore
let's say the beginning edges of the problem space. And as we evolved both the technology and the program, we had to turn the corner. So people will only believe what happens in a lab to the extent that it's, oh, that was in a lab environment. Oh, that's fine if you do it just on science,
but that's not the real world. So we took it to a federal island, which is called Plum Island. It's the home of the Animal Disease Center controlled and owned by DHS S&T and is a former site of Fort Terry from the Spanish-American War and a bunch of other stuff.
The size of the space that we're controlling there is roughly the size of Central Park and we have built electrical infrastructure on that island. And so I'll talk about the most recent incarnation of it that was completed in November of 2019. That was the sixth exercise of the program.
We have one more exercise still to do. COVID has postponed that a little bit, but it's still pending and so far has not been fully canceled. And we look to sort of do that last hurrah of the program and validate the advancements from the last exercise till now.
So in the island infrastructure, we basically have built three full utilities. And those utilities are represented by gear that we'll talk about here in a moment. They have an emergency operation center for each utility, and then there's a regional coordinator that's trying to coordinate the actions amongst the different utilities as it goes through.
We have the National Guard involved, we have the RADICS performers involved and lots of other entities that help establish communications in a variety of ways and really set up what is a fairly austere environment from the beginning to enable the capture
and execution of these exercises. So utility A has five low voltage substations, and I'll talk about what low voltage means to us. And one high voltage substation, it also had one generator which is the crank path in essence that's getting to the high value assets.
Utility B had seven low voltage substations along with three high voltage substations, and that crank path had a critical national asset on it that needed to be up and maintained no matter what. Utility C was similar to utility A, it had five low voltage substations and one high voltage substation as well as one generator.
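To make the layout of those three utilities concrete, here is a minimal sketch of how that topology could be represented as data. The utility names, substation counts, and the critical-asset flag come from the talk; the field names, the unstated generator count for Utility B, and the structure itself are illustrative assumptions, not the actual exercise tooling.

```python
# Illustrative sketch only: a toy representation of the three-utility
# exercise topology described above. Counts and the critical-asset flag
# come from the talk; everything else here is an assumption.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Utility:
    name: str
    low_voltage_subs: int              # low voltage substations
    high_voltage_subs: int             # high voltage substations
    generators: Optional[int] = None   # black start (crank path) generation; None = not stated
    critical_national_asset: bool = False

UTILITIES = [
    Utility("Utility A", low_voltage_subs=5, high_voltage_subs=1, generators=1),
    Utility("Utility B", low_voltage_subs=7, high_voltage_subs=3,
            critical_national_asset=True),
    Utility("Utility C", low_voltage_subs=5, high_voltage_subs=1, generators=1),
]

for u in UTILITIES:
    print(f"{u.name}: {u.low_voltage_subs} LV + {u.high_voltage_subs} HV substations")
```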
And so each of these utilities differed in the physical topology and layout of the substations. They also differed in the equipment that was deployed across each of those substations. So the program is really broken into multiple different technical areas.
You could look at it as four or five, depending on how you wanna count. The first area is situational awareness. And so when they deploy on this infrastructure that we've built, they are trying to figure out what happened or what is currently happening on the infrastructure to get as much situational awareness as you can provide. And the reason for that is you can't really trust
the devices once they've been attacked to be telling you the right things. And you are in a blackout scenario, so you may not have visibility in much of the grid environment. Network isolation is technical area two, and technical area two is focused on taking
the communications that are no longer necessarily trustworthy and expanding those into a realm of trustable or not. And so by that, I mean point A to point B may have talked to each other before, and it used to be a dedicated private link or whatever it may be, but now traffic that's going across there
is not reflecting what it did previously. So maybe somebody is man in the middling it, maybe somebody is manipulating it, maybe it's getting black holed in some other way, unknown operation, right? So how do you take the outputs from A and get it to B in a way that's trustworthy and secure
and such that it cannot be modified in the middle or that it is evident when it is modified in the middle? So that's the area of research for technical area two. Technical area three is threat analysis, and they really are intended to do the forensic response per se on the devices itself.
So how do you diagnose and remove cyber threats from the embedded devices, from the different pieces that are involved, et cetera, as you go through the investigation part of the environment. So the environment itself is technical area four,
and that's conducted by us at the University of Illinois. And that is the environment by which the exercise happens, which is on Plum Island, but also the environment at the University of Illinois' campus that the performers remote into to build out their technology,
to extend their capabilities and to investigate the edge cases of how the grid operates when certain things happen to these devices. So that's part of our central facility, and I'll talk a little bit about that. And then the island is basically a distributed manifestation of that central facility at Illinois.
The last technical area is technical area five, and effectively that is the evaluators of the program. And they build out how to run an exercise in this space and how to determine whether or not progress is being made on the technology, what the appropriate challenge levels are,
let's say pain points or how deep or how hard the red team pushes, et cetera, as we go through the environment. And they technically grade technical areas one, two, three, and four as they go through that.
So onto the next slide. RADICS exercise six was really a move from the strategic notion early on in the program to an operational notion with the utilities involved, et cetera, to a tactical deployment on top of that. So if you look up in the upper left-hand corner,
the strategic notion is the concept that we talked about of you have a large scale blackout and what you need to accomplish. The operational notion is the building and establishment of these crank paths. And then the tactical is let's execute on those crank paths and figure out exactly what happened
and how it happened. So the picture on the bottom right at the moment is a drone footage sort of zoomed in on one of the substations. And you can see that they're built in shipping containers and those shipping containers have gear inside them that I'll show you here in a moment.
And then these containers are arranged and linked together to build the crank path. And we do that with basically above ground, number two SO cord that has connectors on the end that we plug into to the individual gear that we have inside the boxes. I'll show you what that looks like here in a moment.
So you can see sort of a little bit of an edge here of what is inside a container. So this is standing inside a container looking out at the moment with the door open. So on the left-hand side there, you'll see what we call the relay box. On the right-hand side, you'll see what we call the power box,
which is a skid mounted Hoffman enclosure that controls the flow in essence of the substation. So that's the power grid aspect and the controls of those grid components are in the relay box on the left-hand side. There's also some sensors that are up on top that are providing or that are performer technology
providing some of the situational awareness and attempt to determine ground truth as we go through the environment. And I'll zoom in on a lot of this as we talk. So I'm going to only talk about the test bed itself. I'm not gonna talk about any of the performer technology that's specifically built, but I'll dig quite deeply into the test bed,
which is my area of responsibility. So the mission that we set out to do is to provide realistic environments that enable this cutting edge R&D that's not yet done by any commercial available product or any existing research off the shelf.
And then we take and create this environment in a way that allows us to validate the effectiveness and frankly, the efficiency of those tools as we go through. So the goal of the program itself is really to take a generational leap forward in the capabilities of test beds. So we've been building cyber-physical test beds
and leveraging the cyber-physical test beds for about 13, 14 years now at the University of Illinois. By many accounts, we're sort of the gold standard in terms of capabilities across the nation and arguably the world. But even so, when we pitched our capabilities
for this program, our proposal and the going in of our proposal was effectively, we have assembled the right team to solve this problem, but what technology exists today and where test beds are today across all of them that you will encounter
and anyone that bids, all of them are woefully inadequate to be able to actually go to the level of realism that will be necessary to validate these tools. And even with us, it is an extreme long shot as to whether or not this will be achievable in the timeframe and advancing fast enough
to be able to support this post-attack analysis. And so let me riff on that for a second. Cyber-physical test beds, before this program started, were primarily focused on let's either build an environment where we're looking purely at a physical phenomenon or let's build an environment that's proving out a hypothesis, physical or cyber,
and look at it from that particular angle, sort of ignoring all of the other details. But in this program, everything is unknown coming in. You don't know what the attackers did to you. You don't know even what the attackers want to do to you on the environment. So you have to have every piece of it as real as possible,
but you can't possibly go build three real crank paths and 24, 27 real substations out there because it just costs way too much money to do and it's a dangerous environment to be in. So how can you minimize the environment, maximize the safety, and also maximize the realism
without running into problems of naysayers with simulation being involved or emulation of devices, et cetera. So they need to be able to touch it. They need to be able to feel it. They need to be able to see it and they need to be able to trust that what it does and how it works
is going to be reflective of what happens in the real world. So the outcomes of the testbed work itself, obviously we've created a lot of tools. We've pioneered some new techniques and methodologies. We've combined existing solutions, both that we've had and that others have had
to build an environment together. And we combined that, not just the academic knowledge that we had at the University of Illinois, but in partnerships with key vendors and also with asset owners and operators and to build really an environment that reflected not only the real world,
but that took a whole leap forward on its ability to evaluate research. So what is a testbed? So a testbed really is somebody has a need, let's call them the customer, and that need is to evaluate something. And so a testbed is assets,
the thing maybe that they want to evaluate on. It's the people with the knowledge on how to build that environment in the way that represents the scenario they need to look at, et cetera, what to capture, how to capture, where to capture it, et cetera. It's the science of how to do so in a realistic way while still enabling the necessary data capture
that sometimes these systems inherently don't support. And then it's that data itself. And that data itself is what is captured from the devices, either willingly or not, how the system was operating, packet captures as an example of the communications that are going across, ground truth as to the physical telemetry
of what was actually going across, not just simply what the devices are reporting is happening, and then a manifestation of, or configuration of all of those things together that is provisioned out into an environment that you then do the work on. And so our capability, we can provision locally
the assets in our central environment. We can provision portable environments, like what we've built on Plum Island and deploy on Plum Island. And we can also provision into the cloud in a variety of different ways. So why a test bed? What's the value of a test bed? You've seen obviously the ICS village
and capture the flag stuff, but this is a little bit different, right? And the reason for a test bed, the reason for the work that we do is that this is mission critical technology, and why do I call this mission critical technology? So the technology being built under RADICS is intended to, let's say, have its glass broken
when we're in a blackout scenario effectively after we've been hit. That's where the real value of this technology is. And there's arguments, and I am one of the people that will argue this that that technology needs to be used even before we're hit. But in the end, our grid is down.
That's what this technology is built to solve. And so it is absolutely essential if this technology is called to practice that it works and that it resolves the issue or that it can figure out what's going on in the issues, et cetera, before we need it. Because if we don't, and we're in a national blackout, a national disaster sort of scenario,
attacked by an enemy or otherwise, how do we come back if we break the glass on this technology and it's not been proven to actually work? And so you run it and it's like, I can't figure out what's going on. I see nothing wrong. There's no problems here whatsoever, but the devices still won't turn on.
The grid still is down. These devices aren't operating correctly. And that's a bad scenario to be in. So this is truly mission critical technology. And we have to prove that it's effective before we need it. But we have to go beyond the theoretical testing of it. We have to put it in all sorts of scenarios across all sorts of different platforms to verify that it works,
even in edge cases that it wasn't expecting, in circumstances where it's missing data, in circumstances where it's even being directly attacked or attempted to be, let's say misled on what is going on. So our solution for that is a realistic, recomposable and well-instrumented test bed
is essential to being able to prove that out. Because even the real grid environment cannot be manipulated in the way that we can with the test bed environment. And I'll talk a little bit about some of the innovation in that space. And frankly, everything, as I started with my opening salvo and the proposal that existed, including the Illinois capabilities,
wasn't good enough before this program started. So our approach across it is to build real systems, we also build models, looking at models on the cyber side and physical side that adapt to the exercise needs that help us build out behaviors and changes
in the flows and the communications of systems to operate like the real world, or to operate in a way that an adversary may be able to manipulate. Everything is built in this modular way, it's adaptable and let's say, recomposable in a variety of different ways, so we can take a piece and take a substation
and how it's physically wired and physically set up at the moment, press a couple buttons, push a different configuration, and now it's a different substation, same devices, different configs, different network layout, et cetera. And all of that adaptability or modularity allows us to recompose the system in any way necessary
to present different challenges, et cetera, as we go through. There's also instrumentation, and so the instrumentation is key in that many of these systems, let's say, will cooperatively give you a certain amount of data. But sometimes when you're doing forensic analysis, you need to be able to gather things that are deeper,
or if you're trying to do experimental validation, you need to be able to look at things that the system inherently won't tell you, or you need to look at it with much more scrutiny than what you would typically look at in the real world. So how do you do that and turn on an appropriate level of data output, data capture, et cetera,
but that doesn't actually affect the behavior of the systems? Because sadly, some of these systems, as you may know, are underpowered, and if you turn on, let's say, full, complete logging of the system or other things, it can bog down the operation of the system, and then it no longer participates or acts as it would in the real world.
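As a rough illustration of the "same devices, different configs" recomposition idea described a moment ago, a minimal orchestration sketch might look like the following. The profile format, the device records, and the push_config helper are hypothetical placeholders under assumed names, not the actual Illinois tooling.

```python
# Minimal sketch of recomposing a substation from stored configuration
# profiles: same physical gear, different configs and network layout.
# Profile names, device records, and push_config are assumptions.

import json
from pathlib import Path

def load_profile(profile_dir: Path, name: str) -> dict:
    """Load a named substation profile: per-device configs plus network layout."""
    return json.loads((profile_dir / f"{name}.json").read_text())

def push_config(device: dict, config: dict) -> None:
    """Placeholder: deliver one config to one relay, RTU, or switch.

    In practice this would speak whatever management interface the device
    exposes (vendor tool, file transfer, serial console, etc.).
    """
    print(f"pushing {config['role']} config to {device['name']} at {device['mgmt_ip']}")

def recompose_substation(profile_dir: Path, substation: str, variant: str) -> None:
    """Turn an already-wired substation into a different variant by
    re-pushing configurations only -- no re-cabling."""
    profile = load_profile(profile_dir, f"{substation}-{variant}")
    for device in profile["devices"]:
        push_config(device, device["config"])
    print(f"{substation} now presents as variant '{variant}'")

# Example (hypothetical paths and names):
# recompose_substation(Path("profiles"), "sub07", "utilityB-hv")
```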
So the last part is really knowledge. And by that, I mean, we don't just say, look, as academics, we're bright people, trust us, this works. We had to bring in real operators. We had to bring in the manufacturers, the vendors across many different platforms and talk through with them their best practices,
their common misconfigurations that they see when integrators are building their platforms, the common ways that they configure their substations in the real world for the asset owners, et cetera, to both cause, let's say, a human error to happen
in ways that people accidentally misconfigure things, but also to mimic as closely as possible how people are actually configuring these in practice. And that is to get the right level of protection, the right level of output of data, and even notions of like, okay, what does a CIP-compliant substation look like in terms of what it is logging or not,
versus one that's not. So that all comes together in that knowledge area. On the innovation side, we had to innovate quite a bit. And so the, let's say, orchestration or automation stuff that we had previously was good enough for research,
but it made a lot of assumptions. And so by that, I mean, it would operate in a centralized environment, but when it tells something to be reconfigured or tries to control something, it expects that A, the device is reachable, B, that it has access to that device,
that it has the credentials to get on that device. It assumes, I guess, C, that it knows what the state of that device is. And then lastly, everything that it did before, it also assumed that the device was trustworthy and in a known sort of condition. So we had to, let's say, break down
all of those assumptions and operate in a way that, let's say, didn't rely on any of those existing assumptions. We applied a bunch of research as well. Obviously, we had over a decade of work
in the prevention space and in the detection space and remediation space at the University of Illinois. And we had used that in the test bed in a variety of ways. We had used it or proved it out in the test bed, some of which is even in formal companies now, transitioned either to big vendors or as startups. And we had to apply that in sort of a different way
as part of the test bed, not to just say, okay, look, here's the test bed environment. But for instance, if we could reach deeper into a device, then we used some of that research that we had to dig deeper into those devices and pull out and extract information that supports the validation of the technology
without affecting the performance of the device itself. Our team in particular, myself and a few others, have gone really deep on some of these platforms over the years. And so we brought a wide variety of devices to the table that we already knew quite a bit about on the inside.
Sometimes even let's say one could argue more than what the vendors know about their own devices in terms of the knowledge and ways that we could poke around inside of these platforms. We had to also build, because we needed to deploy on an austere environment,
if you don't have it, then you better bring it type notion. And so we had to build these boxes in a way that were field serviceable. We had to be able to quickly replace components of the system if it were to break, or if the intent of the cyber attack against it was literally to brick it so that it was no longer functional.
How did we restore that or replace that in as fast of a situation as possible to move on and not basically stop the whole exercise if something were to break. We had to advance our automated configuration, data extraction, and also the notion of the system and its observation when even the network links
were no longer trustworthy or reliable to be up or down. Remember, we're in a black start scenario, so we're not even guaranteed that we'll have power on each of the substations to be able to communicate to them. And when they're brought up, we need to maintain the state of everything we captured as they go up and down like a see-saw
as they're being attacked and brought up and brought back down, et cetera. And then we also needed to have the environment in a way that could be recomposeable, change the structure of the crank paths, change the behavior of the substation itself without going through and re-cabling or rewiring everything that's in there
on a hands-on nature. So what are these environments? So combined, I call them the substations in a box. It's two components. This has been built on the extensive facilities we have at the University of Illinois that I've sort of alluded to. We have roughly $100 million worth of hardware and software
at the University of Illinois that have been built up over the past decade plus, much of which by donation, that's enabled all sorts of research that we've done in the past with trustworthy cyber infrastructure for power, which was an NSF effort, DOE DHS effort called
trustworthy cyber infrastructure for the power grid. It had "grid" added to the end. Our most recent center that's wrapping up in the next year or two, called the Cyber Resilient Energy Delivery Consortium, which is also DOE and DHS funded, our Critical Infrastructure Resiliency Institute, and a variety of other things
that have used and leveraged the test bed resources at Illinois. So the substations in a box, as I mentioned, they're designed to support this black start crank path analysis and to be deployed in the field, real grid environments built, et cetera. They're built in Pelican style cases. So they're literally shippable and deployable
anywhere we need to stand them up. They're generally mostly IP55 water tight when they're shipped and moved around. When you physically deploy them, we take the case lids off and put them in enclosures. The reason for that is literally so the devices inside don't overheat,
but also because you do need some physical access to the devices to control breaker operations and other aspects. We built an environment on an island. So the power infrastructure of what is in the overhead and underground stays, but basically everything else gets torn down
and built back up every six months in a different way. There are currently 26 variants of substations deployed across that infrastructure. Those substations have relays, RTUs, substation network switches, routers, as well as an experimental fabric underneath that's controlled by SDN
that allows us to do a lot of the capture and let's say dynamic changes of the substation itself. All sorts of protocols are deployed. You can see a list of them up on the screen. There's both serial and ethernet communications. We have custom power connections on the power boxes that allow us to link these systems together
in a safe way. And then we have a high voltage infrastructure that I'll talk about as well. So what's a power box and what's in a power box? So power boxes basically think of it like the physical infrastructure of the island or of a real substation.
So that's the breakers, the bus bars, the incoming and outgoing feeders on the system. We have a local load feeder as well. We have signalization lights that indicate what the status is of energization, what the status is of breakers. We have a dead bus sync light that's provided.
We have analog sync check relays, contactors, auxiliary contacts on the systems, CTs and PTs, control circuitry behind that allows us to operate breakers and various other stuff. We have different modes of operation, sort of a safe mode where we can walk away from the system and the system can't possibly change,
which is a sort of a unique scenario from the real world, differing from the real world. And then we obviously have the ability to locally control breakers as well. So what does that look like? So there are effectively two types of power boxes inherently that we've built. One is a 208 volt three phase system.
One is a 480 volt three phase system. They look basically identical from the front, except for the size of them is a little bit different. We also have these high voltage systems, which really are Hoffman enclosures that are wall mounted and act like, let's say semi-intelligent
breakout boards for providing telemetry from the high voltage gear to the corresponding devices that are then operating and controlling that high voltage gear. And so think of it sort of like a mapping board in a way. And each power box generally has an incoming circuit,
a load circuit, and then two outgoing circuits as it's built out. So basic electrical diagrams in the middle, but nothing, let's say, shocking about that. And then there's the other side of it. And so each of these devices has the number two SO cord
coming in to these Hubbell connectors on the edge, but they also have umbilical cords, which are Amphenol connectors, mil-spec Amphenol connectors that basically take all of the telemetry of what is happening inside the box and provide that telemetry to the devices that need to control it. So that includes the analog and digital signals
that need to be sent back and forth between the devices to control them. But also the CTs and PT outputs, et cetera, from the system itself so that all of the sensing is detectable by the relay boxes. And so what are the relay boxes? Well, the relay boxes are really the brains
of the substation. So here are a couple examples showing some of the different technology that's in play. Up in the upper left, those are ABB relays along with an ABB RTU. This is sort of a legacy RTU platform that ABB leverages or uses and has deployed around the world called the RTU 560.
In the middle, you'll see some more ABB relays, middle top, and above that, instead of an RTU 560, you see a Motorola device. This is a Motorola ACE 3680. If you move to the next image, upper right-hand corner, that is an ABB COM 600 rack mount or a COM 600R
that is acting as the RTU over those ABB relays that are there. Bottom left-hand corner, you'll see some touchscreen SEL 751 relays. Those are controlled by an RTU, which in that particular case is an SEL RTAC, a 3505.
In the middle, you'll see touchscreen relays. And in that one, there's an SEL RTAC as well, but it's an SEL 3530 instead of a 3505. And then the far right, you'll see more SEL relays. These ones are not touchscreen. These are another variant of the SEL 751.
And those ones are being controlled from an RTU perspective by a NovaTech Orion LX. So this shows just some of the diversity of platforms that are there. There are many more, obviously, across 26 substations. Every single substation is unique in some way, shape, or form.
So we have a lot of diversity across the environment in terms of platforms, technologies, and configurations. And so diversity could be purely on the configuration side. For instance, different protocols being communicated, different topologies being set up between utility A,
utility C, utility B, et cetera, and lots of other variation on top of that. So let's talk about some, let's say, lessons learned from the program, more challenges that we had to tackle now that you understand some of the gear. So first off is safety.
When you can't trust anything in the system whatsoever because it is compromised, and it's compromised in a way that you may or may not know, may or may not be able to determine, and you don't know what is trustable or not. All of the systems that are there are effectively designed to protect you.
But if you can't trust the digital systems to protect you anymore, then you need additional layers of protection. So we had to layer protection throughout the system in both physical and cyber form. And that included things like analog protections in the system, like analog sync check relays,
time-overcurrent protections, thermal protections. On the digital side or on the cyber side, the protective relays were configured with safe and sane settings for over-voltage and under-voltage conditions, et cetera. The boxes had arc flash analysis done on them to determine potential exposure or safety measures
from a PPE perspective that needed to be done. The cabinets were all isolated, either padlockable or direct key lockable. We had, on the high voltage side, IntelliRupters that were acting as fail safes. If there's anything that flows through from the low voltage to the high voltage side, an out-of-sync close that happened to happen,
a surge somewhere, or a fault on the line, the IntelliRupters were there to protect the system at the high voltage side. We also purposefully did not target the high voltage control. That way we didn't run into, let's say, big issues.
Low voltage was something we could cause a problem on and be okay with, but on the high voltage side, people could get hurt. All of the connectors we used were screw-in locking style connectors. We had all sorts of internal wiring protection, et cetera. So the key is that the environment itself was designed to be safe no matter what.
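To give a flavor of the "safe and sane settings" idea mentioned above on the digital side, here is a toy sanity check of relay setpoints against hard bounds before deployment. The parameter names and limits are invented for illustration and are not the exercise's actual settings.

```python
# Toy illustration of sanity-checking protective relay setpoints against
# hard safety bounds. Parameter names and limits are invented; real
# settings come from power engineers, not from this sketch.

NOMINAL_V = 208.0  # volts, one of the low-voltage box classes described later

SAFE_BOUNDS = {
    "undervoltage_pickup": (0.80 * NOMINAL_V, 0.95 * NOMINAL_V),
    "overvoltage_pickup":  (1.05 * NOMINAL_V, 1.20 * NOMINAL_V),
}

def check_settings(settings: dict) -> list[str]:
    """Return a list of violations; an empty list means the settings look sane."""
    problems = []
    for name, (lo, hi) in SAFE_BOUNDS.items():
        value = settings.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not (lo <= value <= hi):
            problems.append(f"{name}: {value} outside [{lo:.1f}, {hi:.1f}]")
    return problems

print(check_settings({"undervoltage_pickup": 187.0, "overvoltage_pickup": 260.0}))
# -> ['overvoltage_pickup: 260.0 outside [218.4, 249.6]']
```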
So people that knew nothing about power systems could still safely operate in this environment and not run a risk of being electrocuted or whatever. We always had power engineers and safety officers effectively onsite that were making sure that people did safe operations and maintained the necessary perimeters,
even at a noise level from the generators and various other aspects. But really the system protected itself in every way, shape and form. Even when the system wasn't trustable. So let's talk about some operational lessons that we learned in executing exercises
for Blackstart restoration, but also in austere environments under conditions of blackouts, et cetera. And so let me talk a little bit about the mode of execution as we go through this. So when I say that we're operating utility environments, we are actually operating the utility environments.
And by that, I mean, utility operators from real utilities come to the island and they run the infrastructure. And they basically take control of it. We hand it over to them and then they tell everyone else what to do, how to do it, when to do it, et cetera on the system. Now, obviously we have some exercise control over that,
but the intent is to really make this as real as possible. So let's talk about that realism. Many people don't believe what is possible until it really is, let's say, slapping them in the face. And by that, I mean, a common view is, look, relays, they're embedded devices.
You can't make them do things. You can't disable their protection. You can't change their mode of operation beyond their config. And until they saw us do that, they didn't really believe it. Even when it was happening right in front of them,
they still didn't believe it until they dug deeper and started to look deeper at what was happening and how it was happening to realize that, look, bad things really are possible that right now your mindset is that this isn't possible at all, that no one can do that. You had to physically modify the device in order to do that.
And it's like, no, we can actually do that via cyber means. So let's talk about the people, right? One of the things that was an operational lesson is academia is really good about thinking outside the box, but we needed to be extremely agile and think even further outside of the box
and push through and build stuff that, frankly, was, let's say, impossible to build at any given moment. We took on building one utility, then two, then three, all in six-month iterations, completely different architectures, completely different devices, building the physical boxes
and standing it up in a new realistic configuration at each sort of interval. That's pretty difficult to do. It's multiple weeks to build the environment each time. It's multiple weeks of testing. It's multiple weeks of evaluation to make sure that it is as real as possible,
that it doesn't have inherent artifacts itself that people may view as compromises, et cetera, in the system, that it is pristine, blue sky, trustable, et cetera, and built the way it should be built. And that results in, let's say, extreme levels of stress at times. But as a team, not just the University of Illinois,
but broadly, the entire program, we all pulled together and found success in every exercise that we had. There was no exercise that failed because the infrastructure or the people failed to deliver. And so that brings us to pace. So I mentioned every six months.
So DARPA programs move very fast, and the expectations are very high of the technology, of the evaluation, of the test bed, et cetera. And so there were many times that we were facing failure, but let's say by pure blunt force and by pure blunt force, I mean number of hours
and long days and leveraging and leaning on each other throughout the program, we were able to pull through and pull off what seemed to be impossible when we started. So those are some just operational aspects that we learned as we built the environment,
especially out on an island. And so when I say, what are some challenges, right? So there were times when we took a ferry every day from the mainland to the island, and there were times when Nor'easters were blowing in and other aspects where the waves were 12, 14 feet high, and the conditions were such that if you go to the island
you're going to be sleeping there because you're probably not gonna make it off. And that's kinda, let's say a bit much, but there were many days that happened like that. There were days we were out there executing the environment, trying to restore after the cyber attack.
And there were Nor'easters blowing through with 40-plus mile an hour winds and downpours and inches of rain falling an hour, et cetera. So not just hard from a technical perspective, but even harsh environments that we were out in as we were trying to restore these devices
and as the teams and the utility operators were operating these devices faced with those types of environmental conditions. So we learned a whole bunch of other lessons too. So one of them that was interesting to us was that when the systems are intended to break,
when you know that you can't trust anything anymore, you have to think differently about what you can build and how you build it so that it works consistently and reliably, even when everything is intended to be broken. So that goes back to some of the stuff we did with safety, but also on the cyber side.
So as an example, we had no guarantee of power reliably or not at any point in time. So how do we guarantee we don't lose data as an example as we're going through, or that we get, let's say eventual consistency, where if part of the network is up and being controlled by our orchestration
and another part of it is down, how do we get it to catch back up and get in the right configuration when it does come back up if it needed to be changed or to collect all of the data that it had when it wasn't centrally reachable, when it was isolated and only, let's say on backup power on its own.
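That "catch back up" behavior is essentially store-and-forward with reconciliation on reconnect. Here is a minimal sketch of the pattern; the agent class, the queue, the uplink check, and the reconcile step are all illustrative assumptions rather than the real exercise software.

```python
# Minimal store-and-forward sketch of the "eventual consistency" idea:
# keep collecting and queueing data locally while isolated, then flush and
# re-apply any desired configuration once the uplink returns. All names
# here are illustrative assumptions, not the actual orchestration code.

import collections

class SubstationAgent:
    def __init__(self):
        self.queue = collections.deque()   # data captured while isolated
        self.desired_config = None         # last config applied locally

    def uplink_available(self) -> bool:
        """Placeholder for 'can we reach central orchestration right now?'"""
        return False

    def collect(self, sample: dict) -> None:
        """Always record locally; never drop data just because we're isolated."""
        self.queue.append(sample)

    def reconcile(self, send, fetch_desired_config, apply_config) -> None:
        """When the link comes back, drain the local queue and catch the
        substation back up to whatever configuration it should be running."""
        while self.queue:
            send(self.queue.popleft())
        desired = fetch_desired_config()
        if desired != self.desired_config:
            apply_config(desired)
            self.desired_config = desired

    def run_once(self, sample, send, fetch_desired_config, apply_config) -> None:
        self.collect(sample)
        if self.uplink_available():
            self.reconcile(send, fetch_desired_config, apply_config)
```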
So that presented some interesting challenges that we had to tackle, which is basically fault tolerant and distributed computing in a nutshell, right? The other issue that was somewhat interesting that we had to innovate on was the gear in the field typically isn't hot swappable. You can't just go grab a relay that's in a real substation and pull it out
and pop another one in, in most of the substations without having to do some rewiring, without having to take circuits out of commission, de-energize them, lock out, tag out procedures, et cetera. But if you're in a fast-paced exercise environment, we have seven days and if people get stuck and we need to reverse out
basically what the bad people did, how do we reverse that out in as fast a way as possible without bringing the whole exercise down or the whole exercise to a halt for hours on end? So we had to create sort of quick connects and all sorts of other things that allowed us to swap gear out very, very quickly. Further, when we're testing bleeding edge technology,
things fail because sometimes that technology doesn't work or the research doesn't do what it's supposed to do. And so that's your primary plan is, okay, look, it's going to try this and if it works, great. Well, you have to have a backup plan, right? And in many cases, we had to have another backup to the backup plan because we couldn't fail
and make everything stop working. So if anyone in the program failed, we had to have a backup plan for what would happen if they failed and then a backup plan for if our backup plan failed. And then the last one which is a lesson learned
is we thought we were prepared in many, many, many occasions, but let's say in between days of the exercise when we would pause and go back to our hotels to sleep, et cetera, I would often make runs to Home Depot or Lowe's or electrician stores or whatever, because no matter how many spare parts we had,
no matter what tools we had on us or available to us, anything that can break will break at some point when you're running these environments. And it's almost always in the way that you never thought of or that you couldn't have anticipated. Something that has a 10,000 hour mean time between failures
or a 100,000 hour mean time between failures fails in 10 hours instead. So lots of interesting challenges on just keeping things up and operational and making sure that stuff was readily available if anything did break. So let's talk a little bit about some personal takeaways and I'll wrap these up really quickly here,
but let's talk, let's say not as the University of Illinois, but let's talk about this from my perspective. So vendors, be it ICS cyber solution vendors or the device vendors themselves, often claim capabilities that, let's say,
are much more limited in real-world application than what most people realize. So a utility may say, I go buy this platform and it's got me covered and it can do all of these things and I can check that box and I'm good. The reality is no matter what vendors claim, there's often still very large gaps there.
And so even the commercial off the shelf ICS cyber solutions that are out there, they're missing huge amounts of surface area on what adversaries can do against these boxes. RADICS technology was designed to close some of those gaps, but not all of them, right? So even as great as the RADICS technology is, we still have a long way to go, right?
In building the environments that we built and helping to construct the exercise evaluations and the test-effective payloads, all that in essence is, how do you cause a condition by cyber means to happen on these devices such that there are artifacts or implants
that then people can forensically find, right? So all of those, in doing that, we found hundreds of issues on these devices. And it's not like we've never looked at these devices before. And by issues, I don't necessarily mean security vulnerabilities, but in some cases it was security vulnerabilities. But merely other items like look, compatibility between host A and host B, it doesn't work.
Even though they're supposed to interoperate, it doesn't. There are nuanced differences. There are differences in documentation on what is communicated on the actual wire versus what it says it's going to communicate as. Another thing, which was a personal takeaway, is being the adversary is fun, right?
It's nice to be on the red team and hack these devices and break them in a variety of ways. But in the program, the defensive recovery technology, including the RADICS technology and commercial off the shelf stuff that I've personally seen, it's still way behind what I was capable of
or what others on the red team were capable of doing to these systems. So if we're so far ahead that we could just obliterate any of that tech, then what fun is there in that? It's not even a fair fight at that point. So a lot of times we limited the activities that we were doing
to sort of poking the bear rather than destroying it. So we didn't go out for the throat kill. We kept a pace with what the technology was capable of and designed something that was just a little bit ahead of that, pushing them each iteration to improve. And so one sort of broad takeaway is,
and I say this not just of technology in North America or whatever, but the whole world, what they have in this space in industrial control systems, detecting cyber attacks on these devices, that detection, that mitigation and that remediation technology for the electric power grid. And I'll stretch it a bit and say,
actually all critical infrastructure still really has a long way to go. There's still so much work that needs to be put on those platforms to really protect the systems from somebody who is truly focused and determined and understands these systems at a deep and inherent level.
So that's all of my technical content. I will just flash a slide real quick, which is the test bed at Illinois and much of what we've built in the RADICS program has been enabled by lots of companies. These are some of the companies that have donated gear to us, software to us, et cetera,
that have helped enable the things that we've done, but without them and without the commercial support and the vendor support and the utility support broadly across this program, we wouldn't be successful. The DARPA RADICS program wouldn't have been successful. So thank you to all of those companies and what they did. And then I think we still have a couple minutes
for questions and so I'll also leave a sort of bonus link down there at the bottom. There's a GitHub repo that I created a number of years ago at the S4 conference and I've been maintaining it sort of ad hoc ever since. And that's a bunch of ICS security tools
in a variety of different forms that are aggregated and categorized in various ways and mirrored when their original location is no longer available. So do check that out. It has a whole bunch of useful things in it. And with that, I will stop talking and I think we might have a little bit of time
for question and answer. Let's see. Hey Tim, this is Bryson, how are you doing? Wonderful, hi Bryson. Thank you, thank you for the talk. I noticed that you are on our Discord and I actually promoted you to a speaker during your talk.
So you should now have that badge tied to you. And what we recommend is if you could post that GitHub link in the speaker Q&A section of Discord. Okay. And then reach out for folks to engage you there and for questions. Okay, sounds good.
So really appreciate you jumping on and giving this talk. Having been there and seen this for myself, it is really impressive. And my favorite part for the sensors was of course the inflatable guys like you see next to used car lots. I thought that was a really interesting way to show whether something's up or down
at a physical distance. Yeah, we affectionately call them the dancing men. And sort of a funny aside, in the Nor'easters when you have torrential downpour, those things get wet and then they turn into sort of like whiplashes. So I was up there untangling them on many occasion
and getting sort of smacked by the dancing men as I was trying to untangle them so that they could fly. Well, that's how you know it works. Well, anyway, Tim, appreciate you joining us and supporting the village and look forward to the commentary and the Q&A on Discord.
Awesome, yeah. And everyone do check out what they've set up for the CTFs and other things in the village. They do an awesome job of creating environments that you can play with. Sadly, I can't offer my environment out to the world in such an easy and accessible way. But hopefully in the future, I'll get deeper engaged
with the ICS village and bring some of this tech and some of this capability to the village so you guys can all play with it too. Yeah, we look forward to that. That's probably been the biggest innovation we've had because of the pandemic: the amount of effort we've had to spend on figuring out how to make these things virtually accessible, which is the typical limitation
for concurrent access. We've kind of solved it. So we'd love to touch base with you afterward and talk about it. Yeah, it's great to hear that you guys have had to tackle that. We're currently tackling that for the DARPA RADICS effort as well. With the last exercise, it will be predominantly remote.
And we went from the prior exercise being network isolated, completely trusted, everyone in a specific zone, to part of this being deployed in the cloud and everyone distributed around the country and all accessing this in a controlled and crazy way, including like streaming body cams that we'll have
and all sorts of stuff. So we're on that same rollercoaster due to COVID at the moment. Well, we look forward to collaborating. So again, Tim, thank you very much and we'll see you on Discord. Yeah, you guys have a great day. All right, take care.