
An introduction into AMD/Xilinx libsystemctlm-soc


Formal Metadata

Title
An introduction into AMD/Xilinx libsystemctlm-soc
Number of Parts
542
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
This presentation will give an introduction into co-simulation, the AMD/Xilinx QEMU fork supporting co-simulation, the libsystemctlm-soc library containing the infrastructure for co-simulating SystemC and RTL with AMD/Xilinx QEMU and the systemctlm-cosim-demo project containing co-simulation examples. In the final part of the presentation a co-simulation demo will be shown.
Transcript: English (auto-generated)
All right. We are ready. We fixed it. It broke again and we fixed it again. I didn't do anything. The green shirts did it. All right. So next we have Francisco Iglesias. How do I pronounce that? Good? And to me at least, I'm just going to rant, but to me because it's always interesting to see how
emulation is used in the enterprise, in the, you know, people say money world. So, we're not. We're not. So, let's see how it goes. All right. Okay. Hi everybody and welcome to this presentation.
My name is Francisco Iglesias. I work at AMD with QEMU development and SystemC development. Can you speak up a bit? Yes, can you speak up a bit? No, he doesn't like me. This is for the stream. Okay. So, I'll try. I have a little throat problem.
Okay. So, today I will be speaking about our open source co-simulation solution. The agenda of the talk then: first I will give a short introduction into what co-simulation is.
And thereafter, I will be speaking a little about the AMD/Xilinx QEMU itself, and proceed with introducing libsystemctlm-soc and its sister repository, systemctlm-cosim-demo.
And lastly, I will show a short demo where QEMU is co-simulating with a couple of RTL memories, using the infrastructure in libsystemctlm-soc.
So, in this slide I tried to capture one of the trade-offs that is made when you choose a simulation technique for your RTL. And that is the trade-off between speed on one side and capacity and visibility on the other.
And we see that the three techniques that are used for RTL development, RTL simulation, hardware emulation and FPGA prototyping, all come with a different cost in simulation speed. On the left side here we also have the virtual platforms, which are fast
and great for software development, but they do not help with the pure RTL debugging or development. So, an approach that can be used here to try to get the best of the two worlds is to place the portion of the RTL that is of interest in one of the RTL simulation techniques,
and then keep the rest of the system modeled in one of the virtual platforms.
This way you keep most of the system simulated at a fairly fast speed, while still keeping visibility into the portion of RTL that is in focus. So, this is what we mean by co-simulation: mixing these two worlds.
In our open source co-simulation solution we have the Xilinx QEMU, where we model the processing systems of the FPGAs.
And then we have SystemC that we use for modeling the programmable logic. And libsystemctlm-soc has bridges that allow us to connect SystemC models of RTL,
and also FPGA prototypes and hardware emulators. I will be speaking more about the bridges shortly. But first, a little about the AMD/Xilinx QEMU fork.
So this is where we have our improved support and modeling for the Xilinx platforms. Today it is based on mainline QEMU version 7.1.0,
and we upgrade it around once a year to a more recent mainline version. The AMD/Xilinx QEMU then has some extra functionality. One of these is that it can create machines from a hardware device tree blob (DTB).
This allows for a more flexible machine creation and modification process. The AMD/Xilinx QEMU also has an implementation of the remote-port protocol.
This is the protocol that is used when we co-simulate different QEMU architectures with each other, and also when we co-simulate with SystemC. This is an overview of this, where we see an AArch64 QEMU co-simulating with a MicroBlaze QEMU.
And also with a SystemC application on the side. Continuing with libsystemctlm-soc: this is a project that was started by Edgar Iglesias in 2016, and the license is MIT.
One of the core features is that it has the remote-port protocol implementation in SystemC, which is then used for connecting to QEMU and co-simulating with QEMU.
Going together with this, it also has what we call SystemC wrappers. These are wrappers for our Zynq, ZynqMP and Versal platforms, among others. The short description of a wrapper is that it wraps QEMU into a SystemC module,
so that for the rest of the SystemC application, the interaction from the other modules with QEMU is done through the standard SystemC interfaces, such as TLM sockets and signals.
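To make this concrete, here is a minimal sketch of what the SystemC side can look like when using such a wrapper. The header path, class name and constructor signature (xilinx_zynqmp taking a remote-port socket description) are assumptions based on how the libsystemctlm-soc and systemctlm-cosim-demo repositories are laid out, so treat it as an illustration rather than the exact API:

```cpp
// Illustrative sketch only -- names are assumptions based on the repo layout.
#include <systemc.h>
#include "soc/xilinx/zynqmp/xilinx-zynqmp.h"   // assumed header for the ZynqMP wrapper

int sc_main(int argc, char *argv[])
{
	// argv[1] is the remote-port socket that QEMU creates under its
	// machine path, e.g. "unix:/tmp/cosim/qemu-rport-..." (illustrative).
	xilinx_zynqmp zynq("zynq", argv[1]);

	// From here on, the rest of the design only sees standard SystemC/TLM:
	// the wrapper exposes TLM initiator/target sockets for the PS-PL ports
	// and plain sc_signals for interrupts and resets, bound as usual.

	sc_start();
	return 0;
}
```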
The library also has TLM bridges for AXI4, AXI3, AXI4-Lite, APB, ACE-Lite, CHI, CXS, PCIe TLP and XGMII. A bridge converts communication from the TLM side into the protocol-specific side.
So here is an example of the TLM-to-AXI bridge, which translates TLM into AXI. And these bridges are then what allow us to co-simulate.
For example, in this case an AXI DUT that has been generated from RTL.
So we see here that the SystemC wrapper communicates through TLM with the bridge, which then converts the TLM transactions into AXI signaling and communicates over these AXI signals with the AXI DUT.
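As a rough sketch of how such a connection can be put together in SystemC (class and helper names such as tlm2axi_bridge, AXISignals and its connect() method are assumptions based on the tlm-bridges/ and test-modules/ directories of the repository, so the exact API may differ):

```cpp
// Illustrative sketch only -- names assumed from the libsystemctlm-soc layout.
#include <systemc.h>
#include "tlm-bridges/tlm2axi-bridge.h"
#include "test-modules/signals-axi.h"

SC_MODULE(Top)
{
	sc_clock clk;
	sc_signal<bool> resetn;

	tlm2axi_bridge<32, 32> bridge;   // 32-bit address / 32-bit data AXI4
	AXISignals<32, 32>     signals;  // bundle of AXI channel signals
	// Vaxi_mem dut;                 // e.g. a Verilator-generated AXI memory

	SC_CTOR(Top)
		: clk("clk", 10, SC_NS), resetn("resetn"),
		  bridge("tlm2axi-bridge"), signals("axi-signals")
	{
		bridge.clk(clk);
		bridge.resetn(resetn);

		// TLM side: bind an initiator (for example the QEMU wrapper's
		// PS master port) to the bridge's target socket.
		// zynq.s_axi_hpm_fpd[0]->bind(bridge.tgt_socket);

		// AXI side: wire the bridge and the DUT to the same signals.
		signals.connect(bridge);
		// signals.connect(dut);
	}
};
```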
And this is how QEMU on the left-hand side can access the DUT. There are also RTL bridges in the library, for AXI4, AXI3, AXI4-Lite, ACE, CHI and CXS.
The RTL bridges have two components. The first one is the bridge itself, which is placed on the FPGA or in a hardware emulator. The other component is the driver of the bridge, which is placed on the SystemC application software side.
So the way it works is that a TLM transaction enters the driver, which then configures the RTL bridge to replicate this transaction as, for example, an AXI transaction inside the FPGA or the hardware emulator. And this is an example of when these bridges are used with an Alveo U250
card, where between the bridge driver and the bridge we have some infrastructure:
the accesses travel over PCIe and the DMA infrastructure on the card, and one can see these components as a transport channel that the driver accesses go through towards the RTL bridge.
Looking at how it looks inside a hardware emulator, it is very similar, but instead of PCIe, the vendor's bridges are used for this transport.
In the library we also have protocol checkers, for AXI4, AXI3, AXI4-Lite, ACE-Lite and CHI.
The protocol checkers are connected to the signals; they monitor the signals and try to find issues, that is, violations of the protocols.
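As an illustration of how a checker is attached on the SystemC side (the class and configuration names AXIProtocolChecker, AXIPCConfig and enable_all_checks are assumptions based on the checkers/ directory of the library; consult its documentation for the exact API):

```cpp
// Hypothetical sketch -- API names assumed; the checker only observes signals.
#include <systemc.h>
#include "checkers/pc-axi.h"
#include "test-modules/signals-axi.h"

SC_MODULE(CheckedBus)
{
	AXISignals<32, 32>         signals;   // same bundle the bridge/DUT use
	AXIProtocolChecker<32, 32> checker;

	SC_CTOR(CheckedBus)
		: signals("axi-signals"),
		  checker("axi-checker", checker_config())
	{
		// Purely passive: the checker watches the AXI channels and
		// reports any protocol violations it detects.
		signals.connect(checker);
	}

	static AXIPCConfig checker_config()
	{
		AXIPCConfig cfg;
		cfg.enable_all_checks();   // assumed helper enabling every check
		return cfg;
	}
};
```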
Also in the library we have modules that can be used for generating ACE traffic: we have ACE and ACE-Lite masters and an ACE interconnect. The masters generate ACE transactions towards the interconnect, and the interconnect will then, when
required, snoop the other masters and otherwise forward the transaction to the TLM memory at the bottom. We have a similar setup for CHI, where we have request nodes that generate CHI traffic and a
CHI interconnect that does snooping when required or forwards the request to a slave node at the bottom.
Also in the library we have a tool called PySimGen that can generate simulations from IP-XACT descriptions. And there is a basic TLM traffic generator that one can configure to generate randomized traffic, or provide with a description of the transactions to issue.
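To give an idea of the traffic generator, a small sketch is shown below; the names (TrafficDesc, TLMTrafficGenerator, the Write/Read/Expect and DATA helpers, addTransfers) follow my recollection of the library's traffic-generators documentation and may not match it exactly:

```cpp
// Illustrative sketch only -- helper and class names are assumptions.
#include "traffic-generators/tlm-traffic-generator.h"
#include "traffic-generators/traffic-desc.h"

// ... during elaboration (e.g. in sc_main or a top-level module):

// Describe a small sequence: write 4 bytes, read them back and check them.
TrafficDesc transfers(utils::merge({
	Write(0x8, DATA(0x12, 0x34, 0x56, 0x78)),
	Read(0x8, 4),
	Expect(DATA(0x12, 0x34, 0x56, 0x78), 4),
}));

TLMTrafficGenerator tg("traffic-generator");
tg.addTransfers(transfers, 0);          // issue the described transactions
tg.socket.bind(bridge.tgt_socket);      // drive them into, e.g., the AXI bridge
```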
And there are some simple, easy co-simulation examples that one can have a look at as a starting point. There's a lot of documentation for all the components and we also have an extensive test suite.
The systemctlm-cosim-demo is also a project that was started by Edgar Iglesias in 2016, and the license is MIT.
It contains several QEMU co-simulation demos where we co-simulate the Zynq, ZynqMP and Versal QEMU with a PL model on the SystemC side. There is also a RISC-V demo where a RISC-V QEMU co-simulates with an open source Ethernet controller core on the SystemC side.
We have several x86 QEMU demos that co-simulate with PCIe endpoint models on the SystemC side.
And there is also a PySimGen demo where the SystemC side of the co-simulation has been completely generated from IP-XACT. These demos demonstrate how to embed the libsystemctlm-soc library in your own project and how to use it.
So for the demo that I will show now, I will be launching a Linux system on the ZynqMP QEMU.
And it will be co-simulating with a SystemC app that includes a couple of RTL memories. One of the RTL memories has an AXI4 interface, and the second one has an AXI4-Lite interface.
On the AXI4-Lite signals, there is a protocol checker connected. And I also modified the AXI4-Lite memory here and injected an error, so that we can see that the protocol checker finds this.
So we see here that on this left terminal, this is where QEMU is being launched.
And the yellow terminal on the top is where the SystemC application has been launched. We will start by doing some accesses to the AXI4 memory.
And here come the accesses to the AXI4 memory. Thereafter, we do an access towards the AXI4-Lite memory that has the error in it. And here we see that the protocol checker found the error and output a description message.
After the simulation, you get a trace that we can inspect. We can see here, and follow, the AXI signals, and look at the transactions that were just issued.
We can see that it is the expected data that we are seeing in here; at the bottom here, these are the data that we were writing to the memory.
The protocol checker's error output is also connected to a signal in this case. So the transaction that failed can be found where this signal has been asserted. This is seen at the bottom here, where the signal is asserted.
And then we can look into the transaction here and find the problem.
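For reference, a wave trace like this is produced with standard SystemC tracing on the SystemC side of the co-simulation; the signal names below are purely illustrative:

```cpp
// Standard SystemC VCD tracing (signal names here are illustrative).
// Inside sc_main(), after elaboration and before sc_start():
sc_trace_file *tf = sc_create_vcd_trace_file("cosim-trace");  // -> cosim-trace.vcd

sc_trace(tf, signals.awaddr, "awaddr");    // selected AXI channel signals
sc_trace(tf, signals.wdata,  "wdata");
sc_trace(tf, checker_error,  "pc_error");  // the protocol checker's error signal

sc_start();                    // run the co-simulation
sc_close_vcd_trace_file(tf);
```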
And that is all what I had today. Thank you for listening.
That's a dumb question, which I'm known for. Because like I said at the beginning, I'm very interested in how this works in enterprises. I'm curious, how do you guys decide a feature to be implemented? How do you plan it? That kind of stuff.
I don't know how that works in a community, or if you're in your basement, but I don't know how it works. Do you mean like in QEMU or in the SystemC? No, I mean at your employer. So, how do we decide the features that we implement?
And it's actually the demand that drives this. So if we see that some team internally at AMD Xilinx needs a feature in QEMU, then we implement it. Or if we see that there's a feature that might be useful later going forward.
Not right now, but perhaps in a year or so, then we will consider implementing it too. Often it ends up that our demands are pretty similar to everyone else's demands.
So if we implement a feature, it often becomes useful for others as well, not only for Xilinx. As a small follow-up, do you guys also do Agile like the rest of the world?
I'm curious, how do you guys refine a story like this in an Agile setting? And I'm very sorry, I keep pressing on this. How do we use Agile development? No, I don't care about Agile, I really care about the refinement. I don't like Agile actually. How do you guys brainstorm together on a feature? What do you put on paper?
It needs to be this, but how do we do this? It's not always comparable to something that already exists with Emulators. It's usually something that's never been done before.
I'm really sorry about this question. Yeah, no, it's a very good question, and I have to admit that I'm not sure we have such a process like the one you are probably looking for here. We get a request in our group to implement a feature that is needed by, for example, one of the RTL groups.
They need a feature, they ask us, and we implement it. So we don't have really a process where we kind of do this very Agile in that sense. This is our team. It might be different in other teams at AMD.
I'm curious, how do you get the SystemC model from Verilog? Does that also work for core-gen-generated IP, which might be encrypted? So, how do we get the SystemC model from Verilog?
There is an open source tool named Verilator that will translate the Verilog and create the SystemC module for you. But that is not going to work for the core-gen-generated IP, which is encrypted and which Verilator cannot process. Yeah, for that I have to admit that I'm not sure how to do that, so sorry for that.
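For context, when Verilator is run in its SystemC output mode it emits a C++ class named after the top-level Verilog module, with SystemC ports that you bind like any other module. The module name Vaxifull_mem below is hypothetical:

```cpp
// Rough illustration of instantiating a Verilator-generated SystemC model.
// "Vaxifull_mem" is a hypothetical name for a verilated top-level module.
#include <systemc.h>
#include "Vaxifull_mem.h"

SC_MODULE(DutWrapper)
{
	sc_clock        clk;
	sc_signal<bool> resetn;
	Vaxifull_mem    dut;      // the verilated RTL, now a plain SystemC module

	SC_CTOR(DutWrapper)
		: clk("clk", 10, SC_NS), resetn("resetn"), dut("dut")
	{
		dut.clk(clk);      // port names depend on your Verilog top level
		dut.resetn(resetn);
		// ...bind the DUT's AXI ports to the bridge's signal bundle.
	}
};
```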
There is no free lunch, do you know what I mean? Like, your core-gen is Xilinx's own, so maybe they have some SystemC model for their own IP. I can't really speak to that, because I have to admit that I'm mostly on the QEMU development side.
But if you ping me afterwards, I can take your card and see if I can give you a correct contact or something. Yes? Is there something for VHDL as well? I think there are tools that do this; whether there is a tool that automatically generates a SystemC model from VHDL, there are tools for that apparently.
I'm pretty sure there are too, but we have not used them.
Yes? Are you limiting yourselves to the synthesizable subset of SystemC, or do you not care? No, we don't limit ourselves there in SystemC, no.
I'm coming from the world of open source software-defined radio, so I have flowgraphs where I have data processing blocks that are running in software, on an ARM64 core. What I want to do is take a block, accelerate it by implementing it in some RTL, get it to run on the FPGA part, and get the data in and out.
How does that work? I have some part of software that I want to be accelerated by an FPGA accelerator. What's the workflow you're using? These tools, you mean? Yes. Yes, so in that case, you could...
Yes, how... How do I go from a software implementation of the acceleration to a hardware implementation? Yes, so I can... How do I go from software acceleration to a hardware implementation?
I know how to write hardware. Yes, yes. So I have to admit that I myself am not an expert hardware engineer, but I think the way I would have done it is just to go ahead and create the RTL code.
And with this tooling, it's pretty sweet, because you can connect it to the QEMU system. This is actually an AXI stream; I'll just put it in there and call C functions in the end, right? You can launch your real software in QEMU, have it interact with it, and then...
How do I exchange data with the library? What are the interfaces? I see internally it's TLM, it's all SystemC, right? I don't get to choose that. What's on the surface? How do I get data in and out? How do you get data in and out of the simulators?
Perhaps I would have needed a better overview picture here of how you get data into your SystemC application. We don't have any magic for this;
it is the transactions from QEMU into the SystemC side, just like towards another QEMU.
So there isn't really a mechanism that will let you load a bunch of data into the SystemC application beyond that. Any more questions? Did I answer your question?
Yes, ping me afterwards and I can. We don't have time. Thank you very much.