One test output format to unite them all

Formal Metadata

Title: One test output format to unite them all
Number of Parts: 490
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract:
For several years now, software quality tools have been evolving: CI systems are more and more scalable, there are more testing libraries than ever, they are more mature than ever, and we have seen the rise of new tools to improve the quality of the code we craft. Unfortunately, most of our CI systems still just launch a script and check the return code, most testing libraries don't allow fine-grained selection of which tests to run, and most advanced CI innovations, such as parallel and remote execution, are not available to developers on their workstations.
Transcript: English (auto-generated)
Okay. So, I guess that's it. Can you hear me correctly? Okay. So, thank you everyone for being here today; I'm very happy to be able to be here to speak about one of my projects from several years ago.

So, first, a quick presentation about me. I'm a Python and testing fan, as you can imagine. I'm a former Mercurial reviewer, and I currently work for Comet.ml, which makes a monitoring solution for machine learning, but that's not the subject today. You can find me under my pseudonym, which you'll find almost everywhere on the web.

So, let's start with a question. Who in this room is writing unit tests? Okay. So, pretty much everyone. And who in this room is not writing tests, but executing them? Okay. So, that's pretty much the majority. So, I have good news: we are not alone. In fact, more than 70 percent of respondents to the JetBrains developer ecosystem survey answered that they at least run unit tests, and most of them write and then run unit tests.
So, a bit of background. I actually do both, because I'm writing Python most of the time, and sometimes JS. So, of course, what I'm doing is: I'm writing some code, I'm writing some tests, and when I write some tests, I run them. So, with experience, I've learned how to run all of my tests, a subset of my tests, or a single test; how to check why my new test is not running, or why I know it should fail but it doesn't. I know where to find the data I need in the output of the test runners to understand why a test is failing or not, and I know how to debug a failing test efficiently, either by resorting to a debugger or by adding some prints and knowing exactly where they should appear. But this knowledge, I got it from many years of actually writing tests, running tests, and debugging them.

And when you first come to run tests in Python, for example with pytest, this is the interface you get. You have pretty much everything, but you need to know where to find it. And if you move to JavaScript with, for example, Jest, it's a totally different interface. So, for beginners, it's very hard to be able to use both tools efficiently, at least in the beginning. So, that's for the languages I'm most able to write,
because that's what I'm using day to day. But I'm also, unfortunately, the CI expert at almost all of my jobs, and I am the one guy in the office, or remotely, that people come to saying, hey, my build is failing, but I don't know why. And I'm like, okay, send me a link. Okay, you have this broken dependency, or here you have an assertion failure: you are expecting true, but it's false. Let's try to debug that. So, I have to understand failures from random tooling that I didn't even know existed in the first place. I have to find the data in the output, and maybe add it, because by default, I don't know, you don't have logs, or you have logs but at info level and not at debug level, stuff like that. From time to time, I even have to run those tests myself, either on another server to see if the server configuration is impacting the test, or locally. And then: okay, so you need to install this package in this specific way, then run this weird command with those flags, otherwise it won't work. So, it's always an unfamiliar language, tool, or interface.
So, I was unhappy about this situation, and I decided that we need some common tooling. We are engineers, we love to write tooling, so let's write a tool that can bring all of this information into a single interface.

So, what do we want in an interface? Here is my Christmas list; yours may be different, but I think most of these items would be on it. First, colors, because when you are reading dozens and dozens of lines, colors definitely help. A progress bar, of course. Launching only failing tests, because if you have a build that takes three hours, you don't want to add another three hours to your debugging session just because you added a print. Launching specific tests: you might know exactly which test you want, either a failing one or one that should be failing but isn't. And finally, a web interface.

So, what I did with this Christmas list is I turned it into an interface. Let's meet Balto. Balto is a language-independent test orchestrator. Balto checks all of the wish list: it has colors, a progress bar, launching only failing tests, launching specific tests. And if you don't trust me, I will show you with a live demo.
Wish me luck. Okay, so here is Balto. Can you see it correctly? Okay. So, you can collect all of the tests, which will ask the test runner to give you all the tests you have in your test suite. You can select a single test, which will give you some very basic information, because you didn't run anything yet. So, let's try to run this one specifically. So, here I get some new data. I know that the test is passing, which is good. I get some additional tooling, and if I launch everything now, let's launch those two files at the beginning, we can see that we get some failures with some tracebacks. And now we can launch everything, and you can see that you have, it's actually not a good idea to not be mirroring my screen, but you have stdout, you have stderr somewhere, you have logs, you have everything I need when I debug a test suite. So, okay, but that's not all I'm here for. Let's go back here, okay.

So, Balto: what is it doing under the hood to get all of your data? Balto is watching the process, okay? That's the big secret of Balto: it's reading the subprocess stdout, that's it.
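To give a rough idea of that approach, here is a minimal sketch in Python (not Balto's actual code) of launching a test runner as a subprocess and reading one JSON message per line from its stdout; the pytest flag is made up:

    import asyncio
    import json

    async def read_test_events(cmd):
        # Launch the test runner and capture its stdout.
        proc = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE
        )
        # Assume the runner emits one JSON message per line.
        while True:
            line = await proc.stdout.readline()
            if not line:
                break
            yield json.loads(line)
        await proc.wait()

    async def main():
        # "--my-litf-plugin" is a hypothetical flag; any runner that
        # writes one JSON message per line would work the same way.
        async for event in read_test_events(["pytest", "--my-litf-plugin"]):
            print(event.get("test_name"), event.get("outcome"))

    asyncio.run(main())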
But we still need one little piece: the plugin which runs in the test runner, the web server, and the UI need to speak the same language. So, what are the possible languages for them to talk to each other? Because here you have both Python, since the example was with Python, and the interface is in JavaScript. So, there are already a couple of existing output formats, and you might know some or all of them: JUnit, TAP, mozlog, and Subunit.

So, JUnit, very quickly, is based on XML. It's well known and used in the Java community. It's one big XML file at the end of the build. Its format is tied to JUnit; there's no independent definition of it. And it's non-streamable, as it's one big file at the end: you need to wait for the end of the build before being able to consume it, of course.
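For reference (this example is not from the talk, and the names are illustrative), a minimal JUnit-style XML report looks something like this:

    <testsuite name="example" tests="2" failures="1" time="0.12">
      <testcase classname="tests.test_math" name="test_add" time="0.01"/>
      <testcase classname="tests.test_math" name="test_div" time="0.02">
        <failure message="assert 1 == 2">traceback goes here</failure>
      </testcase>
    </testsuite>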
You also have TAP, which is mostly famous in the Perl community. It's simple, but hard to extend. I didn't put an example on the slide, but you can take a look online. Its format is also tied to the TAP Perl implementation; there's no independent definition of it, and it therefore needs an independent parser in both Python and JS. So, it was not good.
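Again for reference (not shown in the talk), a minimal TAP stream looks like this:

    1..2
    ok 1 - test_add
    not ok 2 - test_div
    # assert 1 == 2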
I mean that the format is deeply tied to the default implementation: if you want to discuss the TAP format, you need to create an issue on the TAP Perl implementation.

Okay, oh, I didn't know that, so... Okay, I will update the slide, then. Okay. Yeah. Okay, so the question was: what is a non-independent definition? And you just got the answer. So, sorry.
Okay, hopefully I will offer another solution for that. The comment was that there are already several producers and consumers in several languages, so as a starting point, you could start with that.

So, there's also mozlog, which is used internally at Mozilla. One particular design choice there is that you have one message at the beginning of the execution of a test and one message at the end, which means that readers need to keep some kind of state. And I decided that one test equals one message is easier to handle. The last one is Subunit, which is actually the closest in design to what I have designed. It's a binary format, while the other ones are text-based. An effort has been made to actually try merging Subunit and LITF, which I will present just after. And the biggest issue was that Subunit doesn't have an input format.

So, what was, again, my Christmas list? My Christmas list was a format which is easy to write and easy to read; which is streamable, because here with Balto, it's a desktop application, so you get all the data in real time over a WebSocket connection, but you can have the same thing on the CI: as soon as you get a failing test, you can mark the CI build as red without needing to wait for the full build to be finished.
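That streaming property is what makes a fail-fast CI consumer possible. A minimal sketch, assuming one JSON message per line on stdin, with illustrative field names rather than the exact LITF spelling:

    import json
    import sys

    # Read test messages from stdin and turn the build red
    # on the first failure, without waiting for the end.
    for line in sys.stdin:
        message = json.loads(line)
        if message.get("_type") == "test_result" and message.get("outcome") == "failed":
            print("FAILED:", message.get("test_name"), file=sys.stderr)
            sys.exit(1)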
And finally, a format defined outside of an implementation. Code dies with age, while formats defined independently can continue to evolve and hopefully get more traction and more tooling, if they are not tied to a specific implementation and a specific usage.

So, I designed a new format. What do we need in this format? We need a test name, we need a test status, and we need an error message. So, let's use JSON. Can you read it correctly? No. No, basically. Okay, I will explain it soon, then. Okay, you can read it? That's fine. Okay. Sorry. So, the two examples were one with a passing test and one with a failing one.

So, there is actually a missing piece. We will want one message at the beginning, which tells us how many tests we are going to have in the test suite, and one at the end with the total duration and the number of failing and passing tests. But we could add more data, and I actually showed you more data in the Balto demo. We can have timing, log messages, stdout, stderr, text and image diffs. For example, if you've done some snapshot testing, either on the frontend or with some specific test runner on the command line: snapshot testing is when you expect a full HTML page or a full text and you get something else, so you want to diff what was expected against what you got. And we can have file, line, and more.
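To make this concrete, here is a sketch of what such a stream could look like, one JSON object per line. The field names are illustrative, not necessarily the exact LITF schema:

    {"_type": "session_start", "test_number": 2}
    {"_type": "test_result", "test_name": "test_add", "outcome": "passed", "duration": 0.01}
    {"_type": "test_result", "test_name": "test_div", "outcome": "failed", "duration": 0.02, "error": "assert 1 == 2"}
    {"_type": "session_end", "total_duration": 0.05, "passed": 1, "failed": 1}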
There was actually also one thing missing: how do you launch a specific test? That's a good question, because it depends on the language and the test runner. Even within the same language, different test runners might have different ways to run a specific test, if they have one in the first place. So far we have talked about output formats, just like JUnit, mozlog, and Subunit, but we were missing something: we also need an input format. Why? Because when you want to run a specific test, you don't want every tool to need to know how to do that; otherwise it's: okay, for pytest the command line is this way, for Jest the command line is that way, for nose the command line is yet another way, and you would be duplicating everything. Whereas if you define an input format, all the tools can implement it in the plugin which is already needed for the output format. So, if you also have an input format, any tool can actually use this format to talk to the test runner.

So, there are two main cases. One is: I just want to collect all the different tests in the test suite. The other one is: I want to run specific files or specific node IDs. But for node IDs, that's a bit odd, because how do you format node IDs? We have the same issue. So, let's ask the different test runners to handle it, and let's create an ID field. So, whatever unique ID a test runner can generate, it sends it to whatever consumer, for example Balto, and the consumer can send it back to say: hey, this specific test, I want to run only it. You give it back; I don't know anything about how it was formatted or created, whether it includes a test hash, whatever, I don't care. Just run it again.
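A sketch of what the input commands could look like, again with illustrative names: the first message asks the runner to collect tests without running them, the second runs everything, and the third runs only the tests whose IDs the runner previously reported:

    {"_type": "collect_all"}
    {"_type": "run_all"}
    {"_type": "run_selected", "ids": ["tests/test_math.py::test_div"]}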
So, if we take a full example, here is a valid LITF output message, with the name, the unique ID, which is, again, opaque for Balto, the outcome, and the duration. So, that's the format I'm trying to define. It's called LITF, for Language Independent Test Format. It's defined in its own repository, independently, and that repository lists both producers and consumers. So, that means we can have discussion and effort independently from any implementation. Each message that I gave an example of is defined on its own with JSON Schema, because that makes it quite easy to say what is required and what types you are expecting.
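As an illustration, a JSON Schema for a test result message could look roughly like this; a sketch, not the actual LITF schema:

    {
      "type": "object",
      "required": ["_type", "test_name", "outcome"],
      "properties": {
        "_type": {"const": "test_result"},
        "test_name": {"type": "string"},
        "outcome": {"enum": ["passed", "failed", "error", "skipped"]},
        "duration": {"type": "number"}
      }
    }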
One thing that I think is very important is that you have, in this repository, helpers and tools which can help you validate both the streams you are creating and your input streams. So, when you are developing a plugin, for example for a test runner in Rust, you don't need a consumer to tell you: okay, it's valid. You have an independent test suite which tells you: okay, this is valid, any consumer should accept it and understand it. There might be some edge cases in consumers, but at least the stream is valid.
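In Python, such a validation helper could be as simple as this sketch, using the jsonschema library; the schema path is made up:

    import json
    import jsonschema

    # Load the schema for one message type (path is hypothetical).
    with open("schemas/test_result.json") as f:
        schema = json.load(f)

    def validate_stream(lines):
        # Check every line of a stream against the schema.
        for number, line in enumerate(lines, start=1):
            try:
                jsonschema.validate(json.loads(line), schema)
            except jsonschema.ValidationError as exc:
                print(f"line {number} is invalid: {exc.message}")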
So, what data is missing? It's currently working for me; I'm using Balto with Python in my day-to-day job. The two main things I see right now are a version number, because hopefully the format will evolve, and binary data: as it's based on JSON, binary is kind of hard to send, so if you want to send images or any output which might not be valid Unicode, you have an issue. So, I know what is missing, but it works for me; it works for the test runner I'm using.

I'm trying to create a plugin for Jest in JavaScript, but I realized that in Jest, the logs are not grouped per test; they are for the whole test file. So, you need to dispatch them, and you don't have lines, for example, for a specific test. So, that's one limitation. So, I'm trying to get more languages to support it, to see what kind of assumptions I made, in the format itself, about what a test runner has and can send me. And actually, some test runners cannot give me a lot: I expected to get a line and a file for every test, but apparently that's not the case. So, I'm looking specifically for those languages, but if you want to add support for your own test runner in other languages, I will be very happy to have it too. And more importantly, to get your feedback on the format itself, to be able to support everything.

One thing which is also on the to-do list is that currently I'm launching subprocesses and reading stdout, but it's only a stream. So, I could also load a stream from an SSH connection or from a Docker container remotely. I had it working in the past; I have a bug right now that I need to fix, but it works. It's written in Python, and thanks to Python 3's new capabilities, it's asynchronous. So, that also means that if you have a project with a backend and a frontend in two different languages, which is a good possibility, you could actually run both of your test suites in a single tool, in parallel. So, that would be awesome. I don't have the use case myself, I'm doing only Python, but that's definitely possible.
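With that asynchronous design, running two suites side by side could look like the following sketch; both commands and flags are hypothetical:

    import asyncio
    import json

    async def run_suite(name, cmd):
        # Launch one LITF-speaking runner and consume its stdout stream.
        proc = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE
        )
        async for line in proc.stdout:
            event = json.loads(line)
            print(name, event.get("test_name"), event.get("outcome"))
        await proc.wait()

    async def main():
        # A made-up pytest plugin for the backend and a made-up
        # Jest reporter for the frontend, both emitting JSON lines.
        await asyncio.gather(
            run_suite("backend", ["pytest", "--my-litf-plugin"]),
            run_suite("frontend", ["jest", "--reporters=my-litf-reporter"]),
        )

    asyncio.run(main())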
So, the architecture, just as a reminder: Balto talks only to LITF-compatible plugins or processes. If any test runner implements LITF directly, that will mean there is no need for a plugin, but until that's the case, a plugin will do. And a plugin may actually be helpful for gathering all the stdout and stderr, and for making sure that if the test runner is failing, we get something to catch it and send something back to Balto.

So, in conclusion, LITF is a new protocol, because I think of it as both input and output, and it's similar to HTTP, or to LSP if you know about it. LSP is the Language Server Protocol, a new protocol pushed by Microsoft: your IDE talks to any LSP-compatible server. So, you have one LSP server for Python, which can be used from Emacs, maybe not nano, VS Code, Sublime Text. So, it's exactly like LITF: it's cutting down the compatibility matrix to one format. Everything that talks LSP, or LITF, will be able to talk to each other. So, I hope that it will be a foundation that could be used for building tomorrow's testing tools. For example, if you want a curses tool to run your tests, you can, and if it speaks LITF, it will be compatible with any LITF test runner. And if you want a detailed HTML report about your test timings, you also can.
And again, if it speaks LITF, it will be compatible with all LITF test runners. So, if you want to try Balto, the instructions are online. You can install it with pipx, because that's where packaging in Python is at for now; I hope in the future I will have a better solution, to avoid installing something else. But then, once you have Balto on your system, you install the LITF plugin for your test runner, and then you can enjoy.

Okay. So, how can you help? Any feedback about LITF is good feedback; it's young, so nothing is set in stone yet. You can create new LITF producers: a plugin for whatever test tool you are using or designing. You can create new LITF consumers; I will be very happy to get an ecosystem on both ends of the format. And use Balto, or speak about it. Of course, both Balto and LITF are open source on my GitHub account. Feel free to open issues or, even better, send pull requests.

And that's it for me. So, I have time for questions. Thank you. We have one question there.
Yeah. So, in the input for... yeah, sorry. So, the question was, in my presentation I talked about both an input format and an output format. So, when I speak about the input format: the first example is the input that Balto sends to the plugin, which is just 'collect only', and then the test runner itself already has all the code to just collect the tests without running them. And so, it sends back a specific message for test collection, without the status, without the timing, stuff like that. Does that answer your question?

So, the question was: how do you discover the tests in the first place? So, there's a config file for Balto at the root of the test suite. I'm just detecting the config file, and then I'm pointing the test runner at that directory.

So, if I understood the question correctly, it's that test runners can actually have colors in the data they send back to whatever consumer. So, do I plan to support that in Balto? Yes, sure; we need to find a way to encode it and be able to transmit it.
But yeah, definitely, definitely.
So, the question was: when you're doing tests, for example when you are testing against a database, you might have your tests working with one version of the database and not another, or different databases might have different failures. So, do I plan to support that? Yes, I plan to support that in Balto; for now, likely with both matrix definitions and Docker Compose support, so that you can bring up your database automatically and snapshot it. But I'm not sure how it will affect the LITF format itself, apart from sending information about the environment and which testing matrix combination you are testing.
Yeah, so the question was,
I said that Subunit was a binary format while LITF was a text format, so what is the link between the two. On binary versus text: the fact that LITF is a text format is mostly just for ease of writing and reading, because JSON is easy to write and easy to read; you have libraries and everything. Subunit is more designed to handle higher loads, and it has capabilities to actually filter streams, split streams, and merge streams. So, if we have to support binary outputs, for example in LITF, we would likely have to move to a binary format anyway; but for now it's text, because it's easier. Okay? Yeah.

Okay, so the question was: how do I store test results? For now, I don't. Balto is only a desktop tool, but with LITF, I hope that we can get CI systems which are much, much smarter, and they will have to store that. For now, I don't.