EuroSciPy 2017: Lightning Talks

Formal metadata

Title: EuroSciPy 2017: Lightning Talks
Number of parts: 43
License: CC Attribution 3.0 Unported: You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided you credit the author/rights holder in the manner they specify.
Production location: Erlangen, Germany

Transcript: English (automatically generated)
Hi everyone, so I'm Philip Elson. Jacob gave a good talk about what the Met Office do. I'm also at the Met Office and we have a really progressive policy on open source contribution.
On the back of that I'm talking about a project that I worked on as part of the Met Office, and that's conda-forge. You may have heard of it already. It is a conda channel of binaries for both Python and non-Python packages.
The key premise really is that it's got this open development model where anybody can come along and add new packages, and if you find a bug with a package or would like to improve it, you can contribute improvements back.
So, quite an amazing growth curve on conda-forge. Essentially we're a GitHub organization. We've now got, I think, over 3,000 repositories, which is pretty massive, and every single one of those has continuous integration which builds and uploads the binaries to anaconda.org.
Those 3,000 repositories have nearly 600 maintainers, which I think is quite impressive, and the channel itself is 250 GB and the Python binary itself has
been downloaded more than 1 million times. So that's all I wanted to say really. conda-forge.org: anyone can contribute and everyone's contributions are welcome. That's all.
Okay, so I'm going to very briefly talk about a Python library for simulating queues.
It's called Ciw, C-I-W, which is Welsh for queue. The main developer of this is not me, it's one of my PhD students. That's a queue: it's a waiting line, people come in, they get served, they leave. The reason we study this is self-evident, hopefully, but a huge application is in healthcare, for example waiting lists, which are a big problem.
So consider this very simple queue where every time unit someone turns up and every time unit someone leaves. This is how you'd study it in Ciw. You'd create a distribution, say deterministic, every one time unit. You'd create your network. Ciw is actually designed for simulating very complex queueing networks, and then you'd
say, I want to simulate 5,000 people going through. Ciw has these nodes, and again, you can have much more complicated networks. Inside the nodes, you have these individuals. Those individuals, once they've gone through the simulation, so they're in the exit node, have a data record that holds all the information about the individuals as they went through.
We can get the total times and the service times from those individuals. And obviously it looks like that. If everyone's turning up every one time unit and everyone's leaving, nothing is happening.
But that was a deterministic queue, and obviously in healthcare and other things, it's not every one time unit, despite a lot of people planning on the average, which is a terrible thing to do mathematically. Here's how you do it with an exponential distribution. Slide. And that's what happens. In a very simple system, now some people are spending more than 100 time units in the queueing system.
And that's why we need to understand queues and simulate queues and not just use averages. Thank you.
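The contrast between the deterministic and the exponential queue can be reproduced with a few lines of plain Python. This is a minimal sketch of a single-server, first-come-first-served queue written with the standard library only; it does not use Ciw or its API, and the function name `simulate_single_server`, the seed, and the rate of 1.0 are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_single_server(interarrival, service, n_customers, seed=0):
    """Return the total time in system for each customer of a one-server FIFO queue.

    `interarrival` and `service` are functions taking a random.Random
    instance and returning a non-negative duration.
    """
    rng = random.Random(seed)
    arrival = 0.0          # arrival time of the current customer
    server_free_at = 0.0   # time at which the server finishes its current job
    times_in_system = []
    for _ in range(n_customers):
        arrival += interarrival(rng)
        start = max(arrival, server_free_at)   # wait if the server is busy
        server_free_at = start + service(rng)
        times_in_system.append(server_free_at - arrival)
    return times_in_system


# Deterministic case: one arrival and one service completion per time unit.
deterministic = simulate_single_server(
    interarrival=lambda rng: 1.0,
    service=lambda rng: 1.0,
    n_customers=5000,
)

# Exponential case with the same averages (mean 1 for both distributions).
exponential = simulate_single_server(
    interarrival=lambda rng: rng.expovariate(1.0),
    service=lambda rng: rng.expovariate(1.0),
    n_customers=5000,
)

print("deterministic max time in system:", max(deterministic))
print("exponential mean time in system:", statistics.mean(exponential))
print("exponential max time in system:", max(exponential))
```

In the deterministic run every customer spends exactly one time unit in the system, while the exponential run, with identical averages, produces much longer stays for some customers. The exact figures depend on the seed.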
I can talk for longer. Slide two, please. Slide two. So if you consider yourself a research software engineer,
or you have never heard that term, but you think, wow, that's exactly what I am. Or you think, well, right now I am a researcher writing software for my own needs, but I could make that a career and somehow end up in a permanent position. Who knows? Then look up research software engineers, Google for it, or however you like.
There is a new initiative to network among ourselves. There are now conferences of research software engineers. National chapters are forming. They are running mailing lists. Subscribe there so that we get better in touch, so that we hear about news.
For instance, in Germany, a big research organization, the DFG, published a call to hand out money to make open source research software more sustainable. They will give 7 million euros. They received 120 applications, but I heard about that only when the deadline was over.
So it's really helpful for us to get into a huge network to hear about such activities. From the political side, from the top down, much of it runs under the label open data.
There is a huge investment in huge international collaborations, which is also necessary and very important, but I think there is a certain imbalance in this political emphasis on open data. We should put more emphasis on research software.
We have to lobby for it.
I wanted to talk quickly about the Enthought Tool Suite for those of you who don't know about it. If you want to go to the next slide. The Enthought Tool Suite is a rapid application development toolkit for building GUI applications in Python.
It has backends using both wxPython and PyQt. If you want to see what it can do, I've got a very simple Chaco application running on my laptop. It was a little bit hard to get it running up there, so hopefully you can see.
It's basically built upon Traits and TraitsUI. This is a bit like React in the JavaScript world. It allows you to do reactive programming. Then it has 2D and 3D plotting, Chaco and Mayavi, built on top of it. Open source, BSD licensed. Next slide. What have we been working on over the last year or so?
We now have pretty much full Python 3 support. We switched to using Qt as the default backend. Lots of improvements to Chaco, including 1D plots and some speed-ups. VTK 6 and 7 support for Mayavi. Jupyter notebook support for Mayavi and so on. Lots of code cleanup, lots of bug fixes. We've been working on bringing the code base up to modern standards.
Last slide. What's coming? We have in the master branches, but not yet released, better Python 3 support, Qt 5 support, the Agg backend. This is the 2D drawing library. We've now got a new Cythonized Agg backend for that, thanks to John Wiggins.
We've got more speed-ups for Chaco, refactoring of TraitsUI, and lots more bug fixes and cleanup. Thank you.
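The reactive style mentioned in this talk, where code reacts to attribute changes instead of polling for them, can be illustrated in plain Python. The sketch below is not the Traits API; the class `Observable` and its `observe` method are hypothetical names invented for this example, and real Traits offers far more (typing, validation, UI generation).

```python
class Observable:
    """Tiny stand-in for a reactive attribute container (not the Traits API)."""

    def __init__(self, **initial):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_values", dict(initial))
        object.__setattr__(self, "_listeners", {})

    def observe(self, name, callback):
        """Call callback(old, new) whenever attribute `name` changes."""
        self._listeners.setdefault(name, []).append(callback)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, new):
        old = self._values.get(name)
        self._values[name] = new
        if old != new:
            for callback in self._listeners.get(name, []):
                callback(old, new)


log = []
model = Observable(amplitude=1.0)
model.observe("amplitude", lambda old, new: log.append((old, new)))
model.amplitude = 2.5   # value changes, so the listener fires
model.amplitude = 2.5   # same value, no notification
print(log)
```

A GUI toolkit built on this idea can redraw a plot from inside the listener, which is the kind of wiring the talk describes Traits and TraitsUI doing for you.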
What I'm talking about today is badges. Yesterday we learned that badges kind of have this magic power to influence your code quality.
Which badges should your Python project have? First of all, continuous integration, basically tests after each commit. There are three services: Travis for Linux and macOS, CircleCI for Linux, and if you use Windows, AppVeyor. All free for open source projects. They cost money if you want to test your private repos. Then we're talking about test coverage.
How many lines of code are covered by your unit tests? There are two sites you can use: Codecov or Coveralls. The next one is something like inside baseball. Static code analysis basically runs Pylint, Pyflakes, and so on on your code base,
and they present the errors and potential flaws in your code in a very nice interface. You get a nice badge with a grade. That's the most important stuff. If you want to have more, you can use VersionEye. I actually haven't used it for Python. Basically it says whether your dependencies are out of date
and whether you have some license clashes in your upstream dependencies. And the last one, you can go to shields.io. It's a very nice site, and it can give you all these hipster badges, for instance license or PyPI version or supported Python versions. So please click here, and there you'll see all the badges once again
with links and examples of how they look. Thank you.
Hi. Who of you uses MATLAB? Who of you has used MATLAB?
Who of you likes MATLAB better than Python? I know. This is Google Trends for "depression" and "MATLAB", and look at it. So there's a better way, of course. There's always one more level of abstraction that makes things better. So this is a Python project for running MATLAB code from Python.
I mean, it starts MATLAB and then talks to it, right? So you import it, you start MATLAB, which takes a while. We know MATLAB. Sorry. And then you can run your code, and you get NumPy arrays, and you can ask it, you know, it's MATLAB.
Also, this is so important that MathWorks itself has its own version of it. You can see on the right the MATLAB version. On the left, the Transplant version. This is our code. This is just calculating the sum of one million values, and you can see that it takes, wow, we have to serialize it, send it over the network with Transplant, and that takes a while,
so this takes 20 milliseconds with Transplant. Do that in MATLAB's own version, and, well, eight seconds. That's it. Go to the repo. You can use both MATLAB and Python without going crazy, and have fun.
I saw mine further down the last branch. Do you want to click the link? So how many people have put a package on PyPI? How many people have released a conda package?
How many people think I can do both within the space of a lightning talk? Depends if the Wi-Fi works. Okay, zoom in on the terminal-y bit and click the play button, and we'll hope that there's still enough time to make it through this. You'll need to zoom in with the browser thing. Otherwise, nobody can see this.
So we have a module here. It's got a docstring. It's got a version number. We're going to package that within the space of a lightning talk. So, flit init. We're putting in some metadata. I'm cheating a little bit. I did this on my computer. It's already got some of the values saved as defaults. We pick a license. And so now we have our flit.ini
packaging information file, which looks like this. We are now just going to add the requirements to that. So you may have very briefly seen that the package requires numpy and matplotlib. So we're adding those here as requirements. And I will save that file and quit. And so this flit init command has created these two new files, flit.ini and LICENSE.
And I have added them to git. So this is already a git repository. And we'll run flit build to check that this is working. And then flit publish to put these packages on PyPI.
So this bit is relatively, well, relatively stable. Something you can use today. That package is now on PyPI. Now we're going to go into the wilds. I have the untested bit that you definitely shouldn't use yet, which is Flonda, which converts flit information to conda packages. So that has just built conda packages for five different platforms there.
So 32-bit and 64-bit Linux and Windows, 64-bit macOS. And now we're just uploading them to anaconda.org. So it's taking its time there. And there we are, just about done.
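The demo relies on the module carrying its own metadata: a docstring and a version number. As a rough illustration of how a packaging tool could pick those up without importing the module, here is a sketch using the standard `ast` module. It is not flit's actual implementation; the module source and the helper `read_metadata` are hypothetical.

```python
import ast

# Hypothetical module source standing in for the one shown in the demo.
MODULE_SOURCE = '''\
"""Plot a sine wave (hypothetical example module)."""

__version__ = "0.1"


def plot():
    pass
'''


def read_metadata(source):
    """Return (summary, version) from module source without importing it."""
    tree = ast.parse(source)
    docstring = (ast.get_docstring(tree) or "").strip()
    # Use the first docstring line as the one-line summary.
    summary = docstring.splitlines()[0] if docstring else ""
    version = None
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    # Only literal values (such as a plain string) are accepted.
                    version = ast.literal_eval(node.value)
    return summary, version


summary, version = read_metadata(MODULE_SOURCE)
print(summary)
print(version)
```

Keeping the version in one place, the module itself, avoids the common mistake of a packaging file and the code disagreeing about which release this is.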
If you like Python conferences, then I have something for you.
That will be PyCon.DE, which is a German Python conference. This will be in English. So far we've mostly been in German; now the whole conference will be in English. And it will be at the end of October. You'll see the 21st, 27th of October in Karlsruhe. And that's the URL, pycon.de. Next slide, please. And you see that's Karlsruhe.
We are right there. This white dot, that's us. Up there, it's just maybe 200 kilometers or so from here, very close. And Karlsruhe is a very nice place to be. And it's also very close to France, very close to Switzerland. So if you want to come from somewhere else, it's not far to travel. Next slide, please. And we have a very nice location. This is the Center for Art and Media Technology.
It's a very nice place. And we have nice rooms. And that's the team. And we're also on Telegram and Twitter. So if you want to have some information, you're invited. I think the call for papers is closed and we are about to publish the program in the next days. So the program will be up. We have trainings. We have talks.
There will be two days of sprints. So please come to Karlsruhe to continue your Python conference here. Thank you.
Thank you. So I love great APIs. So I tried to do the SciPy API design.
So the first thing is... No? Okay, next slide. Nope. Yay! The first thing is consistency. So for instance, this is an example of two different functions that have similar arguments that are flipped over.
I can never remember. Or two different arguments that are named almost the same way, in the same library. I can never remember. The second thing is that functions are easier to understand than classes. Objects have hidden state. Objects have no universal interface.
Entry point, exit point. Another thing: a library should hinge on a small number of concepts. So the idea being, if you understand the usage pattern of a library, can you use it in other places of the library? Can you use it everywhere? So this is a bit more abstract, but how many things do you have to explain to someone so that they understand the library?
Common data containers make the ecosystem stronger. They facilitate working with multiple libraries together, and they make it easier to get up to speed with a given library. So as an example, I would rather have 100 algorithms on one data container than 10 algorithms each on 10 different data containers.
Each function should have one and only one purpose. The dangerous thing would be a change of behavior depending on the input type. Code for interfaces, of course, but don't overdo duck typing.
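The "one purpose per function" rule can be made concrete with a small sketch. All names below are hypothetical and exist only for illustration: two single-purpose functions are contrasted with one function whose meaning changes with the type of its input.

```python
def mean_of_values(values):
    """Return the arithmetic mean of a sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("mean_of_values() needs at least one value")
    return sum(values) / len(values)


def mean_of_column(rows, column):
    """Return the mean of one named column in a list of dict rows."""
    return mean_of_values(row[column] for row in rows)


def overloaded_mean(data, column=None):
    """Anti-pattern: what the call means depends on the input type."""
    if data and isinstance(data[0], dict):
        return mean_of_values(row[column] for row in data)
    return mean_of_values(data)


rows = [{"x": 1.0}, {"x": 3.0}]
print(mean_of_values([1.0, 2.0, 3.0]))
print(mean_of_column(rows, "x"))
```

With the two explicit functions, a reader can tell from the call site which behavior is intended. With `overloaded_mean`, the reader has to know what `data` holds at run time, and the `column` argument is silently ignored for plain sequences.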
So the interface defines the object, but incompatible behaviors lead to bugs. Think about the NumPy matrix. Properties are for impedance matching, and I'd like to say only for impedance matching. Properties will obfuscate the data model of an object, and they can create hidden costs.
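The hidden cost of properties is easy to demonstrate. In this sketch the class `Signal` and its counter `sort_calls` are hypothetical and only serve the illustration: the property reads like a stored attribute but redoes the work on every access.

```python
class Signal:
    """Hypothetical container used only to illustrate the point."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.sort_calls = 0  # instrumentation for the demonstration

    @property
    def median(self):
        # Looks like a stored attribute, but sorts the data on every access.
        self.sort_calls += 1
        ordered = sorted(self.samples)
        return ordered[len(ordered) // 2]

    def compute_median(self):
        # A method call signals to the reader that work is being done.
        self.sort_calls += 1
        ordered = sorted(self.samples)
        return ordered[len(ordered) // 2]


signal = Signal([5, 1, 4, 2, 3])
values = [signal.median for _ in range(3)]  # three hidden sorts
print(values, signal.sort_calls)
```

Both spellings do the same work, but only the method makes the cost visible at the call site, which is the concern raised in the talk.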
Shallow is better than deep. So objects are understood by their surface, and composition is great for design, but it creates overload. Can I have the switcher?
Thank you. Okay, great. So we've all heard about how transparency and less centralization are very important to make research better, to make researchers more independent, and I think that's particularly relevant in publishing. So I'd like to talk to you a bit about reproducible self-publication.
Generally, you might feel that you want to share your data as IPython notebooks. I think that's one of the concepts which is very well known here. The problem is that that's not really a big format. The big formats are things like presentations, posters, article drafts. And the usual use case is that you have a couple of killer figures which you want to put in a presentation for your colleagues, in a poster for a conference,
another presentation for another conference, and hopefully in an article draft. And ideally, you can put the figures which you get from your Python scripts directly into your document. This is a LaTeX document. There is no bitmap anywhere. This is generated directly from Python code, as is this, as is this.
The code for this looks like this in the LaTeX document. It's very simple. You have a small wrapper, and then you call a function which takes the location of the script, a label if you want to link to the figure, and also a caption if you want to pass the caption directly to the figure. All of these places, like this place, support LaTeX tags. If you want to print a LaTeX tag with the same Python code, with the same technology,
you can use these verbatim containers. You can also do tables directly from CSV files. This is really cool. We do it via the pandas to_latex interface, which we can call internally. And of course, if you want to look at a couple of examples, because this is all about giving you the power to just fork and go,
you can find it on GitHub. This presentation, as well as a poster and an article using the same technology, and my poster, which is outside, and the poster of my master's student, which is also outside, have been written with this technology. You can see them compiled and on paper, and you can download the source code. Thank you.
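The CSV-to-table step described in this talk goes through the pandas `to_latex` interface. To show the idea without any dependency, here is a standard-library sketch that turns CSV text into a LaTeX `tabular` environment. It is not the speaker's code; the helper names and the example data are hypothetical, and the escaping covers only a handful of special characters.

```python
import csv
import io

# A few characters that need escaping in LaTeX text mode (not exhaustive).
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#", "$": r"\$"}


def escape_latex(text):
    """Escape a few characters that are special in LaTeX text mode."""
    return "".join(LATEX_SPECIALS.get(char, char) for char in text)


def csv_to_tabular(csv_text):
    """Convert CSV text with a header row into a LaTeX tabular environment."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [r"\begin{tabular}{" + "l" * len(header) + "}", r"\hline"]
    lines.append(" & ".join(escape_latex(cell) for cell in header) + r" \\")
    lines.append(r"\hline")
    for row in body:
        lines.append(" & ".join(escape_latex(cell) for cell in row) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


# Hypothetical data standing in for a results file.
example_csv = "subject,mean_signal\ns01,0.42\ns02,0.37\n"
table = csv_to_tabular(example_csv)
print(table)
```

Generating the table at compile time from the CSV file means the numbers in the article, the poster, and the slides cannot drift apart from the data.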
Thanks. Hi, my name's Alex. I work at the Met Office Informatics Lab. I'm going to talk today about an easy way that you can access structured datasets from within S3 without moving them out. Next slide, please. So I probably don't need to introduce OPeNDAP, but just in case,
it's a really cool protocol and project that lets you subset remote datasets. The kind of traditional way you might set that up is to have a bunch of data on a server, and then a client can access data on the server through an API. Next slide, please. So our problem is that our data isn't on the server, it's on S3. Our tools don't work with S3, they work with OPeNDAP.
So we put together a plug-in for the THREDDS OPeNDAP server that proxies NetCDF requests into S3. It uses byte ranges, so it doesn't move much data around. You kind of get the best of both worlds. You get the cheap-ish storage of an object store, and you get the flexible API of OPeNDAP. And it's completely generic. Sorry.
It's completely generic, so you can access any NetCDF object. You just need to point it at the right key. Thanks. Next. So you could do that in the kind of classic middleware server style, but now that it's just proxying requests, it's not doing any work, you can run that code on your laptop, so there's nothing between you and S3, and it also empowers you to scale data access up.
S3 is theoretically infinitely scalable, so you can have a cluster of computers all having their requests proxied directly into S3. The code for that is on GitHub, but it's a real prototype. We basically just sort of mushed together some work that other people had done, and the data that we're building that for
is on S3 at those buckets, but, again, this will work with any NetCDF or HDF data. Thank you.
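The byte-range idea is what keeps data movement small: instead of downloading a whole object, the client asks only for the bytes that hold the requested slice. The sketch below shows the arithmetic with a made-up flat layout (a fixed-size header followed by little-endian float64 values). Real NetCDF and HDF files are more involved (chunking, compression, metadata lookups), and every name and size here is an assumption for illustration.

```python
import struct


def range_header(offset, length):
    """Build an HTTP Range header selecting `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def slice_byte_range(data_offset, itemsize, start_index, stop_index):
    """Return (offset, length) covering items [start_index, stop_index)."""
    offset = data_offset + start_index * itemsize
    length = (stop_index - start_index) * itemsize
    return offset, length


# Stand-in for a remote object: a 16-byte header followed by float64 values.
HEADER_SIZE = 16
values = [float(i) for i in range(1000)]
blob = b"\0" * HEADER_SIZE + struct.pack("<1000d", *values)

offset, length = slice_byte_range(HEADER_SIZE, 8, start_index=10, stop_index=13)
headers = range_header(offset, length)

# A server honouring the header would return exactly these bytes.
chunk = blob[offset:offset + length]
subset = list(struct.unpack("<3d", chunk))
print(headers, subset)
```

Here 24 bytes are transferred out of roughly 8 kB, and the same header could be attached to an HTTP GET against an object store that supports range requests.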
Okay, so I'm going to be speaking about Cartopy. You can use Cartopy to create maps in different projections and plot 2D data onto them. It interfaces with Matplotlib to make the plotting of data very simple. So next slide, please. So here we've got a very simple example. We set up the projection
that we want our data to be projected into, and then we add pcolormesh data to that. We specify our longitudes and latitudes and our data array, and we tell Cartopy that the data was originally in a Plate Carrée projection, and we want to re-project it into this Interrupted Goode Homolosine projection,
and that is what you can see there. So next slide, please. Cartopy can do more sophisticated plotting. So here we've got a couple of examples of it re-projecting WMTS data and also accessing OpenStreetMap tile servers. Next slide, please. Basemap exists in a similar area,
but it is coming to the end of its development, and Cartopy was originally developed to improve on Basemap. So if you're still using Basemap, I'd encourage you to move over to Cartopy. So next slide, please. And finally, contributions are very welcome to our GitHub Cartopy repo,
and Cartopy is also available on conda-forge if you'd like to play around with it. And that's it. Thank you.
Hi. I was going to talk about my project, Aestimo.
It's a pretty niche project, but I'm hoping this will give it some publicity. It models the quantum mechanical band structure of nanostructures called quantum wells, if any of you study physics, and it particularly studies them
for real semiconductor structures, and it includes various real physical effects that shift it away from the more abstract math you might have learned at university. It works particularly well at the minute for gallium arsenide structures, but it can also model other types of materials.
There are two solvers, one for the conduction band and one that does the valence and conduction bands. And I'm just really hoping that if anyone ever needs something like this, this will help them find it, and if anyone wants to contribute, it's all on GitHub. The address is at the top,
and there are tutorials on the website in notebook form for anyone who's interested. Thanks very much.
I have a question. How many of you use Miniconda for your continuous integration to do automated tests?
Typically, the setup looks something like this. We start by downloading Miniconda, and then we install it, et cetera. Here we download, for instance, Miniconda latest, which we expect to be the latest version of Miniconda. It turns out that this actually links to some outdated version from 2016, and you should use either Miniconda 2 or Miniconda 3 now.
And this link was removed from, essentially, the Miniconda repo. It still kind of works because it's cached somewhere, but basically, by removing this link, we break the CI of, like, 3,000 projects, right? So essentially, the expected behavior is that... Next.
...that the developers will just see that CI is broken. They will just search online, find the problem, implement the fix, and then it'll be fine. But this is a waste of resources because we have, like, hundreds of developers all looking for the same thing and all trying to fix the same problem. So this also applies, for instance,
if we want to fix some vulnerabilities, or if there is some URL that changed and we need to change it in all the repositories that use it, et cetera. So the question is, can we actually apply some changes across GitHub repositories? Next. So here's, like, an idea that is at a very early stage.
So basically what we do is we choose some GitHub repositories, we fork them, then we create a single repository using shallow clones, and then we make a PR to that repository. And then we have some kind of synchronization engine that pulls back the changes to the different forks
and makes those separate PRs, right? So we can synchronize using, for instance, Git patches, which are just text files. If you want to talk about it, let me know.
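The core of the idea, one logical change expressed as a text patch per repository, can be sketched with the standard library. The repository names, the CI file contents, and the example.org URLs below are all hypothetical, and the sketch stops at producing unified diffs; forking, cloning, and opening pull requests are not shown.

```python
import difflib

# Hypothetical URLs standing in for the outdated and the current installer link.
OLD_URL = "https://example.org/miniconda/Miniconda-latest-Linux-x86_64.sh"
NEW_URL = "https://example.org/miniconda/Miniconda3-latest-Linux-x86_64.sh"

# Hypothetical repositories mapped to the contents of their CI configuration.
ci_configs = {
    "org-a/project-one": (
        "install:\n"
        f"  - wget {OLD_URL} -O miniconda.sh\n"
        "  - bash miniconda.sh -b\n"
    ),
    "org-b/project-two": f"before_install:\n  - curl -o mc.sh {OLD_URL}\n",
    "org-c/project-three": "install:\n  - pip install -r requirements.txt\n",
}


def make_patch(text, old, new, filename=".travis.yml"):
    """Return a unified diff replacing `old` with `new`, or None if unchanged."""
    updated = text.replace(old, new)
    if updated == text:
        return None
    diff = difflib.unified_diff(
        text.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff)


patches = {}
for repo, text in ci_configs.items():
    patch = make_patch(text, OLD_URL, NEW_URL)
    if patch is not None:
        patches[repo] = patch

for repo, patch in patches.items():
    print(f"# {repo}\n{patch}")
```

Only the repositories that actually reference the old link get a patch, so unaffected projects would receive no pull request.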
Okay, in my talk this morning, I said I am a committer on the Apache Hivemall project, and it's Java, so I had no way of talking about it here. But, honestly, I do have a way to talk about it, through its connection with Python.
And Hivemall is a project that lets you run machine learning algorithms on top of SQL syntax. So if you want to run logistic regression, you simply need to write 10 lines of SQL, and that's it. The machine learning runs on top of a Hadoop environment.
And our company, Treasure Data, hosts a customer-managed data management platform, and we let customers analyze their data on top of a cloud environment. And our environment has two query engines behind it. One of them is Presto. It's an in-memory and very fast query execution engine developed by Facebook, and the other one is Hive,
which was developed, as you of course know, by the Apache Software Foundation. And basically, for heavy queries, we and our customers can launch Hive queries, and behind that, we have Hivemall. So we put the machine learning functionality
into the Hive query executor on our services. And here is our Python code. We provide a Python connection to execute queries on top of the cloud query executor on our services. So, mainly for our customers, but it's public,
we developed some Python packages: with td-client-python, you can execute queries on our environment, and with pandas-td, you can execute pandas queries on top of that. So SciPy is going to be a connection to the cloud environment.
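As a rough illustration of "a few lines of SQL" for logistic regression, the sketch below builds a Hivemall-style training query as a Python string. It assumes Hivemall's `logress` function and a table with `features` and `label` columns; the table name, column names, and helper function are hypothetical, and the slide's actual query is not reproduced in the transcript.

```python
def build_logress_query(table, features_col="features", label_col="label"):
    """Build a Hive query that trains logistic regression with Hivemall.

    The identifiers are inserted as-is, so they must be trusted names.
    """
    return (
        "SELECT feature, AVG(weight) AS weight\n"
        "FROM (\n"
        f"  SELECT logress({features_col}, {label_col}) AS (feature, weight)\n"
        f"  FROM {table}\n"
        ") t\n"
        "GROUP BY feature"
    )


# "training" is a hypothetical table name.
query = build_logress_query("training")
print(query)
```

A string like this could then be submitted to the Hive engine through one of the Python clients mentioned in the talk.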
That's it. Thank you. I'm Loïc Estève, working at Inria, mostly on scikit-learn and joblib,
and I'm going to talk about something I've been involved in, which is called Sphinx Gallery, and it's something to improve your documentation with a gallery of examples. The inspiration, as most of you are probably aware, is Matplotlib. So basically, say I haven't plotted something in a while, so I go to the Matplotlib documentation, visually I can find something that kind of looks like what I want,
so I click on the example, I copy and paste the source code, I tweak it, and in a very short amount of time, I'm able to do whatever I want. So basically what happened is a few packages added something like this, like a gallery, and you know they tweaked it their own way, and basically it kind of diverged, and Sphinx Gallery is an attempt at kind of
consolidating the features in a single library that hopefully a lot of projects can use and improve. So that's how it looks. You have the gallery, you have an example, you can click on any of these thumbnails, these plots, and then you get to the source code, and not only can you get to the source code, but these days you can have kind of rich text
and cells that kind of look like a Jupyter notebook kind of thing. Also something that's quite useful: when you have your reference API that's automatically generated, it's generally quite obscure, so you want a way to know how to use it, and now it's automatically added,
you can configure Sphinx Gallery to automatically add all the examples that use this particular class or function that you're looking at. Something that's been added recently as well is that you can export all your examples to notebooks, so if you want notebooks that people can execute, you can do that.
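For readers who want to try this, a minimal sketch of the relevant part of a Sphinx `conf.py` might look like the following. The directory paths and the package name `mypackage` are placeholders chosen for illustration, not taken from the talk; the option names are the ones Sphinx Gallery documents for gallery generation and per-object example backreferences.

```python
# Sketch of the Sphinx Gallery part of a docs/conf.py.
# Paths and the package name are placeholders.

extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx_gallery.gen_gallery",  # builds the gallery from example scripts
]

sphinx_gallery_conf = {
    # Where the example scripts live, relative to conf.py.
    "examples_dirs": "../examples",
    # Where the generated gallery pages are written.
    "gallery_dirs": "auto_examples",
    # Package whose classes and functions get "examples using ..." lists.
    "doc_module": ("mypackage",),
    # Where the per-object example lists are stored.
    "backreferences_dir": "gen_modules/backreferences",
}
```

With a setup along these lines, each example script is rendered as a gallery page, and a notebook version is generated alongside it for download.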
More details: I have five seconds left. Plenty of projects are using it: scikit-learn, and PyTorch is using it, actually, Astropy and Matplotlib as well. And come to the sprint if you want to get started on using this; we'll be around. Cheers.
Hi again. This is a talk from me, Phil Elson, not me the Met Office employee, and this is about a fun thing that I just did, and I thought maybe I'd share it. So, anyone recognize this font by any chance? XKCD. Right, XKCD.
Kind of universally recognizable. Good. So, matplotlib. A few years ago I was involved in getting matplotlib to kind of speak XKCD, and this is actually a visualization produced by matplotlib, and yeah, there you are. The one thing that always upset me
about this visualization was the font, it turns out, and there's kind of like different weights, spacing's not great, the sizes aren't great, but yeah, so, I was aware of a data set that Randall Munroe, the creator of XKCD, produced, where he basically kind of gave us some characters
and some spacing, and I really wanted to turn this into a font. So, I did. So, there's a bunch of tools that I made use of. First of all, I segmented that image and kind of labeled it so that I could pick out the characters. I used a tool called FontForge, which has a Python interface
that allows you to build up fonts and put spacing together, and I kind of set this all running within Docker so I can kind of reproduce this font, and it's all set up on a repo, ipython/xkcd-font, where kind of, if you were to make any contributions,
those changes would percolate through nicely. Now, that is a link, if you wouldn't mind clicking it. And there is also, I'm gonna run out of time, there's just a link here, a live preview, just here for a bit.
Anyway, what I was gonna say, there's a live preview where you can go and type and play with the font and kind of see that in action. That's all. Thank you. Thank you.
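The segmentation step mentioned in the font talk, labeling an image so that individual characters can be picked out, can be illustrated with connected-component labeling. The sketch below is a plain-Python stand-in on a tiny hand-made binary grid, not the code from the repository; a real pipeline would work on the scanned image and would also need to merge multi-stroke characters such as "i".

```python
from collections import deque


def label_components(grid):
    """Label 4-connected groups of 1-pixels; return (labels, count)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already have a label.
            if grid[r][c] != 1 or labels[r][c] != 0:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill from the seed pixel.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Toy "ink" image: 1 is ink, 0 is paper. Three separate blobs.
image = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0],
]

labels, count = label_components(image)
print(count)
for row in labels:
    print(row)
```

Each blob of ink gets its own integer label, which is what makes it possible to crop out one glyph at a time before handing the outlines to a font tool.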