Free and Open Source Scientific Software

Formal Metadata

Title
Free and Open Source Scientific Software
Series Title
Number of Parts
5
Author
License
CC Attribution - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers
Publisher
Release Year
Language

Content Metadata

Subject Area
Genre
Abstract
This talk is part of the LinuxDays of the student association TheAlternative, the digital arm of [project21]. In the digital age, we want to draw attention to a sustainable interaction with knowledge and culture. We want to encourage usage and comprehension of Free and Open Source Software (also known as FOSS) as an alternative to proprietary software, as well as promote similar open alternatives in other areas. Our LinuxDays are a series of events designed to introduce Linux to beginners and help those who already have it enhance their knowledge. They are created mainly for users of other operating systems who want to discover and/or eventually switch to Linux. We start all the way from the bottom and guide you on your way up to a safer and more comfortable digital lifestyle. In our courses, we cover philosophical aspects as well as technical content. Users already familiar with Linux can jump in at any time to learn more about the wonders of Linux and Free Software. During our Install Events, we will personally assist you in installing Linux on your laptop. Don't worry - of course you can keep your existing OS and additionally use Linux if you want to. After visiting our courses, you will not only have an easy-to-use and efficient operating system on your computer, but also know how to get the best out of it and find more information on your own in case you are enticed by Linux. You don't have to worry about the limits and costs of proprietary software anymore. All students of Zürich may attend our events for free. In consideration of international participants, all courses are held in English.
Transcript: English (automatically generated)
OK everybody, so we're still getting some stuff prepared, but we have quite a bit of stuff to cover today, so I think it's best if we get started already.
So if you don't mind coming down, closing the door and having a seat. First of all, before I start the presentation, welcome to the last course of the LinuxDays. This is the advanced Linux course. I hope it won't be as complicated as it might sound.
I sent you an email yesterday evening and I hope all of you got it. As you might have read in the email, we'll be having a short introductory theoretical part, and then a slightly longer practical part where you will learn how to write and package your own scientific software under Linux. So it's important that you have everything you need.
Before we get there, that's why I have the slide first. If you don't have anything, you can install it and get prepared during the theoretical part, so that you have everything when you start working with it, actually. So the most important thing is that you have your laptop with you. Does everybody have a laptop? We will be working mainly remotely, meaning that to make sure that everybody has all of the software they need for the example,
we've set up a server which all of you will be able to log into. We already have usernames, and they're numbered: the usernames are user followed by two digits. The numbering starts at 10, so 1, for instance, will be 11, 2 will be 12, and so on.
And the password is the same thing as the username. While working on the server, we will mainly be working from the command line. The next thing is you need a shell client, so you need something in which you can type the commands and interface with the server.
If you have Mac or Linux, this is already given. If you have Windows, we ask you to download Git Bash. It's like a suite of commands which has everything you will need for the course today, and lots of other things which might benefit you in the future. A reliable text editor: you don't really need this. I'm going to ask you to edit stuff in nano. It's a text editor which you can use from the command line.
If you really, really want to, you can also edit the text files like the program which we will write on your computer and upload it, but that's not that important. And if you want to be able to experiment with the social coding part, like sharing your code and being able to install it, it would be important that you have a GitHub account.
Who here has a GitHub account? Wow, okay, that's a lot of you. If any of the others still want to make one, it's very fast, you can make one during the initial part of the presentation. If not, again, you don't have to, but then you won't be able to follow along with the instructions. So, the LinuxDays are all about FOSS, free and open source software.
And what this means exactly, you might recall it if you've been to the initial presentation, free and open source software is a concept for writing and distributing software which is based on these three attributes and which is indeed very good and very well suited for scientific software.
Free, as in free and open source software, means free as in freedom, it has nothing to do with the price, it has something to do with the fact that you're allowed to do whatever you want with the software, meaning that you can redistribute it, you can modify it and so on, something which is very important if you're a scientist, if you want to be creative in the way in which you develop the software,
if you want to be collaborative, work with colleagues from all over the place and make a better research tool. Open source means that the software can be read in the sense that you as a human can understand how it's programmed and how it works. You have access to the source code. And that is also very important for scientific purposes because that guarantees that the software is transparent and reproducible.
The same way in which you document your experiment when you do it to make sure that it's reproducible, you also document your software and you show where it comes from to make sure that people can reproduce the results which you get with your software. And, of course, free and open source software is software. And software is a very powerful tool for science.
I put it here, software is powerful, because we're going to talk a bit about the issues with scientific software and one of the big issues still, even nowadays, is that very many scientists are still a bit skeptical about using software too much in their workflow, partly because they don't understand it, partly because they think that things can be done by a human operator better.
But in the world of bigger and bigger data, software is an indispensable tool and it's very important for research, which is why we're talking about it right now. Scientific software: software which is used for research, for conducting and organizing experiments, for collecting data and so on.
It's not necessarily like medical software, if you might think of it like that. Generally, if you use software for diagnostics, it's not necessarily scientific software anymore. It's already a product which you deploy to serve a specific purpose. Scientific software is meant to help you to find out more about the world, analyze it. In a way, if you're a scientist, it's a bit like an extension of your mind.
And therefore, it's mainly free and open source, because science in itself, as I told you, good science is also reproducible and well-documented. So as a scientist, instinctively or maybe intuitively, if you write software, you conduct your software behavior in a similar way to how you conduct your research behavior,
and it will end up being free and open source. Most scientific software, thankfully, is free and open source. Sadly, most scientific software does rely on compilers and backends which are not free and open source, maybe because as a scientist, you do the free and open source part involuntarily and you perhaps lack an appreciation
of how important free and open source is also for the software backends. Another attribute of scientific software is that it evolves organically. I think most of you are informatics students. I looked at your email addresses. Most of you are just students. I think you're informatics students. Is anybody here not an informatics student?
Wow, okay, actually quite a few. Anyway, so if you work with software a lot, then you will know that software should be planned, should be designed carefully. This is sadly not always the case with scientific software. It evolves organically, meaning that someone writes a suite of scripts, then they look at it and ask, okay, which scripts give me the most interesting results?
And they build on top of that and everything else is kind of forgotten but not always also deleted. So the issues with scientific software: I've already talked about the first, which is that it's great that most of it is free and open source, but it's only incidentally free and open source. Also, it's seldom refactored. This is something which I talked about
when I said that it evolves organically, meaning that very seldom in scientific software do you have people who sit down and say, okay, let's completely redesign this so that it makes sense from the software point of view. This is sadly infrequently done. And it also often comes bundled. See, if you have lots of publications and former research based on your software,
you want to be able to make sure that people can reproduce that. So backwards compatibility is rather a big thing with scientific software. And the way in which this is solved is sadly that most software distributions ship with all the libraries which they need and so on and this is more or less the de facto way of installing scientific software,
which is bad because it leads to deprecated dependencies which are bundled with the software, it does not use your system backends, and it's actually quite a big issue all around. There are a couple of repositories which try to overcome these issues and to make it possible to install scientific software
on a system in a coherent way, in a way which is robust, in a way which doesn't waste space, memory and which makes sure that your system is stable. And I'm a neuroscientist, so much of the actual examples we're going to see today are neuroscience based. But I think it's a good example because neuroscience is so data heavy
and therefore you have a huge software ecosystem around it. So probably I think it's one of the best disciplines to exemplify the current issues of scientific software. So NeuroDebian is a repository for neuroscientific software, so it's not a separate distribution of Debian, it's just a repository of software. And it's based on the Debian package manager.
And it's a huge thing. So it's a large project based at the University of Magdeburg in Germany and at Dartmouth College in the United States. And it mainly provides you with binaries for lots of neuroscience packages.
So very many of them have been rebuilt to bundle less software, to be more appropriate for a system-wide install, and packaged in .deb files. These are binary files which your package manager practically just copies into the right positions. Another large repository of neuroscience software, although considerably smaller than NeuroDebian, is NeuroGentoo.
It's practically the same thing. It's actually not at all the same thing. It's a similar thing which is built on top of Portage, the Gentoo package manager, instead of the Debian one. It's mainly developed here at the ETH, mainly by myself and a couple of colleagues. And practically it's source-based. So in contrast to NeuroDebian,
it does not provide you with a series of binaries, but it provides you with a series of instructions which can automatically fetch the source code from upstream and build it on your computer. If you're looking for reproducibility, this might be a bit better because then you make sure that the software itself is reproducible and that whatever you get as a product of that software
is not necessarily an artifact of how it was built. You have access to compile-time variables, you have access to slots, meaning that you can have parallel installations of the same software if you want to test how your data analysis looks under different versions. And now I'm going to show you a couple of big neuroscience packages and what kind of issues they have,
issues which these package management systems, these repositories, have to deal with. So these are the major software packages in neuroscience. I certainly don't want to make this slide a point about the fact that they're bad. They're certainly not bad. They're really great packages of software. Hundreds if not thousands of high-profile research papers
were built with them. But even such big, very serious software packages have issues. SPM, for instance, still relies on MATLAB, on a proprietary, closed-box backend. There are efforts being made to make SPM run on Octave. However, this is still not the case, meaning that if you run SPM, Statistical Parametric Mapping,
one of the major software distributions in neuroscience, you will be forced to rely on a MATLAB compiler, and you don't really know what it does. I mean, you can say, yeah, it's made by a big company. We're going to trust it. But I guess the point of good research is that everything should be transparent, which if you use SPM, it is not. AFNI is another really big package for neuroscience,
and one of the shortcomings it has is that it lacks a build system. It's evolved very organically with lots of makefiles. Everybody contributes like a couple of packages. They write a makefile for that, and in the end, you have a jumble of makefiles, which all point at each other. And it kind of works, but debugging a compilation of AFNI
is not something which is very nice. FSL does not share some of these problems. It shares, however, another problem, namely that it's licensed quite restrictively. It is theoretically open source, but it's certainly not free, meaning that if you want to use it for anything commercial,
you will have to consult with the people who do it. If you want to download it, actually, they try to prompt you to give your personal information and so on. FreeSurfer is another very popular plotting tool for neuroscience, and its issue is that it depends on very outdated packages. This is one of the examples for bundling,
which I was talking about. The issue here is that it relies on MINC. It's a medical neuroimaging data format, but it relies on a version of MINC which is incredibly outdated, and they haven't changed it, mainly for backward compatibility. New neuroscience players.
In the face of the issues with the old, the very good and established neuroscience software, an entire new ecosystem of scientific software, which is maybe more aware of the importance of freedom and of open source at the code level, not necessarily at the interface level, has evolved, and it's mainly built around Python. One of the first examples is Nipype.
It's a suite of Python bindings which practically lets you use all of these other big packages from a Python interface and gives you practically the power of Python, while keeping the backends. There's also native packaging for neuroimaging in Python. It's called NiPy. NiBabel is a dictionary conversion tool with which you can view, read, and write
all sorts of neuroimaging formats natively in Python. Nilearn is a very interesting example and one which we will work with later today. It's practically a machine learning package, but it's also a plotting package. It's again an example of how scientific software evolves organically, meaning that lots of people who work with neuroimaging
have tried to make good plotting tools, but some of the best plotting tools nowadays are part of Nilearn, because the people who are doing these machine learning functions also put a very high emphasis on having decent plots with their results. So probably at some point this is going to split off
and be a separate plotting library. This is how open source software evolves. Of course, to be able to do all of these things, you need to be able to collaborate. You need to be able to see who modified what. You need to be able to fork and branch and to merge software, and this is done by a number of version control systems.
One of the most prominent of which is Git. Git is practically very interesting because it's one of the first decentralized version management systems, meaning that everybody can have a repository of the software and everybody can contribute to each other, and practically all of these repositories are like the master branch in themselves, so to say.
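To make the decentralized idea concrete, here is a minimal sketch that was not part of the talk. It assumes git is installed; the directory names alice and bob, the file analysis.py and the commit identity are placeholders invented for the illustration. It shows that a clone carries the complete history and is a full repository in its own right.

```shell
# Scratch area so the sketch does not touch any real repository.
work="$(mktemp -d)"

# "alice" starts a repository and records one commit.
# The identity passed with -c is a placeholder for this demo only.
git init --quiet "$work/alice"
echo 'print("hello brain")' > "$work/alice/analysis.py"
git -C "$work/alice" add analysis.py
git -C "$work/alice" -c user.name="Alice" -c user.email="alice@example.org" \
    commit --quiet -m "Add analysis script"

# "bob" clones it. The clone holds the whole history, not just a snapshot
# of the files, so bob could carry on even if alice's copy disappeared.
git clone --quiet "$work/alice" "$work/bob"
git -C "$work/bob" log --oneline
```

Neither copy is special here: bob can commit locally and alice can pull from bob just as easily as the other way around.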
So you don't need a centralized repository. You don't need an institution or one person to coordinate the effort, but the effort goes there where you, for instance, have the most active development. It's a very natural way of developing software. Very many people like it. GitHub is a nice interface for that. It's practically a website
which allows you to publish your Git-versioned software online, contribute, track issues, and so on. We will also use this. Most of you undoubtedly already know it since you said that you have an account. Okay. That was the theoretical part, and now we are going to start with the practical part.
We're going to go through this step-by-step. If you're ready with everything before the others, then you can see whether anybody else needs some help. One thing which I want to tell you about is I tested everything not once, not twice, but a number of times, and it works. Whenever I had bugs with the code,
which I'm going to show you, it was always because I misspelled something, yeah? Computers, as you might know, are very obedient but not very intelligent, so they will end up doing what you tell them and not necessarily what you mean them to do. So please be very sure that you spell everything correctly. And now let's get started. So I told you that we have a cluster server prepared for you.
The address is demo.chimera.eu, and you can log into it via SSH. Your username is userNN, where NN is your number plus 10: if you got the number 1, you're 11. If you got the number 10, you're 20. And please log into that and tell me if you're having any trouble.
Everybody's logged in? Who's not logged in? Okay, are you having any issues? Doesn't it want to connect? Oh, okay, sorry.
So no, you add it to 10. I'm sorry for the system. We started the numbering at 10. I don't know why. So if you have a single digit, like if you have 5, your number is 15. Yeah, I'm sorry about that. Okay, so everybody's in now.
The password is your username. Exactly the same.
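The numbering rule and the login can be summed up in a short sketch. The helper name account_for is made up for this illustration and was not shown in the course; the host name is the one given in the talk.

```shell
# Hypothetical helper: turn the number you were handed into an account name.
# The course accounts start at 10, so the offset is 10.
account_for() {
    printf 'user%d\n' "$(( $1 + 10 ))"
}

account_for 1    # prints user11
account_for 5    # prints user15
account_for 10   # prints user20

# Log in over SSH; the password is identical to the account name.
# Left commented out so the sketch runs without network access.
# ssh "$(account_for 5)@demo.chimera.eu"
```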
Yes? You should have gotten one of them. Didn't anybody give you one? Give him the number, please. Okay, is everybody in?
Good. Really great. Sorry, you're still not in?
What's the problem? One more thing. I'll leave the server open. So it's a virtual machine, meaning that technically you shouldn't be able
to burn the server with it. But I'm going to leave it open for, I don't know, another day or so. So if we don't finish everything, you can get the slides and try to do the rest at home. It's really interesting, and I really encourage you to do it, though I hope we're going to be finished. Okay, so everybody's logged in now? Okay, great. So now that you're logged in,
we should start writing a program, okay? So it's going to be a simple neuroscience program in Python. And the most important thing when you're writing software is that you should stay organized. So let's make a directory where you're going to put all of your source packages, which you're going to write today.
If you were doing it on your own computer, it would be all the source packages which you generally write. Let's do this via this command. It makes a directory in your home directory called src, yeah? Type that. And once you've done that, you practically have a place to hold all of your other source subdirectories.
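The command on the slide is not captured in the transcript. It was presumably along these lines; the -p flag is an addition here so that the sketch can be run repeatedly without error.

```shell
# Create the directory that will hold all of your source packages.
# -p makes the command succeed even if ~/src already exists.
mkdir -p "$HOME/src"

# Confirm that the directory is there.
ls -d "$HOME/src"
```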
After you do that, you can make another directory for the actual program, which you're going to be writing. Every one of you is going to write a different program. Actually, it's going to be the same program. It's going to have a different name. We're doing this so that we can exemplify how you can collaboratively write software. So after you do that, make another subdirectory here
with your program name. So again, the same command under your home path, under src, acti-brain-NN, where NN is the same number which you use for your username. Sir, you have a question? Okay, what's your number? 15, so that's 25. Is anybody else user 25?
Yeah, yeah, yeah. You should be 14. Yeah, I'm really sorry about this inconvenience. We just didn't coordinate the numbers with the user so well. So you're in right now? You're 25, you're in.
You've made the directories? Okay, great. After you make this directory for your program, you should make another directory for your module, yeah? So Python works in a slightly organized way, I'd say, in that you can have, as part of a software package, a number of modules,
and these are the things which actually get installed. We're going to see that a bit later on. So the module which you're going to have as part of the software you're writing is going to be called plottingNN. So after you make this directory, you go and make another one, make directory under your home path, under src, under actibrainNN,
with NN being your number, plottingNN, again with your number, yeah? This can actually all be done with a one-liner, but I think it's better if we go through it step by step. No child left behind and so on. Is anybody this far? No? Okay. After you do that, for Python to be able to recognize
that the directory is indeed meant to be a module, it needs to find a file inside the directory. That file can be empty. It's important that it has this name, yeah? So after you got here, you should create a file inside this path, which is called __init__.py, so underscore underscore init underscore underscore dot py, and you create that file
via the command touch, yeah? It just touches the file and it leaves an empty file as a trace behind. So right now, you made a directory where you keep your programs, you made a directory for this program, you made a directory for this module, and we signaled to Python that that directory is going to turn into a module, yeah? Now we're going to go and write the actual program.
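The directory layout built in the steps above can be sketched in Python as well. This is only an illustration, not the commands from the slides: the participant number 25 is hypothetical, the spelling actibrainNN / plottingNN is assumed from the spoken names, and a scratch directory stands in for the home directory so nothing real is touched.

```python
import tempfile
from pathlib import Path

# Hypothetical participant number; the talk calls it NN.
NN = "25"

# A scratch directory stands in for the home directory (~), so the
# sketch can be run anywhere without touching real files.
home = Path(tempfile.mkdtemp())

# Equivalent of the three mkdir steps:
# ~/src, ~/src/actibrainNN, ~/src/actibrainNN/plottingNN
module_dir = home / "src" / f"actibrain{NN}" / f"plotting{NN}"
module_dir.mkdir(parents=True)

# Equivalent of touch: an empty __init__.py marks the directory as a module.
init_file = module_dir / "__init__.py"
init_file.touch()

# Show what was created, relative to the stand-in home directory.
for path in sorted(home.rglob("*")):
    print(path.relative_to(home))
```

Passing parents=True creates all three directories in one call, which is the kind of one-liner mentioned above.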
The actual program looks like, okay, for some reason this doesn't work now. Okay, the actual program, we'll see, will come on the next slide. Practically, you can write the program in two different ways. You can write it with nano directly via the command line on the server, which is what I recommend you do,
or you could write it on your own computer and then upload it. But please do try to use nano. If you really want to do it on your computer, you can, and then you can upload it with scp. I'll show you the command. So, if you're going to use nano, you practically run this command, nano, your home path, the source directory of all your programs,
the base directory of your program, the base directory of the module, and the name of the actual function file which you're going to use, yeah? So it's called brain_activation.py, brain underscore activation dot py. So write this, press enter, and then you should have a window appearing which is a text editor.
And then we can go to the next slide and I'll show you the actual code. So if everybody is in nano, we can move on to the actual program, yeah? This looks quite long, but this is really the entire program, so actually as scientific software goes, this is quite compact. So I'm going to go through it with you line by line and you're going to have plenty of time
to type it in while I go through this, yeah? So practically the first thing you should do when you're writing a Python software package, generally for any programming language, is you have to tell the interpreter what kind of functions you are going to use, yeah? Since this is a rather high level task which we're trying to do, this program, as you might have guessed from the names of the files,
is about plotting neuroimaging data, yeah? We're going to use toolkits which other people have developed. Toolkits which are free and open source and have been collaboratively developed. In fact, toolkits from nilearn. You might remember the package I told you about at the beginning. So on the first line, we're going to tell Python that we're going to use, from nilearn, yeah?
the module datasets. So from nilearn import datasets, yeah? This is a module which will help us select some datasets which are automatically shipped with nilearn, some example datasets, yeah? After we do that, we're going to tell Python that we want to import matplotlib.pyplot, yeah?
This is a plotting utility which we will need to make the figures. You're going to see these are really nice figures and we're going to do amazing things with them, yeah? So you should write import matplotlib.pyplot as plt. This practically means that it will import this but you can call it via this shorter name so that you don't have to type long names.
On the third line, the last thing we need to import is again from nilearn: plotting, yeah? So plotting is another module from nilearn which lets you plot things, obviously. We're going to use it down here. So you see practically all of these imports are used at some point in the code which follows, yeah?
This is why it's important to import them. After we do that, we're going to write a function. A function is something in Python which you can call. So you can call a Python script as a script, so you can call the file, but also from a file, you can call a function, meaning that practically the interpreter goes inside here and executes what's written underneath this function.
And to write a function which you can call, you're going to have to define it. Actually, just one question to see how fast I have to explain this or not. How many of you can actually program in Python? Okay, so that's quite a few of you, but some don't, so I guess I'm still going to go quite slow through this, yeah? After you define a function, you should give it a name.
You see, this name is quite long. I didn't do this so that you can make a lot of typos, but it's actually quite important when you name your function and modules in Python to give them really explicit names because you're going to see now we have a really small file, but after you write like 20 of these, you're going to get really confused if you call your files like P-A-B-I.
You're not going to know what that stands for anymore. Also, it's very bad for collaborative coding, like your collaborators will have absolutely no idea, so it's really a very good thing to get used to naming your things explicitly. Also, most advanced IDEs or simply text editors will have auto-completion,
so this is difficult for you now because you have to type everything in nano, but normally, you would use like a text editor specifically made for coding, which can auto-complete all of these variables and function names. Anyway, you define the function, and any function can take any number of inputs, so this could be zero, it could be an empty parenthesis,
or it could have like 20 fields in here, which means that you can give to the function 20 arguments based on which it computes something, yeah? We're going to give it just one argument, which is the identifier, yeah? Based on this identifier, it's going to create a variable, which we are going to call the localizer dataset. It's practically going to be an array
which shows where in the brain activation takes place, and this array is going to be taken from the datasets, which we imported up here, from the datasets which ship with nilearn. It's going to fetch the localizer contrast, so these are going to be contrast-based localizers, meaning that you give a person two tasks, A and B, and practically you look
at the differential brain activation. You don't need to know that, I'm just explaining where that comes from, yeah? And it's going to fetch the localizer contrast, which has the identifier, which we specified here, so practically this, you give this to the function, and this is where it lands, this is where the function actually uses it. The number of subjects, so this is data from up to 94 subjects.
If we import data from all 94 subjects, we'll be here for quite a while, because there's 40 of you, and the server only has eight cores. So we're just going to import data for two subjects. And get_tmaps is true, meaning that we'll have t-maps, so we will not have, say, the raw activation data, but we will have t-statistics of the activation data.
This is practically a statistical test, which allows you to say, with what kind of certainty do I get differential activation in this area, yeah? So again, localizer dataset, this is the variable which we're creating, this is the module where it comes from, this is the function which gets it, and these are the attributes we pass to that function.
Please remember the brackets here, they're very important, because for some reason this function wants this to be a list, yeah? Okay, after we get there, we're going to select the location of the t-maps, which we just downloaded, so localizer_tmap_filename, this is going to be a file name, this is going to be a string,
and it's going to select from the localizer dataset, which we created here, the t-map 1, the first t-map, yeah? And once you do all of that, we can plot it, plotting, which is the module which we imported here, plot_glass_brain, glass brain is a nice way to visualize the brain,
it's like a transparent brain through which you can see the activation, and here you select the localizer_tmap_filename, so the location from which it should take that activation map, and the threshold, practically you're thresholding so that you only show what has a t-score of over three, yeah? Meaning that you do this so that you don't have colorful pixels
everywhere in the brain, but you have colorful pixels there where you have a high certainty of differential activation between two conditions, yeah? So practically this is a function which you, if you would be a neuroscientist, can use to very rapidly just specify what kind of contrast you want and get a picture from this dataset of participants
of how the brain activation would look for that contrast. If you're not a neuroscientist, maybe you can write an article about psychology and so on and make yourself more credible by putting in some figures, you shouldn't do that because that's lying, but you can also use it. So after you wrote this, we're going to write another function, as I told you a Python file
can contain any number of functions, the next one is going to be really, really easy, it looks like this, yeah? So this function will just take a data file, which is a map of activation, and plot it on the glass brain, yeah? So you're going to call it def plot_activation_by_data, the words separated by underscores, with the argument localizer_data,
this is going to be a string like a location of a file, and the command which is going to get executed if you call this function is going to be plotting, so again from here, yeah? So plotting, plot_glass_brain, it's the same function, the glass brain plot, and the first argument is the localizer_data, so the path of the data which you're going to plot,
and this is the threshold, yeah? So you add this to the same file, just so that it's clear. And after you're done with that, you already wrote your neuroimaging program in Python, yeah? It took a bit of time, but that's because we're so many
and we had to coordinate everybody making it to this point. But you can see it's really not very many lines of code with which you can do quite awesome stuff. Of course, if you do actual research, it's a bit more involved than this, you have to get the activation maps from somewhere, yeah? But plotting is also an integral part of that,
so you could practically think if some of your colleagues did all of the statistical calculation for you, you would already be able with these functions to plot neuroimaging data and to visualize it. Quite simple, yeah? So after we wrote all of this, now we already have a Python program, like a Python module with two functions
to plot neuroimaging data, yeah? You can already use this, so you could already use this on your own machine. But as a scientist and as a programmer, and especially as both, it's very important to package your software in a way in which other people can also use it. It's actually a reason why lots of effort gets lost in science,
because everybody is writing their own tiny scripts and they're not packaging them. They're not making them accessible to other people, not necessarily because they don't want to, but just because they don't want to be bothered. They think that software which you can install should be like big software and their script is probably not worth the bother. But no, if you write something, a program which helps you in your research,
in your plotting, in designing your papers and so on, you should upload it because then you can save other people work and what's even more important for you personally these other people can contribute to it, because if they end up using your stuff as a shortcut and they find any bugs in it, they will also have a stake in making it better.
They might find bugs for you. They might know programming better than you. They might come from a different angle. They will definitely help you improve this function. So in the hope that that might happen with this program, let's package it so that we can install it on the system just like you would install OpenOffice or anything else through the package manager. And it's actually quite simple because Python is a rather mature programming language
and they kind of already took care of that. So this is like the finalizing the program. If you're in nano, you just need to get out of nano and save your stuff. You do that by pressing Ctrl-O, Enter, and then Ctrl-X. And if you did insist on writing this program on your own computer, you could upload it with SCP.
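For reference, the file saved at this point might look roughly like the sketch below. The slide itself is not part of the transcript, so several things are assumptions: the name of the first function is hypothetical, the index 1 follows the spoken "t-map 1", the threshold in the second function is assumed to match the first, and the plt.show() calls are a guess at where plt is used. The module text is kept in a string and only parsed, so the sketch runs without nilearn installed.

```python
import ast
import tempfile
from pathlib import Path

# The text that would be typed into nano as brain_activation.py.
# Anything marked "assumed" is not stated in the talk.
MODULE_SOURCE = '''\
from nilearn import datasets
import matplotlib.pyplot as plt
from nilearn import plotting


def plot_activation_by_identifier(identifier):  # assumed function name
    # Fetch t-maps for two subjects; the contrast goes inside a list.
    localizer_dataset = datasets.fetch_localizer_contrasts(
        [identifier],
        n_subjects=2,
        get_tmaps=True,
    )
    # "t-map 1" in the talk; using index 1 here is an assumption.
    localizer_tmap_filename = localizer_dataset.tmaps[1]
    plotting.plot_glass_brain(localizer_tmap_filename, threshold=3)
    plt.show()  # assumed: the talk does not say where plt is used


def plot_activation_by_data(localizer_data):
    # threshold=3 is assumed to match the first function.
    plotting.plot_glass_brain(localizer_data, threshold=3)
    plt.show()  # assumed
'''

# Write the file into a scratch directory instead of ~/src/...
target = Path(tempfile.mkdtemp()) / "brain_activation.py"
target.write_text(MODULE_SOURCE)

# Parse only: this checks the syntax without importing nilearn.
tree = ast.parse(target.read_text())
function_names = [node.name for node in tree.body
                  if isinstance(node, ast.FunctionDef)]
print(function_names)
```

With nilearn and matplotlib installed, the text in MODULE_SOURCE could be saved as a real module and the first function called with a contrast name such as "left vs right button press", one of the contrast names used in nilearn's own examples.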
Did anybody write this on their own machine? Okay, so you can upload it with SCP to the correct path. Sorry? Okay, great. Well, that's another way to do it. Okay, so now everybody should have finalized the program and you can add a setup file.
This is practically a really, really short Python file which tells whatever Python package manager you use how to manage this package, yeah? Actually here you can add a lot more variables and normally you should because you want to give a lot of information about your program to people who don't know it so that they can understand what it's all about.
But here we have like a really minimal example and for that you'll have to create a file. Oops, I'm really sorry, what did I do? Okay, for that you'll have to create a file called setup.py under the main root of your program. Yes, you do it again with nano, tilde, this is the symbol for your home directory, the source where you keep all your programs,
this particular program, and then setup.py. And inside you import, again, as I told you, in Python, as in most programming languages, you start by telling the interpreter what functions to import. So you will write from distutils.core import setup. This is the function which sets up the package. Yes, you have questions? Do you know the Swiss keyboard?
Like those, how do you call them, quotation marks? Those quotations. So I also have a Swiss keyboard right here. The quotations are the second button. It's next to zero. So it's the next button right after zero.
It actually doesn't matter. You can use single quotations or you can use double quotations. The only thing that matters is that you stay consistent. So if you start using double quotations, which I suggest you not do because then you might be confused, then always use double quotations. If you start using single quotations, that's also okay,
but always use single quotations. And after you do that, you press Ctrl-O to save, Enter, and Ctrl-X. Practically what this contains, so after you import the setup function, you can pass a series of arguments to the setup function. Remember, I told you that in Python, any function can have anywhere from zero
to whatever amount of arguments. Setup can take a lot of arguments, but we're just going to tell it the name of the package. This is actibrainNN. In case this wasn't clear, please write your user number plus 10 here. It's very important. Don't write NN. The version, this is a live software package, meaning that we're not going to version it.
We're going to upload it, and we're going to be able to install it directly from the code you wrote a couple of minutes ago. It's a way of managing software which is maybe not very robust, but if you are using software for your research, especially software which you wrote yourself or which close collaborators wrote, where you can pick up the phone and ask someone
if you have a bug, then it's generally better to use live software because then you have direct access to the newest code which your collaborators wrote. You can debug it together. You can contribute to it better. If you have any issues, since it's the software which you wrote or your collaborators wrote, it's easier for you to debug it. I wouldn't use live software. I wouldn't use a live package for Chromium
or for something like that, but definitely for my scientific software. That's why we don't have a version number. These are the packages which get installed. I told you that practically in this big software package we're going to have a number of sub-packages, a number of modules, and this is plottingNN. This is why we had another directory here.
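A minimal setup.py along the lines described above might look like the sketch below. The participant number is hypothetical, and the version string "9999" is an assumed placeholder, since the talk only says the package is not versioned. As with the program itself, the text is kept in a string and only parsed. Note that distutils was removed from the standard library in Python 3.12; on newer Pythons, from setuptools import setup accepts the same three arguments.

```python
import ast
import tempfile
from pathlib import Path

NN = "25"  # hypothetical participant number

# The text that would be typed into ~/src/actibrainNN/setup.py.
SETUP_SOURCE = f'''\
from distutils.core import setup

setup(
    name="actibrain{NN}",
    version="9999",  # assumed placeholder for an unversioned live package
    packages=["plotting{NN}"],
)
'''

# Write the file into a scratch directory standing in for the program root.
setup_file = Path(tempfile.mkdtemp()) / "setup.py"
setup_file.write_text(SETUP_SOURCE)

# Parse only, then list the keyword arguments passed to setup().
tree = ast.parse(setup_file.read_text())
setup_call = tree.body[1].value
setup_keywords = [keyword.arg for keyword in setup_call.keywords]
print(setup_keywords)
```

The packages list is what ties the setup file to the module directory containing __init__.py.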
You remember the directory under which you wrote that little init file? This is how we tell Python that that's what it should install. Practically you could have any number of subdirectories here, and via the setup file you can tell Python which ones to install. You could actually also select at compile time, so when you're installing the software, which ones actually get built,
so which ones get installed to your system. Is anybody not done with this? Everybody's done? Wow, this is great. We're starting to pick up speed. Now that we wrote the software, it's time to get collaborative with it. You wrote a Python neuroimaging program,
you wrote the setup file, which practically enables Python package managers to install it. Now you want to be able to upload it so that colleagues of yours might also install it. Right now you set up all of these things, but the only one who can install it is you because you're the one who has it on their computer. The way we're going to do it is by pushing it to GitHub.
First we're going to need to register it in Git. The first thing which you will need for that is you will have to tell Git who you are. Git can also be used for proof of authorship if you're interested in that. It's a built-in feature of Git that it will not let you make commits
if you don't identify yourself. Please write these. You can actually add dummy values if you don't want to tell Git your real name or your real email address. Actually, for GitHub, if you're going to use GitHub, which I think many of you do, it's good to give the same email address which you registered with on GitHub so GitHub knows who made these commits.
Here you can write your normal name. The way in which GitHub will identify you is over the email address, which is why I only wrote about GitHub here. After you're done with this, which I hope all of you are, we can move on to committing your program to GitHub.
First, committing your program to Git and then uploading it to GitHub. First, you have to make sure that you are in the program directory. If you're not already in there, you can change the directory, home directory, source directory, program directory. You can initialize an empty Git repository,
meaning that you practically start Git, this version control system inside of your program root with git init. Then you can add all of the files you have in there to Git. You do that via git add and dot. Dot practically means the current directory. After you do all of that, you can commit everything, meaning that you register it in Git.
Git will remember forever, unless you tell it to forget, that you committed this code at this date, yeah? So practically you write git commit. Minus a means all. So all of the files which have been added and have been changed, which at the moment is all files, get committed. Minus m stands for message.
This is a commit message which Git will record, and you can write new program or anything else you want. Practically this commit message is just important to give you an idea of what happened at every commit so that you can tell your future self or your future collaborators what you did there, yeah? It's practically just a way of staying organized, yeah.
Git doesn't accept empty commit messages, so you should definitely write something in there. So now that you committed it to Git, you're this close to actually making your code public, allowing other people to install the script you just wrote and allowing people to collaborate with you in an interactive, colorful way. And you do that via GitHub, yeah?
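The local Git steps described above can be sketched as a small Python script that runs them in a scratch directory. The identity commands on the slide are not in the transcript, so the git config lines here are an assumption; they set a dummy name and email for this one repository only, rather than globally. The commands are executed only if git is installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# A scratch directory stands in for the program root, ~/src/actibrainNN.
program_root = Path(tempfile.mkdtemp())
(program_root / "setup.py").write_text("# placeholder file\n")

commands = [
    ["git", "init"],                                     # start version control here
    ["git", "config", "user.name", "Your Name"],         # dummy identity (assumed form)
    ["git", "config", "user.email", "you@example.com"],  # dummy identity (assumed form)
    ["git", "add", "."],                                 # "." is the current directory
    ["git", "commit", "-a", "-m", "new program"],        # -a: all, -m: message
]

if shutil.which("git"):
    for command in commands:
        subprocess.run(command, cwd=program_root, check=True)
    # Show the single commit that was just recorded.
    log = subprocess.run(["git", "log", "--oneline"], cwd=program_root,
                         check=True, capture_output=True, text=True)
    print(log.stdout)
else:
    print("git is not installed; commands were not run")
```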
And to be able to host all of this on GitHub, you have to create an empty repository. You can do that best from the graphical user interface of the web page. So you can go to... I don't think you need the W. You can go to github.com and go to your page, so log in with your account, and then you will have a button which says create new repository.
And after you have created the repository, then you can go on the repository page. It's actually the first thing which the GitHub interface will show you, and you will see a lot of lines which tell you how to initialize your repository. Most of those commands you already wrote, like git init and git add and so on.
The command which is important so that you can push stuff from the command line from the server directly online is the line which looks like git remote add origin, and after that it can start with HTTPS if you don't have your SSH keys registered with GitHub or it can start with git. Whatever line starts with git remote add origin
from the page you get sent to after you create the repository, please copy that line. Copy it with control C, and then go to the terminal and paste it after you're in the root of your program. So after you're under your user name, SRC, ActiBrainNN,
then you paste it there with control shift V. And then you hit enter, and this will practically add a remote, so it will tell the git software on the server that it can push stuff to this web address. So this practically tells git that it should open like the remote section of the program
and add to the remotes list a remote called origin and which has this web address. So git is the software, remote is a list of remotes, add means that it adds one in the list, origin is the name of the remote, and this will be the web address.
This is what it all means. So a remote can be a web address, a git address, a place to which git can push your content, so where it can send it to or from where it can pull content. Exactly. So in this case the remote will be your GitHub account,
your public GitHub account on which you have created this empty repository, that will be the remote where git can send stuff, where it can also pull them. Like if somebody would contribute to your software, you could make it better online and then you can pull from there to your machine. So is anybody still working on this?
So everybody's done. Yeah? Okay, great. So if you do that, you can go to the next command which is git push origin master. This means that it will push the content, as I told you, yeah, to this remote called origin,
and master is the master branch. Usually you can have a number of branches on your repository. That's not that important now, yeah? But the main branch is called master, quite intuitively. So after you run this command, your software is on GitHub. People can install your software. People can review your software. People can set up, like can track issues with your software
and people can collaborate with you, yeah? So if you'd be a neuroscientist and you wanted to have a utility to easily plot stuff, now you would already be ahead. So now you'd also already have something which might benefit you in the next years of your masters and so on, yeah?
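The two commands from this step can be written out with each word labelled, following the explanation above. The address is a placeholder and not a real repository; the actual line has to be copied from the GitHub page of the newly created repository.

```python
def label_remote_add(command):
    """Pair each word of a `git remote add` command with its role."""
    roles = ["program", "subcommand", "action", "remote name", "address"]
    return dict(zip(roles, command))


# Placeholder address: USERNAME and REPOSITORY are not real values.
remote_url = "https://github.com/USERNAME/REPOSITORY.git"

add_remote = ["git", "remote", "add", "origin", remote_url]
push = ["git", "push", "origin", "master"]  # remote name, then branch name

for role, word in label_remote_add(add_remote).items():
    print(f"{role:>12}: {word}")
print(" ".join(push))
```

On newer setups the default branch may be called main rather than master, in which case the last word of the push command changes accordingly.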
Okay, after you do this, yeah, we will move to the next part where you can actually install software from GitHub, yeah? I told you about a number of repositories for managing neuroscience software, and I told you one of them is called NeuroGentoo. It's actually a part of Gentoo Science and it's developed mainly here by me.
And practically this is what the server runs, yeah? So now I'm going to show you how to add a package for your software to an overlay which the server uses and to install that package. And I hope we will be done with this before we get kicked out. Actually, there's another step after this which you can do at home, which is actually the best thing about this presentation,
but sadly you need all of these other things before you can get there. But for the software which you install, you can do two things. And the one thing which I would recommend you do, just to get a feeling of how easy it is to install software which other people wrote just bare minutes ago, the package which you will be writing
will not be for your own software, yeah? So it won't be for your ActiBrainNN, but it will be for somebody else's ActiBrainNN. Actually, I don't know. Some of you already gave up, so actually I think that's a bad idea because we don't know which numbers are still in the running. Yeah, so now you should do it for your own software then.
So ActiBrainNN, the same NN which you've been using throughout the course. But please keep in mind you could do it just as easily for the software which anybody else of you wrote here, yeah? This is how easy it is to get the software which other people just published online live at your fingertips. Live on Gentoo. Moving on, an overlay is practically
a set of ebuilds on Gentoo. It's a set of instructions for how to automatically download and install different software packages, yeah? So in order to contribute to this overlay, you have to fork it and to clone it, yeah? So the first thing which you should do
is you should go on GitHub again, go to this address, and you will see a button in the upper right which says Fork. Please click that button. Yeah? After you have clicked that button, go to the shell with which you've been working, change the directory to src, and then type this command, git clone https github.com username.
This is your GitHub username. This is not the username with which you're identified on the server, yes? Please, please keep that in mind, yeah? And overlay.git. This will practically be the address of your fork once you fork my overlay here, yeah? And this will copy everything to your system
where you can easily edit it. Can you elaborate a little? Of course, so an overlay, yeah, it's okay, I got your question. So an overlay is practically a set of ebuilds. Ebuilds are like the package atoms for Gentoo. In most distributions, a package atom is a precompiled binary, a huge file, a deb file, for instance,
for Debian, for Debian's package manager. In Gentoo, you do not have large binaries as package atoms, but you have really small scripts which just contain the basic minimal instructions with which your package manager automatically downloads the code and compiles it, yeah? So an ebuild, as you will see,
you're going to write an ebuild in the next couple of minutes, is a really, really short bash script. It's not really even a script. It's just a collection of bash variables, and this is what a Gentoo package atom looks like. It completely, seamlessly integrates with the source. There's nothing precompiled there, yeah? Your package manager is really intelligent,
gets the source, and compiles it on your machine. It's completely reproducible. That's why I think that Gentoo is an amazing distribution choice for scientific purposes. So after you've done that, you have the overlay, this collection of Gentoo package atoms on your system, and you can add another one and push it upstream to me, okay?
That's actually going to be a huge issue now that I don't have internet, but let's move on, yeah? Okay. We're going to figure this out. So, after you did that, yeah? So, for chimeric overlay, we already did that. You should create the appropriate directory
for this actibrain package, yeah? So you should type make directory, mkdir, tilde for your home directory, src. That's where you keep all of your source files. Overlay is where you cloned, where you hopefully cloned my overlay into.
dev-python, it's the category of package atoms, like Python development packages, and actibrainNN is going to be the name of the package which you're going to package, yeah? Please use your own NN, which you've been using throughout the course, yeah? After you did that, you can change the directory so that you're inside,
and you can open the new file with this command, yeah? Nano, you've already used it today, actibrainNN, yeah? It's the same package. It's okay, we don't need that. I figured out how we're going to do that. And minus 9999. 9999 is the package version which indicates to Portage that this is a live package, yeah?
So this is telling Portage that this is not a versioned package, meaning that every time you update stuff, it will automatically get reinstalled to make sure that you always have the absolutely newest code from upstream. So once you're here, you can actually start writing the ebuild. As I told you, these are just a set
of very simple Bash variables so that Portage, the Gentoo package manager, will know what to do with this and how to install it, okay? So practically, the first variable which you're going to have to set is the EAPI. The E is like a prefix which Gentoo package management uses for everything, and API is the programming interface version,
which is five. It's the newest. It makes sure that practically all of the functions which you're going to want to work with are supported, yeah? The next variable is the Python compatibility. So which versions of Python is your software compatible with, yeah? We're just going to select two, the major
Python 2 version, 2.7, and the current Gentoo Python 3 version, which is 3.4, yeah? So equals parenthesis. It's very important that you leave a blank space here, yeah? And the next line, as in Python, we have to tell the program at the beginning what kind of functions it's going to use.
It's actually the same for an ebuild. So we're going to tell the ebuild that we're going to use the distutils functions, which are the functions that can automatically manage that setup.py. So we distributed our code with setup.py. I told you that setup.py allows Python package management software to take care of stuff, yeah?
distutils-r1 is the part of Portage which interfaces with that, yeah? git-r3 allows Portage to download stuff directly from Git, which we will be doing for our live software. Description, you can actually leave this out if you're in a hurry. Otherwise, it's just a general description of the software so that people who install it
have at least a punchline to go on. Usually, you have another variable for the homepage and so on, but we don't need those. EGIT_REPO_URI is practically the address from which the software is going to be fetched. It starts with git colon slash slash. It's practically the address from which your latest program version can be fetched.
The license is GPL3. This can be whatever license this program is distributed with. We're going to pretend we distributed it as GPL3. The slot, as I told you, Portage allows you to install multiple versions of the same package via multiple slots, yeah?
So, by default, every software package in Portage, which is not otherwise slotted, is slot zero. KEYWORDS and IUSE, these are just important so that Portage knows whether or not it's safe to install, but we don't care about that right now. And the really important stuff,
if you're packaging things and you want to make sure that people can install them and actually run them, is you have to tell Portage what kind of stuff your script depends on, yeah? It depends at build time on setuptools. You might remember we imported that in the setup.py, yeah? And it depends at runtime on nilearn
because that's where all of our plotting functions come from, yeah? So, you should write this, and then, as usual, you press Ctrl-O, enter, Ctrl-X. Okay, for those of you who have already finished, yeah, you can do the same thing which you already did before.
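As a rough written-out version of the ebuild fields walked through above, here is a minimal Python sketch that renders such a file as text. The variable names (PYTHON_COMPAT, DESCRIPTION, EGIT_REPO_URI, LICENSE, SLOT, KEYWORDS, IUSE, DEPEND, RDEPEND) and the inherited distutils-r1 and git-r3 come from the talk. The function name render_ebuild, the EAPI value, the package categories and the repository address in the usage note are assumptions made for illustration, not the file shown on the slide.

```python
def render_ebuild(description, repo_uri, build_deps, runtime_deps,
                  license_name="GPL-3", slot="0", eapi="5"):
    """Return ebuild text with the variables discussed in the talk.

    The EAPI default is an assumption; it is not stated in this part
    of the talk. Dependencies are passed in as package atoms.
    """
    depend = " ".join(build_deps)
    rdepend = " ".join(runtime_deps)
    lines = [
        f"EAPI={eapi}",
        "",
        # Note the blank space after the opening and before the closing parenthesis.
        "PYTHON_COMPAT=( python2_7 python3_4 )",
        "",
        # Tell Portage which helper functions the ebuild is going to use.
        "inherit distutils-r1 git-r3",
        "",
        f'DESCRIPTION="{description}"',
        f'EGIT_REPO_URI="{repo_uri}"',
        "",
        f'LICENSE="{license_name}"',
        f'SLOT="{slot}"',
        'KEYWORDS=""',
        'IUSE=""',
        "",
        # Build-time and runtime dependencies.
        f'DEPEND="{depend}"',
        f'RDEPEND="{rdepend}"',
    ]
    return "\n".join(lines) + "\n"
```

For example, render_ebuild("Brain activation plots", "git://example.org/actibrainNN.git", ["dev-python/setuptools"], ["sci-libs/nilearn"]) returns text that could be saved as the ebuild file; the address and the category names in that call are placeholders.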
This is actually nothing new, so this is, again, in the directory, you add this to the Git repository, you commit this, yeah, and you push it. So, git add, git commit, minus a, minus m, new ebuild or whatever message you want to type, and then you push it. And what you should have done after that
if I had actually had internet access, was to file pull requests. So, practically, you can, after you do this, go to your GitHub page, and you're going to have a button pop up somewhere in the upper right part of the screen, actually in the middle upper right part, which says pull request,
meaning that you can submit a pull request to me because you forked this overlay from me, and I can review the changes you made, so this additional package you added, and I can pull it. And if I pull it, practically, this becomes available to the server, yeah? Sadly, we cannot do that, so after you've done all of that, we're just going to proceed to syncing and emerging a package,
which is exactly the software which you already wrote that I already have in my overlay, yeah? And after that, the time's going to be up, yeah? Okay. I'm going to assume you're all here, and we're going to go to the next slide, yeah? This is how we synchronize all of our software so that we make sure our package manager
knows about everything that's been modified, and quite a few things have been modified, but we don't have to run this because I haven't, like, pulled anything from you, so the package is already at the newest version. You could have done this via sudo, yeah? But you don't need to do it now, and you certainly shouldn't all do it because then there's going to be lots of requests running over the server.
The next point would have been install package, sudo emerge actibrainNN. This would have installed a package of your choosing from everything which we have online, but we already have a package here. You can type on your terminal eix, e-i-x, space, and then actibrain,
and then you can type enter, and you can tell me what comes up. So eix, e-i-x, then a space, and then actibrain, this time without the NN, and type enter. What comes out? In any directory, this doesn't matter.
No matches found. No matches found, really? Miki, is that exactly so, no matches found? By actibrain, eix, actibrain.
Actually, I've already tested this, so if there are no matches found, it means I didn't update the directory. So in any case, you have a software package there which is called actibrain85, from the user 85 with which I've tested this, yeah? And with that, you should be able to run the next part of the script,
which I'm going to probably leave for you as a sort of homework if you want to do it. As I said, I'm still going to leave this machine on, at least until this evening, maybe also tomorrow. And the last part would have been this. PythonTeX. PythonTeX is a library which is also managed
and installed on the server, which allows you to automatically display figures from your Python data analysis in your documents, yeah? And I've already written everything so that you can use it, and I have a document which is at this address, yeah? You can see it's also a Git repository,
but this time it's on Bitbucket. It doesn't matter. There are so many sites on which you can host your Git stuff, yeah? So you can clone this into your home directory, yeah? And you can go into the file, and you can edit this file to make sure that it's using actibrain85,
because right now it's using actibrainNN. After you do that, you can practically compile your PythonTeX document, and then it will have all of the brain maps which it has specified displayed directly in it. Did anybody manage to clone this? Yeah?
Did you manage to open this file? You can see I think somewhere on the fourth line you have a function which calls actibrainNN. Yes? You can change that to actibrain85, and you can save the document. Yeah? Yes? I have a question.
Certainly. This emerge, how does it work? Where does it get the software from? Because our eix fails. Yes, eix is like a separate utility in Gentoo with which you can browse what packages you have. The problem is that they get updated separately, and apparently I forgot to update eix.
Portage is the part of Gentoo which you use to actually install stuff, and eix you can use to browse your packages. Actually, you don't need eix. eix is an optional package if you want to browse with lots of colors, which I like to do. You can technically also browse with Portage. So eix not working is simply an omission of mine.
It's not an integral part of Gentoo. It's a part which helps you see everything better. Did everybody modify this file? So if you type this, nano pythontex/functions.tex, after you clone this and you go into the appropriate directory, somewhere at the beginning of the file
you will find a line which calls actibrainNN, and you should modify that to say actibrain85. Did you do that? Did anybody manage to do that? Yeah, it's on line four. Okay. Yeah? Sorry?
Which command? So this would have been the emerge command, yeah? Please don't run this because all of the software which you have actually packaged, which I'm sure would have worked, is not pulled by my overlay. But we have at least one example of the software
which is inside it, yeah? And which is already installed. Just remember the step from where you left off with the package you wrote to this software actually being installed would have been just two commands, yeah?
Yes, exactly, I'm sorry, yeah, yeah, it's plottingNN. Though it, yeah, exactly, it's plottingNN. Because that's the name of the module, yeah. Okay, so a couple of you managed to do this, yeah? Is anybody still working on doing this?
No? With 85, so instead of NN, you write 85, no. So is anybody still working on this and didn't get to do this yet? No? Okay, so for the ones which were able to do this,
we can move on and we can actually compile this document, yeah? You compile this document very easily, so you run pdflatex, simply pdflatex doc.tex, this compiles the document. Of course, you need to call Python as well so that the figures get generated, and you do that after this command is through
via pythontex.py, and again, the name of the document. This makes sure that all of the Python functions have been run, and by running the first command a second time, you practically make sure that they're placed on the pages. These should give you a lot of output in the terminal, maybe even an error message or two, but they should work.
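The three compile steps just described can be wrapped into a single call. The following is a minimal Python sketch, assuming that pdflatex and pythontex.py are on the PATH and that the document is called doc.tex as in the talk; the function names build_commands and compile_document are made up for this illustration.

```python
import subprocess


def build_commands(doc="doc.tex"):
    """Return the three commands from the talk, in the order they are run."""
    return [
        ["pdflatex", doc],      # first pass: compiles the document
        ["pythontex.py", doc],  # runs the Python code so the figures get generated
        ["pdflatex", doc],      # second pass: places the figures on the pages
    ]


def compile_document(doc="doc.tex", run=subprocess.run):
    """Run each command in turn and stop at the first one that fails."""
    for command in build_commands(doc):
        run(command, check=True)
```

Calling compile_document() inside the cloned repository directory then replaces typing the three commands by hand. The run parameter exists only so the sequence can be checked without a LaTeX installation.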
In the directory which you changed into here, so LD expert doc in the directory of this repository which you cloned. So then you're actually ready to see the final result of your hard work, which is the, where is this, exactly, which is the file which you created, yeah?
So after you did this, a file will be produced called doc.pdf, which contains a lot of text, which I've written in this repository, and it also contains two figures which were generated automatically from, so to say, your Python data analysis program, yeah? And you can download this to your computer
to actually be able to look at it via scp. You can run this command, so scp user, the NN which you've been using the entire time, at demo.chimera.eu, colon, src, LD expert doc, doc.pdf, yeah? Did any of you manage to download a PDF? Do the figures look amazing or what?
No, they don't? Kind of, okay, well, you haven't seen lots of neuroimaging plotting software then. By those standards, they look pretty damn amazing. Exactly, you don't really know what it is, but you do, because these figures, so the output which you get
was selected by you in the document file. Of course, you didn't write the document, I did, but if you open the document file, you can see exactly what they are calling, yeah? So practically, these figures are called in your document file by PythonTeX, not in functions.tex, but in the other document which is inside here, which is called,
which is called pycode.tex. So if you open it with nano pythontex/pycode.tex, you can see the functions which actually
create these images, yeah? And there are two functions, one for each image. The first one uses the first function which we wrote where you can practically generate a figure based on an activation map, on a contrast, yeah? And I think the contrast which I've selected is checkerboard or something like that, which means it shows you what parts of the brain
are active when a participant concentrates on a checkerboard as opposed to the resting state, yeah? And you can select very many other strings which would give you a different activation. So practically, if you edit that file, you can actually try that, yeah? If you open that file, so pythontex/functions.tex, I actually wrote it right here, I'm sorry.
So download them via document, you can play around, yeah? So you can edit this file and at the beginning, when the first function is called, you practically have a commented line which tells you what other kinds of strings you could use, yeah? You can replace them, yeah? You can take one of those and put them
inside of the string, which is the argument of the function you're calling. And then you can recompile the document with these commands and after you do that, suddenly the figure in your document will change, yeah? The second, which command?
Ah, yes, sorry, it should have had a period here, so that would have copied it into the current directory, yeah. I'm very sorry, it's an omission on my part, yeah. In this file, I told you, so the first function,
the first picture which you see, takes a string and it shows you the activation map described by that string, that's the first function you wrote, yeah? So practically, if you're writing a LaTeX document and you want to say, okay, this is what activation looks like there and there and there, you don't have to export these figures and manually drag them into your document,
but you can just call this function three times with a different argument, yeah? The second function is, in my opinion, even more interesting, it simply takes the data file, yeah, which is an activation map and it plots that. But you will notice that at the beginning of the second function, that data file is added as a dependency,
meaning that PythonTeX keeps an eye on that data file, and if that data file ever gets modified and you recompile your document, your figures change like that. This is very useful if you have an ongoing document, for instance your master's thesis or your dissertation, where you might add on some new data at the very end
because then you just change the data file and already all of your figures are up to date, yeah, with no exporting, dragging and dropping, yeah? So you can actually also test that by going into the root directory, so practically this one here, yeah, and moving data new to this location, yeah?
This is the location which is plotted on your second figure, this is the data which is plotted on your second figure, yeah? So imagine that this is your new data and you're replacing your old data with your new data, you just recompile your document and your figures are already up to date, yeah? Did anybody manage to recompile this document
after doing any modifications? You did, did it work? Did you download the file to your computer? Did you notice that it changed? Okay. Scientific publishing for your manuscript, for your dissertation, for your thesis, with no drag and drop, with no exporting, pretty much live.
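The idea of keeping an eye on a data file can be illustrated with a plain modification-time check. This is a minimal Python sketch of the general principle only, not PythonTeX's actual implementation; the function name needs_rebuild and the file names in the usage note are hypothetical.

```python
import os


def needs_rebuild(output_path, dependency_paths):
    """Return True if the output is missing or older than any dependency."""
    if not os.path.exists(output_path):
        return True
    output_time = os.path.getmtime(output_path)
    # A dependency modified after the output was written means the
    # figures in the output no longer match the data.
    return any(os.path.getmtime(path) > output_time for path in dependency_paths)
```

A wrapper script could call needs_rebuild("doc.pdf", ["data.nii"]) and only recompile the document when it returns True, so replacing the old data file with the new one is enough to trigger fresh figures on the next run.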
You have to compile it so it's not entirely live but it goes directly from the code to the document with no intermediaries which require operator interaction, yeah? So practically these commands, they're three but you can make a script which calls them, so it can be one, yeah? In fact, if you want to be very fancy,
you can do a cron job and you can say, okay, well I'm editing my data and my master's thesis gets recompiled every Friday or every other day, yeah? So I'm sorry if some of you didn't make it till the end. I have to say I had hoped we could all make it, but sadly the time was short and we had some technical issues and also some historical issues which I'd like to apologize for.
But I hope the ones of you who were able to follow through realized why free and open source software is important for science, how it can help your scientific career and I hope the ones who weren't able to follow through might still give it a try at home and might just take my word for this message
or the word of your colleagues, yeah? So wrapping up, these are the slides. You can download them from this location. You can see the source of these slides. These slides are written in LaTeX, yeah? If you're curious how you could do a nice presentation written completely in LaTeX with code snippets,
you can also look at my source code. It's licensed under an open source license, and this is my contact. If any of you are interested in TheAlternative, you can still join us. If any of you are interested in NeuroGentoo and bringing neuroscientific software to Gentoo, then you can contact me and I'd be happy
to show you more about Portage and how you can help make it bigger and better. So thanks very much.