
Software sustainability - guidelines for the selfish scientist

Speech transcript
Thank you very much to the organizers for inviting me here, and particularly for the wonderful dinner we had last night; I thoroughly enjoyed myself. This talk is called "Software sustainability: guidelines for the selfish scientist", but I have realized, as we have gone through the days of this conference, that perhaps this isn't quite the right talk for this audience. I want to do a quick check: how many people here consider themselves to be researchers or scientists? OK. And how many people consider themselves to work with researchers and scientists to provide information? OK, so maybe this is the right talk after all. I also have another talk I can give, which is more about the work we do directly and the way that the Software Sustainability Institute is organized and run, and I'm happy to take questions on that as well. But effectively this talk is meant to be a motivation and a guide for how you can talk to the other scientists you collaborate and work with, to help them see why doing things that are very much just for themselves also improves software sustainability for everyone.
The background to this is based a lot on where we have been working, where I in particular have been working, and the sorts of people I have been working with. To some of you at dinner I mentioned that I started off in high energy physics, which obviously has a very large background in developing computer software. It has many software engineers working for it, and it has a large amount of practice and a large amount of process, so in some respects it knows how to develop software. But for me it is also very difficult, because you are one person in a huge consortium and you don't really get to do very much exciting stuff. So I ended up going in a different direction: I ended up working with what we call the long tail of research.
How many people have heard of the long tail? Yes. So there is this idea that you can apply the long tail to everything, and I am going to try to apply the long tail to something it probably doesn't fit: the idea that in science and research we have effectively got some people who are the superstars of the research world. These are the Nobel Prize winners. These are the people who produce a large number of cited papers, the people who have an h-index in the hundreds, and they are rightly seen as the people who have a lot of impact in research.
But the thing is, we also have this long tail of people. We have a large number of researchers who are doing work that is also being cited and is also valuable. The difference is that what they produce is slightly different: they may not produce as many papers, and they may not be cited as many times. But when you look at it, the total number of citations from this long tail, or as I will call it, most scientists, the mainstream of scientists, is important too.
Is anyone here a paleontologist, someone who works on dinosaurs? No? OK. I have heard that this picture is completely wrong for dinosaurs, because they don't have tails like this; new research suggests that this is actually how dinosaurs looked, with their tails held up high. And if we compare this with what we see in the research world, this is what I think is really interesting. Up here is where my colleagues at the High Performance Computing Centre work: they look at the people who are the top 1% and help them make their work even better. Down here we are looking at how we work with all of the other people, who might become the top 1%, and make their work slightly better. So this is where improvements in practice can have the most effect: in the long tail of researchers.
The question really is what we can do there. How can we persuade people that things like producing better software are a useful way to spend their time? The first question I commonly get asked is "what has software got to do with my research?", because a lot of the people in that area do not consider themselves software developers. It is kind of interesting, actually. Even yesterday in this workshop, when we had the first R presentation, people were asked how many of them are R users, and quite a few people put up their hands. Then they were asked how many are developers, and there was nobody. Yet if you are an R user it is very difficult not to be an R developer. R is a platform, and that means you are writing scripts, you are doing computational work, you are extending things. There are very few people who are R users who simply blindly run something they have downloaded without changing anything. So there is this disconnect where most people do not think of themselves as software developers when actually they are. They are just not software developers who write code for other people in order to create a product. They are not engineers, and they are not doing this because they want to produce something that they can sell.
One of the things that has happened recently is a blurring of the lines between the different paradigms of science. People will have seen this idea of the four paradigms of science: from the empirical and theoretical, which we have known for many years, through to, more recently, computational science and, latterly, data science as well. It is really great that the talk just before mine laid out a lot of the input on data science. But these are no longer distinct. In almost all disciplines you need to know how to use all of these different techniques for science, and therefore we see the use of software not just in the computational and data exploration paradigms; we see it everywhere. Even some people who work in areas of theoretical mathematics are now starting to at least accept the idea that there may be computational proofs. Some research is impossible without software, and there are so many different places where you can see software being used.
I think the important thing for me is that mostly we think of software for science as being in areas like particle physics and the Large Hadron Collider, or climate science with its very large models. But most software in science is on a completely different scale. It is the scripts that are written in Excel and the models that are defined in something like MATLAB. So many people are using software and developing software, but it is not necessarily the software we think of as scientific software.
We did some work to try to understand quite what effect that has. We did a survey in 2014, trying to understand how people regard software. The big takeaway here is that 68% of the researchers we interviewed, across the leading research-intensive universities in the UK, said their work would be impossible without software. It is not just that it would be hard: they would not be able to do any of their research without this specialist research software, in different ways. The other thing here is that our survey results chime with many of the other surveys being done, which say that there are a lot of people developing software. Over half of them are developing software, and most have had no formal software training. So what we end up with is a whole set of researchers who don't think they are developing software, or who know they are developing software but have had no training at all, and we feel that the software is vitally important to their work. So this is a problem, and it is a problem for many different reasons.
As well as this acknowledgement that software is important, what we have seen over the last 10 years is an increasing number of articles casting doubt on the truthfulness of science. We have things like the reproducibility crisis, which you have heard an awful lot about, where we are looking back at the published research and asking whether we can have any trust in it. It happens in bioinformatics and genetics: here we have a study which shows that it is very hard to repeat some of the top analyses in microarray gene expression. And it is the same thing in computer science: this is a study which looks at whether you can get to the software that is mentioned in papers, and the overall summary is that mostly you can't. When you get to the point where magazines like the Economist, which have nothing to do with science, are taking an interest in the reproducibility crisis, you know that we maybe do have a crisis here for researchers.
So the Software Sustainability Institute was set up as a way for the research funders in the UK to start moving some of these challenges outwards, from being top-down challenges that were being put out as guidelines by the research funders to being more bottom-up initiatives that try to change the way people work across research in the UK, collaborating with people across the world. I realize I'm not wearing a T-shirt. We actually have T-shirts, and you can buy them through one of the online platforms. Every time you buy a T-shirt we get the equivalent of an extra 1 euro 50, so if you buy enough T-shirts we will be able to hire a new member of staff. But we have this slogan: better software, better research.
We think that by developing software better you actually become more efficient and more effective at research. The rest of my talk is really about the very simple steps that we try to tell people to take to make them more effective researchers. I think this is something that has changed a lot: the types of skills you need as a researcher in the modern world are different from the ones you needed 20 years ago, and definitely different from the ones you needed 40 years ago, and we need to keep up with that. There is a whole load of skills, mostly around data analysis and data management, that we do not teach people.
So if software is so important, why is it so hard to reuse? The other question we get asked is: "We think it is a good idea to do this, but no one else seems to be doing it, so why should we spend the time when no one else is?" There are a lot of different reasons. Victoria Stodden in the US did a survey of the machine learning community covering both code and data sharing.
A lot of the things that you see there are probably the ones that you know yourself, the ones that you worry about. The time it takes to document and clean up your software is number one, right? The second one is dealing with questions from users. You can also flip reasons like that around: dealing with questions from users could be re-expressed as starting to collaborate with great new collaborators. So there is this thing about a lack of incentives for sharing code and making code reusable. The slides are online as well; I have put up an external link, and there is a tweet which has the link to the slides.
The other thing that is a problem is something that has been happening more recently, and I think it is a real issue for science. It is expressed in this statement here, and I am just going to highlight part of it. This is the example of someone who shared their code and then had their peers criticize them for doing it, because the code was not necessarily great code. It is fine code. But the problem is that nowadays, because it is all out in the open, potential employers can see this as well, and they might not look at your code base; they will look at the comments on that code. This is a problem because we are now in a culture which is all about competition. There are not enough jobs, so people compete, and this also means that the competition can get very vicious. There are possibly other reasons for things like this, and there is a lot of work needed to understand diversity and inclusion. The other thing, which I have deliberately not put on this slide, is that this is the experience of a female coder, not a male coder. But we have this problem: there is no incentive to share code, because even if you do share your code you might end up with a whole load of people criticizing you for very minor things.
So those are the problems. We have ended up with a research culture where you don't share your code because of a fear of being found out for poor software engineering skills, and where there is no reward for publishing code. As has been mentioned in many of the other talks, there is no incentive to actually get a good software product out there, because when it comes to a promotion committee they will just ask how many papers you have published. A lot of people also fear being scooped, or someone stealing their work and getting the publications that they think they should have got themselves. The other thing, and it was great to see a talk on copyright and licensing, is that many organizations do not understand how to exploit open-source licenses. I know that my university is only just starting to understand how to use open-source licenses effectively to exploit its intellectual property.
So this is the main meat of the talk: what should a selfish scientist do to get ahead? I am going to give five basic guidelines, and I hope that you will basically just be nodding along, because I don't think any of them are particularly spectacular, none of them are new, and all of them are hopefully very simple. The problem is that quite often we don't think we have the time even to do some very simple things. The first one is simply: improve your skills.
We have already heard about Software Carpentry, and I will also mention Data Carpentry, which is the equivalent for data analysis and data management skills. But the point is that these are jumping-off points, and the idea is to continue to learn. I now point people at this: a few years ago we wrote a paper called "Best Practices for Scientific Computing", which I think is a great paper for this audience because it brings together a lot of references to software engineering studies that show which practices actually work, so it is all based on evidence-driven research. The problem is that almost all scientists will not be able to apply best practices, because they are either too time-consuming or the scientists do not have the experience to do this. So a subset of the people from that paper wrote this thing called "Good Enough Practices in Scientific Computing", and that is the paper I would suggest you give to all your colleagues. I will give an example of one of the things they talk about, revision control, on the next slide. It is important to remember that not all of the people you work with necessarily have a good background in this area, so whilst you may be very happy with best practices, most people might be happy with good enough practices.
The whole point of this, as I mentioned, is this kind of skills difference. Really what we are trying to do is improve the efficiency and effectiveness of your research. By continuing to learn new skills, whether in new technologies, new techniques, or things like information visualization, you get help with getting your research across and with doing your research more quickly. I myself am trying to learn how to do better data analysis using pandas and data frames, which I am completely failing to do just now, which means that I too am someone who could benefit from this.
The second thing is keeping things tidy. Here is a good example of the difference between what we tell people to do in good enough practices versus best practices. Best practices talks about using revision control systems, which might be Git or Mercurial or SVN. Good enough practices says you should just understand which things are different versions, even if that is simply by having a good naming scheme for your files, because some people find revision control systems really hard, particularly Git. How many people find it hard? Yes.
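The "good naming scheme for your files" idea can be sketched in a few lines of Python. This is only an illustration: the pattern `YYYY-MM-DD_name_vNN.ext` and the function name `next_version_name` are assumptions made for this example, not a convention taken from the talk or from the papers it mentions.

```python
import re
from datetime import date
from pathlib import Path


def next_version_name(directory, stem, suffix, today=None):
    """Return the next unused name of the form YYYY-MM-DD_stem_vNN<suffix>.

    The naming pattern is a hypothetical convention chosen for this sketch.
    """
    today = today or date.today()
    # Match files that already follow the convention for this stem and suffix.
    pattern = re.compile(
        rf"^\d{{4}}-\d{{2}}-\d{{2}}_{re.escape(stem)}_v(\d+){re.escape(suffix)}$"
    )
    versions = [
        int(match.group(1))
        for path in Path(directory).iterdir()
        if (match := pattern.match(path.name))
    ]
    # Start at v01 when no earlier version exists, otherwise count upwards.
    next_number = max(versions, default=0) + 1
    return f"{today.isoformat()}_{stem}_v{next_number:02d}{suffix}"
```

Calling `next_version_name("results", "analysis", ".csv")` in a folder that already holds versions 1 and 2 would give a name ending in `_v03.csv`. A scheme like this sorts chronologically and keeps old versions visible, but it does not replace a real revision control system.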
So what we try to persuade people to do is use version control of any sort, because any sort of version control is better than nothing. Most people we talk to nowadays will be using something like Dropbox or Google Drive as their first attempt at version control, and then as they go a bit further they start using things like GitHub, because of the other things that infrastructure platforms like it provide. We are trying to persuade them to do it for everything: not just for their software but for their data, their papers, and their talks.
The other thing is making sure that it is backed up, because one of the great selling points of version control and revision control systems is getting back to previous versions of your work. One of the things we would like to persuade people to do is to check that their backups work for all of the stuff they are putting into version control.
The other thing is to get into a state where they work with things like tidy data and tidy code. This is the idea that you are trying to get your data and your code into a form which makes them reusable, which makes them much easier to use with different tools, and much easier to be reused by the researchers themselves. Here is where the selfish part comes in. What we are talking about is a whole set of practices which are useful for that particular researcher, and we don't really care if they are not shared with other people at this point. All we are trying to do is make sure that they are not losing out. As a by-product, almost as a secondary symptom, it makes it easier for them to share and conduct research with other people as well.
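As a small illustration of what "tidy data" can mean in practice, the sketch below reshapes a wide table (one column per year) into a long one with one observation per row, which most analysis tools handle more easily. The function name `to_tidy`, the column names, and the sample values are all invented for this example and do not come from the talk.

```python
def to_tidy(wide_rows, id_column, variable_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row instead
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy


# Illustrative data only: two samples measured in two years.
wide = [
    {"sample": "A", "2015": 1.2, "2016": 1.5},
    {"sample": "B", "2015": 0.9, "2016": 1.1},
]
tidy = to_tidy(wide, "sample", "year", "measurement")
for record in tidy:
    print(record)
```

The same reshaping is available in data frame libraries; plain dictionaries are used here only to keep the sketch free of dependencies.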
An example is the LIGO notebook. If you are trying to see an example of how to tidy things up and share them well, I would like you to go and look at that; I won't explain it in much detail.
The third guideline is: release early, release often. Research is an iterative process. A lot of people see it as just publishing papers, but if we see it as a career, what you are doing is publishing many papers as an ongoing set of outputs in a research field. Releasing your software and data forces you to check and clean them; it makes sure that your tidying does the right thing. One of the things we say here is that even if you don't want to go completely open from the start, make sure you are releasing to a trusted colleague, so that it is not just you looking at your software. Even if you are not making it available outside, make sure someone else is looking at it. But of course open has many benefits. Here we are really talking about persuading people to make their research more reproducible, but mostly by their own team. One of the biggest problems we have is with, for instance, PhD students who don't show their code to their supervisors or to other members of the team until the point where they get to review, and that is too late. We want them to share their code before that. There are many reasons why you should do this, which say that it actually gives you a higher scientific impact.
The fourth is to get credit for everything. We have already had a lot of talks on this, so I won't dwell on it, but the main thing is to make sure that your work is easy to cite and reference. The important thing here is to provide good enough metadata so that others can actually find your work and credit you for it. If you want to get credit for everything and further your career, it is no use if no one understands what your work is, where it is, or how to cite it. That increases your visibility and your reputation.
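To make "good enough metadata" concrete, here is a minimal sketch that checks a metadata record for the fields someone would need in order to cite a piece of software, and then builds a citation string from it. The list of required fields, the citation layout, and every value in the example record (including the placeholder identifier) are assumptions for illustration, not a standard and not something specified in the talk.

```python
# Fields assumed, for this sketch, to be the minimum needed to cite software.
REQUIRED_FIELDS = ("title", "authors", "version", "year", "identifier", "license")


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def format_citation(metadata):
    """Build a human-readable citation, refusing incomplete metadata."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("cannot cite, missing: " + ", ".join(missing))
    authors = "; ".join(metadata["authors"])
    return (
        f"{authors} ({metadata['year']}). {metadata['title']} "
        f"(version {metadata['version']}) [software]. {metadata['identifier']}"
    )


# Hypothetical record: names and identifier are placeholders, not real works.
example = {
    "title": "example-analysis-toolkit",
    "authors": ["Researcher, A.", "Colleague, B."],
    "version": "1.0.0",
    "year": 2017,
    "identifier": "doi:10.0000/example",
    "license": "MIT",
}
print(format_citation(example))
```

Keeping a record like this next to the code, in whatever format a repository or journal asks for, is one way to make sure the work can be found and credited.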
I won't talk about it much here, but I am the editor-in-chief of the Journal of Open Research Software, and there are many, many places where you can publish software, software papers, or anything to do with software. There is a list at the link there, which lists software journals, and it has grown a whole lot. When I started maintaining this list there were about 7 places where you could publish software, and there are now over 80. So there is no excuse for not getting credit for your work.
And then the last step: be your own user first. Try to find software that already exists, and extend and modify it. But most importantly, develop software that solves your own research questions. Don't try to create a product for someone else; try to use your software with yourself as the main user. Use the different kinds of infrastructure to split apart and record the different roles, so that you can be the user, the developer, and the manager. Once it is working for you, find others like yourself to collaborate with. This is basically the golden rule of start-ups: come up with an idea that you would like to see happen, make it work for a very small and very specific audience, and then go global. The only way, really, of making something that is ultimately reusable is by reusing it yourself. Here is a great example of creating a whole set of reusable resources that can be used by someone in a high school, by bringing together open-source software, open data, and open computational resources to solve new problems.
OK, so I need to recap. I have given five different steps for how selfish scientists can make their own work better, but really what these are is things that all help drive software sustainability. They are all about making things more available, more reusable, and more maintainable in the future, so that software can be used to meet new needs on new platforms. A lot of this is driven by limited research resources, so really what we are talking about is this quote about necessity being the mother of invention. In some sense what we are telling people is that you don't need a lot of resources to do all of this. In fact, the fewer resources you have, the easier it is to follow these guidelines and get results. I'll finish there with that set of guidelines, and I'm happy to take questions. Thank you.

Metadata

Formal metadata

Title: Software sustainability - guidelines for the selfish scientist
Series title: 2nd Conference on Non-Textual Information: Software and Services for Science (S3), May 10-11, 2017 in Hannover
Part: 9
Number of parts: 13
Author: Hong, Neil Chue
License: CC Attribution 3.0 Germany:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they have specified.
DOI: 10.5446/31022
Publisher: Technische Informationsbibliothek (TIB)
Year of publication: 2017
Language: English

Content metadata

Subject area: Computer science
Abstract Software is fundamental to all areas of research and science. The move towards Open Science has made it even more important that software is made accessible, reusable and maintainable: all facets of software sustainability. However we still face the challenge of translating the enthusiasm of the Open Access, Open Data and Open Science vanguards to the wider community of researchers who may lack access to infrastructure, skills and effort. This talk will draw on the experiences of the Software Sustainability Institute in working with the long tail of researchers, including the formation of the Journal of Open Research Software, to present a different perspective of software for Open Science.
