
Formal Metadata

Title
RSE 2.0
Title of Series
Number of Parts
60
Author
Contributors
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Research Software Engineering is gaining recognition as both a discipline and a career, thanks to established bodies such as the SSI and newer initiatives including deRSE. But what does the future hold as RSE groups mature, research becomes increasingly data-driven, and the practice of software engineering evolves ever more rapidly? This talk considers potentially disruptive developments in technology that RSEs may need to embrace to play an ongoing role in keeping their institutions at the forefront of research: from machine learning and "Software 2.0" to serverless deployments. I hope this will inspire discussion within the community regarding the future of Research Software Engineering and how we prepare ourselves for it.
Transcript: English (auto-generated)
Just to introduce myself a little more: I work at Imperial College, where I lead the RSE team. I've been doing so for about 18 months, which is when the group was founded. I've been working as an RSE for a long time, though not exclusively; my first job was as an RSE, but since then I've worked in both industry and academia.
The purpose of this talk is to describe some of my experiences and opinions about where RSE can and might go, but I think it's important to recognise that RSE is still emerging, along three axes as it were. For people who develop software as part of their research, standards are improving, and recognition of what constitutes best practice is definitely improving. We now have roles in research where people spend the majority of their time doing RSE work, even if it isn't identified as such. And in the UK, and more recently in Germany, there are now jobs titled research software engineer, which is great progress. But of course the people who attend conferences like this are to some extent self-selecting: we recognise ourselves as doing research software engineering, but there are many researchers who write software who don't, and there's still a big gap to cross there. That's where a lot of effort has been going recently: community building at the grassroots level, to bring those people into the RSE community, and I think that's incredibly valuable. But for those of us now working as RSEs, formally or otherwise, what can we do to look a bit further ahead and establish how we can genuinely accelerate research in the future?
These are just opinions; I wouldn't bet your career on any of them. Some are more likely to come true than others, so take them with a pinch of salt. Basically, I'm going to look at some trends I've seen recently and describe what I think their implications are for RSEs, RSE groups and other stakeholders in the field. First of all, the obvious one: how is technology itself changing? What we actually see here is a lot of disparity. Old code rarely dies, so we have plenty of it around, along with long-established disciplines that predate the term research software engineering but have been doing a lot of scientific software development.
We see long-established communities, particularly in astronomy and the Earth sciences, in groups we work with very actively at Imperial. But then there are newer disciplines, genomics for instance, that don't have Fortran code simply because they postdate that era of research. That plays into the kinds of languages people use and what they expect software development to involve: we're involved in everything from developing new codes in Julia through to modernising codes in Fortran, and those demand very different considerations. I mention Julia specifically because it's a recently emerging language with a lot of very enthusiastic followers: a very different language in many ways, but with goals similar to those of Fortran.
That's where we see some growth right now, so there's very much a big gap between the established and the emerging. The same is true of infrastructure. My group collaborates closely with others within research computing at Imperial, who manage the HPC facility, and there we have infrastructure that has been on premise for a long time. It doesn't fulfil everybody's use cases: it's very much batch processing, but what we're seeing now is people wanting to do more exploratory, interactive work, particularly around data science. We're also seeing funders become a little more sympathetic to paying for subscription-based services rather than funding big capital expenditure on local infrastructure. And finally, the pace of change does appear to be increasing, in both compute capability and accessibility: you can now get cloud instances that are highly capable, and some of the vendors have spoken at this conference on that topic. The same goes for tools. We're seeing new frameworks arrive; TensorFlow, for example, has a lot of mindshare right now, but it has already been through a couple of major iterations and develops very fast, yet is not easy to install, set up or understand. So it's no longer possible to be complacent if you want to use emerging technologies.
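To make that concrete, here is a minimal, hypothetical sketch of the kind of conciseness that wins frameworks like TensorFlow their mindshare; the data and model are placeholders I've made up, and code like this has needed reworking across the framework's major versions, which is exactly the churn I mean.

```python
# Minimal sketch: fit y = 3x + 1 from noisy samples using TensorFlow's
# Keras API. Illustrative only; data and architecture are placeholders.
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 1).astype("float32")
y = 3.0 * x + 1.0 + 0.05 * np.random.randn(256, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.Input(shape=(1,)),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.2),
              loss="mse")
model.fit(x, y, epochs=50, verbose=0)

print(model.predict(np.array([[2.0]], dtype="float32")))  # close to 7.0
```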
But some things are clear. This has already been mentioned, in contrast to, or perhaps complementing, the RSE survey: the Stack Overflow developer survey is a big survey, of course less academic and perhaps more industrial in terms of the responses it gets, and there Python is surpassed only by JavaScript, for different reasons. The growth continues, and despite the emerging languages, Python still has a huge user base and plenty of use cases. As for software engineering, it seems we've reached a stage where version control is mostly accepted, as we've seen in the survey responses here.
People are beginning to turn their minds to ensuring that software can be easily built and tested, and I think the next frontier is definitely embedding continuous integration as a practice. That involves automated testing, but also automated quality assurance: really ensuring that people are enforcing code style and testing properly, and I mean more interesting types of testing too, like property-based testing with Hypothesis, which has been mentioned already. Then there's something that can no longer be ignored, especially given the types of analysis people are doing with data science: looking for vulnerabilities in software, and ensuring those checks are automated as well. Finally, once you have automation, you can do rather interesting things like tracking metrics over time: performance, test coverage, even quantifying your documentation coverage. We're trying to establish these practices with scientists who may not have considered them necessary, or even possible, before, and we're seeing great benefits from doing that.
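As an illustration of the kind of testing I mean, here is a minimal sketch using the Hypothesis library; the `normalise` function is a made-up stand-in for your own research code.

```python
# Property-based testing with Hypothesis: state a property and let the
# library search for counterexamples, instead of hand-picking inputs.
from hypothesis import given, strategies as st

def normalise(values):
    """Scale a list of positive floats so they sum to 1.0."""
    total = sum(values)
    return [v / total for v in values]

# Hypothesis generates many lists of floats and checks the property.
@given(st.lists(st.floats(min_value=0.1, max_value=1e6), min_size=1))
def test_normalise_sums_to_one(values):
    assert abs(sum(normalise(values)) - 1.0) < 1e-9

# Run with pytest, which discovers test_* functions automatically.
```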
The next step, I think, is what the people marketing this stuff would call AI. Of course it isn't, but it is code assistance: more than static analysis for formatting or tests, it's identifying copy-and-pasted code, or checking code against various other quality metrics. It's like autocomplete on steroids. Facebook have some good examples here; I mention Facebook because they're hard to ignore from a software engineering perspective, since they code at scale and have very good engineering practices and some novel tools, and I can distinguish that from the business itself. These are definitely worth looking at, because they're interesting ideas and they're written up quite extensively. What these tools do is help you write code interactively, by identifying common mistakes, or code that's similar elsewhere in your code base or in other people's, and helping you improve its quality.
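To give a flavour of one ingredient of such tools, here is a toy sketch that flags near-duplicate, likely copy-and-pasted, snippets using only the Python standard library; the snippets are invented, and real systems use far more sophisticated, syntax-aware representations than plain text similarity.

```python
# Toy duplicate-code detector: score textual similarity of snippets.
import difflib

def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] of how similar two snippets are."""
    return difflib.SequenceMatcher(None, a, b).ratio()

snippet_one = "total = 0\nfor x in data:\n    total += x * x\n"
snippet_two = "acc = 0\nfor v in values:\n    acc += v * v\n"

score = similarity(snippet_one, snippet_two)
print(f"similarity: {score:.2f}")
if score > 0.5:
    print("Possible duplicated logic; consider extracting a function.")
```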
So it's pretty clear right now that, although things are improving, the absolute best practices of industry aren't commonplace in academic software engineering, and the sooner we accept that, the better. But that presents a great opportunity for RSEs to, again, help with science. Something I'll illustrate with a few quotes is that there's real interest now in entirely new models for programming. This comes out of machine learning, but it's really based around what I'd call objective-based development. Erik Meijer, who leads research engineering at Facebook and has previously been quite influential in the development and design of other programming languages, having worked on Haskell, C# and others, talks about how we go about adopting machine learning in software development. These are all links, and the video is definitely worth watching. Then Andrej Karpathy, who heads AI at Tesla, talks about specifying a program space and then encouraging a program to emerge from it based on your objectives. It's a different way of thinking about code; it doesn't suit all use cases, but it's certainly very interesting and gives you an entirely new perspective on software development. And Stephen Wolfram, who created Mathematica, has written some very weighty tomes on software, engineering and where things are going. He has a product to sell, but this is another compelling talk and blog post that's definitely worth a look. The idea is that we define what we want to do rather than how we're going to do it, and somehow, at least in an assisted manner, we end up with code that does what we want.
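Here is a minimal, hypothetical sketch of that idea, assuming nothing beyond NumPy: instead of writing the rule y = 2x + 3 by hand, we define a program space (all lines a*x + b), state an objective (squared error on examples of the desired behaviour) and let optimisation find the program. Real "Software 2.0" systems do this with neural networks at scale.

```python
# Objective-based development in miniature: search a program space
# (lines a*x + b) for the program that best satisfies an objective.
import numpy as np

xs = np.linspace(-1.0, 1.0, 50)   # examples of desired behaviour
ys = 2.0 * xs + 3.0               # the behaviour we want to capture

a, b = 0.0, 0.0                   # starting point in the program space
lr = 0.1                          # learning rate
for _ in range(500):              # gradient descent on the objective
    pred = a * xs + b
    grad_a = 2.0 * np.mean((pred - ys) * xs)
    grad_b = 2.0 * np.mean(pred - ys)
    a, b = a - lr * grad_a, b - lr * grad_b

print(a, b)  # converges towards 2.0 and 3.0: the program has "emerged"
```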
We've looked at technology; turning now to the research itself, what we're seeing is everything becoming digital by default: all data, all research, becoming digital research. That involves embedding data at every stage of the process, from planning right through to analysis.
There's a very interesting article worth reading on this: the Turing Institute in London is doing very interesting work along these lines, and it explains how data science is influencing research and vice versa. Of course, we're also seeing a lot of interdisciplinary research, which involves adopting not only a common language but sometimes a common infrastructure and a shared workspace in which people do their research. That overlaps with collaboration: people doing research in very distributed, disparate teams, which involves not just the research itself but the data gathering too. How can we enable that? Another big focus right now is integrity: how can we ensure that research is reproducible and repeatable? That's somewhere software engineering can definitely play a role. And a bit more broadly, we're seeing pressure and interest from everyone, institutions through to funders, in ensuring that what we do has a quantifiable, and quantified, impact.
We see researchers coming to Imperial who seem less prepared than ever to deal with the existing infrastructure. That doesn't mean they're short of skills; it means the infrastructure we're providing, batch, Unix-based HPC systems, may no longer be suitable. How do we bridge the skills gap for people who have grown up, to put it simplistically, using something like an iPad? We can do some training and we can assist, but it's something that needs to be recognised, and it relates to people's expectations regarding the accessibility and usability of both scientific software and infrastructure. Privacy and security concerns are coming to the fore as well, regardless of the nature of the data and the research being done. And we're seeing loads of interesting research being done in industry, in places like DeepMind, where you have access to enormous compute and data.
There, you're tackling interesting challenges on which academia no longer has a monopoly, and at the very least we have to recognise that we're competing for talent in that market. So what can we do to strengthen our position and ensure that academia remains an attractive, great place to work? On the positive side, we're seeing research software engineering prove useful beyond research and academia. Government in particular is seeing the benefit of having domain specialists in roles that can also produce real advances with the technology itself. And just look at the survey results: generally, diversity improves outcomes, so if we recognise that RSEs and RSE groups aren't as diverse as they could be, what are we going to do about it, and who is going to drive that change? As for groups, I expect them to provide broader services. UCL, in particular, is branching out from RSE to an AI studio; there's some branding in that, but the work that team is doing, and the skills they have, are somewhat distinct from traditional RSE.
As I mentioned, that means new kinds of infrastructure, and not only to suit the nature of research and data science, GPUs for machine learning perhaps. Do we provide institutional CI solutions in addition to source control? What do we do in terms of usability: notebooks and notebook servers, JupyterHub and BinderHub, alongside traditional HPC? And if you're doing more data science, you of course need more storage. To get a better return on the investment, you need to do more scalable activities: developing products for people rather than working alongside them, because even pairing doesn't really scale, depending on the size of your team. So you focus perhaps more on self-service resources, project templates, exemplars, training, and more community building: helping people help themselves, and fostering peer groups where they can share knowledge. We're also under pressure, particularly, to measure what we do and how beneficial it is. You can pick your own metrics, and it's probably better to have some than none.
I wouldn't agonise over exactly what they are; we do our best to say: here's what we're doing, here's how it benefits people, and here's how our metrics are improving. To reflect the pace of change, you also need to give staff more time for learning and, in particular, prototyping. Most interestingly, as RSE groups reach a maturity level where they're growing and need some kind of structure, models for how that might be done are emerging. There's an interesting paper on this, three case studies in fact, I think one in the States, one in the UK, and I forget where the other is, covering three organisational structures for RSE groups and how they've benefited the groups' productivity and outcomes. And given how much open-source software there is and how much reinvention goes on, maybe we should be looking a lot more at writing less code, and focusing more on code review in particular, to improve the quality of what we do write and to ensure it isn't duplicating effort.
Alex Hill gave a great talk here, and I've linked to her personal blog post and one from her group describing the practices they follow at Imperial, which reflect this. Having loads of code is not always a good thing; people know this, but it's always worth repeating. It's particularly relevant as RSE groups grow, because as time goes on they're bound to have a larger estate of code to maintain. You've got to consider, for every line of code you write, whether you want to be maintaining it in five or ten years' time as a group; otherwise, you're bound to get bogged down in maintenance.
RSEs themselves, rather than the groups, have to find time for, and take an interest in, continuous learning, because the pace of change is not going to slow down. And over time, as you become part of a group that grows, you should consider how you want to specialise. There are lots of axes to move along: from software development into, say, DevOps or documentation specifically; into a domain such as medicine or engineering; towards particular programming languages; or into machine learning rather than general software development. As the community grows, there are also much better opportunities to find mentors, and that's a great place to start: try to find someone who's two or three years ahead of you, no more, no less perhaps, see how their career has progressed, and ask how they'd advise you to improve as a developer and find career paths within academia. The interesting thing about data science and machine learning is that, unlike some other topics such as testing, they're things people are genuinely enthusiastic about trying and learning.
This is a great place to start: written from a software engineering perspective, it's another output from the Turing that helps you find out how to do data science. I'll also plug Imperial's recent Coursera course on mathematics for machine learning, which is a good foundation for getting into the field, and Microsoft have written an interesting piece on how to take the best of software engineering and use it in machine learning. It won't hurt to learn some things that appear to be buzzwords but are bound to be prevailing trends in the longer term: off-premise and task-specific compute, CPUs, GPUs, TPUs, technologies and server capabilities that relate to specific scientific domains and research. And just to reflect on data science: this is James, head of research engineering at the Turing Institute, saying that the job title no longer has a meaning, that data science and software engineering are merging into one. It doesn't mean that everybody is both; it means that most people's jobs have an element of each.
So familiarity with these technologies is going to be very useful: executable code, of course, and notebooks in general; containers, which are definitely here to stay; and automation. There's some really interesting new material too. Iodide provides in-browser notebooks that run entirely client-side, enabled by WebAssembly; there are reproducible documents, and journals to which you can submit executable articles; and there's the general accessibility and usability of the software you produce for new audiences. We're running a workshop on this at the UK RSE conference this year; Diego Alonso Alvarez is a member of my team. On WebAssembly, here's some more hyperbole from someone with a vested interest in the field, but again, I don't think it can be ignored, because it's very interesting as a universal execution environment for any type of code. Definitely take a look at it. Institutions also need to do better at joining people up to share knowledge, and Jeremy gave a talk here,
and there's a poster here, about doing that in the UK. We've also got some challenges around career paths, and King's in particular is doing a great job with that: they've published all of their guidelines, and I'd definitely review those. From my perspective, the challenges are around recruitment and growing RSE teams, because talented developers have no shortage of options when it comes to choosing where to work. Training is really important for early-career researchers, to ensure they have some of these skills and appreciate what RSEs can do for them. To back this up a little more, we've got the report from the Open Science Monitor drawing very clear and explicit conclusions regarding policy: universities need to create research software groups; funders should expect RSE involvement in any grant proposal, and should be demanding software management plans, which I think is bound to happen and which we need to be prepared for; and they need to acknowledge sustainability by funding long-term projects, meaning more fellowships, infrastructure, and initiatives like the recent Aspiring RSE Leaders workshop, which help people advance their careers within RSE groups and start new groups. All of those slides are online and definitely worth looking at if you're ever interested in forming or leading an RSE group. The Open Science Monitor had views on this too: RSEs must be included in preparing bids.
Nearing the end, I put this in just to give a different perspective from the RSE survey: it finds that a lot of academic researchers aren't very happy, don't earn very much, and are looking for new work. So I guess it depends on which surveys you look at; maybe I see more of this in the UK, and in Germany it could well be different, but not everyone is happy, and we need to consider why that's the case. Part of it is about incentives, and again, there's a recommendation on this in the report, which you can get via the link here. So, as I said earlier, I think we've embedded some RSE practices, but we've got a long way to go. We should now look ahead: what is the maturity model for RSE groups? How do we grow as RSEs? How do we choose our own careers? The good news is that there's still a huge untapped demand for research to be accelerated, and research that can be accelerated through RSE. And it's clear that if you're suitably equipped, you can contribute incredibly to relevant projects, which in time, as research goes digital, will come to mean all research projects. The slides are online and will be published with the conference outputs, I guess. Feel free to email me with any comments or disagreements.
Thanks to everyone, including Jeremy and my team at Imperial, who contributed to this talk. Thank you.

Thanks. We don't really have time for questions, but I'd still like to slip one in while I set up the slides for the next talk. We've seen all these great things, and I think it will take us a good amount of time to get there. How do we bridge the gap from where we are now, as we saw in Martin and Stefan's talk, where we're still trying to get people to use Git, to all these wonderful things that we can, and probably should, do in the future?

I think there is a lot of work to do, and some of it is a matter of time: these things do trickle down, and as new generations come into development with Git as the norm, rather than having to convert from other tools, that will help. It also involves working alongside researchers, which we do very actively; when we do, we set up the infrastructure and they have no option but to use it. But it's going to take a lot of hard work: not just RSEs writing code, but promoting events like this, doing more community development, which I think is incredibly important, and, in the end, demonstrating the benefits. It feels like common sense, but it's definitely a multi-pronged approach.