How to get credit for your research software
Formal Metadata
Title: How to get credit for your research software
Number of Parts: 490
License: CC Attribution 2.0 Belgium. You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose, as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/46929 (DOI)
Language: English
Transcript: English (auto-generated)
00:05
All right. Thanks for sticking around till the end of the day. My name is Karthik. Can you hear me in the back? Yes. So I'm with the Berkeley Institute for Data Science, and I'm also involved with a lot of different open source projects.
00:20
I lead a large open source project called rOpenSci. But the thing I want to talk to you about today is how to get credit for any research software that you might develop. So this is my first time at FOSDEM. I actually thought this was a pretty small conference and that I would have to convince you that software is important. I don't really have to do that, but I'm going to do it anyway.
00:43
So the thing that's always striking to me as an academic is that software is so prevalent in every academic endeavor. It's not just heavily computational fields like astrophysics or bioinformatics, but even many of the humanities that you think are not computational
01:03
are computational these days. And it's hard for me to capture this better than, say, Gaël Varoquaux, one of the leads of scikit-learn, who put it this way: software helps us make predictions from models,
01:21
it helps us run experiments, it helps us derive insight from data. And this is just universally true in a lot of different fields. And this is not just anecdotal. So I have some colleagues that have done surveys in the UK and the US to try and find out how much research software people use.
01:42
And this is not just things like Microsoft Excel or Microsoft Word, but software designed for research. And it turns out that more than 90% of people rely on these types of software. And when asked what would happen to their research if the software disappeared, more than 63% of them said
02:03
they'd just have to stop doing what they're doing. So research software is really quite critical for a lot of different fields. And more and more, I am seeing academics spend a lot of time taking their code and packaging that into software.
02:20
You see everyone pulling together a package these days. And it's really interesting that currently the skills that we require to thrive in academia are not actually all that different from being in industry. And funnily enough, the person that said this, Jake VanderPlas, a friend of mine, was an academic and now works at Google.
02:41
But luckily he's still just doing whatever open source he used to do before. He's just getting paid a lot more money to do it. But the challenge with doing software in academia is that it's not considered research and it's very hard to get credit for academic software work.
03:02
And there's a reason why. We really don't know how to give credit to software, which is the biggest problem. Very simple things like we really don't even know how to cite software. So it's kind of all over the place. Sometimes people cite a paper. Sometimes people cite the entire language to reference a single package.
03:23
And this is not just true of tiny journals, even big journals like Nature and Science. People just casually mention software. They don't quite cite it, which means that you will never be able to track down the actual version that anybody used if you're ever trying to think about reproducibility.
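As a small illustration of what citing the actual version would mean in practice, here is a minimal Python sketch that records the exact versions of the packages an analysis relied on. The package names are placeholders, and this is just one way to capture that information, not a prescribed workflow.

```python
# A minimal sketch: record the exact versions of the installed packages an
# analysis relied on, so a later reader can recover what "we used
# scikit-learn" actually meant. The package names below are placeholders.
from importlib.metadata import PackageNotFoundError, version

def record_versions(packages):
    """Return a {package: version} map for the given installed packages."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found

if __name__ == "__main__":
    for pkg, ver in record_versions(["numpy", "scipy", "scikit-learn"]).items():
        print(f"{pkg}=={ver}")
```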
03:42
And then there's a big consequence to all of this. If we are not going to get credit for our software, we're not going to do a very good job of it. It's not going to be sustainable. It's not going to be collaborative. And that sucks for everybody. So we really need to find a way to make software count as scholarship and give people credit for it.
04:02
And that's what my talk is all about. And a handful of us decided we need to do something about this. And so when you think about software and citation, it's just a mess of different types of challenges. There are very few things that are technical. It's mostly cultural. We're not quite ready to do this.
04:21
But for example, there's not an easy way to cite software because we haven't quite agreed on whether to write papers about software or to cite the software itself directly. Software citations not being allowed is a weird cultural thing. Software is not usually indexed by the bean counters.
04:42
And until recently, people didn't actually peer review software. And that's something that I've been involved in for a handful of years now. And even though software has dependencies, we don't actually have a clear way of saying what software is connected to what other software. And so just a quick reminder about why we cite things.
05:02
We're trying to give credit. We are trying to make it clear to everyone that we've done our homework, make sure that we're not stealing from other people. But at least with software, the biggest thing that we're trying to do is to make sure that we give proper credit to someone who's building that software.
05:20
And so how do we go about recognizing software for academics? So there's two possible ways. One is we could try to do things in ways that academics are already quite familiar with. We do research. We write papers. We get the papers published in the best journal possible. People cite our papers eventually.
05:41
The citations add up. We get credit. Or we could try and think of something new and interesting instead of doing the same old boring nonsense. So because software has dependencies, there are more interesting things you can do to go beyond that very simple, one-dimensional credit model. So if you imagine this hypothetical example
06:03
of Arfon Smith, who wrote a paper, and then he references a couple of his old papers because he's building upon previous work, he references a couple of large data sets, and then he references two critical pieces of software that he used, in this case Astropy and scikit-learn.
06:22
But because of dependency trees, we can tell that these two depend upon NumPy and SciPy. So there should be a way to automatically assign credit to software without having to cite every single piece of software. But we're not very good at agreeing upon any particular standards.
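To make that transitive-credit idea concrete, here is a toy Python sketch that splits a paper's citation credit across the cited packages and passes a fraction of each share down the dependency tree. The graph, the package names, and the 50/50 split are arbitrary assumptions for illustration only, not a proposed standard.

```python
# Toy sketch of transitive credit: a paper's citation credit is split among
# the software it cites, and each package passes part of its share down to
# its own dependencies. The graph and the pass-down fraction are made up.
from collections import defaultdict

DEPENDS_ON = {
    "astropy": ["numpy"],
    "scikit-learn": ["numpy", "scipy"],
    "scipy": ["numpy"],
    "numpy": [],
}

def assign_credit(cited, pass_down=0.5):
    """Distribute one unit of credit over cited packages and, recursively,
    their dependencies."""
    credit = defaultdict(float)

    def share(package, amount):
        deps = DEPENDS_ON.get(package, [])
        if not deps:
            credit[package] += amount
            return
        # Keep part of the credit, pass the rest to dependencies equally.
        credit[package] += amount * (1 - pass_down)
        for dep in deps:
            share(dep, amount * pass_down / len(deps))

    for package in cited:
        share(package, 1.0 / len(cited))
    return dict(credit)

print(assign_credit(["astropy", "scikit-learn"]))
```

With this toy graph, NumPy ends up with the largest share even though the paper never cites it directly, which is exactly the kind of outcome the dependency-tree argument is pointing at.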
06:42
And trying to get a large group of people to agree to a big piece of change is very, very hard. So we'd have to get buy-in from individual authors, editors, then move on to whole communities and societies and then get the journals on board. And that's really not going to happen. And so the alternate option
07:02
is just writing papers about software, which is pretty easy to do. And it's not a bad thing, because a paper can be something that you can just easily cite. We don't really have to create any new infrastructure. And then if you are writing software that is important to your community,
07:21
the best way to bring this to the attention of people in your community is just to publish it in a journal. People read it. There's a challenge, though, trying to get a software paper into an existing journal, which is that if you've already done a full research paper, writing another software paper is really painful.
07:44
You're going to copy all of the documentation that you've already extensively written as part of your software into a paper. Most journals don't publish software papers. And for those of you that contribute to open source, if you join a project that already exists, it's very likely you'll never get credit for that work,
08:00
because they already published one canonical paper for that work. This is a common problem for people that join the Jupyter team. Everyone cites something that's quite old at this point. So we've come to realize that trying to change the system is really hard, and so we are going to have to stick with what exists
08:21
and just hack something around that. So my colleague, Arfan Smith, who founded the Journal of Open Source Software, several of us started talking a few years ago and said, can we create a new type of journal that makes it easy to publish papers about software without making it very difficult?
08:40
And so we created JOSS, the Journal of Open Source Software. It is entirely free, open access. It costs nothing to publish. And we created a new system that is very developer-friendly. And by developer-friendly, I mean that if you would like to get a publication for your software,
09:01
assuming you have followed all the best practices, you've written good documentation, you have tests, you have clear installation instructions, you've designed a usable piece of software, you've got a good open source license, then we expect that it should take you no more than an hour to write a paper.
09:22
And a JOSS paper is actually fairly simple. It's often no more than two pages long. It's a very, very high-level description of what your software does, aimed at someone who's not an expert in your field. We're really looking for you to cite who funded you, major references that influenced you, and we really do not want you
09:42
to put any results in this paper. It's just a simple, citable object for your software. And we tried to be as conventional as possible in the scholarly space, so we didn't throw anybody off. And this is what the form looks like.
10:01
Has anyone submitted a paper using Manuscript Central? A handful of you. You know how painful that is. Here, all we need is your name, your version control repository, and if you have an editor in mind. That's really it. You can even skip the title and the description because we already have that as part of your paper,
10:20
which you write in Markdown, and then put it in the same repository as your project. And the thing that we built is a robot, which is basically a Ruby bot that runs on Heroku and listens for every single activity on a GitHub issue.
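The real Whedon is written in Ruby, but the pattern is easy to picture. Here is a rough Python sketch of how a bot might parse "@whedon ..." commands out of issue comments and grant different commands to different roles. The command names loosely follow the talk, and dispatch() is a placeholder, not the bot's actual interface.

```python
# Rough sketch of the pattern, not the real bot: parse an issue comment for
# "@whedon ..." commands and allow different commands for different roles.
EVERYONE = {"commands", "generate pdf", "check references"}
EDITORS_ONLY = {"assign reviewer", "start review", "accept"}

def dispatch(command):
    # Placeholder handler: the real bot would talk to GitHub, compile the
    # Markdown paper to PDF, resolve DOIs, deposit metadata, and so on.
    return f"Running '{command}'..."

def handle_comment(body, author_role):
    """Return the bot's reply to one issue comment, or None if the comment
    is not addressed to the bot."""
    text = body.strip()
    if not text.lower().startswith("@whedon"):
        return None
    command = text[len("@whedon"):].strip().lower()
    if command in EVERYONE:
        return dispatch(command)
    if command in EDITORS_ONLY:
        if author_role in ("editor", "associate editor"):
            return dispatch(command)
        return "Sorry, only editors can do that."
    return "I did not understand that. Try '@whedon commands'."

print(handle_comment("@whedon generate pdf", author_role="author"))
```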
10:46
And this bot, which is the one in the middle, can talk to GitHub, and it can talk to a bunch of different services. We named it Whedon because the Journal of Open Source Software is JOSS. Joss Whedon, for the sci-fi fans.
11:01
And as soon as a paper goes into the review queue, Whedon steps in and says, Hello, I'm a bot. I'm here to help you. If you'd like to know all the commands that I can do, just type in @whedon commands, and Whedon will tell you everything it can do. It immediately starts to identify what language it is
11:21
and starts to tag the language of that particular submission. And then Whedon is really nice because it gives different powers to different people. So if you're an editor, you can assign reviewers. I can just say, assign Matayush to be a reviewer, and then it'll assign him as a reviewer.
11:40
If I'm an associate editor, I can assign someone else as an editor, and I can also just start a review, and it just creates a giant checklist for the reviewer to work through. It also gives powers to the authors and the reviewers. So at any point in time, they can say, Whedon, generate a PDF. So it goes through the Markdown, goes through the references, generates a beautiful PDF.
12:02
You can look at it. If the formatting is bad, keep adding more commits, generate another PDF. It will check references. It will go crawl all the DOIs and let you know when something is broken.
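A minimal sketch of that reference check, assuming only that DOIs resolve through doi.org: try to resolve each DOI and report the ones that fail. This is not the bot's actual code, and a real check would also need to handle rate limits and publisher sites that reject automated requests.

```python
# Minimal sketch of the DOI check: try to resolve each DOI via doi.org and
# report the ones that fail. Illustration only, not the real bot's code.
import urllib.error
import urllib.request

def broken_dois(dois, timeout=10):
    """Return the subset of DOIs that could not be resolved."""
    broken = []
    for doi in dois:
        request = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
        try:
            urllib.request.urlopen(request, timeout=timeout)
        except (urllib.error.HTTPError, urllib.error.URLError):
            broken.append(doi)
    return broken

# The first DOI is the one attached to this talk's recording; the second is
# deliberately malformed.
print(broken_dois(["10.5446/46929", "10.0000/this-doi-does-not-exist"]))
```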
12:22
And then if you've got superpowers as an associate editor, you can say, Whedon, accept this paper, and it will go deposit all the metadata, archive the paper, and right before that, you also archive the software itself. And in the end, we create a PDF that looks like a standard PDF to most academics, which is good, because you don't want to confuse them by showing them a software paper. So this is a paper that was recently published
12:42
about the tidyverse, and when you go to the JOSS page, you can see the review, you can download a PDF, and you can see the ORCID iDs for all the authors. So we've been running the journal for more than three years now.
13:03
We wrote a paper right after the first year describing how the journal works. Some of these statistics are outdated, but the trends still hold. We tend to get a lot of submissions that are Python packages and R packages, and most of our submissions come from the U.S. or the U.K.
13:20
We're getting more submissions now, and we're growing quite a bit. We've published almost 700 papers. We publish 30-ish papers a month. We've got a lot of editors, and we're constantly growing. And in many ways, even though we created a journal,
13:42
we really just created another open-source project. So if you imagine an open-source project, you get users who get very excited, and at some point, they start contributing, and then they end up becoming maintainers, so you can then step away and hand this off to someone else. It works similarly for us, too.
14:00
We get people that submit to us, and because our reviews are 100% completely open and public, they get to see how the whole process works. They want to come back, and they want to review, and then at some point, if someone reviews too much for us, we just make them an editor. Lastly, I want to give you a few insights
14:22
that we've learned running JOSS over all of these years. So one of the things that we've done is we've tried to make JOSS, even though it's experimental and interesting, seem very much part of the scholarly infrastructure. So we don't have our own login system. We use ORCID iDs, which are researcher identifiers.
14:44
As soon as we accept your paper, we deposit all the metadata with Crossref. We archive the paper and the reviews with Portico, and then just about a couple of months ago, JOSS papers started getting indexed by Google Scholar.
15:01
We're still trying to work with Scopus. We love best practices. So all papers are open access. All the authors have complete copyright control over everything. Our governance strategy is fully open. Our business model is that we don't have a business model.
15:22
It costs us very little to publish any paper. And then even though we're doing all of this, we're doing a pretty thorough job reviewing your software. So we're giving you a citation for a very short paper, but along the way, we're checking to make sure you have a good license,
15:40
your software functions as intended. If you claim any performance improvements, somebody will go test that. And then we go through a pretty big checklist. The process is quite fun for authors. So we just heard a talk about how eLife is trying to make things open and transparent.
16:01
JOSS doesn't really reject papers. Our goal is not to reject papers just so we can inflate our rejection rate. We will desk reject a submission if the software is not fully complete, not appropriate, or not research software.
16:21
But once you get past that point, we want to help you succeed. And the goalpost is very clear. So if I'm telling you there aren't enough tests in your software, you know exactly what you need to do to get your submission accepted. And because everything is open, nobody's really a jerk about anything.
16:41
We try to leverage the best parts of open source. So we figured developers are already on GitHub, the journal lives on GitHub, and the bot acts on GitHub. And then we try to automate things that are very tedious and boring, and that's what Whedon really does. So if any of this seems appealing to you and you would like to submit to JOSS,
17:01
please submit a paper if you have a software package. And if you have expertise in any open source language, please sign up to be a reviewer. And I'm happy to take any questions. Thank you. Yeah.
17:32
So we have a potential solution to this, which is that... Oh, sorry, I'll repeat the question. So the challenge with contributing to an existing open source project
17:42
is that there's already an old publication and you will not get credit. So we would like to encourage people to submit new papers when the software has made a major milestone leap. So at that point, everybody that contributed in that previous iteration will come on board. So that is a way we can give other people credit.
18:41
Yeah, thank you for that comment. It does work, but the challenge with traditional publishing is that you have to have novelty. So it's always hard to get a publication that adds to an existing piece of software, but if that works, that's great.
19:02
Oh, yeah, okay. I'm pretty excited about this. Yeah. So we work through a checklist,
19:20
and that checklist is very public. So we're not really passing judgment on quality, but we want to make sure the documentation is easy to understand, the software actually functions, so reviewers have to step through every single function and every single example to make sure it actually works. And so you end up with software that is not broken,
19:45
that is not difficult to install on a different platform, actually has a license. So we just make sure it meets a whole bunch of benchmarks that are signals of a good usable piece of software. So we're not doing a very deep code review.
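As a toy illustration of the kind of mechanical signals being described, a short Python sketch could scan a repository checkout for a license, a README, tests, and a paper file. The real JOSS review is a human reviewer working through the public checklist, so this is only a caricature of the idea.

```python
# Toy sketch of mechanical repository checks: does the checkout have a
# license, a README, tests, and a paper file? Illustration only; the real
# JOSS review is a human working through a public checklist.
from pathlib import Path

def quick_checks(repo):
    repo = Path(repo)
    files = {p.name.lower() for p in repo.iterdir() if p.is_file()}
    dirs = {p.name.lower() for p in repo.iterdir() if p.is_dir()}
    return {
        "has_license": any(name.startswith("license") for name in files),
        "has_readme": any(name.startswith("readme") for name in files),
        "has_tests": bool({"tests", "test"} & dirs),
        "has_paper": "paper.md" in files,
    }

print(quick_checks("."))
```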
20:01
Yeah, yeah, right. And if someone else has done a much deeper code review, we will actually rely upon that review as well.
20:20
So reviews are transferable into JOSS. Good question. So this is a very difficult question. I'll repeat the question. How small can a piece of software be to get into JOSS?
20:42
This is a common topic of discussion among the editors. So we have a cutoff that you cannot publish a minor utility. It has to be somewhat substantial in what it does. And Whedon actually does a quick scan of the software,
21:00
and then if there's any doubt, we will have a discussion among the editors to get consensus. So somebody who's an expert in the field will come in and say, oh, this actually is a very trivial implementation of one single method, and then we reject that.
21:33
We do, actually. So how do we deal with papers from languages that are not very common, like Haskell?
21:42
So right now we don't get any as far as I know, but we struggle with some languages like Julia where we don't have enough reviewers. So as soon as we know that this is a problem, we just try and reach out to more people to sign up. And so our reviewer sign-up form tries to get expertise on languages,
22:03
and every few months or so, the editorial team will decide that we are lacking editors in a certain area and then go reach out to people to come join the board. So if you know someone who's an expert at Haskell and wants to help out, maybe you, feel free to reach out. Is open source part of the academic criteria?
22:26
Excellent question. I don't know if it is, but if you want it published here, you have to be open source. There's no other way. But overall, I think it's against academic spirit
22:42
to not be open source. So especially if it's publicly funded research, why on earth would you create proprietary software? So unless someone can come up with a good reason, I don't have one.
23:01
Oh, that's totally fine. Our review process happens on GitHub, but your code can sit in any version-controlled repository of any kind. Yeah.
23:35
Let me see if I follow your question. How do you actually cite other methods?
23:41
It's tricky. So we don't require a lot of citations as part of JOSS because we are only looking for a paper that is a very high-level description. So if you want to cite a fundamental paper that describes random forests, you can put that in your references. It's totally fine. But we're not looking for a very exhaustive list
24:02
of references. People usually put maybe 5 to 10 references in their software publication. Wonderful question. How long is the review process? I've seen one that has happened in a couple of days, and I've seen one that's taken many months.
24:22
It's all up to how fast we can work with the authors. So, for example, you submit something, and I say, your unit tests are missing. And then you tell me, I just started teaching this semester. I have zero free time. I'm going to come back to this next semester. That happens quite frequently.
24:40
But if a piece of software is pretty well used, pretty feature complete, sometimes we don't really have much to point out. And the few things we do point out, people will immediately fix and commit, and then we just accept, and then it's done.
25:04
Good question. So we haven't encountered that problem yet. How do we deal with plagiarism, especially when people are stealing other people's code? So we have not really done much about that yet. I'm sure we can build in more functionality into the bot
25:21
to actually try to detect some of this. But so far, we haven't run into this. People do have to explain what they've done and how it is different from something else. So you cannot just make an incremental change to an existing piece of software and say, this is my JOSS publication.
25:40
And if you're submitting something to JOSS, you actually have to demonstrate that you have contributed substantially to that piece of software. And the editor who's handling that paper has to go verify that. But that's actually a great suggestion. And one of the things we're doing is we're making our bot smarter, and two people are now working full-time on the bot. So that's something that we could add
26:02
for the bot to look into.
26:24
Oh, yeah. Great question. So how do we integrate references back to package managers? It's all up to the authors. So as soon as a paper is accepted, we tell the authors. For example,
26:46
here's a submission that was accepted, and the very last thing we tell the authors is: congratulations, your paper is now accepted. Please add this information to the README and the citation. So if you're on CRAN,
27:01
you add this back to the CITATION file. Same thing with PyPI. So it's up to you to advertise everything back if you don't have an automatic way to do this. Thank you very much. Thank you.