
Ups and downs of open science in a pandemic


Formal metadata

Title
Ups and downs of open science in a pandemic
Series title
Number of parts
32
Author
License
CC Attribution 3.0 Germany:
You may use and modify the work or content for any legal purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder by name in the manner they specify.
Identifiers
Publisher
Year of publication
Language

Content metadata

Subject area
Genre
Lecture/Conference
Meeting/Interview
Computer animation
Transcript (automatically generated)
Thank you for the introduction, Andrea. I'm very, very pleased to be here. I'm so pleased Dirk invited me; he just reminded me that he has been asking me for a couple of years now
to come and give a talk about Open Science: my experience as a researcher, but also as someone who has an interest in open access publishing and believes fully in Open Science. So I'm going to give quite a personal insight into how the last two and a half years have been for me
with Covid and with my experiences in Open Science. This is how I found out about this new pneumonia. It was New Year's Eve. I was in bed, it was in the middle of the night.
I was looking at my phone and saw this alert come in from FluTrackers, which tracks flu and other emerging infectious diseases: "China: several suspected SARS cases in Wuhan, Hubei province." SARS was a previous epidemic caused by a similar virus.
This grabbed my attention and woke me up a bit, because I feel I have a connection with Wuhan. This is all very personal to me. The photo is of Hankou Road; this signpost is actually in Shanghai.
But Hankou is one of three cities that were joined together to form Wuhan, and my father, who is here in this photo with me, was born in Hankou. So it really grabbed my attention, and from then on, Covid became very, very real.
What I would like to tell you about is how the evidence about Covid came about, my involvement with it and, as Andrea said, a little about the ups and downs of open access and open data. I'm trying not to use too many abbreviations.
The ones that I will use will be familiar to you: Covid-19 is the coronavirus disease from 2019; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2, is the virus that causes it; and WHO is the World Health Organization.
So here we are, back to finding out about it. This is how Twitter nerds get their information about what's coming out. Emerging infectious diseases are emerging all the time; you may have seen yesterday a case of Ebola virus in Uganda.
We've had monkeypox as well. If you work on emerging infectious diseases, you have to be prepared to be involved all the time. But this is how the public, or the science world, usually finds out about new diseases: through announcements from the World Health Organization on its website
and press releases that reach us in the news. WHO announced that its China country office had been informed by officials in Wuhan about cases of pneumonia of unknown etiology, meaning unknown cause.
And that was the start. As you saw in the FluTrackers slide, it was a pneumonia like SARS. In 2003, there had been a major outbreak of severe acute respiratory syndrome caused by a coronavirus, and they were worried that this was going to cause a similar outbreak.
Little did they know. This slide is one that I put together at the very beginning to teach medical students. It shows the timeline of what had happened and really shows how fast things moved
according to what we knew at the time. So this was the 31st of December. As you know, it was a seafood market that was thought to be the epicentre of this outbreak. And by the 7th of January, they had already identified by electron microscopy and by targeted sequencing methods
a new coronavirus. It was not SARS, but it was similar and was causing pneumonia, so it was later called SARS-CoV-2. Originally it was called 2019-nCoV, for novel coronavirus.
The first death occurred on the 11th of January. Then the virus was sequenced, which is just unbelievable: ten to eleven days after announcing that they had cases of pneumonia of unknown cause, they had sequenced the whole virus.
They sent it to WHO, and it then became publicly available, open science, to all research groups who wanted to work on it. A week later, you had a first protocol for PCR molecular detection,
so that you could actually find this virus. After that, cases really started taking off: up to 2,000 by the end of January. Shortly afterwards, Germany saw its first case. It was a Chinese woman who had come to a meeting in Germany and had very few symptoms. She said she felt tired.
She was reported as being asymptomatic. Actually she was tired and had a fever, but she put it down to jet lag. She infected a number of people in Germany. And just shortly before that, the first cases in Europe had been identified in Italy. And we know what happened in northern Italy. They were absolutely devastated at the beginning of this first wave.
By the 30th of January, the World Health Organization had declared what's called a public health emergency of international concern, meaning that the whole world has to mobilize and countries have to report cases to the World Health Organization. And there we were.
But in China it was actually more serious than that, because the end of January is Chinese New Year, when the whole of China gets on the move. Everyone wanted to go home to visit their families, so they all crowded into the train stations, all wanting to get out of Wuhan,
and subsequently brought Covid to other parts of China. So there we are. What about the research? How did research react? Well, David Heymann, a phenomenally famous epidemiologist who was part of the smallpox eradication programme,
wrote an editorial about the first two papers to appear in the Lancet. One reported cases in a family that went to Hong Kong and, it appears, were no longer linked to Wuhan. They got infected in Hong Kong, showing that person-to-person transmission of this virus was possible,
which had not been thought to be the case until then. The other was a case series of the initial clinical presentations. He called this a fantastic, exemplary form of data sharing that let the whole world know through the scientific press.
It turns out that it wasn't quite like that. Jeremy Farrar, the head of the Wellcome Trust medical research charity, with whom, incidentally, I went to medical school, wrote a book about his first experiences of the Covid pandemic.
In the first chapter, he talks about a colleague who had received a paper for peer review and knew that it had to be published now, without waiting for the usual processes of a medical journal. The colleague called Jeremy and said, what do we do about this?
Richard Horton, the well-known editor of the Lancet, apparently refused to take his calls or reply to his emails. In the end they went and said, you have to publish this now or we are going to say it anyway. And the papers were published on the 24th of January.
Now, you may not think that an eight-day delay is very long, but of course the virus had by then travelled around the whole world, and people did not know that it was transmitted from person to person. So the Wellcome Trust,
at the end of January, just after the declaration of the public health emergency of international concern, put out a call for publishers and scientists to share their research data and findings. This was a reiteration of a call
first made in 2016 following the Zika epidemic, asking researchers to make their new findings known publicly: to post them on preprint servers so that people could see them without the delay of peer review,
to share their findings with the World Health Organization as soon as they had been submitted to a journal, and to share data and protocols so that everyone could benefit from this new knowledge. There were 160 signatories. And then, you can see, three months later
30 major publishers committed to making all of their COVID-19 content freely available, with the papers deposited in PubMed Central so that everyone could read them, even if they were published in subscription-only journals.
So how did this play out? Here you can see that the Lancet stable of journals has a COVID-19 resource centre, and they advertise gladly and openly that all of their COVID-19 content is free to access.
And the Times Higher Education Supplement argued that having all this free-to-access content shows that we need full open access for the whole of science. Now, of course, you're all librarians or in data science teams; you know and I know that free to access
does not mean open access. Many people think that it does. To be fair, for most people it does give access: the journal articles are not behind paywalls. But of course, we know that the content can't be reused.
A nice analogy for free access is giving a child a Lego car and telling her she can look at it, but she certainly can't change it from a car into an aeroplane. The full potential of these publications can't be realised while they are simply free to access.
We also have to be really aware that we still have a public health emergency of international concern. Joe Biden might have said that the pandemic is over, but it is not over. Yet these publishers could withdraw this free access at any time, whenever they think that the world no longer deserves
to see the content for free. So free to access is not open access, and it can be withdrawn at any time. Now, how did the rest of this evidence come into being? This is where we got involved very early on:
we decided to make a little database of this emerging literature, because it would be fun to keep track of it and see who was publishing what. We wrote some Shiny apps in R to collate it and put them on REDCap, a web-based database platform.
It was all great. You can see here that it's on a server at the Institute of Social and Preventive Medicine called Zika, and that tells you something: this is not the first time we had done this. I'll show you that we did it for Zika as well. But what you can see is that, on the website,
you load the database. It now takes a really long time, but it shows you all the new articles, scraped from PubMed, Embase, PsycINFO, bioRxiv and medRxiv, the last two being the biggest preprint servers at the time. We kept it running until the beginning of March this year.
Just to show you a comparison: we started doing this for the Zika virus. This is the timeline of the Zika virus, which, you will remember, causes microcephaly
in the babies of infected pregnant women and can also cause a neurological syndrome called Guillain-Barré syndrome. It's carried by mosquitoes. We see this big spike in cases, mainly in South America, which then died away.
At the time we reported a massive and unprecedented number of publications, 60 a month, for this new or re-emerging virus, and that kept going for a whole year. We had a database that we could trawl, and we could do living evidence reviews
as we accumulated publications. And we made this little comparison when SARS-CoV-2 came along. Firstly, the numbers of cases are orders of magnitude greater than the numbers of Zika cases, and so are the numbers of publications.
They go up very, very quickly. By chance, for both of these diseases, the public health emergency of international concern was declared at the end of January or beginning of February, so the timings of these curves run in sync.
But there are just many, many more SARS-CoV-2 publications. To summarise, by the 1st of March this year our database held more than 300,000 publications on SARS-CoV-2, which is an extraordinary number for a single disease
that had only been discovered two years previously. From our database, which turned from the Zika Open Access Project into the COVID Open Access Project, we plot here the cumulative numbers of records in peer-reviewed journals and of preprints.
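To see what "orders of magnitude" means here, a rough back-of-the-envelope comparison of publication rates can be sketched in Python. It uses only the talk's figures (about 60 Zika publications a month; more than 300,000 SARS-CoV-2 records by the 1st of March). The 26-month collection window is our assumption, not a figure from the talk, and `compare_rates` is a hypothetical helper.

```python
import math

# Figures quoted in the talk.
ZIKA_PER_MONTH = 60      # Zika publications per month, sustained for a year
COVID_TOTAL = 300_000    # SARS-CoV-2 records in the database by 1 March (a lower bound)

# Assumption (not from the talk): records accumulated over roughly 26 months,
# from January 2020 until the database was frozen at the start of March.
COVID_MONTHS = 26


def compare_rates(total: int, months: int, baseline_per_month: float) -> tuple[float, float, float]:
    """Return (monthly rate, ratio to the baseline rate, ratio in orders of magnitude)."""
    rate = total / months
    ratio = rate / baseline_per_month
    return rate, ratio, math.log10(ratio)


rate, ratio, magnitude = compare_rates(COVID_TOTAL, COVID_MONTHS, ZIKA_PER_MONTH)
print(f"~{rate:,.0f} SARS-CoV-2 records/month, {ratio:.0f}x Zika "
      f"({magnitude:.1f} orders of magnitude)")
```

This prints `~11,538 SARS-CoV-2 records/month, 192x Zika (2.3 orders of magnitude)`. Changing `COVID_MONTHS` to 24 or 28 moves the ratio only between roughly 180x and 210x, so the "orders of magnitude" reading does not hinge on the exact window.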
But that doesn't tell you the whole story. We colour-coded all publications in the living evidence database: pink for preprints and green for peer-reviewed publications. And you can see on this first page that, right at the very beginning,
we actually had more preprints than peer-reviewed publications. People were sending their work to preprint servers, where it was immediately available to everyone as open access under Creative Commons licences, and at the same time to peer-reviewed journals, where it eventually appeared.
Here I break down what you saw on the previous slide a little more. While preprints were very prominent at the beginning, they have stayed at a fairly stable level; it's the peer-reviewed literature that has grown, as people go straight to peer-reviewed publication.
Preprints now form only 7% of all publications, at least counting the preprint servers that we cover in our living evidence database. But what does the availability of this information tell us?
There are some good things and there are some bad things. Here is medRxiv, the preprint server for the medical sciences. In its COVID section, it combines the preprints from medRxiv and from bioRxiv,
which covers the biological, more laboratory-based sciences. They have nearly 25,000 preprints now, and this has been an amazing resource. medRxiv had started in the second half of 2019 and hadn't really got going yet.
But the advent of COVID gave them a burst of material that people were sending to them. I got involved quite early on because I knew that this was coming up: one of the founders, Richard Sever, had written to me and asked if I wanted to screen
the publications that were coming in. For bioRxiv, in the biological sciences, there isn't much immediate impact. But when you're talking about a clinical disease, things that people are actually going to act on clinically, and randomised controlled trials of interventions that the authors think should be implemented,
they decided that these papers would have a screening process. So you get this list of titles in your inbox that you have to screen. I've got eight to screen at the moment. At the beginning, from January to March or April, there were about 50 on this list.
You're supposed to go through them all, but, sorry, you just have to stop at some point, so I did as many as I could. You can't screen them for quality; you're allowed to publish rubbish on medRxiv. But you're looking for ethical issues and real clinical red flags.
What medRxiv does is make the information available, as I said, to everyone, and when that's game-changing information, it's really important that everyone knows about it. Here we have an example from a large trial called RECOVERY,
done in hospitals in the UK: a fantastic randomised controlled trial testing lots and lots of treatments. It found that dexamethasone, a steroid treatment, saves lives in people who are severely ill with Covid in hospital
and reduces deaths by about 17%. They wrote this up and posted it on June 22nd, and at the same time submitted it to the New England Journal of Medicine, where it appeared on February 25th, 2021.
Well, that's not quite true, because a preliminary version was posted on July 17th. But still, that was three weeks after the preprint appeared online. It was truly recognised as game-changing information, and hospitals changed their treatment protocols immediately
to make sure that people with severe illness would receive steroids. It undoubtedly saved lives, and it would have saved lives in those three weeks too: had they waited for the New England Journal, there were definitely people whose lives would not have been saved. So this is game-changing.
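The head start that the preprint gave can be checked with simple date arithmetic. This small Python sketch uses only the dates quoted in the talk (the preprint year, 2020, follows from the timeline); the helper name `head_start` is hypothetical, chosen for illustration.

```python
from datetime import date

# Dates quoted in the talk for the RECOVERY dexamethasone result.
PREPRINT = date(2020, 6, 22)          # posted on medRxiv
NEJM_PRELIMINARY = date(2020, 7, 17)  # preliminary report in NEJM
NEJM_FINAL = date(2021, 2, 25)        # final article in NEJM


def head_start(preprint: date, journal: date) -> int:
    """Days between a preprint and the corresponding journal version."""
    return (journal - preprint).days


print(head_start(PREPRINT, NEJM_PRELIMINARY))  # 25
print(head_start(PREPRINT, NEJM_FINAL))        # 248
```

The preliminary journal report came 25 days after the preprint, a little over three weeks, and the final article 248 days, roughly eight months, after it.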
And this is one of the great advantages of open access. It also allows knowledge about ineffective treatments to become available earlier. We all know our friend Donald Trump, the great proponent of hydroxychloroquine, who told everyone that he was taking it.
It's phenomenal how far that went. Actually, if you're a medic, why would hydroxychloroquine, an anti-malarial, work against an RNA virus? You kind of think, nah. But this has gone around the world, and it is still being promoted and people are still saying it works.
There have been a vast number of trials of hydroxychloroquine. And in fact, on medRxiv you could see that the same trial, RECOVERY, looked at hydroxychloroquine and posted a preprint saying: it doesn't work, folks; do not use it in hospitals.
That article wasn't published in a journal until October the 8th. So, an ineffective treatment: what do you do? Now, there is a major criticism of preprint servers, which is that the information has not been peer-reviewed.
Scientists with the best knowledge haven't decided that this is worthy, that it is ethically sound and well-done research that we should take note of. This is a major, major criticism. And it is absolutely true that there was a preprint, I think it was on Research Square,
about ivermectin, another of Donald Trump's favourite treatments, which doesn't work. This was held up as the criticism of preprints: that we cannot rely on non-peer-reviewed information and must be very, very careful. And you saw on the banner of the medRxiv website that
they say this hasn't been peer-reviewed, and that you should look at it very carefully and decide whether you want to report on it. It turned out that this preprint was made of fabricated, manipulated data, trying to show that ivermectin worked. And it was a master's student who picked it up,
which is a really, really nice story. He was given it as an assignment, read the preprint and thought it looked a bit dodgy, and it turned out that he was absolutely right. So, you might say, we shouldn't trust preprint servers. But what you forget is that it isn't only preprint servers. The best journals in the world for medical science,
the Lancet and the New England Journal of Medicine, published papers that they had to retract very, very quickly. After they had peer-reviewed and published them, it was shown that the studies were based on a database called Surgisphere,
which claimed to have collated information from 194 different settings, analysed the data and put out ground-breaking news about the effects of treatments for Covid. It turned out that this was fraudulent; the database probably never existed.
They had to retract these articles. But when asked whether we could have a look and work out how this terrible thing could have happened, they declined, saying that peer review is unfortunately confidential and they couldn't give us details of what had happened in peer review.
And that's what you get from subscription journals that do not make their information freely available. So I had a little look at a database and website that I'm sure you know about, Retraction Watch. It is absolutely fascinating: Ivan Oransky and Adam Marcus collect and collate retractions
from across the medical and life sciences, and they have a database of Covid retractions, which is up to 260 now. I had a quick look through, and 38 of those 260 Covid retractions
are articles that were originally published on preprint servers. That is probably disproportionate, right? We know that preprints account for only 7% of Covid publications, and this is more than 7% of the retractions. Even so, numerically, most retracted Covid articles
were published in peer-reviewed journals. And Retraction Watch makes the point that journals retract papers either at the authors' request or by the journal's own decision, whereas preprints are usually withdrawn only at the authors' request, because it's already known that the information hasn't been peer-reviewed.
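How disproportionate is "38 of 260"? A minimal Python sketch compares the observed preprint share of retractions with what the 7% preprint share of the literature would predict. The inputs are the talk's figures; `overrepresentation` is a hypothetical helper, and the comparison is deliberately rough, since the 7% comes from the speaker's own database (which covers only some preprint servers) and preprints made up a larger share early in the pandemic.

```python
# Figures quoted in the talk.
TOTAL_RETRACTIONS = 260       # Covid retractions listed by Retraction Watch
PREPRINT_RETRACTIONS = 38     # of which originally posted on preprint servers
PREPRINT_SHARE = 0.07         # preprints' share of Covid publications


def overrepresentation(count: int, total: int, expected_share: float) -> tuple[float, float, float]:
    """Return (observed share, expected count, observed/expected ratio)."""
    observed_share = count / total
    expected_count = expected_share * total
    return observed_share, expected_count, observed_share / expected_share


share, expected, ratio = overrepresentation(
    PREPRINT_RETRACTIONS, TOTAL_RETRACTIONS, PREPRINT_SHARE
)
print(f"preprints: {share:.1%} of retractions, {expected:.1f} expected, {ratio:.1f}x")
print(f"journal articles: {TOTAL_RETRACTIONS - PREPRINT_RETRACTIONS} of {TOTAL_RETRACTIONS}")
```

Preprints make up about 14.6% of the retractions against roughly 18 expected, about twice their share of the literature, while 222 of the 260 retractions are still journal articles: both of the talk's points hold at once.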
So that's something you need to know about retractions. Interestingly, if you look at these retractions on Web of Science, you get a very impressive-looking graph with really long lines. The blue lines are for subscription or subscription-and-hybrid publishers,
and the red ones are for two large open access publishers. But numerically, we're talking about tiny numbers: very, very few papers are actually being retracted out of all of these 300,000. Which brings me to the question:
who was actually publishing all of this information during the Covid pandemic? I'm going to take the timeline back a little, to 2012 here, and we've got all the major publishers, including large open access publishers.
The last two here are the ones published during the Covid era. You can see the ones that are going up or down. The one I want to draw your attention to is this one, highlighted here, which is MDPI.
This line trundles along at about the same pace as all of these others, which are all open access publishers. But suddenly, after 2017, it really, really takes off in the sheer number of articles being published.
I want to go into this in a bit more detail. It is not all to do with Covid, but Covid certainly, I am sure, played a role. I showed you 2020 and 2021; by 2022, MDPI is the third largest publisher
of all publishers in the medical and life sciences, and Frontiers is the sixth largest. These are phenomenal figures for presses that were not very prominent more than 10 years ago.
MDPI themselves tweeted proudly that they published 240,000 articles last year, out of 480,000 submissions, with 738,000 review reports.
That means there are fewer than two review reports for each submitted article, and about half of everything submitted to MDPI gets published. Now, if you submit articles to medical journals when you're trying to publish your research,
your hit rate is rarely better than one in two, so this acceptance rate is higher than most people's hit rates. And you can see on the right-hand side that actually in 2020 and 2021
they've just massively increased the number of journals that they publish. And we'll come back to look at that. So I want to link this, you may think rather uncomfortably, to a definition of what a predatory journal is. And a predatory journal is basically a journal
that exploits the open access model at the expense of scholarship and that deviates from best editorial and peer review practices. And there's often a lack of transparency in what is being submitted, what is being published and from where they solicit articles.
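The MDPI figures quoted earlier can be checked with a few divisions. This Python sketch assumes that the tweeted submission and review counts are 480,000 and 738,000: the transcript's "480" and "738" are read as thousands, which is what the speaker's "about half" and "fewer than two reviews" imply.

```python
# MDPI's tweeted figures, read as thousands (assumption: the transcript's
# "480" and "738" stand for 480,000 submissions and 738,000 review reports).
PUBLISHED = 240_000
SUBMISSIONS = 480_000
REVIEW_REPORTS = 738_000

# Share of submissions that end up published.
acceptance_rate = PUBLISHED / SUBMISSIONS
# Average review reports per submission, averaged over all submissions.
reviews_per_submission = REVIEW_REPORTS / SUBMISSIONS
# Average review reports per published article.
reviews_per_published = REVIEW_REPORTS / PUBLISHED

print(f"acceptance rate: {acceptance_rate:.0%}")
print(f"review reports per submission: {reviews_per_submission:.1f}")
print(f"review reports per published article: {reviews_per_published:.1f}")
```

This gives a 50% acceptance rate and about 1.5 review reports per submission, the "fewer than two reviews" in the talk. Because the per-submission figure averages over every submission, it cannot say how many reviewers a typical accepted paper actually had.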
You may think that's really, really unfair. So maybe we will change the language a little and decide whether we want to call it predatory. Let's call these practices questionable at the very best. This article is called
"Is MDPI a Predatory Publisher?", and this shows the publications, split into the kinds of articles that you can publish in the journals. Green is normal: you submit your article as an original research article
in the normal way. You've got collections and sections. Then you've got this category in red called special issues. This is where the journal says, we're going to do a special issue on COVID. They find a couple of editors, who then have to go and solicit
content for these special issues from their friends and colleagues. What you can see here, in the first year of COVID, is an absolutely extraordinary, massive number of special issues across all MDPI journals.
But our COVID open access database is only about COVID, and we stopped collating at the beginning of March, as I said. This is from a couple of days before we stopped. Here we've got preprints,
and here we've got peer-reviewed articles in green. I clicked through the new uploads to our database, and from page two to page 24, that's more than 220 articles in one day. All of these are from, guess what?
MDPI journals, which you can tell from their single-word journal titles. We've got Pharmaceutics; admittedly, these ones are from the International Journal of Molecular Sciences. But they've got Viruses, they've got Pathogens, they've got Cells.
You can spot these a mile off. One publisher, 220 articles in a single day, and these are all COVID special issues. I don't know what that makes you think. It makes me think that there may be questionable practices in the way these papers are being solicited
and the way they are supposed to be peer-reviewed. MDPI boasts of the speed of its turnaround. I'm sorry, you actually can't read the numbers on here very well.
But between 2016 and 2020, the time to publication and its spread across all of their journals go down massively, to around a month by 2020. This is from submission to publication. If I submit an article to a journal and then get it published,
six months, nine months, at least three months is my normal kind of timeframe. Their average is just over a month. And we compared COVID publications across publishers: Leonie Heron, my postdoc, randomly sampled
10 articles from journals published by each of these publishers and worked out the median, shown by the line, the interquartile range and the range. You can see here that for MDPI it's, I think, 38 days.
Frontiers is next. Then the BMJ Group, which is not an open access publisher, just a normal one, and PLOS, which is an entirely open access publisher but has a much longer time from submission to publication.
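The comparison described here (10 randomly sampled articles per publisher, summarised by median, interquartile range and range) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the team's actual code: the function and class names are hypothetical, and the choice of quantile method is our assumption.

```python
import statistics
from dataclasses import dataclass
from datetime import date


@dataclass
class DelaySummary:
    """Submission-to-publication delays for one publisher, in days."""
    median: float
    q1: float       # lower quartile (bottom of the box)
    q3: float       # upper quartile (top of the box)
    minimum: float
    maximum: float


def delays_in_days(pairs: list[tuple[date, date]]) -> list[int]:
    """Convert (submitted, published) date pairs into delays in days."""
    return [(published - submitted).days for submitted, published in pairs]


def summarize_delays(days: list[float]) -> DelaySummary:
    """Median, interquartile range and range of a sample of delays.

    Uses the 'inclusive' quantile method, which matches R's default
    (type 7) quantile definition. Needs at least two values.
    """
    if len(days) < 2:
        raise ValueError("need at least two delays to compute quartiles")
    q1, _, q3 = statistics.quantiles(days, n=4, method="inclusive")
    return DelaySummary(
        median=statistics.median(days),
        q1=q1,
        q3=q3,
        minimum=min(days),
        maximum=max(days),
    )
```

For each publisher, the ten sampled (submitted, published) date pairs would go through `delays_in_days` and then `summarize_delays`. The five numbers map directly onto the plot's line, box and whiskers. With only ten articles per publisher, the quartiles are themselves quite uncertain, so the plot supports a broad ranking rather than precise differences.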
I didn't read the peer review reports, so I can't say anything about the quality of the peer review. It could have been fantastic. But the suspicion is that, if you can have such large numbers of articles submitted and published within a month, the care taken over the peer review reports,
the revisions and the subsequent publication may be rushed, to say the least, and I would question whether the quality of those articles is something you would want to rely on. So I'm going to spend the last bit of my talk
on some of the other aspects of open science. Since that call by the Wellcome Trust in January 2020 for people to make data and protocols freely available, we have all benefited massively from open data.
You are probably aware of Our World in Data. This is simply a fantastic effort from data scientists at the University of Oxford, who started by scraping the data on the numbers of cases being reported.
They got them from the Johns Hopkins database, which was also collating them, and subsequently from other sources. This is their timeline for Switzerland. You can see our first wave, which we thought was absolutely terrible when we were all in lockdown; then the second wave in the autumn of 2020, for which we were totally unprepared
despite loud pleas for additional measures. And from here on, this was the first of the Omicron waves, which swept through because it was much, much more transmissible. You can do this for any country.
They started off just plotting numbers of cases, but now they plot hospitalisations, deaths, vaccinations and policy responses such as the severity of the restrictions in place. This is Our World in Data.
They do this not only for COVID: the new one at the top of the menu is now monkeypox, so they really get onto things extremely quickly. This is the use of open data for everyone's benefit. The other thing I want to draw your attention to goes back to when I said that it was extraordinary
that the genome had been sequenced in the first couple of weeks of January 2020. Sequencing genomes and tracking them across geography and over time is what fed into the phenomenally fast development of vaccines,
and this sequencing effort has also let us know about new variants, starting in December 2020 when the Alpha variant became known, through to Omicron now, tracking them, showing us what to expect and helping us try to improve our vaccines. This is a project called Nextstrain.
It is an open source project; its proponents include Emma Hodcroft, who works at our institute, Richard Neher in Basel and Trevor Bedford in Seattle. It's really a global effort. As for any disease, every single one of these dots is a sequence,
and there are just phenomenal numbers of sequences, from the very beginning onwards. You can follow them over time, and you can follow them in Switzerland. These are our waves: the initial wild type and a slight variant in Europe, and from here on, waves of variants
as they have affected Switzerland. You can do this for any country. It's absolutely a phenomenal resource and I highly recommend it. What we also know is that there are big holes in Africa: remarkably little is known about what has happened with Covid in Africa.
I've just come back from South Africa. Actually, the measures in South Africa were often stronger than they were here. But there are many countries where we really have no idea what the clinical toll has been. You can see that all these countries in white
were, early on, countries with very, very few genetic sequences. Another amazing piece of open science has been led by Tulio de Oliveira in South Africa: building regional networks of laboratories
that could sequence, locally or regionally, samples coming from other countries in Africa. They have just published an open access paper in Science on the first 100,000 sequences from Africa.
It's absolutely extraordinary. And all of these data are open access: you can use the sequences from anywhere, and you can put these data together to let people know what is going on with Covid where they live. So, to finish off,
I believe that open access and open science really provided us with very, very good early knowledge about how Covid was affecting the world. Preprints have been phenomenally successful in speeding up the availability of data about what works and what doesn't work,
more so at the beginning than now. Unfortunately, open access didn't reduce research waste. We have open databases to register trials and systematic reviews, but it turns out that people will go and do their trial or their systematic review anyway, even when the same question is already being addressed. So that's a bit of a failure of open access.
But I seriously question very large open access publishers like MDPI, particularly MDPI, who it seems have exploited the open access model and the Covid pandemic simply to increase their market share in journals
and make themselves one of the largest publishers there is. And finally, the access we've had to open data, open source genomics and databases has really, really strengthened, you may not believe it, our ability
to combat the spread of SARS-CoV-2. But remember, as I said at the very beginning, the papers that you are able to read now are free to access, and that could end at any moment. So what we want to promote is full open access for research on all emerging infectious diseases.
Thank you very much.