
III. The integrity of published information: Writing a macromolecular structure paper with publBio

Formal Metadata

Number of Parts
15
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
For biological macromolecules, where structures and experimental data have traditionally been deposited in a central archive (the Protein Data Bank), integrating structural data with derivative publications is more complex. IUCr journals have developed an online publication tool that can extract deposited macromolecular data from the archive, and prompt the author for the additional information required for the fullest characterisation of a macromolecular structure determination. Again, the goal is to maximise the integrity of the scientific discussion.
Transcript: English (auto-generated)
Well, first I would like to thank Brian for inviting me here. I'll be talking about some new developments that have taken place at the IUCr office in Chester. I should say that in my second life I'm a section editor of Acta Crystallographica Section F, Structural Biology and Crystallization Communications. What Tony said about chemists not wanting to write in CIF or use publCIF is probably even more true for the biological community. Our main customers, our authors, are biologists and biochemists, and we have to make things even lighter and easier for them.
Now, to give you a little bit of background before I talk about publBio itself, I want to show you some motivation slides on why we started to work on it.
As of Tuesday, there were about 93,000 structures in the Protein Data Bank (PDB), and close to 90% of these are X-ray structures. In 2012, close to 9,000 structures were deposited, which is almost 10% of the total content of the PDB. Even taking into account that 2012 was a leap year, this is more than 24 structures a day, or about one structure added to the archive every hour. Every day, even on Saturdays, Sundays and bank holidays.
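The rates quoted above follow from simple division. A quick check in Python, using the speaker's rounded figures (about 9,000 depositions in 2012, about 93,000 entries in total) as assumed inputs rather than exact PDB statistics:

```python
# Back-of-the-envelope check of the 2012 deposition figures quoted in the talk.
# The inputs are the speaker's rounded numbers, not exact PDB statistics.
DEPOSITED_2012 = 9_000   # "close to 9,000 structures deposited"
TOTAL_IN_PDB = 93_000    # "about 93,000 structures" at the time of the talk
DAYS_IN_2012 = 366       # 2012 was a leap year
HOURS_PER_DAY = 24

per_day = DEPOSITED_2012 / DAYS_IN_2012
per_hour = per_day / HOURS_PER_DAY
share_of_archive = DEPOSITED_2012 / TOTAL_IN_PDB

print(f"{per_day:.1f} structures per day")                          # 24.6
print(f"{per_hour:.2f} structures per hour")                        # 1.02
print(f"{share_of_archive:.1%} of the archive deposited in 2012")   # 9.7%
```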
Now, if you do a text search of the PDB for the phrase "to be published" (and this reiterates what John Westbrook said earlier today in the discussion), it returns close to 20,000 hits. That means that a lot of these structures do not have an associated publication. On a slightly different note, Acta Crystallographica F published 264 crystallization communications in 2012. So there is a large discrepancy between these numbers, and in my eyes, and in the eyes of a lot of other people, this requires some action. We obviously have a backlog in publishing.
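For a single entry, the "to be published" status shows up in its citation data: in PDB mmCIF files, an entry without a publication typically gives `'To be published'` as the journal of its citation (`_citation.journal_abbrev`). The sketch below checks for that. It is a simplified reader written for illustration (the function names are hypothetical, and `loop_` blocks are not handled), not the search the speaker ran on the PDB.

```python
import shlex

def citation_items(mmcif_text: str) -> dict:
    """Collect single-line `_citation.*` items from mmCIF text.

    Simplified: loop_ blocks and semicolon-delimited text are ignored, so
    only citations written as plain key-value pairs are seen.
    """
    items = {}
    for line in mmcif_text.splitlines():
        if not line.startswith("_citation."):
            continue
        try:
            parts = shlex.split(line)  # removes the single/double quotes around values
        except ValueError:             # quoting that shlex cannot parse: skip the line
            continue
        if len(parts) == 2:
            items[parts[0]] = parts[1]
    return items

def is_to_be_published(mmcif_text: str) -> bool:
    """True if the entry's citation is still marked 'To be published'."""
    journal = citation_items(mmcif_text).get("_citation.journal_abbrev", "")
    return journal.strip().lower() == "to be published"
```

Counting such entries across the whole archive is what the PDB's own text search does for you; this only shows where the flag lives in one file.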
A lot of these unpublished structures, I should say, come from the so-called structural genomics initiatives, where the production of structures was put in the foreground and publication was sort of put aside; publishing takes time, it's tedious, and people then just don't do it. This obviously also leads to a loss of information, because what is in the database is not everything that a potential user or reader, or somebody who wants to do some work on the same structure, needs to know. This is, of course, one of the reasons why we embarked on the publBio project.
As I mentioned already, in principle we need two types of publications: one I call a crystallization communication and the other a structural communication. You may wonder whether the first one is just a step on the way to the second one. This is certainly true, but as long as authors still fancy this type of publication and we still get enough requests to publish such papers, we will probably continue to do this.
Now, the idea of publBio is to help authors write a publication effectively and quickly. Ideally, an author should be able to finish a manuscript in a day or so.
We would also like to facilitate the editing and the refereeing. We want to capture at least some of the unpublished structures in the PDB. We want to ensure that crystallization information is not lost, and ideally this information should be minable, which is where CIF comes into play: if we capture information in tabular form, we should in principle be able to store it in CIF or mmCIF format.
So what we did is create publication templates with tables that should enable us to capture the most relevant information in tabular form. These tables can be populated automatically from an existing PDB or mmCIF file, which the author has available when the structure has already been deposited with the PDB. Alternatively, one can start a publication project from scratch and fill things in by hand from the lab book. The system should also be linked to the IUCr submission system, so that a manuscript can be piped quickly to the editorial office in Chester and on to the co-editor, the referees and so on.
Right now, there are two versions of publBio. One we call publBio Annotator: this is a tool for editing mmCIF, so you can read in an mmCIF file, or start from scratch, and write out an mmCIF file. The other is publBio Publisher, a tool for writing and submitting articles, and this is what I want to talk about today. I cannot really walk you through the whole thing, but if you're interested, there's an IUCr stand in the exhibition area during this meeting, and I'd be happy to show you more.
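The prefilling of template tables from an mmCIF file can be pictured as a lookup from table rows to mmCIF data items, with anything absent left for the author. A minimal sketch under stated assumptions: the `_exptl_crystal_grow.*` item names are standard mmCIF, but the row labels, the mapping and the function names are illustrative, not publBio's actual code.

```python
import shlex

# Illustrative mapping from rows of a crystallization table to mmCIF items.
# The item names are standard mmCIF; the row labels are assumptions.
CRYSTALLIZATION_ROWS = {
    "Method": "_exptl_crystal_grow.method",
    "Temperature (K)": "_exptl_crystal_grow.temp",
    "pH": "_exptl_crystal_grow.pH",
    "Details": "_exptl_crystal_grow.pdbx_details",
}

def read_items(mmcif_text: str) -> dict:
    """Read single-line `_category.item value` pairs from mmCIF text.

    Simplified: loop_ tables and semicolon-delimited text are skipped,
    and '?' / '.' (unknown / not applicable) are treated as missing.
    """
    items = {}
    for line in mmcif_text.splitlines():
        if not line.startswith("_"):
            continue
        try:
            parts = shlex.split(line)  # strips the quotes around values
        except ValueError:             # quoting that shlex cannot parse
            continue
        if len(parts) == 2 and parts[1] not in ("?", "."):
            items[parts[0]] = parts[1]
    return items

def prefill_table(mmcif_text: str, rows: dict) -> dict:
    """Fill each table row from the file; None marks a row that the
    author still has to complete by hand (e.g. from the lab book)."""
    items = read_items(mmcif_text)
    return {label: items.get(tag) for label, tag in rows.items()}
```

Starting "from scratch" then simply means every row begins as `None`.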
All right, let's get started. When you go to publbio.iucr.org, you can log in using your IUCr ID. If you do this for the first time, you are faced with an empty list of projects, and you will have to start a new project. If you have used it before, you see a list of the projects that you have already worked on. Let's pretend we want to start a new one, which is done here. You can enter a PDB code, upload an mmCIF or PDB file stored on your hard disk, or write an article without a data source. Let's assume we are working on a structural communication, which uses a template that looks like this. It already looks almost like the structure of a paper.
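The three ways of starting a project (a PDB code, an uploaded mmCIF or PDB file, or no data source) can be sketched as a small dispatch step. This is an illustration only: the `Source` names, the file-extension rules and `classify_source` are assumptions, and the download URL is the RCSB file server, which may not be what publBio itself uses.

```python
import re
from enum import Enum
from pathlib import Path
from typing import Optional

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

class Source(Enum):
    PDB_CODE = "PDB code"
    MMCIF_FILE = "mmCIF file"
    PDB_FILE = "PDB file"
    NONE = "no data source"

def classify_source(value: Optional[str]) -> Source:
    """Decide how a new project would be seeded (hypothetical helper)."""
    if not value:
        return Source.NONE
    if PDB_ID.fullmatch(value):
        return Source.PDB_CODE
    suffix = Path(value).suffix.lower()
    if suffix in (".cif", ".mmcif"):
        return Source.MMCIF_FILE
    if suffix in (".pdb", ".ent"):
        return Source.PDB_FILE
    raise ValueError(f"unrecognised data source: {value!r}")

def mmcif_download_url(pdb_id: str) -> str:
    # RCSB PDB file server; the entry would then be read like an uploaded file
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.cif"
```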
If you read in a PDB file, you are immediately presented with a page like this. It has already populated the title, taken from the PDB or mmCIF file, the author list and the keywords, and I will show you a few more things that have already gone into this manuscript from the header of the PDB or mmCIF file. All these sections are text sections, so you can just click anywhere in the text and start editing.
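In a PDB-format file, the title, author list and keywords sit in the `TITLE`, `AUTHOR` and `KEYWDS` header records, with the text starting in column 11 and continuation numbers in columns 9-10. A minimal sketch of pulling them out (the function name is hypothetical; mmCIF files carry the same information in items such as `_struct.title`):

```python
def read_pdb_header(pdb_text: str) -> dict:
    """Collect TITLE, AUTHOR and KEYWDS from a PDB-format header.

    Simplified sketch: continuation lines are joined with a single space,
    which suits ordinary word breaks but may add a stray space if a word
    was split across two lines.
    """
    pieces = {"TITLE": [], "AUTHOR": [], "KEYWDS": []}
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name occupies columns 1-6
        if record in pieces:
            # text field: columns 11-80 (Python slice 10:80)
            pieces[record].append(line[10:80].strip())

    def comma_list(parts):
        joined = " ".join(parts)
        return [item.strip() for item in joined.split(",") if item.strip()]

    return {
        "title": " ".join(" ".join(pieces["TITLE"]).split()),
        "authors": comma_list(pieces["AUTHOR"]),
        "keywords": comma_list(pieces["KEYWDS"]),
    }
```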
If you have some text written in a Word document or elsewhere, you can also just cut and paste it in; that works as well. You can add authors by typing in, for instance, their IUCr IDs, which retrieves all the author details from the World Directory. Also, if you have used some authors in a previous project, you can retrieve their information into a new project so that all these fields are populated easily.
Each of the sections (synopsis, abstract, introduction, materials and methods and so on) gives a short description of what is required. There is still some work to be done here: we would like to guide the authors by asking specific questions that they may want to answer in each section. Up here, publBio counts the words that have been written, and this section turns red when the word limit is exceeded. Currently we do not enforce any word limits, but we warn authors if they write too much.
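The word counter described above warns but never blocks. A sketch of that behaviour, assuming hypothetical per-section limits (the talk gives no actual values) and a simple whitespace-based word count:

```python
import re
from typing import Tuple

# Hypothetical advisory limits; the talk gives no actual values.
SOFT_LIMITS = {"synopsis": 50, "abstract": 200}

def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(re.findall(r"\S+", text))

def check_section(name: str, text: str) -> Tuple[int, bool]:
    """Return (word count, warn). `warn` is True when the section exceeds
    its limit, i.e. when the counter would turn red; text is never rejected."""
    count = word_count(text)
    limit = SOFT_LIMITS.get(name)
    return count, limit is not None and count > limit
```

Sections without an entry in the table simply never trigger a warning.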
The tables I mentioned are prefilled from data in the mmCIF or PDB file, or whatever other source the user started from. For instance, this is Table 1, titled Macromolecule Production Information. What we would like to know is the source organism, the DNA source, the expression host and, most importantly, the complete amino-acid sequence of the construct produced. All of this is filled in automatically if it is available; if not, the authors are asked to fill it in. Currently, the discussion centres on which of these table entries have to be made mandatory and for which ones we can be a bit more lenient, which is why I italicized this one here. The final layout of the table is not yet fixed. The same applies to Table 4, structure refinement.
All of the relevant numbers are put into the table automatically. Some of the numbers are calculated or derived from values in the mmCIF file and are not necessarily correct, because there might be errors. These numbers are flagged in bold, and the authors are asked to check them, tick them off and confirm that they are indeed correct and can be published in this form.
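The two table rules just described, missing entries that the author must supply and derived numbers that the author must tick off, can be sketched as a simple checklist. Everything here is illustrative: which entries are mandatory is exactly what the talk says is still under discussion, and the test-set percentage (which could be computed from mmCIF items such as `_refine.ls_number_reflns_R_free` and `_refine.ls_number_reflns_obs`) is just one assumed example of a derived value.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TableEntry:
    label: str
    value: Optional[str] = None
    mandatory: bool = False  # which entries are mandatory is still being discussed
    derived: bool = False    # calculated by the tool, shown in bold
    confirmed: bool = False  # author has ticked the derived value off

def percent_test_set(n_free: int, n_obs: int) -> str:
    """Example of a derived number: share of reflections used for R-free."""
    return f"{100 * n_free / n_obs:.1f}"

def outstanding(entries: List[TableEntry]) -> List[str]:
    """List what still needs the author's attention before submission."""
    todo = []
    for entry in entries:
        if entry.mandatory and entry.value is None:
            todo.append(f"missing: {entry.label}")
        elif entry.derived and not entry.confirmed:
            todo.append(f"please confirm: {entry.label}")
    return todo
```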
Some templates, or at least ideas for figures, are provided. Figures can be uploaded, although of course they have to be prepared beforehand. Figures already in the paper can be removed, and one can move figures around and define where they should appear in the paper. Maybe most importantly, we try to give the authors some ideas of what they might want to present as figures. For instance, if they are working on an oligomeric protein, very often the first figure they want to show is the protein oligomer with the different subunits highlighted in different colors. Currently we allow a maximum of three figures; if authors want more, we can always put them into the supplementary material. Captions can, of course, be added to the figures. So now, hopefully, the paper is finished.
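Reordering figures and enforcing the three-figure limit can be sketched as follows. The limit comes from the talk; the `Figure` type, the helper names and the "Figure S1" numbering for supplementary material are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_MAIN_FIGURES = 3  # limit quoted in the talk

@dataclass
class Figure:
    filename: str
    caption: str = ""

def move_figure(figures: List[Figure], old: int, new: int) -> List[Figure]:
    """Return a copy of the list with one figure moved to a new position."""
    reordered = list(figures)
    reordered.insert(new, reordered.pop(old))
    return reordered

def label_figures(figures: List[Figure], limit: int = MAX_MAIN_FIGURES) -> Dict[str, str]:
    """Number figures in order; those beyond the limit go to the
    supplementary material (the 'S' numbering is an assumption)."""
    labels = {}
    for i, fig in enumerate(figures[:limit], start=1):
        labels[fig.filename] = f"Figure {i}"
    for i, fig in enumerate(figures[limit:], start=1):
        labels[fig.filename] = f"Figure S{i}"
    return labels
```

Because the order decides what stays in the main text, moving a figure to the front can push another one into the supplementary material.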
Once the author has provided the text, a review PDF can be generated for proofreading, and then the project can be shared with the co-authors. You can put in their e-mail addresses, although they should actually already be there. Then, with a single click, the manuscript goes out to all of your co-authors, who are asked to read it and provide their comments, again online in the system. These come back to you, and you keep control over the manuscript: you decide which comments or modifications to accept and which not.
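The co-author round keeps the corresponding author in control: only accepted changes reach the manuscript. A minimal sketch of that rule, assuming a hypothetical representation of suggestions as "replace this text with that" edits per section (publBio's real data model is not described in the talk):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Suggestion:
    coauthor: str
    section: str
    old: str
    new: str
    accepted: bool = False  # decided by the corresponding author

def apply_accepted(manuscript: Dict[str, str], suggestions: List[Suggestion]) -> Dict[str, str]:
    """Return a revised copy of the manuscript with only the accepted
    suggestions applied; rejected ones leave the text unchanged."""
    revised = dict(manuscript)
    for s in suggestions:
        text = revised.get(s.section, "")
        if s.accepted and s.old in text:
            revised[s.section] = text.replace(s.old, s.new, 1)
    return revised
```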
Finally, when you're done and have incorporated all the suggestions from your co-authors, you click on this button and the manuscript goes automatically into the submission system of Acta Cryst. F. You then enter the submission system, which some of you are probably familiar with, at the stage where you can choose the co-editor you would like to handle your paper. From then on it goes the usual way: the co-editor invites referees, and the referee reports come back to you. You can then go back to publBio, make the necessary changes to your article and resubmit it as a revised version.
So, in summary, we have defined the content and the layout of two structured article types. The content is more or less standard, so we know what we would like to have, and we tell the authors what that is. We would like the most relevant information in tabular rather than text form, because then we can easily harvest it and produce mmCIF output, which we can then use to complement the mmCIF that is already in the PDB, for instance.
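Harvesting a table into mmCIF means writing each entry as a `_category.item value` pair with CIF quoting. A simplified sketch, assuming the table has already been mapped to item names (the helper names are hypothetical, and the quoting covers the common CIF cases rather than every corner of the syntax):

```python
def cif_value(value) -> str:
    """Format a Python value as a CIF/mmCIF value (simplified quoting rules)."""
    if value is None:
        return "?"                          # unknown / not provided
    text = str(value)
    if "\n" in text:
        return f"\n;{text}\n;"              # multi-line semicolon-delimited field
    reserved = ("data_", "loop_", "save_", "global_", "stop_")
    needs_quotes = (
        text == ""
        or any(ch.isspace() for ch in text)
        or text[0] in "_#$'\"[];"           # characters special at the start of a value
        or text.lower().startswith(reserved)
        or text in ("?", ".")               # a literal ? or . must not read as null
    )
    if not needs_quotes:
        return text
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    return f"\n;{text}\n;"

def to_mmcif(category: str, items: dict) -> str:
    """Write one category as aligned key-value pairs."""
    width = max(len(f"_{category}.{key}") for key in items)
    lines = []
    for key, value in items.items():
        tag = f"_{category}.{key}"
        lines.append(f"{tag:<{width}} {cif_value(value)}")
    return "\n".join(lines)
```

Usage: `to_mmcif("exptl_crystal_grow", {"method": "VAPOR DIFFUSION, HANGING DROP", "temp": 293})` yields two aligned lines, with the method quoted because it contains spaces.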
These structured article types should be easy to work with, and they should remain linked to the IUCr submission system. So, if you like, you can try this out yourself. If you like it, tell your friends and colleagues; if you think it's crap and needs serious mending, then tell me. With that, I move on to the last slide. A number of people have been involved.
First and foremost, Simon Westrip from the IUCr office, who is doing most of the technical work behind the scenes; without him none of this would work. John Westbrook from the PDB, who is in the room, was very much involved at the beginning with all the mmCIF definitions. Janet Newman, one of our co-editors, was also instrumental in providing feedback, most importantly in defining the content of the table that describes the protein crystallization. Louise Jones and Howard Einspahr from the Acta F team have also contributed significantly. And with this I would like to close and thank you for your attention.