
III. The integrity of published information: Publication of small-unit-cell structures in Acta Crystallographica

Formal Metadata

Title
III. The integrity of published information: Publication of small-unit-cell structures in Acta Crystallographica
Title of Series
Number of Parts
15
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Structural journals published by the IUCr have an efficient workflow built around the CIF standard. For small-unit-cell structures, authors submit their articles in this format, building on files created directly by their structure solution/refinement software. The processed experimental data from which the structure has been determined (structure factors, Rietveld profiles) are also uploaded in this format, allowing stringent technical peer review of the quality of the structural modelling. All published structures are accompanied by the underlying data that would permit their redetermination, and can be explored in the online publication through three-dimensional visualisation and analysis tools.
Transcript: English (auto-generated)
I'd just like to welcome everybody back from lunch, and I do realise, having been here for the last
few days, that this is the time that most people are likely to fall asleep, but I'll try and keep you awake as we go. I've been working at the IUCr now for 20 years, so I arrived at the time when CIF was just really beginning to take hold, so I can't actually remember a time before
CIF. I work for the International Union of Crystallography. As you will know, we're an international scientific union and we currently publish nine research journals. The three that I'm most interested in for these purposes are Acta B, C and E, where we do a lot of small-molecule publishing,
and the driving force, certainly in C and E, is CIF. Also, it's our remit to promote the standard crystallographic data file, or CIF. Now, those of you who saw John Helliwell's talk and Brian McMahon's this morning will
have seen this diagram already. It's a schematic of how data flows through the crystallographic process. I'm going to be concentrating here on the ones in the thick lines, from data reduction through to publication and dissemination to databases and the like. I've
actually added one important line here after John Helliwell's talk this morning, where we're talking about raw data being deposited and linked to from the final published article, and this is the way things are going. Quite a scary diagram; I'll
try and give you an overview of it. This is how the various files flow in the publication system. Start with your experiment: you've got your raw data, which could be in imgCIF
format. Go through a process of data reduction, get structure factors. Structure refinement, out comes the CIF. So the CIF and structure factors go to the author. The
author adds any other information into the CIF and, along with the structure factors, submits those to the IUCr in the first instance for validation. If they pass through the validation, which isn't just structural validation, it's also duplication checking to see if the structure
has been published before, if it passes through all that, it comes through to us for peer review. So we've still got the structure factors and the CIF, which are used as part of the review process, but we also form a PDF preview of the article, various HTML files including checkCIF reports, which are validation reports, duplication reports and various other
things. Pass through the peer review system, accepted for publication, come through for technical editing. At this point, the editing is done on the CIF. We don't edit the structure
factors, but the structure factors come through and the CIF is technically edited, and ultimately we go to publication, where the article of record is an SGML file. So all the information in the CIF is converted to SGML. From publication, all that data is
then disseminated to various chemistry databases such as the CCDC or, in the case of inorganics, the ICSD, and various bibliographical databases, and on to our website, where we publish the
traditional PDF of the article and various HTML files, and the CIF and the structure factors are freely available. A quick overview of the early development of CIF. Again, you've seen some of this, and some of this is actually in the conference handout. There's a small
section for our purposes, my purposes. Before CIF, David Brown worked on the Standard Crystallographic File Structure. A lot of the work that was done there was fed in to CIF. As early as 1990, before we even had CIF, we actually checked the data coming to our journals using
various suites of programs such as Xtal and NRCVAX. We keyboarded all the data and did that by hand. In 1991, the first CIF dictionary was published and, in-house, Brian McMahon
developed techniques for processing this CIF data to a typeset article. Within a year, we had our first unsolicited CIF submission, which we weren't expecting. This speeded up the process and we really had to do something to work with these automatically. Within
two years, the workflow was in place so that we could guarantee faster processing for any article that was submitted in CIF format. By 1996, Acta C became a CIF-only journal. You could only submit by CIF. Five years from the first CIF to mandatory; it's quite
quick. Benefits of CIF for us as publishers: there's the submission and deposition of structural and experimental data in a standard format, obviously a good thing to start with. It also, and this was quite forward-thinking at the time, contained the framework for
publishing an article directly. There wasn't just the structural and experimental data in there; there's also text, or the ability to put text in. A standard format also allowed the possibility of automated validation checking against a set of known standards and
also duplication checking against relevant structural databases. This could all be automated now. Collecting all this data, the CIFs and the structure factors, also helps in fraud detection, which I'll come to a little bit later. It has been mentioned this morning as well. What's validation? Well, it's a comparison of your data against a set of test criteria.
Simple things like: are the usually expected data and information present? Have you got your cell parameters? Well, of course you will have your cell parameters, but it'll check that. Are related parameters consistent? Does the cell volume match the cell parameters?
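As an illustration of the two simple checks just mentioned (are the expected items present, and does the reported cell volume agree with the cell parameters), here is a minimal sketch. The data names are those of the CIF core dictionary; the helper functions and the 0.1% tolerance are assumptions made for this example and are not checkCIF's actual code or criteria.

```python
import math

# Core CIF data names for the unit cell; everything else here is hypothetical.
CELL_ITEMS = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
    "_cell_volume",
)


def cif_number(text):
    """Convert a CIF numeric string such as '10.234(2)' to a float, dropping the s.u."""
    return float(text.split("(")[0])


def cell_volume(a, b, c, alpha, beta, gamma):
    """General (triclinic) cell volume; lengths in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def check_cell(items, rel_tol=1e-3):
    """Return a list of alert strings for a dict of CIF data name -> string value."""
    missing = [name for name in CELL_ITEMS if name not in items]
    if missing:
        return ["missing item: " + name for name in missing]
    values = [cif_number(items[name]) for name in CELL_ITEMS]
    reported = values[6]
    calculated = cell_volume(*values[:6])
    # Assumed tolerance: flag a relative disagreement larger than rel_tol.
    if abs(calculated - reported) > rel_tol * calculated:
        return ["cell volume %.1f does not match calculated %.1f" % (reported, calculated)]
    return []
```

An empty list plays the role of the "clean bill of health"; any strings returned correspond to alerts that an author would either resolve or explain.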
It doesn't always, or didn't. It does now, but there was a time when that wasn't the case. Is the space group correct? Has the refinement converged? Are the assigned atom types correct? And is the structure reasonable? And, going back to duplication, has it been determined
before? The automation of validation allows authors to get instant, anonymous feedback. We have the checkCIF service on the web, which anybody can go and use. It's freely available, and you get, when I say instant feedback, some large structures
will take a few seconds to run, but it's fairly instantaneous. It allows authors to detect and hopefully fix problems prior to submission, which can lead to fewer and shorter revision cycles. There's a consistent set of applied criteria. These are published,
they're freely available, there are no hidden hurdles that you have to jump. It allows the editors and referees to focus on the science, and the benefit, which is what everybody wants, is faster publication times. Now, Tony Linden's going to talk after me about validation
of CIFs, so I'll just quickly skip through this slide about what validation software does, but I will draw your attention to these points at the bottom. It isn't intended
as a hurdle to make things difficult. We don't want to hinder the publication of correct results. It's a set of criteria to possibly, sometimes, make you think again: oh, maybe that hasn't been done quite correctly, or yes, it has been done correctly, and this is why it is correct. It allows you to focus on the science and show where the good science
is. This is the workflow we see with this. Obviously, do your sample preparation, data collection, solution and refinement, and your structure analysis. You then prepare a CIF and your structure factors or, in the case of powder diffraction, a powder profile.
You send those to checkCIF. One of two things can happen. One, it gives you a completely clean bill of health, or two, it has alerts that you may want to investigate further. So, taking the case where there are alerts, you can either resolve them, or you could
complete what we call a validation response form, which is your explanation of why there appear to be alerts there. If you resolve the alerts, you send it to checkCIF again, go back through the cycle, no longer any alerts, go on to publication. Or, in the worst case, something is seriously wrong, and you may have to go back to do the structure analysis,
or the solution and refinement, or possibly even the data collection again. Otherwise, you submit to the journal and go through the review process. As I said earlier, CIF contains structural data and also textual data. This is a typical set of structural data:
cell parameters, weights, symmetry operations. In combination with the textual information, you can go straight to a preprint of the article. What tools do we make available? Well, Simon Westrip has written this program called publCIF, which is downloadable,
freely available, and it's essentially a WYSIWYG editor. On the left-hand side, you have the CIF. On the right-hand side, you have a representation of the article, and you can make changes in either, and they're reflected on the other side. It's a very useful, very
well-used utility. On the web, we also have printCIF, which allows you to have a PDF version of the paper as it would appear in the final journal. Again, upload the CIF. But it allows you to do more than just have a preprint. You can highlight,
you can go to the bond table, highlight a bond, and it'll fire up Jmol, and you can investigate further. You can also do that within the text of the article: highlight a bond within the text, it'll find it, fire up Jmol, and you look at it further.
We also make available the Enhanced Figure Toolkit, which again has Jmol behind it. This is obviously a protein structure rather than a small molecule. It'll work happily for either. This allows you to prepare an animated, well, sorry, it allows you to
form a static image of however you want to represent your structure, but also dynamic images as well. And the journals will publish dynamic images quite happily, and once you've prepared your image in this way, you can upload it directly to our submission
system. Everything's carried through. You don't need to worry about it anymore, and it'll be published in that form. checkCIF. Over the last two years, it's been upgraded so that you can upload your CIF and structure factors, and it'll
just run the checkCIF and PLATON suite to give you a validation report on your CIF in this form, with a summary at the top and any alerts, any possible validation
issues it may find. And this is fully linked to files which give you explanations of all the tests and everything that might be going on. There's also a tool we make available whereby you can upload a CIF and create a table in RTF form, which you can use
for any purpose. You can load it into a Word document. It's a nice way of formatting all the data you've got in your CIF. People have used the RTF in their theses. It's a very useful tool. Now, going back to what was mentioned earlier about fraud. Between 2007 and 2009, we published in Acta Cryst. E of the order of a hundred fraudulent
structure determinations. How did we find out? Well, the nature of the fraud only became apparent when we were able to correlate the different structure factor files. Ton Spek,
who's here, did this work, and I'll show you an example of two uncorrelated structure factor files. This is the kind of plot you'll get. No correlation at all. But if you take
a fraudulent structure and the genuine structure that it's been derived from, the two structure factor files you get there, to the human eye, are completely different. But when you correlate them using PLATON, you get something like that. Clearly, that kind of graph should set off alarm bells that something is going on. And
in this case, this is what happened. Now, because at submission time we take a CIF and a structure factor file, and with the database we have, over the last 15 years, of all the CIFs and structure factor files we've had, we can automatically
detect this. If this happens again, from the last two years onwards, we should be able to detect it automatically. This can't be said of all publishers, because not everybody takes structure factors. This slide has been shown this morning and I won't
reiterate it. It's why we publish data. John Helliwell gave a very good summation of this and I don't want to confuse matters. Publishing the data. Here we have a figure that was prepared using that Enhanced Figure
Toolkit I showed you earlier. Now, this is the author's view, the view the author wanted the reader to see. And he's highlighting in here a distance between two
atoms, and they also provided this view. They blew it up and went in and wanted to demonstrate this. The beauty of this particular system is that you don't have to look at what the author wants you to look at. You're free to explore
completely using the Jmol engine behind this, and you can look at the view you want to look at. Quite powerful. Also, something that I think Brian Toby talked about this morning, pdCIF. What we've got here is a tool with which the reader can
generate a predicted powder diffraction pattern for any CIF. This isn't a powder CIF per se, so there isn't the experimental data on top of it, but this will generate
the predicted powder pattern for any of the molecules we have. And again, you can zoom in and save a production-quality image. Now, this is my last slide. Other things: you can visualise crystallography and chemistry here. So we've got direct access
to the data. So again, you're not looking at static data, you're not looking at a view that an author wants you to see. You can go in and play with this yourself. So for any published structure, if you click on 3D view, you'll get something like this. So you've got a 2D representation and a 3D representation of the molecule. You can
choose ball and stick, ellipsoids, molecule, unit cell, whatever. You can collapse that down to predict. So you can try and automatically turn the 3D into 2D, and then a unit cell view of this. And I think I've just finished on time. That's not bad because
my clock that I've got here has actually stopped.
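
The structure-factor comparison described in the fraud discussion above can be illustrated with a short sketch. It assumes each structure factor file has already been read into a dictionary mapping Miller indices (h, k, l) to an observed intensity, and it computes a plain Pearson correlation over the reflections the two data sets share. This is only an illustration of the idea; it is not the algorithm implemented in PLATON, and the function name and input format are hypothetical.

```python
import math


def correlate_reflections(refl_a, refl_b):
    """Pearson correlation of intensities over reflections common to both sets.

    refl_a, refl_b: dict mapping (h, k, l) tuples to observed intensities.
    """
    common = sorted(set(refl_a) & set(refl_b))
    if len(common) < 2:
        raise ValueError("need at least two common reflections")
    xs = [refl_a[hkl] for hkl in common]
    ys = [refl_b[hkl] for hkl in common]
    n = len(common)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("intensities are constant; correlation undefined")
    return sxy / math.sqrt(sxx * syy)
```

Two independent data sets would be expected to give a coefficient well below 1, whereas a data set derived from another (for example by rescaling) gives a value close to 1, which is the kind of signal that should set off alarm bells.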