The NIH pilot project 'Integrated resource for reproducibility in macromolecular crystallography' will create a web-based archive of diffraction images collected from macromolecular samples around the world. The resource will enhance and sustain the macromolecular diffraction data that are the primary data source for the macromolecular atomic coordinates in the Protein Data Bank (PDB). The project will develop tools that extract metadata either from the images alone or from a combination of the diffraction images and information obtained from the corresponding PDB deposit. All of the metadata needed for automatic determination and re-determination of macromolecular structures will be collected. The project currently holds more than 1500 data sets and has a preliminary system for extracting certain types of metadata. The data mining tools being developed will allow analysis of single experiments, as well as of sets of experiments performed using various synchrotron and home-based sources. Diffraction data sets and metadata will be available from the project's website at http://www.proteindiffraction.org, or through a link on a PDB deposit's page on the RCSB PDB website. This talk will present initial results of data mining performed on the archive.