Never before has it been so easy and inexpensive to gather, as well as generate, massive amounts of data. Often, data are discretized in space and time, naturally leading to multi-dimensional arrays. In fact, arrays play a core role in most domains of science, engineering, and business: generally speaking, spatio-temporal sensor, image, timeseries, simulation, and statistics data. This raises the need for flexible, scalable, and open services to replace the bespoke silo solutions that have prevailed in the past.
Traditional databases have been successful due to their flexibility (through query languages) and scalability (through manifold optimizations and parallelization in the server); however, they unfortunately do not support massive arrays. This is currently being remedied within ISO, where SQL/MDA ("Multi-Dimensional Arrays") is in an advanced stage and likely to be adopted in summer 2017. SQL/MDA adds declarative array definition and operations to SQL. Not only does this pave the way for powerful services; perhaps even more importantly, it allows, for the first time, integrating data and metadata into the same archive, even in one and the same query. As such, SQL/MDA will be a game changer in data services, and not only for science and engineering at large.
We present the concepts and rationales, as well as the open-source technology rasdaman ("raster data manager"), which is serving as the blueprint for MDA.
We have learnt to live with the pain of separating data and metadata into non-interoperable silos. For metadata, we enjoy the flexibility of databases, be they relational, graph, or some other NoSQL. In contrast, for the data themselves users still "drown in files", an unstructured, low-level archiving paradigm. It is time to bridge this chasm, which once was technologically induced but today can be overcome.
One building block towards a common re-integrated information space is to support massive multi-dimensional spatio-temporal arrays. These "datacubes" appear as sensor, image, simulation, and statistics data in all science and engineering domains, and beyond. For example, 2-D satellite imagery, 3-D x/y/t image timeseries and x/y/z geophysical voxel data, and 4-D x/y/z/t climate data contribute to today's data deluge in the Earth sciences. Virtual observatories in the Space sciences routinely generate petabytes of such data. Life sciences deal with microarray data, confocal microscopy, and human brain data, which all fall into the same category.
The ISO SQL/MDA (Multi-Dimensional Arrays) candidate standard is extending SQL with modelling and query support for n-D arrays ("datacubes") in a flexible, domain-neutral way. This heralds a new generation of services with new quality parameters, such as flexibility, ease of access, embedding into well-known user tools, and scalability mechanisms that remain completely transparent to users. Technology like the EU rasdaman ("raster data manager") Array Database system can support all of the above examples simultaneously, with one technology. This is practically proven: as of today, rasdaman is in operational use on hundreds of terabytes of satellite image timeseries datacubes, with transparent query distribution across more than 1,000 nodes.
Therefore, Array Databases offering SQL/MDA constitute a natural common building block for next-generation data infrastructures. As initiator and editor of the standard, we present principles, implementation facets, and application examples as a basis for further discussion. Time permitting, we will present live demos from services exceeding 20 TB of "datacubes".