Building cloud-based data services to enable earth-science workflows across HPC centres
Formal Metadata
Title: Building cloud-based data services to enable earth-science workflows across HPC centres
Title of Series: FOSDEM 2020 (talk 355 of 490)
Number of Parts: 490
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/47250 (DOI)
Language: English
Transcript: English (auto-generated)
00:07
So we're starting the last session. John is gonna talk about cloud-based data services. And after that, we will do a little bit of cleanup. So if you see anything lying around, just take it with you,
00:21
or throw it here in the garbage bin. Thank you, John, for the talk now. Cheers, so thanks for the invitation. So today I'm going to talk about building cloud-based data services to enable earth science workflows across HPC centers. I'm based at ECMWF, and this is part of a project which is funded by Horizon 2020 called HiDALGO.
00:42
So an overview of what I'll talk about. I'll give a brief introduction to ECMWF, who we are and what we do. I'll talk about the data challenge that we're currently facing and ways that we want to mitigate that. And then I'll talk about HiDALGO and ECMWF's role in HiDALGO and what that means. So ECMWF stands for
01:01
the European Centre for Medium-Range Weather Forecasts. It was established in 1975. It's an intergovernmental organization, and it's also an NWP, or Numerical Weather Prediction, operational centre. I'll talk a little bit about what that means in a moment. It also supports national weather services through its output.
01:21
It's also a research institute, and we also have Copernicus services. We run two Copernicus services, C3S and CAMS, and we also support, then, an additional service called CEMS. We're based in Reading in the UK, and we're currently finalizing our new data center in Bologna in Italy. Our HPC is currently located in our center in Reading,
01:42
but it will move. So our new HPC will be installed in Bologna this year, and we hope to have that operational by next year. So who are we, and what do we do? As the name suggests, medium-range weather forecasting is our core focus, and we're an operational center for that. So that means that we run a global high-resolution model,
02:03
so between nine kilometers and 18 kilometers, depending on whether it's the ensemble or the high-res run, on a production schedule of six to 12 hours. Medium-range, in this case, means weather forecasts out to day 10 and up to day 15. And then we also do slightly longer time-scale weather forecasts
02:20
on monthly and seasonal time scales. What we don't do are short-range weather forecasts, so this is what you'll be familiar with from your own national weather services. They focus on those, and that's their area of expertise. They're usually running downscaled, very high-resolution regional models on a much more frequent production schedule. But the output of our global model is used as the input
02:42
for a lot of these regional models, at least within Europe. What we also don't do is climate prediction, CO2 doubling, and so on; that is rather for research. So as I mentioned, we have a time-critical component: twice per day, we have two main runs, at 00 and 12 UTC. We have HRES, which is the high-resolution run,
03:02
at nine-kilometer resolution, and we have the ensemble run, which is at 18 kilometers. We're producing about 100 terabytes per day at the moment from those model runs, we have about 85 million products, and we also have real-time dissemination to many destinations worldwide.
03:20
We have non-time-critical components, and so we have also research within the center, which is looking at continuously then updating and improving the model. And finally, we have the largest meteorological archive of its type in the world. So it's currently in excess of 300 petabytes. We have 5,000-plus daily active users of that archive,
03:41
and we're adding, at the moment, about 250 terabytes per day, and that's accelerating. Our facility: at the moment, we have two Cray HPCs. They're identical for redundancy purposes, so that if anything happens during the production of a run with one HPC, we can relatively easily switch
04:00
to the second as a backup. At the same time, we're also utilizing the two as much as we can for research and for other purposes. They're currently in the top 50 globally. We also have cloud services, which we've just started building up in the last few years. So we have something called the CDS, the Copernicus Climate Data Store. It's been operational for about two years or so.
04:22
We have the European Weather Cloud, which is a pilot that we're currently setting up with EUMETSAT, and we also have WEkEO, which is a collaboration with Mercator Ocean. And again, the archive that I just mentioned is primarily on tape drives, 140 tape drives, and we're adding, as I said, 100 terabytes from the operational side, 150 other,
04:42
and that's 250 total at the moment. So that's leading us then to the data challenge. You might have seen this slide before, particularly in my colleague's presentation earlier in the day, but if we look at the data that we're disseminating globally to these 200 destinations per day,
05:01
as recently as 2015, 2016, this was of the order of about five terabytes, and we're now up to 30-ish, and this is only going to grow further. It's getting close to the limit of what we can currently push out. And also, our model output: our ensemble is currently running at 18-kilometer resolution, which is the first bar here on the second chart,
05:22
and that's responsible for about 70-plus terabytes of data per day. This is projected to grow to 900-plus terabytes, so into the petabyte range, by 2025 for five-kilometer resolution, and that's only the beginning of what we're looking to do, as we're looking to go to 1.25- and 2.5-kilometer resolution by 2030, in 10 years.
05:42
So that number will be higher again. So we are facing quite significant challenges in terms of data growth. What's driving that? There are three main axes that we can think about when we consider what's really driving that exponential data growth. First of all, we have reliability. Historically speaking,
06:01
weather forecasts were run purely in deterministic mode. So you would have a weather forecast model, an NWP model, you'd let it run for some period of time, and that was your forecast. Obviously, that didn't give you an idea of the probabilistic uncertainty of that forecast. So ECMWF was a pioneer in setting up the idea of an ensemble forecasting system
06:21
about 30 or so years ago. And this is where you take the initial conditions that the high-res model runs with, and you slightly perturb them. In our case, we have 50 ensemble members. Each of these ensemble members then runs from a perturbed initial condition, we let that evolve forward, and then we do statistics on those forecasts and come up with a more probabilistic type of weather forecast.
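To make the ensemble idea concrete, here is a toy sketch, not ECMWF code: the Lorenz-63 system stands in for a real NWP model, the initial condition is perturbed into 50 members, and statistics are taken across them. All names and values are illustrative.

```python
# Toy illustration of ensemble forecasting: perturb the initial condition,
# evolve each member with a simple chaotic model, then compute statistics.
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One Euler step of the Lorenz-63 system (stand-in for an NWP model).
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

rng = np.random.default_rng(0)
control = np.array([1.0, 1.0, 1.0])                       # unperturbed analysis
members = control + 1e-3 * rng.standard_normal((50, 3))   # 50 perturbed members

for _ in range(2000):                                      # evolve the ensemble forward
    members = np.array([lorenz_step(m) for m in members])

print("ensemble mean:", members.mean(axis=0))
print("ensemble spread (std):", members.std(axis=0))
```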
06:41
Another axis that we can consider is model resolution, naturally. Depending on the HPC resources that we have access to, we're continuously trying to push the model to higher and higher resolutions so that we can resolve cloud convective processes better,
07:02
cloud physics better, and so on. And then finally, there's the complexity of the model itself. Traditionally, we were just dealing with atmospheric models, just modeling the atmosphere with what we call the IFS model. Nowadays, you have a whole Earth system model: you've coupled that with an ocean model, you have to model ice, land use,
07:22
and it gets more and more complex as time goes on. And all three of these components combined are essentially fueling this rapid exponential growth in data. So what is the problem? The problem is that no user can really handle all that data in real time as we produce it. Unfortunately, much ensemble forecast data goes unused.
07:43
So we're continuously looking for ways of trying to encourage users to use more and more of that data, but it's becoming more of a challenge, and it's particularly not made easy by domain-specific formats and conventions. As I mentioned, we're disseminating, pushing 30 terabytes at the moment globally
08:00
to destinations across the world, which is going to be a challenge as we start looking towards those projections that we were looking at a moment ago. And then web services: we're developing and exploring web services to allow users to access data. So the key challenge here is how can we improve user access to such large volumes of data?
08:22
So how can you actually access it today? Well, today we have something called the ECMWF Web API. That allows users to come and access MARS or the public datasets, depending on what permissions they have. Authorized users and commercial users can access the MARS archive. MARS here is the meteorological archive that I mentioned a moment ago.
08:41
For public access, we currently have public datasets available, so you simply have to sign a general license agreement, and you can then self-register and access that data. In order to access it, you use the ECMWF Web API, which you can access here, and we encourage you to go to those links and check them out and use the data
09:02
and start exploring the data if you're interested in doing so. We have a retrieval example here where you specify, in meteorologically meaningful parameters, precisely what data you're interested in retrieving, and then you retrieve it in formats like NetCDF, GRIB or JSON.
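As a rough illustration of what such a retrieval looks like, here is a minimal sketch using the publicly documented ecmwf-api-client Python package; the dataset and parameter values are illustrative examples, not taken from the talk.

```python
# A minimal sketch of a public-dataset retrieval via the ECMWF Web API,
# using the ecmwf-api-client package (pip install ecmwf-api-client).
# Credentials are read from ~/.ecmwfapirc; all values below are illustrative.
from ecmwfapi import ECMWFDataServer

server = ECMWFDataServer()
server.retrieve({
    "class": "ei",             # ERA-Interim public dataset
    "dataset": "interim",
    "date": "2019-01-01",
    "time": "00:00:00",
    "step": "0",
    "stream": "oper",
    "type": "an",              # analysis fields
    "levtype": "sfc",          # surface level
    "param": "167.128",        # 2-metre temperature
    "grid": "0.75/0.75",
    "format": "netcdf",
    "target": "era_interim_2t.nc",
})
```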
09:20
At the moment, we do differentiate here between public access and private, or commercial, access, let's say. Public access at the moment is for a limited set of datasets, but our Council has recently decided to make all our data open by 2025, so in the next five years,
09:41
we'll gradually phase more and more data from the commercial side into being publicly available. At the moment, what's primarily in the public datasets is climate reanalysis, or historical climate data, you could say, for example ERA5, ERA-Interim and ERA-40, if you're familiar with those datasets, and over time, for example, this year, we'll release ecCharts.
10:01
They will be publicly available by the end of 2020, and in the coming five years, we will essentially make all of our data open access. Another way you can access data today is through the CDS, or our climate data store. This is a portal that we introduced and built about two years ago to allow users to come and search
10:21
for what data we have, and then to download it, and to start playing with it. So, one issue that we face with such large volumes of data and growing volumes of data is that our datasets are becoming quite large for users to work with locally, so it's quite challenging for users to download all that data on a global level, let's say. Then, they have to also have the required processing power,
10:41
then, to process that locally and transform it, or whatever they're interested in doing on the client side. So what we've done here is we've tried to migrate some of that processing to the server side, and you can do this with a Python interface, and it allows non-domain users to build apps and to build on top of that. So, what's primarily in the CDS at the moment
11:01
is reanalysis data; for example, ERA5, the new reanalysis dataset that we produced a couple of years ago, is available there. So, if you're interested in that dataset, you can go and check it out, and there's also a link where you can register right now and start playing with it.
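For example, here is a minimal sketch of pulling a single ERA5 field from the CDS with the cdsapi Python client; the dataset name and field values shown are illustrative and not taken from the talk.

```python
# A minimal sketch of retrieving ERA5 data from the Climate Data Store,
# using the cdsapi package (pip install cdsapi). The API key is read
# from ~/.cdsapirc; the request values below are illustrative.
import cdsapi

client = cdsapi.Client()
client.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": "2019",
        "month": "01",
        "day": "01",
        "time": "12:00",
        "format": "netcdf",
    },
    "era5_2m_temperature.nc",   # local target file
)
```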
11:21
So, we are thinking about novel data flows. The way our data flows have worked historically, and the way they still work today, is that we have our HPC. It runs twice per day, as we mentioned, for our main runs, producing weather forecast data, and then we have our FDB, which is our dedicated object store for meteorological data.
11:41
That's basically where we store the data immediately after production, then we have the PFS, the parallel file system, which is Lustre, and then we archive the data. Everything we produce, and everything that has historically been produced at the center, is archived, and the archive is a combination of disk and tape.
12:01
And so the new kind of model, or paradigm, that we're introducing is the cloud-based paradigm. Rather than taking all these large volumes of data, moving them out of the center, and then making users first store that data and then try to process it, we want, rather, to move the users to the data.
12:22
And we also want to do that while the data is still hot, so to speak, and with scientifically meaningful metadata. What we mean by hot here is that our forecasts are really only meaningful within a certain window of time. They can be meaningful for research and for other purposes later, but obviously a weather forecast has a sort of limited shelf life,
12:40
and we want to get users to the data as quickly as possible, so that they can actually utilize it within a short timeframe after it's been produced. So, HiDALGO and ECMWF. HiDALGO is the project that I'm funded on. It's a Horizon 2020 project, HPC and Big Data Technologies for Global Systems. It's a collaboration across 13 institutes,
13:03
across seven countries. The mission is to advance technology to master global challenges, and to develop novel methods for HPC, HPDA, and so on, to simulate the complex processes involved in those global challenges.
13:21
Our role in this will be to couple data, particularly weather data, into certain pilots. We have three pilot test cases: a human migration pilot from a group at Brunel University, an air pollution pilot from a group at SZE in Hungary, and a social networks pilot from Plos in Austria. Of these three pilot test cases,
13:42
two pilots in particular are of interest for us, as they will couple with our weather and climate data. Weather and climate data can be considered potentially useful input for migration models, and also for air pollution models. So ECMWF's role will hopefully be to enable coupling
14:01
as a means to build a workflow. What we mean here is that you can consider the HPC, in our case, which is actually a closed HPC. For the purposes of HiDALGO, because we are an operational center, we can't simply give users access to the HPC and let them play and break things and so on. Rather, we can, through the cloud, give users access to the output of the HPC.
14:22
So what we are going to do here is have co-located private clouds at ECMWF with the HPC. This will give users access to what would previously have been a closed HPC. From that, they can build a workflow, starting from our forecast output via the cloud.
14:41
They can then do some processing on that data to prepare it for their own simulation. That simulation can then run in an HPC, high-performance compute, environment, and then they can post-process the data afterwards, or visualize it, and so on. So the key point here is that we want to open up a closed HPC system, and bring valuable experience in integrating
15:04
such a closed HPC system with such workflows. We think this could be a useful model for other HPC systems around Europe, who have similar restrictions to us in terms of operations, and in terms of access. We have two HPC centers that we will be collaborating with,
15:21
so that's HLRS, with Hazel Hen in Stuttgart, and Poznań, with the Eagle supercomputer, and then our own cloud resources will feed into that as well. So we are looking to couple data. There are two steps to coupling that we're envisioning. First of all is static coupling. Because these models that we want
15:42
to couple the weather and climate data with have not previously taken account of such data, the first stage of the project is for them to take the static reanalysis data, historical data, analyze it, and see how it might be used to integrate into and improve their own simulations, like the human migration pilot, for example. We've already completed that,
16:01
and we've done that using the CDS. And then there's dynamic coupling. From year two onwards, or from 2020, we want to couple forecast data, so the hot data we were talking about, and we want to couple that via a REST API. Again, this is the model of bringing the users to the data while the data is hot and meaningful, and to enable them to build custom workflows
16:23
utilizing our data. This is just reiterating the key points. The models we already have are the push, where we disseminate data over the internet, and the pull, where users come and get our data from the archive; the third model that we're introducing now is the cloud model, and the idea here
16:40
is to move the compute, not the data. In order to do that, we need to build cloud data as a service, which we've named Polytope, and this is currently under development at ECMWF. It will be deployed at ECMWF, but accessible externally, and we're currently in beta testing for our first release, which we hope to release this year.
17:00
We're beta testing it with the European Weather Cloud. It exposes a REST API, and we will also provide CLI and Python clients to help users use it. It will give you access to MARS if you have permission to do so, and we're also interested in implementing hypercube data access later in the project, probably in 2021. So it's a typical REST type of interface.
17:22
You will have a retrieve similar to the retrieve that we saw earlier, so if you've used ECMWF's WebMARS in the past, it's essentially going to be a very similar type of request. It'll be asynchronous, so you can submit your request from the client to the Polytope server, wait some time, and then poll for status.
17:43
You can list requests and then poll, and we'll tell you the data is not yet available depending on whether it's processing, you're in a queue, or the forecast data is not yet available. When the data is available, you get it back. You can delete requests, and so on.
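To illustrate that submit, poll and download pattern, here is a hedged sketch in Python; the host name, endpoint paths and JSON fields are hypothetical stand-ins, not the actual Polytope API.

```python
# A sketch of the asynchronous submit/poll/download flow described above.
# The host, endpoints, fields and token are hypothetical illustrations.
import time
import requests

BASE = "https://polytope.example.int/api/v1"    # hypothetical service URL
HEADERS = {"Authorization": "Bearer <token>"}   # placeholder credentials

# 1. Submit a MARS-style request; the server replies with an id, not data.
spec = {"class": "od", "stream": "enfo", "param": "167.128", "date": "-1"}
request_id = requests.post(f"{BASE}/requests", json=spec, headers=HEADERS).json()["id"]

# 2. Poll until the data has been staged (it may be queued, processing,
#    or the forecast may simply not have been produced yet).
while True:
    state = requests.get(f"{BASE}/requests/{request_id}", headers=HEADERS).json()["state"]
    if state == "available":
        break
    time.sleep(30)

# 3. Download the staged data, then clean up the request.
data = requests.get(f"{BASE}/requests/{request_id}/data", headers=HEADERS).content
with open("forecast.grib", "wb") as f:
    f.write(data)
requests.delete(f"{BASE}/requests/{request_id}", headers=HEADERS)
```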
18:01
In terms of the system design that we've implemented, a key point for us was scalability, and particularly that the architecture would be elastic enough for us to scale easily. So we want to have multiple front ends and multiple workers on the back end, so we can scale depending on demand as we add more users. Easy deployment, so we have Kubernetes support, and a shallow software stack as well.
18:22
So we have our front end here, say polytope.ecmwf.int, with a REST API. We then have the various parts of the architecture, such as a request store and a broker queue. At the moment, we're piggybacking on open source software, which we're grateful for. The request store at the moment is MongoDB.
18:40
The queue is RabbitMQ. The front end is Flask, but the infrastructure is generic enough that we can plug and play as we wish. We have a worker pool on the back end, which will go to the various data sources, check if the user has permission to access them, and then fetch the data, either from the FDB immediately or soon after it's been produced
19:02
by the forecast, or from the MARS archive, or from another data source. We then stage the data in the data staging area, and the user can come and poll and get the data when it's available.
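As a rough sketch of how a worker could tie those pieces together (RabbitMQ queue, MongoDB request store, staging area), here is a minimal Python example; the queue name, collection names, paths and the fetch_from_source helper are hypothetical, while pika and pymongo are real libraries.

```python
# A minimal sketch of a back-end worker in an architecture like the one
# described above: it consumes requests from a RabbitMQ queue, tracks
# their state in a MongoDB request store, and writes results to a staging
# area. Names, paths and fetch_from_source() are hypothetical.
import json
import pika
import pymongo

requests_store = pymongo.MongoClient("mongodb://mongo:27017")["polytope"]["requests"]

def fetch_from_source(spec: dict) -> bytes:
    """Hypothetical helper: check permissions, then read from the FDB,
    the MARS archive, or another data source."""
    raise NotImplementedError

def on_message(channel, method, properties, body):
    req = json.loads(body)
    requests_store.update_one({"_id": req["id"]}, {"$set": {"state": "processing"}})
    data = fetch_from_source(req["spec"])
    with open(f"/staging/{req['id']}.grib", "wb") as f:   # data staging area
        f.write(data)
    requests_store.update_one({"_id": req["id"]}, {"$set": {"state": "available"}})
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="polytope-requests", durable=True)
channel.basic_consume(queue="polytope-requests", on_message_callback=on_message)
channel.start_consuming()
```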
19:20
So thanks for your attention, and I'll take any questions if you have them. This one, or? Yeah, I think that one.
19:43
Okay, so I think you mentioned that you were moving 30, was it terabytes, or? Terabytes at the moment, yeah. What tools are you using to push that around as far as protocols go? So I'm not, I don't know what we're using
20:00
at the moment in terms of, I think just FTP, or? Are you doing that like over multicast, or?
20:23
Yeah, I think it's mostly FTP, but there's also S3. But what we're talking about here is not a replacement: with the model we're building at the moment, we will still have our dissemination system pushing that 30 terabytes. We'll still have users coming to us via the MARS archive, via the MARS web API for the moment,
20:40
but this is then the third option, which will allow users to come to the cloud and access data via that. So it'll be a parallel system to the dissemination system. Yeah, yeah, yeah. It's just, it's usually difficult to push out that amount of data, so I was wondering if you guys were using GridFTP or something like that. Yeah, we have a colleague here.
21:00
We could put you in touch with our colleague who's dedicated to that. Well, we are pushing 30 terabytes, and at the moment, I think the limit we have on that is that we could go up to 40 or 45. So yeah, it's a significant amount of data disseminating globally. The only issue is that within the next five or 10 years that will increase by an order of magnitude, potentially. So we have to think about new ways of bringing users to the data
21:21
rather than pushing it, and so on. And do you want to make, for example, Polytope available as open source, or is that already available? Yeah, I think the plan is to make it open source. I mean, it's sort of limited in terms of, I mean, you need access, it's sort of built for the MARS archive,
21:41
and what we would make open source is the Python clients that we mentioned. So similar to the link that I showed you for the web API that's on PyPI, you can access that, and you can use that for WebMARS access. And so we will similarly make it available on PyPI.
22:04
And so, yeah, the command line interface and the Python client will be there, and I think the code itself will probably also be there in general. I don't think there's any reason for it not to be, but yeah, it's kind of limited. But it's there if there's interest, yeah.
22:26
You mentioned moving compute to data. How would that work in practice? If I write a Fortran program that requires Python 2.5 and stuff like this, would I have to push it into a container and give the container to you, which then uses the data?
22:42
Yeah, I think at the moment we're just supporting Python, so that Python interface that we showed for the CDS. And then as time goes on, with Polytope that we're developing, we may start considering supporting more complex containerization such as that.
23:01
But we're not quite at that point yet of development, so I'd say that's probably maybe about a year or so away. And is it part of polytope, or? Potentially, yes, yeah.
23:24
In the beginning you mentioned that you use Lustre. What do you use as backends? How do you mean, sorry? The backend for the Lustre file system, is it something? So it's just POSIX at the moment, but we're investigating Ceph and also NVRAM
23:41
in parallel at the moment. But it's POSIX currently, and we have some investigation looking into Ceph, yeah. And what was the reason for choosing this file system? So I think it was just for historical reasons, or I think that was mostly what was available at the time,
24:03
so pre-Ceph, pre-NVRAM. Why did you choose Bologna? I like the choice because I'm Italian, I'm just curious. I don't quite know, I mean, possibly for political reasons,
24:23
and I mean, one of the issues was to make ECMWF a multi-site organization, so not to simply have the HPC located in Reading in the UK. So then there was a process to choose where it would go, and member states were free to submit proposals and applications
24:43
a few years ago, and ultimately Bologna was chosen as the site, based on various parameters. Yeah, looks like this was the last talk. Thank you, John. Thanks for this great talk.