Pangeo Forge: Crowdsourcing Open Data in the Cloud
Formal Metadata
Title: Pangeo Forge: Crowdsourcing Open Data in the Cloud
Title of Series: FOSS4G Firenze 2022
Number of Parts: 351
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/68956 (DOI)
Production Year: 2022
FOSS4G Firenze 2022 · 286 / 351
Transcript: English (auto-generated)
00:01
Hello and welcome to my talk. I'm going to be telling you about Pangeo Forge, a new framework for crowdsourcing open data in the cloud. I'm deeply grateful for this opportunity to share our work, and I regret that family circumstances did not permit me to travel in person to the conference. On the screen, you can see some background on who I am and how to contact me if you're interested in following up on anything I present.
00:22
I also want to acknowledge the many contributors to this project, which is truly a team effort. I also want to thank the U.S. National Science Foundation for funding us. One and a half million dollars is a lot of money, but when you look at the ambitious scale of what we're trying to accomplish, it's probably not nearly enough. Part of my goal here is to entrain more people and organizations into collaborating on and supporting this project.
00:43
Okay, let's get started. The high-level motivation for our work here is the open science vision. We would like everyone in the world to be able to verify, extend, and build upon all existing knowledge. This is not just some abstract academic goal. This would really accelerate research progress and also empower builders to create products and services
01:02
that use that scientific knowledge for social good. That's what's needed, for example, if our species is going to effectively adapt to climate change. There are many social and political barriers to this vision. Here I'm going to focus on the technical barriers. So what's really needed to reproduce a scientific discovery? We tend to think about reproducibility in terms of three main elements.
01:23
The code that was used to create the discovery, the environment where that code was run, and the data that went into that discovery. I think we're actually in a pretty good place in terms of tools for reproducibility, say, compared to where we were 10 years ago. For code, we have a lot of great tools for sharing code,
01:42
GitHub being the standard solution, such that it's really not hard to share code today. For managing and sharing environments, we have a lot of great tools, PIP, Conda, Docker, whatever your favorite package manager is, and we can debate about what the best solution here is, but the fact is there's a lot of options available, and they all work.
02:02
And then for sharing data, we have made so much progress as a community in terms of making data FAIR: findable, accessible, interoperable, and reusable. We are now very focused on DOIs, digital object identifiers. There are many data repositories that you can use to upload data and make it public and discoverable. However, from where I sit in the world of oceanography and climate science,
02:24
it's not quite so simple. Our data are big. Climate science data can reach many petabytes in scale, so we can't just upload them to Zenodo or download them onto our laptop. And the analysis we want to do is highly complex and varied, with machine learning playing an ever-greater role.
02:42
This is a general pattern in what I call data-intensive science, and it's not unique to geospatial at all. So from this point of view, I really don't think we have a good option, a good solution for doing open science with big data. Let me describe what I see as the status quo,
03:01
at least in the academic research community. So I work at a rich, private American university, Columbia University. What we do at our university is, you know, to support data-intensive science, we set up a big supercomputer in our basement, and then we have our students and postdocs spending a good chunk of their time
03:22
downloading data from various data providers, organizing that data, getting it into a form where it's ready to be worked with. And then we have that computing resource and staff that can support that resource that permit us to do data-intensive science within our silo. This works well for us. We can be productive and do research,
03:41
but it's pretty impossible to give access to outside collaborators. My colleague Chelle Gentemann calls this way of working a data fortress. So science done inside such a data fortress is pretty hard to share with the broader world. Have you ever had someone send you code that looks like this? The first step is opening some files that you don't have access to.
04:01
This is pretty common, and I think it's a big barrier to realizing the open science vision. So here are some problems I've observed with this status quo. The emphasis on files as a medium of data exchange creates a lot of work for individual scientists in terms of downloading, organizing, and cleaning these files. And what I've observed is most file-based data sets are kind of a mess.
04:20
Even simulation outputs require a lot of manual data science labor to get them ready for analysis. Yet that hard work of data wrangling and data munging is rarely collaborative. The outputs of it aren't really reusable and aren't shared. They're just kept within the private data fortress. Doing this sort of data-intensive science is expensive.
04:41
It requires a lot of expensive local infrastructure or maybe you have access to a big agency supercomputer like a NASA or a DOE supercomputer. This is limiting participation and really excluding a lot of people from this type of work. Finally, the fact that the work is locked up inside a data fortress really limits access to outsiders.
05:03
That restricts collaboration and reproducibility. And it's important to recognize that this is a feature, not a bug, of the way our infrastructure works. These systems are designed to be secure and to keep others out. But that's not what we want if we want to move towards a more collaborative and open future. I think I, like many people in the audience, am excited about the potential of cloud computing
05:24
to change the status quo and usher in a new era of more collaborative and open scientific research. Here by cloud, I just mean some place out on the internet where you can do your work rather than downloading everything onto your own computer. I think that such a cloud environment should offer three basic capabilities
05:42
in order to support arbitrary analytic workflows on big data. We need analysis-ready, cloud-optimized data, and I'll go into a lot more depth on what I mean by that. We need data-proximate computing, the ability to bring computing close to the data rather than downloading the data. And then we need elastic distributed processing capability if we want to really scale out and work with data at petabyte scale.
06:05
I'm going to spend a lot of time talking about this idea of analysis-ready, cloud-optimized data, or ARCO data. So first, what is analysis-ready data? This term emerged from the geospatial community, so I think that folks here have a good sense of what analysis-ready is. Here I'm not talking specifically about analysis-ready imagery, but just any data set that is ready for analysis.
06:25
For one, it means we can think in terms of datasets and not data files. Files are just one sort of container that can hold data; the dataset is the broader object that we want to analyze. Analysis-ready data has no further need for tedious homogenizing and cleaning steps.
06:43
That processing has already been done, and analysis-ready data has been curated and cataloged with good metadata so we know what is in it and what sort of analysis we can do with it. And the fact is, across data science, it's well known that data scientists spend a ton of their time getting their data ready for analysis rather than actually doing the more fun part of building models or exploring algorithms or visualizing or whatever it is
07:06
that is associated with that job title. Here's an example of what analysis-ready cloud-optimized data looks like in my world. We often use the Python package xarray, and so we can see our whole dataset with many different coordinates, data variables kept together, rich metadata that tells us the units and other information about the variables that are in there,
07:27
and then we can see the data is sort of chunked or set up appropriately for different modes of analysis. That's analysis-ready. Now how about cloud-optimized? This is another term that's probably very familiar to this community. I think of cloud-optimized data as data that is compatible with object storage,
07:45
so cloud object storage like Amazon S3, where it's accessible directly over HTTP rather than having to download an entire file. That means cloud-optimized data should support lazy access and intelligent subsetting without, again, downloading a whole file. This is usually accomplished by integrating file formats with high-level analysis libraries.
08:06
Here are three examples of a cloud-optimized software stack. In the business world, pandas would often be used with the Parquet format to provide efficient tabular data analytics. In the geospatial imagery world, Cloud-Optimized GeoTIFF is a very successful cloud-optimized format.
08:21
It's often accessed through the GDAL library. In my world of climate science, we use the Zarr format and tend to interact with data through xarray. All of these share a similar pattern and similar attributes. Now if you're using ARCO data, your analysis can go really fast. So what I'm showing here are some results from a paper we published last year that describes how fast we can process data stored in Zarr format in the cloud.
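To make the "compatible with object storage" idea concrete, here is a minimal sketch of lazy, subset-only access in each of those three stacks. The URLs and column/variable names are placeholders, not real datasets, and the snippets assume the usual fsspec-backed I/O plugins (s3fs, rioxarray, zarr) are installed.

```python
import pandas as pd
import rioxarray
import xarray as xr

# Parquet + pandas: read only the columns you need, not the whole table.
df = pd.read_parquet("s3://some-bucket/table.parquet", columns=["time", "precip"])

# Cloud-Optimized GeoTIFF + rasterio/rioxarray: tiles are fetched on demand via HTTP range requests.
cog = rioxarray.open_rasterio("https://example.org/scene.tif", chunks=True)

# Zarr + xarray: open lazily; only metadata is read until you actually compute something.
ds = xr.open_dataset("https://example.org/store.zarr", engine="zarr", chunks={})
subset = ds["precip"].sel(time="2021-12")  # still lazy; chunks stream only when computed
```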
08:45
And there are a lot of details in here, but we're basically looking at how processing throughput scales with the number of parallel reading processes. That top line up there is 10 gigabytes per second. So using modest levels of parallelism, 50 to 100 parallel processes, we can be chewing through data at a rate of 10 gigabytes per second.
09:04
And that'll get you up to that petabyte scale and really permit big data analytic workflows. So this is one reason why we love ARCO data. But there's a problem. A problem that we've noticed in the Pangeo community over the past few years of working with data in the cloud. It's really hard to make a good ARCO dataset.
09:22
In order to make a useful ARCO dataset, you really need to bring together a lot of different types of expertise. First you need domain expertise: knowing where to find the data, how to clean it, and what it means for the data to be analysis-ready. Then you need some sophisticated technical knowledge about these cloud-optimized formats.
09:40
You need some serious computing resources to stage, store, and upload this ARCO data. And then you need some analysis skills to validate and make use of the ARCO data. And asking one person to do all of that is quite a lot. Many of you are probably familiar with the one-stop-shop analysis platforms in the cloud. This is what a lot of people think of when they think of using cloud computing.
10:01
Google Earth Engine, of course, is a very well known such example. And these platforms have great analytic capabilities. But another thing that they provide that is super valuable are huge libraries of curated ARCO data. It's very labor intensive to create and maintain such libraries. And as a result, that expense has to be subsidized by other business activities or by venture capital investment.
10:25
And so, you know, I sometimes wonder whether this is a sustainable path towards providing open science access to everyone in the world. Furthermore, I think it's interesting to question whose job it really is to be making ARCO data. If you talk to data providers like NASA, they're very interested in this.
10:43
But I have the perhaps controversial opinion that data providers' main job is to provide preservation and archival-quality copies of their data. It's really the scientist users who know what is needed to make data analysis-ready. So motivated by the above, this is really what led us to the creation of the Pangeo Forge project.
11:00
So the goal of Pangeo Forge is to democratize the production of ARCO data, to make it easier, more scalable, and more sustainable to build archives of ARCO data in the cloud, and to replace some of those difficult steps with a tool, a platform that can automate those things. We're inspired directly by the conda-forge project.
11:22
So many of you are probably familiar with conda-forge. It is essentially a crowdsourcing solution for providing software packages using the Conda framework. Before conda-forge existed, it was possible to upload your own Conda packages onto the internet and distribute them. But the process of doing that was very difficult and inaccessible to most people.
11:42
conda-forge democratized the creation of these packages by creating a GitHub-based environment where you could just contribute a simple recipe for how to prepare one of these packages. And then all kinds of automation would kick in and actually do the heavy lifting of creating it. So that's our inspiration for what we would like to do with Pangeo Forge for data.
12:03
So our project, Pangeo Forge, consists of two main parts. There's a Python library called Pangeo Forge Recipes. And you can think of this as an open source Python package for describing and running data pipelines, which we call recipes. It's essentially an ETL, Extract, Transform, Load framework focused on scientific data, specifically climate, weather, and geospatial data.
12:26
And then we have Pangeo Forge Cloud, which is a cloud platform for automatically executing recipes stored in GitHub repositories. At a high level, this is kind of how the software works. We create something called a file pattern that describes where to find the source files, which are going to be the input to our recipe.
12:45
And we also bring an object called a storage config, which describes where we want to store the outputs of the recipe. We bring these together along with any sort of processing steps that need to be done along the way into something we call a recipe. We then take that recipe to something called an executor, which knows how to run the recipe and execute it.
13:04
And the executor framework is designed to be flexible and allow you to run these recipes on your favorite sort of distributed data flow style application. So we currently have support for executing these recipes using Prefect, Dask, and Apache Beam. And I'm going to give a demo later where I go into a little bit more depth about how the code actually works.
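Concretely, those pieces fit together roughly like the sketch below, assuming the pre-Beam pangeo-forge-recipes API (the 0.x releases current around the time of this talk); the URL template, dates, chunk sizes, and output path are purely illustrative, not from the talk.

```python
import fsspec
import pandas as pd
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe
from pangeo_forge_recipes.storage import FSSpecTarget, StorageConfig

# File pattern: where to find the source files (one hypothetical NetCDF per day).
dates = pd.date_range("2000-01-01", "2000-12-31", freq="D")

def make_url(time):
    return f"https://data.example.org/precip/{time:%Y/%m/%d}.nc"  # placeholder URL template

pattern = FilePattern(make_url, ConcatDim("time", dates, nitems_per_file=1))

# Recipe: combine the inputs into a single Zarr store with the requested chunking.
recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 30})

# Storage config: where the output should land (a local path here, for illustration).
recipe.storage_config = StorageConfig(FSSpecTarget(fsspec.filesystem("file"), "/tmp/precip.zarr"))

# Executor: for small jobs a plain Python function is enough; to_dask(), to_prefect(),
# and to_beam() hand the same recipe to a distributed engine.
recipe.to_function()()
```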
13:25
Here I'm trying to stay at a high level. So that's one element, Pangeo Forge Recipes. The other is Pangeo Forge Cloud. Now in Pangeo Forge Cloud, the way it works is that we have a feedstock which contains the code and metadata for one or more recipes. These are stored in GitHub as GitHub repositories.
13:44
Then we have something called a bakery that executes these recipes in the cloud using elastically scaling distributed compute clusters. And the bakery's job is to extract the data from wherever it lives and deposit it into cloud storage. So the vision we have here is that by using GitHub to store and manage these recipes, we can enable collaborative community-based data curation.
14:08
Where different people from around the world can work together to curate analysis-ready data, the same way we collaborate on code using GitHub. So if we can manage to make this vision a reality and create a big community-maintained data lake of ARCO data,
14:27
we'll be able to have this wonderful world of decentralized yet performant cloud-native analytics. So this is a pretty busy slide, but it shows what this infrastructure is intended to enable, where we have our data living in object storage and then many different ways to plug into that data,
14:43
whether it's interactive computing in Jupyter, distributed processing using your favorite framework like Dask or Spark, or another higher-level platform such as openEO or Open Data Cube that's connected to that data. We can all be sharing the same basic analysis-ready cloud-optimized data.
15:01
And the challenge I think for the open science community is how are we going to populate that data library. Now this is really different from the status quo. It's different because it has a strong separation between data storage and compute. Over on the left where we have data storage, we have a relatively steady storage cost. And we can work with data providers and also cloud providers to subsidize those costs or aggregate them at the institutional or national level.
15:29
The data also don't have to be in the commercial cloud. We can use alternative storage providers. Then over on the right we have the data consumers. The idea is that a data consumer can pay for their own costs for computing on that data.
15:42
Which are often very bursty. They just happen in short intense bursts. And we can take advantage of spot pricing to find the most cost-effective way to get our computation done. And we can also have a multi-tenancy architecture where many different institutions can be computing on the same data at the same time.
16:02
That is very different from the way things work today, when we have both our data and our compute all locked down together in our data fortress. So to make this a little bit more concrete, I'm going to switch now over to a demo, where I'm going to show off Pangeo Forge, how it works and what it enables.
16:22
Okay, welcome to the demo. This is what it looks like if you head over to pangeo-forge.org, which I encourage you to do. I'm going to quickly walk through what it looks like to contribute to the data library and have your data get stored in the cloud. First I'm going to click over on feedstocks. This brings us to a list of feedstocks, which are recipes that are managed and executed by Pangeo Forge cloud automation.
16:44
I'll click on this one, the Global Precipitation Climatology Project. So when I click on this feedstock, I see a bunch of useful information about what this dataset is. You see the title, the Global Precipitation Climatology Project. And we can see the providers who originally generated this data. In this case, NOAA's National Centers for Environmental Information and the University of Maryland.
17:05
Now this feedstock is actually just a link over to a GitHub repository. And if I click on this button, it will bring me there. So this is the original GitHub repository, which is holding the recipe. I'm going to quickly walk through what we can find in here, so I'll go to the feedstock directory.
17:21
Now one element here in this feedstock is a metadata file called meta.yaml. Which is giving all of that information that you saw on the website. Global precipitation climatology project, etc. Here we have information about the providers and the license. And also who is maintaining this recipe. In this case, it's me. That's the metadata. Now let's look at the recipe itself.
17:41
So what this recipe does is it takes almost 10,000 individual NetCDF files that are stored on a NOAA server. And it assembles them into a single Zarr-based data cube. So this is how the recipe looks. Basically, we start by crawling the NOAA server to figure out what are all of the different file names.
18:00
And in this case, it's a hard-to-decipher set of paths, which is why we need to actually crawl the server. Then we assemble these file names, all 9,000 of them, into this thing that we call a file pattern. And finally, we use that file pattern to construct an xarray-to-Zarr recipe. And you can see this is just a few lines of code that specify that pattern itself.
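In outline, that recipe code looks something like the sketch below. This is not the feedstock's exact code: the URL template stands in for the crawled NOAA paths and the chunk size is illustrative, but the shape, a crawled file list wrapped in a file pattern and handed to an xarray-to-Zarr recipe, is the same.

```python
import pandas as pd
from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

# In the real feedstock these roughly 9,000 URLs come from crawling the NOAA server;
# this daily template is only a stand-in.
dates = pd.date_range("1996-10-01", "2021-12-31", freq="D")
urls = [f"https://noaa.example.gov/gpcp/daily/gpcp_{d:%Y%m%d}.nc" for d in dates]

# One file per day, concatenated along the time dimension, written out as a single Zarr store.
pattern = pattern_from_file_sequence(urls, concat_dim="time", nitems_per_file=1)
recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 200})
```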
18:23
And then a few other customizations about how we want to combine that and process it. And that's it. That's all that a recipe is. So that recipe is then automatically executed by Pangeo Forge. And we can see, for example, that this recipe was last run 36 days ago and it completed successfully.
18:43
Which resulted in the creation of a new data set. So if I click over on the data set tab, I can see a list of data sets that have been produced by that recipe run. And if I unfold this block, I can see code I can use to load the data up right away. So let me demonstrate that by going over to an empty notebook and just copying and pasting that code.
19:04
So what happens here? I'm connecting to the data. In this case, they're stored on Open Storage Network, which is a cloud storage provider outside of the commercial cloud. And I've opened up this dataset. And you can see it opened up instantly. You can see over 9,000 different time steps organized into an analysis-ready cloud-optimized data cube backed by the Zarr format.
19:26
And this is ready to work with right away. So I could, for example, plot the latest values and visualize what precipitation is looking like. As of, well, in this case, the end of the year last year.
19:41
So that's just a demonstration of what it looks like to interact with data in the library. I want to show you one more piece here, which is how you make new contributions. So to create a new recipe contribution, you go over to the Staged Recipes repository. And here you can see a bunch of discussion about different proposed recipes.
20:01
And here, if we look over in the pull requests, we can see recipes that have been already added and run and ingested into the system. So the process of adding a new recipe is done by making a pull request to this repository, which then generates a new feedstock repository.
20:21
And so you can see a whole bunch of checks and validation and back and forth that are happening here. There's a lot more to go through, but I don't have much more time. So I encourage you to just head over to pangeo-forge.org and explore for yourself. Okay, so I hope you enjoyed the demo. I'm now going to wrap up my talk. First, I want to share this video, which is one of my favorites, showing the sort of
20:43
failed Rube Goldberg machine that so many of us are familiar with when trying to automate our work. So the fact is, the platform today is a long way from what we think it could be in terms of functionality and professionalism. We are scientists, we are oceanographers, we are relatively inexperienced at building full stack cloud software as a service tools.
21:08
So a big goal for me is to entrain more contributors into this project so we can go faster and achieve more together. By the way, the person in this video is basically Charles Stern, the full
21:20
time data engineer who's been absolutely instrumental in actually getting Pangeo Forge to run. Where are we at today? Well, Pangeo Forge Cloud is live and open for business. You can go there now and check it out. There's not a lot of data up there, but there is some. We are currently working on a major refactor of the Pangeo Forge software. Because of the very large size of some of these recipes, we're really hitting some of the limits of Prefect as a workflow engine.
21:45
And we're currently refactoring the backend to use Apache Beam under the hood. We're working very hard on providing a STAC catalog for our data library. That's going to make it much more accessible and interoperable with many of the other tools that you've heard about at this meeting. In general, I'm just trying to advertise the fact that this is pretty hard and ambitious, what we're trying to do, and therefore we need your help.
22:05
If this sounds like something that would be useful to you, please consider joining our project. We are very excited about the potential of Pangeo Forge to transform how scientists interact with data, and we're going to keep trying to push this forward. So how can you get involved? Here are four different ways.
22:20
You can use the Pangeo Forge Recipes Python package as a standalone ETL tool for managing analysis-ready cloud-optimized data. You don't have to use the cloud platform. You can just pip install pangeo-forge-recipes and run it. You can access the data that we have already produced through crowdsourcing at the pangeo-forge.org website. If you would like to get some new data into that catalog, into that data library, you can propose a new recipe.
22:45
And if you're interested in developing Pangeo Forge or adding capabilities or backend infrastructure, then I invite you to join our weekly development meetings. Thank you so much to the conference organizers for this opportunity, and I really hope to connect with all of you who are interested in this project.
23:03
It's been a real great opportunity for me to be here. Thank you.