How well do we understand our Universe? Let’s Python it out!

Formal Metadata

Title
How well do we understand our Universe? Let’s Python it out!
Title of Series
Number of Parts
141
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
As our understanding of the Universe is expanding, the desire to model the physics that govern cosmic evolution is more evident than ever, driving the emergence of cosmological simulations that model the Universe from the beginning of time till present day. In combination with Machine Learning, they allow for an unprecedented capability; one can train AI models on simulations, where the evolution history of galaxies is available, that can in turn be applied on real galaxies. In this work, we propose the use of Python as a ML tool, through the popular library Tensorflow, to quantify the impact of different cosmological models on the derivation of the history of galaxies. Python accompanies us at every step of the way, from creating the datasets and training the probabilistic neural networks to the visualization of the results, as we attempt to shed light on the cosmic past of galaxies, surpassing the unshakeable reality that we can only observe them at a specific moment in time.
Transcript: English (auto-generated)
Hi, everyone. Thank you. I'm Irini. I'm Regina. Perfect. And actually, we're super happy to be here. This is our first EuroPython, and we love it so far. We are both PhD students at the Institute of Astrophysics of the Canary Islands, where we work a lot with Python, using it for something like 80% of our day, every day.
Our field is the study of the formation and evolution of galaxies, and we use Python to see how well we actually understand them so far and, in a sense, how well we understand the universe.
So first, before we answer this question, which is kind of a big question, we have to take a step back and ask ourselves how well we thought we understood the universe some years ago. And only 100 years ago, apparently, the answer is that we didn't understand
it well at all. Many astronomers around the 1900s thought that the Milky Way, our galaxy and all the stars that comprise it, constituted the entire universe. As we know now, this is certainly not accurate, and there was a big debate in those days to find out what was going on, because they could actually observe other galaxies,
but not very well. They looked like these clumpy structures, but astronomers didn't have the means then to observe them very well. So this debate was resolved when we actually got the first observational evidence that other galaxies existed. So this guy here, maybe you know him, Hubble, was the first to actually manage to measure
the distance of the Andromeda galaxy from our galaxy. And he did so by observing the brightness of a certain type of variable stars that give us a very good measurement of distance. So he measured that the distance to these stars in the Andromeda spiral nebula, as they thought it was back
then, is actually much bigger than the size of the Milky Way itself. So apparently it cannot be part of the Milky Way, right? It has to be something outside our galaxy. And just as a fun fact, this is how Hubble used to observe the Andromeda galaxy, on the left. This is how it looked back then with the telescopes they had back then.
And on the right, this is how we can observe it now. So we can see that we have actually evolved a lot since then. By continuing to study galaxies and learning more about their distances and velocities, we actually found more and more interesting results. We actually found that galaxies that are further away from Earth seem to be moving
away from us at a much higher velocity than galaxies that are closer to us. And what this means is that the universe is not only much larger than we initially thought, but it's actually expanding. So to be able to understand this better, of course, scientists concentrated
on creating a new cosmological model. And this is just a way of trying to explain what we observe and the behavior of matter as we see it. And the one that is most widely accepted right now is called the Lambda CDM model. So what the Lambda, oh, I can use this one, okay, cool.
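The distance-velocity trend described above is usually written as Hubble's law, v = H0 · d. The following is a minimal sketch of that relation, not something shown in the talk: the value of the Hubble constant (70 km/s/Mpc) is a rounded, assumed number, and the distances are made up for illustration.

```python
# Hubble's law: recession velocity grows linearly with distance.
# H0_KM_S_MPC is an assumed, rounded value used only for illustration.
H0_KM_S_MPC = 70.0


def recession_velocity(distance_mpc: float, h0: float = H0_KM_S_MPC) -> float:
    """Return the recession velocity in km/s for a distance in megaparsecs."""
    return h0 * distance_mpc


if __name__ == "__main__":
    # Hypothetical distances, in Mpc, for a nearby and a far galaxy.
    for distance in (10.0, 100.0):
        print(f"{distance:6.1f} Mpc -> {recession_velocity(distance):8.1f} km/s")
```

A galaxy ten times further away recedes ten times faster, which is what one expects if space itself is expanding.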
So what the Lambda CDM model tells us is that the whole universe has just three basic ingredients. About 70% of it is dark energy, this mysterious energy that we are not really sure what it is, but we know that this is what drives the expansion of the universe.
And it's mostly associated with vacuum. And then we have about 25% that is, again, some dark material that we call dark matter, and that we again cannot observe. This is why we call it dark. It doesn't interact with light, but it interacts through gravity. And it's what causes all the clumping that we see and all the structures forming in
the universe. And then there is this tiny little bit, only 5%, of stuff that we can actually observe. So it's everything that we are made of, like the Earth, the stars, the galaxies, et cetera. No, okay, cool. So now that we have this cosmological model, scientists were like, okay, maybe we can
try and simulate the whole universe, right? So how do you go about that? Like 10 years ago, I think the first cosmological simulations were around then, they were like, okay, we have to define some ingredients in order to simulate the universe, but of course
in a reduced volume. So for that, what we need is some equations, some physical equations for how matter behaves. And then of course we need the effect of gravity. And then we need some smaller-scale physics. Because we cannot simulate the universe in its whole extent, like simulate every individual
star, we have to stop at some point, right? So we have to stop at a specific resolution, and for everything under that, we just have to define some specific recipes that we think govern how the small-scale things work. And of course, to run that, we need some huge supercomputers, and we need a lot of
time. These cosmological simulations take months to run. So we see here on the right a cartoon of how a cosmological simulation looks over time. You will see it again in a little bit. In the beginning, it looks like all the matter is randomly distributed and homogeneous.
But as time progresses, we start seeing these structures, and everything starts to be more clumped. And this is actually a very good result from cosmological simulations. They do an amazingly good job of simulating the large-scale structures. So what we get from cosmological simulations looks very much like what we get from
observations and what we think the cosmic web looks like. So this is an amazing result already. But what happens now when we go to smaller scales? And by smaller scales, because we're talking about the universe, I just mean galaxies. Not really small scale, but in that context, yes.
So here, I'm plotting a simulated galaxy and a real galaxy. And I guess you can guess which one is which, or not. I don't know. The screen is not very good, but probably you can guess. So the simulated galaxy is the one on the left, and the real galaxy is this beautiful M81 galaxy. And you see that while we can still guess, it's mainly because we haven't added any
observational effects to the simulated galaxy. But still, the cosmological simulations do an amazing job of simulating galaxies and recreating galaxies in both the physical context and shapes that we know exist in the universe. So for me, that's already amazing. I'm amazed by that.
And why do we like cosmological simulations so much? Cosmological simulations give us a great capability that we didn't have before. The reality is that we can only observe galaxies at a very specific point in time, right? So no matter how long we wait, like our whole lifetime, the way the galaxies look in
the sky is not going to change, right? We would have to wait an infinite amount of time, not infinite, but on our scale. Cosmological simulations actually track the formation of galaxies at multiple snapshots. So along with how the galaxies are now, we have how they used to look in the past. We track them across their whole evolution. And that's super nice because we can use, for example, machine learning and train
models on data that we have from simulations where the history of the galaxies is available. And then if we find out that they work quite well, we can just go to observations and apply them and infer things that we didn't know before because we cannot observe them, right? But to make sure we can do that, we first have to quantify and make sure that these
cosmological simulations are actually trustworthy. So to do that, we have to do two comparisons. First we have to compare different cosmological simulations with each other because there's not only one cosmological simulation out there right now. Multiple teams created their own cosmological models separately and obviously they defined
different small-scale physics. So this might create differences. And then of course we need to compare how simulations look in comparison to real galaxies. So we're going to focus first on this part, simulations versus simulations.
And in order to do that, we have to take a step back again and get an idea of how galaxies actually evolve. So the evolution of galaxies, as we know now, is bottom-up. They start from smaller-scale systems and then they continually merge with other galaxies and keep creating larger structures. So it's like a hierarchical evolution.
And on the left, you can see how a merger between two galaxies looks. And of course, I don't need to say that this is a simulation, this is not something that we can observe. And so from that process, we can classify stars into different categories.
We have the in situ stars of the galaxy, which are stars that were present in the galaxy before the merger event, and they were created there from gas already present in the galaxy. And we also have ex situ stars, stars that were created in, or let's say stolen from, other galaxies. So if we manage to measure how many ex situ stars are present in a galaxy,
we get a pretty good idea of how much it has merged across cosmic time. So in order to test between the different cosmological simulations, we will use a very simple setup. We'll use machine learning, and we will use some of our favorite Python
libraries because it's EuroPython, I have to say that. So the setup is quite simple. We will use as inputs for the model images of galaxies, of how we see them now. So this is simulation data, but this is actually data we can get from telescopes or close to what we can get from telescopes.
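The in situ / ex situ classification described above can be reduced to a single number per galaxy: the fraction of stellar mass that came from other galaxies. The sketch below shows one way that could be computed in a simulation where every star particle carries an origin flag. The `StarParticle` class, its field names, and the toy masses are hypothetical and are not taken from the speakers' code.

```python
from dataclasses import dataclass


@dataclass
class StarParticle:
    mass: float    # stellar mass in solar masses
    ex_situ: bool  # True if the star formed in another galaxy and was accreted


def ex_situ_fraction(stars: list[StarParticle]) -> float:
    """Fraction of the total stellar mass that was accreted from other galaxies."""
    total = sum(s.mass for s in stars)
    if total == 0:
        raise ValueError("galaxy has no stellar mass")
    accreted = sum(s.mass for s in stars if s.ex_situ)
    return accreted / total


# Toy galaxy (made-up numbers): three in situ particles and one accreted one.
galaxy = [
    StarParticle(1.0e6, False),
    StarParticle(2.0e6, False),
    StarParticle(1.0e6, False),
    StarParticle(1.0e6, True),
]
print(ex_situ_fraction(galaxy))
```

In a real simulation the origin flag comes from following each star back through the stored snapshots, which is exactly the information that observations cannot provide.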
And as an output, we want to see whether, from what we can see of the galaxy now, we can predict something about its cosmic past. So as an indicative property of the evolution history of galaxies, we will just take the fraction of ex situ mass in this galaxy. And then we will train on one cosmological simulation and see if we can predict on the other cosmological simulation,
or if the differences in the small-scale recipes that these cosmological simulations enforce actually do not allow us to predict. So to evaluate that, first we need to plot how the ground truth looks. So this is the ground truth for the two simulations, one on the left and the other on the right. And as you can see, the ex situ fraction in a galaxy correlates a lot
with the stellar mass, in the sense that more massive galaxies have accreted more mass from other galaxies. And this makes sense in the hierarchical concept that we described before: the more a galaxy merges, the more massive it gets. But still this relation is kind of different between the two simulations. So this is how well our neural network model does
in predicting this ex situ fraction just from images of how galaxies look today. And this is in a fixed simulation. So you train on one cosmological simulation and you test on the same simulation. And we see that we have quite nice results, right? So this means that we can actually infer a property of the history of a galaxy just from how it looks now.
So this is already nice. But now, thank you, so now when we go across cosmological simulations, so we train on one cosmological simulation and we test on the other, we see that we find this bias. And this probably means that the small-scale differences that we described before actually mess this up for us.
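One simple way to put a number on the bias mentioned above is to look at the residuals between predicted and true ex situ fractions, once for the same-simulation test and once for the cross-simulation test. This is only a sketch of that bookkeeping: the function name is hypothetical and every value below is invented to illustrate the pattern, not a result from the talk.

```python
from statistics import mean


def bias_and_rms(predicted, true):
    """Mean and root-mean-square of the residuals (predicted minus true)."""
    residuals = [p - t for p, t in zip(predicted, true)]
    bias = mean(residuals)
    rms = mean(r * r for r in residuals) ** 0.5
    return bias, rms


# Made-up ex situ fractions for four test galaxies.
true_fraction = [0.10, 0.20, 0.30, 0.40]
same_sim_prediction = [0.12, 0.18, 0.31, 0.39]   # trained and tested on one simulation
cross_sim_prediction = [0.20, 0.30, 0.40, 0.50]  # trained on one, tested on the other

bias_same, rms_same = bias_and_rms(same_sim_prediction, true_fraction)
bias_cross, rms_cross = bias_and_rms(cross_sim_prediction, true_fraction)
print(f"same simulation:  bias={bias_same:+.3f}, rms={rms_same:.3f}")
print(f"cross simulation: bias={bias_cross:+.3f}, rms={rms_cross:.3f}")
```

A mean residual near zero with small scatter corresponds to the "quite nice results" case, while a systematic offset in the cross-simulation test is the kind of bias the speaker describes.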
We cannot predict an aspect of their merging history from how galaxies look now. So this result doesn't make us feel very confident. But still, this doesn't discourage us. We do not lose faith in cosmological simulations as a tool because we can play around a little bit with the inputs and decide to discard some inputs
and use some others. And we find that if we only train with inputs that are more closely related to gravity, we are actually able to cross-predict across simulations quite well. So we managed to find features that are independent of the small-scale differences between simulations, and they're actually robust.
So this means that we might be able to use that so that we can actually go and predict on actual observations, which is nice. So can we do more? Can we just see how galaxies look now and, instead of predicting a single number for each galaxy,
can we, let's say, predict the origin of every star that we see? That would be nice. So let's see how we would go about doing that. For that, we will use a more complex convolutional neural network now. We'll use a conditional variational autoencoder. And what the variational autoencoder is, simply put,
is just a version of an autoencoder that compresses images into a low-dimensional space and is then able to reconstruct them just from this low-dimensional space. And then you can also make this conditional. So you just factor into the encoding process and the decoding process some conditions that you want and then you're still able to reconstruct the results.
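The data flow of a conditional variational autoencoder, as described above, can be sketched without any deep learning library. The example below is a toy under stated assumptions: the "encoder" and "decoder" are hand-written stand-ins for the convolutional networks used in the talk, nothing is trained, the latent space has one dimension, and all numbers are made up. It only shows where the condition enters and how the latent code is sampled.

```python
import math
import random


def encode(image, condition):
    """Toy encoder: map (image, condition) to the mean and log-variance of q(z | x, c)."""
    features = image + condition            # concatenate the flattened inputs
    mu = [sum(features) / len(features)]    # 1-D latent mean (stand-in for a CNN)
    log_var = [-2.0]                        # fixed, small variance for the sketch
    return mu, log_var


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def decode(z, condition):
    """Toy decoder: build a flattened image from the latent code and the condition."""
    return [z[0] + c for c in condition]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, sigma^2) and N(0, 1), summed over latent dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, log_var))


rng = random.Random(0)
condition = [0.1, 0.4, 0.2, 0.3]   # how the galaxy looks now (flattened, made up)
target = [0.2, 0.5, 0.3, 0.4]      # map encoding part of its history (flattened, made up)

mu, log_var = encode(target, condition)
z = reparameterize(mu, log_var, rng)
reconstruction = decode(z, condition)
print(len(reconstruction), kl_to_standard_normal(mu, log_var))
```

In training, a reconstruction loss between `reconstruction` and `target` would be added to the KL term; the KL term is what keeps the latent space close to a standard normal distribution.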
And the nice thing about the variational autoencoder is that the latent space is well-behaved, so it has good properties. And that means that you can actually use this model as a generative model as well. So you can completely remove the encoder part and during inference, you can just sample from the latent space and then create new images
that follow the distribution of the inputs that the model was trained on. So in our case, what we're going to do is use as conditions the images of how galaxies look right now. And the ground truth and the reconstruction that we're trying to achieve is the information of how the evolution looks in a 2D image,
so some part of their evolution history. And of course, since even the conditions are 2D images, we need a convolutional neural network to compress them. And during inference, we'll just ask the model, okay, I know you have been trained now on simulation data. I want you to produce, from what you know
from this latent space, the evolution history of a galaxy, or how this would look in a 2D image, given that I give you as inputs how this galaxy looks now. And this actually seems to work quite well. So here on the top, I have the observable inputs
for three different galaxies from the simulations, and they depict properties of how each galaxy looks right now. And on the bottom, we have how the ground truth looks and how the model predicts it. So for all three galaxies,
the result looks quite nice, I think. So if you want more info, you can just scan this QR code. It will take you to this publication. And so from this work, what we've learned is that we can actually use cosmological simulations and we can use them with machine learning and they can actually help us unveil some part of the evolution history of galaxies.
But before getting too excited about that, we still want to make sure of this other part that I mentioned before. We still want to make sure that simulations and real galaxies are actually closely related, right? So for that, I will hand over to Regina. Thank you. Okay, so from an observational point of view,
I'm working with integral field spectroscopy. This type of data has two spatial axes and one spectral axis, with all of that information about galaxies, so we can just think that at every wavelength, we have an image of a galaxy. So the cool thing about this data is
that we can derive physical properties, so the 2D maps that she was showing, the kinematics, the stellar ages and the stellar chemical composition. They can all be derived from this data. The only thing is that we have to make a lot of assumptions about how the stars are formed, how the light propagates through space and the instruments and so on.
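The "image at every wavelength" picture of a data cube can be made concrete with a very small example. The sketch below uses plain nested lists indexed as `cube[wavelength][y][x]`; the axis order, the function names, and the flux values are assumptions chosen for illustration, and real cubes are normally handled as arrays with far more channels.

```python
# A tiny integral-field-style cube indexed as cube[wavelength][y][x].
# Values are made-up fluxes; real cubes have thousands of wavelength channels.
N_WAVE, N_Y, N_X = 3, 2, 2
cube = [
    [[float(w * 100 + y * 10 + x) for x in range(N_X)] for y in range(N_Y)]
    for w in range(N_WAVE)
]


def image_at(cube, wave_index):
    """2D image of the galaxy at one wavelength channel."""
    return cube[wave_index]


def spectrum_at(cube, y, x):
    """1D spectrum recorded at one spatial pixel."""
    return [channel[y][x] for channel in cube]


def white_light_image(cube):
    """Collapse the spectral axis by summing the flux over all wavelengths."""
    n_y, n_x = len(cube[0]), len(cube[0][0])
    return [[sum(channel[y][x] for channel in cube) for x in range(n_x)] for y in range(n_y)]


print(image_at(cube, 1))
print(spectrum_at(cube, 1, 0))
print(white_light_image(cube))
```

Fitting models to the spectrum of each pixel is what turns such a cube into the 2D maps of kinematics, age, and chemical composition mentioned above.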
So for this, we really need to make our simulations look like observed galaxies, and this is what I worked on. For that, we take a galaxy from the simulation and run it through a Python pipeline where we put in all these ingredients. On the astronomical side, we have star formation,
stellar evolution, stellar emission, dust absorption. On the instrumental side, we have the full description of the instrument that we're using to observe the galaxies, so we have the resolution, the sampling, and so on. So then we can get an observation-like data cube, and this pipeline usually takes several hours to run
for one galaxy, just as a comment for now. And to compare with observations, we need to cover a variety of different types of galaxies. So for this, we need to have really many galaxies, and this is because even though simulations do reproduce galaxies, or they generate galaxies that look like the galaxies
in our universe, they are not reproducing exactly the galaxies that we are observing. We are not generating an Andromeda galaxy. So for this, we need a lot of simulated galaxies, and we did this for 10,000 mock galaxies,
and you can see the results of this comparison with observations in a paper that I show here in this QR code. But maybe the interesting thing for you is that this data is all public, and you can go and check it out on your own and trace back the history of the galaxies that are simulated
because all that data is public. So to further compare the spatially resolved properties of galaxies, because what I showed before was only the integrated properties, we use contrastive learning. It is a self-supervised learning algorithm. In particular, I use convolutional neural networks because I'm using as input the same type of data
that she's using, so 2D maps. And I'm not going to explain much about this, but if you're interested, we can talk about it later. The only thing that you need to know is that it's useful to extract meaningful representations by applying transformations that we want the representations to be invariant to.
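In contrastive learning, the transformations mentioned above are used to build several "views" of the same object, and the network is trained so that those views end up close together in representation space. The sketch below covers only the view-generation step, using 90-degree rotations as an example of a transformation that changes the orientation of a map but not the galaxy. The function names are hypothetical, and a real pipeline would typically use arbitrary angles and further transformations.

```python
def rotate90(image):
    """Rotate a 2D map (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augmented_views(image):
    """Return the four 90-degree rotations of a map; all show the same galaxy."""
    views = [image]
    for _ in range(3):
        views.append(rotate90(views[-1]))
    return views


def orientation_free_summary(image):
    """A trivially rotation-invariant description: the sorted pixel values."""
    return sorted(value for row in image for value in row)


# Made-up 2x2 map standing in for a galaxy image.
galaxy_map = [[1, 2], [3, 4]]
views = augmented_views(galaxy_map)
print(views[1])
print({tuple(orientation_free_summary(v)) for v in views})
```

Every view yields the same summary, which is the property the learned representation should have with respect to orientation, while still keeping far more physical information than a sorted list of pixels.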
So for example, we don't care about how the galaxy is oriented in space because it's just a projection effect. And so for that, we just rotate the image or things like that. So here I'm showing a projection of this representation space that we obtain, marked in green, and I'm comparing it with the components
that are obtained by a linear decomposition, just principal component analysis. So we can look at observational effects, which I used to color-code this plot. So each galaxy is a dot, and it's color-coded
taking into account this effect. So we can see that PCA shows a smooth transition for apparent size, for example. That is something that we don't really care about because it's not something intrinsic to the galaxy. And this means that this space is correlating a lot with this property. We don't want that.
And in the other case, we see that the distribution is quite arbitrary, and we see the same for orientation. Now, if we look at physical properties, the things that we do care about, for example, how the galaxy rotates, that is the galaxy spin, or the age of the galaxy, we see a smooth transition in our representations, but not so much in the PCA.
So this is a proof of concept done on observed galaxies, but we want to take this further. But first, we're going to see some examples of what we can do with this representation space. On one hand, we can do clustering to see what the average properties are in different regions of the representation space.
We can also find one galaxy that we're interested in and look for similar examples within this representation space. We can also use it to find galaxies that are further away from all the rest in the representation space and might be weird or rare galaxies that we're interested in studying a little bit further. And my next step is to use this representation space
as a common ground to compare the simulations with the observations. Because this representation space won't be affected by the observational effects, we can actually compare whether the two sets of galaxies live in the same space or not.
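Two of the uses listed above, finding similar galaxies and flagging unusual ones, come down to distance calculations in the representation space. The following sketch assumes each galaxy already has an embedding vector. The galaxy names, the two-dimensional coordinates, and the choice of nearest-neighbour distance as an outlier score are all illustrative assumptions rather than the method used by the speakers.

```python
import math


def most_similar(query, embeddings, k=1):
    """Names of the k galaxies closest to `query` in representation space."""
    others = [name for name in embeddings if name != query]
    others.sort(key=lambda name: math.dist(embeddings[query], embeddings[name]))
    return others[:k]


def most_isolated(embeddings):
    """Galaxy whose nearest neighbour is farthest away: a simple outlier score."""
    def nearest_neighbour_distance(name):
        return min(
            math.dist(embeddings[name], embeddings[other])
            for other in embeddings
            if other != name
        )
    return max(embeddings, key=nearest_neighbour_distance)


# Hypothetical 2-D embeddings; a trained network would produce many more dimensions.
embeddings = {
    "gal_a": (0.0, 0.0),
    "gal_b": (0.1, 0.0),
    "gal_c": (0.0, 0.2),
    "gal_d": (5.0, 5.0),
}
print(most_similar("gal_a", embeddings))
print(most_isolated(embeddings))
```

The same distances could also be used to check whether simulated and observed galaxies occupy the same regions of the space, which is the comparison described as the next step.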
So yes, as a conclusion, we can say that we mostly understand the broad properties of galaxies. We can replicate those trends with simulations. So we know that all these ingredients, gravity, supernovas, star formation, and active nuclei are part of how galaxies evolve.
And if we try to form a prediction of a galaxy at a given stage, we mostly recover something that makes sense. It looks fine. We have the kinematics, the mass distribution that is fairly well recovered, but it still needs some tweaking in the flavor of this cake galaxy.
And this is mostly related to, well, the chemical composition, which is related to star formation. So more detailed comparisons, like what Irini was showing or what I was showing, need to be done to actually constrain the physics and the models behind the processes that regulate galaxy evolution, to get the proper recipe
to form this cake galaxy. So why we are here today is basically, let's see, as astronomers, we can't just go and measure a galaxy with our hands. We can't run our own experiment. So we just have data that lands in our hands, and we need
to think of ways to derive physical meaning from this data. So this is where astrophysics and Python meet, because basically what we're doing is just data processing and modeling. So we use all these libraries and even more. And something quite nice also is that most of the code in astronomy is written in Python, at least today.
And we have dedicated libraries for that, like Astropy, which is super nice. It covers a lot of different areas of astronomy. And something else that is super nice about this community is that we have lots of data that is publicly available. Anyone can just download the data. And if you're interested, just let us know,
and we'll give you tips. But it's mostly just: go to a survey, look for the survey's data access, and you've got it. And the same with the simulations. So, yeah, if you're interested, just let us know. Thank you.
Great talk. And we have some time for questions, if you can come to the, yes. Yeah, hi. Thank you for a really nice and interesting talk with lovely visuals. You said that in the simulations,
you have the time evolution of the galaxies. And then when you observe galaxies, obviously what you have is fixed, but some of them are going to be old, some are going to be young. Can your machine learning models predict the ages of galaxies that are observed? We can derive the ages, like through cosmology,
through models, where you can use the images that you get from telescopes. So there is some way to derive the ages from there, just by taking the chemical composition of stars and things like that. But still, these models are not very accurate. And I think people may have already done that, but I'm not sure.
You can, of course, train a machine learning model to just see how a galaxy looks today and predict the age, and then compare with the other techniques we already have. I'm not sure if someone has already done that with machine learning. I think that it's possible to do it with simulations and machine learning, but I don't think that the result will be directly applicable to observations
because of all the problems that we have to determine those things from observations. So that's like one of the big unknowns so far in galaxy evolution, I think. I mean, a rough idea, yes. Hello, thanks for the nice talk. It was really good, very interesting.
I'm wondering, you said that you are using PCA. Does that mean that you have a high-dimensional feature space, or what are your features typically? Yeah, so for PCA, I didn't really use it. It was just for comparison with, I mean, with what we get with the other method. But yeah, the input is the input maps in 2D.
So we just flattened them. And this has kinematics, age, metallicity, and just an image of the galaxy itself. So about five features, I guess. It's more for visualization than? Five maps.
So what's a map? So 2D, so it's like one dimension, two dimensions, and then five stacked maps. Okay, thanks. Let's give Irina and Regina a chance.
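The flattening step mentioned in this answer can be sketched as follows: five stacked 2D maps become one vector of length 5 × height × width, which is the kind of input a linear method such as PCA expects. The function name, the 2×2 map size, and the placeholder values are hypothetical and chosen only to keep the example small.

```python
def flatten_maps(maps):
    """Concatenate stacked 2D maps into one flat feature vector, map by map, row by row."""
    return [value for single_map in maps for row in single_map for value in row]


# Five placeholder 2x2 maps; map number m is filled with the value m.
stacked_maps = [[[float(m)] * 2 for _ in range(2)] for m in range(5)]
features = flatten_maps(stacked_maps)
print(len(features), features[:6])
```

Flattening discards the spatial arrangement of the pixels, which is one reason a convolutional network that works on the 2D maps directly can capture structure that a linear decomposition misses.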
Can I have maybe one more question? Yes, please. It's more of a general one. I'm wondering how important do you think coding knowledge is in natural sciences in general?
That's a very nice question. So, from what we see, it would be very nice if we got some coding lessons before going into natural sciences, because we see that we have so much data
that we are actually forced to learn coding ourselves. And sometimes maybe we don't learn the best practices or something like that. So in a sense, we manage to make things work. But of course, if we got some input, maybe from you, it would make for much more efficient and much cleaner code or whatever.
But yeah. Okay, thank you. It's of course super important. Hi, thanks for the talk. I was wondering if you only trained in a data, purely data driven way, or if you also experimented with basically encoding knowledge from physics in the network
or in the model via the loss function, for example. We haven't done it. No, but it's certainly an option, yeah. Yeah, some people do these other things, but I mean, yeah, it happens. We didn't do it, but some people do. Data driven, well, okay, yeah.
Thanks. So I have a follow-up to the second-to-last question, which was about taking help from developers or software developers via AI. So is the work you're working on open source, where people can see what you're working on and maybe recommend what the best practice is
or something like that? So we can only answer about ourselves, I guess. But yeah, we're trying to, after the publication, we try to put everything in a GitHub repo and then have it publicly available, yes. Cool, thank you.
In my opinion, that's the best practice: to make your data public so that people can actually use it for the next work and even check that everything is working fine. But it's not always what happens. Actually, we have time for one more question.
Okay, let's give Irina and Regina a big applause and thank you very much.