TIB AV-Portal

Gaia Variability Studies: case for Postgres-XC.

Video in TIB AV-Portal: Gaia Variability Studies: case for Postgres-XC.

Formal Metadata

Gaia Variability Studies: case for Postgres-XC.
Alternative Title
Title of Series
Number of Parts
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date
Production Place
Ottawa, Canada

Content Metadata

Subject Area
Mapping a billion stars throughout our Galaxy and beyond. Gaia is a cornerstone European Space Agency mission that will create an extraordinarily precise three-dimensional map of more than a billion stars throughout our Galaxy and beyond, mapping their motions, luminosity, temperature and composition. This huge stellar census will provide the data needed to tackle an enormous range of important problems related to the origin, structure and evolutionary history of our Galaxy. I will describe how variability studies of light sources observed by Gaia pose a Big Data problem that we are trying to solve with the help of Postgres-XC.
Thank you. I am an architect and deputy head of a data processing centre, working with the engineers of the European Space Agency mission called Gaia, and I'm
going to tell the story from the perspective of a Postgres-XC user. This presentation was prepared for the first day of the conference, so I will say a few words about the mission itself, the kind of approach we take, and the problems we have, with Postgres-XC in the background. I will cover the science we do in Geneva, which is slightly different from the rest of the consortium, and then say a few words about the hardware and how I see Postgres-XC clusters, so you know the scope and coverage. A few
words about myself, which I normally never do, but in this context it could be fruitful: I have worked for a long time with a lot of different databases. I first heard about Postgres when I was working with biological databases, and in fact I came to CERN to work on what was then one of the biggest scientific databases around. The first thing I did there was a migration of a large Oracle database into the biggest such setup for scientific use at CERN at the time — that was over ten years ago, and it ran on twelve Pentium machines. Then I joined the Gaia group in Geneva. Now let me explain what variability studies are in this context. So
Gaia is right now a very important mission for the whole astronomical community, because no mission of this kind will be pursued again for the next twenty years or more. It is a cornerstone ESA mission, and the main goal is to build a census of the Galaxy: to study the stars and the other objects in our Galaxy, everything between magnitude 6 and magnitude 20. Magnitude 20 means light about 400 thousand times fainter than what we can see with the naked eye. We also want to survey objects beyond the Galaxy, such as quasars, and to catch supernovae that happen during the mission. The mission itself is five years long, and magnitude 20 corresponds to roughly one billion stars. Since the satellite was launched in December there have been debates about pushing the limit to magnitude 21; that would mean about two billion stars, which is bad news from our perspective because of the storage and processing time involved — there are some problems, but it is still under discussion. The satellite has a lot of very interesting features. Just to give you an idea of the precision involved: if you put Gaia on Earth, it would be able to see a human hair from a thousand kilometres away — that is the kind of accuracy of the position measurement. So there is a lot of challenging engineering behind this, and there are by-products, like exoplanets being discovered. What is most relevant for us is the estimated number of variable sources: out of all the stars we are going to observe, something like one to two hundred million are expected to be variable. And if we find out that some stars are variable,
we will want to spend significant observing time looking at them. Another special thing about this instrument: it has two mirrors, two fields of view. When you think about satellites or telescopes, you normally have one line of sight; in this case we have two mirrors pointing at one big focal plane, which is the biggest CCD camera ever flown in space. Because of this, the two mirrors have to keep the basic angle between them constant to extreme precision, so that we know what we are looking at — otherwise we could not map the sky properly. For that reason the instrument is built around a torus that is super stable once it is in space. There is an
animation — I hope it starts here — demonstrating this. The two fields of view sweep the sky as the satellite spins all the time; I am going to show you more about this later. It keeps spinning over the five years and scans the whole sky several times, covering it completely every six months. The light goes through the first field of view and then the second one, and ends up on the focal plane. The spinning is very important, because the whole scanning law is built on it: first the satellite spins around its own axis,
and at the same time the whole thing moves around the Sun, and it also orbits around the L2 point in a special Lissajous orbit, which I am going to show you — a very different orbit from the usual ones you have heard about. Now,
normally, when any serious scientific satellite is built, two satellites are made: the first one is used for tests on the ground, and the flight model goes into space. In this case the fabrication of the mirrors was so complicated and so time-consuming that only one satellite was built, with no spare if something went wrong at launch. Fortunately for us the satellite is there; it is in the commissioning phase, but there are some problems. For example, there is a suspicion that some residual ice has gathered inside the satellite, which affects the light that we observe. So there is a number of issues like this, but apparently none of them is a showstopper for the mission. The other
thing you should know: this is not a geostationary satellite, which would be pretty close to the Earth. In this case we are orbiting around the L2 Lagrange point. That is one of the points where the gravitational pulls of the Earth and the Sun balance, so once you are there you can orbit around it with very little fuel.
And on this orbit, while spinning, the satellite can observe the whole sky without ever being eclipsed by the Earth, so there is no danger of the batteries running out: the solar panels always see the Sun. This is a very important property of the orbit. The spin axis itself also precesses, tracing circles around the Sun direction, and this gives us a very
peculiar way of looking at the sky. So imagine that we are
spinning, and the spin axis is precessing. Every six hours the satellite completes one rotation, so you see we start with one field of view, then the second field of view, and the whole thing rotates slowly, scanning stripes of the sky and covering the whole sky every six months. Now, what kind of
science comes out of this? There is a lot of science, but most of what we do in Geneva is on the variability side, so let me illustrate that first. First of
all, it is about creating a map: with distances to that many stars in three dimensions, we will finally know what our Galaxy looks like. Here you see a rendering of the Galaxy from the side and from above; this is where the Sun is, and the colour is a heat map showing the density of stars across the Galaxy — there are interesting artifacts here. This is actually a simulation run on the MareNostrum supercomputer in Barcelona. It turns out that with Gaia we get an extension here, which means we can see further and deeper towards the centre of the Galaxy. Another
thing is how we can actually measure distances and construct the 3D model of the Galaxy: it is all about parallax. Just imagine sitting in a train: nearby objects seem to move at a different angular speed than distant ones — the people close to you appear to move much faster than the ones at the end of the room. By measuring this apparent shift from different positions on the Earth's orbit, by triangulation, we are able to work out the distances and thus the distribution of the objects in the Galaxy. So that is one thing. The other
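The triangulation idea above boils down to one formula: the distance is the reciprocal of the parallax angle. As a hedged illustration (not code from the talk), here is a tiny Python helper assuming the input is in milliarcseconds, the unit Gaia-scale parallaxes are usually quoted in:

```python
def parallax_to_distance_pc(parallax_mas):
    """Convert an annual parallax in milliarcseconds to a distance in parsecs.

    By definition, a star with a parallax of 1 arcsecond lies at 1 parsec,
    so distance [pc] = 1 / parallax [arcsec] = 1000 / parallax [mas].
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return 1000.0 / parallax_mas

# A star with a 10 mas parallax is 100 parsecs away.
print(parallax_to_distance_pc(10.0))  # 100.0
```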
thing is something everybody recognises: the Hertzsprung-Russell diagram, if you remember it from school. The reason we want the map of the Galaxy is that it gives you a lot of
information about where we come from, how the Galaxy formed, what rules govern it, and what the future of this part of space is — so you can do a lot of science with it. Here are some
of the other things, like the construction of the astrometric reference frame, which I will
show with this animation. One of the things that you
could see at the end of the animation is that some objects are moving in different directions. We do not actually know exactly how the matter in our Galaxy is moving or how the motions affect each other, but by taking both the 3D positions and the speeds — the proper motions of specific objects or parts of the Galaxy — you can understand how the whole thing works. This also feeds into fundamental physics: there is the dark matter question, and Gaia should help us understand what dark matter is and how it affects the Galaxy. Now, how does the light become data? First, everything we observe lands on the CCDs, has to be digitized, and is then sent down as a compressed stream; there is a fairly powerful video processing unit on board that is responsible for digitizing and compressing this information. So a lot of stuff
is happening on board, and what arrives on the ground is the end value of the
processing chain. This is the actual focal plane: it is composed of multiple CCDs. There are the star mappers, one for each field of view, which detect that something is entering; then the astrometric field, the real instrument that observes; then the photometers — CCDs with different visual bands, blue and red; and finally the radial velocity spectrometer, which tells you how things are moving towards or away from us. Imagine a star entering the field of view here and exiting there: on-board logic detects the star with the star mapper and then tracks it across the plane. A transit takes about 4 seconds per CCD column, and the whole crossing takes about 40 seconds, so the satellite is doing this same boring thing all the time. The exciting part is that it has to be so precise: for example, there was a problem recently that the stars appear fainter than expected, and to calibrate the data that comes from the satellite we need to know its position in space extremely precisely, even though it is 1.5 million kilometres away from us. So for those 40
seconds of transit, what we scientists get, after a chain of processing by the other groups of the consortium, is a set of measurements per transit. And this is not yet what we need: on top of that we do what is called reconstruction of the time series, because we are almost the only ones in the consortium who are interested in the time-domain behaviour of the sources. This is what a raw light curve looks like — and the human eye is so good that an astronomer can often tell what kind of object it is just by looking at it, although you would have to spend a lot of time to do that for every source. What is done instead is that the light curve is
folded over the period, if we can find one. From our perspective the major scientific work is: decide whether the object is constant or variable, then whether it is periodic or not, and if it is periodic enough, you can do your classification — but first you must find the period. For example, here you see an eclipsing binary with a period of about 1.5 days; a phase of 1 here corresponds to one period. Once folded, you see this repetitive shape, which is characteristic of this type of variable. Finding the period with a period-search algorithm takes on the order of 3 seconds per object, and we have about four time series per object, so we have to go through quite some time series data
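Phase folding itself is conceptually a one-liner. The sketch below is illustrative only — the mission pipeline is far more involved — and folds made-up observation times over a trial period so that a strictly periodic signal lines up:

```python
def fold(times, period, t0=0.0):
    """Fold observation times over a trial period: map each time to a
    phase in [0, 1), so repetitions of a periodic signal overlap."""
    return [((t - t0) / period) % 1.0 for t in times]

# Toy light curve sampled at irregular times; the true period is 1.5 days,
# so folding at 1.5 collapses the samples onto just two phases.
times = [0.0, 1.5, 3.0, 0.75, 2.25]
phases = fold(times, period=1.5)
print(phases)  # [0.0, 0.0, 0.0, 0.5, 0.5]
```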
processing. The ultimate
goal is, compared with the previous mission — which was called Hipparcos and produced a catalogue of periodic variables — to produce the Gaia variability catalogue. The Hipparcos photometry covered on the order of a hundred thousand sources; for each entry you have the light curve, essentially as I showed before, and some attributes that are typically derived from the light curves: parallax, periodicity, colours, and so on. What we have
hoped for is a similar timeline: Hipparcos flew from 1989 to 1993 and the catalogue was published in 1997, and Gaia finishes around 2020–2022, so I hope we can get from the end of the mission to the final, processed and categorized catalogue in a comparable time — it takes a while to publish. So,
remember how I showed you the way the satellite spins and precesses: that makes the data themselves our first Big Data obstacle. What happens is that the number of observations — how often a source is seen — depends on where it lies on the sky, because certain parts of the sky are scanned more densely than others. So we have a non-uniform coverage of the sky, which means the sampling is not comparable with what has been done before: when it comes to the processing, we do not have equally sampled time series. We also have to model and correct for many instrumental effects on the time series, and there is a lot of research still to be done on that; after the nominal five years there should be enough fuel for an extension, which we could use to improve the corrections. So, to be precise, we
find that on average we have about 80 observations per source, but it varies from roughly 40 to 250. Now, the global
data flow: there is a central repository that receives the data from the satellite, and a lot of downstream science is derived from it. Until it reaches us, the data is distributed in custom binary formats — compact and compressed, and really hard to work with from a data-access perspective. And there is no time ordering at all. We are special in the consortium because we need the data in time order; most of the other groups do not care about the order — they work transit by transit and can process the stream directly — but we have to reconstruct the time series to operate on. From this perspective Postgres-XC is interesting: we started with it a few years ago, things are going well, and other groups could be interested in Postgres-XC too, because they have slightly different problems. One important aspect is that this system is going to stay around for years and years, so it matters that Postgres-XC, and Postgres itself, will be there for many years to come. Now,
the group is distributed all over Europe. I work in Geneva, but we have many active people in a number of institutes dealing with the data, all of them fitting somewhere in the pipeline, so we have to distribute the data to many people outside our centre. And it is not only our own data: we use a lot of data that already exists. There is a number of existing surveys that we base our research and development on — people who did variability observations before us — and a few of them are very valuable, because the classification they provide is really good. And this is what
classification means here. This is our variability taxonomy tree — our own invention — in which sources are classified depending on how they behave. We start from this tree and a number of rules, and we do all of this because in the end we want to know what the sources are, and even to discover new types. Most of these types will be recoverable by Gaia: even with the sky sampling I showed you, we can still find most of them in the classification work. So this is the set of classes that we want to get out of the data. So
when we have these classes defined, in the
end what we do is supervised classification. I mentioned the existing surveys: for a good astronomical classification you need training labels. Traditionally astronomers look at light curves and say what the objects are, and a good estimate is that a human could classify maybe a few hundred per day — while we will have on the order of 200 million candidates after the first variability cut. So what we do is, for example, build classifiers — random forests and the like — trained on the classes we got from the astronomers, predict with them, and run cross-validation. Here you see the result as a confusion matrix: the classes the classifier predicted, for example eclipsing binaries, against the true classes. Of course the predicted classes do not always match the true ones — ideally everything would sit on the diagonal, but that is not the case, and it takes a lot of iterations to understand why. OK. So, the processing:
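To make the cross-validation bookkeeping concrete, here is a minimal confusion-matrix counter in Python. This is purely illustrative: the class labels (`ECL`, `RRLYR`, `CEP`) are placeholder names, not the project's actual taxonomy identifiers.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs; diagonal entries, where the
    two labels agree, are the correct predictions."""
    return Counter(zip(true_labels, predicted_labels))

true = ["ECL", "ECL", "RRLYR", "CEP", "RRLYR"]
pred = ["ECL", "RRLYR", "RRLYR", "CEP", "RRLYR"]
cm = confusion_matrix(true, pred)
print(cm[("ECL", "ECL")])    # 1 correctly classified eclipsing binary
print(cm[("ECL", "RRLYR")])  # 1 eclipsing binary confused with an RR Lyrae
```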
we do have a central repository that we get the data from, and given the cadence of the data we have to do the time series reconstruction, which is unique to us. For validation, as I said, we use things like the confusion matrix, run on a subset of the data before the proper processing. This is great because it can be repeated multiple times: during the first six months we ran it many, many times, because we kept finding that some parameters were wrong, or that we wanted to include other evidence — the typical things you hit when you are dealing with the unknown. And as I mentioned, we also want to discover new classes of objects and extend the taxonomy tree if we find them. And
now a word about the colours on this slide: this is one month of processing. The green parts are the input data; the red parts mean that some human intervention and validation is needed. So even for one month of data we have to do a lot of manual work — manual meaning that we take the statistical metrics, inspect them, and check that they are what we want. Again, this iteration is important for us. So why do we
say we are Big Data? There is the definition — the three Vs: volume, velocity and variety. You do not have to tick all of them; the usual reading is that if you have two of them, you
qualify. Why volume? We have these input files — about 3,000 of them just for the photometry, plus auxiliary data alongside the photometry, again on the order of 3,000 files of about 10 megabytes each. As I said, if you want this distributed, with a lot of processing power behind a distributed solution, you load them into intermediate partitions of a distributed Postgres-XC table and then run the reconstruction. This is
just an example — simplified, but not that much — of how we reconstruct the time series, and this is really the key step. We have a very long table, on the order of 2×10^11 rows of individual observations, and we transform it, per source, into time series using array aggregation, which is quite idiomatic Postgres. Of course we have to partition it: we distribute the table in order to parallelize the work, which is exactly what Postgres-XC gives us. After such a query, instead of a big table with some 2×10^11 observations, we end up with around 10^9 sources holding their reconstructed time series. The reconstruction takes two to three days. Now to the variety: why
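The reconstruction described above is done in SQL with array aggregation over the distributed table; as an illustration of the same group-by-source idea, here is an in-memory Python sketch (the real query runs over Postgres-XC partitions, and the row shape here is invented):

```python
from collections import defaultdict

def reconstruct_time_series(observations):
    """Group a flat stream of (source_id, time, magnitude) rows into
    per-source time series sorted by observation time -- the in-memory
    analogue of GROUP BY source_id with an ordered array_agg."""
    series = defaultdict(list)
    for source_id, t, mag in observations:
        series[source_id].append((t, mag))
    for points in series.values():
        points.sort()  # restore time order, since the input is unordered
    return dict(series)

rows = [(42, 3.0, 12.1), (7, 1.0, 9.9), (42, 1.5, 12.3)]
ts = reconstruct_time_series(rows)
print(ts[42])  # [(1.5, 12.3), (3.0, 12.1)]
```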
are we not on some NoSQL type of platform? We have about 70 domain classes to persist, and not all of them fit a simple key-value layout. Some of the tables are small and can be replicated to every node, but many of them have to be partitioned. On top of that we have data from other surveys in all kinds of shapes, so for part of the data we also use an entity-attribute-value kind of data model, and so on. As I
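For readers unfamiliar with the entity-attribute-value pattern mentioned above: each fact is stored as its own (entity, attribute, value) row rather than as a dedicated column, which is what makes the schema flexible across heterogeneous surveys. A toy Python illustration with invented attribute names:

```python
# Each fact is one (entity, attribute, value) row instead of one wide column,
# so new attributes need no schema change.
eav_rows = [
    (1001, "amplitude", 0.35),
    (1001, "period_days", 1.5),
    (1002, "amplitude", 0.02),
]

def attributes_of(entity_id, rows):
    """Pivot the EAV rows for one entity back into a plain dict."""
    return {attr: value for ent, attr, value in rows if ent == entity_id}

print(attributes_of(1001, eav_rows))  # {'amplitude': 0.35, 'period_days': 1.5}
```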
mentioned, we have this input, but we also have the repeated processing in one-month or six-month windows, and we generate data all the time: from those gigabytes of input files we can generate hundreds of terabytes during one cycle. So what we do is keep the data from cycle minus one — the previous cycle — while the current cycle runs, and delete everything older. That is also the reason we use this kind of partitioning by processing cycle. And this
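The retention policy just described — keep the current and the previous cycle, drop the rest — can be sketched as follows (hypothetical partition labels; in the database this corresponds to dropping whole per-cycle partitions):

```python
def cycles_to_keep(current_cycle):
    """Retention policy: the current cycle plus cycle minus one."""
    return {current_cycle, current_cycle - 1}

def prune(partitions, current_cycle):
    """Drop every per-cycle partition older than the previous cycle."""
    keep = cycles_to_keep(current_cycle)
    return {cycle: data for cycle, data in partitions.items() if cycle in keep}

partitions = {5: "old", 6: "previous", 7: "current"}
print(sorted(prune(partitions, current_cycle=7)))  # [6, 7]
```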
is just a bit about the data model from the development perspective. The whole project, by decision, is split into these models, with Java on top. Right now we are also using R for ad-hoc analytics — we like it because you can embed R in Postgres — so our validation runs as R functions inside the database, for which we created some special utility functions. We have the notion of a catalogue — in fact multiple catalogues — and a lot of the validation work is done on samples: you do not go and take the full billion sources, you start with specially sampled objects, fetch their time series, and compute the statistics on those. On the Java side we use JPA with some custom mappings — our own Hibernate-like kind of mapping — whose metadata we store, and we had to do some strange things with various PG types to map each
attribute. So in the end we have a bunch of attributes, and the question is which of them we want to use to classify — a lot of effort is spent on finding good attributes. Basically, for any set of attributes you get an error rate from cross-validation, meaning you take existing catalogues from existing surveys, take the classes the astronomers already provided for the same objects, and you want the disagreement to be as low as possible. You can see that beyond some point, around 10 attributes, adding another attribute does not really decrease your error rate any more. But the point is that we have to find out which attributes are meaningful out of the roughly 500 we compute — and that is a lot of data in itself, because all of this is computed on the samples we make of the data. And
And this is one of the ways of seeing which attribute is meaningful for which classes. You see the classes from the taxonomy tree on one axis and the attribute names on the other, and the darker the cell, the more important that attribute is for finding that specific class. For the first release, for example, for one class the amplitude is very important, but you can see that for this other one it is not, and the frequency error matters instead. It is quite complicated to build a data model that maps this kind of dependencies and correlations, because when you know you have to make some cuts, you end up with queries that gather very specific data out of this.
We do all these things, the classification and whatnot, mostly on samples, but ideally we would like to run everything on the full data set. This is one projection of some simulated data; you can already see in this box that something was improperly simulated, so somebody gets told to fix the simulation, but otherwise the distribution is more or less what we expect from the detections. This one is the number of elements, the number of values in the time series, and how they are distributed over the sky. Then there is a bunch of other metrics. All of this is computed and displayed in R. Here is another one, mostly for the colours, with the histograms we show; we are always looking for some artifact that the scientists should investigate. And it is not only SQL.
We also do Complex Event Processing. We have hundreds of processing jobs running the period search and deriving the attributes, reading their data from the database, and they all start producing results at about the same time. We publish the results into a CEP engine, and it does the aggregation on the fly. The point is that you don't have to wait for everything to finish: you know the metrics you want, you predefine the aggregations, and you get better and better partial results as the jobs go. The engine we use is Esper, which has an SQL-like language. Here you can see that we create a histogram object with some boundaries, we keep pushing values into it, and every ten seconds the engine creates a snapshot of its state.
As values keep being fetched from the streams, it keeps emitting these partial results, one-dimensional, two-dimensional, whatever you like; you can snapshot everything that goes by, and we use a web tool to display it all. And this is the best part: the real-time monitoring. This is really nice, because these runs go sequentially for two hours or even days, and if you don't have proper monitoring you don't know whether your run is OK.
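The idea of those on-the-fly aggregations can be sketched in plain Python. This is a minimal stand-in: the real system expresses this in the CEP engine's SQL-like statements, and the class below is invented purely for illustration.

```python
from bisect import bisect_right

class StreamingHistogram:
    """Incremental histogram that can be snapshotted at any time,
    mimicking a CEP-style aggregation over a value stream."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)       # bin edges
        self.counts = [0] * (len(boundaries) + 1)  # one extra bin per side

    def push(self, value):
        # bisect finds which bin the value falls into
        self.counts[bisect_right(self.boundaries, value)] += 1

    def snapshot(self):
        # return a copy of the current state; the stream keeps flowing
        return list(self.counts)

h = StreamingHistogram([0.0, 1.0, 2.0])
for v in (-0.5, 0.1, 0.2, 1.5, 3.0):
    h.push(v)
print(h.snapshot())   # [1, 2, 1, 1]
```

In the real setup the engine takes such a snapshot every ten seconds and pushes it to the monitoring display, so the partial results sharpen as more jobs report in.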
Another thing we like about Postgres is that we are able to distribute the data: within Postgres we create appliances that carry exactly the same schema as the big distributed database, and users see each of them as a single machine. This is really good, because we have a lot of developers and it would be impossible to keep all of them up to date with the new simulated data by hand; we have something that does it automatically, so when you create a new version of the software, it takes a snapshot and a subset of the data, creates an appliance with it, and publishes it. And now, Postgres-XC.
Who knows about XC? OK. So this is the kind of setup in which we would like to use XC, a clustered, distributed database. You have data nodes and coordinators; in principle you need only one coordinator, but in practice everybody ends up putting a coordinator on every piece of hardware they have. Then there is the GTM, the global transaction manager, which is responsible for handing out snapshots for you so that all of this works, and you have the clients. There are also some very nice satellite projects for the load balancing and for the high availability, so that a client can automatically fail over and reconnect to the right machine.
This is the setup we are aiming to have. On one physical node we run both a primary and a secondary, but crossed, like a braid: the primary on this machine streams to the standby sitting on that machine, and vice versa, so when one machine dies, its standby on the neighbour can take over. We envision four nodes for the production setup, and there is some work being done on integrating Pacemaker. So on each of our nodes we actually have four Postgres databases running on one physical machine: the primary data node, the primary coordinator, and the standbys of a neighbouring machine's data node and coordinator, with the storage behind them. That is one scenario.
Pacemaker is the one responsible for finding out when a machine is down. When a machine dies, Pacemaker tells the rest that it has stopped, and there is no problem, because this standby here becomes the primary: it gets promoted and works as a replacement for the dead one. To get back again, you also need the coordination from Pacemaker to bring the machine back: you recover the old primary, ship it all the data that was picked up here in the meantime, and you end up with the same layout as before.
There is also the problem, or rather the possibility, of adding new nodes: when you buy a new node you should be able to just add it. That is not so easy right now, but we are hoping the next XC release will do this properly, with some kind of background rebalancing that spreads the existing data onto the new nodes.
There are many reasons why we chose Postgres, and why Postgres-XC; let me go through the ones that are important for us. In the first place, performance. Then partitioning: partitioning is a big deal for us, not only because of the volume, but because of the way we work. We want to delete stuff all the time, and it is much better to drop a partition than to DELETE the same rows. This is one of the things our workflow depends on, and it is more tricky in a distributed setup than on a single node. We are also very keen on the indexing: we would like to use nearest-neighbour searches with custom metrics. And then there is the hardware.
There was a talk here about hardware optimization, and we had looked around at what exists on tuning. I found a claim, I think from four years ago, that nobody had ever explored all the postgresql.conf parameters, which I found pretty hard to believe, even if somebody only wants to optimize for their own workload. So what we ended up doing is a script that runs a brute-force search over the whole parameter space, and these are the correlation plots that show you, for each parameter, which values give you more or less the highest median with the smallest spread; for example, this value here is good. There are many ways of doing this, but this is one of them, and in the end it proved very useful. We did the same for sequential and random I/O with a disk benchmarking tool, and with pgbench for the database side. That is our benchmarking setup.
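The shape of that brute-force search can be sketched as follows. The parameter names and the benchmark function here are made up; in the real setup each configuration means restarting the server and running pgbench several times.

```python
import itertools
import random
import statistics

random.seed(1)

# Hypothetical tunables; in reality each combination would be written
# into postgresql.conf and benchmarked with pgbench after a restart.
grid = {
    "shared_buffers_mb": [128, 1024, 4096],
    "checkpoint_segments": [3, 16, 64],
}

def benchmark(cfg):
    # fake tps numbers: bigger settings help, with run-to-run noise
    base = 100 + cfg["shared_buffers_mb"] * 0.01 + cfg["checkpoint_segments"]
    return [random.gauss(base, 5) for _ in range(7)]   # 7 repeated runs

results = []
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid, values))
    runs = benchmark(cfg)
    results.append((statistics.median(runs), -statistics.stdev(runs), cfg))

# pick the config with the highest median tps, preferring a small spread
best = max(results, key=lambda r: (r[0], r[1]))
print(best[2])
```

Ranking by median first and spread second matches the selection rule from the plots: a setting is good when its median throughput is high and its run-to-run variation is small.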
About R: R has a lot of cool statistical features. If there is a paper with some statistic written up, it is probably already implemented in R, and the same goes for a lot of the machine learning. The problem with R, as you know, is that it is memory constrained, so for many things it is OK and for many things it is not, and that makes life quite complicated.
[Question from the audience about the tuning plots.] You have seen these for different parameters, and they do interfere with each other. I don't remember exactly right now, but I think this one you can use to reduce the standard deviation. A large spread here means something potentially dangerous, so we want the spread to be as short as possible and the median as high as possible.
So, we have a number of cores and a number of users to serve. This is the procurement plan for buying the hardware: we have roughly forty terabytes in the storage enclosure and five machines with 64 gigabytes each; some of them will be used for production and the rest for test and development. And then there is the whole question of how to partition the hardware, and whether to have both the primary and its backup on the same enclosure.
And this, I think, is very important: you have to want to be a power user. With Postgres, and especially with Postgres-XC and Postgres-XL, which we hope will be merged into something more coherent, you have to be ready to read and patch the source. We will be deploying this on our system next week; the hardware is in, we still have to settle the final configuration, and we are negotiating with the sysadmins, because we do something different from everybody else in the centre. That is the talk. Questions?
First I just want to give thanks to everybody: everyone in our group uses PostgreSQL, and you don't know how much we rely on your work, so we are really grateful for it, every day. I should mention especially the indexing support, which is super interesting for astronomers: the clustering, but also the nearest-neighbour searches, basically by any metric, so you can create your own metrics and your own indexing scheme. This is super important for us. And now I am waiting for questions.

[Question: do you also use the parallax as one of the attributes?] OK, so, even though we are also looking for gravitational lensing events, those are treated differently; that is already a variable with a special meaning, so we don't use it for the classification products, because it would be too complicated to derive the proper attributes from it.

[Question about indexing.] Yes, there has been research done about how many dimensions in an index make sense, how complex an index can be before it degrades to a sequential scan. This is a general mathematical problem, how dimensionality affects your optimization. There is no hard limit, but what I know is that the rule of thumb is around eleven: if you go over eleven dimensions, you will probably have a hard time finding anything interesting, and you have to do other tricks like principal component analysis to collapse the many dimensions into fewer. We do this as well. There are a number of ways to find this number; this particular example is based on random forests, and that is convenient because it gives you the most significant attributes together with their weights. The downside is that it is just very slow to train.

[Question about distances.] I don't remember exactly; it depends on the luminosity of the objects. But for example, if we see some number of supernovae, we will see them almost to the end of the Galaxy; this axis is in parsecs. Gaia even sees quasars and other galaxies, but the distance measurement is less precise there: the further you go, the lower the precision, and I think for the faintest sources we have at best twenty per cent precision.

[Question: what is this source ID?] The source ID is derived from the partitioning of the sky. There are several partitioning schemes for the sky; this one uses HEALPix, which basically tiles the sky into rhomboid-like pixels. In this number we actually encode the number of the pixel, or sub-pixel: we have three levels of sub-pixels, so it goes to better than one degree of precision. Of course that alone is not enough, because depending on the density of sources in that part of the sky you might want to change it; what we are going to do is take the first data we get, compute some galactic statistics from it, and derive our partitioning key from that.

[Question: why not the mission's existing database?] The main mission database is, let's say, a pseudo object-oriented database that doesn't really have transactions and so on. There are many reasons why it was chosen, but I think one of them is that they feared Oracle and were afraid to invest in something like that. OK, thank you.
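To make the source-ID idea concrete, here is a sketch of packing a sky pixel and its sub-pixel refinements into a single integer. The bit widths and the running-number field are invented for illustration; the real Gaia source_id layout is different.

```python
# Pack a top-level sky pixel and three sub-pixel refinements into one
# integer, so that sorting by ID groups sources that are close on the sky.
SUBPIX_BITS = 2          # each level subdivides a pixel into 4

def make_source_id(pixel, subpixels, running_number, run_bits=32):
    sid = pixel
    for s in subpixels:              # three refinement levels
        sid = (sid << SUBPIX_BITS) | s
    return (sid << run_bits) | running_number

def sky_prefix(source_id, run_bits=32):
    # strip the running number to recover the spatial key
    return source_id >> run_bits

a = make_source_id(pixel=5, subpixels=[1, 2, 3], running_number=7)
b = make_source_id(pixel=5, subpixels=[1, 2, 3], running_number=99)
print(sky_prefix(a) == sky_prefix(b))   # True: same patch of sky
```

Because the spatial key sits in the most significant bits, range scans on the ID correspond to patches of sky, which is what makes it usable as a partitioning key.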