New Trends In Storing Large Data Silos With Python

Automated media analysis (beta)

Speech transcript
So, the aim of this talk is to show some things that might be useful if you want to store and analyze large data silos, taking into account how modern computer architectures are evolving. A few words about me: I am a physicist by training, and computer science is my passion. I do believe in open source, and the proof is that I have spent a good part of my life doing open source development. The project I invested the most in is PyTables, where I spent almost 10 years, although my current pet projects are Blosc and bcolz, and I am going to talk quite extensively about the last one. Why open source? Well, in my opinion there is [unclear] between dreams and reality, and many times we think that something could be improved and then we try to find time in order to implement it. I share the opinion [name unclear] that the art is in the execution of an idea and not in the idea itself, because there is not much left from just an idea. So open source allows me to implement my own ideas, and it is a nice way to fulfil yourself [unclear].
I am going to talk, first of all, about the need for speed, because the goal is to analyze as much data as possible using the resources that you already have. Then I will talk about new trends in computer hardware, because I think that following the evolution of computer architecture is very important in order to design your data structures and your algorithms. And I will finish by showing you bcolz, which is an example, just an example, of a data container for large datasets that follows the principles of the newer and forthcoming computer architectures.

OK, so why do we need speed? Let me remind you of the main strengths of Python. I think that one of the most important things about Python is that there is a rich ecosystem of data-oriented libraries; most of you will know about NumPy, pandas, scikit-learn and a lot of other libraries. Python also has a reputation of being slow, but probably most of you also know that it is very easy to identify the bottlenecks of your programs and then make use of C extensions in order to reach C performance, using tools like Cython, SWIG or f2py. But for me the most important thing about Python is interactivity: the ability to interact with your data and your filters, and to get the results of your queries almost in real time. This is the key thing about Python.

But of course, if you want to handle big amounts of data and you want to do that interactively, you need speed, right? Because if not, this is not interactive any more. But designing code for storage performance depends very much on the computer architecture, and that will be the main point of my talk: in my opinion, existing Python libraries need to invest more effort in getting the most out of existing computer architectures.
Also, a note about the scope of my talk. I am not going to talk about how to store and analyze data on clusters, because getting performance out of clusters is, in my opinion, not exactly the nature of Python. The reality is that the workhorses of Python are mainly laptops. A lot of people are using Python on their own laptops, and my goal is to try to help them get more work done, faster, using laptops or servers. But trying to optimize for laptops or servers does not mean that this is going to be an easy task, because today's laptops and servers are very, very complex beasts, and to leverage them we have to understand their architecture: the different memory layers, the caches and other things.
So let's have a look at the trends in computer architecture and see how they should influence the way you design your data structures. The new trends in computer architecture are mainly driven by nanotechnology, and I think it is very interesting to see how Richard Feynman predicted the nanotechnology explosion almost 50 years ago. I think it is worth checking out that talk of his.
I think that the most important thing to know about memory architecture is the difference in the evolution of speed between memory access time and CPU cycle time. We know that CPUs keep getting faster and faster; in fact, CPU speed has been growing almost exponentially [unclear]. In contrast, memory speed is increasing very slowly, and this is creating a big gap, a big mismatch, between CPU speed and memory speed. This is a very important key to the evolution of computer architecture.

If we look at that evolution, we can see that in the eighties, for example, the architecture of computers was very simple: they just had a couple of memory layers, the main memory and the mechanical disk. Then, in the nineties and two-thousands, vendors realized this problem, the mismatch between memory and CPU speed, and they started to introduce two additional levels of cache in the CPUs. Nowadays it is very usual to have up to six layers of memory. This is a change of paradigm: it is not the same thing to program for a machine of the 2010s as for a machine of the eighties.
To understand how we can adapt to the new computer architecture, it is important to know the difference between reference time and transmission time. When the CPU asks for a block of data in memory, the time that passes from the moment the CPU issues the request until the memory starts to transmit the data is called the reference time; others call it latency as well. The time that it takes from the moment the transmission starts until it ends is called the transmission time. The thing is, if you have a big mismatch between the reference time and the transmission time, you are not doing an optimized access to the data. So the idea is that the reference time and the transmission time should be more or less of the same order.

But not all storage layers are created equal. For example, for main memory, which typically has a reference time of 100 nanoseconds, we can transfer up to 1 kilobyte in that amount of time. For solid state disks, where the reference time is 10 microseconds, we can transfer up to 4 kilobytes in the same time. And for mechanical disks the reference time is around 10 milliseconds, which allows you to transfer up to 1 megabyte. So the slower the medium, the larger the block that should be transmitted in order to optimize access. Again, this has profound implications on how you access storage, as we will see.
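The relation between reference time and block size described above can be checked with a little arithmetic. The following is a minimal sketch, not something shown in the talk: the latencies and block sizes are the figures quoted by the speaker, while the bandwidth values are an assumption derived here as block size divided by latency, and the function names are hypothetical.

```python
# Reference (latency) times and block sizes as quoted in the talk.
# The bandwidths are NOT from the talk: they are derived here as
# block_size / latency, so that transmission time equals reference time.
LAYERS = {
    # name: (reference time in seconds, block size in bytes)
    "main memory": (100e-9, 1 * 1024),
    "solid state disk": (10e-6, 4 * 1024),
    "mechanical disk": (10e-3, 1024 * 1024),
}


def implied_bandwidth(latency_s, block_bytes):
    """Bandwidth (bytes/s) at which sending block_bytes takes latency_s."""
    return block_bytes / latency_s


def efficiency(latency_s, bandwidth, nbytes):
    """Fraction of the total access time spent actually moving data."""
    transmission = nbytes / bandwidth
    return transmission / (latency_s + transmission)


for name, (latency, block) in LAYERS.items():
    bw = implied_bandwidth(latency, block)
    print(f"{name:17s} block={block:>8d} B  "
          f"implied bandwidth={bw / 1e6:9.1f} MB/s  "
          f"efficiency={efficiency(latency, bw, block):.2f}")

# A tiny request on a slow medium is dominated by the reference time.
disk_latency, disk_block = LAYERS["mechanical disk"]
disk_bw = implied_bandwidth(disk_latency, disk_block)
tiny_request_efficiency = efficiency(disk_latency, disk_bw, 64)
```

With the quoted block sizes, half of every access is spent transmitting data; asking a mechanical disk for only 64 bytes leaves almost all of the time spent waiting, which is the reason for using larger blocks on slower media.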
To finish this part, trends in storage. The key thing is that, as we have seen, the gap between memory and disk storage is large right now. In order to fill this gap, vendors are creating solid state devices that have the same interfaces as mechanical disks. Vendors are also starting to put solid state memory on buses like PCIe, and newer protocols and specifications have been introduced in order to put solid state memory in [unclear] slots, so you will be able to access this memory at PCIe speeds, which is very different from accessing it at SATA speeds, as it used to be.

As for trends in CPUs, we are going to see more cores, of course, and wider vectors for operating on multiple data at once, and we have already seen the integration of GPUs into the CPU [unclear]. These are the trends that we have to keep in mind in order to design new data containers.
So what I am going to do is show you an example implementation of a data container that leverages this new computer architecture. As I said, bcolz is a library that provides data containers that can be used in a similar way to the NumPy or pandas ones. In bcolz the data is stored in chunks, and the chunks can be compressed. There are two flavors: the carray, which is meant to host homogeneous types and n-dimensional data, and the ctable, which is meant for heterogeneous types stored in a columnar way.
I am going to skip some slides because I am running a little bit short of time, and the important thing that I want to transmit is the consequences of using this chunked container. So I am not going to explain in detail the difference between contiguous and chunked storage. The only thing that is important is that chunking is nice because it allows for efficient enlarging and shrinking, compression is possible, and the chunk size can be adapted to the storage layer. Remember that, depending on the storage layer that you are going to use, the chunk size should be different, right? So chunked storage allows you to fine-tune the chunk size for your storage layer. On the other hand, appending is much, much faster: you do not need a copy when you are doing an append operation on a bcolz container, so less memory travels to the CPU.
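The append behaviour of a chunked container can be illustrated with a toy example. This is a minimal sketch in pure Python, assuming a hypothetical `ChunkedList` class; it is not bcolz's implementation or API, and it leaves out compression entirely. It only shows that appending seals new chunks and never copies or moves the chunks that already exist.

```python
class ChunkedList:
    """Toy chunked container (hypothetical; not how bcolz is implemented)."""

    def __init__(self, chunk_len=4):
        self.chunk_len = chunk_len
        self.chunks = []   # sealed, immutable chunks
        self.tail = []     # last, partially filled chunk

    def append(self, values):
        """Add values at the end; existing chunks are never touched."""
        for v in values:
            self.tail.append(v)
            if len(self.tail) == self.chunk_len:
                self.chunks.append(tuple(self.tail))  # seal a full chunk
                self.tail = []

    def __len__(self):
        return len(self.chunks) * self.chunk_len + len(self.tail)

    def __iter__(self):
        for chunk in self.chunks:
            yield from chunk
        yield from self.tail


c = ChunkedList(chunk_len=4)
c.append(range(10))
first_chunk = c.chunks[0]

c.append(range(10, 20))
# The first chunk is still the very same object: nothing was copied.
print(c.chunks[0] is first_chunk, len(c), len(c.chunks))
```

A contiguous array, by contrast, generally has to allocate a new buffer and copy the old data into it when it grows, which is the extra memory traffic the speaker refers to.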
Also, the ctable in bcolz is implemented in a columnar fashion. That means that the data of each column is contiguous in memory, so when you want to fetch a column, that column is the only information that you need to transfer. Take the case of an in-memory row-wise table, where the data is stored row by row. If you are interested in just one column, an int32 for example, you are going to transfer much more data than you need, just for architectural reasons; this is how computers work with memory right now. With a column-wise table, if you are interested in just one column, only that column is fetched and transferred to the CPU, so again less memory travels.
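The difference between the two layouts can be made concrete by counting bytes. The sketch below rests on assumptions that are not from the talk: a hypothetical 64-byte record, a 64-byte cache line, and memory that is moved in whole cache lines. Python does not expose cache behaviour, so the code only counts the bytes that each layout would force the memory system to move.

```python
import struct
from array import array

N = 1000
CACHE_LINE = 64  # bytes; a typical value, assumed here

# Hypothetical 64-byte record: int32 id, float64 value, 52-byte name.
ROW = struct.Struct("<id52s")

# Row-wise table: all fields of a row sit next to each other.
rows = bytearray(ROW.size * N)
for i in range(N):
    ROW.pack_into(rows, i * ROW.size, i, i * 0.5, b"name")

# Column-wise table: each column is its own contiguous buffer.
ids = array("i", range(N))


def sum_ids_rowwise(buf):
    """Sum the id field by striding over the row-wise buffer."""
    total = 0
    for offset in range(0, len(buf), ROW.size):
        total += struct.unpack_from("<i", buf, offset)[0]
    return total


def sum_ids_columnwise(column):
    """Sum the id column, which is one contiguous buffer."""
    return sum(column)


# Each row-wise id sits in a different cache line, so a whole line is
# moved per value; the columnar ids are packed back to back.
rowwise_bytes = N * CACHE_LINE
columnwise_bytes = len(ids) * ids.itemsize

print(sum_ids_rowwise(rows) == sum_ids_columnwise(ids))
print(rowwise_bytes, columnwise_bytes)
```

Both functions compute the same sum, but under these assumptions the row-wise layout moves a full cache line for every 4-byte value that is actually needed.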
Why compression? The first thing is that it allows you to store more data, either in memory or on disk. But another goal is this: if your data is compressed, maybe it is better to keep it compressed in memory, transmit the compressed data into the cache and decompress it there. The sum of the transmission time and the decompression time may, in some situations, be smaller than the time needed to transmit the original, uncompressed dataset.
Now, an introduction to Blosc, which is the compressor that is used in bcolz. Blosc's goal is to be faster than a memcpy(). It uses a series of techniques that I am not going to describe, but basically it leverages the new architectures. In this case, for example, we can see Blosc decompressing up to 5 times faster than a memcpy(). I am not going to describe the details; there are other talks about this. The main goal of Blosc is basically to accelerate input/output, not only for mechanical disks but especially for solid state disks and main memory.
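The idea of keeping data compressed in chunks and decompressing it only when it is consumed can be sketched with the standard library. This is an illustration under stated assumptions, not Blosc: `zlib` is used as a stand-in compressor, it is much slower than Blosc and makes no claim of beating memcpy(), and the helper names and chunk size are hypothetical.

```python
import zlib
from array import array

CHUNK_ITEMS = 16 * 1024  # items per chunk; an arbitrary choice for this sketch


def compress_chunks(values, chunk_items=CHUNK_ITEMS, level=1):
    """Split a list of ints into chunks and compress each chunk on its own."""
    chunks = []
    for start in range(0, len(values), chunk_items):
        raw = array("q", values[start:start + chunk_items]).tobytes()
        chunks.append(zlib.compress(raw, level))  # zlib stands in for Blosc
    return chunks


def iter_values(chunks):
    """Decompress one chunk at a time and yield its values."""
    for blob in chunks:
        block = array("q")
        block.frombytes(zlib.decompress(blob))
        yield from block


data = list(range(100_000))
chunks = compress_chunks(data)

raw_bytes = len(data) * array("q").itemsize
packed_bytes = sum(len(blob) for blob in chunks)
print(f"{len(chunks)} chunks, {raw_bytes} raw bytes, {packed_bytes} compressed bytes")
```

Because every chunk is compressed independently, a reader only has to move and decompress the chunks it needs, and fewer bytes have to travel from storage or memory for the same values.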
Blosc is a library written in C, and it is quite widely used. It has been used, for example, in OpenVDB [unclear], which is a library used in producing 3D animation movies, made by DreamWorks. There is also a series of projects using bcolz already. For example, Visualfabriq's bquery, which is meant to do out-of-core group-bys on disk rather than in memory, because bcolz supports both on-disk and in-memory containers. Also Continuum's Blaze, and Quantopian [unclear], who have been excited about using bcolz. I am going to skip these plots; they only show how bcolz performs for their use cases.
I am going to close the talk by saying that there are a lot of data containers out there, so choose the one that fits your needs. My advice is always to check the existing libraries and choose the one that fits your needs, and sometimes you could be surprised: depending on the data structure that you use, you can get much more performance, not because of the algorithm but because of the data structure, the data container. Also, pay attention to hardware and software trends and make informed decisions about your current developments, which, by the way, will be deployed in the future. So it is important to be conscious of the new computer architectures, because your applications are going to use them. And finally, compression: I think many people have seen already that compression is a useful feature, not only to store more data but also to process it faster under the right conditions.

OK, so to conclude my talk, here is my own version of a quote [attribution unclear] of which I was a big fan when I was a teenager: it is change, continuing change, inevitable change, that is the dominant factor in computer science today. No sensible decision can be made any longer without taking into account not only the computer as it is now, but the computer as it will be. So thank you very much. Are there any questions?
[Question from the audience, largely unintelligible in the recording; it asks for comparisons with alternative storage and compression approaches.]

Good question. As I said before, bcolz uses Blosc behind the scenes. I said that Blosc is a compressor, but that was an oversimplification: Blosc is actually a meta-compressor, so it can use different compressors. In particular you can use LZ4, which is one of the new trends in compression because it is very fast and compresses really well. It also has support for [unclear] and BloscLZ, so you have a range of compressors that you can use in order to fine-tune for your applications.

[Question] I just went to a talk on Numba, and they claim to speed up NumPy and things like that. Does bcolz integrate with Numba?

Yes. I mean, no, because bcolz only provides the data layer, the data structure, right? It provides very little computational machinery, just some functions [unclear]. So the idea is to use bcolz, and on top of that you can put Numba, for example, for doing computations. You can also put Dask on top, for example, which is a way to do operations in parallel as well. You can do that because bcolz provides a generator interface that all the layers above can leverage. You are not bound to use bcolz infrastructure or machinery for doing computations; it only provides the storage layer.

[Related question, partly unintelligible: whether bcolz could be used as a storage engine or backend for other tools.]

Yes, exactly, that is another application. For example, I have seen [unclear], I do not remember his name, who is trying to support different backends, like SQL databases or HDF5, and bcolz could be another backend for pandas [unclear] itself. So it can be done. To my knowledge there is no such backend yet, but it could be done, of course.

[Closing remarks by the session chair, largely unintelligible.]

Before leaving, let me just remind you that I will be giving a tutorial in which I will talk more about all this, about bcolz, and do comparisons between bcolz, pandas [unclear] and NumPy, if you are interested.

Metadata

Formal metadata

Title: New Trends In Storing Large Data Silos With Python
Series title: EuroPython 2015
Part: 38
Number of parts: 173
Author: Alted, Francesc
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, copy, distribute and make publicly available the work or its content, in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in adapted form, only under the terms of this license.
DOI: 10.5446/20114
Publisher: EuroPython
Release year: 2015
Language: English
Production place: Bilbao, Euskadi, Spain

Content metadata

Subject area: Computer Science
Abstract: Francesc Alted - New Trends In Storing Large Data Silos With Python. My talk is meant to provide an overview of our current set of tools for storing data and how we arrived at these. Then, in the light of the current bottlenecks, and of how hardware and software are evolving, I will provide a brief overview of the emerging technologies that will be important for handling Big Data within Python. Although I expect my talk to be a bit prospective, I certainly won't be trying to predict the future, but rather showing a glimpse of what I expect we will be doing in the next couple of years to properly leverage modern architectures (bar unexpected revolutions ;). As an example of a library adapting to recent trends in hardware, I will be showing bcolz, which implements a couple of data containers (and especially a chunked, columnar 'ctable') meant for storing large datasets efficiently.
Keywords: EuroPython Conference, EP 2015, EuroPython 2015
