Merken
Apache MADlib
Automatisierte Medienanalyse
Diese automatischen Videoanalysen setzt das TIBAVPortal ein:
Szenenerkennung — Shot Boundary Detection segmentiert das Video anhand von Bildmerkmalen. Ein daraus erzeugtes visuelles Inhaltsverzeichnis gibt einen schnellen Überblick über den Inhalt des Videos und bietet einen zielgenauen Zugriff.
Texterkennung – Intelligent Character Recognition erfasst, indexiert und macht geschriebene Sprache (zum Beispiel Text auf Folien) durchsuchbar.
Spracherkennung – Speech to Text notiert die gesprochene Sprache im Video in Form eines Transkripts, das durchsuchbar ist.
Bilderkennung – Visual Concept Detection indexiert das Bewegtbild mit fachspezifischen und fächerübergreifenden visuellen Konzepten (zum Beispiel Landschaft, Fassadendetail, technische Zeichnung, Computeranimation oder Vorlesung).
Verschlagwortung – Named Entity Recognition beschreibt die einzelnen Videosegmente mit semantisch verknüpften Sachbegriffen. Synonyme oder Unterbegriffe von eingegebenen Suchbegriffen können dadurch automatisch mitgesucht werden, was die Treffermenge erweitert.
Erkannte Entitäten
Sprachtranskript
00:05
changing all the time and you can read and then at the end of the world and I think of the of the constants the in the end of the year and the use of a the title of the book and some of you did so and so on and on and and the and the Senate and the length of of the they were much if you don't like everyone was was a colony and that the only thing that we want to know how the tool that you know In the 1st half of the root of the word in a key tool for my good to start the times of that is that the 2 images you this is what I do for a living so they have quite a while the people from various backgrounds were distributed systems and a lot of the computer science and and I suppose that there were looking for interesting projects over the years and found that we heard you liked look in this area of logic of but you still have living so you that number 2 is that the very large commercial enterprise here using relational database using data which is arranged in light of form of which is what and so you could but for those of you who don't know that is also the goal so you put those 2 together is the equation that you know which is that of 1 plus money it was not so that's the material I like you so in particular the part that's part of to the talk about the during to part you were in format that is so a new source project and then look at the database machinery in which you know the the architecture of the management and then talk about some of the skills so the power in the of provokes a in the to get so let's start
03:16
with the and the touring to use of the that and I like the start of the history of the so this is
03:26
sort of the history of all post grasses it's hard in the and I felt a little celebrated multiple processes or so this is a time of the presence of God couple days that I think are interesting and this time I 1 wonder what is that was added the workers in all the years later the other thing that was the of the and so in the 2000 the by the there a budget of culturing from and who thought about all of that data inside of grass and a possible to make a distributed solution may be massively parallel processing and so
04:22
they both work of those that's the point to the version that doesn't have the feel this massivelyparallel losses in the engine on top of now about the very of interesting and the problem that you were found about 2 here's years where the a lab and they realize that you have now hold this terrible computing capability
04:52
with the database and it is going to add machine learning component to it right so the idea is that you don't move the data out of the database operated on property owners in some
05:07
experts from that in fact if you want to do everything in it and so that is the advent of the environment which was launched in 2011 so shortly after that we have to do if it follows that I. Greenwald and later and I said why don't we take this matter with processing parallel processing engines on so local storage and
05:36
distributed I also the capabilities of this unit you're going to the ecosystem can be added to the whole heart which is a Apache hot later the this in the because of that you the knowledge about the heart and the continued I told you about in the useful version of these are all research projects that you can work hard and that there are no such thing you if so so there's we have the result
06:23
of the interesting part of collaboration with industry and then some the history as well as academia so the project was actually where he published University in Berkeley University 1st guy act indeed there is a lowering of the architecture and you realize that only because I couldn't stand with Stanford University School of Medicine so you were so why is the life that she might like to think that just of these the of wealth is because it is that this really a great place to be used by the developers come together to work in a collaborative way on software and clearance of the transparency of our dataset so if you have a research project that you can a share of this happens and you that here you know this is groups who what if you don't have it here the linking the of that the other using that as the knowledge of the so called back around that it's important community for a of this slide postlistening using while in the snow and projects and if you want to know about that should be the some sum of the village really the source of the building itself to appear rule has taken out of heart the with all the motion products would well through the the database and the Apache the the the when you have these are all the open source in the last of last year's go all of which the sum of the that was a little bit of
08:53
history is that there were about like a tale of that and the talk of because of the world we show that the use of scalable in From the runs in today's as in the rest of you agree in principle the base as well as touching the heart and to you in a matter of power and scalability for so if you have the physical memory of a the nodes of the utility of the lots of other solutions to the right and on the the of the variety the what do you what do you think that you know and every word in the sense that it's aware that is the other thing is that but if you want so the performance of the sense that if you have a large dataset work faster rather the of where the weights and the of any such that you know you're working on large datasets and you get your results but so these are the functions that exist in Annapolis they said severely injured libraries in the world of 35 40 possible functions of the problems of not less so now we lost 5 over 5 years you see the the expected of nearest provides thing while the unsupervised learning of involved in over all of the work that and in fact this is the focus of a recent development in the state of the art of the time that they stayed in the area of the feature extraction and what have you the words you using the likelihood of focus and the application and talking about the other solves owls however I think in general if you want to know a lot of work and getting ready to go to such we started doing more irritable models in particularly in the last 6 months or so that all of the matrix operations as well as the inside of that function the function of interest in the matter of creating it M so the features of the future power better parallelism and the key thing she said that it is a sequel based that is designed to take advantage of this method would parallel processing architecture in the as well as of distributed and so on the scalability and by designing the algorithm Scalability datasets that don't change your software as you it's a bigger so if you have a universal dataset you know your just want to test for example and then you can run that is well on its way in the direction of the to change the facts in some really wonder what it's running of the another key thing but the idea is that if you do look at the dates I just kind of like this company right in the area of data sets that that you mentioned that high cardinality by writing things that you want to find a way to write the embedding Portuguese example in 1st so that the idea of all the data so these are the supported platforms that matches the people talk about it was so all then we have a field of the culture of jobs lost in the elite elitist this is the scaling by what the size you will say so the exactitude variables and then on the Y axis is the use right so this is a linear regression and going over the top right hand side you have a guest segments the vote so the to 2nd the double respects so you come down to of the red dots dot on the right thank you that you might have and if you don't have a lot of reports and then to move or 1 of them again in your life is to have this just to shows a year of our scalability losses With respect to regression with respect to scaling by the size this shows linear regression scalability or 10 million of what we have in this this means that it will not be so the part of this is just the direction around the world so this is where the the the right of the sequel that this is the area of the book but this is how you came all for example the linear regression here and predicting the price of houses of given some historical data houses in out of the impact of that through size the train that then if I want you prediction the results from that was and again i call and again unless the statement by the the idea of prediction of based on the results of the tree it's very
16:15
easy to I like to talk a little bit architecture be so many machine learning problems are in in nature and the picture truth is that a bad so that was accommodation type the process that we have this and relations all layer of which I the the actual core of the the of the of the of the of the what's 1 of the simplest possible but if you don't later the don't look at how scalability and we were over this in this and regression so each of them we have crafted for distributed part the guys from will although that you can't just take it out of which for a single node time to the so we developed the need to think about how the cell to the loss of this and the so we have a few so that example of they don't have sort of a straight line at the heart of the universe of there we wanna find essentially seek to modify the some of the way and so ordinary squares of reason was like this so we set up the matrix of that we monitor and distributed to think well I like going to be like that nobody in the mind the Indian work to not just by the the transpose savings circle of the the so you from the research in future but I see that actually operating costs here because they can see the difference right so that's an increase of the only you that was used to work out what the would the kind of everything that's the problem with the if you have look at the algebra you can see that actually decomposable right you can see a square the square in the had you could be separated those out you could do everything every operation was 1 not all the operations on the other hand and then just like them it turns out you can do that using something called like of product the inner product the idea here is that you can see all the operations on the the inside of the 2nd line the node of the right side of the but this is kind of like to think about posing and learning algorithms for this the that kind of idea for a 2nd and they did that offers the 1 thing to the University of great together because the lecture this time not
20:03
every data science sample and
20:07
so many women in science is more so than the heart of the matter will allow you that your and that's what you're quite keen on how to actually solve this distributed and keep did back in the scale where we work is that the the right regular or light from the inverse of the sequel to execute in database has been there and then returns the results in fact all of the data you will the variance from of the results that we have to on so just to finish up so what's coming
21:01
out of the edges of them next most of the areas that we focused on here is the very fact that support vector machines and approval nonlinear kernels and we've added more material of utilities that you need the operations cost functions that means the text of a little of functionality in the future as a learning theory is a theory of the whole lot for nearly all these of actual predictive of models the year and terms of usability and thinking about the size of this set of so they were prepared and you're more than welcome to participate in the project in the links on the web site where the sample size is a bit of a list of and the check it out in the questions that you you the fact that if I'm going to think of it is way the it is and they have the same of the 1st and the last day of the land is in the form of a theory of the by I have to use have 1 of the of the of the thing you think of the the of the the and in some of the things that have that God I think that we do with him in the the an what users go in the
00:00
Zahlenbereich
Gleichungssystem
Maschinelles Lernen
Mathematische Logik
Computeranimation
Datenhaltung
Bildschirmmaske
Datenmanagement
Auflösbare Gruppe
Wurzel <Mathematik>
Informatik
Bildgebendes Verfahren
Leistung <Physik>
Relationale Datenbank
Transinformation
Datenhaltung
Quellcode
Physikalisches System
Konstante
Flächeninhalt
Mereologie
Dateiformat
Projektive Ebene
Wort <Informatik>
Computerarchitektur
Unternehmensarchitektur
03:14
Software
Prozess <Physik>
Grundsätze ordnungsmäßiger Datenverarbeitung
Maschinelles Lernen
GRASS <Programm>
QuickSort
Datenhaltung
04:17
Einfügungsdämpfung
Punkt
Kategorie <Mathematik>
Datenhaltung
Versionsverwaltung
Virtuelle Maschine
IRIST
Maschinelles Lernen
Computerunterstütztes Verfahren
Systemzusammenbruch
Menge
Digitale Photographie
Datenhaltung
Hypermedia
Virtuelle Maschine
ATM
Makrobefehl
Demoszene <Programmierung>
Zusammenhängender Graph
05:05
Resultante
Expertensystem
Datentyp
Elektronischer Programmführer
Versionsverwaltung
Bildauflösung
Maschinelles Lernen
Übergang
Dateiformat
Datensichtgerät
Datenhaltung
Rahmenproblem
Einheit <Mathematik>
Funktion <Mathematik>
Standardabweichung
Projektive Ebene
Programmierumgebung
Parallele Schnittstelle
Gammafunktion
06:18
Resultante
Abstimmung <Frequenz>
Einfügungsdämpfung
Prozess <Physik>
Gewichtete Summe
Gemeinsamer Speicher
Skalierbarkeit
Gruppenkeim
Kartesische Koordinaten
Fortsetzung <Mathematik>
Richtung
Netzwerktopologie
Metropolitan area network
Skalierbarkeit
Prognoseverfahren
Algorithmus
Prozess <Informatik>
Lineare Regression
Parallele Schnittstelle
Zentrische Streckung
Lineares Funktional
Topologische Einbettung
Befehl <Informatik>
Datenhaltung
Gebäude <Mathematik>
Systemaufruf
Quellcode
Biprodukt
Rechenschieber
Kollaboration <Informatik>
Datenfeld
Menge
Rechter Winkel
Festspeicher
Unüberwachtes Lernen
Projektive Ebene
Aggregatzustand
Varietät <Mathematik>
Lineare Abbildung
Wellenpaket
Gewicht <Mathematik>
Mathematisierung
Maschinelles Lernen
Systemplattform
Datenhaltung
Open Source
Informationsmodellierung
Knotenmenge
Variable
Software
Programmbibliothek
Softwareentwickler
Gammafunktion
Leistung <Physik>
Videospiel
Lineare Regression
Matrizenring
LikelihoodFunktion
Open Source
Softwarewerkzeug
Schlussregel
Fokalpunkt
Skalarprodukt
Flächeninhalt
Mereologie
Wort <Informatik>
Computerarchitektur
Verkehrsinformation
16:14
Algebraisches Modell
Lineare Abbildung
Subtraktion
Einfügungsdämpfung
Bit
Prozess <Physik>
Natürliche Zahl
Skalierbarkeit
Zellularer Automat
Maschinelles Lernen
Systemzusammenbruch
Datenhaltung
Virtuelle Maschine
Knotenmenge
Skalierbarkeit
Algorithmus
Lineare Regression
Stichprobenumfang
Datentyp
Grundraum
Parallele Schnittstelle
Gerade
Implementierung
Nichtlinearer Operator
Lineare Regression
Kreisfläche
Relativitätstheorie
Einfache Genauigkeit
Biprodukt
Skalarproduktraum
QuickSort
Quadratzahl
Rechter Winkel
Mereologie
Speicherabzug
Quadratzahl
Computerarchitektur
20:07
Resultante
Schnittstelle
Bit
Web Site
Fortsetzung <Mathematik>
Maschinelles Lernen
Term
Physikalische Theorie
Datenhaltung
Kernel <Informatik>
Bildschirmmaske
Informationsmodellierung
Stichprobenumfang
Nichtlineares System
Nichtlinearer Operator
Lineares Funktional
Zentrische Streckung
Benutzerfreundlichkeit
Datenhaltung
Inverse
Softwarewerkzeug
Binder <Informatik>
SupportVektorMaschine
Digitale Photographie
Flächeninhalt
Grundsätze ordnungsmäßiger Datenverarbeitung
Projektive Ebene
23:40
Speicherabzug
Computeranimation
Metadaten
Formale Metadaten
Titel  Apache MADlib 
Untertitel  Distributed in Database Machine Learning for Fun and Profit 
Serientitel  FOSDEM 2016 
Teil  27 
Anzahl der Teile  110 
Autor 
McQuillan, Frank

Lizenz 
CCNamensnennung 2.0 Belgien: Sie dürfen das Werk bzw. den Inhalt zu jedem legalen Zweck nutzen, verändern und in unveränderter oder veränderter Form vervielfältigen, verbreiten und öffentlich zugänglich machen, sofern Sie den Namen des Autors/Rechteinhabers in der von ihm festgelegten Weise nennen. 
DOI  10.5446/30933 
Herausgeber  FOSDEM VZW 
Erscheinungsjahr  2016 
Sprache  Englisch 
Inhaltliche Metadaten
Fachgebiet  Informatik 