
# MADlib


Speech transcript

00:05

I think it's time, so I'd like to start my session about MADlib. Thanks for coming, first of all.

00:19

First I want to talk about myself. [unintelligible] I wrote the window functions in 8.4 and extended them in 9.0, and in the 9.1 development cycle I hacked on the writable CTE feature [unintelligible]. I really appreciate that you guys use those features. I'm also working on other modules [unintelligible], which are extensions. I joined Greenplum last year, and I really enjoy the development at Greenplum.

I just wanted to ask: how many people here have ever heard about Greenplum? OK, a lot of people may know about Greenplum, but let me briefly explain Greenplum Database. Greenplum is a company that develops Greenplum Database, which is forked from PostgreSQL 8.2, and it is a distributed database. This is the typical architecture of a Greenplum system: here is the master server, and the cluster holds a whole bunch of segments. The query plan goes from the master to the segments, and the data is distributed across the segments, so that query processing is parallelized across the segment servers. A lot of our customers have terabytes or even petabytes of data, and they process that huge amount of data in Greenplum. So that is the basic picture of

02:49

what is going on in Greenplum. I also think you have heard about [unintelligible]; a lot of people are talking about it, not only us but also media companies like CNN and a whole lot of other people. So we think

03:14

that [unintelligible] in this simple

03:27

example. Of course this is just an example, but it explains what we're doing. In a traditional enterprise system, the customer could not do good reporting, because the data was too huge and the processing was too slow. They tried to run a reporting query, but it took days, or sometimes weeks. Greenplum is a massively parallel processing system, so [unintelligible] the system can run the query in a few seconds over the whole data set. So the first step is to understand the past: what is going on behind the data. After that, customers start to predict the future, and they try to optimize their processes based on data. So after understanding the simple facts behind the data, they try to leverage the data to optimize the process; an example is [unintelligible] recommendations based on data. So they need to leverage more advanced analytics.

The problem we're facing is this. This is a simplified picture of the traditional approach. On the left-hand side is a database [unintelligible], and analysts use [unintelligible] some kind of BI tool. They need to extract the data out of the database, put it into the tools, run the analysis, and get the results back [unintelligible]. Now the database gets bigger, and companies are collecting all kinds of data, not only from the enterprise system but also from Facebook, Twitter and all kinds of applications. The problem is that those analytics tools are not designed for this kind of big data. Those are typically in-memory systems, and it is hard to make them parallel, so performing the analytics on big data is a big challenge. They need to extract the data out of the database, but it is impossible to run on the entire data set, so what they do is sampling: they extract only a small subset of the data and put it into the analytics tool. That doesn't solve the problem. So we are trying to push the analytics computation into the database; this is exactly what we want to do.

The main concept here is magnetic, agile, deep. Magnetic means that the database now works like a magnet, attracting all kinds of data, not only structured data but also unstructured data [unintelligible]. Agile: I think the nature of analytics is iteration. You do some trial and error, get some insights, feed them back to the line of business, and then try the next hypothesis based on that. And deep: I think standard SQL defines some analytic features, like simple aggregate functions and window functions, and also grouping sets and other nice features, but that is not enough. We need more advanced methods, like statistical methods, and the [unintelligible] are more complicated, so it becomes not so

09:16

[unintelligible] we have to depend on MADlib. So

09:20

to make that possible, we introduced this new idea, called MAD Skills, at VLDB in 2009. Jeff Cohen from Greenplum and Joe Hellerstein from UC Berkeley put this idea together into one paper, "MAD Skills: New Analysis Practices for Big Data", which describes the things I just explained. They then started the MADlib project as its implementation, and we reported on this [unintelligible] this year. The MADlib project is now [unintelligible]

10:11

[unintelligible] That is why we call this MADlib: MAD stands for magnetic, agile, deep, which I just explained, and lib is of course for library. MADlib is designed as an add-on to PostgreSQL or Greenplum. It is just a library that you can install on your database, and then you can run the analytics methods. MADlib has analytic methods such as mathematical, statistical and machine learning modules, and it is designed to be parallel and scalable, because it runs as part of the database, so you can scale out [unintelligible]. All the interfaces to the analytics methods are defined as SQL functions, so each one is just a SQL function.

11:18

The mission is to foster widespread development of scalable analytic skills. We want to harness efforts from commercial practice, like ours, as well as academic research from universities, and we want to do this as open source, because we want to have more and more contributions [unintelligible]. So this is under a BSD

11:53

license, so you can hack on it and send pull requests. This is a kind of collaborative project between Greenplum and universities [unintelligible], so everybody can contribute to the source code. Currently we have contributors from universities: UC Berkeley, Wisconsin-Madison and Florida, and of course Greenplum. [unintelligible] MADlib currently supports both PostgreSQL and Greenplum: PostgreSQL from 8.4 to 9.1, and Greenplum [version numbers unintelligible]. It is designed for data scientists, to provide more scalable, robust analytics capabilities. You can look up the information on madlib.net, and the source code is hosted on GitHub, so you can just go to GitHub. If you have any questions about MADlib or its usage, feel free to post to the Google group.

[unintelligible] We think that MADlib is suited to big data, because it is designed to be parallel and scalable. MADlib runs inside your database, so you don't need to extract the data out of the database; it just runs inside the database, on PostgreSQL or Greenplum, and leverages the parallelism. It is very easy to use, because it is just SQL functions; you don't need any additional tools. SQL is your friend. It is also open source, as I just said. This also matters because the nature of this kind of analytics is complicated, and sometimes you want to customize the models, so a predefined package may not be enough. But you can just read the source code and change some parameters of the model, or replace some parts of the model. But of course

14:45

[Exchange with the audience, largely unintelligible] Yes, it is a GitHub-style process, so you can just send a pull request. [unintelligible] I'm not sure about [unintelligible], but we appreciate contributions. [unintelligible] Some of the database vendors provide similar predictive analytics modules, but those are proprietary software, so you can't look inside, and they are typically also expensive. But MADlib is free. The status of MADlib:

16:05

[unintelligible] it is under active development. We are targeting a release every quarter. [unintelligible] The project itself is still in a startup phase, but you can already use a whole bunch of modules, like linear regression, logistic regression, k-means clustering, decision trees and so on. We are going to release a new version, 0.4, at the end of this [unintelligible], which will include distribution functions, random forests and so on.

16:56

Inside the MADlib architecture:

17:02

MADlib has several parts. At the top we have Python driver functions, so it depends on Python and you need to install Python. The Python layer controls the outer loop of iterative algorithms. For example, k-means clustering needs to do iterative computation, and it is not easy to write that type of loop in SQL alone, so we use Python for that. For the simplest cases, like linear regression, we use just a simple aggregate function [unintelligible]. Behind that we have a C++ abstraction layer. The main concept of MADlib is that it is not only for PostgreSQL and Greenplum, so we have an abstraction layer over the database, and someone who is familiar with another database [unintelligible] can port it to that database. [unintelligible] We have a compressed vector representation as a user-defined type, and we also have [unintelligible] functions [unintelligible]. The whole

18:42

contents look like this. The modules consist of machine learning, descriptive statistics and support modules, and most of them are machine learning. For supervised learning we have linear regression, logistic regression, naive Bayes, decision trees and SVM; for unsupervised learning we have association rules, k-means clustering and SVD. We also have descriptive statistics, and support modules such as [unintelligible] extensions. The right-hand side shows the sparse vector, a user-defined type that compresses the data based on sparsity [unintelligible]. We also extended the PostgreSQL array type, so you can do things like array summation; we just define additional array functions. One of the good parts of MADlib is that we have a good amount of documentation online, so you can just go to [unintelligible] the documentation. It is good because it is not only about how to call the API; it also has a whole bunch of mathematical background and theory, so you can just look around there and understand the ideas and the usage.

20:37

[unintelligible] So with that, I'm going to move on to MADlib use cases: what can you actually do with it? I'm going to talk

20:54

about two types of machine learning: supervised learning and unsupervised learning. I think many people here already know the difference between the two, but let me explain the two types of learning. Unsupervised learning is learning from raw data, which means that you don't need any labels on the data; it just categorizes the data that exists in your database. In contrast, supervised learning applies when you have historical data that is labeled: you build a predictive model on the basis of that data, and with the model you built you predict new observations; you can classify new data with that model. A typical example of unsupervised learning is a consumer market segmentation study, which I want to talk about with k-means clustering. A well-known example of supervised learning is the spam filter [unintelligible], and there are a lot of implementations for spam filtering. [unintelligible] we use logistic regression [unintelligible]. If the answer is not a yes or no, then we can use a decision tree for the multi-level problem.

23:02

First, market segmentation. This

23:06

is a very common thing. In a customer segmentation study, if you have some kind of customer data, you can just run k-means clustering, and the data gets categorized like this. You can say that one group is the high-[unintelligible] group: customers of a certain age and income [unintelligible], and they tend to shop at [unintelligible]. The green group is the name for the customers who [unintelligible], and they tend to shop at [unintelligible].

OK, so how do we do this? We have a k-means clustering module in MADlib. First, prepare the data. Here we have an input points table, which is the customer table, with the customer ID and some kind of attributes; here we use just two attributes as an example. The k-means function expects the attributes as a vector, not as separate columns, so in PostgreSQL you need to transform your attribute columns into an array [unintelligible]. Then the data looks like this.

k-means is a traditional way to cluster data, so there are a bunch of variations and initialization approaches [unintelligible], and you need to choose a good initialization approach. k-means is just iterative computation: it forms some groups, calculates the centroid, which is the center of the cluster, and the distance from the centroid to each point, and then just repeats that [unintelligible]. The algorithm is simple, so the initialization is very important for good-quality results. This is a kind of trial and error, and you customize the method. In MADlib it is very simple to choose one of them. This one is random initialization, which picks the initial points at random. k-means++ improves on that, but the cost of initialization is higher than for the random one. You can also just give the centroid set yourself [unintelligible].

Also, depending on your problem, you can choose one of the distance metrics. For this kind of application we use the 2-norm, that is, Euclidean distance. For vectors like these I think the 2-norm is enough, but for vectors with thousands of dimensions, as in document clustering, you need to choose cosine or Tanimoto (the Jaccard index). For the upcoming release we are making it pluggable, so you can just write your own distance function and pass it into the k-means function.

Then you can run the k-means function. [unintelligible] It is just a function call, and the inputs are things like the table name, the column name, the ID column, the initialization and distance metric, the maximum number of iterations, the convergence threshold and so on. Then we get the results. In this example [unintelligible] the output table has a column with the cluster ID, which indicates which cluster each point belongs to, and each centroid is represented in another table, so you can just look it up there. That's unsupervised
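The iteration described above (assign each point to the nearest centroid, recompute each centroid as the mean of its points, repeat) can be sketched in plain Python. This is a minimal illustration, not MADlib's implementation: the names `kmeans`, `euclidean` and `cosine_distance` are hypothetical, initialization is the simple random variant, and the distance function is a parameter to mirror the pluggable metric mentioned in the talk.

```python
import math
import random


def euclidean(a, b):
    """2-norm (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; a common choice for high-dimensional vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def kmeans(points, k, distance=euclidean, max_iter=20, seed=0):
    """Plain k-means with random initial centroids (hypothetical helper)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # nothing changed: converged
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Two well-separated groups of two-attribute "customers".
customers = [[0, 0], [0, 1], [10, 10], [10, 11]]
centroids, assignment = kmeans(customers, k=2)
```

Passing `distance=cosine_distance` instead switches the metric without touching the loop, which is the kind of flexibility the speaker describes for the upcoming release.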

29:13

learning with k-means clustering. Next is how to use supervised learning, and the use case is heart attack risk analytics.

29:27

For classification [unintelligible] we use logistic regression. Classification identifies which category a new observation belongs to, based on known observations. The classification process is split into two parts: one is training and the other is classification. Training builds your model based on your labeled data, and then in classification, with the model you built, you classify the new observation. [unintelligible] For multi-class classification we have decision trees, and you can also use naive Bayes. So here

30:17

we use logistic regression for the prediction: predicting the potential risk of heart attack. First we have patient data with a number of attributes. For example, the patient data may have age, cholesterol, height, weight and [unintelligible].

30:52

You need an input data table with columns for age, blood pressure, cholesterol, height and weight, and the last column, heart attack, is the label: yes or no. So this is historical data, in which whether each person had a heart attack is recorded in the heart attack column. Then, again, you transform the attributes into an array as a vector [unintelligible], and train on the data to build a model. So let's call the logistic regression function. It is much the same as k-means: you need to specify which table is the input, and which columns are the label and the feature attributes. Then the model is built and returned, and the result is just a record type, so you can expand it into rows per feature, like age [unintelligible], each with a coefficient [unintelligible], standard error and so on. This is just an example, so I think the values are somewhat unrealistic here, but this example shows that blood pressure has a big impact on heart attack risk [unintelligible].

And then the classification. If you have new data without a label, then with the model you built you run the logistic function on the dot product of your model and the new data. Here the result is something like 7.88 [unintelligible]. Again, I think this result is not realistic, but [unintelligible] the logistic function's output becomes minus 1 to 1; if the result is positive, the risk is positive, and if the value is negative or zero, then there is no risk. So the good thing about logistic regression is that it does not just return yes or no; it returns the probability of [unintelligible]. So that's heart attack [unintelligible].
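The classification step described here, applying the logistic function to the dot product of the model coefficients and a new record, can be sketched as below. The coefficients and the patient record are invented for illustration and are not the values from the talk; `logistic` and `predict_risk` are hypothetical names. The sketch uses the standard logistic function, whose output lies between 0 and 1 and can be read as a probability; the scaling on the speaker's slide may differ.

```python
import math


def logistic(z):
    """Standard logistic (sigmoid) function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(coefficients, features):
    """Probability of the positive class: logistic(coefficients . features)."""
    z = sum(c * x for c, x in zip(coefficients, features))
    return logistic(z)


# Hypothetical model: intercept, age, blood pressure (made-up numbers).
coef = [-6.0, 0.05, 0.03]
# New patient without a label; the leading 1.0 multiplies the intercept.
patient = [1.0, 50.0, 120.0]
risk = predict_risk(coef, patient)
at_risk = risk > 0.5  # threshold at 0.5, i.e. a positive dot product
```

Because the output is a probability rather than a bare yes or no, the threshold can be moved away from 0.5 when false negatives are more costly than false positives.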

33:59

So, how to install MADlib. I have just uploaded

34:04

MADlib to PGXN [unintelligible], so there is the PGXN client. As of today the PGXN client can't install MADlib, because the PGXN client [unintelligible], but MADlib uses [unintelligible]. As of today, I should say, [unintelligible], but I hope that in a few days [unintelligible] you can just say "pgxn install madlib", and then PostgreSQL is ready to [unintelligible].

35:02

And we have a standard installer for Greenplum. Greenplum provides the Greenplum Community Edition, which you can use without paying any money. You just download the Greenplum binary from the Greenplum site and install it on your Linux machine. There is no restriction, so you can run Greenplum [unintelligible] in a distributed way and run your analytics with MADlib.

35:46

Thank you. [unintelligible] So MADlib is an open source project, and we want you to contribute if you have any insights, such as a port to a new platform or a new algorithm [unintelligible].

36:15

[unintelligible] Yes, we carefully designed for compatibility between Greenplum and PostgreSQL, so I think there is no feature gap. It is just that it is parallelized on Greenplum, but the features are not different from each other. [unintelligible]

Yes, PL/R is good for [unintelligible], but actually in Greenplum we often use PL/R, and it is still not run in a parallelized way; it just runs locally, and R runs in memory. MADlib runs like SQL, writing the data to disk and reading it back, so if the data gets bigger and bigger, R may run out of memory.

[unintelligible] Yesterday, in the developer meeting, we were talking about how to parallelize PostgreSQL, but I think the problem there is different from MADlib's parallelism. It depends on what kind of parallelization and what types of problems are meant. For example, if it is only [unintelligible], that is one thing, and if you want to parallelize the aggregate function, then it is a different problem. So I'm not sure what kind of parallelization is going to get into PostgreSQL. If PostgreSQL gets some parallel processing, then MADlib has to consider how to scale on that.

[unintelligible, about random forests and decision trees] Yes, some functions create tables, and the output is [unintelligible]. We designed these kinds of things carefully so as not to produce an unexpected result, but you need to take care. For example, this module creates some tables, but that is based on the input variables [unintelligible].

[unintelligible, about visualization] MADlib doesn't have any visualization tool [unintelligible].

[unintelligible, about other languages] The parallelization here is based on [unintelligible], so in other languages we don't have any parallelization management. [unintelligible] I think that's a good idea [unintelligible].

00:00

Hoax

Prozess <Physik>

Punkt

Ortsoperator

Gruppenkeim

Zellularer Automat

Twitter <Softwareplattform>

Bildschirmfenster

Computeranimation

Datenhaltung

Gradient

Open Source

Knotenmenge

Modul <Datentyp>

Parallelrechner

Datennetz

Mehragentensystem

Maßerweiterung

Softwareentwickler

Auswahlaxiom

Synchronisierung

Prozess <Informatik>

Schreib-Lese-Kopf

Datenhaltung

Netzwerkbetriebssystem

Mailing-Liste

Physikalisches System

Frequenz

Datenfluss

Modul

Videokonferenz

Fensterfunktion

Funktion <Mathematik>

Menge

Benutzerschnittstellenverwaltungssystem

Mereologie

Dreiecksfreier Graph

Wort <Informatik>

GRASS <Programm>

Programmbibliothek

Innerer Punkt

02:46

VIC 20

Endlicher Graph

Total <Mathematik>

Hypermedia

Reelle Zahl

Computeranimation

Spezifisches Volumen

03:22

Resultante

Mittelwert

Prozess <Physik>

Natürliche Zahl

Familie <Mathematik>

Iteration

Fortsetzung <Mathematik>

Benutzerfreundlichkeit

Extrempunkt

Computeranimation

Formale Semantik

Homepage

Metropolitan area network

Client

Flächeninhalt

Gerade

Lineares Funktional

Datenhaltung

Abfrage

Prognostik

Teilmenge

Arithmetisches Mittel

Strukturierte Programmierung

Digitalsignal

Twitter <Softwareplattform>

Emulation

Datenstruktur

Verschlingung

Festspeicher

Fehlermeldung

Rückkopplung

Facebook

Identitätsverwaltung

Mathematische Logik

Datenhaltung

Physikalisches System

Iteration

Kugel

Parallelrechner

Verkehrsinformation

Datenstruktur

Analysis

Zwei

Statistische Analyse

Physikalisches System

Integral

Portscanner

Fensterfunktion

Mereologie

Mehrrechnersystem

Unternehmensarchitektur

Verkehrsinformation

09:17

Maschinelles Lernen

Analysis

Computeranimation

Formale Semantik

Ausdruck <Logik>

Datenhaltung

Informationsmodellierung

Skalierbarkeit

Determiniertheit <Informatik>

Stichprobenumfang

Statistische Analyse

Programmbibliothek

Algorithmische Lerntheorie

Softwareentwickler

Schnittstelle

Lineares Funktional

Prinzip der gleichmäßigen Beschränktheit

Seidel

Mathematisierung

Inferenzstatistik

Modul

Funktion <Mathematik>

Analytische Zahlentheorie

Mereologie

Kommensurabilität

Projektive Ebene

Programmbibliothek

11:15

Resultante

Offene Menge

Webforum

Momentenproblem

Natürliche Zahl

Skalierbarkeit

Gruppenkeim

Maschinelles Lernen

Computeranimation

Homepage

Datenhaltung

Open Source

Informationsmodellierung

Skalierbarkeit

Parallelrechner

Programmbibliothek

Statistische Analyse

Stützpunkt <Mathematik>

Softwareentwickler

Grundraum

Parametersystem

Addition

Lineares Funktional

URN

Open Source

Mathematisierung

Quellcode

Funktion <Mathematik>

COM

Analytische Zahlentheorie

Mereologie

Projektive Ebene

Bildschirmsymbol

Information

Programmbibliothek

Verkehrsinformation

14:45

Distributionstheorie

Offene Menge

Typentheorie

Prozess <Physik>

Euler-Winkel

Gruppenoperation

Skalierbarkeit

Fächer <Mathematik>

Gewichtete Summe

Computeranimation

Netzwerktopologie

Dedekind-Schnitt

Kugel

Software

Lineare Regression

Mapping <Computergraphik>

Logistische Verteilung

Softwareentwickler

Gravitationsgesetz

Große Vereinheitlichung

Phasenumwandlung

Lineares Funktional

Wald <Graphentheorie>

Dean-Zahl

Rohdaten

Modul

Entscheidungstheorie

Einheit <Mathematik>

Beweistheorie

Grundsätze ordnungsmäßiger Datenverarbeitung

16:51

Objekt <Kategorie>

Selbstrepräsentation

Fortsetzung <Mathematik>

Ungerichteter Graph

Abstraktionsebene

Computeranimation

Vektorrechner

Loop

Iteration

Standardabweichung

Lineare Regression

Datentyp

Speicherabzug

Operations Research

Lineares Funktional

Algorithmus

Architektur <Informatik>

Datenhaltung

Datenstruktur

Funktion <Mathematik>

Loop

Mereologie

Gamecontroller

Wort <Informatik>

Vektorrechner

Innerer Punkt

Normalspannung

18:41

Schätzwert

Randverteilung

Lineare Abbildung

Bit

Gewichtete Summe

VIC 20

Unüberwachtes Lernen

Content <Internet>

Gradient

Extrempunkt

Physikalische Theorie

Computeranimation

Überwachtes Lernen

Metropolitan area network

Schwach besetzte Matrix

Deskriptive Statistik

Informationsmodellierung

Puls <Technik>

Parallelrechner

Lineare Regression

Gruppe <Mathematik>

Datentyp

Zählen

Statistische Analyse

Maßerweiterung

Operations Research

Betriebsmittelverwaltung

Modul

Assoziativgesetz

Lineare Regression

Logarithmus

Güte der Anpassung

Schlussregel

Empirisches Quantil

Schwach besetzte Matrix

Sollkonzept

Modul

Schlussregel

Benutzerprofil

Assoziativgesetz

BAYES

Rechter Winkel

Konditionszahl

Mereologie

Unüberwachtes Lernen

Entscheidungsbaum

Vektorrechner

Matrizenzerlegung

20:51

Subtraktion

Logistische Verteilung

Unüberwachtes Lernen

Formale Grammatik

Implementierung

Maschinelles Lernen

E-Mail

Computeranimation

Überwachtes Lernen

Übergang

Netzwerktopologie

Multiplikation

Informationsmodellierung

Lineare Regression

Datentyp

Luenberger-Beobachter

Kontrast <Statistik>

Schnitt <Graphentheorie>

Beobachtungsstudie

Lineare Regression

Rohdaten

Datenhaltung

Objektklasse

Entscheidungstheorie

Basisvektor

Unüberwachtes Lernen

Kategorie <Mathematik>

Speicherabzug

Beobachtungsstudie

23:06

Resultante

Punkt

Prozess <Physik>

Extrempunkt

Natürliche Zahl

Gruppenkeim

Iteration

Kartesische Koordinaten

Oval

Extrempunkt

Ähnlichkeitsgeometrie

Computeranimation

Metropolitan area network

Vektorrechner

Trigonometrische Funktion

t-Test

Gradientenverfahren

Punkt

Lineares Funktional

Güte der Anpassung

Stichprobe

Softwareentwicklung

Bitrate

Ein-Ausgabe

Algorithmische Programmiersprache

Arithmetisches Mittel

Funktion <Mathematik>

Menge

COM

Einheit <Mathematik>

Rechter Winkel

Automatische Indexierung

Zahlenbereich

Ein-Ausgabe

Messprozess

Vektorrechner

Trigonometrische Funktion

Normalspannung

Tabelle <Informatik>

Lesen <Datenverarbeitung>

Kontrollstruktur

Klasse <Mathematik>

Zahlenbereich

Schwach besetzte Matrix

Open Source

Informationsmodellierung

Knotenmenge

Zufallszahlen

Spieltheorie

Schwellwertverfahren

Abstand

Bruchrechnung

Varianz

Normalvektor

Attributierte Grammatik

Physikalischer Effekt

Beobachtungsstudie

Tabelle <Informatik>

Linienelement

Linienelement

Winkel

Menge

Inverser Limes

Datensatz

Abstand

Flächeninhalt

Analytische Zahlentheorie

Räumliche Anordnung

Binäre Relation

Normalvektor

Bitrate

Beobachtungsstudie

29:26

Vector potential

Wave packet

Process <physics>

Weight <mathematics>

Position operator

Logistic distribution

Regular expression

Number domain

Analysis

Computer animation

Transition

Decision theory

Information modeling

Forecasting method

Weight <mathematics>

Group <mathematics>

Luenberger observer

Machine vision

Variance

Linear regression

Physical effect

Goodness of fit

Data model

Abelian category

Prognostics

Vector potential

Decision theory

Number domain

Mereology

ICC group

Attribute grammar

30:48

Information retrieval

Resultant

Weighted sum

Weight <mathematics>

Logistic distribution

Steam table

Computer animation

Transition

Statechart

Information modeling

Data record

Integer

Coefficient

Weight <mathematics>

Linear regression

Total <mathematics>

Data type

Randomization

Logistic distribution

Biproduct

Gas pressure

Attribute grammar

Table <computer science>

Cone penetration test

Linear functional

Linear regression

Intercept theorem

Physical effect

Division

Goodness of fit

Building <mathematics>

Data model

Input/output

Bit rate

Biproduct

Pressure curve

Function <mathematics>

Array <computer science>

Coefficient

Floating body

ATM

Input/output

Vector processor

Program library

34:01

Website

Analytic number theory

Graph

Installation <computer science>

Physicalism

Graphics tablet

Flow direction

Measure extension

Binary code

Computer animation

Information modeling

Digitizer

Analytic number theory

Injectivity

Program library

Measure extension

Data structure

35:44

Marginal distribution

Resultant

Open set

Harmonic analysis

Proxy server

Conversion <computer science>

Subtraction

Process <physics>

Formal language

Group germ

Element <mathematics>

Physical theory

Computer animation

Network topology

Open Source

Screen form

Variable

Data management

Data type

Visualization

Function <mathematics>

Authorization

Sound processing

Linear functional

Video game

Parameter system

Addition

Forest <graph theory>

Open Source

System call

Query

Input/output

Module

Partition function

Decision theory

Right angle

Read-only memory

Parallel computer

Mereology

Projective plane

### Metadata

#### Formal metadata

Title | MADlib |

Subtitle | An open source library for in-database analytics |

Series title | PGCon 2012 |

Number of parts | 21 |

Author | Harada, Hitoshi |

Contributors | Heroku (Provider) |

License | CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use and modify the work or its content for any legal, non-commercial purpose, and reproduce, distribute, and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license |

DOI | 10.5446/19025 |

Publisher | PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross |

Publication year | 2012 |

Language | English |

Producer | FOSSLC |

#### Technical metadata

Duration | 42:11 |

#### Content metadata

Subject area | Computer science |

Abstract | An open source machine learning library on RDBMS for the Big Data age. MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. The MADlib mission is to foster widespread development of scalable analytic skills by harnessing efforts from commercial practice, academic research, and open-source development. The library consists of various analytics methods, including linear regression, logistic regression, k-means clustering, decision trees, support vector machines and more. That's not all: there is also a highly efficient user-defined data type for sparse vectors, with a number of arithmetic methods. It can be loaded and run in PostgreSQL 8.4 to 9.1 as well as Greenplum 4.0 to 4.2. This talk gives an overview of the library's concepts, introducing the problems we are tackling and our solutions to them. It also covers topics in parallel data processing, which is a very active area in both research and industry these days. |