
MADlib

Speech Transcript
All right, I think it's time to start. Thank you all for coming, first of all.
Let me introduce myself briefly. I am a PostgreSQL contributor: I implemented the window functions that landed in 8.4, and extended them in the 9.0 and 9.1 development cycles, along with a few other features. I really appreciate the users who picked those features up. I also work on other modules and extensions outside of core. Last year I joined Greenplum, and I have really enjoyed the development work there. Let me ask: how many people here have heard about Greenplum? Quite a few of you. Greenplum is a company that develops the Greenplum Database, which is a fork of PostgreSQL turned into a distributed database. The architecture is a cluster: there is a master node and a whole bunch of segment nodes; queries are dispatched from the master to the segments, and the data is distributed across the segments, so query processing is parallelized over the cluster. Our customers have terabytes of data, and they process those huge data sets with Greenplum. That is the core of it.
So that is what is going on at Greenplum. You have probably heard a lot of talk about big data recently, not only from us but from all kinds of companies, including media companies, so there is a lot of interest in this area.
Let me describe the kind of problem that arises with a simple example. It is simplified, but it illustrates what we are doing. In a traditional enterprise system, a customer could not run their reporting queries well, because the data had grown too huge and the existing system was too slow: a single reporting query took on the order of days, sometimes weeks. Greenplum is a massively parallel processing system, so after migrating, the same reporting queries ran in seconds to minutes over the same data. That is the first step: understanding what is going on behind the data. After that, people start trying to predict the future and to optimize their processes based on the data. Once they understand the simple facts behind the data, they want to leverage it, for example to make recommendations to users based on past behavior. The problem we are facing is this. The traditional architecture looks like the picture: on one side there is the database, which holds the data, and on the other side there are the statistics and analytics tools. Analysts extract the data out of the database, load it into the tools, run the analysis, and push the results back to the business logic on the other side. Now the databases are getting bigger and bigger, because companies are collecting all kinds of data, not only from their enterprise systems but also from Facebook, Twitter, and all of their applications. The problem here is that those analytics tools were not designed for this kind of big data: many of them are single-machine, in-memory systems, while the database side is parallel and powerful, so moving the data across that boundary
becomes a big challenge. To analyze the data they need to extract it out of the database, but it is not possible to pull out the entire data set, so they end up sampling: they take a small subset of the data and put it into the analytics tool, and that does not really solve the problem. So we want to push the analytics computation into the database instead. This is exactly what we want to do, and the main concept is captured by the acronym MAD: Magnetic, Agile, Deep. The database should act like a magnet, attracting all kinds of data, not only structured data but also unstructured data. The nature of analytics is iterative: you do some trial and error, get some insights, get feedback from the line of business, and then try the next hypothesis based on that, so the platform has to support that agility. Standard SQL already gives you some nice semantic features, like aggregate functions, window functions, and grouping sets, but it is not enough: we need deeper methods, real statistical and machine learning methods, integrated into the database. Plain SQL alone does not get us there.
That is where MADlib comes in.
The ideas behind this approach were introduced in a 2009 paper, "MAD Skills: New Analysis Practices for Big Data", written by people from Greenplum together with Joe Hellerstein of UC Berkeley, who put these practices together into one paper. After that, development of MADlib started as a concrete implementation of those ideas, and the project has been making releases since then.
So what is MADlib, and why the name? It comes from the MAD Skills paper plus "lib" for library, and it is also a play on the Mad Libs word game. It is a library that plugs into PostgreSQL and Greenplum: you install it into your database, and then you can run analytics methods in SQL. It contains a whole set of methods, mathematical, statistical, and machine learning modules, and it is designed to be parallel and scalable, because it rides on the database's own parallelism, so you can scale it out. The interface is user-defined functions, so everything is just a SQL function.
The mission is to foster widespread development of scalable analytic skills. We want to harness efforts from commercial practice, from companies like ours, as well as academic research from universities, and we want to do this as open source, because we want more and more contributions from everybody. The code is BSD-licensed.
Because it is open source, you can hack on it and send pull requests. It is a collaborative project between Greenplum and universities, and it is a genuinely useful solution, so everybody can contribute to the source code; we have already had good contributions from universities such as UC Berkeley and Wisconsin-Madison. MADlib supports both PostgreSQL, from 8.4 to 9.1, and Greenplum, from 4.0 to 4.2, and it is designed to give data scientists scalable, robust analytics capabilities. You can look up the information at madlib.net, and the source is hosted on GitHub, so you can just clone it; if you have any questions on installation or usage, feel free to post to the groups and someone will answer. We think MADlib matters in the big data era for a few reasons. It is designed for parallel, scale-out systems. It runs inside the database, so you do not need to extract data out: the analysis just runs where the data lives, which is exactly where Greenplum's power is leveraged. It is very easy to use, because everything is a SQL function: no additional tools, SQL is your friend. And finally, this kind of analytics is by nature complicated, and sometimes you want to customize the models, so a predefined package may not be enough for you; with MADlib you can just read the source code and change some parameters of the model, or even replace some parts of it.
[Audience question about contributing and about similar commercial products.] Yes, it is in fact an open process, so you can just send a pull request; we appreciate every contribution. Some of the commercial vendors provide similar in-database predictive analytics modules, but those are proprietary software: you cannot look inside, and they are also expensive. MADlib is free.
About the current status: MADlib is developed in the open, with releases roughly every quarter. The project is still in its early, start-up phase, but you can already use a whole bunch of modules: linear regression, logistic regression, k-means clustering, decision trees, and so on. We are working toward the 0.4 release at the end of this year, which will add new methods such as random forests.
Now let me look inside MADlib.
MADlib has several parts. The driver layer is written in Python, so MADlib depends on Python and you need to install it. The Python driver controls the outer loop of the iterative algorithms: something like k-means clustering needs to iterate over the data repeatedly, and that is not easy to express in plain SQL, so we use Python as the driver. For the simpler methods, like linear regression, a plain SQL function is enough: you just call one user-defined function. Underneath that we have two abstraction layers, because the main concept is to support not only PostgreSQL but also Greenplum: there is an abstraction layer over the differences between the two, so if you are familiar with one of them, you can just apply MADlib to the other. We also have supporting infrastructure: a compressed sparse vector representation implemented as a user-defined type, and the numerical kernels implemented in C as aggregate functions.
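The driver pattern described above can be sketched in miniature. This is a toy illustration, not MADlib's actual driver code: it uses `sqlite3` purely as a stand-in for the database, and the table and function names are made up. The point is the shape of the loop: Python issues one set-based SQL pass per iteration, reads a convergence measure back via SQL, and stops when it is small enough.

```python
# Toy sketch of MADlib's Python-driver pattern (names are illustrative):
# an outer Python loop drives set-based SQL passes until convergence.
import sqlite3

def run_iterative(conn, max_iter=100, tolerance=1e-3):
    """Repeat a set-based SQL update until the values stop spreading."""
    cur = conn.cursor()
    for i in range(max_iter):
        # One set-based pass: move every value halfway toward the mean
        # (a stand-in for e.g. a k-means centroid-update pass).
        avg, = cur.execute("SELECT AVG(x) FROM points").fetchone()
        cur.execute("UPDATE points SET x = (x + ?) / 2.0", (avg,))
        # The convergence check is itself a SQL query, as in MADlib.
        spread, = cur.execute("SELECT MAX(x) - MIN(x) FROM points").fetchone()
        if spread < tolerance:
            return i + 1
    return max_iter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL)")
conn.executemany("INSERT INTO points VALUES (?)", [(0.0,), (4.0,), (8.0,)])
iterations = run_iterative(conn)
mean, = conn.execute("SELECT AVG(x) FROM points").fetchone()
print(iterations, mean)  # → 13 4.0
```

In MADlib the per-iteration pass is a parallel aggregate running on every segment, while only the tiny loop control lives in Python; that split is what keeps the iteration scalable.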
The module overview looks like this. The library consists of machine learning and descriptive statistics modules, plus support modules. For supervised learning we have linear regression, logistic regression, naive Bayes, SVM, and decision trees; for unsupervised learning we have association rules, k-means clustering, and SVD; and there are descriptive statistics and support modules such as an array extension. On the support side, shown on the right-hand side, there is a user-defined type that compresses vectors based on their sparsity, so you get a big reduction in storage space, and we also extended the operators on it, so you can take a summation over it directly; we define the additional functions needed to make that work. Another good part of MADlib is that there is a good amount of documentation online, which you can find from madlib.net. It is good documentation because it covers not only how to use and call the API but also a whole bunch of the mathematical background and theory, so you can go through it and understand the ideas behind each method.
So that is the overview of MADlib. With that, let me move on to use cases: what can you actually do with it?
I am going to talk about the two types of machine learning: supervised learning and unsupervised learning. I think many people here already know the difference, but let me explain briefly. Unsupervised learning means you do not need any labeled data: it just discovers structure from the data that already exists in your database. In contrast, supervised learning means you have historical data with known labels; you feed it in, build a predictive model on the basis of it, and then use that model to predict labels for new observations. A typical example of unsupervised learning is a consumer market segmentation study, which I will walk through in a minute using k-means clustering. A typical example of supervised learning is spam filtering; everybody knows spam. There are many possible implementations for a spam filter: if the question is a simple yes or no, you can use logistic regression, and for a multi-level problem you can use a decision tree.
Let me start with market segmentation. Customer segmentation is a very common task: you have customer data, you run k-means clustering over it, and the data gets grouped into clusters like this. For example, you might find that one group consists of younger customers with lower income who tend to shop at certain kinds of stores, while another group consists of customers who accept higher prices and tend to shop somewhere else. So how do you do this? We have a k-means clustering module in MADlib. First you prepare the data: here we have an input table with a customer ID and a number of attribute columns; this is just a simplified example. The k-means function expects the attributes to be represented as a vector, not as individual columns, so you need to transform your attribute columns into an array, and then the data looks like the second table. K-means itself is a classic way to cluster data: there are k centroids, and the algorithm iterates, assigning each point to the cluster with the nearest centroid and then recomputing each centroid as the center of its cluster, until the result is stable. The initialization of the centroids is very important for good quality results: it is a trial-and-error kind of thing, and you may want to customize the method. In MADlib you can choose the initialization approach: one option is plain random initialization, and kmeans++ improves on that,
although the cost of kmeans++ initialization is higher than random; you can also supply your own set of centroids directly. Depending on your problem you can also choose the distance metric. For typical applications the L2 norm is enough, but for things like document clustering over high-dimensional vectors you may want cosine similarity or the Jaccard index, and in the upcoming release we are making this pluggable, so you can write your own distance and assessment functions and plug them into the k-means call. Then you run the k-means function. It is hard to read on the slide, but it is just a function call, and the input is things like the table name, the column name, the number of centroids k, the distance metric, the maximum number of iterations, and the convergence threshold. Then you get the results: one output relates each point to the centroid it belongs to, and the centroids themselves are represented in another table, so you can just look them up.
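The loop that the MADlib function runs for you can be sketched as a minimal in-memory Lloyd's algorithm. This is an illustration of the algorithm only, not MADlib's SQL interface; for reproducibility it seeds the centroids with the first k points, whereas the library offers random and kmeans++-style seeding as discussed above.

```python
# Minimal k-means (Lloyd's algorithm): assign points to the nearest
# centroid, recompute centroids, repeat until assignments stabilize.
import math

def kmeans(points, k, max_iter=100):
    # Deterministic seeding for the sketch (MADlib offers random/kmeans++).
    centroids = [tuple(float(x) for x in p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean (L2) distance.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:   # converged
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(pts, 2)
print(sorted(centroids))
```

The two outputs mirror what the talk describes: a per-point cluster label and a table of final centroids.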
That is unsupervised learning with k-means clustering. Now let me show supervised learning, and the use case here is heart attack risk analytics.
Classification means identifying which class a new observation belongs to, based on known observations. The classification process splits into two parts: one is training and the other is classification itself. Training builds your model from your labeled data; then the classification step uses the model you built to classify new observations. For a binary problem you can use logistic regression; for a multi-class problem you can use a decision tree, and you can also use naive Bayes.
Here I will use logistic regression to predict the potential risk of heart attack. We start with historical patient data carrying a number of attributes: for example, each patient record may have age, blood pressure, cholesterol, height, weight, and smoking habit.
The input table holds the attributes believed to be correlated with the risk, age, blood pressure, cholesterol, height, weight, plus the last column, "had heart attack", which is the label, yes or no: the historical outcome is recorded in the table. Again you transform the attribute columns into an array so the features are vectorized, and then you train on the data to build a model. The training call is much the same as with k-means: you specify which table to use and which columns are the features and the label, and then you select the built model; the result comes back as a record type. Expanded, it shows a coefficient for each feature, age and so on. The values here are just made up for the example, so do not read medical meaning into them, but in this example the coefficients show that blood pressure had a big impact on heart attack risk in the history. Then comes classification: if you have new data without a label, you take the dot product of your model's coefficients with the new feature vector and apply the logistic function, and you get a result like 0.78 or 0.6. The logistic function maps the score to a value between 0 and 1, which you can read as the risk: close to 1 means high risk, close to 0 means low risk. That is the nice thing about logistic regression: it does not just return a yes-or-no answer, it returns the probability, based on the history.
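Prediction time in this example reduces to a few lines, sketched here outside the database. The feature names and coefficient values are invented for illustration, and this is not MADlib's API, just the arithmetic the talk describes: a dot product followed by the logistic (sigmoid) function.

```python
# Sketch of logistic-regression scoring: sigmoid of a dot product.
import math

def sigmoid(z):
    """Logistic function: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_risk(coefficients, features):
    """Probability of the positive class under a logistic model."""
    score = sum(c * x for c, x in zip(coefficients, features))
    return sigmoid(score)

# Invented model: [intercept, age, blood_pressure] on normalized inputs.
coef = [-4.0, 0.03, 0.5]
patient = [1.0, 55, 2.1]   # intercept term, age 55, high normalized BP
risk = predict_risk(coef, patient)
print(round(risk, 3))  # → 0.214
```

Because the output is a probability rather than a hard yes/no, downstream consumers can apply their own risk thresholds, which is exactly the advantage the talk attributes to logistic regression.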
So that is what you can do with MADlib. Now, how do you install it?
As of today there are no binary packages yet, so you build from source: you clone the Git repository and run make. Right now there is only the Makefile route, but I hope that soon there will be proper packages so that you can just install it with a package manager; the PostgreSQL extension packaging is not ready yet either. After building, you load the library into your database.
To try the parallel side, Greenplum provides a Community Edition of the Greenplum Database, which you can use without paying any money: you just download the binary from the Greenplum site and install it on your Linux machine, with no restriction, so you can stand up the cluster in a distributed way and run your analytics on top of it.
Thank you. And again: MADlib is an open source project, and we want you to contribute. If you have any insight, a new port, a new module, a new method, we are always glad to hear it. Now I will take questions.
[Audience question about compatibility between the Greenplum and PostgreSQL versions.] Yes, we carefully designed the compatibility between Greenplum and PostgreSQL in MADlib, so I think there is no feature gap: the Greenplum side runs in parallel and the PostgreSQL side does not, but the feature set is the same on both. [Audience question about PL/R.] Yes, PL/R is good for this kind of in-database processing too, and Greenplum ships it, but PL/R is still not parallelized in the same sense: an R call runs locally, in memory, in a single process, whereas MADlib runs its computation through the database engine, writing intermediate data to the database and reading it back. So if the data gets bigger and bigger, PL/R may run out of memory, while MADlib can lean on the database's parallelism. [Audience question about parallel query in PostgreSQL core.] Yes, in the PostgreSQL community we are talking about how to parallelize query processing. The difficulty is that there are different kinds of parallelism problems: scans parallelize one way, aggregate functions are a different problem, and so on, so I am not sure which kind of parallelization is going to get into PostgreSQL core first. If PostgreSQL gets parallel processing, we will certainly consider how to make MADlib take advantage of it. [Audience question about the output of random forests and decision trees.] For those modules, some functions create output tables: the results come back as tables describing the resulting model, which you can then query by name. We designed this carefully so that
you get the expected result, but you do need to take some care: for example, some modules create output tables whose names are derived from the input arguments. [Audience question about visualization.] MADlib itself does not have any visualization tool; it produces text-based output, so for real visualization you would export the results and use an external tool. [Audience question about other languages and partitioning.] The driver is Python-based, and the parallelization rests on the database's own data partitioning, so we do not have a separate parallelization framework for other languages. Someone suggested using PL/Proxy inside the MADlib functions to spread work across plain PostgreSQL servers; I think that is an interesting idea, and the way we structured the queries could accommodate it.

Metadata

Formal Metadata

Title MADlib
Subtitle An open source library for in-database analytics
Series Title PGCon 2012
Number of Parts 21
Author Harada, Hitoshi
Contributors Heroku (Provider)
License CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You may use, modify, and reproduce the work or its content in unaltered or altered form for any legal and non-commercial purpose, and distribute and make it publicly available, provided that you credit the author/rights holder in the manner they specify and pass on the work or this content, including in altered form, only under the terms of this license.
DOI 10.5446/19025
Publisher PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
Publication Year 2012
Language English
Producer FOSSLC

Technical Metadata

Duration 42:11

Content Metadata

Subject Area Computer Science
Abstract An open source machine learning library on RDBMS for Big Data age MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. The MADlib mission is to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development. The library consists of various analytics methods including linear regression, logistic regression, k-means clustering, decision tree, support vector machine and more. That's not all; there is also super-efficient user-defined data type for sparse vector with a number of arithmetic methods. It can be loaded and run in PostgreSQL 8.4 to 9.1 as well as Greenplum 4.0 to 4.2. This talk covers its concept overall with some introductions to the problems we are tackling and the solutions for them. It will also contain some topics around parallel data processing which is very hot in both of research and commercial area these days.
