MADlib
Formal Metadata
Title: MADlib
Number of parts: 20
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You may use, modify, and copy, distribute, and make the work or its content publicly accessible for any legal, non-commercial purpose, in unchanged or modified form, provided you credit the author/rights holder in the manner they specify and pass on the work or content, including in modified form, only under the terms of this license.
Identifiers: 10.5446/19025 (DOI)
PGCon 2012, part 6 of 20
Transcript: English (auto-generated)
00:05
Well, I think it's time, so I want to start my presentation about MADlib. I'm Hitoshi Harada. Thanks for coming today. First of all, I want to talk about myself.
00:23
Let me introduce myself. I initially started to work on Postgres as a hacker. I wrote the window function patch in 8.4 and extended it in 9.0. And in the 9.1 development cycle,
00:42
I helped with the writable CTE feature with David Fetter here. And I'm not sure if Marko is here or not, but it's a great feature. And I'm now working on PLV8. Thanks, Peter, I really appreciate the advertisement
01:01
earlier. It's a cool feature, but actually I'm not talking about PLV8 today. I'm also working on other modules like twitter_fdw and tinyint, which is a one-byte integer extension.
01:20
And I joined Greenplum last year. I've really enjoyed the development life at Greenplum. So I just want to ask you: how many people here have ever heard of Greenplum?
01:40
Oh, awesome, great. So a lot of people may know about Greenplum, but let me just talk about the architecture of Greenplum. Greenplum is a company that develops the Greenplum Database, which is forked from Postgres 8.2, and it's a distributed database system.
02:03
This is the typical cluster setup of Greenplum. Here is a master server, and the cluster has a whole bunch of segment servers. A query is dispatched from the master to the segments, and the data is distributed across the segments.
02:22
So the query processing is parallelized across the segment servers. A lot of customers have, say, terabytes or petabytes of data, and they process that huge amount of data in Greenplum.
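For illustration, a minimal sketch of what that looks like from SQL (the sales table and its columns are hypothetical, not from the talk): a Greenplum table declares a distribution key, and ordinary queries are parallelized transparently.

    -- Greenplum-specific DDL: rows are hash-distributed across segments by customer_id.
    CREATE TABLE sales (
        customer_id bigint,
        region      text,
        amount      numeric,
        sold_at     timestamp
    ) DISTRIBUTED BY (customer_id);

    -- This aggregate runs on every segment in parallel; the master merges the results.
    SELECT customer_id, sum(amount)
    FROM sales
    GROUP BY customer_id;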
02:47
So let me explain what's going on at Greenplum. I think you guys are a little tired of hearing about big data.
03:00
A lot of people are talking about big data, not only us but also the media, like CNN or blogs, people around here. And we think the true big data era is here.
03:23
So here's a simple example about a typical customer of ours. It's just an example, but it explains what we are doing. In the legacy system, the enterprise system,
03:41
the customer couldn't do any reporting because the data was too huge and the legacy system was too slow. They tried to run the reporting queries, but they took days or sometimes weeks.
04:01
Greenplum is a massively parallel processing system. So they bought a Greenplum database, and the system could run the reporting queries in a few seconds
04:20
over a big chunk of data. So they started to understand the facts, what's going on behind the data. And after that, the customer started to predict the future, or they are
04:42
trying to optimize their profit based on the data. So after understanding the simple facts behind the data, they are trying to leverage the data to optimize their profit.
05:01
For example, this customer is trying to do user-based recommendation on top of the data. So they need to aggregate and leverage their data. And the problem we are facing is this.
05:22
This is a very simplified traditional BI analytics workload. On the left-hand side there is a database, which may be Postgres or Greenplum. And the analysts use analytics tools,
05:40
like SAS or R, which you may have heard of, or maybe some kind of BI tools. They needed to extract the data out of the database, put it into the analytics tool, run the analysis, and then load the results back into the database.
06:05
Now, today, databases are getting bigger. And companies are collecting all kinds of data, not only from the enterprise system,
06:22
but also from Facebook, Twitter, or office applications. So the problem here is that that software, the analytics tools, is not designed for this kind of big data.
06:42
Basically, those kinds of tools are in-memory systems, and it is hard to parallelize them. So performing the analysis is a bit of a challenge.
07:04
And still they needed to extract the data out of the database, but it is impossible to run the analysis on the entire data set. So they needed to sample: they needed to extract a small subset of the data
07:22
and put it into the analytics tool. That doesn't solve the problem. So we are trying to push the analytics calculation into the database. This is exactly what we want to do.
07:41
The main concept here is three words: magnetic, agile, deep. Magnetic means the database is now like a magnet: it collects all kinds of data, not only structured data but also unstructured data, from sources like social networks. And agile:
08:06
I think the nature of analytics is iteration. You need trial and error to get some insights
08:21
and feed them back to the line of business, and then try hypothetical analyses based on the insights you got. And deep: I think standard SQL defines
08:43
some analytics features, like simple aggregate functions and window functions; there are also grouping sets to drill down in your analysis (a quick sketch of those below). But it's not enough.
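For reference, a minimal sketch of those plain-SQL features, reusing the hypothetical sales table from before (grouping sets are in the SQL standard and in Greenplum, though not in the Postgres of that era):

    -- Drill-down with grouping sets: totals per (region, customer), per region, and overall.
    SELECT region, customer_id, sum(amount)
    FROM sales
    GROUP BY GROUPING SETS ((region, customer_id), (region), ());

    -- A window function: running total per region, ordered by sale time.
    SELECT region, sold_at,
           sum(amount) OVER (PARTITION BY region ORDER BY sold_at) AS running_total
    FROM sales;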
09:03
We do need more accurate, deeper statistical methods, like linear regression or more complicated methods. So here comes MADlib. We are developing MADlib. First of all, we introduced this new usage
09:24
of a database, the Enterprise Data Warehouse. In 2009, at VLDB, colleagues from Greenplum together with Joe Hellerstein from the University of California, Berkeley, put this idea together into one paper
09:40
and introduced "MAD Skills: New Analysis Practices for Big Data", which describes the things I'm explaining now. And we started the MADlib project as a software development effort. And we reported on its status this year,
10:04
just a few months ago: how the MADlib project stands now. So, MADlib, the definition, why we call it MADlib: MAD stands for magnetic, agile, deep, which I explained just now,
10:25
and Lib stands for, of course, library. MADlib is an add-on to Postgres or Greenplum. It is just a library: you can install it into your database, and you can run the analytics methods.
10:44
MADlib has advanced methods: mathematical, statistical, and machine learning modules. And it is designed to be powerful and scalable, because the Greenplum database is a parallel database, and you can scale out for big data.
11:05
And all the interfaces for the analytics methods are defined as database functions, so it's all just SQL functions. And the mission is to foster widespread development
11:21
of scalable analytics skills. We want to harness efforts from commercial practice, from us, as well as academic research from the universities. And we want to do this as an open source project, because we want to have more and more contributions
11:44
from the academic side. And yes, this is BSD licensed. And of course, you can hack on this project, and you can send a pull request.
12:03
So this is a collaborative project between Greenplum and the universities. But it takes a very neutral position, so it is not necessarily a Greenplum project. Everybody can contribute to the source code.
12:23
Currently, we have contributions from the University of California, Berkeley, Wisconsin-Madison, and Florida. And because the Greenplum code base shares a lot of Postgres interfaces,
12:41
MADlib currently supports both Postgres and Greenplum. The supported Postgres versions are 8.4 to 9.1, and Greenplum 4.0 to 4.2. And it is designed for data scientists, to provide more scalable, robust analytics capabilities.
13:05
You can find the information on madlib.net, and the source code is hosted on GitHub, so you can just clone it to your desktop. And if you have any questions about MADlib or its usage,
13:22
just feel free to post to the Google Group. So, MADlib is the sane answer. We think MADlib is the sane answer to big data, because MADlib is designed for better performance
13:40
and scalability. MADlib runs inside your database, so you don't need to extract any data from the database; it just runs inside. And on Greenplum it leverages the parallelism. And it's very easy to use, because it's just
14:02
SQL functions, so you don't need any additional tools: SQL is your friend. And also, this is open source, so you can just hack the source code. I mean, this kind of analytics is by nature very complicated, and sometimes you
14:24
want to customize the modules. A predefined package may not be enough for you, but you can just read the source code, change some parameters or the module, and replace some parts of the modules.
14:45
And of course, it's free. I mean, yes? Is there some process whereby we could make changes? Yeah, so this is the typical GitHub process,
15:00
so you can just send a pull request, yes. You have a question? Yes? On Netezza? I'm not sure about the Netezza interface, but I think some ideas are still shared
15:20
between Netezza and Postgres. Yeah, it's what Postgres supports. Right, right. I'm not sure about the current status of Netezza's internals, but yeah, we'd appreciate the contribution if you are familiar with it. Actually, some database vendors like us
15:41
are providing similar predictive analytics modules, but they are proprietary software, and you cannot look inside. And typically they're very expensive. But MADlib is free.
16:04
So, the current status of the MADlib roadmap: inside Greenplum development, we are targeting a release every quarter. We've just released version 0.3.
16:21
The project itself is still a little in the startup phase, but you can already use a whole bunch of modules like these: linear regression, logistic regression, k-means clustering, decision trees. And we are going to release a new version,
16:41
0.4, at the end of this quarter, with distribution functions and random forest. Okay, so let's go inside the MADlib architecture.
17:02
MADlib has a lot of parts. On the top, we have Python UDFs, so it depends on PL/Python, and you need to install PL/Python first. PL/Python controls the looping algorithms
17:22
and external libraries. Some algorithms, like clustering, need to do a convergence calculation, and it's not easy to run that type of algorithm in plain SQL, so we use Python there. Then, for the simplest methods,
17:43
like linear regression, we use just SQL functions, UDFs or UDAs, user-defined aggregates.
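As a rough sketch of what a user-defined aggregate means here (the table and column names are hypothetical, and the exact signature differs across MADlib versions), a linear regression fit is a single aggregate over the rows:

    -- madlib.linregr is a user-defined aggregate: one pass over the table
    -- accumulates the sufficient statistics, and the final function solves
    -- for the coefficients.
    SELECT (madlib.linregr(y, x)).coef
    FROM some_table;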
18:00
Below that, we have a C++ abstraction layer, because the main concept of MADlib is not only Postgres and Greenplum. We have the abstraction layer in C++, and if someone is familiar with another database, like Netezza, they can just write a connector against this abstraction layer. And the DBMS backend has the core modules.
18:26
We have a simple compressed vector representation as a user-defined type, and also connectors like the aggregate function interface.
18:42
The whole contents look like this. MADlib consists of data modeling, descriptive statistics, and support modules. For the machine learning methods, we have supervised learning, like linear regression, logistic regression,
19:02
naive Bayes classification, decision trees, and SVM. And for unsupervised learning, we have association rules, k-means clustering, and SVD. We also have descriptive statistics, and the support modules, like the array extensions.
19:22
And we have the sparse vector on the right-hand side. The sparse vector is a user-defined type that compresses a typical sparse vector, so you have a very efficient sparse vector in your database. And we also extend the Postgres array type, so you can do things like summation over an array.
19:44
So we just defined additional array functions in MADlib; a small sketch of the sparse vector idea is below.
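(The literal format below follows MADlib's run-length-encoded svec type as documented; treat the exact function name as an assumption.)

    -- Run-length encoding, {counts}:{values}: this svec stands for a
    -- 102-element vector containing a 5, then a hundred 0s, then another 5.
    SELECT '{1,100,1}:{5,0,5}'::madlib.svec;

    -- Operations work directly on the compressed form, e.g. element-wise
    -- addition with a 102-element vector of ones.
    SELECT madlib.svec_plus('{1,100,1}:{5,0,5}'::madlib.svec,
                            '{102}:{1}'::madlib.svec);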
20:00
A good part of MADlib is that we have a good amount of documentation online. You can just go to doc.madlib.net, and the documentation is good because it's not only about how to use it,
20:27
So you can just go around here and understand the kind of idea and use it. Okay, so Madlib is,
20:42
so I'm going to Madlib use cases. So what can you do, actually? I'm going to talk about two types of machine learning here, unsupervised learning,
21:00
unsupervised learning, and unsupervised learning. So I think people around here may know about the differences between two, but anyway, I explained the two types of learning. So unsupervised is learning from raw data,
21:23
which means you don't need any label on data, it's just you categorize the data that exists in your database. And in contrast to unsupervised, unsupervised learning is if you have any historical data,
21:43
which are rabid, then you just, you can just build predictive model from the historical data, and with that model you just build,
22:02
you predict the new observation, you can just, you can level the new data with that model. So type code example is unsupervised learning
22:20
is a consumer market segmentation study, which I'm going to talk about. The method is k-means crossing, and for supervised learning, cross by the email as a spam or non-spam. So type code application is a spam filter, and there are a lot of implementation for the spam,
22:43
but we use logistic regression on my base. If the label is not a yes or no, then we can use a decision tree for the multi-label problem.
23:01
Okay, so market segmentation. This is very easy to understand, I think. So customer segmentation study, if you have some kind of customer database, then you can just run the k-means cluster, and then the data is automatically categorized like this.
23:25
So you can see the red group as the height brand lawyer group, the type code customer is age 34 and above.
23:41
They tend to shop at Nordstrom, and in contrast to that, the green group is a group for the customers who are a price sensitive group, and the budget conscious, they tend to shop at Costco, not Nordstrom.
24:04
Okay, so how to do that? So we have a k-means clustering module in MALLE, and after you install MALLE in your database, prepare your data. So here we have the input points, which consists the customer data,
24:21
and the customer ID, and some kind of attribute. Here we use just a simplified example. So k-means function needs to represent the attributes as a vector, not a column.
24:41
So first of all, you need to transform your attribute column to an array here. So just select array xy as a float array, and then save it. Then the table will look like this.
25:03
And the k-means, so k-means is a traditional way to cluster your data. So there are a bunch of initialization approaches, which may impact your analytics quality.
25:23
So you need to choose one of the initialization approach. So k-means is just like a convergence calculation, and the k-means do some loop algorithm. So just calculate the distance from the center
25:46
of the cluster from each data point. Then just adjust the cluster position until the result is stable.
26:00
So actually the initialization point is very important stuff for the good quality result. This is a kind of the trial and error stuff, and you can just customize the method in the k-means cluster.
26:22
Some of it is very simple to choose one of the data as an initialization point, which is k-means random. And k-means purpose is a little improved method to do that, but the cost to initialize is a little higher than random.
26:42
And you can just keep the centralized set as a data set in an arbitrary set. And also, depending on your program, you can choose one of the distance metrics.
27:00
So for this application, you can just L2-norm, aka the Euclidean distance. So for the spatial data, I think the L2-norm is enough. But for the thousand of this dimension of vectors, or the documentation clustering program,
27:23
you need to choose cosine, or Tanimoto, and Jacquard index. And for the upcoming release, we make it pluggable, so you can just write your customized distance metrics function and put this
27:42
into a k-means method. And then now you can run the k-means function. So select star from madlib.k-means++. Actually, it is a little hard to read,
28:02
but this is just a function. And the input is like a table name, the column name of the ID or attribute, and then which distance metrics you choose. And which, so maximum number of iteration,
28:24
the convergence parameter, above us or not, then do it. And then you've got the results. So for this example, the other points is,
28:42
so these two columns are the input column, and the third column, CID, is the result of which categorized point it belongs to as a result. And then each central is represented in another table,
29:02
so you can just look up the output here. That's the unsupervised learning for k-means cluster. Now is how to supervise learning, and the use case is heart attack risk analytics.
29:27
So classification analysis is, as I explained earlier, the classification is identified which category a new observation belongs to, with known observations. So typically the classification process is spread
29:42
to two parts, one of which is training, one of which is the classification. Training is to build your model based on your level data. And the classification process is, with this model you build, classify the new observation.
30:02
And the example method, logistic regression, for the multi-classification process, we have decision trees, also you can use naive Bayes. So here I'm explaining about logistic regression
30:21
for heart attack prediction. Calculate the potential risk of heart attack based on the historical patient data with a number of attributes. So for example, the patient data may have the age,
30:41
cholesterol, volume, height, weight, anything else? And then again, you need the input data, create table, coronary, with age, blood pressure, cholesterol, height, weight.
31:01
And the last column, heart attack, is level, so it's yes or no. So the historical data which this record had heart attack. This record didn't have heart attack. Then transform, again, transform into an array
31:22
as a vectorize. Just select array, column, column, column. And train the data, build the model. Select star from modeling, log, regular function. This is much simpler than K-means.
31:42
Again, you need to specify which table is input, which column to specify as a feature, as an attribute. Then the build model is like this. So the result is just a record type,
32:02
so this is just expand as a row. So the feature, like age, has coefficient, negative nine dot 105. So standard error is here, this is just an example.
32:22
So I think the value of standard error is very bogus here, but. This example shows that the blood pressure has a big impact on the heart attack, based on the historical data.
32:44
And then the classification. So if you have new data, without blue label, and then you have now the build model, run the logistic function with the dot product,
33:00
with your model, then the result is logistic seven dot 88, E minus zero log, zero six. Again, this result is, I think it's not bad, but you, so basically the logistic regression
33:22
returns from minus one to one, and if the result value is positive, then the risk is positive, and if the result value is negative, less than zero, then the risk is negative. So the good part of logistic regression
33:42
is not just yes or no, it returns the possibility based on the historical data. So how to deploy, how to install the MALDI?
34:02
Now, I just uploaded the module to PgXN. Right, I mean, so the PgXN client, actually, as of today, PgXN client cannot install MALDI,
34:21
because PgXN client needs a patch that I send to as a pull request data also, because the MALDI project needs some configure and make. But PgXN client, as of today, assumes that there is only a make file. But I hope that in a few days,
34:42
those are merged in the pull request, and you can just say PgXN install MALDI, then your Postgres is now ready to do this kind of predictive analytics.
35:01
Just for mentioning, we have similar utility tools for Greenblum. So actually, the Greenblum is providing a Greenblum commutation, which you can use without any money. So you just download the Greenblum binary package
35:22
from Greenblum's site, and then you just install it in your Linux, and there's no limitation. So you can run the Greenblum database processor in your, say, Amazon EC2, in a distributed way, and run your analytics with this command.
35:46
Two already, but yeah. Yes, so MALDI is an open source project, and we want you to contribute if you have any insights or new port,
36:01
or for a new method for the machine learning, or any good parts, then yeah, we are welcome. That's it. Oh, sorry, yes.
36:24
Yeah, so we carefully designed the compatibility between the Greenblum database and the Postgres. So I think there is no gap feature. Just the modules are polarized in Greenblum database,
36:40
but the feature is not different from each other. Okay, Jeff. Yeah, good question. So yeah, PLR is good for the statistics method.
37:05
And actually, inside Greenblum, we use PLR heavily, but PLR is still not as polarized way. So it's just running locally, and it runs in memory. Okay, right? And the MALDI runs like, so writing the data
37:23
to the database and get back to the calculation. So if the data gets bigger, then PLR may run out of memory. The query for Postgres, how much would that improve with MALDI?
37:44
Should people, is this yet another reason to do parallel query? In Postgres? Yeah. Yeah. Yeah, actually yesterday, in the meeting, we are talking about how to parallelize
38:02
the Postgres query as a single node. But I think the problem here is a little different from the multi-node parallel query and the Postgres parallel query. So I'm not sure what kind of parallelization, because parallelization is a lot of,
38:21
types of problem. So for example, it's only, I mean, you can just parallelize the building sort, or if you want to parallelize the aggregate function, then it's a different problem. So I'm not sure what kind of parallelization
38:40
is going to get into the Postgres. But yeah, if Postgres, you have some parallel processing, then MALDI should, to consider how to parallelize, how to scale. Yes? Yeah, are you thinking about adding random forest
39:02
to this sub-engineering algorithm? Sorry? Random forest? Random, yes. Yes? For the nebs of B zero four, we will have random forest based on the decision tree. Right. Yes?
39:20
These functions cause a block, I'm assuming while the thing is running. So it's called function that generates two tables out there. It blocks until it does that. Yeah, so the inside function, it create table, it make create a temporary table, and there'll be some table, but the output is like two tables or more. Actually, it's software currency,
39:41
like having static table names. If you call it two functions, you do different sessions, you don't have the same global overload. Okay, yeah. So, yes, so we designed this kind of problem. We careful to not damage the unexpected result, but yeah, you need to careful.
40:03
So, for example, decision tree modules creates a few tables, but the name is based on the input table. So it just adds some suffix. Do you generate graphics in your function calls, charts? No, so visualization,
40:21
oh, okay, so visualization is our next step. So, basically, Modlib doesn't have any visualization tool, but decision tree has some text-based decision tree representation, like explain analyze.
40:44
Okay. Yes, language?
41:07
So parallelization here is based on Greenbaum parallelization. So as a language, we don't have any idea to do parallelization in language level.
41:33
I see, I see. I think that's a good idea to, for example,
41:42
say using pure proxy inside the modeling function, then parallelize the unmodified prosperous, and yeah, we may parallelize the query. Okay, no other questions?
42:07
So I think that's all. Thank you.