MADlib

PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross

Harada, Hitoshi

Formal Metadata

Title

MADlib

Subtitle

An open source library for in-database analytics

Title of Series

PGCon 2012

Number of Parts

Author

Harada, Hitoshi

Contributors

Heroku (Provider)

License

CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/19025 (DOI)

Publisher

PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross

Release Date

2012

Language

English

Producer

FOSSLC

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

An open source machine learning library on RDBMS for the Big Data age.

MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. The MADlib mission is to foster widespread development of scalable analytic skills by harnessing efforts from commercial practice, academic research, and open-source development. The library consists of various analytics methods, including linear regression, logistic regression, k-means clustering, decision trees, support vector machines and more. It also includes a highly efficient user-defined data type for sparse vectors, with a number of arithmetic methods. It can be loaded and run on PostgreSQL 8.4 to 9.1 as well as Greenplum 4.0 to 4.2. This talk gives an overview of the library's concepts, introducing the problems we are tackling and our solutions to them. It also covers some topics in parallel data processing, which is currently a very active area in both research and industry.