1-31 out of 31 results
47:13 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Gaia Variability Studies: case for Postgres-XC.

Mapping a billion stars throughout our Galaxy and beyond. Gaia is a cornerstone European Space Agency mission that will create an extraordinarily precise three-dimensional map of more than a billion stars throughout our Galaxy and beyond, mapping their motions, luminosity, temperature and composition. This huge stellar census will provide the data needed to tackle an enormous range of important problems related to the origin, structure and evolutionary history of our Galaxy. I will describe how Variability Studies of light sources observed by Gaia pose a Big Data problem that we are trying to solve with the help of Postgres-XC.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
50:30 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

9.4 On The Floor

How 9.4 will shake your world. Join us for this first look at new and upcoming features in the 9.4 release. We'll discuss some of the what and why of the new features, and help you start planning your upgrades. Postgres development should be mostly through feature freeze by the time PGCon rolls around, so this talk should be able to give you a good idea of what will be in the next release of Postgres. Expect improvements in JSON, replication, updatable views, materialized views, and more. Hope to see you there!
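As a hedged illustration (not from the talk itself), two of the 9.4-era features mentioned above can be sketched in plain SQL; the table and view names here are hypothetical:

```sql
-- jsonb: binary JSON storage with containment queries (new in 9.4)
CREATE TABLE docs (body jsonb);
INSERT INTO docs VALUES ('{"user": "alice", "tags": ["pg", "9.4"]}');
SELECT * FROM docs WHERE body @> '{"user": "alice"}';

-- materialized views: 9.4 adds REFRESH ... CONCURRENTLY, which lets
-- reads continue during a refresh but requires a unique index
CREATE MATERIALIZED VIEW daily_stats AS
  SELECT user_id, count(*) AS n FROM events GROUP BY user_id;
CREATE UNIQUE INDEX ON daily_stats (user_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_stats;
```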
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
49:17 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

In-Memory Columnar Store for PostgreSQL

IMCS is an In-Memory Columnar Store for PostgreSQL. A vertical data model is more efficient for analytic queries that perform operations on entire columns. IMCS provides a 10-100x performance improvement over standard SQL queries because of: data skipping (fetching only the data needed for query execution); parallel execution (using multiple threads to execute a query); vector operations (minimizing interpretation overhead and allowing SIMD instructions); reduced locking overhead (simple array-level locking); and no disk IO (all data is in memory). IMCS is implemented as a standard PostgreSQL extension. It provides a set of functions and operators for manipulating timeseries. Some of them are analogs of standard SQL operators (arithmetic, comparisons, sorting, aggregation...), but there are also complex analytic operators such as calculation of ranks, percentiles, and cross points, plus an extended set of aggregates for financial applications, such as split-adjusted price, volume-weighted average price, and moving average. A columnar store manager stores tables as sections of columns of data rather than as rows of data. In comparison, most relational DBMSs store data in rows. That approach allows a whole record to be loaded with one read operation, which leads to better performance for OLTP queries. But OLAP queries mostly perform operations on entire columns, for example calculating the sum or average of some column; in this case the vertical data representation is more efficient. A columnar store, or vertical representation of data, achieves better performance than a horizontal representation due to three factors: data skipping (only the columns involved in a query are accessed); vector operations (applying an operator to a set of values minimizes interpretation cost, and the SIMD instructions of modern processors accelerate execution of vector operations); and compression of data (even a simple compression algorithm like RLE not only reduces the space used, but also minimizes the number of operations performed).
IMCS is first of all oriented toward working with timeseries. A timeseries is a sequence of usually small, fixed-size elements ordered by some timestamp. Operations on timeseries rarely access a particular element; instead they operate either on the whole timeseries or on some time interval. This specific access pattern requires a special kind of index for timeseries, different from traditional database indexes: such an index need not provide an efficient way to locate an arbitrary element, but it should be able to efficiently extract a range of timeseries elements. Advantages of the IMCS approach: fast execution based on vector operations; parallel query execution; no changes to the PostgreSQL core (just a standard extension); no MVCC overhead (MURSIW isolation level); no disk IO (in-memory store); optimized for timeseries (massive operations on time slices).
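The row-versus-column tradeoff described above can be illustrated with two query shapes; this is a sketch on a hypothetical trades table, independent of the IMCS API:

```sql
-- OLTP-style: fetch one whole row; a row store answers this
-- with a single read of one record
SELECT * FROM trades WHERE trade_id = 12345;

-- OLAP-style: aggregate one column across many rows; a column store
-- only has to scan the "price" column (data skipping) and can apply
-- the aggregate to a whole vector of values at once
SELECT avg(price) FROM trades WHERE day >= '2014-01-01';
```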
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
55:23 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

HowTo DR planning for the worst

Planning for the worst. There's a lot more to disaster recovery than making backups. Most of DR, in fact, is planning instead of code: knowing what you need to do when disaster strikes, how to do it, and who does it. Further complicating things, management and admins are fond of preparing for unlikely events while failing to prepare for probable outages at all. This talk will outline how to make a disaster recovery plan, and some basic dos and don'ts of DR. Included: the three most common downtime causes; determining acceptable losses (and getting management to agree); backup vs. replication; planning for the unexpected; against improvising (always have a Plan B); public cloud DR; other dos and don'ts. When disaster strikes, it's too late to start planning. Do it now.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
38:59 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

SQLite: Protégé of PostgreSQL

The Keynote for PGCon 2014 by D. Richard Hipp SQLite has become the most widely deployed SQL database engine in the world, with over two billion installations providing SQL database services for cellphones, web-browsers, cameras, and countless other gadgets and applications. But SQLite is not in competition with PostgreSQL. Rather, PostgreSQL and SQLite complement each other, with each engine targeting a different class of problems. SQLite can be thought of as a derivative of PostgreSQL. SQLite was originally written from PostgreSQL 6.5 documentation, and the SQLite developers still use PostgreSQL as a reference platform to verify that SQLite is working correctly.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
49:02 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Balancing Database Storage Latency And Throughput

By Greg Smith. The easiest type of benchmark to run on a database is checking its throughput: the total transactions processed during some period of time. But throughput isn't what application users care about. They want low latency. Latency and throughput have a complex dependency on each other, and you'll need a whole new type of test to balance them well. Recent advances in benchmarking tools, like the rate limit in pgbench 9.4, make it possible to analyze latency in a way that highlights this difficult-to-see area. The graphics and reports of the pgbench-tools package make it possible to see how tuning changes impact both latency and throughput. That lets you test the "lore" for how best to tune your PostgreSQL server, to find out what really works. Using these new tools, this talk will look at the tricky correlation between throughput and latency and how to measure each usefully in common situations. We'll look at three different storage stacks with very different latency profiles: regular disk, disk with battery-backed write cache, and SSD. We'll then look at exactly how checkpoint spikes play out on each and what you can do about them. You'll never trust a simple transactions-per-second count again!
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
41:55 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

I know greater-than-or-equal-to when I see it!

A tour of the operator class facility. A PostgreSQL operator class tells the database server how to sort, match, and index values of a particular data type, and an operator family ties related data types together for the purpose of these operations. They constitute agreements between data type authors and data type-independent code, with most data types defining at least one operator class. This talk will explore the theory behind operator classes and families, including the assumptions data type-independent code is and is not entitled to make based on available operator classes. We will walk through the creation of new operator classes, some practical and others deliberately perverse, and examine some exceptional operator classes already present in PostgreSQL. Writing a query expression such as "val >= 1" without knowing the data type of "val" is almost always a mistake. It will work for the vast majority of data types, but it will malfunction in rare cases. We will examine the use of operator classes to adapt such an expression to each data type. This talk caters to writers of code intended to deal with unforeseen data types. Application code such as replication systems and client interfaces are affected, as are backend features such as integrity constraints and join algorithms. As an author of such software, you will leave the talk equipped to audit for mistakes and substitute code that harnesses operator classes and families to process values of arbitrary type. The talk will also help prospective authors of new data types.
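The shape of an operator class definition can be sketched as follows, following the btree example in the PostgreSQL documentation for a hypothetical complex type ordered by absolute value (the type and support function are assumed to exist):

```sql
-- A btree operator class supplies the five comparison operators,
-- plus a support function returning -1 / 0 / +1
CREATE OPERATOR CLASS complex_abs_ops
  DEFAULT FOR TYPE complex USING btree AS
    OPERATOR 1 <,
    OPERATOR 2 <=,
    OPERATOR 3 =,
    OPERATOR 4 >=,
    OPERATOR 5 >,
    FUNCTION 1 complex_abs_cmp(complex, complex);
```

With this in place, `ORDER BY`, btree indexes, and merge joins on complex values all sort by absolute value, which is exactly the kind of agreement between type author and type-independent code the talk describes.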
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
45:55 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

When PostgreSQL Can't, You Can

After using PostgreSQL for a while, you realize that there are missing features that would make it significantly easier to use in large production environments. Thankfully, it's extremely easy to make add-ons to enable some of those features right now, without even knowing a single line of C! Over the past few years I've been developing tools that have made PostgreSQL administration for our clients much easier. Table partitioning is one of the best methods for improving query performance on large tables in PostgreSQL. While the documentation gives instructions on how this can be set up, it's still a very manual process. For time/serial based partitioning, the maintenance of creating new partitions is a huge part of that process. PG Partition Manager is an extension that aims to provide easy setup & maintenance for the most common partitioning types. It also provides advanced features for taking advantage of constraint exclusion and retention policies to drop unneeded tables. Another area where we ran into limitations was when one of our clients needed an easy method to keep the database schema checked into version control. The pg_extractor tool grew from this and has become a popular method of finely tuning the extraction of database objects from PostgreSQL into individual files. Autonomous transactions are another of the more popular features in other RDBMS systems that is missing from PostgreSQL. Being able to reliably log a function's run status requires that a function failure not roll back the entire transaction and erase entries made to the log table.
PG Job Monitor takes advantage of the dblink contrib module to give the behavior of an autonomous transaction, providing logging & monitoring for mission-critical functions. PostgreSQL's built-in replication is all or nothing, but sometimes you just need to replicate specific pieces of your database. There are several third-party systems capable of accomplishing this, but their setup can be daunting to most new users and overkill if you just need to grab a few tables. Mimeo is a logical replication tool that requires minimal permissions and setup, and provides several specialized means of replicating from one or many databases to a single destination. This talk will discuss these projects and hopefully bring insight into how easy it is to contribute to the PostgreSQL community.
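For context, the "very manual process" that PG Partition Manager automates looked roughly like this in the 9.4 era; a sketch of inheritance-based partitioning with constraint exclusion, using hypothetical table names:

```sql
CREATE TABLE events (
  id      bigserial,
  created timestamptz NOT NULL,
  payload text
);

-- one child table per month; the CHECK constraint lets the planner
-- skip irrelevant partitions via constraint exclusion
CREATE TABLE events_2014_05 (
  CHECK (created >= '2014-05-01' AND created < '2014-06-01')
) INHERITS (events);

CREATE TABLE events_2014_06 (
  CHECK (created >= '2014-06-01' AND created < '2014-07-01')
) INHERITS (events);

-- a trigger on "events" (not shown) must route each INSERT to the
-- right child, and someone must keep creating next month's table --
-- this recurring maintenance is what the extension takes over
```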
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
50:53 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Static analysis, test coverage, and other 12+ letter words

It is hard to maintain complex C code free of bugs. Numerous tools have been developed to help, and we have thrown many of them at the PostgreSQL source code. We will take a look at tools such as Coverity, Clang scan-build, AddressSanitizer, Valgrind, and gcov. How do they work? How should their results be interpreted? What have they and have they not delivered for us in practice? How can we integrate them into our workflow going forward? How can we structure our code better to work with tools like these?
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
52:42 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Why UPSERT is weird

Counterintuitive lessons learned from the implementation effort. A talk that examines the implementation process of the INSERT...ON DUPLICATE KEY LOCK FOR UPDATE feature proposed for PostgreSQL. "UPSERT" can loosely be described as a DBMS feature that allows a DML statement's author to atomically either insert a row or, on the basis of the row already existing, update that existing row instead, while safely giving little to no further thought to concurrency. One of those two outcomes must be guaranteed, regardless of concurrent activity, which is the essential property of UPSERT. Examples include MySQL's INSERT...ON DUPLICATE KEY UPDATE, or the SQL standard's MERGE statement (as implemented in a number of proprietary systems). The absence of this feature from Postgres has been a long-standing complaint from Postgres users, and an effort to implement an extension to the Postgres INSERT statement that implements something broadly in line with user expectations in this area was undertaken in 2013. This talk considers the experience of working on that implementation, and examines how the proposed INSERT...ON DUPLICATE KEY LOCK FOR UPDATE feature deals with concurrency issues, visibility issues, the interplay of value locking and row locking, and the general avoidance of unprincipled deadlocking. In order to come up with an implementation that satisfied user expectations, while still preserving and respecting long-standing abstractions and conceptual precepts, some interesting and counterintuitive choices were made to resolve the fundamental tension in tying value locking (as always required on a limited scale for unique index enforcement) to row locking (to LOCK FOR UPDATE). Finally, the talk examines the strategic significance of UPSERT, and further use cases enabled by the feature in the area of multi-master replication conflict resolution.
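Before any native UPSERT landed, the documented workaround was a retry loop; a sketch following the example in the PostgreSQL documentation, with a hypothetical table db(a, b):

```sql
CREATE FUNCTION merge_db(k int, data text) RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = k;
        IF found THEN
            RETURN;
        END IF;
        -- key not there, so try to insert it; if someone else inserts
        -- the same key concurrently, we get a unique-key failure
        BEGIN
            INSERT INTO db(a, b) VALUES (k, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing: loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Every caller must remember to use the function, and the subtransaction-per-retry cost is real, which is part of why a built-in statement guaranteeing one of the two outcomes was in such demand.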
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
53:00 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

XML, HSTORE, JSON, JSONB - OH MY!

Use the unstructured data type that's right for you. PostgreSQL 9.4 adds a new unstructured data type, JSONB, a variant of JSON optimized for storing objects. JSONB complements the existing XML and JSON document objects, as well as HSTORE. Which one is right for you? We'll take a tour of the features of each, and their advantages and disadvantages for various use cases, all illustrated with real-world examples. There has been a lot of work on the representation of unstructured data in PostgreSQL, culminating in the addition of the JSONB type in the forthcoming 9.4 release. JSONB complements the existing HSTORE, XML, and JSON types, not to mention arrays. With so many options, which do you use? As usual it depends on your use case. In this presentation, we'll review the unstructured data types in PostgreSQL, and look at their advantages and disadvantages for: Document storage Configuration management A "schemaless database" Object serialization Entity/Attribute/Value models Path queries
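A hedged sketch of the semantic differences between json and jsonb mentioned above (duplicate keys, key ordering, whitespace):

```sql
-- json stores the input text verbatim, duplicates and all
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
--  {"b": 1, "a": 2, "a": 3}

-- jsonb stores a parsed representation: duplicate keys collapse to
-- the last value, keys are reordered, and whitespace is normalized
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
--  {"a": 3, "b": 1}
```

Which behavior you want (faithful document storage vs. fast object operations) is one axis of the "which type is right for you" question the talk addresses.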
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
48:57 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

The Next Five Years for PostgreSQL Advocacy

or "How We Can Overtake Every Other Database". According to Jeff Barr, the Chief Evangelist for Amazon Web Services, "Over the past few years, PostgreSQL has become the preferred open source relational database for many enterprise developers and start-ups, powering leading geospatial and mobile applications." Through the hard work of the major contributors of our community and certain market events in the data world, it is indeed true that many new projects, both open and closed source, are defaulting to PostgreSQL as their data storage choice. We know that PostgreSQL is an amazing product, and those who know the realm of technology agree. So what is preventing PostgreSQL from becoming the default database for everyone? In order to answer this question, we will first explore the PostgreSQL advocacy efforts to date. We will take a look at which features have helped propel PostgreSQL into its "newfound" limelight and what the community has done to create a "buzz" around PostgreSQL to capture more users.
We will then arrive at the present-day crossroads, and look at different strategies to more effectively market ourselves as a product and a community, including how to: actively make more people aware of PostgreSQL and what it can offer; make it easier for new people to adopt PostgreSQL; determine the pain points of our current users and figure out how to fix them; and measure the success of our advocacy efforts. Through these combined efforts on the development and advocacy fronts, we as a community can make PostgreSQL a household technology name over the next five years.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
45:43 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

SELECT * FROM changes; - Part 2

9.4 saw a great deal of development around a feature (now) called changeset extraction. This talk will explain what the feature does, which areas of use we see, what the state of the feature in 9.4 is, and which additional features around it we want to see in future releases of Postgres. Use cases for the changeset extraction feature include: replication solutions, auditing, cache invalidation, federation, and more. Changeset extraction is the ability to extract a consistent stream of changes in the order they happened, which is very useful for replication and auditing, among other things. But since that's a fairly abstract explanation, how about a short example?

-- create a new changestream
postgres=# SELECT * FROM create_decoding_replication_slot('slot', 'test_decoding');
 slotname | xlog_position
----------+---------------
 slot     | 0/477D2398
(1 row)

-- perform some DML
postgres=# INSERT INTO replication_example(data) VALUES('somedata');
INSERT 0 1

-- and now, display all the changes
postgres=# SELECT * FROM decoding_slot_get_changes('slot', 'now', 'include-timestamp', 'yes');
  location  |   xid   | data
------------+---------+---------------------------------------------------------------------
 0/477D2510 | 1484040 | BEGIN 1484040
 0/477D2628 | 1484040 | table "replication_example": INSERT: id[int4]:1 data[text]:somedata
 0/477D2628 | 1484040 | COMMIT 1484040 (at 2014-01-20 01:18:49.901553+01)
(3 rows)

All this works with low overhead and a configurable output format.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
44:01 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Postgres in Amazon RDS

A compelling option for startups. In recent years, Amazon's cloud service has been key for deploying and scaling IT infrastructure tiers, and this is no different with Postgres. Amazon's relational database service for Postgres - RDS - is a new but compelling option for startups, when resources are scarce and administrators are hard to find! RDS is a great option in terms of rapid deployment of replicas, ease of failover, and the ability to easily redeploy hosts when failures occur, rather than spending extensive time trying to repair them. In this talk, I'm planning to cover topics related to managing Postgres RDS: introduction to RDS; benefits of using RDS; challenges with RDS; choosing EC2 vs. RDS; security; tuning; scaling; monitoring; backups; cost; generic recommendations.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
48:47 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

PostgreSQL 9.4 and Json

We'll cover the new JSON generation functions, as well as jsonb, and when to use json, jsonb, or hstore. We'll end by looking into the future to see what other JSON functionality might be coming, such as JSON mutation functions. The talk will present a survey of the current and coming state of JSON features, as well as a speculative look at what might come in the future beyond 9.4. Overview: the new JSON generation functions, such as json_build_object and json_build_array, and how they can be used to build up arbitrarily complex and non-rectangular JSON; json_typeof(); jsonb, and its relation to json and nested hstore; how jsonb is different from json (e.g. object field ordering, treatment of duplicate field names and whitespace); performance differences between json and jsonb; and what's missing from json/jsonb. What should we work on for 9.5? Mutation functions have been suggested; are there others?
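The generation functions named above can be sketched like this in 9.4 (a minimal illustration, not material from the talk):

```sql
-- build nested, non-rectangular JSON directly in SQL;
-- arguments alternate between keys and values
SELECT json_build_object(
         'id',   1,
         'tags', json_build_array('pg', 'json')
       );

-- json_typeof reports the top-level type of a json value
SELECT json_typeof('[1, 2, 3]'::json);   -- array
```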
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
34:34 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

PGCon 2014: Lightning talks

Various speakers & subjects: Database Appliance based on PostgreSQL; OHAI, My name is Chelnik!; BDR - Bi-Directional Replication; What the XL?
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
40:19 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Scaling a Cloud Based Analytics Engine

Scaling a 100% Cloud Native Analytics Engine. Your mission, should you choose to accept it: create a data storage system that can handle 200 GB of data per day on cloud servers with heavy analytics. GO! The architecture plan of a real-time logging system built to handle 200 GB/day of data and hand it off from mid-term OLTP storage into an OLAP Postgres data warehouse. It was built with heavy reliance on inheritance, dblink, and streaming replication.
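The dblink handoff mentioned above might look roughly like this; a sketch in which the connection string, schema, and table names are all hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

-- pull aged rows from the OLTP side into the warehouse; dblink
-- results must be given an explicit column definition list
INSERT INTO warehouse.events_archive
SELECT * FROM dblink(
  'host=oltp dbname=logs',
  $$SELECT id, created, payload FROM events
    WHERE created < now() - interval '1 day'$$
) AS t(id bigint, created timestamptz, payload text);
```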
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
38:03 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

A Tale of Two Oracle Migrations

In this presentation I'll talk about two migrations from Oracle to PostgreSQL - the first back in 2001, which is how I got into PostgreSQL in the first place, and the second in 2013 involving the second largest stock exchange in Japan. One of the hotter PostgreSQL topics recently is the increasing trend towards enterprise-scale migrations from Oracle and other commercial databases to PostgreSQL - not only for cost savings but also for its rich and expanding feature set combined with a solid reputation for reliability. But it's hardly a new trend - that's how I first came across PostgreSQL back in 2001. I'll give a brief and hopefully humourous description of what I did back then (with some pictures of cute animals) and how it led to the Japanese stock market. Which is where I ended up in charge of the database underpinning the back office of Japan's second largest stock exchange, SBI Japannext. Since late last summer this has been powered by PostgreSQL and the changeover from Oracle has gone very smoothly. The main part of this presentation will be about how this migration was planned and implemented, and the foundations it has created for further adoption of PostgreSQL.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
25:11 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

PostgreSQL experience in Russian Parliament

In 2010 the Russian State Duma (the lower chamber of parliament) renewed its web site. That was the first step in a modernization towards open data and e-government in the following years. PostgreSQL took a central place in data integration between intranet systems and open web services. In recent years many new open data systems powered by PostgreSQL have been launched. This talk covers the issues of PostgreSQL integration into State Duma intranet systems, which were mostly running Oracle Database, as well as the implementation of many open data systems. Data between the intranet systems and the PostgreSQL databases is mostly synchronized using ora2pg: full synchronization is used for small databases, and incremental synchronization for large databases. Open data systems implemented on PostgreSQL include: law and draft-law search systems; an open vote system; a verbatim-records system. Using PostgreSQL allowed saving license costs as well as achieving great performance and features.
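For orientation, an ora2pg run is driven by a configuration file; a minimal sketch with hypothetical connection values (the directive names are ora2pg's own):

```ini
# ora2pg.conf (excerpt)
ORACLE_DSN   dbi:Oracle:host=ora.example;sid=SRCDB
ORACLE_USER  scott
ORACLE_PWD   tiger
SCHEMA       INTRANET
TYPE         TABLE,COPY   # export table DDL and data
```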
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
27:16 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

PgREST

PgREST turns PostgreSQL into a RESTful JSON document store, and allows running JavaScript and npm modules as stored procedures for queries and triggers from within the database. It also provides a Firebase-compatible API for developing real-time applications. PgREST is: a JSON document store running inside PostgreSQL; working with existing relational data; capable of loading Node.js modules; compatible with MongoLab's REST API and Firebase's real-time API. This talk will cover: the building blocks of PgREST (PostgreSQL, plv8js, plv8x); examples of turning existing relational data into REST endpoints; and building real-time applications with PgREST.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
29:48 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

An Adventure in Data Modeling

A case study on the trials of Emma's performance when implementing the Entity-Attribute-Value data model on their PostgreSQL database systems. Emma, Inc. is an email marketing company that provides a Web based application for managing email lists and marketing campaigns. This is a tale about Emma's experience with a database schema design refactor that is now using the Entity-Attribute-Value (EAV) data model on the members in email lists. We will briefly describe the previous data model and why we had to move to a new one. This is followed by all the places where Emma stumbled and recovered in implementing the EAV data model and rebuilding the application around it. Finally we will touch on where Emma might go next with respect to the data.
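The EAV model described above can be sketched in SQL; hypothetical names, with one generic attribute table instead of one column per field:

```sql
CREATE TABLE attribute (
  attribute_id serial PRIMARY KEY,
  name         text NOT NULL UNIQUE
);

CREATE TABLE member_attribute (
  member_id    bigint NOT NULL,
  attribute_id int    NOT NULL REFERENCES attribute,
  value        text,
  PRIMARY KEY (member_id, attribute_id)
);

-- every lookup becomes a join (and usually a pivot), which is where
-- the performance trials the talk recounts tend to come from
SELECT m.member_id, a.name, m.value
FROM member_attribute m
JOIN attribute a USING (attribute_id)
WHERE m.member_id = 42;
```

The tradeoff is flexibility (new attributes need no DDL) against lost typing, constraints, and straightforward query plans.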
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
52:24 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

PostGIS in the Open Cloud

Instant Mapping Applications with PostGIS and Nodejs Learn how to write PostgreSQL-backed applications that can be automatically cloned, built, monitored, and scaled in a matter of minutes. We'll walk you through all of the steps involved in developing and scaling your own Node.js-powered PostGIS mapping application on the OpenShift cloud hosting platform. By the end of this talk, you should be able to instantly deploy your own PostgreSQL-backed network applications to your own public or private cloud in a single command. This talk includes a brief architectural overview of OpenShift, a freely-available Open Source hosting platform. Folks who want to follow along should bring their own laptop with git, rubygems, and nodejs installed & available.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
56:23 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Implementing Parallelism in PostgreSQL

Where We Are Today, and What's On The Horizon PostgreSQL's architecture is based heavily on the idea that each connection is served by a single backend process, but CPU core counts are rising much faster than CPU speeds, and large data sets can't be efficiently processed serially. Adding parallelism to PostgreSQL requires significant architectural changes to many areas of the system, including background workers, shared memory, memory allocation, locking, GUC, transactions, snapshots, and more. In this talk, I'll give an overview of the changes made to background workers in PostgreSQL 9.4 and the new dynamic shared memory facility, which I believe will form the foundations of parallelism in PostgreSQL, and discuss some lessons I learned while implementing these features. I'll also discuss what I believe is needed next: easy allocation of dynamic shared memory, state sharing between multiple backends, lock manager improvements, and parallel algorithms; and highlight what I believe to be the key challenges in each area.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
46:15 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Explaining the Postgres Query Optimizer

The optimizer is the "brain" of the database, interpreting SQL queries and determining the fastest method of execution. This talk uses the EXPLAIN command to show how the optimizer interprets queries and determines optimal execution. Examples include scan methods, index selection, join types, and how ANALYZE statistics influence their selection. The talk will assist developers and administrators in understanding how Postgres optimally executes their queries and what steps they can take to understand and perhaps improve its behavior.
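The core of plan selection is cost arithmetic over statistics. The sketch below is a drastically simplified model using PostgreSQL's default cost parameters (seq_page_cost, random_page_cost, cpu_tuple_cost); the index-scan formula here is an invented crude stand-in, not the planner's real one, but it shows why selectivity flips the choice between scan methods.

```python
# Toy cost comparison, loosely modeled on the planner's cost units.
# Default PostgreSQL cost parameters:
SEQ_PAGE_COST = 1.0      # sequential page fetch
RANDOM_PAGE_COST = 4.0   # random page fetch
CPU_TUPLE_COST = 0.01    # per-tuple processing

def seq_scan_cost(pages, tuples):
    # A sequential scan reads every page and examines every tuple.
    return pages * SEQ_PAGE_COST + tuples * CPU_TUPLE_COST

def index_scan_cost(matching_tuples):
    # Crude assumption (NOT the real formula): one random page fetch
    # per matching tuple, plus per-tuple CPU cost.
    return matching_tuples * (RANDOM_PAGE_COST + CPU_TUPLE_COST)
```

For a 10,000-page, million-row table, fetching 100 matching rows is far cheaper via the index, while fetching half the table is cheaper as a sequential scan -- the same trade-off EXPLAIN lets you observe on real tables.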
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
48:16 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Schema-less data in PostgreSQL

An efficient indexing of nested structures We present a prototype of a new access method, heavily based on GIN and optimized for efficient indexing of nested structures like hstore and json(b). The introduction of the nested hstore and jsonb in PostgreSQL brought a new challenge to the developers, namely efficient indexing of hierarchical keys. Those keys consist of duplicated strings, which makes the index uselessly huge if key-value pairs are stored independently. We propose to replace the btree data structure, which GIN uses to index keys, with a digital tree. To do this the 'right way', we would like to experiment with a hybrid access method based on SP-GiST and GIN. This is a first step in making GIN more flexible to support a richer set of queries. In principle, one could use a data structure other than a btree to index not just the keys, but also the posting lists.
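The space argument for a digital tree can be shown with a toy trie over dotted key paths: shared prefixes such as 'address' are stored once instead of being duplicated in every full key string, which is the btree redundancy the abstract describes. This is an illustration only, not the actual GIN/SP-GiST code.

```python
# Toy digital tree (trie) over hierarchical key paths.
def insert(trie, path):
    """Insert a dotted key path, e.g. 'address.city', one segment per level."""
    node = trie
    for segment in path.split("."):
        node = node.setdefault(segment, {})

def count_nodes(trie):
    """Total stored segments; a shared prefix is counted only once."""
    return sum(1 + count_nodes(child) for child in trie.values())

trie = {}
for key in ["address.city", "address.zip", "address.geo.lat", "address.geo.lon"]:
    insert(trie, key)
# 'address' and 'address.geo' are shared: 6 trie nodes versus the
# 10 segments a flat per-key representation would duplicate.
```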
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
50:23 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

GIN in 9.4 and further

This talk presents a set of GIN advances in PostgreSQL 9.4 and beyond that bring GIN to a new level of performance and extensibility. The most important advances are posting list compression, the fast-scan algorithm, storing additional information, and index-based ranking. Posting list compression: indexes become roughly half their previous size without any work in the opclass; pg_upgrade is supported, and old indexes are recompressed on the fly. Fast-scan algorithm: fast scan allows GIN to skip parts of large posting trees during an index scan, which dramatically improves the performance of hstore and json search operators as well as the FTS "frequentterm & rareterm" case; to use this improvement, three-state logic support is required in the "consistent" opclass method. Storing additional (opclass-defined) information in posting lists: using this information for filtering enables new features for GIN opclasses: better phrase search, better array similarity search, inverse FTS search (searching for tsqueries matching a tsvector), inverse regex search (searching for regexes matching a string), and better string similarity using positioned n-grams. Index-based ranking: this improvement allows GIN to return results in an opclass-defined order. The most important application is returning results in relevance order for FTS, which dramatically reduces IO load, but there are others, such as returning arrays in similarity order. We present benchmark results for FTS using several datasets (6 M and 15 M documents) and a real-life load for the PostgreSQL and Sphinx full-text search engines, and demonstrate that the improved PostgreSQL FTS (with all its ACID overhead) outperforms the standalone Sphinx search engine.
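The compression idea -- storing a sorted posting list as small deltas, each encoded in as few bytes as needed -- can be sketched in a few lines. This is a conceptual illustration of delta-plus-varbyte encoding; real GIN compresses (block, offset) item pointers, and the exact on-disk encoding differs.

```python
# Sketch of delta + variable-byte encoding for a sorted posting list.
def compress(sorted_ids):
    out = bytearray()
    prev = 0
    for i in sorted_ids:
        delta = i - prev            # gaps between sorted ids are small...
        prev = i
        while True:
            byte = delta & 0x7F     # ...so most gaps fit in one 7-bit byte
            delta >>= 7
            if delta:
                out.append(byte | 0x80)  # continuation bit: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)

def decompress(data):
    ids, value, shift, prev = [], 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value           # undo the delta to recover the id
            ids.append(prev)
            value, shift = 0, 0
    return ids
```

Five ids that would take 20 bytes as fixed 32-bit integers compress to 8 bytes here, which is the kind of halving the abstract reports for real indexes.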
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
53:03 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Postgres on the wire

While it usually Just Works, sometimes it's useful to know exactly what happens between hitting return in psql and seeing results appear on the screen. This talk will explain how the PostgreSQL frontend/backend protocol works. We will look at the low-level blocks from which the protocol is built and try to give the audience a solid understanding of how bytes are transferred back and forth between their applications and the Postgres server. Even if you never deal with the PostgreSQL protocol directly, every application you're writing or maintaining uses it, so it's still useful to know the basics of how it works. Sometimes, knowing the fundamentals can also help you understand behaviour that would otherwise seem mysterious or quirky. The talk is aimed at users, DBAs and system administrators interested in learning a bit about how the sausage gets sent. We'll go through: protocol versions and how they differ; differences between the simple and extended protocol; text and binary query formats; authentication and encryption; asynchronous features: async queries and NOTICE; the COPY protocol; query cancellation and how it works; and future development of the protocol.
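To make the framing concrete: a v3 simple-query message is a one-byte type code ('Q'), a 4-byte big-endian length that counts itself and the payload but not the type byte, and a NUL-terminated query string. A minimal sketch of building one:

```python
# Build a PostgreSQL v3 simple-query ('Q') protocol message.
import struct

def simple_query_message(sql: str) -> bytes:
    payload = sql.encode("utf-8") + b"\x00"        # query text, NUL-terminated
    # Length field covers itself (4 bytes) plus the payload, not the 'Q'.
    return b"Q" + struct.pack("!I", 4 + len(payload)) + payload

msg = simple_query_message("SELECT 1;")
```

This is the message psql sends for a plain query; the extended protocol instead splits the exchange into Parse/Bind/Execute messages with the same type-byte-plus-length framing.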
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
41:00 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

The Worst Day of Your Life

Recovering from Crises and Disasters What do you do when the worst happens? It could be a catastrophic hardware (or even data center) failure, a badly placed rm -rf *, or a PostgreSQL bug. We'll discuss how to recover from disasters that are far outside the usual operating procedure... and how to avoid getting into them in the first place. Every DBA with real-life experience knows that sinking feeling when you realize that something terrible has happened: PostgreSQL crashes with a PANIC message, you realize you were on the production system when you dropped that table, or you get a status update that "us-east is currently experiencing problems." What do you do? There's no single solution to catastrophic problems, but we can talk about strategies that might help you keep a cool head while everything around you is losing theirs. We'll talk about things like: Dealing with PostgreSQL bugs. Catastrophic hardware failures. Application and operator error. And, of course, we'll discuss what you need to do in advance to make the Worst Day of Your Life a little bit less traumatic: Backup and recovery strategies and tradeoffs. Upgrade procedures. Planning for business continuity in major disasters.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
46:13 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Mobile Canvassing with PostgreSQL

At Moonshadow Mobile, we visualize geospatial data in a custom geospatial engine and analysis framework inside a browser. This is coupled with Ground Game, a mobile canvassing solution that lets managers and canvassers easily mount a campaign. At the heart of this solution we need a featureful database with a high level of data integrity to quickly store and retrieve customer data, all of which PostgreSQL provides. Our Ground Game mobile canvassing solution is available for iDevices, Android devices, and any device with a good mobile web browser. Campaign managers assign walking lists and questionnaires to canvassers in the field from our web application, who receive the data on their mobile devices and fill out questionnaires while going door-to-door. Data can be collected even if no connection is available, and is synced as soon as a wifi or mobile data connection becomes available. On a typical day hundreds of canvassers around the country upload thousands of questionnaires. Our base data consists of the voter files that are collected from states and counties, augmented with psychographic information from commercial sources. At any time voter files, phone numbers, addresses, and emails are updated. Clients augment our data with their own information about donations and membership, along with mobile canvassing. All these data streams need to be processed without causing any downtime for the campaign managers and the canvassers in the field. PostgreSQL plays a very important role in our technology stack. All canvassing data, walking lists, canvassing users, assignments, and questionnaires are stored in PostgreSQL. We also use PostGIS for the creation of the data and the assignment of voters to district zones. In this talk, I will give an overview of our architecture, and focus on how PostgreSQL has helped us deliver our product.
The reasons include custom features of PostgreSQL such as bytea, json, and custom types; addon modules such as crypto and ltree; custom functions in languages such as plperlu; and WAL replication and hot backups -- all of which give us security, peace of mind, and the ability to develop features more rapidly. Finally, we look to the future and eagerly await integrating new and additional PostgreSQL features into our workflow.
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
54:15 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

PostgreSQL, Rust, and Range Types

Combining database features with language features Range Types, introduced in PostgreSQL 9.2, are an important feature for scheduling and other applications. Range Types allow much more efficient queries on ranges (e.g. ranges of time) such as "overlaps" and "contains", are less error-prone, and offer the critical "non-overlapping" constraint (useful for preventing schedule conflicts). But to make use of any advanced database features, good support in the client language is crucial. Many people use libraries to translate between database types and types in the client language -- not only is that convenient, but it contributes to safety and reduces the chance of simple errors. Rust is a powerful new language with a lot to offer, including a postgres driver that supports Range Types as first-class types. I'll be discussing how Rust libraries help bridge the gap between advanced database features and practical application development. The speed of development in PostgreSQL can often outpace the ecosystem surrounding it. Performance-boosting features are used as quickly as administrators upgrade, but extensions to the SQL language take more time. Without support in the client language, SQL language features are only used by early adopters who are willing to put up with the rough edges. As the primary developer of Range Types, I feel that their utility would be greatly increased by greater accessibility for simple applications. The problem they solve -- largely the problem of scheduling -- is prevalent in such applications. Rust makes a great example of how to better integrate data types into the client language. It's got Option types, which are great for handling edge cases and special values (like an empty or unbounded range); it's fast enough that the driver can be written in Rust, which avoids a dependency on libpq-dev and enables a little more creativity; and it's got a great community.
In particular, I'd like to credit Steven Fackler, who wrote the native rust-postgres driver as well as the Range Types support! This presentation does not assume any prior knowledge of the Rust language.
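The range semantics the talk builds on -- half-open ranges supporting "contains" and "overlaps", with a missing value standing in for an unbounded end, much as Rust's Option models an absent bound -- can be sketched briefly. This is a conceptual illustration, not the rust-postgres API or PostgreSQL's implementation.

```python
# Toy half-open [lo, hi) range; None represents an unbounded end.
class Range:
    def __init__(self, lo=None, hi=None):
        self.lo, self.hi = lo, hi

    def contains(self, point):
        above_lo = self.lo is None or point >= self.lo
        below_hi = self.hi is None or point < self.hi
        return above_lo and below_hi

    def overlaps(self, other):
        # Two ranges are disjoint only if one ends at or before
        # the other starts; unbounded ends never cut an overlap off.
        if self.hi is not None and other.lo is not None and self.hi <= other.lo:
            return False
        if other.hi is not None and self.lo is not None and other.hi <= self.lo:
            return False
        return True
```

With half-open bounds, back-to-back meetings [9, 12) and [12, 13) do not overlap, which is exactly why the non-overlapping constraint works cleanly for scheduling.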
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English
39:42 PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross English 2014

Multicorn: writing FDWs in Python

A tutorial showing off Multicorn's features Multicorn is a generic Foreign Data Wrapper whose goal is to simplify the development of FDWs by writing them in Python. We will see: what an FDW is; what Multicorn is trying to solve; how to use it, with a brief tour of the FDWs shipping with Multicorn; how to write your own FDW in Python, including the new 9.3 write API; and the internals: what Multicorn is doing for you behind the scenes, and what it doesn't. After a presentation of FDWs in general, and of what the Multicorn extension really is, we will take a look at some of the FDWs bundled with Multicorn. Then a complete tour of the Multicorn API will teach you how to write an FDW in Python, covering the following features: using the table definition; WHERE clause push-down; output column restrictions; influencing the planner; writing to a foreign table; and transaction management. This will be a hands-on explanation, with code snippets allowing you to build your own FDW in Python from scratch.
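The basic shape of a Multicorn FDW is a class whose constructor receives the foreign table's options and column definitions, and whose execute() method receives pushed-down quals and the requested output columns and yields one dict per row. The sketch below keeps that shape but omits the multicorn package so it runs standalone; the class name and its column logic are invented for illustration.

```python
# Shape of a Multicorn FDW. In a real extension this would be:
#   from multicorn import ForeignDataWrapper
#   class ConstantFDW(ForeignDataWrapper): ...
# and the class would be referenced from CREATE SERVER / CREATE FOREIGN TABLE.
class ConstantFDW:
    def __init__(self, options, columns):
        self.columns = columns                  # columns from the table definition
        self.nrows = int(options.get("nrows", 3))

    def execute(self, quals, columns):
        # 'columns' lists only the output columns the query actually needs,
        # so an FDW can skip fetching the rest; 'quals' carry WHERE clauses
        # that can be applied remotely (rows are re-checked by PostgreSQL).
        for i in range(self.nrows):
            yield {col: "%s-%d" % (col, i) for col in columns}
```

The output-column restriction mentioned in the abstract is visible here: a `SELECT a FROM constant_table` would call execute() with columns=["a"] only.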
  • Published: 2014
  • Publisher: PGCon - PostgreSQL Conference for Users and Developers, Andrea Ross
  • Language: English