We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

How I used pgvector and PostgreSQL® to find pictures of me at a party

Formal Metadata

Title
How I used pgvector and PostgreSQL® to find pictures of me at a party
Title of Series
Number of Parts
131
Author
Contributors
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
Nowadays, if you attend an event you're bound to end up with a catalogue of photographs to look at. Formal events are likely to have a professional photographer, and modern smartphones mean that it's easy to make a photographic record of just about any gathering. It can be fun to look through the pictures, to find yourself or your friends and family, but it can also be tedious. At our company get-together earlier in the year, the photographers did indeed take a lot of pictures. Afterwards the best of them were put up on our internal network - and like many people, I combed through them looking for those in which I appeared (yes, for vanity, but also with some amusement). In this talk, I'll explain how to automate finding the photographs I'm in (or at least, mostly so). I'll walk through Python code that extracts faces using OpenCV, calculates vector embeddings using imgbeddings and OpenAI, and stores them in PostgreSQL® using pgvector. Given all of that, I can then make an SQL query to find which pictures I'm in. Python is a good fit for data pipelines like this, as it has good bindings to machine learning packages, and excellent support for talking to PostgreSQL. You may be wondering why that sequence ends with PostgreSQL (and SQL) rather than something more machine learning specific. I'll talk about that as well, and in particular about how PostgreSQL allows us to cope when the amount of data gets too large to be handled locally, and how useful it is to be able to relate the similarity calculations to other columns in the database - in our case, perhaps including the image metadata.