
How I used pgvector and PostgreSQL® to find pictures of me at a party


License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported

Abstract
Nowadays, if you attend an event you're bound to end up with a catalogue of photographs to look at. Formal events are likely to have a professional photographer, and modern smartphones mean that it's easy to make a photographic record of just about any gathering. It can be fun to look through the pictures, to find yourself or your friends and family, but it can also be tedious. At our company get-together earlier in the year, the photographers did indeed take a lot of pictures. Afterwards the best of them were put up on our internal network - and like many people, I combed through them looking for those in which I appeared (yes, for vanity, but also with some amusement). In this talk, I'll explain how to automate finding the photographs I'm in (or at least, mostly so). I'll walk through Python code that extracts faces using OpenCV, calculates vector embeddings using imgbeddings and OpenAI, and stores them in PostgreSQL® using pgvector. Given all of that, I can then make an SQL query to find which pictures I'm in. Python is a good fit for data pipelines like this, as it has good bindings to machine learning packages, and excellent support for talking to PostgreSQL. You may be wondering why that sequence ends with PostgreSQL (and SQL) rather than something more machine learning specific. I'll talk about that as well, and in particular about how PostgreSQL allows us to cope when the amount of data gets too large to be handled locally, and how useful it is to be able to relate the similarity calculations to other columns in the database - in our case, perhaps including the image metadata.
Transcript: English (auto-generated)
My name's Tibs; I work for a company called Aiven. Our motto is "your trusted data and AI platform", and that is one of my only two adverts for the company. I love our company, and I will enthuse about it at you if you don't stop me after the session. So: we have a company get-together every year, and we have photographers take tons of photographs of us. This is a melange of photographs from last year that happened to have me in them, and of course I sat there for hours going through the photos looking for the ones with my face. Yeah, that's a lot of time.
So what are we going to talk about? Well, I'm going to give you a bit of vague background (I've kind of started that already). I will not explain machine learning at you, because that's a big topic. I'll talk about how we find pictures of me, the subject of the talk, and then I'll talk about why we're using Postgres to do this, and not some other tool.
So: I'm a recovering AI skeptic. I've lived through more than one cycle of AI hype and doom, particularly the AI boom in the 80s and the subsequent AI winter of the 90s, when AI became a dirty word. Every time this happens, however, we get benefit out of it: we got expert systems, knowledge-based systems, and so on; it's just the name "AI" that gets avoided.
We're obviously in another of these cycles, and we'll see how it goes. When we started doing AI stuff at work, a couple of years ago, I was deeply skeptical. However, colleagues began to show me simple examples where I could see an immediate benefit: quick prototypes of boring code, or rewriting a paragraph in a different way (the tone of voice is very formal; I want it to be more informal). That sort of thing is really hard to do yourself, and if you get ChatGPT to rephrase something, it won't get it quite right, but it'll get you over that bump. But what really first caught my interest in this whole thing was that a colleague called Francesco Tisiot did a tutorial on finding photographs of, as it happened, me, from the photographs at our company get-together. I looked at that and thought, first of all, wow, that's really cool, and that's nice simple code; and secondly, I could make a talk out of this. So he gave me permission, and this is the result.
So I'm not going to explain machine learning, because that's kind of like a university course or something like that; it's a lot of study, and also I don't necessarily understand it myself. This is my breakthrough talk; I gave it for the first time last year, and it's about how I learnt just enough about machine learning to do some simple things. Similarly, I'm not going to go into much depth on vector embeddings, but we need some terminology. We talk about vector embeddings, vectors, embeddings; it all sounds very jargony. A vector is just an array of numbers that represents a direction and a size, or a distance. And I looked up "embedding": in this context, it means representing something in a computer. So we're basically just saying that this is an array of numbers, which we can think of as representing a direction and a size, and which we store in a computer. Now, I'll probably talk about vectors and embeddings interchangeably, because we're all careless about this, and I apologize on behalf of our entire industry. So: we're all computer programmers, and we know we can describe characteristics of things with numbers.
For instance, we know about describing colours with R, G and B. So here's a 3D graph; it's got the origin there, at nought, nought, nought, and in this case I'm choosing to interpret (5, 8, 3) as being some colour. I haven't looked up which colour; I honestly don't know. So we can see the red thing here: it's got a direction, and it's got a size, in 3D space.
And because we know about vectors from mathematics, we can do stuff: we can do calculations with them. This is not something we've invented; it's a very long-standing thing. In particular, we can compare their lengths, we can compare their directions, and we can compare two different vectors and ask what the vector between them is. That's as far as I'm going to go with the maths, because this is all stuff I learnt at school, and that was a long time ago.
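To make those operations concrete, here's a minimal sketch (mine, not the talk's; it assumes numpy, which the talk doesn't use):

```python
# Comparing two vectors: length, direction, and the vector between them.
import numpy as np

a = np.array([5.0, 8.0, 3.0])
b = np.array([4.0, 9.0, 2.0])

length_a = np.linalg.norm(a)          # the size of a
between = b - a                       # the vector between them
distance = np.linalg.norm(between)    # straight-line (Euclidean) distance
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # direction similarity

print(length_a, between, distance, cosine)
```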
So we need to calculate these vectors, these embeddings. Back in the very early days of natural language processing, people did this by hand.
Nowadays we can luckily throw computers at the problem and have them do it automatically. If we look at early natural language processing, we might calculate the meanings of words by hand. So we would say that a king has an importance of, maybe, nought point nine, on a scale of nought to one. They have a gender of one, because male is one, and we're in the old-fashioned days when gender is binary. They're pretty important: not as important as God, who is above them at one, but pretty important. And there are other characteristics we might decide on. Then we'd look at the word queen and say, well, they're also really important; their gender is nought, the opposite of one; and, oh, sorry, I forgot the typical age there: they're not quite as old as the king; and a variety of other things. A princess would be nought point eight, not quite as important; their gender is still nought; their typical age is a bit younger; and so on. There are problems with this: it doesn't scale well, because words have many meanings, and you need to have those meanings standardized between all the words. And you'll already see that we've got bias built in. I'm assuming that a king is important, though less important than perhaps God. I'm assuming that a queen is a different gender than a king, and that those genders are binary. I'm assuming the ages of these people.
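Written as code, that hand-made scheme might look something like this (an illustrative sketch; the feature values are invented, just as they were on the slide):

```python
# Hand-crafted word embeddings: each position has a human-chosen meaning,
# on a nought-to-one scale. All values invented for illustration.
hand_made = {
    #           importance, gender, typical_age   (... other characteristics)
    "king":     [0.9, 1.0, 0.7],
    "queen":    [0.9, 0.0, 0.6],
    "princess": [0.8, 0.0, 0.3],
}

def distance(a, b):
    """Straight-line distance between two hand-made embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Under this scheme, queen is closer to princess than to king.
print(distance(hand_made["queen"], hand_made["king"]))       # ~1.00
print(distance(hand_made["queen"], hand_made["princess"]))   # ~0.32
```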
There are other assumptions in those dot-dot-dots that I haven't categorized. So bias is always going to be built in, because we're human beings and we have assumptions. Nowadays, with machine learning, large language models and so on, we can train a machine-learning system to "recognize" (those are air quotes) that a thing belongs to particular categories.
And that thing can be bigger than just words. We can do whole texts, we can do pictures, we can do videos. This is wonderful, but of course, remember, we trained the models, so they have had a bias put into them, but now the models are calculating all those numbers,
and we don't know what those numbers mean. In the previous hand-made scheme, we could say: this number is about being important, this one is about being boy or girl. Now we don't know that, so the biases are harder to spot. We also have this terrible tendency to talk about "training" a machine, or about the thing "recognizing" stuff; we anthropomorphize really heavily.
I do this myself; I can't stop. So we begin to think about these things as having intelligence. Well, we shouldn't really impute that; it's not there, and that's another kind of bias we're assuming: that it's actually given some thought to this. But let's get to finding the pictures of me, because that's obviously the important part of the talk. This is a lovely photograph, taken by a company photographer a couple of years ago, before I had blue hair. I apologize for the smirk; I think it's cute. I've used it as my Slack photo for a long time. So our aim is to be able to say to Postgres: I would like to select the file names from my pictures table, ordered by how close each stored vector embedding is to this other vector embedding, and give me the first 10. In other words, I want to find faces that are implicitly like mine, by comparing vector embeddings.
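The query I'm after has this shape (a sketch: the table and column names are the ones set up later in the talk, and the 768-number vector literal is elided):

```sql
-- '[...]' stands for a full 768-number vector literal like '[0.38, 0.12, ...]'
SELECT filename
FROM pictures
ORDER BY embedding <-> '[...]'
LIMIT 10;
```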
So we've got two stages to this. The first stage is that we need to find all the faces in our corpus of pictures and store them somewhere.
Here on the left we have a picture of myself and two colleagues; this is the picture that was used in the tutorial. We can run a program to find the faces in it: here it's found two faces. We can turn those into arrays of numbers, vector embeddings. (The ones shown are not real; they're made-up numbers, because I couldn't be bothered.)
And then we can store them somewhere. We might as well use Postgres, because a database is a good idea. We could put them in a text file, but that really doesn't scale; I could keep them in variables, which scales even less. A database is the obvious solution, so choose your favourite database. The process isn't perfect. It may not find all the faces, and then I wouldn't necessarily find all the photos with me in them. It may also find things that aren't faces, like these two things here. But since those should hopefully not look like my face (although those of you who saw Mart's lightning talk know about people who look like hands), it's probably good enough. We get arrays of 768 floating-point numbers from the model I'm using. That number is particular to the model: if we're using a model that produces 768 floating-point numbers, we must always work with vectors of that same size. We can't compare vector embeddings from different sources, because they'll have different weights, they'll have been produced for different reasons, and they'll typically have different lengths. 768 sounds like a lot, but it's actually relatively small, so we're dealing with simple data here. Assuming we've done all that and got all the faces into our database, we need to look for photos with my face in them.
So we would do the same thing: we take a picture of my face, we look for the faces in it (we hopefully only find one), we calculate the embedding, and then we put that into our SELECT statement and hopefully find all the answers. Now, being a programmer, I wrote code to do this, because the tutorial was just doing things at the command line or in a Jupyter notebook; I wanted programs. For the requirements, I was using click to handle the command line. To be honest, nowadays I'd probably go for Larry Hastings' Appeal, because that's more fun. psycopg2 is a beautiful tool for talking to Postgres. (I'd be very interested if anyone out there is using psycopg3; it's been out for a while, but it doesn't seem to have much market penetration, and I'd like to hear about that.) opencv-python is a wrapper around the OpenCV computer vision library, which makes it easy to do vision-related things. And imgbeddings is a lovely library that wraps up calculating embeddings from an image.
And this is where I look at my crib sheet: imgbeddings wraps OpenAI's CLIP model using Hugging Face transformers. It's known to have issues, and it's not for production use, but it's very convenient and easy to use. We also need to download a pretrained model to recognize faces, which you can get from the OpenCV repository.
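Minimal usage of imgbeddings looks something like this (a sketch based on the library's documented interface; the file name is made up):

```python
# Calculate a 768-dimensional embedding for one image.
from PIL import Image
from imgbeddings import imgbeddings

image = Image.open("my_face.png")          # hypothetical file
ibed = imgbeddings()                       # downloads the CLIP model on first use
embedding = ibed.to_embeddings(image)[0]   # numpy array of 768 floats
print(len(embedding))                      # 768
```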
That pretrained model, by the way, is an XML file. Now, by default Postgres doesn't necessarily have pgvector installed. The pgvector extension teaches Postgres how to handle vector embeddings, extending its existing array functionality. Aiven for PostgreSQL, from the company I work for, does have pgvector installed: an engineer added it last year, almost as soon as pgvector came out. One afternoon, just for the fun of it, he installed pgvector and told the company, and we all went, ooh, we're going to play with that. So he was very popular. If you haven't got it installed in your Postgres, the pgvector README on GitHub
explains how to do it. Once you've got it installed, you need to enable it, which is just CREATE EXTENSION vector. Then we need to create a database table. I've got a very simple table called pictures here: the key is a string representing the name of the face, then there's the file name I found the face in, and then the embedding is the vector for that face. So it's a very simple table. Looking at our first problem, then: we're finding the faces, calculating the embeddings, and storing them in Postgres. I have a program unimaginatively called find_faces_store_embeddings; the table it assumes is sketched below.
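The setup might look something like this (a sketch following the description above; the exact DDL in the talk's repository may differ):

```sql
-- Enable pgvector, then create a table with one row per detected face.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE pictures (
    face_key  text PRIMARY KEY,   -- unique name for this face
    filename  text NOT NULL,      -- the photo the face was found in
    embedding vector(768)         -- imgbeddings output for the cropped face
);
```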
Naming is not my thing. It'll take a Postgres URI as an argument, or actually it will use the PG_SERVICE_URI environment variable, which means I don't have to keep typing the URI at the command line. It would also mean that if I were going to demonstrate this live, which I'm not, I wouldn't have to keep showing you my URI on the screen, and my security people wouldn't grumble at me.
The main is relatively simple. We've imported the cv2 (OpenCV) package. We take in a tuple of image files and the URI for Postgres. We call load_algorithm, which I'll talk about in a moment, to load the Haar cascade algorithm, and we create our imgbeddings binding. Then, for each image file (and please excuse me: we are making a single connection to Postgres for every single file; do not do this at home, it is not production-ready code, it's just really quick and easy, and you should do proper connection management and batch things up, and there are other tools to do this), I connect to Postgres, read the image, find the faces, and write them to Postgres. As for load_algorithm: it takes the XML file and reads it in using the cascade classifier. I check that that worked, because, I'm embarrassed to say, when I got the file name wrong it fell over in an obscure way. And it returns the classifier.
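A sketch of that outer structure (my reconstruction, not the talk's exact code; find_faces and write_to_postgres are sketched further down):

```python
# find_faces_store_embeddings: outline of the outer program structure.
# One Postgres connection per file, as apologized for above -- not for production.
import os
import sys

import cv2
import psycopg2
from imgbeddings import imgbeddings

HAAR_XML = "haarcascade_frontalface_default.xml"  # from the OpenCV repository

def load_algorithm(xml_file=HAAR_XML):
    """Load the pretrained Haar cascade face detector, checking that it worked."""
    classifier = cv2.CascadeClassifier(xml_file)
    if classifier.empty():  # a wrong file name otherwise fails obscurely later
        sys.exit(f"Could not load cascade from {xml_file}")
    return classifier

def main(image_files):
    pg_uri = os.environ["PG_SERVICE_URI"]
    classifier = load_algorithm()
    ibed = imgbeddings()
    for image_file in image_files:
        with psycopg2.connect(pg_uri) as conn:  # one connection per file!
            image = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
            faces = find_faces(classifier, image)
            write_to_postgres(conn, image_file, image, faces, ibed)
```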
imread is a convenient function in the OpenCV library; I'm reading the image in as grayscale (you ask for that explicitly; reading in colour is actually the default). find_faces then just calls the Haar cascade's detectMultiScale function: you give it the image, and then you give it some magic parameters. These you are expected to play with to get results that are good for you. The original tutorial had a minimum size of 100 by 100; I found 250 by 250 worked better for me. I do not necessarily understand what these parameters do: they're the things you play with and see where it goes. And then, once we've found the faces, we pass them into our write_to_postgres function.
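A sketch of find_faces (the scaleFactor and minNeighbors values here are typical examples, not necessarily the talk's; the minimum size is the one mentioned above):

```python
def find_faces(classifier, grayscale_image):
    """Return a list of (x, y, width, height) rectangles, one per detected face."""
    return classifier.detectMultiScale(
        grayscale_image,
        scaleFactor=1.05,     # how much the search window scales between passes
        minNeighbors=5,       # how many overlapping hits confirm a face
        minSize=(250, 250),   # the talk's value; the tutorial used (100, 100)
    )
```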
So we're passing in an array of faces. I grab the stem from the file name, and then, for the x, y location and the width and height of each face found, I crop the face out of the image; that's the thing I want to find the embedding for. So I calculate the embedding for the cropped image of that individual face, then calculate a nice key made from the base file name and the location of the face, so that it's unique for each face, and then I write all of that to Postgres. And write_to_postgres does the traditional SQL you'd expect: it inserts those values into the pictures table. The ON CONFLICT is nice because this is play code, experimental code. What ON CONFLICT says is: if the face key already exists, just overwrite the old values with the new ones; EXCLUDED is the pseudo-table holding the values I've just passed in. That's really useful, because it means I can run my program more than once without having to drop the table.
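Sketched out, continuing the reconstruction above (the key format and the exact SQL are my assumptions based on the description):

```python
from pathlib import Path

from PIL import Image  # imgbeddings wants PIL images; OpenCV gives numpy arrays

def write_to_postgres(conn, image_file, image, faces, ibed):
    """Crop each face, embed it, and upsert it into the pictures table."""
    stem = Path(image_file).stem
    with conn.cursor() as cursor:
        for (x, y, w, h) in faces:
            cropped = image[y:y + h, x:x + w]  # numpy slice of the face region
            embedding = ibed.to_embeddings(Image.fromarray(cropped))[0]
            face_key = f"{stem}-{x}-{y}"       # unique per face in this photo
            vector_str = "[" + ",".join(str(n) for n in embedding) + "]"
            cursor.execute(
                """INSERT INTO pictures (face_key, filename, embedding)
                   VALUES (%s, %s, %s)
                   ON CONFLICT (face_key) DO UPDATE
                       SET filename = EXCLUDED.filename,
                           embedding = EXCLUDED.embedding""",
                (face_key, image_file, vector_str),
            )
    conn.commit()
```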
So program two is the thing that finds me, the important bit of this talk. It's called find_nearby_faces, because of course it is. The arguments are very similar, but it also takes an integer that says how many matches I want back. Typically people ask for five matches for this sort of thing; I haven't quite got that message, so I tend to ask for 10, which is way too many, but it worked reasonably well. The main is a bit similar: we load our algorithm, we set up imgbeddings, and we calculate our reference embedding, that is, the embedding for my face (I'll talk about the vector string we put in, in a moment). And then we ask Postgres for the results. The calculate_reference_embedding function is familiar.
We read the file in and use the same find_faces to find the faces in that file. We desperately hope there's only one, because it would be a mistake if there were nought or three. We get out the cropped image, calculate the embedding for it, and return that, in much the same way as we did before. Now, in the real code I didn't actually have a comment saying "we hope there's only one": I checked, because although I'm writing play code, I couldn't resist having actual error messages. The SQL statement that we're using wants the embedding represented as an array of floating-point numbers in square brackets, separated by commas.
Apologies for that; this is just the Python to convert the embedding to that form using an f-string. Then find_nearby_faces (and this is a place where it's reasonable to have one connection for each query, I think) just executes the SQL we had at the beginning: we select the file name from the pictures table, ordered by the distance between each stored embedding and our reference embedding, with a limit of n. That funny triangle-bracket thing in the ORDER BY I'll explain in a moment; and then we print out the results, which is to say the file names, basically.
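A sketch of that (a reconstruction; the vector-to-string conversion matches the format just described):

```python
import psycopg2

def find_nearby_faces(pg_uri, embedding, n=10):
    """Print the file names of the n stored faces closest to `embedding`."""
    vector_str = "[" + ",".join(str(x) for x in embedding) + "]"
    with psycopg2.connect(pg_uri) as conn:
        with conn.cursor() as cursor:
            cursor.execute(
                "SELECT filename FROM pictures"
                " ORDER BY embedding <-> %s LIMIT %s",
                (vector_str, n),
            )
            for (filename,) in cursor.fetchall():
                print(filename)
```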
So that SQL operator, the left angle bracket, hyphen, right angle bracket (<->), is one of the available comparison operators, and that one finds the straight Euclidean distance between the two vectors. pgvector provides four different distance operators; I'm not going to go through them. The <+> one was added just recently, in 0.7.0, so they're still extending the set. So how good is it? Well, here's our Wednesday at crab week (our logo is a cuddly crab, so of course it's crab week). There were 779 photos from the Wednesday, and it found 5,006 faces; I don't know how many of them were real. When I went through them manually, I found 25 with me in them. Some were in a crowd or a bit obscured. Three were of my back; I'm not expecting it to recognize those. And two were with a false moustache.
That was a thing; a silly and wonderful thing. So, when I ran the program: the first time, last year, it took 21 minutes; when I ran it last week, it took 11 minutes. That's to calculate and store the embeddings. It's not fast, but this is just for play; I don't care. Finding the ten nearest faces only took three seconds, which for play code felt reasonably fast. Of the first ten matches, the first nine of them were me, and, amusingly, the tenth was someone else with beard and glasses. Remember I said these classifiers were biased? I don't know what the biases are; I suspect it's going by texture.
It's a beard. The first match was me in oratorical mode. Another one is me in a group; I'm not very good at obscuring faces, so I've done big white blobs. On the Thursday we had another 574 photos and 3,500 faces. I only found seven that had my face visible, and in four of them I had dark glasses (these glasses go dark in the sunshine). This time only three of the first ten matches were me. That was actually quite sensible, I think, but the others were, in general, people with beard and glasses; I'm spotting a pattern here. The very first one I was quite pleased by, because it's me looking downwards and from sideways, so I would not necessarily have expected it to recognize that. And this is the picture from the tutorial; I was so pleased and relieved that it found the reference picture. So, was this a success? Yes. I learnt a lot.
I didn't spend a lot of time on it, and I got not-awful results; I'm British, and "not awful" means "very good". And I know what I would carry on to do. I'd definitely improve the programs if I were going to play more. I would add switches to let me change those face-detecting parameters on the fly, and I'd store the results from different parameters in different tables so I could use them separately. I'd add a switch to make the reference-face calculation a one-off, because I don't need to keep recalculating my face, and for finding the faces, I'd add switches to make it use all of those things. And I'd also sort out that nasty business of using Postgres inefficiently, because that really hurts my soul. Now, in many ways this is actually the most important part of the talk, I think: why Postgres? And why is Postgres surprising? Well, we expect Python to be a good fit for exploring machine learning, but we don't have a good reason for that except history.
Python lucked into being the language that people use. It's partly because it's easy to embed things in it, partly because it's very easy to use for people who don't care about programming; a lot of it is just luck. It could have been Ruby; it could have been all sorts of other things. So why Postgres? Postgres is not a dedicated vector database, and naively you might think you'd want a database designed to do just this one thing.
designed to do just this thing. Well, depending on the metaphor you like, Postgres is either your favorite Swiss Army knife or your favorite hammer. It depends what mood I'm in, whether I want to hit things or unscrew things. Remember, I'm British. It's significantly better than nothing.
That's faint praise for it's pretty useful. There comes a point when you need to store your embeddings. I mean, the first few you calculate, you get these long strings of numbers out, you put them somewhere, store them in a, you copy them. But 768 numbers, a database is a good place to put things. And Postgres is generally a good database to start with if you have no preference.
We probably have it running already, and if you don't, it's really easy to run locally. And also, I happen to work for a company that has it running in the cloud, and you can try it for free. It can SQL all the things: you don't necessarily only want to do vector queries. I want to find the things that are like this (that's a vector query) but are in stock. I want to find the pictures of me that were taken in Portugal between these dates: the pictures-of-me part is vector search, but the other two conditions are straight SQL, and so on. Hybrid queries like that are quite important.
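A hybrid query might look something like this (a sketch: the taken_at column and the dates are invented for illustration, and the vector literal is elided as before):

```sql
-- Vector similarity combined with ordinary SQL conditions in one query.
SELECT filename
FROM pictures
WHERE taken_at BETWEEN '2024-07-08' AND '2024-07-12'  -- hypothetical metadata column
ORDER BY embedding <-> '[...]'                        -- reference face embedding
LIMIT 10;
```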
You can use the standard Postgres optimization techniques you've already learned (EXPLAIN ANALYZE works on these queries), because the whole thing is just SQL; everything you've learned about optimizing your use of Postgres will continue to work. Indexing is essential for anything large-scale,
and pgvector comes with two different types of index at the moment; I suspect more will get added. I don't have time to go into them here: the extended notes have the material about indexing that I cut from the talk (it was 10 minutes; it was too long). But there are two alternatives, they have pros and cons, they're both quite good, and they're both better than nothing.
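(For reference, the two index types pgvector currently provides are IVFFlat and HNSW; a minimal sketch of creating one on the table from earlier:)

```sql
-- HNSW index on the embedding column, for Euclidean-distance queries.
CREATE INDEX ON pictures USING hnsw (embedding vector_l2_ops);

-- The IVFFlat alternative (build it after the table has data in it):
-- CREATE INDEX ON pictures USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```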
And as Python programmers, we should recognize this recurring pattern: we start in Python, and hopefully we continue with it, and if it turns out not to be suitable, we move to something else once we know why it's not suitable. Postgres, I think, is a similar case. It's easy to start in Postgres, and you can probably do the thing you want; when you realize you can't, move to something else that is more suitable. You have to remember to be prepared to make that move with both of them, so never assume you're at the final solution. When should you not use Postgres? Well, clearly, when it can't cope, and when it doesn't actually do what you want. So: there are some limitations in pgvector.
We can't have more than 16,000 dimensions in a vector. That sounds like a lot, but machine learning gets big. There's also the sparsevec type, added in 0.7, which allows up to 16,000 non-zero elements; that's subtly different. And vectors can be too big to index: the indexes only go up to 2,000 dimensions at the moment. There are techniques we all know about for indexes, sparse indexing and so on, to get around that, but again, that's more complexity. Now, a Postgres table is apparently limited to 32 terabytes, and you can have a partitioned table made up of thousands of those, so I don't know that you would ever hit that limit,
but it's mentioned in the pgvector frequently asked questions, so someone is worried about it. And obviously Postgres is a general-purpose database, so it might not be fast enough; but decide that after profiling, just in case it wasn't the vector-embedding bit that was slow, but your SQL queries that weren't great. Another reason is when you need a distance function that's missing. We've got four now; one was added recently with 0.7, and I'm sure more will come, so keep checking back if you haven't got the one you need yet. I do not give advice on distance functions; I've got a colleague who can talk about them sensibly.
There's a lot of literature out there about how to choose which distance function you want for which purpose; I don't understand most of it. And sometimes you actually don't want a traditional relational database. I've got Mark sitting in the front, who works for MongoDB, so he would definitely argue that case: there are other useful things. We at Aiven also have OpenSearch, which is a document store with built-in indexing, very powerful, and it has vector search. ClickHouse, who are sponsors of this conference (you should talk to them), have a columnar database, which is a very powerful approach for analytics and so on. And Dragonfly, a key-value store using the Redis protocol, has vector search built in as well. So you can get your vector search in other places, and there are many more. And of course there are currently lots of dedicated vector databases out there, betting that a vector database is a new kind of thing worth having separately. We will see: that has worked for other kinds of database in the past. Database technology is an interesting historical game.
The future, however, looks bright. Last year, Jonathan Katz, who's a Postgres expert, wrote an article on how vectors are the new JSON in Postgres. I remember when JSON support was added to Postgres: it was slow and clunky. Then they added JSONB, which gives you binary storage for JSON, and everything got a lot faster; nowadays you just use JSON in Postgres and no one worries about it. I also remember when large blob support was added: I want to store a big chunk of data with no structure to it. That was new, and for a while it was strange and risky. Now TOAST, The Oversized-Attribute Storage Technique (I love that name), is just standard. This year Jonathan posted a follow-up article, and for one type of indexing he saw a 150-times speedup over last year. Now, he's very careful to say this is a benchmark of one particular thing, and that's the most extreme number, but that's pretty impressive in a year. So this is going to get faster in Postgres.
It's going to get more support; it's going to get bigger support. And that's it. Postgres, the word, the Slonik elephant logo and so on are lovely trademarks of the PostgreSQL Community Association. That QR code will lead you to a free trial of our services, if you want to play with them in the cloud. It gives you a free trial of all of the services, but there's also free use of Postgres, MySQL, or Redis (and, in due course, Valkey, because of the trademark issue with Redis); those are the free plans. And the slides and accompanying material are on GitHub if you want to look at them later.
My notes are longer and have a lot of what I said, plus some bits I didn't say. Thank you very much. [Host] Thank you very much for that. Do we have any questions? Feel free to stand in front of the microphone. Fantastic. [Audience member] Hi. Last week I got married, and I actually had the same idea as you had,
but I didn't want to find only my pictures, because there were a lot of guests; I wanted to create a way for each guest to have access to their own pictures separately. So there are some things that I did differently from you. Basically, the first one was not using a Haar cascade because, to be honest, it wasn't good for my purposes. I used MTCNN, which is a neural network that extracts the faces, and then another neural network to create the embeddings. I stored those in a vector database, and then I used DBSCAN to create clusters of faces, so that each person can see their own face. [Tibs] Oh, nice. Yeah.
So this is the "where do you go next" kind of lesson, as you learn more about this. This was very much my "I do not understand machine learning, what is the first baby step I can take?", and you've got a lot more sophisticated. I was also, of course, frustrated to find that, since I'm on Macs, the Photos app on Macs will do this for me, I believe. But it was still more fun to do it myself. [Host] Okay, we're actually running a little bit short of time, so I think we'll just take one last question; I'm sure you'll be happy to answer questions in the corridor afterwards. [Audience member] Okay. Thank you for your talk. I did a similar experiment with Django and psycopg3. [Host] Sorry, can you lean closer to the mic? [Audience member] Yes. Thank you.
I did a similar experiment with Django and psycopg3 that worked very well, and I gave a talk about it at another conference, and I was very fascinated by your application for finding images. What I was wondering was: have you thought about having a large number of pictures from a great range of years, so that your face from maybe 30 years ago and your face now can be different, for recognition purposes? [Tibs] I have not.
My suspicion is that, well, certainly the very naive recognizer we're using would probably not work very well. Since people can write software that will age you or de-age you, you could presumably take that into account and do a sort of stochastic sampling, and look for things that match any of those samples. That would then give interesting mismatches, of someone who is around now but looks like you from 10 years ago, and that might be fun in itself. So, yeah, I don't know. I should also mention I have stickers, cuddly crab stickers, if you want them; come to me afterwards for those. Thank you. [Host] Okay, thank you.
Can we have a big round of applause for Tibs?