Ingesting 35 million hotel images with Python in the cloud
Formal Metadata

Title: Ingesting 35 million hotel images with Python in the cloud
Part Number: 138
Number of Parts: 169
License: CC Attribution - NonCommercial - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this license.
Identifiers: 10.5446/21086 (DOI)
Transcript: English (auto-generated)
00:01
All right, so my name is Alex Vinales, software engineer at Skyscanner. I work in the Hotels Data Team, the team that delivers all the data that populates the website and the apps. And I'm here to present the pipeline that we built to make sure that all the hotels have
00:21
images in production. So one of the biggest challenges when you're a metasearch engine is to actually unify all the data. This is what the results page for a hotel search looks like. And here you can see all the providers for the W Barcelona hotel.
00:40
Here you see the three cheapest ones, but in reality there are many more of them. So we are getting the data for that hotel like 20 times. And our job is to make sure that it reaches production once, and it has the best possible data. This is what it looks like. Every partner that you integrate with gives you
01:01
access to the catalogs. Those catalogs give you all the data for the venues that the provider has. And they give you usually slightly different names, slightly different street addresses, and different coordinates. So our job is to match those venues. So do some magic and make sure that they reach SkyScanner
01:23
once and with the best data. If we do that for all the hotels in the catalogs, we end up with the data release, and this is the Skyscanner catalog. So what about the images? In this case here you can see a search result,
01:41
and there is no image there. So you as a customer are not going to pick that hotel. So images are a critical piece of the product. So let's see how we make sure that they reach production. The case is pretty similar. Every partner gives you all the images that they have for that hotel.
02:02
And they usually give you really similar images. You're going to see cropped images and slightly shifted ones with different colors. And our job again is to make sure that we pick the best one and put it in production. Here you see the images in the results page. And if you click here,
02:20
you're going to get an expanded gallery. This gallery shows you the images with more detail. And if you look closely, you're going to see here down below duplicated images. So this is what we are trying to avoid. And as you can see here, they are quite similar images, but slightly moved or shifted or cropped.
02:40
So yeah, it's really hard to remove them all. So with around 200 partners and around one million hotels reaching production, this means that we are going to have to process around 35 million images. But there's a trick here,
03:01
and it's that we resize. We resize for mobile devices. We don't want the mobile phone to be resizing all the images. There you see the thumbnails. They are smaller, so all the work is done on our side. We have around 40 different configurations for resizing. So this is going to multiply the number of images
03:20
that we end up processing. So I'm here to tell you the tale of an image processing pipeline. But before doing that, let me tell you about the tech stack. We are happy users of Postgres, managed on RDS. We've glued together all the steps of the pipeline with the SQS queuing service.
03:43
And obviously for machines, we just get EC2 instances. We try to auto-scale them, so if there's nothing to do, there are no machines up. As far as libraries are concerned, we use Django. We use it with Django REST framework — incredible for making APIs.
04:01
We avoid using the Django ORM; we use SQLAlchemy instead. And for messaging, we use Kombu. You may already be using this library, because it's the underlying library under Celery. It's maintained by the same guys, and it's simply lower level.
04:23
When you do Amazon stuff, you're going to use Boto. And for image processing, we use Pillow, which is a nice library with C bindings, so it gives you nice performance for manipulating the images. And then we use Python 2.7, for technical reasons.
04:42
So, the tale of an image processing pipeline. We are going to start with the triggering. And there are like two big groups here: the asynchronous steps and the synchronous steps. The asynchronous ones are going to be running continuously, making sure that all the images are ready to be used by the synchronous steps. So, triggering.
05:02
The trigger basically is a small worker that keeps running through all the catalogs. It looks for URLs there and computes the diff. The state is stored in the database, and basically we say: all right, this catalog has been updated, and there are URLs that disappeared — those images should be deleted — and there are new images or updated images.
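The diff the trigger computes can be sketched as plain set arithmetic over URLs. This is a minimal illustration, not the actual Skyscanner code; all names are made up.

```python
def diff_catalog(stored_urls, current_urls):
    """Compare the URLs stored for a hotel against the latest catalog scan.

    Returns which images should be deleted (they disappeared from the
    catalog) and which are new and still need to be ingested.
    """
    stored, current = set(stored_urls), set(current_urls)
    return {
        'to_delete': sorted(stored - current),
        'to_ingest': sorted(current - stored),
    }

payload = diff_catalog(['a.jpg', 'b.jpg'], ['b.jpg', 'c.jpg'])
```

A payload like this, per hotel and per partner, is what gets sent to the image release API.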
05:21
So this payload is computed for every hotel of every partner, and this is sent to an API, the image release API. So the API, when it receives the payload, it stores that image into the database, and we keep things like the URL, we give an identifier to that image. We know which provider gave us that image
05:41
and for which hotel of that provider that image is. And now the API is going to move forward and queue the messages to the next step, which is downloading. The downloader step basically gets the messages from queue, hits the partner CDN, gets the image, and puts it into our system, into an S3 bucket,
06:00
so that if the CDN fails or the partner removes the image we still have it, and we can roll back at any time. So this is what the callback looks like, what the worker looks like. Basically, we get the image URL, hit the CDN, get the contents, register a new key into S3, set the contents of that key,
06:22
and after this is done and it's in our system, then we open the image with PIL, and we ask some pretty basic questions like: should I filter that image? Is it big enough? Does it have enough resolution? If the image survives all of that, then it will go to the fingerprinter, because we are going to push this message to the next queue.
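The sanity questions asked after the bytes are safely in the bucket might look like this sketch. The thresholds are assumptions, not the talk's real values, and the S3 upload is left out.

```python
import io

from PIL import Image

# Assumed minimum-resolution thresholds; the real values are a product decision.
MIN_WIDTH, MIN_HEIGHT = 320, 240

def should_keep(data):
    """Can Pillow open these bytes at all, and is the resolution big enough?"""
    try:
        img = Image.open(io.BytesIO(data))
        img.verify()  # cheap integrity check without a full decode
    except Exception:
        return False
    img = Image.open(io.BytesIO(data))  # reopen: verify() invalidates the object
    w, h = img.size
    return w >= MIN_WIDTH and h >= MIN_HEIGHT
```

Only images for which this returns True would be queued for the fingerprinter.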
06:40
So notice here this reliable callback decorator. This is basically something we do on other workers too. We don't want to die on unexpected errors, so the main thing this does is make sure that the workers don't die. Here we do one extra thing: you see that this is converting a warning into an error.
07:03
Pillow actually protects you from decompression bombs, and we actually got a few of them. You don't want a worker to die for a decompression bomb; you want it to keep going and move on to the next message. So here we catch everything,
07:20
log it into Logstash or whatever you're using, then we run the function, and that's pretty much it. So this is what a worker looks like. It's a consumer that is going to connect to the backend, and it's going to map all the messages in that queue to the callback that we just saw. Here you see the Kombu primitives being used.
07:43
We just create a connection to the backend, and then we specify: start consuming all the messages or events on that connection, and map the messages to the callback. The callback is going to get the message and the body of that message. We process that message, and if everything goes fine,
08:00
we just acknowledge the message and move on. So after the image is downloaded, we go to the fingerprinting step. Really simple one. We just download the image from S3 and compute some identifiers that are going to allow us to do further processing. The question that we will try to answer further
08:22
is going to be if these images are the same or not. For a computer, those images are not the same, not really, because they have different bytes, different sizes, I mean, it's not the same image, but if you were a user and you saw that in the website, you would be disappointed, because obviously it's not telling you anything new. It's redundancy, and that's what we want to avoid.
08:43
So yes, they are the same image, and this is what the fingerprinter does. It computes some kind of hash. We use the ImageHash library. It implements different hashing algorithms — I believe average hashing, perceptual hashing,
09:01
difference hashing. We ended up using the difference hashing one, with a slight modification that does a cropped-hash thing. What that does is create subimages of the image: it crops the image several times, and then we compute a subhash for each subimage.
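A difference hash works roughly as follows — this is a minimal reimplementation with Pillow only, to show the idea; the pipeline uses the ImageHash library's implementation, and the centre-crop here is just a rough stand-in for the cropped-hash variant the talk mentions.

```python
from PIL import Image

def dhash(img, hash_size=8):
    """Difference hash: shrink to (hash_size+1) x hash_size greyscale and
    record whether each pixel is darker than its right-hand neighbour."""
    small = img.convert('L').resize((hash_size + 1, hash_size), Image.LANCZOS)
    px = small.load()
    bits = 0
    for y in range(hash_size):
        for x in range(hash_size):
            bits = (bits << 1) | int(px[x, y] < px[x + 1, y])
    return bits

def cropped_hashes(img, hash_size=8):
    """Hash the full image plus a centre crop, to improve the odds of
    matching cropped duplicates (illustrative crop geometry)."""
    w, h = img.size
    crop = img.crop((w // 8, h // 8, w - w // 8, h - h // 8))
    return [dhash(img, hash_size), dhash(crop, hash_size)]
```

Because the hash encodes local brightness gradients rather than exact bytes, two versions of the same photo land on hashes only a few bits apart.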
09:21
The reason we do that is because we want to maximize the chances of matching duplicated images. This kind of hash is such that similar images have a small distance between their hashes. So, time to deduplicate. At this point, the data release is going to run,
09:41
and it's going to say: okay, I have one million hotels that are going to reach production, and now you need to make sure that those hotels have images. So this goes to the API, and the API starts processing that and queues message payloads, and the deduplicator is going to update the status in the database. And then, if it's needed,
10:01
it's going to move to the next step, which is prioritizing the images. We're going to see that now. So what is a group of hotels? A group of hotels, for us, is basically just pairs of IDs. We identify a hotel as a provider plus the ID that hotel has for the provider. So if we see here three pairs of IDs,
10:20
it's a group of hotels provided by three partners. So that's what we call a group. And when I say "if needed" — this yellow line is the queue of messages in the deduplication step. You see it rises up to about one million, and then it starts descending. This means that the workers are processing this payload,
10:41
and the blue line is actually the next step's queue, and you see not all of the messages move forward to the next step. So we try to be differential. Across releases, not all the data is going to change, so you can reuse the images that you already computed in the previous release. That's what we're trying to show here.
11:04
So the deduplicator is going to grab a group with all the providers, and it's going to fetch all the images that we have for them. And ideally what we want to do here is identify that there are two image groups,
11:20
the image group of the room and the image group of the pool. That's all we want to do. So the conclusion is: we tell the database that this group has those two image groups. And this is what it looks like. We have a set of images to be processed, and we have no groups. And then we start moving. We seed the group with the first image to be processed.
11:42
We try to expand the group, comparing it against the other pending images to be processed. And we always ask the same question: is that the same picture? And the arguments that we pass here are the hashes that we computed back in the fingerprinting step. So what is this same_picture doing?
12:00
It's basically taking all the hashes that we computed and comparing them with the Hamming distance. So if that distance is below a threshold, we're going to consider the images to be actually the same, or quite similar. So how do we tune that step? How do we get the guarantee that tomorrow we don't break everything,
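The seed-and-expand grouping with a Hamming-distance cutoff can be sketched in a few lines. The threshold value is an assumption (the real one is tuned against the corpus), and hashes are represented as plain integers here.

```python
def hamming(h1, h2):
    """Number of differing bits between two integer hashes."""
    return bin(h1 ^ h2).count('1')

def same_picture(h1, h2, threshold=6):
    # threshold is an assumed value; in practice it is tuned on the corpus
    return hamming(h1, h2) <= threshold

def group_images(hashes, threshold=6):
    """Greedy grouping: seed a group with the first pending image, pull in
    every pending image whose hash is within the threshold, repeat."""
    pending = list(hashes.items())
    groups = []
    while pending:
        seed_id, seed_hash = pending.pop(0)
        group = [seed_id]
        still_pending = []
        for image_id, image_hash in pending:
            if same_picture(seed_hash, image_hash, threshold):
                group.append(image_id)
            else:
                still_pending.append((image_id, image_hash))
        pending = still_pending
        groups.append(group)
    return groups
```

Each resulting group is what the deduplicator would record in the database as "these images are the same picture."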
12:21
and in production we have tons of duplicated images? So what we do is you build the corpus. So basically you grab a big sample of images, and you will manually over them and set up the groups manually. So you set the truth. You say those images should be a group as a human.
12:42
And then what you're able to do is run the automatic algorithm and get some metrics. With those metrics, you can further tune the code and see whether the metrics improve or not. So you keep making improvements until you are happy. And that's how you get guarantees
13:00
that you don't break stuff. And then we go to the prioritization step, which is choosing the best images. Now we know which groups of images we have, but within each of them we need to pick the best one. So prioritization is quite simple — just update the status in the database. And what it does, it says: okay, I have two image groups.
13:21
I'm going to get all the images again, and I'm going to pick the best one from each group. So it says: okay, this is the best image, and this is the best image. We base that decision on pixels — on resolution, colors, the histogram of colors. And once we know that, we're just going to sort them: if we have something to base that decision on, we're going to prioritize them accordingly.
13:43
Sometimes partners tell you this image should be the first one. So if we have this data, we're going to use it. And then that reaches production, or that will reach production. So what could go wrong when you pick the best image? This could go wrong.
14:02
So obviously picking the best image is not an easy task. It's really hard. And yes, this image had tons of colors and tons of pixels, but obviously it's not the best one. So we have sort of an MVP to extract features from images, and obviously if we find a word that says,
14:22
oh, there's a toilet here, we will put that image at the back. In the meanwhile — because doing that is complex — we have tools, so you can go there manually, sort those images and fix them by hand. It's not that it happens all the time; it's really specific cases, so that's why it doesn't have that much priority.
14:44
And now that you've prioritized the images and know which ones are going to reach production, it's time to spend time resizing them into all the sizes. So this worker is going to get the payload, make all the sizes, and put them into a bucket in S3. This special folder is going to be served
15:02
through a CDN, so you get reduced latency on the website and the apps. And that's pretty much it. This is what the worker does. When you have an image with Pillow and you want to resize it, it's quite easy. You can just call resize, and you're done. If we made the image smaller, we may want to improve the contrast.
15:22
You can use the ImageEnhance primitive, and you can change the contrast as you want. And that's it for the pipeline. This is the final result: all the images have reached production, and you can see how they are progressing through each step.
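The resize-plus-contrast step the talk describes is indeed a couple of Pillow calls — a minimal sketch, with an assumed contrast factor:

```python
from PIL import Image, ImageEnhance

def make_size(img, size, contrast_boost=1.1):
    """Resize for one of the ~40 thumbnail configurations and, since
    downscaling can flatten an image a bit, nudge the contrast back up.
    The 1.1 factor is an illustrative default, not the talk's value."""
    resized = img.resize(size, Image.LANCZOS)
    return ImageEnhance.Contrast(resized).enhance(contrast_boost)
```

Each generated size would then be written to the S3 folder that CloudFront serves.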
15:41
And this is the schematic of how it looks on Amazon. Basically we have a distributor machine, which is just deployment support. We keep all the dependencies there. We try to get the wheels in there, which the other workers are going to use to retrieve dependencies. This saves time when getting heavy packages,
16:01
because with wheels, installing them is really easy. And then we have the image release API. Basically it's used for interfacing with the data release. And we also have a health check there that checks the queues, and if there's tons of work to be done, it triggers a CloudWatch alarm that is going to spin up more instances.
16:22
And that's pretty much it. Then we have the autoscaling group of workers, and everything is connected to the database, which is the central piece. As far as scaling the database, with Amazon it's quite easy: you just provision more IOPS if you need more throughput. So we're happy with that. And then we just serve the final bucket
16:41
with CloudFront on top. And that's pretty much it. So thanks for listening. Do you have any questions?
17:10
Can you repeat? Are you triggering a process to scan the database?
17:25
It hasn't been a problem that much. We have maybe 80% usage, and we are not using that big of an instance. Specific images?
17:41
Yeah, yeah — images related to toilets, and discarding them? Yeah, so we have kind of an MVP, but it needs more work. But basically you get basic words for what has been detected in that image, like "I see a bed" or "I see a toilet", and that would be fair enough to get rid of those issues.
18:00
But it needs more work, yep. There are also times where you have a hotel room photo that shows all the features of the room but doesn't have great resolution.
18:24
No, no. No, usually we try to trust the partners. We know that Leonardo is a good provider of images, so if we have Leonardo, we're going to prioritize Leonardo. If we have the sequence of the images — say a partner is telling us, hey, this image needs to be on top —
18:41
We trust that. And as a last resort, we just check resolution and the histogram of colors. So if we have only blue color, we are going to ignore that image, yep.
19:01
It's quite good. So we have here the metrics for the corpus. The most important metrics here are completeness and pureness. Completeness — basically we try not to put in a wrong image. So if we have a group of beds, we don't want to put the toilet there.
19:21
So this reduces completeness, because we filter more groups of images — we are at about 91% there. And of all the groups that we get, 97% are pure. This means that 97% of the groups have all the same image, like all beds or all bathrooms. So yeah, it could be better, of course, but.
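The two corpus metrics might be computed along these lines — an illustrative sketch under the assumption that pureness is the fraction of produced groups whose members share one hand-assigned label, and completeness is the fraction of corpus images that were assigned to some group.

```python
def pureness(groups, truth):
    """Fraction of produced groups whose members all share one human label.
    `truth` maps image id -> hand-assigned group label from the corpus."""
    pure = sum(1 for g in groups if len({truth[i] for i in g}) == 1)
    return pure / len(groups)

def completeness(groups, corpus):
    """Fraction of corpus images that made it into some group."""
    grouped = {i for g in groups for i in g}
    return len(grouped & set(corpus)) / len(corpus)
```

Tuning then becomes a loop: adjust the threshold, rerun the grouping over the corpus, and check that both numbers move in the right direction.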
19:50
Yes, yes, like the MVP for, you mean for recognizing features.
20:09
Yeah, it's pretty easy to answer. We just have the workers, which are completely independent of Django. We had the workers there, and we knew that we were not going to couple them with the Django API.
20:23
So why implement the models twice? We just chose to do it in SQLAlchemy and not pass the Django settings to the worker. Just a technical decision.
20:42
The question is if — well, you get the browsable API and everything. We're quite happy using it. It's not that the API — it's maybe the least important of the components here.
21:01
The workers are much heavier. I guess that's it.