
Building Scalable Multimodal Search Applications with Python

Formal Metadata

Title
Building Scalable Multimodal Search Applications with Python
Series Title
Number of Parts
131
Author
Hasan, Zain
Contributors
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt, and copy the work or content, and to distribute and make it publicly accessible in unchanged or adapted form, for any legal and non-commercial purpose, provided that you credit the author/rights holder in the manner they specify and pass on the work or content, including in adapted form, only under the terms of this license
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
Many real-world problems are inherently multimodal, from the communicative modalities humans use, such as spoken language and gestures, to the force, sensory, and visual sensors used in robotics. For machine learning models to address these problems, interact more naturally and holistically with the world around them, and ultimately become more general and powerful reasoning engines, we need them to understand data across all of its corresponding image, video, text, audio, and tactile representations. In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models, in conjunction with large generative multimodal models that can see, hear, read, and feel data(!), to perform cross-modal search (searching audio with images, videos with text, etc.) and multimodal retrieval-augmented generation (MM-RAG) at the billion-object scale with the help of open-source vector databases. He will also demonstrate, with live code demos, how performing this cross-modal retrieval in real time enables users to apply LLMs that reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production.
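
As a rough illustration of the cross-modal search the abstract describes: a multimodal embedding model maps images and text into one shared vector space, so a text query can rank images by vector similarity. The sketch below is an assumption-laden stand-in, not code from the talk: it uses the open-source sentence-transformers CLIP model "clip-ViT-B-32", hypothetical image file paths, and a brute-force in-memory cosine-similarity loop where a production system at billion-object scale would use a vector database with approximate nearest-neighbor indexing.

# Minimal sketch of text -> image cross-modal search (not from the talk).
# Model name and file paths are illustrative assumptions.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP-style model that embeds images and text into the same vector space.
model = SentenceTransformer("clip-ViT-B-32")

# "Index" a handful of images (hypothetical paths); a vector database
# would store these embeddings instead of keeping them in memory.
image_paths = ["dog.jpg", "beach.jpg", "skyline.jpg"]
image_embeddings = model.encode([Image.open(p) for p in image_paths])

# Embed a free-text query into the same space and rank the images by
# cosine similarity; this is the essence of cross-modal retrieval.
query_embedding = model.encode(["a dog catching a frisbee"])
scores = util.cos_sim(query_embedding, image_embeddings)[0].tolist()

for path, score in sorted(zip(image_paths, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {path}")

The same pattern generalizes to the other modality pairs mentioned above (audio-to-image, text-to-video), given an embedding model trained to place those modalities in a shared space.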