
Similarity Detection in Online Integrity

Formal Metadata

Title
Similarity Detection in Online Integrity
Subtitle
Fighting abusive content with algorithms
Title of Series
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
How Meta manages to take offline millions of pictures, videos, and texts that violate its community standards, many of them adversarially engineered, out of a catalog that counts in the trillions. We'll talk about open source technologies that embrace vector search, the state of the art in neural and non-neural embeddings, as well as turnkey solutions.

Content moderation is a problem that affects every service that hosts user-uploaded media. From avatars to personal photo collections, the platform bears the responsibility of removing violating content. The problem can be tackled with classifiers, with human moderators, and by comparing media signatures; this presentation is about the latter.

Similarity Detection is an approach that tries to detect media based on an archive of "definitions" (yes, like antivirus software) of things that have already been classified as violating. But how do we measure similarity between images from the perspective of a machine (not to mention video/audio clips of different lengths)? The answer is not MD5... We'll talk about how we do it, which technologies you can use too, and how we can leverage a public, crowdsourced archive of signatures to defeat threats ranging from terrorism to misinformation to child exploitation.
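To make the "not MD5" point concrete, here is a minimal sketch of perceptual hashing, the family of techniques such media "signatures" belong to. It uses the third-party Python packages Pillow and ImageHash rather than the talk's exact tooling (Meta's own open-source image hasher for this job is PDQ, but the idea is the same); the file names and distance threshold are illustrative assumptions.

    # A minimal perceptual-hashing sketch, assuming the third-party
    # Pillow and ImageHash packages (pip install Pillow ImageHash).
    # Unlike MD5, similar inputs yield nearby hashes, so matching
    # becomes a distance check instead of an exact comparison.
    from PIL import Image
    import imagehash

    # File names are hypothetical placeholders.
    known_bad = imagehash.phash(Image.open("flagged_original.jpg"))
    upload = imagehash.phash(Image.open("new_upload.jpg"))

    # Subtracting two ImageHash objects gives the Hamming distance:
    # 0 means identical; small values survive re-encoding, resizing,
    # and light edits; large values mean unrelated images.
    distance = known_bad - upload
    if distance <= 8:  # the threshold is an illustrative assumption
        print(f"Probable match, Hamming distance {distance}")

Note that a cryptographic hash like MD5 would flip completely under a one-pixel edit, which is exactly why it fails against adversarially engineered re-uploads.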
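At the catalog scale the abstract mentions, comparing every upload against every signature pairwise is off the table; the usual answer is a vector index. Below is a minimal sketch using FAISS, an open-source vector-search library from Meta of the kind the abstract alludes to. The dimensionality, catalog size, and random vectors standing in for real neural embeddings are all illustrative assumptions.

    # A minimal vector-search sketch with FAISS (pip install faiss-cpu).
    import numpy as np
    import faiss

    dim = 128  # embedding size, illustrative
    rng = np.random.default_rng(0)

    # Random vectors stand in for embeddings of already-classified media.
    catalog = rng.random((10_000, dim), dtype=np.float32)
    index = faiss.IndexFlatL2(dim)  # exact search; IVF/HNSW variants scale further
    index.add(catalog)

    # Embed a new upload (again faked here) and fetch its nearest neighbours.
    query = rng.random((1, dim), dtype=np.float32)
    distances, ids = index.search(query, 5)
    print(ids[0], distances[0])  # candidate matches for downstream review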