Don’t forget to sketch! Running with large datasets

Cite

Confreaks, LLC

Marcus, Adam

Formal Metadata

Title

Don’t forget to sketch! Running with large datasets

Title of Series

Bangbangcon (!!CON 2016)

Number of Parts

Author

Marcus, Adam

License

CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Identifiers

10.5446/38548 (DOI)

Publisher

Confreaks, LLC

Release Date

2016

Language

English

Content Metadata

Subject Area

Computer Science

Genre

Conference/Talk

Abstract

Large datasets got you down? Have no fear! Make them small! Sketches are probabilistic data structures: they store a rough outline of a dataset in way less space than the dataset itself takes up. We'll sketch out three sketches to determine if an item is missing from your dataset (Bloom Filters!), count how many of an item are in your dataset (Count-min Sketches!), and count how many distinct items are in your dataset (HyperLogLogs!). In the spirit of the sketch, this talk will be hand-drawn (!!!) and leave some details to the imagination!