CRYPTO AND PRIVACY VILLAGE - Catarineu and Modi - Anonymous rate limiting with Direct Anonymous Attestation

Video thumbnail (Frame 0) Video thumbnail (Frame 1883) Video thumbnail (Frame 4012) Video thumbnail (Frame 5382) Video thumbnail (Frame 7012) Video thumbnail (Frame 9821) Video thumbnail (Frame 12150) Video thumbnail (Frame 13654) Video thumbnail (Frame 16446) Video thumbnail (Frame 17960) Video thumbnail (Frame 19645) Video thumbnail (Frame 20953) Video thumbnail (Frame 22559) Video thumbnail (Frame 24207) Video thumbnail (Frame 26007) Video thumbnail (Frame 27377) Video thumbnail (Frame 29013) Video thumbnail (Frame 31618) Video thumbnail (Frame 32191) Video thumbnail (Frame 33807) Video thumbnail (Frame 35693) Video thumbnail (Frame 38714) Video thumbnail (Frame 40030) Video thumbnail (Frame 40918)
Video in TIB AV-Portal: CRYPTO AND PRIVACY VILLAGE - Catarineu and Modi - Anonymous rate limiting with Direct Anonymous Attestation

Formal Metadata

CRYPTO AND PRIVACY VILLAGE - Catarineu and Modi - Anonymous rate limiting with Direct Anonymous Attestation
Alternative Title
Anonymous Rate Limiting in Services
Title of Series
CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
Data stream Inheritance (object-oriented programming) Search engine (computing) Direction (geometry) Web browser Clique problem Mortality rate Information privacy Event horizon Physical system
Building Service (economics) Identifiability Parameter (computer programming) Information privacy Event horizon Power (physics) Front and back ends Twitter Timestamp Facebook Query language Computer worm Message passing Exception handling Data type Multiplication Uniqueness quantification Web page Analytic set Bit Data stream Uniform resource locator Search engine (computing) Personal digital assistant Right angle HTTP cookie Table (information)
Point (geometry) Trail Identifiability Service (economics) Virtual machine Set (mathematics) Streaming media Login Event horizon Front and back ends Power (physics) Timestamp Goodness of fit Query language Computer worm Selectivity (electronic) Message passing Proxy server Fingerprint Physical system Data type Content (media) Landing page Data stream Message passing Software Personal digital assistant Query language Search engine (computing) Order (biology) Right angle Routing Resultant
Point (geometry) Implementation Context awareness Service (economics) Multiplication sign Source code Primitive (album) Coma Berenices Rule of inference Mortality rate Product (business) Revision control Medical imaging Frequency Personal digital assistant Proxy server Physical system Task (computing) Standard deviation Multiplication Content (media) Sampling (statistics) Computer network Mortality rate Limit (category theory) Cryptography Cartesian coordinate system Frame problem Landing page Individualsoftware Electronic signature Type theory Message passing Voting Query language Search engine (computing) Internet service provider Convex hull Right angle Clique problem Communications protocol Task (computing) Asynchronous Transfer Mode
Standard deviation Game controller Group action Link (knot theory) Authentication Workstation <Musikinstrument> Client (computing) Information privacy Coprocessor Mortality rate Element (mathematics) Sign (mathematics) Latent heat Different (Kate Ryan album) String (computer science) Computer hardware Message passing Proxy server Computing platform Physical system Module (mathematics) Authentication Standard deviation Interactive television Computer network Bit Mortality rate Electronic signature Category of being Message passing Word Software String (computer science) Communications protocol
Slide rule Server (computing) Group action Functional (mathematics) Algorithm Link (knot theory) Link (knot theory) Validity (statistics) Algorithm Pseudonymization Equals sign Client (computing) Public-key cryptography Electronic signature Local Group Sign (mathematics) Category of being Message passing Sign (mathematics) String (computer science) Operator (mathematics) String (computer science) Message passing
Rule of inference Server (computing) Functional (mathematics) Pseudonymization Multiplication sign Connectivity (graph theory) 1 (number) Content (media) Counting Drop (liquid) Client (computing) Client (computing) Mortality rate Rule of inference Formal language Type theory Frequency Message passing Arithmetic mean Different (Kate Ryan album) Data structure Table (information) Message passing Physical system
Server (computing) Content (media) Client (computing) Content (media) Timestamp Rule of inference Independence (probability theory) Type theory Message passing Uniform resource locator Strategy game Query language Personal digital assistant Computer configuration Different (Kate Ryan album) Query language Uniqueness quantification Normal (geometry) Data structure Message passing
Laptop Service (economics) Mehrplatzsystem Limit (category theory) Web browser Client (computing) Number Software Integrated development environment Implementation Extension (kinesiology) Identity management Physical system Client (computing) Zeitdilatation Mortality rate Limit (category theory) Web browser Message passing Integrated development environment Software Personal digital assistant Limit set Musical ensemble Communications protocol Physical system Identity management Extension (kinesiology)
Standard deviation Email Server (computing) Pay television Link (knot theory) Multiplication sign Limit (category theory) Client (computing) Mereology Mortality rate Lattice (order) Rotation Military operation Operator (mathematics) Information Message passing Physical system Standard deviation Email Theory of relativity Information Key (cryptography) Cellular automaton Interactive television Minimal surface Mortality rate Limit (category theory) Public-key cryptography Measurement Category of being Message passing Personal digital assistant Key (cryptography) Communications protocol Physical system
Building Group action Server (computing) Service (economics) Trail Mehrplatzsystem Bit Client (computing) Mereology Complete metric space Public-key cryptography Electronic signature Power (physics) Local Group Message passing Personal digital assistant Formal verification Order (biology) Different (Kate Ryan album) Key (cryptography) Physical system
Axiom of choice Complex (psychology) Keyboard shortcut Mountain pass Multiplication sign Archaeological field survey Client (computing) Mereology Information privacy Subset Cryptography Website Library (computing) Physical system Point cloud Proof theory Keyboard shortcut Token ring Bit Website Information security Reading (process) Point (geometry) Server (computing) Implementation Service (economics) Token ring Similarity (geometry) Web browser Rule of inference Product (business) Frequency Voting Latent heat CAPTCHA Matching (graph theory) Server (computing) Archaeological field survey Content (media) Client (computing) Limit (category theory) Cryptography Web browser Frame problem Information privacy Elliptic curve Word Personal digital assistant Query language Revision control CAPTCHA Communications protocol Library (computing)
Server (computing) Implementation Open source Workstation <Musikinstrument> Client (computing) Mereology Electronic signature 2 (number) Product (business) Befehlsprozessor Operator (mathematics) Formal verification Core dump Process (computing) Message passing Physical system Overhead (computing) Server (computing) Client (computing) Core dump Message passing Befehlsprozessor Process (computing) Repository (publishing) Infinite conjugacy class property Formal verification
up now we have Alex and Konark with anonymous rate-limiting with direct anonymous attestation thank you and welcome everybody to our talk my name is Connor and this is Alex and we both work for a company called cliques based out of Munich Germany for those of you who have not heard about cliques we are basically a privacy friendly search engine which can be accessed through our browser and through our Firefox add-on and apart from being a search engine we also package the browser and an add-on among various with various privacy features like anti tracking ad blocking and we are also the parent company for Co Street today we want to talk about the system that we built and developed that we developed and use at clicks which basically helps us prevent attacks on our data collection system although the examples that we will take here will be based on based on the events that we receive in our data streams but we believe the system is generating enough that wherever rate-limiting applies the system can be used before Alex will talk about the system in detail I would just like to explain why at first place we needed to build an anonymous rate limiting system and how it has evolved since 2016 now it's important to
understand that data is important to build services and clicks is no exception to power our search engine and D tracking anti-phishing data is of the most importance and yes it might sound a bit controversial talking about data collection at a previously village but
it's important to understand that when you talk about data and privacy how that data is being collected is of the utmost importance if you look at this table here this is what any most of the data streams in the industry would look like most of the data collection happens with an identifier that is sent back home with the events it the identifier could be a long-term identifier an explicit identifier like cookie which you see on the extreme right or it could be based on it could be an implicit identifier based on multiple parameters like IP and the user agent the whole the whole notion of sending back identifiers is so that use cases like analytics could be solved on the back end so based on this data stream we can easily solve the question how many unique visitors visited in a given hour right but also because of the presence of the identifier you can link multiple events on the backend which means not only can you solve the use case of finding out unique visits on Facebook you can also see that the same user also visited Twitter dashboard and maybe because of some capability URLs you can potentially DN analyze the user itself right so at so
when we started to design our data collection streams we wanted to ensure that any data point that can link multiple messages on the back end needs to be dropped on the client-side itself what that means is in our data streams there is no way you can link multiple messages together that could be implicitly or explicitly and in order for us to even prevent ourselves from linking messages based on network fingerprinting we also want to route the data through a set of proxies so that means the data collection streams are clean and cannot be linked with multiple events that said we still can solve the use cases that the that is needed by that is needed to collect identifiers for but for this purpose for this talk we will not dig deeper into how we do it if you want more details we can talk offline or you can read this paper and our approach how we do data collection anonymously and yet solve the use cases
coming back to the motivation of doing this talk now the problem is all the data collection systems suffer from the same problem that attacker can spam you with multiple messages which are fake which are not actually happening on the user's machine right and we are no exception but because you know so if you take let's take an example here so for example one of the most important data points for us is query logs what that means is what is the query that user did and what result that the user select from our search results right and this query log powers how we rank our search engine and how we promote the content because that detects what's the popularity of a query and landing page right now an attacker knowing that we do not have identifiers in our data streams can with multiple messages right and once the spammer starts to send us multiple messages it could be used for altering or promoting their own content at a different track or maybe disrupt our services by promoting a bad result for a good query right to give you a more
concrete example just based on these two messages these are two sample query logs query booking the landing page query booking the landing page booking holidays com just based on the message content we cannot detect whether it's a fabricated data or a genuine data right and because unlike other search engines we do not track users there is no way we can know what is the source of the message and what is the history of the source itself to use it for server-side rate-limiting or for server-side rate limiting solutions right and this is a big big concern for us because now at one point we are doing anonymous data collection but because it's anonymous it can negatively impact our services itself right so one solution could be let's drop an on image data collection and let's start tracking users but that is not what we want to do so we want to build a service wherein we can detect without be anonymized in the user whether it's the same user sending us multiple messages or these are different users you can think about this as a voting system where a trusted user can vote but can only vote once for a given period right now think about this problem as the user needs to vote for popularity of a query but can only vote once in a given timeframe in a mode called in the context of this query log we only want to receive one query per user one normalized query per user in one hour right and if we talk about a general implementation of the system based on the message type and the content of the message we want to rate limit the users rate limit per user or time frame right and that's kind of important for us and that's why the problem that we try to solve is without knowing the user how do we enforce these rate limiting users this these rate limiting rules
so in 2016 we post a version to production which we call anonymously delimiting version one and the idea was because we are already using proxies to transport data we can actually empower the proxies without learning the content of message but still perform deduplication for us but what that meant was these proxies needed to run custom software which perform these deduplication tasks now this protocol was based on blind signatures and other standard crypto primitives but yet for this solution to work we needed multiple trusted third party providers to run proxies for us the reason we needed multiple third party proxy providers first was because they needed to run the custom application and second we did not want that the users could collide with the proxies the other proxies could collide among themselves or cliques in flix itself could pollute among the proxies but what happened was we tried finding these multiple third-party vendors to run proxies for us and sadly in about two years we could not find any right what that meant was cliques had to run their own proxies which kind of defeated the whole purpose and because these proxies had to run a custom software to perform deduplication we cannot rely on the off shelf solutions like tor or other trusted vendors so based on these challenges that we faced in the first version earlier this year we started testing a new version which we call HP n which we call anonymous rate limiting version two and which Alex will explain in detail so
Scott said I'll talk about our new anonymous rate limiting system which is based on the Wraith anonymous attestation he already mentioned some of the benefits but just as a quick review this system has several benefits over a previous one the first one it is that now we don't need we remove the need for trusted third parties second there are less interactions between the client and the server because in the previous system we required one blind signature or a message so the client had to first get message blind silent then actually send a message so there was two interactions the third one is now that we can we can use known solutions for network and immunity like tor which in the previous system was a bit more complicated because of these custom third-party proxies so what is
actually direct anonymous attestation the eternal unaware that the station is a cryptographic primitive which is implemented in processors that follow the trusted platform module standard as well as other chips which follow the Intel enhanced privacy ad specification but rather than focusing on the hardware that runs this cryptographic protocol we would rather like to focus on the cryptographic primitive itself so for our purposes the economics of the attestation has two key features first it allows for anonymous authentication this means that you can prove that trusted platform is signing a message but not which concrete platform is doing so so in other words the signatures can prove membership of the device in a group but not which concrete device is signing the message the second feature that is important for us is the control link ability so given sorry for this control the link ability a device sends a message with respect to a base named string in a way that two signatures from the same member are linkable if and only if they are done with respect the same base name string so for example if a device always signs a message with the same base name then all the signatures will be linkable so we will be able to link all his messages together on the other extreme if a device will randomize this base name and sign every message with respect to a different base name then all the all the messages would be completely and linkable so we see later that we can use this link ability properties to achieve our rate limiting NEADS but before that we have to talk
about the main four operations about that are involved in the return on modernization because we need to refer to them later the first one is called join and this is where a device gets credentials from an issuer typically a device is the client and the issuer is a server and we can see this as the device becoming a member of the issuer group then once this device obtains credentials it can sign messages with respect to this base name string in a way that the link ability properties are fulfilled the link ability properties that we talked in the previous slide are fulfilled so then the verify operation essentially means that you can check whether a signatories valid for a given message and base name and finally the link operation is which is especially important for us is it allows to check whether two signatures have been owned by the same member and the same base name now since this signature link
ability is essential for our purposes we actually need this link algorithms to be efficient so let's take a look at how this link ability looks in practice to achieve this link ability signatures contained pseudonym which can be seen as a link ability tag and the pseudonym is computed as a one-way function of the base name and the user private key of course the user private key is unknown to the server so it isn't known both to the issuer and the verifier the way this still name works is like if whenever if we see two sealed names to equal pseudonyms then it means for sure that the corresponding signatures were done by the same member and with respect the same base name so this means that the link algorithm is efficient because the server just needs to store all the previously seen pseudonyms and then whenever a new message comes just extract its pseudonym and see if it has already been seen before
now that we have explained how the yet anonymous attestation works let's see how to use it actually in practice to achieve there a limiting our rate limiting needs so the idea is quite simple is basically enforce these base names to be computed in a very specific way so that the concrete limiting rules for the system can be enforced so this means that the client will compute the base name according to these rules and will never send two messages with the same base named pseudonym because of course they could be detected very easily at the server side then the verifier which is a server will drop messages either if this base name has computed incorrectly according to the rules or if the pseudonym is repeated so
this would be the structure to achieve our rate limiting purposes so it would it looks like a couple of four components the first one is matches in message type which can be used to have different rules for different kind of messages the second one is a time period which is actually timestamp truncated to an hour or a day depending on on the need the second the third one is a message digest which is an arbitrary function on the message and is used to introduce some message content into these rules and the last is a count which is an interval between unity 1 and n and is meant is can be seen as a multiplier so given a fixed message type time period and message sizes then you can send up to n messages so we found this base table structure to be expressive enough for our needs but it doesn't mean that it is the only possible structure to do this kind of rate limiting now revisiting the previous example that
cannot show the query logs we could have a message that would look like this first the search the search query in this case booking the London URL and the timestamp and we would like we could like to enforce the following rule we only want one message per user per hour per normalized search query to be accepted by the server to achieve this rule and following the previous base name structure the base name would look like this first we will have the message type which is constant in this case query log type second one would be a truncated timestamp to the hour so instead of 656 just six the third one normalized query you know if last one just one in because in this case that's the only option so essentially the strategy is to not let the user choose more than one base name for a normalized search query and for a given hour another example that we might
want to enforce is let's say we have a different kind of message we which we only want to accept three messages per user per day so for this kind of message which would could call three daily type the base names would look like this first the message type then the date and then the message digest which in this case would be empty because the rule is independent of the message content and then account which in this case could be chosen between 1 and 3 so a client and the start of the day would like to send a message then it would see that it can choose out of 3 based names and it would choose one at random it would mark this base name as used and then in the next message that that would want to send it would pick from the two remaining base names etc and essentially when it runs out of make based names there is no way like the message quota is exceeded and there is no way that you the user can send an additional message without being detected at the server side
now if TPM was available like in all the devices a hundred personal devices we could basically end the talk here and say okay we leverage on the TPM because it runs this data anonymous attestation protocol and we showed how to use it to rate limit but in practice this is not the case first TPM is not available in all the devices like it might be available in this laptop but in general it's not and then even if it's available for example in our executors execution environment we want to run this really limiting service in the client which is a browser extension so even if the TPM might be available physically in the device we might not be able to access it so at the end it's not realistic to assume that the TPM is available so what with it is
we implemented the dilator non-homogenous session protocol but without the TPM so software only now this opens the possibility for a new problem which is basically ok in the system we showed we protect against a single user from sending too many messages right so we kind of okay for a given user this music has only a limited set of boats let's say to contribute on the system but if we remove then the TPM then it put it could be quite easy for an attacker to create many identities and then obtain credentials for all of them and then essentially bypass the whole system so note that this would not happen with the TPM because if we would enforce the TPM then an attacker would need to actually get hold of many DPMS devices which is not so easy so at the end if we cannot limit the number of credentials given two attackers then the whole system will be useless now this
has a name it's called civil attacks and we actually need to prevent them so the good thing is that we can pretty much focused on one operation of the system which is the join of relations this is the operation where the clients get revenge cells and the good thing is that it does not need to be anonymous so this is because according thanks to the protocol properties there is no way to link this join to any other of the user interactions with the server so essentially the join is decoupled from all the other interactions so since the joints do not need to be anonymized and we can leverage standard techniques but just on this joint operation for example the simpler one would be just to rate limit on the user IP we believe that buddha already put at some protection then depending on the use case of this limiting system we might have email you know user accounts paid user account subscriptions so we might be able to use these for the join operations to give the credentials now as an extra measure we also rotate the issuer public keys this means that every time the issuer public keys are rotated all the credentials that the issuer has given are invalidated so the clients need to join again this is to avoid an attacker to potentially accumulating many credentials over time and using them with any extra effort so at the end as a summary of this part we believe there are ways to mitigate these civil attacks and that the rate limiting is useful even without TPMS okay so as a summary
for the key parts of our system first we limit joins using all available information that we have user IP email if it's a veil all etc then rotate assure key is periodically and last use the data anonymous attestation link ability for doing the actual rate limiting on the messages now we have to
mention some possible the anonymization attacks that we might be able to do given that we control the servers so given that we control both the issuer and the verifiers the first attack would be trying to make each user join a group where they are alone so I each user would join a group or which is unique for them and then essentially this would its signature if this was this case would contain like a user ID so this would pretty much make it very easy for us to build user sessions this is quite easy to detect because the user fetch the user public is periodically and then the user the issuer put sorry they the user would detect that the issuer is changing this issuer public keys when it should not and then basically could decide to ban the issuer and not send any message anymore or to report it or depending on the system the
second attack is a little bit more tricky and we call it denial of join is that the issuer has the power of deciding who gives who it it's credentialed so in the in extreme case the issue could only give credentials to a single user and then since this since only this user would be able to send messages it would be trivial to track this user right so this is a a bit more tricky to detect from the client side because the clients would meet would need to cooperate with each other in order to know that a single part of the population is not receiving credentials and then react upon it but in practice it is not so easy to implement but in our complete use case since we have the incentive of keep receiving data to run our services we have more incentive on receiving data than on trying to track users and since to track users we have to drop messages proportionally to the granularity that we want to track it's not so it's not reasonable for us to try to do this attack okay so we thought it
would be a good idea to compare our system with similar systems that are used in production one is called privacy bias and it's used in cloud fair and the way this word is basically a user solves a token as far as a user solves a CAPTCHA and then receives an tokens that can redeem later to avoid solving more CAPTCHAs right so these tokens skip the CAPTCHA the CAPTCHA puzzle and these tokens can be redeemed anonymously so we couldn't use this even though we could use this in some specific use cases of our system we cannot use it as a as a general solution for our needs because the well the main reason that these tokens can be redeemed at any point of time in the future and we cannot frame it to read limit at specific time periods as we require then the second one is called ananias and is implemented in brave brave browser essentially consists in the server being able to set up an animal service for a subset of users in a way that only one user can vote once per sorry in a way that users can only vote once per survey as far as I know this is used for payments to decide the popularity of websites we couldn't find a way to use it directly to achieve our our needs because again because our trade limiting rules needed a little bit more of complexity like you need it it needed a time period they needed for some matches too like the query logs to include part of the content to do the limit so we couldn't find a way of using this but still these systems are similar to to our in some way so in for the
implementation part we implemented a painting based tiptoe graphical protocol which is used which is defined in a Phaedo elliptic curve diet an annual attestation and since its user spelling descriptor of it's not so easy to find well at least the choices for libraries are a bit reduced eventually we decided to use a library called Apache Apache Milagro Milagro crypto library which is in C so implemented the protocol in C using this library and since our client runs in a web browser we compile it to what assembly and build wrote bindings to JavaScript for that and then for the server-side we use the same C implementation but broad night wrote native bindings for no GS as our server
now the main smirks especially the most important operation is the server verify because these actually influences how many messages per second can the server process so right now it's 4 milliseconds per CPU core so this means we can process 250 messages per second per CPU core even though this is not blazingly fast we believe that is fast enough to be used in practice to be used in production so this system like not the
whole system is open-source yet like the server-side is not open source but the crypt implementation of the data normally the station it is as well as the client part in this repository so thanks for this [Applause]