
5th HLF – Lecture: Mathematical Theories of Communication: Old and New


Formal Metadata

Title
5th HLF – Lecture: Mathematical Theories of Communication: Old and New
Series Title
Number of Parts
49
Author
License
No Open Access License:
German copyright law applies. The film may be used free of charge for personal use, but it may not be made available on the Internet or passed on to third parties.
Identifiers
Publisher
Year of Publication
Language

Content Metadata

Subject Area
Genre
Abstract
Reliable and efficient digital communication today is possible in large part due to some wonderful successes in mathematical modelling and analysis. A legendary figure in this space is Claude Shannon (1916-2001), who laid out the mathematical foundations of communication in his seminal 1948 treatise, where among other contributions he gave a mathematical definition of “entropy” and coined the now ubiquitous term “bit” (for binary digit). But Shannon is not the last word in communication. Communication extends to settings well beyond the carefully designed full-information-exchange model explored in Shannon's work. In this talk I will try to describe some of the many extensions that have been explored in the interim period, including communication complexity (Yao 1980), which explores how it might be possible to achieve effective communication without a full exchange; interactive communication (Schulman 1992), which explores how to cope with errors in an interactive setting; and some of our own work on uncertain communication, which explores how shared context can make communication more effective, even if the context is shared only loosely. The opinions expressed in this video do not necessarily reflect the views of the Heidelberg Laureate Forum Foundation or any other person or associated institution involved in the making and distribution of the video.
Transcript: English (auto-generated)
So the next speaker is Madhu Sudan; I'm happy to introduce him. So who is Madhu Sudan? At least he is a very nice fellow, as I noticed on Sunday during the reception. But he didn't receive the Nevanlinna Prize for being a nice fellow, but for his work on probabilistically checkable proofs, non-approximability of optimization problems, and error-correcting codes. Please. Thank you all for coming. I realize that after lunch, especially for those of you who flew in from the western side or from anywhere far away, this is a good time to be taking a nap, and I appreciate your forgoing that to come over here. I'm going to tell you a little bit about theories of communication, and it's probably a talk which is not quite a mathematics talk, definitely not quite a computer science talk, but somewhere in the middle.
I've spent a lot of time with young researchers over here who've been trying to wonder what I am and I'll try to make sure that you remain puzzled even after the end of this talk. So a few quick sort of explanations about the title before I actually launch into the talk. What is communication to me for this talk today?
I do want to talk about digital communication, but here I want to include two extremal aspects of it. Human communication: I could be communicating to you by singing a song, but I'm not doing that. Instead, I'm actually speaking over a finite alphabet, English, and using a finite sequence of sentences over this. This is digital communication, and human communication is going to play, at least it plays, an important role in my research. I'll try to explain to you where it also influences the theories of communication. I could also talk about communication of bits and bytes of data, again as opposed to radio waves for sound, and here the challenges are of the form that communication can be very expensive, it can be very noisy, and we want to do things to ameliorate these costs. Modern communication also has other aspects that are of interest. They're interactive. We don't just talk about sending a message over from here to Mars or vice versa, but instead about constantly interacting with some other entity and then achieving some objective, and that leads to its own challenges. Communication also tends to be contextual, and I'll try to explain what I mean by that later.
Now, an aspect that we should all wonder about: why build a theory? Why not just go ahead and build communication systems that seem to be working? And I do want to emphasize the theory. I don't mean to dismiss the fact that it's very hard to build a system that actually works, and sometimes theory even points in the wrong direction, but that doesn't mean the theory should not be done. So neither one of them is exclusive, and I want to explain why the theory. Ad hoc solutions can often work today, but they may not work tomorrow, and many of the issues that we are dealing with in communication, where I might mean communicating some message across space, from here to the U.S. or vice versa, but I might also be thinking about communicating over time: I want to store some information on a disk and be able to recover it 10 years from now. What is the point in having a technology that works today but will not be working 10 years from now, if that is the goal? So we do want theories that will cover solutions that will work much later, but even more importantly, I think one of the big advantages of being a theoretician working in this space is you often realize people have proposed solutions, but they haven't yet formulated the problem. And this is something that may seem a little funny to you, but that happens, and we even see textbooks covering solutions where the problems have not been specified. So I would really like to emphasize not putting the cart before the horse. Let's try to talk about the problems first, so that when we have maybe a potential collection of solutions, we can try to define measures by which we would like to evaluate them.
And sometimes when you start looking in this view of things, very creative solutions emerge and these have been the reasons that some of these theories have been very, very successful. And at the end, we do want an understanding of the limits. If there is a reason that we cannot improve on the system that exists today, we would like to know why.
And maybe it will offer us a way of saying, well, this is how we formulated the system today, we will change it so that we can avoid inherent limits which hold under the current model. So understanding of limits is very, very important even if your goal is only to go beyond them. Finally, old and new. Why am I talking of old theories and new?
Well, did the old theories not work? And the answer is exactly the opposite. The old theories were just fantastic, and I'll talk to you about them today. But they were so successful that they took us from ground level out into space. We were able to achieve orbital velocity. We are now in a very new space, we are able to explore an entirely new class of problems
and now we are faced with those problems and we want to solve them. So we do need evolving theories because the space of problems is evolving. That is a little bit of a preamble. Now let me go ahead and start talking to you about reliable communication.
Hopefully I'll try to get up to speed quickly. And by the way, I know that there are a few laureates who were here a few years back. I'm going to actually replay some of the same slides and I hope you'll forgive me. A, I hope your memory is as good as mine, in which case you don't remember those. And B, I think it's always nice to put the old work into the context
before we start placing the new layers on top. So let me start with the task of reliable communication, which emerged as a problem in the late 1940s along with the advent of the digital age. Recall, digital information is when we are talking about bits or bytes
or elements of a discrete finite set being communicated. Alice wants to send some message to Bob, the communication media have always been noisy. There was communication before the 1940s and there's communication after. What changed in the 1940s where we started focusing on discrete information?
And with discrete information, noise tends to have a potentially very catastrophic effect. So for example, you might be sending the following message on the left and you get the message on the right. There's only one error, but it is catastrophic. It is the exact opposite of what you really intended. We do not want this to happen. Digital information needs to be preserved in its entirety or else it's futile. How can we go about protecting information when we are worried about errors? And by the way, you can make your physical system as reliable as you want; it will never be 100% reliable. In fact, even with all the layers we put on top, all we do is improve the level of reliability.
It's never 100%. One very simple idea, and this is what I mean by offering a solution before explaining the problem, is the following: let's just go ahead and repeat everything we want to say three times, okay? I want to say W, okay, W, W, W, and so on and so forth,
and I send this much longer sequence. And maybe with all the errors that happened, what is received is a sequence which has a few errors, and they are color-coded for you. Of course, the receiver would not see them as color-coded. They would just see a sequence of letters and now have to figure out what was the message. And this works moderately well. If you look at this block of letters, you see a W appearing in the majority. You might say, okay, maybe that was a W. You see this block that's entirely W's, that was probably a W. You look at this block in the middle, you see a couple of S's and an R. You think it's an S, but now you're actually wrong, okay? So it's improving the reliability. It has not brought you up to 100%. And in the process, you have paid a price. You have repeated every symbol that you're sending three times. You're effectively using only one-third of the bandwidth that may be available to you for the communication.
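As a rough illustration, here is a minimal sketch of this repeat-three-times scheme with majority decoding, in Python. The alphabet, the channel model, and the error probability are my own assumptions for the demo, not anything specified in the talk.

```python
import random
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(message, r=3):
    """Repetition code: repeat every symbol r times."""
    return [ch for ch in message for _ in range(r)]

def noisy_channel(symbols, p=0.1):
    """Toy channel (assumed): with probability p, replace a symbol by a random letter."""
    return [random.choice(ALPHABET) if random.random() < p else ch for ch in symbols]

def decode(received, r=3):
    """Majority vote within each block of r received symbols."""
    blocks = (received[i:i + r] for i in range(0, len(received), r))
    return "".join(Counter(b).most_common(1)[0][0] for b in blocks)

msg = "WHATSUP"
print(decode(noisy_channel(encode(msg))))  # usually msg, but a block can still be outvoted
```

Note the price: the rate is one-third, exactly as in the talk.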
Is this the best? Is this what we should be doing? Let's first go through a very simple calculation; I won't do too many calculations in this talk, but do go through this one with me, okay? Now let's fix the number of repetitions. Let's say I repeat everything 11 times, okay?
There is still some probability that all 11 copies of a symbol might be corrupted, right? It's some positive number, it's not going to be zero. Now let's take the length of the message to infinity, make it longer, longer, longer, and longer. Watch the odds that at least one of the symbols will be entirely corrupted, all 11 occurrences.
It's a positive number, and in fact, as the length of the sequence goes to infinity, it's 100%. There's a 100% probability that something is completely ruined. What we learn from this is that if you fix the number of repetitions, the length of the message is bounded.
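To spell the calculation out: with 11 repetitions and per-symbol corruption probability p, each block of 11 fails with some fixed probability q > 0 (at least p^11, the chance that all 11 copies are corrupted), so for a message of k symbols

\[
\Pr[\text{at least one block decodes wrongly}] \;\ge\; 1 - \bigl(1 - p^{11}\bigr)^{k} \;\longrightarrow\; 1 \qquad \text{as } k \to \infty .
\]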
If the length of the message is going to infinity, the number of repetitions has to go to infinity, and if the number of repetitions is going to infinity, the rate of usage is going to zero, okay? And that really was the belief at the time of the first paper that I want to talk about,
when this kind of question started getting explored. How can I make digital communication reliable when I want to send longer and longer and longer messages? Will I really be paying, you know, can I only achieve communication at rate zero, or will it be positive? And I would think that, I mean, maybe some people would contradict me on this, but I like to stick to my beliefs and not listen to actual stories about history. So my belief is that pre-1940 the view was that the rate of every scheme should go to zero as the length of the communication, the length of the message that you want to transmit, goes to infinity. In view of this, the next seminal work,
the first work that I want to talk about today, Shannon's theory, which came about in a monumental treatise that he wrote in 1948, came up with a very surprising, different answer. One of the things that Shannon did, which was really beautiful, was to first articulate what is the problem
and what are the features that are available, what is under our control. There is Alice that wants to talk to Bob and there's a noisy channel in the middle, but Alice does have a hope of encoding the string. What is encoding? The example that we saw earlier, we were repeating every letter thrice.
That is a form of encoding. Some way of transforming the message into something longer so that after you transmitted it over the channel, you receive something, you do some decompression, decoding, reduce it back to a sequence of the length that you were expecting with some clever inferences.
Like, for example, take the most frequently occurring symbol in every block of three. That's some sophisticated thinking. Maybe we can do others and then we should ask, does Bob get the original message out? The exact message as was intended to be sent by Alice.
Now, a magical transformation happened here. Shannon decided, okay, so I want to design an encoder and a decoder. Let me just denote them by arbitrary functions. If I'm transmitting bits, let the encoder E be any function mapping k bits to n bits
where k over n is the rate at which I'm communicating. In the previous example, for any choice of k, n would have been three times k. Now we are saying, okay, pick an arbitrary n, an arbitrary k, look at a function E mapping k bits to n bits and see what we can do.
And we will apply as decoding any arbitrary function D mapping n bits to k bits. Of course, we want some nice properties, some relationships between E and D. If you transmitted the encoding of a message and then just got the exact thing without any corruption, you would want the decoding to give back the original message. But even if there was some error,
the encoding of the message got corrupted by some error. I'm just denoting it by plus error; it doesn't mean anything in this talk. Some error happened, you decode, and you should be able to recover the original message with very, very high probability.
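Written out in symbols, the requirement is presumably of the following shape (my notation):

\[
E : \{0,1\}^k \to \{0,1\}^n, \qquad D : \{0,1\}^n \to \{0,1\}^k, \qquad \text{rate} = \frac{k}{n},
\]
\[
D\bigl(E(m)\bigr) = m, \qquad \Pr_{\text{err}}\bigl[\,D\bigl(E(m) + \text{err}\bigr) = m\,\bigr] \ge 1 - o(1) \quad \text{for every message } m \in \{0,1\}^k .
\]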
If you can do that at a particular rate, that's the rate that I'd like to achieve. This is a very simple, clean description, and I think in already formulating this problem, Shannon took some major leaps of faith. One, he's thinking about this in 1948 when there are no computers around. And he's saying, let's take an arbitrary function
and see what happens. What are you going to do with this arbitrary function, my dear friend? Well, I don't know, maybe the next 50 years will tell me. Okay, but that was the thinking then, and it did indeed take more than 50 years to actually realize the ultimate dreams that he was able to promise. But he took the leap of faith and said, look, it doesn't matter what the complexity of this encoding task is, what the complexity of the decoding task is. I'll worry about it later, but let me first understand the limits. So he looked at these two, and just writing them down as precise mathematical functions and saying, well, we can take any functions we want, is a wonderful, clean abstraction of the problem.
What are the best encoding and decoding functions? And Shannon gave the ultimate answer. His ultimate answer was: the rate that you can achieve is one minus the entropy of p. The entropy of p is this quantity over here. I don't want to read it to you. It's not very important to the talk.
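The quantity on the slide is presumably the binary entropy function: for the channel that flips each bit independently with probability p,

\[
\text{rate} = 1 - H(p), \qquad H(p) = p \log_2 \frac{1}{p} + (1 - p) \log_2 \frac{1}{1 - p} .
\]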
And this can be achieved. This is the best possible. There exist encoding and decoding functions which will achieve this in the limit and this is not something that you can improve upon no matter what you do. And let me though explain a couple of extremal cases.
P equal to zero. If I'm sending you bits and they're getting flipped with probability P, every bit is independently flipped with probability P. P is zero. What does that mean? It means if I send you a zero, you get a zero. If I send you a one, you get a one. There is no error in this channel. It doesn't take rocket science to figure out
that the rate is 100%. Of course Shannon proved that. Let me give you another extreme example. P is a one half. If I send you a zero, you'll receive a zero with probability one half and one with probability one half. If I send you a one, you get a zero with probability one half, you get a one with probability one half.
You don't need me to transmit these zeros and ones. You can just simulate it sitting at your own side, just toss a coin and you get exactly the same sequence. So clearly the rate is zero. You're seeing random coins uncorrelated with what I'm sending. And Shannon's theory proved that. Not two pieces of rocket science, but nice. What is nice is what's in between. This rate function is monotone decreasing for p going from zero to one half. So even if the probability of the bit getting flipped is 0.4999, that gives you some positive rate at which you can transmit. And this is positive independent of the length of the message. Make the length of the messages longer and longer and longer, and you can still transmit at this rate. And Shannon was able to prove this. It's completely counterintuitive to what we described with the repetition code.
So this is a lovely piece of theory. And let me just tell you a little bit about what I view as Shannon's major contributions over here. It's a far reaching architecture. Putting in an encoder and a decoder when you don't have computers. What will we do with the existence of an encoding and decoding function? We'll worry about it later.
We'll let algorithmic complexity emerge to the fore and eventually find efficient algorithms to do it. But for the time being this is good enough. It's giving us some milestones to try and achieve. The analysis of this problem was profound mathematically.
It's one of the first uses of the probabilistic method, at least in the field of engineering, though it's pretty contemporaneous with the work of Erdős, within plus or minus three years, so we don't really know exactly what happened first. There were deep mathematical concepts underlying all this: the concept of entropy, the concept of information,
each one of which got definitions in this very piece of work. And along the way he even coined the word bit for the binary digit. And this theory was, of course, non-constructive. This is what I've been alluding to all this time. We didn't know what the encoding function might look like.
What the decoding function might look like. How long will they take to compute? None of this was made explicit and constructive. But most of it has been made constructive by now. You can always say, I want this extra little feature, and maybe it's there, maybe we don't have it. But all the things that we've actually explicitly articulated, we've been able to achieve constructively. In the ensuing period it took a long time to get to this stage. And the remarkable thing is it has brought to us most of the technology that we think about in the information technology era. The reason we think of a photograph, a song, a video, or a piece of text, and measure them all in bits, goes back to this one piece of work. It's a seminal piece of work, and the communication theorists and the information technologists have always kept it in mind as they have built their technology.
Now I want to tell you a little bit about a side effect in Shannon's work. And this is where he decides to apply his theory of uncertainty, entropy, information to English text. And he came up with the following model which is probably known to some of you as the N-gram model.
He says we can actually approximate a natural language by means of a series of simple artificial languages. And the i-th order approximation in this universe, for any number i that you pick, asks: given the last i-1 symbols that I've written down, what's the frequency of the next letter?
If I give you that the first two letters are t and an h, it's a pretty high chance that the next letter is going to be e, a slightly lower probability that it will be a and then a slightly lower probability that it might be i. It's a very unlikely event that it will be an n.
Though in my slides you might see a th followed by an m. But these are the general distributions. So use this kind of knowledge of English and try to produce sentences. What does it do? He described a few approximations. This is a third-order letter approximation. So after looking at the first two letters,
he decides to guess the third one based on the distribution of things. It's not great, but it's not terrible either. It does look like some, you know, caricature of English. Maybe it was like that in the fifth century. We don't know. But, you know, we can't be sure it was not. So it's a pretty reasonable approximation
and by a very, very crude probabilistic method. Here is an even better thing. It's a second-order word approximation. Now we're doing the same thing, but our individual entities are words. And it says: the head and in frontal attack on an English writer that the character of this point. And this he did in 1948. So not too bad for that period of time.
By the way, how did he do it? He did not go to the computer and program it and say, here is the code. Because there were no computers then, okay? So what did he do? He actually used a table of random numbers, calculated the statistics of English by himself by hand and then did all these calculations.
Actually quite a remarkable piece of work. I hope he didn't use any graduate students. But, all right. And in fact he makes a very interesting statement which does say something about language along the way. He says, if I look at the i-th order approximation, it produces plausible sequences of length 2i.
Where did the 2 come from? Language is not as unpredictable as you think it is, okay? It should have been i plus 1 maybe, but where did the 2i come from? Somehow it's working out. It's not a theorem, it's just an observation that this is how it looks to be, and it was a beautiful piece of observation in this paper.
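For a flavor of the construction, here is a minimal letter-level n-gram sampler in Python. The toy corpus and the order are placeholders of mine, and of course this uses a pseudorandom generator rather than Shannon's printed table of random numbers.

```python
import random
from collections import Counter, defaultdict

def build_model(text, order=2):
    """For each (order)-letter context, count the frequency of the next letter."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        counts[text[i:i + order]][text[i + order]] += 1
    return counts

def generate(counts, length=60, order=2):
    """Sample one letter at a time from the empirical next-letter distribution."""
    out = list(random.choice(list(counts)))  # seed with a random context
    for _ in range(length):
        options = counts.get("".join(out[-order:]))
        if not options:
            break
        letters, weights = zip(*options.items())
        out.append(random.choices(letters, weights=weights)[0])
    return "".join(out)

corpus = "the quick brown fox jumps over the lazy dog and the small dog barks at the fox "
print(generate(build_model(corpus)))  # a 'third-order' letter approximation: context = last 2 letters
```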
All right, so this is what I wanted to say about Shannon. In the rest of the talk, I actually want to talk about other theories. But some of the messages that I want to repeat: A, we do want to talk about designed systems, but we might end up talking about human languages. Actually, my very next slide will go in the other direction: we want to talk about human languages, and we'll end up influencing designed systems. We also want to talk about the interaction between mathematics and engineering and applications and modeling. You know, mathematics came in as a fundamental tool to solve an important engineering problem of the times, in fact, a fundamental engineering problem of the times since then till now. And along the way, it also helps us find some esoteric aspects of modern language, which we probably didn't expect. So this is the kind of phenomenon that I want to trace in the rest of the talk.
And the kinds of theories that I will talk about: Chomsky, you know, really, given my knowledge, I shouldn't be talking about him. On the other hand, not talking about him would be a travesty, so I decided to include a little piece about him, but really that deserves an entire lecture on its own. I'll talk a little bit about works of Yao and Schulman,
which are, I think, very important and some things that you should all be aware of. The remarkable thing, the kinds of questions that we should be asking, the kinds of solutions that may exist. And, of course, I'm not here to just do some charity work, I'm here to sort of advertise some of our own work, so I'll tell you a little bit about our own work,
about context and uncertainty and how we should model it and how important they are to communication. All right, on to Chomsky. Chomsky started this work around the same time as Shannon, maybe a little bit later, in the 50s. And his goal was, of course, to understand human communication.
I say communication, though in his text you might see more frequent occurrences of words like language and grammar, and indeed that's what he focused on. He came up with it possibly first; I may be wrong, and if I am wrong, please let me say this and then correct me later in the break: he formalized grammar mathematically, okay?
It didn't capture all aspects of English grammar, it did not; there was always, like, this is a structure that you could use to understand grammar, and then you have to introduce all your variations. And it gave a lot of the structure behind languages, and it started to raise questions about, you know, how much of language has got to be embedded in your cognitive system, in your neural net, in the brain already, and how much of it could be acquired by learning to talk. And these kinds of questions started to get articulated in this work. This profound piece of work has influenced linguistics heavily since then, but linguistics is not the only thing
it actually influenced. The whole goal was to understand language and human communication and as a byproduct, we got the theory of context-free grammars and in general structure to languages and how to express them. And we learned that, you know, maybe making languages a little bit more structured gives us very good tools to use
in programming languages, and out of it came theories like automated parsing, or compiling and interpretation, and it's so successful today that in fact we don't even teach it anymore, we just program it and we are done, okay? So this was pure theory influencing practice in another very, very impressive, incredible way.
Like I said, I won't be able to do enough justice to Chomsky's work. This is all I'm going to say about it, and I'll be very happy to hear from you about what he really did and what he didn't, or tell you what I think he did and what he didn't. But now I want to move on to what we have been thinking about in the last 20 to 30, 40 years.
And these are theories which take place mostly, again, in the engineered setting, which means I'm designing communicating devices or designing protocols for communication and trying to figure out how well they work. And now an obsession of ours is,
well, not all communication is I have this huge message in my head and I want to send it to you. Instead, I have some huge message in my head but you also have some huge message in your head and we want to compute something which is a joint function of the two. Let me give you an example. Suppose you want to buy an airline ticket online, okay?
You have your list of preferences: these are the kinds of days on which I'm willing to travel, these are the times of day that work, the airlines that may or may not work, the conditions on a particular airline, what kinds of seats may be acceptable to you or not acceptable to you. The travel agent, on the other hand, has a vast portfolio, and by a travel agent I do mean Expedia; I don't mean a physical person, who's ever heard of that? So a vast portfolio of airlines and flights and prices and seats and so on. What should we do? What's the protocol? Should the travel agent go first and tell you about their entire catalog, send it to you?
Or should you go in and type out your entire preferences? And the answer is clearly neither, right? So neither of these is the best solution. The solution should be something interactive. I send you some information, the travel agent comes back with some options and I come back and say here is something that is not working and then we interact like this
and that's how we go around buying an airline ticket. This is an interactive protocol, but what is the best protocol over here? How should we really do it? How should we even model problems of this kind where inputs on both ends are large but we want small amounts of interaction in order to be able to achieve a given objective function?
What happens if during this interaction we start having errors? I go through this entire sequence and then I'm wondering why you kept telling me about this airline which I may not want to fly ever on and always give me that as the option and it turns out you misheard something in the first round.
Shouldn't I discover this error early on, and so on? And there is a lot of context. What are all the pieces of information that I rely on in deciding what is an acceptable flight? What are all the pieces of information that you take into account in deciding to offer me certain options? There's a lot of context; how should it be shared, and how can we set aside some of it in order to make progress and soon be able to buy a ticket? So the model that we want to work with over here, and this is one that I want to advertise in this talk, is due to Yao.
It comes from 1980 roughly, 1979 and it considers once again two players, Alice and Bob. Alice has some large input X. You could think of X as the message that she might be wanting to send to Bob but really she doesn't need to send all of X to Bob because Bob is not interested in X. He's just interested in some function of X.
This function of X also depends on his own input Y. That's only known to him. What Bob wants to know is: what is F of X,Y? So maybe Bob, the traveler, wants to buy an airline ticket. Y is all his preferences. X is all the flights that are available in the world
and their prices. And F of XY says this is an acceptable ticket for Bob. He's willing to pay the price and after that Alice doesn't care whether he actually flies or not. So what can they do? They are allowed to exchange a sequence of bits and at the end Bob should be able to say
yes, F of X,Y is the following. Here it is. I know what's the price, what's the itinerary; here, I pay it, and I'm done. The communication complexity of F is going to be the minimum number of bits that are exchanged by the best protocol.
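In symbols (my notation), with the minimum ranging over all protocols that correctly compute F:

\[
CC(F) \;=\; \min_{\pi \,\text{computing}\, F} \;\; \max_{x,\,y} \; \bigl(\text{number of bits exchanged by } \pi \text{ on input } (x, y)\bigr).
\]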
And one thing that's kind of very unique in this setting, or maybe not so unique, since in every setting of computation it becomes evident that this is a very useful tool to count on, is: what happens if we allow Alice and Bob to share some random bits? If they have access to some common source of randomness, what could they do with it? It turns out you can communicate a lot more effectively if you have randomness, and more importantly, even if you don't have the randomness, there are many problems for which the best communication protocol is not Alice sending X over to Bob. You can do a lot better than that.
So in a sense we are moving beyond the Shannon model, where we all would have abstracted this problem as saying Alice needs to send X to Bob and that will suffice. So yes, that will suffice, but it's not necessary, so let's see what else we could get away with; and then we would like to say, okay, now maybe even give them additional tools and see how it works.
So one thing that's different about this talk, and in general many of my talks, is that I actually think of the Yao model as a tool for the design of protocols, as a model for the design of protocols. Works about the Yao model of communication complexity will say: here is this beautiful problem, and let me tell you that there's nothing that Alice can do other than to send X; here is a collection of functions for which you can't do any better. I'm more interested in a collection of functions where she can do better, and I'll try to give you a couple of examples of that kind.
Here is a very simple problem; I'm sure you could all have thought of it. Alice has n real numbers X1, X2, and so on up to Xn. Bob has n real numbers Y1, Y2, Y3, and so on up to Yn, and what they want to compute is, you know, X1 minus Y1, plus X2 minus Y2, and so on. Very simple. Clearly Alice could send X1 to Xn over to Bob, and Bob can now compute the quantity over here, but you can also see that there's something much simpler we can do. Alice can just sum up all her Xi's and send one real number over; I'll view it as much smaller. You can think about this question in various other related contexts; if you know about modular arithmetic, I might send something even simpler. And what Bob now does is just take the sum of the Xi's and subtract the sum of the Yi's. We used associativity and commutativity of sums to come up with this simple formula, and one real number suffices; the communication complexity is not growing with n, it's much, much, much smaller. It's a lovely solution, except it's not such an interesting problem.
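A minimal sketch of this one-number protocol in Python; the split into 'Alice' and 'Bob' functions is just for illustration.

```python
def alice_message(xs):
    """Alice's entire transmission: one real number, the sum of her inputs."""
    return sum(xs)

def bob_output(alice_sum, ys):
    """Bob computes sum_i (x_i - y_i) = (sum_i x_i) - (sum_i y_i)."""
    return alice_sum - sum(ys)

xs, ys = [1.5, 2.0, -3.0], [0.5, 1.0, 1.0]
assert bob_output(alice_message(xs), ys) == sum(x - y for x, y in zip(xs, ys))
```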
So let me tell you about a slightly different problem. I've changed the problem slightly; this is problem 2. Problem 1 had X1 minus Y1 to the power of 1; problem 2 has X1 minus Y1 to the power of 2, plus X2 minus Y2 to the power of 2, and so on. So I'm really now starting to think of the Xi's and Yi's, X and Y, as two vectors in n-dimensional space;
what they're trying to compute is the square of the distance between these points. In particular, if they're at the same point, we want the answer to be 0; if they're at two different points, we want the answer to be greater than 0. Okay, does this question make sense?
So, very simple question: what's the solution? Should I send X1 squared plus X2 squared and so on? Should I send Y1 squared and so on? It turns out none of those will work. Actually, I didn't realize I already said this: there is no variation of any of these solutions that I could offer which would require less than n bits of communication. For those of you in the know, this is at least as hard as identity testing, and testing identity of n-bit strings takes at least n bits. X and Y are real numbers in this talk, okay, so it needs at least n bits. That doesn't mean that n is sufficient. In this talk I'm just thinking of a real number as something that I can send in one unit of time; I won't worry about precision issues, but if we cared about that, we'd do some slightly more refined things. So there is no simple solution. But wait, we had an additional tool, right? Randomness.
What if Alice and Bob have some common randomness? Here's a beautiful solution. It came up in a work of Alon, Matias, and Szegedy, but there were probably other variations that were already known, so this might not be the best reference for it, but it's certainly a reference for it. What they say is the following. Let Alice and Bob share random bits: plus-one-or-minus-one independent random bits, n of them, R1, R2, and so on up to Rn. And these are random, independent, independent of X, Y, everything else, okay? And both Alice and Bob know R1 to Rn.
If they know it, there is a very simple protocol. It doesn't involve sending one real number; it involves sending two, let's say two each way if both of them want to know the answer. Alice can send to Bob the sum of the squares of the Xi's: X1 squared plus X2 squared and so on up to Xn squared. And she also sends the following funny linear combination of the Xi's, the Xi's weighted by the Ri's: the summation R1 X1 plus R2 X2 and so on up to Rn Xn. Bob can do the same thing. Oh, these should be Y1 squared to Yn squared, sorry about that; I caught some other errors on the slide, but here the 2's are missing in the exponent: Y1 squared plus Y2 squared and so on up to Yn squared. And so it's completely symmetric. Alice and Bob could just exchange this piece of information simultaneously; we don't care, they are not waiting for the other person to send the information. A total of four real numbers are being sent,
and we are not worried about precision issues; not that they are not important, but I just don't want to talk about them. And here is a theorem which drives their work, and a lot of work since then: the expected value of S1 plus T1 minus twice S2 T2. Expected over what? Over R1, R2, and so on up to Rn; there is no other randomness here, those are the only random variables. I mean, I am assuming that R1 to Rn are independent of everything else, but I am not assuming X and Y are independent. In fact, one of the main concerns is: is X exactly equal to Y or not? If they were independent, the answer would almost always be no. No, I am making no assumptions about X and Y, but I am assuming the R1 to Rn are independent. This expectation, over only R1 to Rn, is exactly the quantity that we want.
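Here is a small simulation of the protocol, a sketch with toy inputs of my own rather than the authors' code. Each party sends two reals, (S1, S2) and (T1, T2), and averaging S1 + T1 - 2·S2·T2 over the shared ±1 randomness recovers the squared distance.

```python
import random

def two_reals(v, rs):
    """A party's message: sum of squares, and the +/-1-weighted linear combination."""
    return sum(x * x for x in v), sum(r * x for r, x in zip(rs, v))

def estimate_sq_distance(x, y, trials=20000):
    """Empirical average of S1 + T1 - 2*S2*T2 over the shared randomness R."""
    total = 0.0
    for _ in range(trials):
        rs = [random.choice((-1, 1)) for _ in x]  # shared random bits
        s1, s2 = two_reals(x, rs)                 # Alice's two numbers
        t1, t2 = two_reals(y, rs)                 # Bob's two numbers
        total += s1 + t1 - 2 * s2 * t2
    return total / trials

x, y = [1.0, 2.0, 3.0], [1.0, 0.0, 5.0]
print(estimate_sq_distance(x, y))               # ~ 8.0 on average
print(sum((a - b) ** 2 for a, b in zip(x, y)))  # exact squared distance: 8.0
```

The identity behind it: E[r_i r_j] is 1 if i = j and 0 otherwise, so E[S2·T2] = Σ x_i y_i, and S1 + T1 - 2·E[S2·T2] = Σ (x_i - y_i)².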
So very short communication allows us to actually compute this very complicated-looking function. It relies on randomness, yes, but this theorem, by the way, is immensely applied; in fact, the authors of this paper won the Gödel Prize for this particular result and the surrounding body of work, which included some definitional issues and so on. This theorem is immensely applied because there are many, many applications where we can concoct the randomness: we can put randomness into the devices that are communicating at the beginning, and they keep generating more and more pseudorandom things and keep using that to communicate, and most of the time it works. So this is not an outrageously bad assumption, and communication becomes very effective. So you might ask, does this solve my airline-ticket-buying problem? And the answer is not quite, but it brings us close.
One thing we will have to do in order to do this is to somehow express, you know, every flight and every user's preference as some real vector. Now this is not unreasonable; people who were at Jeff Dean's talk here earlier, you see that it's a very common thing that's done in machine learning. You translate, whoops, I made this mistake, did I get it back? All right, sorry. It's a very common practice in machine learning to express preferences and other pieces of data as long vectors in real space, where the distance really means what you want it to mean, okay? So it's not unreasonable, it's not unheard of, and distance roughly says how closely aligned this option of an airline ticket is to my preference. And you can actually use this, and as a result, with just about two rounds of communication, and, sorry, the length of the communication would be roughly twice the description of the final itinerary. So if you are buying a ticket, you're happy to exchange as many bits as it would take to get your itinerary; if you send that much information over, and the airline agent sends you back that much information, you have it all done.
That's quite remarkable. Of course, this is not the thing that we see. In fact, you know, this is only two rounds of communication; give me a protocol which can buy me an airline ticket with two rounds of communication, I'll be thrilled. We don't have it, because it's kind of hard to express user preferences as numbers, I mean, as a vector of numbers; I have to go look inside my head and figure a lot of things out that I don't even know about myself. But it's not so far out there in the future: you know, your Cortana, your Siri, your Alexa or something will probably be able to, using your cell phone and the data that you have out there, express your preferences as a vector of real numbers and be able to do this. It's not so far out of scope either. So we're not there, but does this mean that this very nice, cool protocol I described to you has no applications? No, it actually has a ton of applications. It's used all the time in things like genome matching, spam detection, image similarity, and so on.
It's a lovely protocol, extremely, extremely valuable, and it's solving a class of problems by very unique solutions, solutions that you might not have thought were possible. All right, so marching along, in the next five minutes or so I'll tell you a little bit about interaction plus errors,
and here I do want to talk about interaction, but let's think about a slightly different scenario than an airline-ticket-buying process. Say we're trying to maintain a file on a cloud server with concurrent updates from different people. So you may have been performing an update, I may be performing an update. I want to know what the status of the file is before I tell you what update I want, because I might be editing things which are not there, I might be deleting things which you've already deleted, and an asynchronous update could lead to major catastrophes. I want to focus on just the part of the interaction
that goes between one of the users, say Alice, and the server. And let's look at this interaction. The server says, this is what the file looks like; I say, okay, change these; and then it says, here's what the file looks like; and we keep doing this. Now imagine that in the first of these binary sequences I said, you know, leave the file as it is, but somehow, by an error, the server heard, okay, let me delete the first line. I need to find this error as soon as it has happened. If I don't find out about this error, I might start making catastrophic changes to the file. So if there are errors in interaction, we can't just do line-by-line correction.
Every one of my updates might be one bit, saying here's what needs to be changed. I cannot repeat, you know; for one bit, the only error-correcting code is to repeat, and I can repeat it as many times as we want, but as we know, repetitions do not work for long interactions; they have rate zero. So we will have to come up with a different scheme, and schemes which sort of look at the entire history of interactions and ask, have errors happened? And when errors happen, they have to say, oh, how much should we roll back? And this is a very delicate task. Errors must be detected, but we also have to worry about false alarms. It may look like there was an error way up there, but it actually happened only recently, and if we decide to rewind, oh my god, all this communication that we did is wasted. So we have to be very careful about this. It's a delicate task, it's novel, and I don't know how to build a solution based on just black-box use of Shannon. So it turns out there are lovely interactive coding schemes.
So, by the way, I hope I give credit: Leonard Schulman came up with this problem in 1992 and gave a lovely solution. Both the problem and the solutions were largely ignored; we appreciated it, but we didn't work on it. It's only now that there's a lot more activity in this field. And what we know is the following. Again, we can ask the question:
if every bit that I'm sending is flipped with probability p, what can I live with? What is the rate of information transmission? And it's remarkable: there is actually a linear rate at which I can pass information, in the same regime of p between 0 and one half, but the scaling is somewhat different: it's 1 minus the square root of 2p. And there's a whole series of works which has gotten us here, and it's a lovely series of works. I would say it's very surprising to me even given Shannon; I mean, Shannon was very surprising, and this is equally surprising even if you condition on knowledge of all the parts.
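As a rough way to see the 'somewhat different' scaling (my reading, not a statement from the talk): for small p,

\[
1 - H(p) \;\approx\; 1 - p \log_2 \tfrac{1}{p} \qquad \text{versus} \qquad 1 - \sqrt{2p},
\]

and since \(\sqrt{2p} \gg p \log_2(1/p)\) as \(p \to 0\), interaction pays a noticeably larger price than one-way transmission at the same noise level.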
All right, so that's about all for the other works. Now, briefly, about my own work in this direction. Oh, am I actually at about 40 seconds left? Okay, so let me very briefly tell you what we've been looking at lately. It's in the same model as the Yao model of communication: I want to think about people exchanging information,
but where they have large pieces of context built into it. When we talk to each other, how am I giving this talk? I'm assuming all of you know English. Can I actually define English? So it turns out there are a lot of questions that we can formulate around communication where I'm assuming that the user, and, you know, the two interacting players, share some very large common piece of information. When we do mathematical proofs, I assume that you're all familiar with high school mathematics. But what is high school mathematics? I don't know, I can't write it down. And if there's one line of that high school mathematics that you don't know, it's not that my proof fails. If I'm sending this information to you right now, if I'm speaking to you, I would have to write down a dictionary of English before I can say, this is what you know. But can I write down a precise description of English which you know? And if you are missing a few words of it, will you not understand the talk?
It's not the case. I try to build talks which are built on shared context and at the same time are robust to imperfectly shared context. So we've been trying to look at these kinds of questions a lot. It turns out that the Yao model is the right framework for studying this, and there are a lot of ways in which you can inject context into the Yao model. I won't try to tell you about all the details of it, and do feel free to come by and ask me about it, but it turns out that in many of these cases you can see a very similar phenomenon. If I ignored the context and asked how long the communication takes: excessively long. It would be as if I ignored the fact that you know English
and launched into this talk today; I could not have gotten anywhere close to the end of this talk by now. On the other hand, I cannot assume perfect knowledge of English either. But if I could assume perfect knowledge of English, I could have probably said a little bit more. So, being somewhere in between, I have to scale down my ambitions. I have to say a little bit less, so it costs a little bit more in communication, but I'm robust to a fairly wide audience of contexts. All of these things can actually be captured in theorems, and there are a fair number of theorems that we proved along these lines. I won't tell you exactly anything about them, except to flash you a collection of pictures: some very respectable people here, some not so respectable people here, but these are all collaborators, and there are some theorems that we have along the way. So I'll stop here, and thank you very much.