Decision Problems in Information Theory
Formal Metadata
Title: Decision Problems in Information Theory
Number of Parts: 123
License: CC Attribution 3.0 Germany: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI: 10.5446/49370
Language: English
Transcript: English (auto-generated)
00:10
Hello. Today I'm going to talk about decision problems in information theory. This is joint work with Mahmoud Abo Khamis, Phokion Kolaitis, and Hung Ngo. My name is Dan Suciu.
00:20
This talk is based on a paper that appears in ICALP 2020, and this is the talk presented at the conference. Information inequalities are inequalities between entropic terms, and they were called by Nicholas Pippenger in 1986 the laws of information theory. Today they have even more applications than they had back in 1986, and today we
00:44
require even more complex laws of information than linear information inequalities. Yet the most basic algorithmic questions remain open: it is open even today whether the validity of simple linear information inequalities is decidable. In this talk, my goal is to introduce
01:04
several decision problems in information theory and present some results on their algorithmic complexity, by placing them in the arithmetic hierarchy. A quick outline of my talk: I will give some basic definitions, then I will define decision problems
01:21
in information theory, briefly mention some applications, then describe the results and conclude. In this talk we will be concerned with finite probability spaces and random variables over such probability spaces. The entropy of a random variable is defined in the
01:40
standard way: it is a sum, over all outcomes of that random variable, of the probability of that outcome times the log of the probability of that outcome, with a minus sign. In this talk we are interested not in a single random variable, but in n jointly distributed random variables X1 up to Xn. For any subset of these random variables, we consider the entropy of that subset,
02:05
so therefore we have 2 to the power n entropic values or entropic terms, and we denote them as you can see here, for example H(X1, X4) or H(X, Y, Z). This means the entropy of the
02:21
joint distribution of the variables X, Y, and Z. When X and Y are sets of random variables, we denote their union by simply dropping the union symbol: we write XY for the union. We will often use these standard definitions. The conditional entropy, written H(Y | X),
02:44
is simply the difference of two entropic terms, the entropy of the union of all the variables minus the entropy of the variables on which we condition, and the conditional mutual information I(Y; Z | X) is a sum of four entropic terms, as shown here. The details are not very important for this talk.
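For reference, here is a minimal LaTeX rendering of these standard definitions, reconstructed from the spoken description (H(XY) denotes the entropy of the joint distribution, following the convention above):

```latex
H(X) = -\sum_{x} \Pr[X = x] \,\log \Pr[X = x]
\qquad \text{(entropy)}

H(Y \mid X) = H(XY) - H(X)
\qquad \text{(conditional entropy)}

I(Y; Z \mid X) = H(XY) + H(XZ) - H(XYZ) - H(X)
\qquad \text{(conditional mutual information)}
```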
03:03
Since there are 2 to the power n possible entropic values, we view them as a vector; conversely, given a vector, a boldface h with 2 to the power n dimensions, we call it entropic if there exist n
03:24
random variables such that each component of the vector is precisely the entropy of the corresponding subset of random variables. For that reason, because we view entropies as a vector, we will often denote them with lower case: we write lower case h of a subset
03:48
of random variables. The set of all entropic vectors is denoted Gamma_n^*, a notation introduced by Zhang and Yeung in the 90s. It is known that this set is not
04:06
topologically closed, and it is not convex. However, its topological closure, denoted with a bar over Gamma_n^*, is known to be convex. It is much better behaved, and the vectors inside this set are called almost entropic vectors. They are simply limits of sequences
04:27
of entropic vectors. It is known that for n greater than or equal to 3, the set of entropic vectors is strictly included in the set of almost entropic vectors. Okay, so with these basic definitions, we can now describe the topic of our talk today,
04:46
which are information inequalities. These are simple assertions that a linear combination of entropic terms is greater than or equal to zero. C denotes a vector of coefficients.
05:01
They are real numbers, and c dot h is the standard dot product. We say that the inequality is valid for entropic vectors, or Gamma_n^*-valid, if it holds for all entropic vectors. Similarly, we can refer to validity over the set of almost entropic vectors.
05:24
And because these inequalities are very simple linear expressions, the notion of validity over entropic vectors coincides with the notion of validity over almost entropic vectors. Good. So this justifies our first problem statement, which is the information inequality
05:46
problem, which is as follows. Given an information inequality with integer coefficients, decide whether it is valid. We insist on the coefficients being integers, because we want to use this as a decision problem.
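In symbols, the problem just stated can be sketched as follows (using the notation introduced above; the indexing by subsets of variables is an assumption about how the slide writes it):

```latex
\textsc{Information Inequality Problem:}\quad
\text{given } c \in \mathbb{Z}^{2^n}, \text{ decide whether }
c \cdot h \;=\; \sum_{\alpha \subseteq \{X_1, \dots, X_n\}} c_\alpha \, h(\alpha) \;\geq\; 0
\;\;\text{for all } h \in \Gamma_n^* .
```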
06:03
Okay, so to give you some examples of valid inequalities, here are the very well known Shannon basic inequalities. They are called monotonicity, which says that the entropy of more random variables is greater than or equal to the entropy of a subset of random variables,
06:23
and submodularity. These information inequalities are known to be valid. Their consequences are called Shannon inequalities, and it is decidable if a given linear inequality is a Shannon inequality. In other words, it is decidable if a given
06:44
linear inequality is a consequence of Shannon's basic inequalities. Here is an example. This is an inequality, which is a Shannon inequality, which means it is a consequence of these three, and therefore it is valid. So let's prove that this inequality is a consequence of
07:05
Shannon's basic inequalities. So we start with the terms on the left, and we apply submodularity to the first two terms. This allows us to replace them with two terms, one consisting of the union of the variables, which means X, Y, and Z, and one for the intersection of
07:25
the variables, which is just Y. Then we apply the basic inequalities again to these two terms, and we get X, Y, Z and the empty set, and the entropy of the empty set is zero. So this is a short proof that the inequality we have seen here is a Shannon inequality.
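The inequality on the slide is not reproduced in the transcript; a plausible reconstruction of a derivation with exactly this shape, assuming the inequality is H(XY) + H(YZ) >= H(XYZ), is:

```latex
H(XY) + H(YZ)
\;\overset{\text{submodularity}}{\geq}\;
H(XYZ) + H(Y)
\;\overset{\text{monotonicity}}{\geq}\;
H(XYZ) + H(\emptyset)
\;=\; H(XYZ).
```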
07:45
Good. So in essence, what Pippenger asked in 1986, whether the Shannon inequalities are complete for all information inequalities, is at its essence a question about decidability. If Shannon's inequalities were complete, then it would mean that we can check every information
08:05
inequality by checking if it is a consequence of the Shannon inequalities. However, in a surprising result in 1998, Zhang and Yeung gave the first example of an inequality that is valid but is not a Shannon inequality. We call it a non-Shannon
08:25
inequality. Moreover, in 2007, Matus proved an even more surprising result that there are infinitely many non-Shannon inequalities that are inequivalent, and this only requires four random variables, n equals 4. And this ruined any hope for obtaining decidability
08:46
by listing a finite set of inequalities that are complete for checking all linear inequalities in information theory. So even more interestingly, today we are facing applications
09:01
that require more complex constraints over information inequalities than the simple linear inequalities, and this is my topic next. I'm going to define the more general decision problems in information theory. So the most general form of decision problem that we are going to consider is that of Boolean
09:26
information constraints. A Boolean information constraint is given by a Boolean formula F. Here is a brief example to give you some intuition. This is a Boolean formula with four Boolean variables, and as many vectors of real coefficients, and the Boolean information
09:47
constraint is just the assertion that this Boolean function applied to the inequalities defined by these vectors of coefficients is true. It is this assertion written here.
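The formula on the slide is not shown in the transcript; schematically, with four coefficient vectors c_1, ..., c_4 and a Boolean function F on four variables, the constraint asserts (a sketch):

```latex
\forall h \in \Gamma_n^* :\quad
F\big( c_1 \cdot h \geq 0,\;\; c_2 \cdot h \geq 0,\;\; c_3 \cdot h \geq 0,\;\; c_4 \cdot h \geq 0 \big) = \text{true}.
```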
10:00
In this case, we distinguish between validity over entropic vectors and validity over almost entropic vectors, and there are cases where it is known that these two notions of validity differ. So the most general problem that we consider is that of Boolean information constraints, which says given such a Boolean information constraint where all the coefficients are
10:24
integers, check if it is valid for all entropic vectors, or check if it is valid for all almost entropic vectors. Here are a few types of Boolean information constraints, each of which has important applications. The simplest type is that of linear information inequalities,
10:45
or simply information inequalities, like the one we have seen before. Next we consider max information inequalities. These are of the form: the maximum of some expressions is greater than or equal to zero, or the maximum of some expressions is greater than or
11:02
equal to some other expression. These are equivalent to a Boolean disjunction of several inequalities, and they are a special case of a Boolean information constraint. More generally, we can consider conditional inequalities, which assert that if some inequalities (the preconditions) hold, then another inequality holds as well.
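Schematically, the two forms just described can be written as follows (a sketch in the notation above):

```latex
\text{(max inequality)}\quad
\forall h:\; \max\{ c_1 \cdot h, \dots, c_k \cdot h \} \geq c_0 \cdot h
\;\;\Longleftrightarrow\;\;
\forall h:\; \bigvee_{i=1}^{k} \big( (c_i - c_0) \cdot h \geq 0 \big)

\text{(conditional inequality)}\quad
\forall h:\; \big( c_1 \cdot h \geq 0 \,\wedge\, \dots \,\wedge\, c_k \cdot h \geq 0 \big) \;\Longrightarrow\; c_0 \cdot h \geq 0
```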
11:26
And finally, we consider general Boolean information constraints, which can have multiple preconditions and max inequalities as postconditions. All these types of Boolean information constraints have important applications,
11:44
and my goal next is to briefly describe these applications before we discuss the complexity of these types of Boolean information constraints. Okay, so one slide about applications. This only lists four applications, and I invite you to look at the paper for more.
12:06
The first I want to mention is a beautiful result by Chan from 2007, and he proved that information inequalities are computationally equivalent via one-to-one reductions to certain group theoretic inequalities.
12:21
And this connection is very direct. It's very one-to-one, and it is a beautiful correspondence between information inequalities and inequalities in group theory. What we proved very recently is that max information inequalities are computationally equivalent via many-one reductions to a problem that has
12:43
been studied intensively in database theory, namely the containment problem for conjunctive queries under bag semantics. And finally, I want to briefly mention that these Boolean information constraints have applications to secret sharing and to query evaluation over relational databases, and I invite you to check the paper.
13:07
So clearly Boolean information constraints have a wide range of applications, so what do we know about their complexity? This is going to be the topic of the next part of this talk, so let me describe the results that we have. And all we know is that we can place
13:28
some types of Boolean information constraints in the arithmetic hierarchy. This one slide is a brief review of the arithmetic hierarchy, also known as the Kleene-Mostowski hierarchy, which consists of families of subsets of
13:46
tuples of natural numbers defined as follows. At the base of the hierarchy are the families sigma 0 and pi 0, which consist of all the decidable sets. So every set in sigma 0
14:01
or, equivalently, in pi 0 is a decidable set. Sigma n and pi n are defined using n quantifier alternations: sigma starts with an existential quantifier and pi starts with a universal quantifier, followed by a decidable predicate B. So this is the standard arithmetic hierarchy.
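For reference, a standard way to write these definitions (B denotes a decidable predicate; this is the usual textbook formulation rather than the exact slide):

```latex
A \in \Sigma_n \iff \exists \text{ decidable } B:\;
x \in A \Leftrightarrow \exists y_1 \,\forall y_2\, \exists y_3 \cdots Q_n y_n\; B(x, y_1, \dots, y_n)

A \in \Pi_n \iff \exists \text{ decidable } B:\;
x \in A \Leftrightarrow \forall y_1 \,\exists y_2\, \forall y_3 \cdots \overline{Q}_n y_n\; B(x, y_1, \dots, y_n)
```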
14:23
It is known to be strict, and each level except the base ones contains many undecidable sets, that is, undecidable problems. Good. So with this definition in mind, let me describe the first of three results that I want to mention. And the first result is
14:45
actually a folklore theorem that I view as a warm-up. It concerns the simplest information inequalities, which are linear information inequalities. To give you a sense of why they are hard, here is an example of a non-Shannon inequality taken from a
15:06
book by Yeung on information theory. Recall that each conditional mutual information, like this term here, is simply a linear expression of four entropic terms. So this
15:22
is just a linear inequality over entropic terms, asserting that some linear combination is greater than or equal to 0. It is valid, and it is not a consequence of the Shannon inequalities. We do not know how to prove such inequalities in general, except by using ingenuity which is
15:40
specialized to each such inequality. However, it is relatively easy to refute them, by simply enumerating over all finite probability spaces with rational probabilities. And this brings us to the first theorem: checking the validity of such an information inequality is in pi 1,
16:00
the first level of the arithmetic hierarchy. For the proof, we enumerate over all finite probability spaces with rational probabilities. If any of them invalidates the inequality, then we return false; otherwise we continue forever with the next probability space.
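A minimal Python sketch of the refutation step inside this enumeration: given a candidate finite distribution with rational probabilities, compute its entropic vector and check whether it violates the inequality. The function names (`entropic_vector`, `violates`) and the encoding of the inequality as a dictionary of coefficients are illustrative choices, not from the paper.

```python
from fractions import Fraction
from itertools import chain, combinations
from math import log2

def marginal(dist, subset):
    """Marginal distribution of the variables in `subset`.
    `dist` maps outcome tuples (x1, ..., xn) to probabilities."""
    marg = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in sorted(subset))
        marg[key] = marg.get(key, 0) + p
    return marg

def entropy(dist):
    """Shannon entropy H = -sum p log2 p (terms with p = 0 are skipped)."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def entropic_vector(dist, n):
    """The entropies h(alpha), one per nonempty subset alpha of the n variables."""
    subsets = chain.from_iterable(combinations(range(n), k) for k in range(1, n + 1))
    return {alpha: entropy(marginal(dist, alpha)) for alpha in subsets}

def violates(coeffs, dist, n, eps=1e-12):
    """True if the candidate distribution strictly violates sum_alpha c_alpha * h(alpha) >= 0."""
    h = entropic_vector(dist, n)
    return sum(c * h[alpha] for alpha, c in coeffs.items()) < -eps

# Example: try to refute submodularity H(X) + H(Y) - H(XY) >= 0 (it is valid, so this fails).
dist = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}   # X = Y, uniform
coeffs = {(0,): 1, (1,): 1, (0, 1): -1}                   # c_X = 1, c_Y = 1, c_XY = -1
print(violates(coeffs, dist, n=2))                         # False: no refutation found
```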
16:23
Now, the key piece of the argument that enables this simple theorem is the following observation. If there exists a finite probability space on which this inequality fails, in which we have this strict inequality in the opposite direction,
16:40
even if that finite probability space has irrational probabilities, there exists an entire neighborhood of those probability values such that, for any choice of probability values in that neighborhood, we still have a refutation of this inequality. And this means that we can also find a
17:05
probability space with rational probabilities. And this is something that we enumerate in this process over here. However, the simple argument fails in the second case, when we consider conditional information inequalities. Here is an example of a conditional
17:25
information inequality taken from a paper by Kaced and Romashchenko. It asserts that if these four conditional mutual informations are equal to zero, then a fifth conditional mutual information is equal to zero as well. Now, in our formalism, we discussed inequalities
17:46
as opposed to equalities, but it is easy to convert each of these assertions into an inequality because it is known that the mutual information is greater than or equal to zero, and therefore equality to zero is equivalent to minus the mutual information greater than
18:03
or equal to zero. For such inequalities, the previous argument no longer holds. If we have a finite probability space, whose probabilities might be irrational, that forms a violation of this implication, then that probability space will have to make all
18:22
these mutual informations exactly equal to zero. And in general, it is no longer the case that this continues to hold in an entire neighborhood of those probability values. Nevertheless, we can prove that, with an additional condition on the preconditions of
18:42
such a conditional inequality, the problem is in pi 3, the third level of the arithmetic hierarchy. And more than the result itself, the proof technique is actually even more interesting, and this is what I'm going to show you next. The proof technique is a method that converts
19:03
conditional information inequalities into simple linear inequalities. Let me describe how it goes. Consider first the conditional information inequality in condition 1. It asserts that if the antecedent linear inequalities hold, then so does the consequent. And consider condition 2,
19:25
which is obtained by asserting that the consequent is greater than some positive linear combination of the antecedents. Here, the lambdas are any positive coefficients. Now, it is easy to check that if condition 2 holds, if this linear inequality holds,
19:47
then the conditional inequality holds as well, simply because if these preconditions are greater than or equal to 0, then by condition 2, c dot h is also greater than or equal to 0.
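Reconstructed from the spoken description (the exact slide notation is not in the transcript), conditions 1 and 2 have roughly this form, where c_1, ..., c_k are the antecedents and c is the consequent:

```latex
(1)\quad \forall h \in \overline{\Gamma_n^*}:\;
\big( c_1 \cdot h \geq 0 \;\wedge\; \dots \;\wedge\; c_k \cdot h \geq 0 \big)
\;\Longrightarrow\; c \cdot h \geq 0

(2)\quad \exists \lambda_1, \dots, \lambda_k \geq 0 \;\;
\forall h \in \overline{\Gamma_n^*}:\;
c \cdot h \;\geq\; \lambda_1 (c_1 \cdot h) + \dots + \lambda_k (c_k \cdot h)
```

Condition 2 implies condition 1 because, when every antecedent c_i dot h is nonnegative, the right-hand side of 2 is nonnegative.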
20:02
In a surprising result, however, Kaced and Romashchenko proved that the converse fails. It actually fails even on the example that I showed you on the previous slide, which is from their paper. What we proved, however, is that the converse does hold if we allow an error term on the left-hand side of this inequality.
20:24
More precisely, we considered this condition, which is an alternation of three quantifiers. It says: for any epsilon greater than 0, there exist positive coefficients such that, for any almost entropic vector, the left-hand side of inequality 2, plus a small error term multiplied by epsilon,
20:46
is greater than this linear combination. It is easy to see that condition 3 continues to imply condition 1, and what we proved is that the converse also holds: if the conditional inequality holds, then this inequality with an error term holds as well.
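In the same sketch notation, condition 3 reads roughly as follows; the choice of h(X_1 ... X_n) as the error term is my reading of "a small error term multiplied by epsilon" and should be checked against the paper:

```latex
(3)\quad \forall \varepsilon > 0 \;\;
\exists \lambda_1, \dots, \lambda_k \geq 0 \;\;
\forall h \in \overline{\Gamma_n^*}:\;
c \cdot h \;+\; \varepsilon \, h(X_1 \cdots X_n)
\;\geq\; \lambda_1 (c_1 \cdot h) + \dots + \lambda_k (c_k \cdot h)
```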
21:05
So for this result, it is critical that here we take the set of almost entropic vectors. This set needs to be a closed convex cone, which is the case for the set of almost entropic vectors,
21:22
but not for the set of entropic vectors; we don't know whether the result holds there. Now, if in addition all the preconditions are tight, which means that the inequality in the opposite direction is also valid, then we can further increase the lambdas here, because these terms are always less than or equal to 0. So we can increase them to obtain natural numbers,
21:45
and this places this problem in pi 3, because it is an alternation of for-all, exists, and for-all. Okay, the third problem I want to discuss is that of conditional independence statements between random variables. This doesn't need the language of
22:05
entropy. These are simply statements of the form: two random variables, Y and Z, are conditionally independent given the value of a random variable X. Such assertions can be written down
22:25
as equalities over marginal probabilities, as I wrote here. We need to write down such an equality for every possible outcome x, y, z of these three random variables, capital X, capital Y, and capital Z. The conditional implication problem has been
22:44
studied intensively in the literature. It asks us to check whether, given some conditional independences between some sets of random variables, another conditional independence holds as well. Now, a conditional independence holds if and only if the conditional mutual
23:02
information is equal to zero, so therefore the conditional implication problem is really a special case of a conditional equality. However, there are two differences from our previous discussion. First of all, here we are really interested in validity over entropic vectors, not over
23:22
almost entropic vectors. And moreover, here both the precondition and the consequent can be written using only multiplication and addition. Addition is needed to write marginal probabilities like p of x. Hence, there is no need for the logarithm.
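Concretely, the conditional independence of Y and Z given X corresponds to the following family of equalities over marginal probabilities, one per outcome, which indeed uses only addition (inside the marginals) and multiplication:

```latex
\Pr[X = x,\, Y = y,\, Z = z] \cdot \Pr[X = x]
\;=\;
\Pr[X = x,\, Y = y] \cdot \Pr[X = x,\, Z = z]
\qquad \text{for all outcomes } x, y, z.
```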
23:45
Using this, and the fact that the theory of the reals with addition and multiplication is decidable, a well-known result by Tarski, we can prove that the conditional implication problem is in pi 1, the first level of the arithmetic hierarchy.
24:07
Okay, so this is all the time we have, so let me briefly mention a few words in conclusion. Here is a brief summary of our results. As you can see, there are many boxes that are still
24:20
open. And I want to mention three points in conclusion. First of all, the question of decidability is related to another question about the set of almost entropic vectors, namely whether this closed set is semi-algebraic. Please see the paper for more discussion.
24:41
Another interesting question is whether max information inequalities are computationally equivalent to linear inequalities; we have some evidence in the paper that they might be. And finally, we want to mention that, in order to extend our result based on Tarski's theorem to conditional inequalities,
25:05
one needs to know whether the theory of reals with addition, multiplication, and exponentiation is decidable. And this is a major open problem in model theory today.