
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra


Formal Metadata

Title
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra
Series Title
Number of Parts
25
Author
License
CC Attribution 3.0 Germany:
You may use, modify, and reproduce the work or its content in unchanged or modified form for any legal purpose, distribute it, and make it publicly accessible, provided you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
We study the expressive power of the Lara language - a recently proposed unified model for expressing relational and linear algebra operations - both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. We start by showing that Lara is expressively complete with respect to first-order logic with aggregation. Since Lara is parameterized by a set of user-defined functions that allow values in tables to be transformed, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity that queries are required to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning pipelines. This language is also local, and thus cannot express operations such as matrix inversion that exhibit a recursive behavior. To express convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and makes the expressive power of the language much more difficult to understand. In particular, although under complexity-theoretic assumptions the resulting language still cannot express matrix inversion, a proof of this fact without such assumptions seems challenging to obtain.
Transcript: English (automatically generated)
Welcome everybody. Hello, I am Pablo Barceló. This is our talk on the expressiveness of LARA, a unified language for linear and relational algebra, joint work with Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. Current data management, data science, and machine learning applications
demand applying methods from both relational algebra and linear algebra: relational algebra over relational tables to extract, transform, and load the data, and linear algebra functionalities to perform analytical tasks, in this case over matrices or tensors.
So there has been a natural call for an algebra that captures the functionalities needed in both worlds, and also, in the machine learning community, for a high-level query language, an API, for handling tensors properly, and for developing
optimization techniques that are multi-system, that is, that apply to both worlds together. LARA is a proposal for an algebra that merges functionalities from both linear and relational algebra. It was proposed by Hutchison, Howe, and
Suciu at a SIGMOD 2017 workshop. Here is a brief description of what LARA does. The data model of LARA is associative tables, in which we have attributes that are keys, and
we also have values. Each tuple of keys is associated with a tuple of values, and keys are really keys in the database sense; that is, no two tuples in an associative table have the same keys. The syntax of the algebra is very minimalist: it consists of four constructs. The first one says I can refer to a table.
I can take the join of two tables with some binary operator that helps us resolve conflicts due to the key constraints. I can take the union of two tables, written as the join symbol turned upside down, with an aggregate operator. Or I can extend a table
with an extension function, which essentially maps values to different values. And it is these extension functions that determine the expressive power of this language.
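To make these four constructs concrete, here is a minimal, illustrative Python sketch of associative tables and the three operators. It is not the authors' implementation; the dict-backed representation, the restriction to a single value per key tuple, and all names and signatures are assumptions made for illustration.

```python
# Minimal, illustrative sketch of Lara's data model and operators, assuming a
# simplified setting (dict-backed tables, a single value per key tuple).
# This is NOT the authors' implementation; names and signatures are made up.

class AssocTable:
    """An associative table: a tuple of key attributes plus a mapping from
    key tuples to values. Using a dict enforces the key constraint."""
    def __init__(self, key_attrs, rows):
        self.key_attrs = tuple(key_attrs)   # e.g. ("i", "j")
        self.rows = dict(rows)              # e.g. {(0, 0): 1.0, ...}

def join(A, B, op):
    """Join on the common key attributes; `op` resolves the two values."""
    common = [k for k in A.key_attrs if k in B.key_attrs]
    out_attrs = A.key_attrs + tuple(k for k in B.key_attrs if k not in A.key_attrs)
    out = {}
    for ka, va in A.rows.items():
        a = dict(zip(A.key_attrs, ka))
        for kb, vb in B.rows.items():
            b = dict(zip(B.key_attrs, kb))
            if all(a[c] == b[c] for c in common):   # keys agree on shared attributes
                merged = {**a, **b}
                out[tuple(merged[k] for k in out_attrs)] = op(va, vb)
    return AssocTable(out_attrs, out)

def union(A, B, agg):
    """Project both tables onto the common keys; `agg` resolves key conflicts."""
    common = tuple(k for k in A.key_attrs if k in B.key_attrs)
    groups = {}
    for T in (A, B):
        for kt, v in T.rows.items():
            row = dict(zip(T.key_attrs, kt))
            groups.setdefault(tuple(row[c] for c in common), []).append(v)
    return AssocTable(common, {k: agg(vs) for k, vs in groups.items()})

def ext(A, f):
    """Apply an extension function f to every value (the fourth construct,
    simply referring to a table, is AssocTable itself)."""
    return AssocTable(A.key_attrs, {k: f(v) for k, v in A.rows.items()})
```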
Although several theoretical properties of what this language can express were studied in the original paper, several questions of interest were left open. In particular, how powerful is LARA for capturing these two languages, linear algebra and relational algebra?
Which kinds of machine learning applications or functionalities does it really subsume? And what is its expressive power in terms of the usual query languages that we know from the database context? In our work, we study the expressiveness of LARA by using the yardstick of first-order logic with aggregation plus some built-in predicates
that we will have to specify. And we will also compare its relative expressiveness in terms of some common machine learning operations. So, let me briefly explain the semantics of LARA.
We can take the join of two tables as usual, joining over the common key attributes. Here we have key attributes i and j, here j and k. So, the first tuple here, 0, 0 in A, joins with the first tuple 0, 0 in B, and this gives us the first tuple of
the join, 0, 0, 0. The question is which value we put: we have a value coming from table A and a value coming from table B, and the binary operator in the join tells us how to resolve the conflict, by taking the multiplication. So, it will be a 1.
Union is similar, but when taking the union of two tables, we project only onto those key attributes that are common. In this case, we only keep j. We put all tuples
together; we just take the union of all these tuples under the projection. But now the projection can violate the key constraint, and the question is how we resolve it: this time by using the aggregate operator in the union symbol, which is addition. So, we will put together all tuples
with key 0, and the value that we will use to resolve the conflict will be a 3. And finally, the extension, which is a very important feature of the language. What it does is map an associative table by changing the values according to some function. In this case, it tells us just to divide the value of x by y and return that.
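Continuing the sketch above, here is a hypothetical run of the three operators. The concrete numbers from the slides are not reproduced here, so the tables, the halving extension function, and the printed values are made up for illustration.

```python
from operator import mul

# Hypothetical associative tables: A has key attributes (i, j), B has (j, k).
A = AssocTable(("i", "j"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0})
B = AssocTable(("j", "k"), {(0, 0): 1.0, (1, 0): 0.5})

C = join(A, B, mul)          # keys (i, j, k); value conflicts resolved by multiplication
D = union(A, B, sum)         # projected onto the common key j; conflicts resolved by addition
E = ext(A, lambda v: v / 2)  # an extension function rewriting every value

print(C.rows)  # {(0, 0, 0): 1.0, (0, 1, 0): 1.0, (1, 0, 0): 3.0}
print(D.rows)  # {(0,): 5.0, (1,): 2.5}
print(E.rows)  # {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 1.5}
```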
And the real question here is what these extension functions can define. For instance, are they allowed to compare keys or not? This is crucial for understanding the expressive power of the language. Here is a more sophisticated example of how Lara can express
some important architectures, the self-attention mechanism commonly used in machine learning today, but I will not go into details due to lack of time. So, what is the yardstick logic that we are going to use? First-order logic with aggregation.
It is a two-sorted version of first-order logic, very well known, in which we have both key and value variables and elements, plus aggregation. Here, for instance, we have a formula that outputs all keys k such that,
together with some other key k prime, they define a tuple of table R with some value b, and that then outputs the value b prime, which is the maximum of all values b associated with k in this way.
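Written out, the formula just described looks roughly as follows (the slide's exact syntax is not reproduced; this is one common way of writing aggregate terms in first-order logic with aggregation):

$$ \varphi(k, v) \;\equiv\; \exists k'\,\exists b\; R(k, k', b) \;\wedge\; v = \mathrm{max}\, b'.\big[\, \exists k''\; R(k, k'', b') \,\big] $$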
We are now going to compare the expressive power of Lara in terms of first-order logic with aggregation. For this we need a couple of assumptions. The syntax of Lara is very general, so we need to
consider some instantiations of it to obtain this result. First of all, we assume that the domain of keys equals the domain of values, which is a common assumption made by Lara and which allows
it to compare between the two sorts; and also that there are two distinguished constants, 0 and 1, that we can refer to, which will help us match the semantics of the two languages. It should not be too surprising that Lara expressions with extension functions from a set Omega can be translated into
first-order logic with aggregation, if we allow built-in predicates for the extension functions. And since Lara is really a positive language, with no negation involved, the translation goes into the negation-free fragment of first-order logic with aggregation.
Now, for the opposite direction, embedding first-order logic with aggregation into Lara, which is more interesting, we need further assumptions. First of all, as usual when we deal with first-order logic versus an algebra, we need to assume that the language is safe,
as is standard. We also need to assume that some specific extension functions are available in Lara: for instance, that we can copy an attribute into a fresh one, that we can add an attribute or a value in which all elements are zero, that we can check whether two keys are equal or not, and so on.
And also, very importantly, that we use a key-constraint semantics. The semantics of first-order logic does not assume the presence of keys, so we have to impose them: when we take, for instance, the union of two tables, the result might no longer be an associative table,
and we have to explain how to resolve the key violations with some aggregate over the values. Under these conditions, we can prove that safe first-order with aggregation formulas under the key-constraint semantics, with access to some built-in predicates that represent extension functions, can be translated into Lara expressions,
assuming that the specific extension functions we mentioned, like copy, add, and so on, are available. The important observation here is that expressions in Lara are positive, but negation can be encoded in the language with particular extension functions such as the ones we specified.
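As a rough illustration of that observation, continuing the earlier sketch (this is an assumption about the flavor of such an encoding, not the paper's exact construction):

```python
# Illustrative only: over a 0/1-valued table, complementing the values is a
# single application of ext with a suitable extension function.
A01 = AssocTable(("i",), {(0,): 1, (1,): 0, (2,): 1})   # hypothetical Boolean table
Not_A01 = ext(A01, lambda v: 1 - v)                     # values become 0, 1, 0
```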
Now, what about the relative expressiveness of Lara, what it can express in terms of some common machine learning operations? This really depends on which extension functions are permitted; more particularly, it depends on the comparisons allowed on keys and values. What is easy to put into the language is comparisons on values: these
correspond essentially to the numerical sort, and thus we can add arbitrary numerical comparisons on top of it. What is hard is when we want to compare the non-numerical sort of keys, because adding comparisons over them corresponds, for instance, to adding a built-in
linear order or addition over the domain of a structure, which is a feature that we know from finite model theory complicates the study of the language: it extends the expressive power, but complicates the study. So we start with a simple language called tame Lara, in which we do not allow comparing keys, only comparing
values with arbitrary numerical predicates; keys can be compared only with respect to equality. This language is important because it has two properties: it is generic with respect to keys, and it can be translated into an extension of first-order logic with aggregation without any built-in
predicates other than arbitrary numerical predicates. So what about, for instance, an important operation like convolution, which is an extended version of the dot product for matrices and is very popular in machine learning?
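For reference, genericity with respect to keys can be stated as follows (a standard formulation; the paper's exact definition may differ in details): for every associative table A and every permutation pi of the key domain,

$$ Q(\pi(A)) \;=\; \pi(Q(A)), $$

where \(\pi(A)\) renames every key of A according to \(\pi\) and leaves all values untouched.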
Convolution is not expressible in this tame Lara. Why? Simply because convolution is not a generic query: it really depends on the order in which we put the elements in the matrix, and tame Lara is generic. What do we need to express convolution, then?
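To see why order matters, recall one standard (cross-correlation-style) form of two-dimensional convolution used in machine learning; the exact variant on the slides is not reproduced here:

$$ (A \ast K)(i, j) \;=\; \sum_{a}\sum_{b} A(i + a,\, j + b)\cdot K(a, b). $$

The index arithmetic i + a, j + b treats keys as ordered positions rather than abstract labels, which is precisely what genericity with respect to keys rules out.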
Convolution is expressible in tame Lara if we extend the comparisons on keys with a linear order, that is, if we can compare whether one key is smaller than another. This language, which is quite powerful, we denote tame Lara with an order, and it is sufficient to express convolution. Now, what about inversion?
Inversion is not expressible in tame Lara either, and the reason is that inversion is not a local query, while tame Lara can be embedded in a fragment of first-order logic with aggregation which is local. Now, can inversion be expressed in the extension with an order on the domain?
This is very difficult to answer, but at least under complexity-theoretic assumptions it cannot: tame Lara with a linear order does not express inverses either, and it seems that some form of recursion is needed for this.
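To give some intuition for that "form of recursion" (a plain Python sketch, not a Lara expression): a textbook way to invert a matrix, Gauss-Jordan elimination, performs one sweep per dimension, so the amount of iteration grows with the input size.

```python
def invert(M):
    """Gauss-Jordan elimination on an n x n list-of-lists matrix.
    No pivoting or numerical care; it only illustrates that inversion
    iterates over the whole dimension n."""
    n = len(M)
    # Augment M with the identity matrix on the right.
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):                      # one sweep per dimension
        pivot = A[col][col]
        A[col] = [x / pivot for x in A[col]]  # normalize the pivot row
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]             # the right half is the inverse

print(invert([[2.0, 0.0], [0.0, 4.0]]))       # [[0.5, 0.0], [0.0, 0.25]]
```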
So, as concluding remarks: Lara can express a vast array of operations, but sometimes it requires complicated features like this linear order on the domain, and we believe that there is still room for a simpler language that provides a simpler way to express these and many other operations. Many theory- and complexity-related questions also remain open. Many thanks.