
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra


Formal Metadata

Title
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra
Series Title
Number of Parts
25
Author
License
CC Attribution 3.0 Germany:
You may use, modify, and reproduce the work or its content in unchanged or modified form for any legal purpose, distribute it, and make it publicly accessible, provided you credit the author/rights holder in the manner specified by them.
Identifiers
Publisher
Publication Year
Language

Content Metadata

Subject Area
Genre
Abstract
We study the expressive power of the Lara language - a recently proposed unified model for expressing relational and linear algebra operations - both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. We start by showing that Lara is expressively complete with respect to first-order logic with aggregation. Since Lara is parameterized by a set of user-defined functions that allow values in tables to be transformed, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity that queries are required to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning pipelines. This language is also local, and thus cannot express operations such as matrix inversion that exhibit a recursive behavior. To express convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and makes the expressive power of the language much more difficult to understand. In particular, although under complexity-theoretic assumptions the resulting language still cannot express matrix inversion, a proof of this fact without such assumptions seems challenging to obtain.
Transcript: English (automatically generated)
Welcome everybody. Hello, I am Pablo Barceló. This is our talk on the expressiveness of LARA, a unified language for linear and relational algebra, joint work with Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. Current data management, data science, and machine learning applications
demand applying methods from both relational algebra and linear algebra: relational algebra over relational tables to extract, transform, and load the data, and linear algebra functionalities to perform analytical tasks, in this case over matrices or tensors.
So there has been a natural call for an algebra that captures the functionalities needed in both worlds, and also, in the machine learning community, for a high-level query language, an API, for handling tensors properly, and for developing
optimization techniques that are multi-system, that is, that apply to both worlds together. LARA is a proposal for an algebra that merges functionalities from both linear and relational algebra. It was proposed by Hutchison, Howe, and
Suciu at a SIGMOD 2017 workshop. Here is a brief description of what LARA does. The data model of LARA is associative tables, in which we have attributes that are keys, and
we also have values. Each tuple of keys is associated with a tuple of values, and keys are really keys in the database sense; that is, no two tuples in an associative table have the same keys. The syntax of the algebra is very minimalist: it consists of four constructs. The first one says I can refer to a table.
I can take the join of two tables with some binary operator that helps us resolve conflicts due to the key constraints. I can take the union of two tables, written as the join symbol turned upside down, with an aggregate operator. Or I can extend a table
with an extension function, which essentially maps values to different values. And it is these extension functions that determine the expressive power of this language.
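To make these four constructs concrete, here is a minimal, illustrative Python sketch of associative tables and the three operators. It is not the authors' implementation; the dict-backed representation, the restriction to a single value per key tuple, and all names and signatures are assumptions made for illustration.

```python
# Minimal, illustrative sketch of Lara's data model and operators, assuming a
# simplified setting (dict-backed tables, a single value per key tuple).
# This is NOT the authors' implementation; names and signatures are made up.

class AssocTable:
    """An associative table: a tuple of key attributes plus a mapping from
    key tuples to values. Using a dict enforces the key constraint."""
    def __init__(self, key_attrs, rows):
        self.key_attrs = tuple(key_attrs)   # e.g. ("i", "j")
        self.rows = dict(rows)              # e.g. {(0, 0): 1.0, ...}

def join(A, B, op):
    """Join on the common key attributes; `op` resolves the two values."""
    common = [k for k in A.key_attrs if k in B.key_attrs]
    out_attrs = A.key_attrs + tuple(k for k in B.key_attrs if k not in A.key_attrs)
    out = {}
    for ka, va in A.rows.items():
        a = dict(zip(A.key_attrs, ka))
        for kb, vb in B.rows.items():
            b = dict(zip(B.key_attrs, kb))
            if all(a[c] == b[c] for c in common):   # keys agree on shared attributes
                merged = {**a, **b}
                out[tuple(merged[k] for k in out_attrs)] = op(va, vb)
    return AssocTable(out_attrs, out)

def union(A, B, agg):
    """Project both tables onto the common keys; `agg` resolves key conflicts."""
    common = tuple(k for k in A.key_attrs if k in B.key_attrs)
    groups = {}
    for T in (A, B):
        for kt, v in T.rows.items():
            row = dict(zip(T.key_attrs, kt))
            groups.setdefault(tuple(row[c] for c in common), []).append(v)
    return AssocTable(common, {k: agg(vs) for k, vs in groups.items()})

def ext(A, f):
    """Apply an extension function f to every value (the fourth construct,
    simply referring to a table, is AssocTable itself)."""
    return AssocTable(A.key_attrs, {k: f(v) for k, v in A.rows.items()})
```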
Although several theoretical properties of what this language can express were studied in the original paper, several questions of interest were left open. In particular, how powerful is LARA for capturing these two languages, linear algebra and relational algebra?
Which kinds of machine learning applications or functionalities does it really subsume? And what is its expressive power in terms of the usual query languages that we know from the database context? In our work, we study the expressiveness of LARA by using the yardstick of first-order logic with aggregation plus some built-in predicates
that we will have to specify. And we will also compare its relative expressiveness in terms of some common machine learning operations. So, let me briefly explain the semantics of LARA.
We can take the join of two tables as usual, joining over the common key attributes. Here we have key attributes i and j, here j and k. So, the first tuple here, 0, 0 in A, joins with the first tuple 0, 0 in B, and this gives us the first tuple of
the join, 0, 0, 0. The question is which value we put: we have a value coming from table A and a value coming from table B, and the binary operator in the join tells us how to resolve the conflict, by taking the multiplication. So, it will be a 1.
Union is similar, but when taking the union of two tables, we project only onto those key attributes that are common. In this case, we only keep j. We put all tuples
together; we just take the union of all these tuples under the projection. But now the projection can violate the key constraint, and the question is how we resolve it: this time by using the aggregate operator in the union symbol, which is addition. So, we will put together all tuples
with key 0, and the value that we will use to resolve the conflict will be a 3. And finally, the extension, which is a very important feature of the language. What it does is map an associative table by changing the values according to some function. In this case, it tells us just to divide the value of x by y and return that.
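Continuing the sketch above, here is a hypothetical run of the three operators. The concrete numbers from the slides are not reproduced here, so the tables, the halving extension function, and the printed values are made up for illustration.

```python
from operator import mul

# Hypothetical associative tables: A has key attributes (i, j), B has (j, k).
A = AssocTable(("i", "j"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0})
B = AssocTable(("j", "k"), {(0, 0): 1.0, (1, 0): 0.5})

C = join(A, B, mul)          # keys (i, j, k); value conflicts resolved by multiplication
D = union(A, B, sum)         # projected onto the common key j; conflicts resolved by addition
E = ext(A, lambda v: v / 2)  # an extension function rewriting every value

print(C.rows)  # {(0, 0, 0): 1.0, (0, 1, 0): 1.0, (1, 0, 0): 3.0}
print(D.rows)  # {(0,): 5.0, (1,): 2.5}
print(E.rows)  # {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 1.5}
```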
And the real question here is what these extension functions can define. For instance, are they allowed to compare keys or not? This is crucial for understanding the expressive power of the language. Here is a more sophisticated example of how Lara can express
some important architectures, the self-attention mechanism commonly used in machine learning today, but I will not go into details due to lack of time. So, what is the yardstick logic that we are going to use? First-order logic with aggregation.
It is a two-sorted version of first-order logic, very well known, in which we have both key and value variables and elements, plus aggregation. Here, for instance, we have a formula that outputs all keys k such that,
together with some other key k prime, they define a tuple of table R with some value b, and that then outputs the value b prime, which is the maximum of all values b associated with k in this way.
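Written out, the formula just described looks roughly as follows (the slide's exact syntax is not reproduced; this is one common way of writing aggregate terms in first-order logic with aggregation):

$$ \varphi(k, v) \;\equiv\; \exists k'\,\exists b\; R(k, k', b) \;\wedge\; v = \mathrm{max}\, b'.\big[\, \exists k''\; R(k, k'', b') \,\big] $$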
We are now going to compare the expressive power of Lara in terms of first-order logic with aggregation. For this we need a couple of assumptions. The syntax of Lara is very general, so we need to
consider some instantiations of it to obtain this result. First of all, we assume that the domain of keys equals the domain of values, which is a common assumption made by Lara and which allows
it to compare between the two sorts; and also that there are two distinguished constants, 0 and 1, that we can refer to, which will help us match the semantics of the two languages. It should not be too surprising that Lara expressions with extension functions from a set Omega can be translated into
first-order logic with aggregation, if we allow built-in predicates for the extension functions. And since Lara is really a positive language, with no negation involved, the translation goes into the negation-free fragment of first-order logic with aggregation.
Now, for the opposite direction, embedding first-order logic with aggregation into Lara, which is more interesting, we need further assumptions. First of all, as usual when we deal with first-order logic versus an algebra, we need to assume that the language is safe,
as is standard. We also need to assume that some specific extension functions are available in Lara: for instance, that we can copy an attribute into a fresh one, that we can add an attribute or a value in which all elements are zero, that we can check whether two keys are equal or not, and so on.
And also, very importantly, that we use a key-constraint semantics. The semantics of first-order logic does not assume the presence of keys, so we have to impose them: when we take, for instance, the union of two tables, the result might no longer be an associative table,
and we have to explain how to resolve the key violations with some aggregate over the values. Under these conditions, we can prove that safe first-order with aggregation formulas under the key-constraint semantics, with access to some built-in predicates that represent extension functions, can be translated into Lara expressions,
assuming that the specific extension functions we mentioned, like copy, add, and so on, are available. The important observation here is that expressions in Lara are positive, but negation can be encoded in the language with particular extension functions such as the ones we specified.
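As a rough illustration of that observation, continuing the earlier sketch (this is an assumption about the flavor of such an encoding, not the paper's exact construction):

```python
# Illustrative only: over a 0/1-valued table, complementing the values is a
# single application of ext with a suitable extension function.
A01 = AssocTable(("i",), {(0,): 1, (1,): 0, (2,): 1})   # hypothetical Boolean table
Not_A01 = ext(A01, lambda v: 1 - v)                     # values become 0, 1, 0
```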
Now, what about the relative expressiveness of Lara, what it can express in terms of some common machine learning operations? This really depends on which extension functions are permitted; more particularly, it depends on the comparisons allowed on keys and values. What is easy to put into the language is comparisons on values: these
correspond essentially to the numerical sort, and thus we can add arbitrary numerical comparisons on top of it. What is hard is when we want to compare the non-numerical sort of keys, because adding comparisons over them corresponds, for instance, to adding a built-in
linear order or addition over the domain of a structure, which is a feature that we know from finite model theory complicates the study of the language: it extends the expressive power, but complicates the study. So we start with a simple language called tame Lara, in which we do not allow comparing keys, only comparing
values with arbitrary numerical predicates; keys can be compared only with respect to equality. This language is important because it has two properties: it is generic with respect to keys, and it can be translated into an extension of first-order logic with aggregation without any built-in
predicates other than arbitrary numerical predicates. So what about, for instance, an important operation like convolution, which is an extended version of the dot product for matrices and is very popular in machine learning?
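For reference, genericity with respect to keys can be stated as follows (a standard formulation; the paper's exact definition may differ in details): for every associative table A and every permutation pi of the key domain,

$$ Q(\pi(A)) \;=\; \pi(Q(A)), $$

where \(\pi(A)\) renames every key of A according to \(\pi\) and leaves all values untouched.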
Convolution is not expressible in this tame Lara. Why? Simply because convolution is not a generic query: it really depends on the order in which we put the elements in the matrix, and tame Lara is generic. What do we need to express convolution, then?
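To see why order matters, recall one standard (cross-correlation-style) form of two-dimensional convolution used in machine learning; the exact variant on the slides is not reproduced here:

$$ (A \ast K)(i, j) \;=\; \sum_{a}\sum_{b} A(i + a,\, j + b)\cdot K(a, b). $$

The index arithmetic i + a, j + b treats keys as ordered positions rather than abstract labels, which is precisely what genericity with respect to keys rules out.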
Convolution is expressible in tame Lara if we extend the comparisons on keys with a linear order, that is, if we can compare whether one key is smaller than another. This language, which is quite powerful, we denote tame Lara with an order, and it is sufficient to express convolution. Now, what about inversion?
Inversion is not expressible in tame Lara either, and the reason is that inversion is not a local query, while tame Lara can be embedded in a fragment of first-order logic with aggregation which is local. Now, can inversion be expressed in the extension with an order on the domain?
This is very difficult to answer, but at least under complexity-theoretic assumptions it cannot: tame Lara with a linear order does not express inverses either, and it seems that some form of recursion is needed for this.
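To give some intuition for that "form of recursion" (a plain Python sketch, not a Lara expression): a textbook way to invert a matrix, Gauss-Jordan elimination, performs one sweep per dimension, so the amount of iteration grows with the input size.

```python
def invert(M):
    """Gauss-Jordan elimination on an n x n list-of-lists matrix.
    No pivoting or numerical care; it only illustrates that inversion
    iterates over the whole dimension n."""
    n = len(M)
    # Augment M with the identity matrix on the right.
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):                      # one sweep per dimension
        pivot = A[col][col]
        A[col] = [x / pivot for x in A[col]]  # normalize the pivot row
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]             # the right half is the inverse

print(invert([[2.0, 0.0], [0.0, 4.0]]))       # [[0.5, 0.0], [0.0, 0.25]]
```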
So, as concluding remarks: Lara can express a vast array of operations, but sometimes it requires complicated features like this linear order on the domain, and we believe that there is still room for a simpler language that provides a simpler way to express these and many other operations. Many theory- and complexity-related questions also remain open. Many thanks.