
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra


Formal Metadata

Title
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra
Number of Parts
25
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
We study the expressive power of the Lara language - a recently proposed unified model for expressing relational and linear algebra operations - both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. We start by showing Lara to be expressively complete with respect to first-order logic with aggregation. Since Lara is parameterized by a set of user-defined functions which allow one to transform values in tables, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity that queries are required to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning pipelines. This language is also local, and thus cannot express operations such as matrix inverse that exhibit a recursive behavior. For expressing convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and makes the expressive power of the language much harder to understand. In particular, although under complexity assumptions the resulting language still cannot express matrix inverse, a proof of this fact without such assumptions seems challenging to obtain.
Transcript: English (auto-generated)
Welcome everybody. Hello, I am Pablo Barceló. This is our talk on the expressiveness of LARA, a unified language for linear and relational algebra, joint work with Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. So current data management, data science, and machine learning applications demand methods from both relational algebra and linear algebra: relational algebra over relational tables to extract, transform, and load the data, and linear algebra functionalities to perform analytical tasks, in this case over matrices or tensors.
So, there has then been a natural call for an algebra that captures the functionalities needed in both worlds; also, in the machine learning community, for a high-level query language for handling tensors properly, an API or a query language; and for developing optimization techniques that are multi-system, that is, that apply to both worlds together. So LARA was a proposal for an algebra that merges functionalities from both linear and relational algebra. It was proposed by Hutchison, Howe, and Suciu at a workshop at SIGMOD 2017. And here is a brief description of what LARA does. The data model of LARA is associative tables, in which we have attributes that are keys, and we also have values. Each tuple of keys is associated with a tuple of values, and keys are really keys in the database sense: there are no two tuples in the associative table with the same keys. And the syntax of the algebra is very minimalist. It consists of four constructs. The first one says: I can refer to a table.
I can take the join of two tables with some binary operator that helps us resolve conflicts due to the key constraints. I can take the union of two tables, which is written with the join symbol upside down, together with an aggregate operator. Or I can extend a table with an extension function, which essentially maps values to different values. And it is these extension functions that determine the expressive power of this language.
So, although several theoretical properties of what this language can express were studied in the original paper, several questions of interest were left open. In particular, how powerful is LARA for capturing these two languages, linear algebra and relational algebra?
Which kinds of machine learning applications or functionalities does it really subsume? And what is its expressive power in terms of the usual query languages that we know in the database context? So, in our work, we study the expressiveness of LARA using the yardstick of first-order logic with aggregation plus some built-in predicates that we will have to specify. And we will also compare its relative expressiveness in terms of some common machine learning operations. So, let me briefly explain the semantics of LARA.
Well, we can take the join of two tables as usual, here over the common key attributes. So, here we have key attributes i and j, and here j and k. The first tuple here, (0, 0) in A, joins with the first tuple in B, (0, 0). This gives us the first tuple of the join, (0, 0, 0). And the question is which value do we put: we have a value coming from table A and a value coming from table B. But then the binary operator in the join tells us how to resolve the conflict, by taking the multiplication. So, it will be a 1.
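To make this concrete, here is a toy sketch of the join in Python. This is only an illustration under my own modeling assumption, not LARA's actual implementation: an associative table is a dict from sorted (attribute, key) pairs to a single value, and `op` resolves the value conflict on compatible tuples.

```python
import operator

def join(A, B, op):
    """Toy LARA join: combine tuples that agree on shared key attributes;
    the binary operator `op` resolves the conflict between the two values."""
    out = {}
    for ka, va in A.items():
        for kb, vb in B.items():
            ka_d, kb_d = dict(ka), dict(kb)
            # tuples are compatible if they agree on the shared key attributes
            if all(ka_d[x] == kb_d[x] for x in ka_d.keys() & kb_d.keys()):
                merged = tuple(sorted({**ka_d, **kb_d}.items()))
                out[merged] = op(va, vb)
    return out

# A has key attributes i, j; B has key attributes j, k
A = {(('i', 0), ('j', 0)): 2, (('i', 0), ('j', 1)): 3}
B = {(('j', 0), ('k', 0)): 5}
print(join(A, B, operator.mul))  # {(('i', 0), ('j', 0), ('k', 0)): 10}
```

Only the first tuple of A matches B on the shared attribute j, and the conflict between its value 2 and B's value 5 is resolved by multiplication.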
Union is similar, but when taking the union of two tables, we project only onto those key attributes that are common; in this case, we only keep j. We put all tuples together and take the union of all these tuples under the projection. But now the projection can violate the key constraint, and the question is how we resolve it: this time by using the aggregate operator in the union symbol, which is addition. So, we will put together all tuples with key 0, and the value that we will use to resolve the conflict will be a 3. And finally, the extension, which is a very important feature of the language. What it does is map an associative table by changing the values with respect to some function. In this case, it tells us just to divide the value x by y and return that.
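The union and extension operators can be sketched in the same toy style (again an illustration under my dict-based modeling assumption, not LARA's actual code): `union` projects both tables onto their shared key attributes and resolves key-constraint violations with the aggregate operator, while `ext` rewrites each value through a user-defined function.

```python
import operator

def union(A, B, agg):
    """Toy LARA union: project onto common key attributes, then merge all
    tuples, resolving collisions on the same key with the aggregate `agg`."""
    # shared key attributes, read off the first tuple of each table
    common = {a for a, _ in next(iter(A))} & {a for a, _ in next(iter(B))}
    out = {}
    for table in (A, B):
        for key, val in table.items():
            proj = tuple(sorted((a, x) for a, x in key if a in common))
            # the projection may violate the key constraint; `agg` resolves it
            out[proj] = agg(out[proj], val) if proj in out else val
    return out

def ext(A, f):
    """Toy LARA extension: map every value through a user-defined function."""
    return {k: f(v) for k, v in A.items()}

A = {(('i', 0), ('j', 0)): 1, (('i', 1), ('j', 0)): 2}
B = {(('j', 0), ('k', 0)): 3}
print(union(A, B, operator.add))  # {(('j', 0),): 6}
print(ext(A, lambda v: v / 2))    # halve every value of A
```

In the union example, all three tuples collapse onto the single common key j = 0, and addition aggregates their values 1, 2, and 3 into 6.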
And the real question here is what these extension functions can define. For instance, are they allowed to compare keys or not? This is crucial for understanding the expressive power of the language. Here is a more sophisticated example of how LARA can express some important architectures, the self-attention mechanisms commonly used in machine learning today, but I will not go into details due to the lack of time. So, what is the yardstick logic that we are going to use? First-order logic with aggregation.
It is a two-sorted version of first-order logic, very well known, in which we have both key and value variables and elements, plus aggregation. Here, for instance, we have a formula that outputs all keys k such that, together with another key k', they define a tuple in table R with value b, and then outputs the value b', which is the maximum of all b's that are associated with k in this instance.
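In symbols, one plausible rendering of the formula just described (my reconstruction from the spoken description, not the paper's exact syntax) is:

```latex
\varphi(k, b') \;\equiv\; \exists k'\,\exists b\; R(k, k', b)
\;\wedge\; b' = \mathsf{Agg}_{\max}\, b\, \big[\, \exists k'\; R(k, k', b) \,\big]
```

Here k and k' range over the key sort, b and b' over the value sort, and the aggregate term binds b and collects the maximum over all witnessing tuples of R for the fixed key k.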
So now we are going to compare the expressive power of LARA in terms of first-order logic with aggregation. We need a couple of assumptions: the syntax of LARA is very general, so we need to consider some instantiations of it to obtain this result. First of all, we assume that the domain of keys equals the domain of values, which is a common assumption made by LARA that allows it to compare between the two sorts, and also that there are two distinguished constants, 0 and 1, that we can refer to, which helps us to match the semantics of the two languages. So, it should not be so surprising that a LARA expression with extension functions from a set Ω can be translated into first-order logic with aggregation, if we allow built-in predicates for the extension functions. And since LARA is really a positive language, with no negation involved, the translation goes into the negation-free fragment of first-order logic with aggregation.
Now, for the opposite direction, embedding first-order logic with aggregation into LARA, which is more interesting, we need further assumptions. First of all, as usual when we deal with first-order logic versus an algebra, we need to assume that the language is safe, as is standard. We also need to assume that some specific extension functions are available in LARA: for instance, that we can copy an attribute into a fresh one, that we can add an attribute or a value in which all elements are zero, that we can compare whether two keys are equal or not equal, and so on.
And also, very importantly, that we use a key-constraint semantics. The semantics of first-order logic does not assume the presence of keys, so we have to impose them: when we take, for instance, the union of two tables, the result might no longer be an associative table, and we have to explain how to resolve the key violations with some aggregate over it. So, under these conditions, we can prove that safe first-order-with-aggregation formulas under the key-constraint semantics, with access to some built-in predicates that represent extension functions, can be translated into LARA expressions, assuming that the specific extension functions we mentioned, like copy, add, and so on, are available. The important observation here is that expressions in LARA are positive, but negation can be encoded in the language with particular extension functions such as the ones that we specified.
Now, what about the relative expressiveness of LARA? What can it express in terms of some common machine learning operations? This really depends on which extension functions are permitted; in particular, it depends on the variations allowed for comparing keys or values. Now, what is easy to put in the language is comparisons of values. These correspond essentially to the numerical sort, and thus we can add arbitrary numerical comparisons on top of it. What is hard is when we want to compare the non-numerical sort of keys, because adding comparisons over them corresponds, for instance, to adding a built-in linear order or addition over the domain of a structure, a functionality that we know from finite model theory complicates the study of the language: it extends the expressive power, but complicates the study. And so we start with a simple language called tame LARA, in which we do not allow comparing keys, only comparing
values with arbitrary numerical predicates; keys can be compared only with respect to equality. This language is important because it has two properties: it is generic with respect to keys, and it can be translated into an extension of first-order logic with aggregation without any built-in predicates on keys, only with arbitrary numerical predicates. So what about, for instance, an important operation like convolution, which is an extended version of the dot product for matrices, and is very popular in machine learning?
Convolution is not expressible in this tame LARA. Why? Simply because convolution is not a generic query: it really depends on the order in which we put the elements in the matrix, and tame LARA is generic. What do we need to express convolution, then? Convolution is expressible in tame LARA if we extend comparisons of keys with a linear order, so that we can compare whether a key is smaller than another. This language, which is quite powerful, we denote by tame LARA with an order, and it is sufficient to express convolution. Now, what about inversion?
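To see why the order is essential, here is a minimal 1-D convolution sketch (my own illustration, not code from the paper): the output depends on the positions of the entries, so relabeling (permuting) the keys changes the result, which is exactly the failure of genericity.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution/correlation: slide the kernel over the signal.
    The positions of entries (the linear order on keys) determine the output."""
    n, m = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(m))
            for i in range(n - m + 1)]

print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
# Reversing the signal, i.e. permuting the keys, gives a different answer,
# so the query is not generic:
print(conv1d([4, 3, 2, 1], [1, 0, -1]))  # [2, 2]
```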
Inversion is not expressible in tame LARA either, and the reason is that inversion is not a local query, while tame LARA can be embedded in a fragment of first-order logic with aggregation which is local. Now, can inversion be expressed in the extension with an order on the domain? This is very difficult to answer, but at least under complexity assumptions, it cannot: tame LARA with this linear order does not express inverses either, and it seems that some form of recursion is needed for this.
So, concluding remarks: LARA can express a vast array of operations, but sometimes it requires complicated features like this linear order on the domain. We believe that there is still room for a simpler language that provides a simpler way to express these and many other operations, and several theory- and complexity-related questions remain open. Many thanks.