
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra


Formal Metadata

Title
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra
Number of Parts
25
License
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
We study the expressive power of the Lara language - a recently proposed unified model for expressing relational and linear algebra operations - both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. We start by showing Lara to be expressively complete with respect to first-order logic with aggregation. Since Lara is parameterized by a set of user-defined functions which allow one to transform values in tables, the exact expressive power of the language depends on how these functions are defined. We distinguish two main cases depending on the level of genericity that queries are required to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning pipelines. This language is also local, and thus cannot express operations such as matrix inverse that exhibit a recursive behavior. For expressing convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and makes the expressive power of the language much harder to understand. In particular, although under complexity assumptions the resulting language still cannot express matrix inverse, a proof of this fact without such assumptions seems challenging to obtain.
Transcript: English (auto-generated)
Welcome everybody. Hello, I am Pablo Barceló. This is our talk on the expressiveness of LARA, a unified language for linear and relational algebra, joint work with Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. So current data management, data science, and machine learning applications demand methods from both relational algebra and linear algebra: relational algebra over relational tables to extract, transform, and load the data, and linear algebra functionalities to perform analytical tasks, in this case over matrices or tensors.
So, there has then been a natural call for an algebra that captures the functionalities needed in both worlds; also, in the machine learning community, for a high-level query language for handling tensors properly, an API or a query language; and for developing optimization techniques that are multi-system, that is, that apply to both worlds together. So LARA was a proposal for an algebra that merges functionalities from both linear and relational algebra. It was proposed by Hutchison, Howe, and Suciu at a workshop at SIGMOD 2017. And here is a brief description of what LARA does. The data model of LARA is associative tables, in which we have attributes that are keys, and we also have values. Each tuple of keys is associated with a tuple of values, and keys are really keys in the database sense: there are no two tuples in the associative table with the same keys. And the syntax of the algebra is very minimalist. It consists of four constructs. The first one says: I can refer to a table.
I can take the join of two tables with some binary operator that helps us resolve conflicts due to the key constraints. I can take the union of two tables, which is written with the join symbol upside down, together with an aggregate operator. Or I can extend a table with an extension function, which essentially maps values to different values. And it is these extension functions that determine the expressive power of this language.
So, although several theoretical properties of what this language can express were studied in the original paper, several questions of interest were left open. In particular, how powerful is LARA for capturing these two languages, linear algebra and relational algebra?
Which kinds of machine learning applications or functionalities does it really subsume? And what is its expressive power in terms of the usual query languages that we know in the database context? So, in our work, we study the expressiveness of LARA using the yardstick of first-order logic with aggregation plus some built-in predicates that we will have to specify. And we will also compare its relative expressiveness in terms of some common machine learning operations. So, let me briefly explain the semantics of LARA.
Well, we can take the join of two tables as usual, here over the common key attributes. So, here we have key attributes i and j, and here j and k. The first tuple here, (0, 0) in A, joins with the first tuple in B, (0, 0). This gives us the first tuple of the join, (0, 0, 0). And the question is which value do we put: we have a value coming from table A and a value coming from table B. But then the binary operator in the join tells us how to resolve the conflict, by taking the multiplication. So, it will be a 1.
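To make this concrete, here is a toy sketch of the join in Python. This is only an illustration under my own modeling assumption, not LARA's actual implementation: an associative table is a dict from sorted (attribute, key) pairs to a single value, and `op` resolves the value conflict on compatible tuples.

```python
import operator

def join(A, B, op):
    """Toy LARA join: combine tuples that agree on shared key attributes;
    the binary operator `op` resolves the conflict between the two values."""
    out = {}
    for ka, va in A.items():
        for kb, vb in B.items():
            ka_d, kb_d = dict(ka), dict(kb)
            # tuples are compatible if they agree on the shared key attributes
            if all(ka_d[x] == kb_d[x] for x in ka_d.keys() & kb_d.keys()):
                merged = tuple(sorted({**ka_d, **kb_d}.items()))
                out[merged] = op(va, vb)
    return out

# A has key attributes i, j; B has key attributes j, k
A = {(('i', 0), ('j', 0)): 2, (('i', 0), ('j', 1)): 3}
B = {(('j', 0), ('k', 0)): 5}
print(join(A, B, operator.mul))  # {(('i', 0), ('j', 0), ('k', 0)): 10}
```

Only the first tuple of A matches B on the shared attribute j, and the conflict between its value 2 and B's value 5 is resolved by multiplication.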
Union is similar, but when taking the union of two tables, we project only onto those key attributes that are common; in this case, we only keep j. We put all tuples together and take the union of all these tuples under the projection. But now the projection can violate the key constraint, and the question is how we resolve it: this time by using the aggregate operator in the union symbol, which is addition. So, we will put together all tuples with key 0, and the value that we will use to resolve the conflict will be a 3. And finally, the extension, which is a very important feature of the language. What it does is map an associative table by changing the values with respect to some function. In this case, it tells us just to divide the value x by y and return that.
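The union and extension operators can be sketched in the same toy style (again an illustration under my dict-based modeling assumption, not LARA's actual code): `union` projects both tables onto their shared key attributes and resolves key-constraint violations with the aggregate operator, while `ext` rewrites each value through a user-defined function.

```python
import operator

def union(A, B, agg):
    """Toy LARA union: project onto common key attributes, then merge all
    tuples, resolving collisions on the same key with the aggregate `agg`."""
    # shared key attributes, read off the first tuple of each table
    common = {a for a, _ in next(iter(A))} & {a for a, _ in next(iter(B))}
    out = {}
    for table in (A, B):
        for key, val in table.items():
            proj = tuple(sorted((a, x) for a, x in key if a in common))
            # the projection may violate the key constraint; `agg` resolves it
            out[proj] = agg(out[proj], val) if proj in out else val
    return out

def ext(A, f):
    """Toy LARA extension: map every value through a user-defined function."""
    return {k: f(v) for k, v in A.items()}

A = {(('i', 0), ('j', 0)): 1, (('i', 1), ('j', 0)): 2}
B = {(('j', 0), ('k', 0)): 3}
print(union(A, B, operator.add))  # {(('j', 0),): 6}
print(ext(A, lambda v: v / 2))    # halve every value of A
```

In the union example, all three tuples collapse onto the single common key j = 0, and addition aggregates their values 1, 2, and 3 into 6.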
And the real question here is what these extension functions can define. For instance, are they allowed to compare keys or not? This is crucial for understanding the expressive power of the language. Here is a more sophisticated example of how LARA can express some important architectures, the self-attention mechanisms commonly used in machine learning today, but I will not go into details due to the lack of time. So, what is the yardstick logic that we are going to use? First-order logic with aggregation.
It is a two-sorted version of first-order logic, very well known, in which we have both key and value variables and elements, plus aggregation. Here, for instance, we have a formula that outputs all keys k such that, together with another key k', they define a tuple in table R with value b, and then outputs the value b', which is the maximum of all b's that are associated with k in this instance.
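In symbols, one plausible rendering of the formula just described (my reconstruction from the spoken description, not the paper's exact syntax) is:

```latex
\varphi(k, b') \;\equiv\; \exists k'\,\exists b\; R(k, k', b)
\;\wedge\; b' = \mathsf{Agg}_{\max}\, b\, \big[\, \exists k'\; R(k, k', b) \,\big]
```

Here k and k' range over the key sort, b and b' over the value sort, and the aggregate term binds b and collects the maximum over all witnessing tuples of R for the fixed key k.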
So now we are going to compare the expressive power of LARA in terms of first-order logic with aggregation. We need a couple of assumptions: the syntax of LARA is very general, so we need to consider some instantiations of it to obtain this result. First of all, we assume that the domain of keys equals the domain of values, which is a common assumption made by LARA that allows it to compare between the two sorts, and also that there are two distinguished constants, 0 and 1, that we can refer to, which helps us to match the semantics of the two languages. So, it should not be so surprising that a LARA expression with extension functions from a set Ω can be translated into first-order logic with aggregation, if we allow built-in predicates for the extension functions. And since LARA is really a positive language, with no negation involved, the translation goes into the negation-free fragment of first-order logic with aggregation.
Now, for the opposite direction, embedding first-order logic with aggregation into LARA, which is more interesting, we need further assumptions. First of all, as usual when we deal with first-order logic versus an algebra, we need to assume that the language is safe, as is standard. We also need to assume that some specific extension functions are available in LARA: for instance, that we can copy an attribute into a fresh one, that we can add an attribute or a value in which all elements are zero, that we can compare whether two keys are equal or not equal, and so on.
And also, very importantly, that we use a key-constraint semantics. The semantics of first-order logic does not assume the presence of keys, so we have to impose them: when we take, for instance, the union of two tables, the result might no longer be an associative table, and we have to explain how to resolve the key violations with some aggregate over it. So, under these conditions, we can prove that safe first-order-with-aggregation formulas under the key-constraint semantics, with access to some built-in predicates that represent extension functions, can be translated into LARA expressions, assuming that the specific extension functions we mentioned, like copy, add, and so on, are available. The important observation here is that expressions in LARA are positive, but negation can be encoded in the language with particular extension functions such as the ones that we specified.
Now, what about the relative expressiveness of LARA? What can it express in terms of some common machine learning operations? This really depends on which extension functions are permitted; in particular, it depends on the variations allowed for comparing keys or values. Now, what is easy to put in the language is comparisons of values. These correspond essentially to the numerical sort, and thus we can add arbitrary numerical comparisons on top of it. What is hard is when we want to compare the non-numerical sort of keys, because adding comparisons over them corresponds, for instance, to adding a built-in linear order or addition over the domain of a structure, a functionality that we know from finite model theory complicates the study of the language: it extends the expressive power, but complicates the study. And so we start with a simple language called tame LARA, in which we do not allow comparing keys, only comparing
values with arbitrary numerical predicates; keys can be compared only with respect to equality. This language is important because it has two properties: it is generic with respect to keys, and it can be translated into an extension of first-order logic with aggregation without any built-in predicates on keys, only with arbitrary numerical predicates. So what about, for instance, an important operation like convolution, which is an extended version of the dot product for matrices, and is very popular in machine learning?
Convolution is not expressible in this tame LARA. Why? Simply because convolution is not a generic query: it really depends on the order in which we put the elements in the matrix, and tame LARA is generic. What do we need to express convolution, then? Convolution is expressible in tame LARA if we extend comparisons of keys with a linear order, so that we can compare whether a key is smaller than another. This language, which is quite powerful, we denote by tame LARA with an order, and it is sufficient to express convolution. Now, what about inversion?
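To see why the order is essential, here is a minimal 1-D convolution sketch (my own illustration, not code from the paper): the output depends on the positions of the entries, so relabeling (permuting) the keys changes the result, which is exactly the failure of genericity.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution/correlation: slide the kernel over the signal.
    The positions of entries (the linear order on keys) determine the output."""
    n, m = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(m))
            for i in range(n - m + 1)]

print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
# Reversing the signal, i.e. permuting the keys, gives a different answer,
# so the query is not generic:
print(conv1d([4, 3, 2, 1], [1, 0, -1]))  # [2, 2]
```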
Inversion is not expressible in tame LARA either, and the reason is that inversion is not a local query, while tame LARA can be embedded in a fragment of first-order logic with aggregation which is local. Now, can inversion be expressed in the extension with an order on the domain? This is very difficult to answer, but at least under complexity assumptions, it cannot: tame LARA with this linear order does not express inverses either, and it seems that some form of recursion is needed for this.
So, concluding remarks: LARA can express a vast array of operations, but sometimes it requires complicated features like this linear order on the domain. We believe that there is still room for a simpler language that provides a simpler way to express these and many other operations, and several theory- and complexity-related questions remain open. Many thanks.