Faster Analytics for Fast Data with Apache Pinot and Flink SQL
Formal Metadata
Title: Faster Analytics for Fast Data with Apache Pinot and Flink SQL
Title of Series: Berlin Buzzwords 2021
Number of Parts: 69
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers: 10.5446/67309 (DOI)
Berlin Buzzwords 2021, 57 / 69
Transcript: English (auto-generated)
00:07
Welcome to the session where I'll be talking about how to build complex real-time analytical use cases using Flink and Apache Pinot. I guess I can just skip the intro.
00:20
Thanks to Fabian for the introduction. Today, I'll begin by discussing the use cases of real-time analytics and shed light on why this is fast becoming an important need for most modern businesses. I'll give an overview of Apache Pinot and explain
00:42
why it's a great fit for building such fast real-time analytics use cases. I'll discuss the ingestion challenges that Pinot faces today, which it's not able to overcome on its own. Next, I'll talk about Apache Flink and how it can be used to
01:01
overcome some of these complex ingestion challenges in Pinot. Finally, we'll conclude with a cool demo with Twitch streams. Let's get started. When we talk about real-time analytics, there are actually many different subcategories of use cases,
01:21
and each one has its own unique requirements. We'll go through some of these in the next few slides. One of the most important categories is user-facing analytics, where you're exposing your analytical capabilities directly to your customers or end-users.
01:40
For example, LinkedIn has this Who Viewed Your Profile dashboard, which it provides to all its 700 million plus members, where you can get a personalized view of profile views sliced across multiple dimensions such as time, industry segment, geographic location, and so on.
02:03
Another example is the LinkedIn feed relevance, where in order to make sure you're not seeing the same thing again and again, we want to know for a given story or content, how many times has a user seen this in the last 14 days or so?
02:20
This can be done with a SQL query, something like this. Now, this may seem straightforward, but you're executing this query on a huge database of 700 million plus users, and it has to be executed every time an active member visits LinkedIn, which translates to several tens of
02:42
thousands of QPS on your underlying database. And each such query must execute very quickly, in the order of milliseconds. Otherwise, it's going to be a bad experience for the users. Another good example is Restaurant Manager by Uber Eats. This is a dashboard given to restaurant owners across the globe,
03:02
where they can see different things like sales metrics on a week-over-week basis, the inaccurate orders, top-selling items, and so on. You can imagine that to build something like this, you're also doing a lot of concurrent queries, and again, each such query must execute very quickly.
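For the feed-relevance example mentioned earlier, the query might look roughly like this. This is only a sketch: the table and column names (feed_impressions, member_id, content_id, event_time_ms) are hypothetical, and :member_id / :now_ms stand in for runtime parameters.

```sql
-- How many times has a given member seen each piece of content
-- in the last 14 days?
SELECT content_id, COUNT(*) AS views_last_14_days
FROM feed_impressions
WHERE member_id = :member_id
  AND event_time_ms >= :now_ms - 14 * 24 * 60 * 60 * 1000
GROUP BY content_id;
```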
03:24
Another important category of real-time analytics is business metrics. This is where you're tracking the key indicators of your business in a real-time manner. Doing this in real-time is important for day-to-day operation and also things like anomaly detection.
03:43
For example, page views is an important business metric for LinkedIn, and the demand and supply ratio is a business metric for Uber. Here you see an example where the number of page views suddenly dropped,
04:04
and you want to be able to detect this in real-time. More importantly, you also want to know why that anomaly happened. In other words, which dimension resulted in the page views to drop and detecting and doing the root cause analysis
04:21
in real-time is also very important. Finally, we have dashboards, which everyone pretty much knows about. This is one place where you can track all your application and system metrics. As you can imagine, this can also result in a lot of concurrent queries and having a real-time view of
04:41
this is extremely important for your operational needs. All such use cases and many more can be built on top of Apache Pinot. For those who haven't heard of it, Apache Pinot is an open-source distributed data store that can ingest data from a wide variety of sources such as Kafka,
05:04
S3, HDFS, and so on, and make it available for querying in real-time. At the heart of Pinot is a column store, and it features a rich set of indexes and aggregation strategies that make it a great fit for all such use cases.
05:23
It's quite a mature product as of now. It's being used in a lot of big data companies around the globe and has a rapidly growing community as well. Some of the largest Pinot clusters can do upwards of a million events per second of ingestion, and can easily do hundreds of thousands of queries per
05:43
second while still maintaining millisecond-level latency. This is an overview of how Pinot fits in your overall data ecosystem, and we can take the example of LinkedIn. Every time people visit LinkedIn.com,
06:01
all the events generated will be emitted to a streaming system like Kafka, and all the entity data around users and companies can be stored in some OLTP store. From here, data is continuously being archived into a long retention store like HDFS for a variety of other use cases.
06:25
As I mentioned before, Pinot can actually ingest data from all these sources. Within LinkedIn, we ingest data from Kafka and HDFS and provide a consolidated logical view to the user. We hide the complexity of the actual data sources,
06:44
and then you can build all these different use cases on top. If you look under the hood of Pinot, let's look at the different components. The incoming data from the data source is organized in a column format and spread out across what we call Pinot servers.
07:06
You can add as many Pinot servers as you want, and you can configure replication amongst all these servers. There's a Pinot controller which is responsible for all the cluster coordination functions such as membership,
07:22
replication, partitioning, and so on. Finally, we have the Pinot broker which can take a user query and then do a distributed scatter-gather across all the servers. What it does is it will identify which servers are responsible
07:41
for serving this query and send the query directly to those servers. All these servers will then do local processing and then return an intermediate result to the broker. The broker will then do a final aggregation and return it back to the user.
08:01
As I mentioned, what makes Pinot really fast for real-time analytics is all the rich indexing strategies that are available out of the box. For example, you can configure an inverted, sorted, or range index for any of the numerical columns in your schema.
08:24
JSON index lets you do fast queries on semi-structured or unstructured data. As the name implies, GeoIndex will accelerate your geospatial queries. There is a special index called StarTree, which is also how our company is named,
08:41
which lets you pre-aggregate values across a range of dimensions. This makes complex aggregation queries really, really fast. One other feature I want to call out here, something that we added recently in Pinot, is the ability to upsert data.
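As one possible sketch of how such indexes are declared, the Pinot table config has an index section roughly like this. Column names follow the Twitch demo later in the talk; treat the exact values as an illustration, not the demo's actual config.

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["gameName"],
    "rangeIndexColumns": ["viewerCount"],
    "sortedColumn": ["eventTimeMs"],
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["gameName"],
        "functionColumnPairs": ["SUM__viewerCount"]
      }
    ]
  }
}
```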
09:03
You can actually have real-time data coming through Kafka, which has mutations, and be able to update your Pinot table in real-time. This is something I'll actually be demoing today. Now that we know a bit of the theory of Pinot,
09:20
let's see it in action. What I have here are local Docker instances for Pinot, Kafka, and Zookeeper. Oh, and I forgot to mention the demo. In the demo, what we'll do is consume the Twitch stream information using its API,
09:44
and emit all these events into Kafka. Then subsequently, we'll ingest it into Pinot and query the data in real-time. I have this nifty Python script; all it does is query the Twitch API,
10:01
and then emit the events to Kafka. Let's start that. If you look at the Kafka topics, we should see something called Twitch Streams. Just to see what the events look like,
10:21
they look something like this. We have an ID, which uniquely identifies a Twitch stream. You have all the user information, you have the game information, and also has an event time attribute, which defines the point in time at which this event was generated.
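An event of the kind described above would look roughly like this. The field names and values are illustrative, not the exact Twitch API payload.

```json
{
  "id": "41375541868",
  "user_name": "some_streamer",
  "game_name": "Science & Technology",
  "viewer_count": 1523,
  "event_time": "2021-06-16T10:30:00Z"
}
```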
10:44
Now that the events are in Kafka, we can go ahead and start querying them in Pinot. When you deploy Pinot, it comes with a convenient UI to do different things like manage your cluster topology and also create tables.
11:04
Let's go ahead and create a table for our Twitch stream. First things first is to add a schema for our table. We'll add ID as our dimension. We can add the game name as another dimension.
11:21
These are both strings. We can add a viewer count, which is a metric, and then the event time, which is currently in the form of a string, so we'll add it as a dimension. One last thing we'll do is add
11:43
a special column called event time MS, which is actually not in your input Kafka stream. This will be a derived column, and it will be designated as the time column within Pinot. The time column is currently, by default, how the data is partitioned in Pinot.
12:02
That's our resulting schema. Let's go ahead and save that. Now we can add a real-time table with the same name. Within this, we can configure how we want to generate the derived column, which is event time MS.
12:20
This is really useful when your input stream does not have the fields in the right format. What I'm going to do here is use a built-in function called fromDateTime. All it does is take an input column, which is event time, in string form, and convert it to milliseconds.
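Conceptually, fromDateTime is just a string-to-epoch-milliseconds conversion. Here is a minimal Python sketch of the same idea; the timestamp pattern is an assumption, and the demo's actual pattern may differ.

```python
from datetime import datetime, timezone

def from_date_time(value: str, pattern: str = "%Y-%m-%dT%H:%M:%SZ") -> int:
    """Parse a timestamp string and return epoch milliseconds (UTC),
    mirroring what Pinot's fromDateTime(eventTime, pattern) does."""
    dt = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(from_date_time("2021-06-16T10:30:00Z"))
```

Every ingested record would pass its event time string through such a function to produce the derived millisecond column.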
12:41
Let's go ahead and do that. What this is going to do is, for every record ingested into Pinot, apply this transformation and generate a derived column called event time MS. Then we will partition data on the new column. We also want to specify the Kafka topic name and the Kafka URL.
13:08
There are other things that you can do which I won't go through right now, like retention, quotas for your queries, and so on. Let's go ahead and save that. Now we're ready to query the data coming from Twitch.
13:24
Keep in mind, this is real live data from what is actually happening on Twitch right now. If I do a count star, you should see the total count increasing, as you can see below. This is cool, but let's do a slightly more complicated query,
13:43
which is we want to do a total count of streams grouped on the stream ID. Intuitively, you expect the count per ID to be one.
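The group-by from the demo is essentially this; the table name is an assumption, matching the Twitch streams table created above.

```sql
-- If there are no duplicates, every ID should have a count of one.
SELECT id, COUNT(*) AS cnt
FROM twitchStreams
GROUP BY id
ORDER BY cnt DESC
LIMIT 10;
```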
14:03
There should be only one unique stream per ID. But as you can see, we currently have an issue here. What you see is there's multiple events happening for a given ID. Let's take a deeper look why this is happening.
14:24
What you can see here is for the same Twitch stream ID, you see multiple events with different event time and also different viewer count. What's happening is the Twitch stream is constantly being updated and we are injecting
14:41
these updated, effectively duplicate, events into the Kafka stream. At the moment, we haven't configured Pinot to handle upserts. Currently, with just Pinot and the current Kafka stream, we are unable to handle upserts. Let me go back to my presentation.
15:07
In order to handle upserts within Pinot, the prerequisite is that the input Kafka stream must be partitioned on the primary key. In this case, that's the ID column, which is not happening right now.
15:21
Now, of course, I could have done that in my Python script, but oftentimes, you don't control the input Kafka streams. You need a mechanism to do repartitioning of your data. Even more complex scenarios is when your input stream or table does not contain all your data that you want to analyze.
15:41
You want to do either a stream-stream join or a stream-table join to compute this materialization. Finally, you can have a decoration requirement, where you have events coming in through your data source and you want to decorate them using an external RPC, either with something sitting in an OLTP store or behind an API.
16:04
For all such ingestion challenges, we rely on Apache Flink. Again, hopefully, you all know Apache Flink already. It's an extremely popular stream processing framework, which lets you perform computational tasks on bounded and unbounded streams of data.
16:24
It comes with a wide variety of input and output connectors and features rich API, and also includes things like state management, which makes it a great fit for building different applications, such as event-driven applications,
16:42
streaming ETL, and analytics, and so on. Of course, Flink is quite a mature product and it's used in a lot of companies around the globe. In particular, I want to focus on Alibaba's numbers from 2019. This is quite a while ago.
17:00
The recent numbers are probably much higher, but it was able to do 2.5 billion events per second at peak, which is really impressive. For this particular talk, I want to focus on one important aspect of Flink, which is Flink SQL. As the name implies, it lets you express
17:20
your computational logic in a declarative way, something like this. This is based on the Apache Calcite grammar, which is very similar to ANSI SQL, but also adds some advanced things like window semantics, which are required for continuous queries. As you can see here,
17:41
Flink SQL is actually built on top of the existing primitives. In fact, a given Flink SQL query would be translated into the underlying API and executed as regular Flink job. Again, you can execute it on unbounded and bounded streams
18:01
so when you're running it against something like Kafka, it runs as a continuous query, so it keeps generating output continuously. When run against something like a standard S3 or HDFS file, it will be executed as a traditional SQL query. This is just a very high-level overview of Flink.
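As a hedged illustration of the window semantics mentioned above, a continuous Flink SQL query with a tumbling window might look like this. Table and column names are made up, and this uses the group-window syntax available around Flink 1.12/1.13.

```sql
-- Events per game, aggregated in one-minute tumbling windows,
-- emitted continuously as the stream advances.
SELECT
  game_name,
  TUMBLE_END(event_ts, INTERVAL '1' MINUTE) AS window_end,
  COUNT(*) AS events
FROM twitch_streams
GROUP BY game_name, TUMBLE(event_ts, INTERVAL '1' MINUTE);
```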
18:22
I highly recommend the talks from Marta Paes and other people from Ververica on Flink and Flink SQL. Excuse me. Okay, so what we'll do now is show how we can solve some of the ingestion challenges we saw
18:40
in Pinot using Flink. So we'll go back to our Twitch API and we'll continue generating the real Twitch stream information into Kafka. But what we're also doing here is prefetching the tags information and storing it as a JSON file in S3.
19:03
At this point, we'll be using Flink to do a join between this Kafka topic and this S3 file, sort of like a stream-table join, and emit the information back to Kafka. The other thing the Flink job will be doing is repartitioning this data on the primary key
19:22
that we need for Pinot upserts. So it'll be partitioning on the ID column. And finally, I'll show how this can be ingested into Pinot, and we'll do a cool visualization using Superset. So let me switch back to my demo environment.
19:41
So first thing I'll do is start a Flink SQL client tool, which is a very convenient way of submitting your Flink queries and starting the actual Flink job. Okay, so first thing we want to do is create a table to read from the Kafka topic
20:01
which has the real-time Twitch stream information. Let's go ahead and do that. So it contains all the dimensions from the Twitch stream API and also defines where the data is coming from, which is our local Kafka cluster. Next, we'll create a table to consume data
20:22
from the JSON file stored in S3. Let's do that. It has only two dimensions, the tag ID and description. And as you can imagine, we'll be joining on the tag ID column. And again, here we are specifying that the connector is filesystem
20:41
and we'll be reading from S3. And finally, we'll create a table which is a result of the join operation of these two things. So it has the dimensions from both the Kafka and tags file and we want to emit it back to Kafka.
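The three table definitions might look roughly like the following sketch. The topic names, S3 path, bootstrap servers, and connector options are assumptions, not the exact DDL shown on screen.

```sql
-- Kafka source with the raw Twitch stream events.
CREATE TABLE twitch_streams (
  id STRING, user_name STRING, game_name STRING,
  viewer_count INT, tag_id STRING, event_time STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'twitch-streams',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- Static tags file in S3 (filesystem connector).
CREATE TABLE tags (
  tag_id STRING, description STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 's3://my-bucket/tags.json',
  'format' = 'json'
);

-- Kafka sink, keyed on id so records with the same stream ID
-- land in the same partition, as Pinot upserts require.
CREATE TABLE twitch_streams_with_tags (
  id STRING, user_name STRING, game_name STRING,
  viewer_count INT, description STRING, event_time STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'twitch-streams-with-tags',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.fields' = 'id',
  'key.format' = 'json',
  'value.format' = 'json'
);
```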
21:01
Hence, we're using the Kafka connector. The other thing, if you notice here, is that we're specifying the key field as ID. So what we want to do is partition the data on the ID column. In other words, all records with the same Twitch stream ID will end up in the same Kafka partition
21:23
and this will enable Pinot to do upserts. Okay, so now we're ready to actually execute our join query, which looks something like this. And again, it looks pretty much identical to a regular ANSI SQL query.
21:40
We select, we project all the dimensions from the two tables and then define the join criteria, which is the tag ID. And I'm doing a simple inner join here, but Flink has a lot of advanced ways of defining windows for your join function. Okay, so at this point, the join was executed.
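The join itself is plain-looking SQL; here is a sketch with assumed table and column names.

```sql
-- Continuously join the stream events with the tag descriptions
-- and write the enriched records back to Kafka.
INSERT INTO twitch_streams_with_tags
SELECT s.id, s.user_name, s.game_name,
       s.viewer_count, t.description, s.event_time
FROM twitch_streams AS s
JOIN tags AS t
  ON s.tag_id = t.tag_id;
```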
22:01
We have a job running, which is continuously joining data from Kafka and S3 and emitting events to Kafka. So if I look at my Kafka topics, I should see a new topic pop up here, which is Twitch streams with tags.
22:22
And this is the topic which includes the join as well as the repartitioning, thanks to Flink. At this point, to save time, I've already created a Pinot schema, which has all the dimensions we want. And I'm also specifying a primary key here, which is the ID column.
22:42
Similarly, I also have an upsert table config, which looks similar to the table that we created before. So let's go and add this to Pinot using the convenient REST API.
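The upsert-enabling part of a Pinot real-time table config is small; a sketch follows, with the table name assumed and most other required settings omitted for brevity.

```json
{
  "tableName": "twitchStreamsWithTags",
  "tableType": "REALTIME",
  "upsertConfig": {
    "mode": "FULL"
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  }
}
```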
23:02
So now we are ready to query our upsert table. As you can see, it has all the dimensions resulting from the join of Kafka and S3. We can re-execute our group-by query and see what the result looks like now.
23:22
As I mentioned before, in the new Pinot table we have partitioned the data on the ID column and enabled upsert within Pinot. So now the result of the group by... oh, sorry, I'm still using the old table, pardon me.
23:43
As you can see now, the count per group is indeed one, and Pinot is successfully deduplicating the data and handling the upserts correctly from the input stream. So in this manner, we saw how Flink can easily do joins
24:01
and repartitioning in a matter of minutes. And this was all real data from Twitch, just to re-emphasize. At this point, what we can do is use Superset to visualize the information. So let's go and add the new table that we created in Pinot, which is the streams upsert table.
24:24
One thing, if you haven't used Superset before: I need to let Superset know which is the time column. In our case, that's event time milliseconds; that's our temporal column. And we also want to tell Superset the format, which is epoch milliseconds.
24:44
Okay, so now we can start visualizing this data coming from Twitch. So I'll pick a line chart bucketed by every second. And let's say we want to see everything from now to minus seven days.
25:03
Okay, so you can see the current demo that we ran and something I was testing in the morning. And the queries are obviously returning very fast, because what Superset is doing is sending them to Pinot and querying the Twitch stream data in real time.
25:20
We can also do slightly more complex things, like figure out what are the most popular streams happening right now. So let's do a group by on the game name. And you can again see this is really fast because of Pinot and Flink. And these are the current popular streams
25:40
happening on Twitch as of now. So overall, what we saw, let me switch back. Just to reiterate what we just saw, we had real stream information being emitted into Kafka and tags information going to S3.
26:02
We did a join using Flink SQL, repartitioned the data also using the same Flink SQL query, and ingested it into Pinot, and Pinot was able to handle the upserts correctly at this point. And you can use something like Superset to visualize all your data.
26:21
So I can conclude, stop here and then take any questions. But overall, Flink SQL is a really powerful construct which lets you do complex things in a very, very fast manner, as you saw just now. And it is being used at a massive scale in Alibaba and other companies.
26:40
With Apache Pinot, we also saw the distributed, scale-out design and the rich indexing support that it features. And it is also being used in a lot of companies around the world. Before I stop, I do want to acknowledge Marta Paes, who's taking a bow here as she should. So she helped me a lot with the initial demo
27:02
and answering the Flink questions that I had. So thank you Marta. At this point, I can stop here and take any questions that you guys have. Thanks a lot. I guess we're waiting for Fabian to be on the stage. While Fabian is coming back,
27:25
since I'm unable to see the questions, I can talk a little bit more about the Pinot architecture. I mentioned the scale-out design, so I'll quickly expand on that
27:42
while we wait. So as I mentioned, the data is laid out in a column format across all these servers. This makes it very easy to expand capacity on the Pinot side. Anytime you're facing a bottleneck,
28:02
we can just add more servers. The controller will automatically identify the new servers and start putting Pinot segments onto them. Similarly, you can add brokers at any point
28:23
and this is how we can keep scaling out the Pino cluster at will. Okay, I can again stop here and take any questions. Yeah, thanks for this awesome talk.
28:42
Sorry for the technical problems. No problem. Yeah, that was a really awesome demo. I have a question though. Have you thought about, or do you think it would make sense, to integrate Flink with Pinot a little bit tighter?
29:02
Yeah. Similar to what you did with leveraging Presto for the join capability, would that be an option, to somehow fuse the systems together? Yeah, great question. Yeah, so this is a common ask from many folks
29:22
where you want to basically skip an intermediate stage between Flink and Pinot, right? Currently, as I also mentioned in the demo, we have to emit the events to Kafka and then ingest into Pinot. So currently there is one approach that we're working on
29:41
right now, which is a segment writer API that is available for Flink jobs to directly use and produce to Pinot. The downside is... well, let me maybe step back into how Pinot actually ingests the data.
30:02
So when we are fetching data in real time, the records are ingested one at a time and converted into a column format for the corresponding segments. But in the offline world,
30:21
we create the segments outside of Pinot and then copy them into Pinot. So that's essentially what we do with the current integration between Flink and Pinot, which is to use the segment writer API to generate a local segment and then push the local segment to Pinot. So the trade-off here is,
30:42
the freshness of your data depends on how big your segment is. You can keep appending to your local segment within your Flink job, and let's say you do that for 10 minutes. Then the data will be available for querying 10 minutes later.
31:01
So it's more like a micro-batch mode today. So that's the current mechanism, right? So you can indeed create segments within Flink and push them to Pinot. And actually Uber is playing around with that as we speak. The other one that we want to get to is a write API in Pinot,
31:22
to be able to write one record at a time directly into Pinot. And this is something that we're still working on. And once that's available, then Flink can directly start writing into Pinot and make it available in real time. Yeah, awesome. Thanks.
31:41
So in the meantime, we also got a few questions from the audience. The first one is: how is Apache Pinot different from Google BigQuery or AWS Athena? Got it. I don't really have any slide for that, but I can talk about it. So when you compare,
32:01
so with Pinot, the emphasis is that Pinot is really optimized for accelerating real-time analytics, right? So the focus is on reducing the ingestion latency of the data coming in. So basically, make the data available to query
32:22
within milliseconds from when it is generated at the source. And for query latency, the focus is also to keep it in the millisecond range. And if you look at BigQuery, it's optimized for a different set of problems, right? It's optimized for more complex SQL queries
32:44
where the ingestion latency may or may not be that important. It's okay to have the data coming in minutes later or hours later. And the focus is on executing more warehouse-style complex SQL queries. And again, in terms of the throughput and latency
33:02
that BigQuery can do — to do something like 100,000 QPS can get prohibitively expensive on BigQuery. Whereas Pinot is designed for handling massive QPS for OLAP-cube-style queries. With Amazon Athena, I think that's more of,
33:23
it's more like, I guess you can compare that to Flink, more so than to Pinot here. Pinot is a data store. So you can put your data in and query it whenever you want. So you can have seven months, one year — within Uber, we have
33:42
almost two years' worth of data in Pinot in some use cases that you can query. And it's a traditional data store, right? So it has pull semantics, whereas Amazon Athena is more on the push-semantics side. Yeah, thanks for that, yeah.
34:00
So there's one more question. So how rich are the querying capabilities compared to regular SQL? Yeah, yeah, great question. So Pinot again is optimized for OLAP queries. So it can speed up aggregation functions,
34:20
group by and order by and all that. But what we don't do effectively is N-way joins, for example. The focus is not to support complex joins within Pinot. So for that, as Fabian already mentioned, we integrate closely with Presto to do all those complex things in the Presto layer.
34:45
And Pinot can handle the filtering, aggregation, and some of the basic window functions. So that would be one — joins are one example of what is not supported today. We do have lookup joins within Pinot. So what we do is,
35:01
let's say you have a small dimension table and you want to decorate your larger fact tables in Pinot with this small dimension table — that is supported today. So you can do a lookup join locally within each Pinot server. But having a large fact-to-fact join,
35:20
I think that's not supported today. Okay, thank you. There are even two more questions. One is the last one: partitioning on the upsert key seems like a strong constraint. What's the advantage compared to writing every event and then using an analytical function on it,
35:42
like an over-partition-by — partitioned by key, ordered by event time — and then a last-value function on that? Right, yeah, so it really comes down to query latency. What we want to do is minimize the work that needs to happen at query time.
36:03
As I mentioned, Pinot is used for a lot of user-facing analytics, and it's embedded in the core business flow within LinkedIn and Uber. So any query latency delays will actually affect the overall site latency for LinkedIn and Uber.
36:20
So it's imperative that the latency SLA is within 100 milliseconds. So from that frame of thought, we wanted to minimize what we do at query time. So we came up with this model where we assume that the data is partitioned on the primary key beforehand, and then within a server,
36:40
we use a simple bitmap to keep track of the latest record per key — we are co-locating all the events for the same key together, which enables Pinot to handle upserts. On the question of pre-partitioning being expensive: yes, there is an additional cost to it, but oftentimes,
37:01
it's just a matter of selecting a key in your Kafka producer. And that's all — if you looked at my demo, all I did was select a key for my Kafka producer, and that was it. And this can be done with your existing applications, or even if you're ingesting a changelog from Debezium, you can select a key there and so on.
37:23
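The two ideas above — keying the Kafka producer so that every event for a given primary key lands in the same partition, and a per-server structure that marks only the latest record per key as valid — can be sketched roughly as follows. This is a simplified stand-in for Pinot's actual upsert metadata, not its real implementation; the partitioner and class names are made up for illustration:

```python
# Simplified illustration of why keyed production enables local upserts:
# hashing the primary key picks the partition, so every version of a key
# is handled by one server, which can track the latest "valid" row on
# its own. A stand-in for Pinot's real upsert metadata, not actual code.
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Deterministic stand-in for the Kafka producer's key partitioner.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

class UpsertPartition:
    def __init__(self):
        self.rows = []   # append-only rows, like docs in segments
        self.valid = {}  # primary key -> index of the latest valid row

    def ingest(self, key, value):
        self.rows.append((key, value))
        self.valid[key] = len(self.rows) - 1  # older versions invalidated

    def query(self):
        # Only the latest row per key is visible; no query-time dedup scan
        # (this is the work pushed from query time to ingestion time).
        return {k: self.rows[i][1] for k, i in self.valid.items()}


partitions = [UpsertPartition() for _ in range(NUM_PARTITIONS)]
events = [("order-1", "CREATED"), ("order-2", "CREATED"),
          ("order-1", "SHIPPED")]
for key, value in events:
    partitions[partition_for(key)].ingest(key, value)

merged = {}
for p in partitions:
    merged.update(p.query())
print(merged["order-1"])  # → SHIPPED
```

Because both `order-1` events hash to the same partition, that one server can resolve the upsert locally, which is why the alternative of a query-time `OVER (PARTITION BY ...)` plus last-value scan is avoided.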
So how is Pinot different from, actually, Druid? Yeah, yeah, great question. Pinot and Druid are architecturally very similar: both ingest data in the same way, and both are column stores. The differences, as I mentioned in my previous slide —
37:41
I can just read them out, since we don't have much time. One of the main differences is the different set of indexes that we already have and keep adding. Pinot's pluggable architecture makes it very easy to add new indexes. So currently we have a range index, a JSON index,
38:02
a geospatial index, and a star-tree index — these are not available in Druid, and this is what makes Pinot really fast. Text search, via the Lucene index that we've added in Pinot, is not available in Druid either. And being able to upsert data — that's actually an architectural difference that's there in Pinot and not in Druid.
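As a rough illustration of how those indexes are declared per column, here is a fragment of a Pinot table config. The table and column names are made up, and the exact keys should be checked against the Pinot documentation for your version — this is a sketch, not a complete or authoritative config:

```json
{
  "tableName": "orders",
  "tableType": "REALTIME",
  "tableIndexConfig": {
    "invertedIndexColumns": ["status"],
    "rangeIndexColumns": ["price"],
    "jsonIndexColumns": ["payload"]
  },
  "fieldConfigList": [
    { "name": "description", "encodingType": "RAW", "indexType": "TEXT" }
  ],
  "upsertConfig": { "mode": "FULL" }
}
```

The `upsertConfig` block additionally requires a primary key defined in the table schema and the input stream partitioned on that key, as discussed above.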