Norikra: SQL Stream Processing in Ruby

Formal Metadata

Title
Norikra: SQL Stream Processing in Ruby
Number of Parts
65
License
CC Attribution - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this

Content Metadata

Abstract
Data streams (e.g., logs) are becoming larger, while analytics in (semi-) real time is also becoming more important today. Moreover, there are many non-programmers who want to write queries in this big data era. Norikra is open source server software that provides "Stream Processing" with SQL. It is written in JRuby and runs on the JVM. We can write and add queries for streams easily on Norikra, without any editors, compilers or deployments. I will talk about the implementation of Norikra and its application at LINE Corporation.
Transcript: English(auto-generated)
Okay, it's time for my talk. I will talk about Norikra. Norikra is open source software that I love, written in Ruby, for processing data streams. These are the topics of today's talk.
First, I will talk about why I love Norikra. That is very important for understanding what Norikra is and how Norikra works.
My name is Satoshi Tagomori, also known as tagomoris, which is my account name on Twitter, GitHub, and many other services. I'm from Tokyo, Japan, and I work at LINE Corporation. LINE Corporation is an internet service company that provides a messaging application, LINE.
LINE is a messaging application, just like WhatsApp or Facebook Messenger. LINE has about 130 million users worldwide, mainly in Asia, South America, and a part of Europe.
Moreover, we have many sub-services on the LINE platform, such as Japanese manga, electronic publishing, a camera app, news, a Q&A service, weather news, and many games.
So we must handle a huge amount of logs and metrics, and at the same time we must handle various kinds of metrics and logs. I work on our data analytics platforms, and this is a very simple overview of a monitoring and data analytics platform.
First, we must collect data from many servers, parse this data, clean it up, and store it in distributed storage such as Hadoop HDFS. Then we process the data and visualize it in graphs, charts, and so on.
That is why I am a committer of the Fluentd project. Kiyoto Tamura talked about the Fluentd project two days ago, on the first day of this RubyConf. Please check the slides about Fluentd, but roughly speaking, Fluentd is a log management system that collects many logs, aggregates them, and puts the resulting data into any storage or remote system.
We are using Fluentd in our data platform systems to deliver data and to control data flows. On the other hand, we are using Hadoop and Hive to process our stored data.
Hadoop and Hive are very famous open source software, and many internet service companies use them. In addition, we must process stream data as soon as possible to find any trouble or any surprising changes in our internet service traffic, such as HTTP response code percentages, the HTTP request rate per second, or a graph of HTTP response times.
This graph is generated with Fluentd. Fluentd has a plugin feature for extensions, and we can write and use any plugins to aggregate stream data and compute percentages or many other values.
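As a rough illustration of the kind of aggregation mentioned here (response code percentages), the following Python sketch computes the share of each status class in a batch of HTTP status codes. It is not Fluentd plugin code; the function name `status_class_percentages` and the sample data are assumptions made for this example.

```python
from collections import Counter


def status_class_percentages(status_codes):
    """Return the percentage of responses per status class (2xx, 3xx, ...)."""
    # Map each code to its class label, e.g. 404 -> "4xx" (hypothetical labels).
    classes = Counter(f"{code // 100}xx" for code in status_codes)
    total = sum(classes.values())
    if total == 0:
        return {}
    return {cls: 100.0 * count / total for cls, count in classes.items()}


# Hypothetical sample of status codes seen in one aggregation interval.
sample = [200, 200, 200, 404, 500, 200, 302, 200]
percentages = status_class_percentages(sample)
print(percentages)
```

In a real pipeline this calculation would run once per aggregation interval on the codes seen in that interval.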
We can put this data into graph tools or any other visualization tools. So Fluentd is very good for simple data and simple calculations. But we have more and more different services, and there are many changes in a day, including in logging. There are many kinds of logs for each service, and there are many different metrics for each service.
Fluentd requires configuration changes and restarts to change what it does. So Fluentd is not so good for processing complex data or for a fragile environment, such as data schema changes or changes in what we want to do.
What we want changes very frequently, so we want to add or remove queries any time we want. We want to write many queries for service log streams, and we want to ignore events that do not have the data we want. The data schema will also change very frequently.
But the application engineers cannot know what the data processing platform requires. So we should create a system in which the application engineers can change their log schemas and the meanings of their logs at any time.
So the data analytics platform should be able to ignore events that do not have the data we want. In my company, there are many service directors and growth hackers. They are not software engineers, but they know what is important for the growth of our services.
So we want to let our service directors and growth hackers write their own queries for what they want.
That is why I wrote Norikra. Norikra is a data processing platform, a piece of middleware, to realize these requirements. Norikra is schema-less stream processing middleware with SQL. It is open source server software written in JRuby, and it runs on the JVM.
Norikra is distributed on RubyGems.org, so we can install it with just gem install norikra, and then we can launch the server with norikra start. Norikra has some interfaces, such as a CLI client and a client library, the norikra-client gem.
So we can operate Norikra with these CLI commands, or we can also control the software over the web UI and over the HTTP API with JSON and MessagePack. So let me show a demo of Norikra.
Okay, we can install Norikra with this command, and it is already installed. Norikra is written in JRuby, so launching it takes quite a few seconds. And here, this is a sample.
This is an example event: a JSON object with two fields, name and quantity. We can feed this data into Norikra with norikra-client event send, to my service sales. Okay, but with this command, nothing happens.
That is because Norikra requires a target definition. Okay, this is the web UI. So first, we should define the target with target open.
And now I'm cheating: in this shell, I'm using CRuby, because CRuby does not need several seconds to launch the commands. Okay, now I will feed log events continuously into Norikra.
Norikra can find the field names from this event stream, like name and quantity. So now we want to select these fields from this event stream with this very simple SQL.
It selects from my service sales with a filter, and I specify that the query results are printed to the console. Okay, successfully added.
Now we get the result data, with name and quantity, from this event stream. We can change the input data schema at any time. So this is another example, with an additional field in the schema.
Now I will feed this data to Norikra, and the previous query still works well, as it did with the previous events. But the schema has already changed: the optional Boolean field drunk was detected automatically.
Now we can write any SQL. For example, we can count input events with this, where drunk. Drunk is a Boolean, so this expression is correct. But this SQL is not correct for Norikra, because Norikra requires an aggregation range, that is, how long to aggregate.
Okay, with this specification, Norikra can count events for every five events.
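The need for an aggregation range can be sketched in plain Python: a count over an endless stream only produces output if something says when a batch is complete. The sketch below emits one count per batch of five events. It is an illustration of the idea only, not Norikra's implementation; `count_per_batch` and the sample events are hypothetical, and treating a missing `drunk` field as false is a simplification.

```python
def count_per_batch(events, batch_size, predicate):
    """Yield, for each full batch of `batch_size` events, how many match."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            # The batch is complete: emit one aggregated value and start over.
            yield sum(1 for e in batch if predicate(e))
            batch = []
    # An incomplete trailing batch is never emitted, as in a batch window.


# Hypothetical stream: every second event has drunk set to True.
events = [{"name": "beer", "quantity": 1, "drunk": i % 2 == 0} for i in range(12)]
counts = list(count_per_batch(events, 5, lambda e: e.get("drunk", False)))
print(counts)
```

Without the batch size, the function would have no point at which to emit a count, which is why an unbounded aggregation is rejected.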
Norikra can count the events where drunk is true, in every five events. Okay, the query is added. Oh, there are too many records, so I will suspend the previous query. Okay.
We can get the output data. Oh, time batch. Oh, I made a mistake. Okay.
We can also compute the sum of quantities with this very simple query: select name and the sum of quantity from my service sales, group by name, order by the sum, descending, and get the sums for every five seconds.
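A minimal Python sketch of this grouped sum follows, assuming each event carries an explicit timestamp in seconds. It works over recorded events rather than a live stream, and the function name `sum_by_name_per_window` and the sample data are invented for illustration.

```python
from collections import defaultdict


def sum_by_name_per_window(timed_events, window_seconds):
    """Group (timestamp, event) pairs into fixed windows and sum quantity by name.

    Returns {window_index: [(name, total), ...]} with totals in descending order.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, event in timed_events:
        window_index = int(timestamp // window_seconds)
        windows[window_index][event["name"]] += event["quantity"]
    return {
        index: sorted(totals.items(), key=lambda item: item[1], reverse=True)
        for index, totals in sorted(windows.items())
    }


# Hypothetical events: (seconds since start, event).
timed_events = [
    (0.5, {"name": "beer", "quantity": 2}),
    (1.0, {"name": "sake", "quantity": 5}),
    (4.9, {"name": "beer", "quantity": 1}),
    (5.1, {"name": "sake", "quantity": 1}),
    (7.0, {"name": "beer", "quantity": 4}),
]
result = sum_by_name_per_window(timed_events, 5)
print(result)
```

A stream processor would emit each window's rows as soon as that window closes instead of returning everything at the end.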
Okay, we can get aggregated results from an SQL query on this input stream. That is how Norikra works. Norikra handles schema-less event streams, so we can add or remove data fields whenever we want.
Norikra uses SQL, and Norikra requires no restart to add or remove queries. In Norikra's SQL, joins and subqueries are available, and we can add and use user-defined functions written in Ruby, Java, or any other JVM language, and we can publish such a UDF as a Ruby gem.
Norikra can handle nested hashes and arrays, and these values are directly accessible from SQL like this. Here the user attribute holds a nested JSON object and attend is a nested JSON array, but Norikra's query language is extended so that we can access user.age, attend.$0, or any other such path.
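The idea of reaching into nested values by a dotted path can be sketched in Python as below. This is only an illustration of the concept, not how Norikra resolves fields. The helper `resolve_path` is hypothetical, and the convention that a `$N` segment means "element N of an array" is an assumption made for this example.

```python
def resolve_path(event, path):
    """Follow a dotted path through nested dicts and lists.

    A segment such as "$0" is treated as a list index (assumed notation).
    """
    value = event
    for segment in path.split("."):
        if segment.startswith("$"):
            value = value[int(segment[1:])]
        else:
            value = value[segment]
    return value


# Hypothetical event with a nested object and a nested array.
event = {
    "user": {"age": 30, "name": "tago"},
    "attend": [True, False, True],
}
age = resolve_path(event, "user.age")
first_attend = resolve_path(event, "attend.$0")
print(age, first_attend)
```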
We are now using Norikra in our production environment. The first use case is error log summarization. We have a web API for our partners.
Over that API, a user sends messages to a partner's official account, and then our server sends these messages to the partner's server. That is shown as the Business Connect server on the right side. Then our partners respond with their own messages to our API server, and we deliver that response to our users in our application, LINE.
But if a partner's server goes down and we deliver all of the error messages to that partner, that is really a flood. So to avoid flooding our partners with error logs or error messages, we now summarize these error logs and error messages with this SQL in Norikra.
Norikra outputs the summary, and an email is sent to our partners. At the same time, Norikra saves the summarized logs into MySQL, and our administration console shows the summarized logs to our partners.
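The effect of this summarization can be sketched in Python: many identical error events in one window collapse into a single line with a count, so a partner receives one notice instead of a flood. The field names `partner` and `message`, the function `summarize_errors`, and the sample data are all assumptions for illustration, not the schema used in production.

```python
from collections import Counter


def summarize_errors(error_events):
    """Collapse one window of error events into one row per (partner, message)."""
    counts = Counter((e["partner"], e["message"]) for e in error_events)
    return [
        {"partner": partner, "message": message, "count": count}
        for (partner, message), count in sorted(counts.items())
    ]


# Hypothetical window: one partner's server is down and produces many errors.
window = (
    [{"partner": "partner-a", "message": "connection refused"}] * 1000
    + [{"partner": "partner-b", "message": "timeout"}] * 2
)
summary = summarize_errors(window)
print(summary)
```

Each summary row could then be emailed or stored, which keeps the notification volume proportional to the number of distinct problems rather than the number of failed requests.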
Error log summarization is a major use case for Norikra. On the other hand, we are also using Hadoop in our data processing platform, so we can process the same data with Hadoop and with Norikra.
Our service programmers can write queries on Hadoop and on Norikra for exactly the same data.
These features are used to generate prompt reports and daily fixed reports. For our services, our programmers write queries on Hive for the fixed reports each day, and also write queries for Norikra to produce prompt reports.
Prompt reports are generated every hour, every several minutes, or every several seconds. These prompt reports are shown in the administration console for the customers of our other services.
And this is a use case by a Google engineer, a cloud platform service architect for Google Cloud Platform.
He uses Norikra with Google BigQuery to count web service requests and responses, and to show the results on a dashboard built with Google Spreadsheet and Google Apps Script.
He uses a web server, nginx, and nginx writes its access log to disk. Fluentd runs on each server, reads the access log, and sends it to Norikra.
Norikra summarizes these access logs per server, and the summarized records are sent to BigQuery directly and to another aggregation node. This is the total overview of the system.
The summarized data is collected on an aggregation node, and the aggregation node can count the overall status of these events with another Norikra. Then Fluentd writes the results into a Google spreadsheet, and the spreadsheet shows graphs and summaries using Google Apps Script.
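This two-level design works because counts can be merged: summing per-server summaries gives the same totals as counting all raw records in one place. A small Python sketch of that property follows, with hypothetical function names and sample access-log entries.

```python
from collections import Counter


def summarize_server(access_log):
    """First level: count responses by status code on a single server."""
    return Counter(entry["status"] for entry in access_log)


def merge_summaries(summaries):
    """Second level: combine per-server summaries on the aggregation node."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


# Hypothetical access logs from two web servers.
server_a = [{"status": 200}, {"status": 200}, {"status": 500}]
server_b = [{"status": 200}, {"status": 404}]

merged = merge_summaries([summarize_server(server_a), summarize_server(server_b)])
direct = summarize_server(server_a + server_b)
print(merged)
```

Only the small summaries cross the network to the aggregation node, not the raw access logs.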
If users want metrics that are not already defined, they can run queries on BigQuery. Norikra and BigQuery can process exactly the same data set.
This architecture is called the Lambda architecture, named by an engineer from Twitter. The Lambda architecture handles both batch processing and stream processing.
We can use an SQL-like DSL in both BigQuery and Norikra, so we can build a Lambda architecture platform with Norikra.
Why is Norikra written in JRuby? There are two big factors. One is Esper. Esper is a complex event processing library written in Java. Esper provides an SQL-like DSL, which is the base of Norikra's queries.
Esper is a very well written library, and we can process a huge amount of data with it. The other is RubyGems.org. RubyGems.org is, of course, an open repository, and it serves as a public repository for UDF plugins, the user-defined function plugins of Norikra, which are provided as Ruby gems.
Before Norikra I did not use JRuby, but these two factors made me use JRuby.
For me, JRuby is just Ruby, and this is thanks to the great JRuby developer team. JRuby makes developing Norikra dramatically faster, with Esper and the well-known positive points of Ruby.
JRuby also gives us RubyGems and RubyGems.org for easy deployment and installation. And with JRuby we can use Java libraries, such as Esper and many others. I'm using many other Java libraries in Norikra. That is a very good point for building data processing middleware.
The minus point is that there are not so many JRuby users, especially in Tokyo. Of course, we can find many Ruby committers in Tokyo or in Japan, but not so many JRuby programmers.
When I get confused about how to call a Java method or something else, there are not so many people I can ask. But this is not such a big minus point. JRuby is very great software, I think.
So, this is the wrap-up. If you are interested in Norikra, please check the software's documentation site or its GitHub repository.
I believe that Norikra brings more queries, more simplicity, and less latency to our data processing platforms. So if you are interested in Norikra, please try it.
Okay, thank you.