
Frictionless Application (IDE for CSV)

Formal Metadata

Title
Frictionless Application (IDE for CSV)
Title of Series
Number of Parts
542
Author
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
This talk will present a new data management IDE for CSV that provides functionality to describe, extract, validate, and transform tabular data. It's a logical continuation of the Frictionless Data project's standards and software with a focus on the non-technical audience: data publishers, librarians, and, in general, people who prefer visual interfaces over command-line interfaces and programming languages.
Transcript: English (auto-generated)
Hi everyone, my name is Evgeny and I am the tech lead of the Frictionless Data project at the Open Knowledge Foundation. Today I'd like to present our new development, called Frictionless Application.
Frictionless Application is an IDE for CSV and other tabular data formats. This tool hasn't been published yet, so today it will be more like a preview. But you can access the codebase on GitHub.
The main purpose of Frictionless Application is data analysis, data validation, data publishing, and many other aspects of working with data. Let's start with an overview of the Frictionless Data project.
The Frictionless Data project helps people publish and consume data. It's built on top of open data standards such as Table Schema, Data Resource, and Data Package. You might have heard about them.
On top of these standards, we're working on different software. For example, we have developed Frictionless Framework for Python, so people can describe, validate, extract, or transform their data in Python or on the command line.
We developed Frictionless Repository, which is similar to Frictionless Framework, but it runs on GitHub infrastructure using GitHub Actions, and it provides continuous data validation.
Last year we also presented Livemark, our data visualization and publishing tool. And today we will talk about Frictionless Application. But first of all, I'll describe Frictionless Framework and Frictionless Repository to show why we need Frictionless Application.
So Frictionless Framework is written in Python, and it provides a command-line interface and a Python interface to describe, extract, validate, and transform tabular data, including publishing it.
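To make the "describe" step more concrete, here is a minimal sketch of inferring a Table Schema-like descriptor from CSV text. It uses only Python's standard library and is not Frictionless Framework's own implementation; the function names `infer_type` and `describe` and the sample data are assumptions made for this illustration.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest Table Schema type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try the strictest cast first, then fall back to looser ones.
    for name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def describe(text):
    """Build a Table Schema-like descriptor (hypothetical helper) from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Transpose rows into columns; ragged rows are truncated in this sketch.
    columns = list(zip(*body)) if body else [() for _ in header]
    return {
        "fields": [
            {"name": name, "type": infer_type(column)}
            for name, column in zip(header, columns)
        ]
    }


SAMPLE = "id,name,price\n1,apple,0.5\n2,pear,1\n"
schema = describe(SAMPLE)
print(schema)
```

A real tool would also detect dates, booleans, missing-value markers, and dialect details such as delimiters, which this sketch leaves out.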
This project has been around for a while and has a rich community and a lot of people
using it, but it requires you to work with a command-line interface and/or know how to code in Python. So for many people, it's just not possible to use Frictionless Framework.
So to solve this, we developed Frictionless Repository. It's a GitHub Action, so it works like a continuous data validation service. If you have data published on GitHub, or you store some data-related project on GitHub, you can
add the Frictionless Repository GitHub Action, and on every push to your repository it will validate your data. This solves the problem for people who are not able to
use Frictionless Framework because it requires programming and knowing the command-line interface. But it's still kind of for tech people who know what GitHub is and how to create GitHub Actions.
So having said all this, I'd like to introduce Frictionless Application. It's a fully visual tool that can be published as a web application, more like a demo, and, more importantly, as a desktop application that you can install on your computer. It's fully visual; it's for non-programmers.
And our goal is to make it really easy to use. So it will look just like Excel, with a file manager like the one in Jupyter Notebook. And the core feature is that for any CSV file on your computer, you'll be able to see it as a table.
After this, you can see the metadata: what fields and columns it has, what
types. You can edit it, you can add validation checks and see how it validates. In general, this project just does visually everything that our lower-level projects do for programmers and command-line interface users.
It makes sense to add that it's not only for tables.
For example, you can edit your metadata like table schemas or data package descriptors. And here I'd like to show you how you can validate your data table and fix the errors.
So when you upload or open a data file in Frictionless Application, it provides you with a table view, and the errors (and a lot of data tables and CSVs have errors) are shown in red.
There is a description of what each error is, and you can just edit the cells like in Excel, clean your data, and save it.
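The kind of per-cell error report described here can be sketched as follows. This is a simplified illustration using only the standard library, not the validation logic of Frictionless Framework or Frictionless Application; the `validate` function, the error dictionary layout, and the sample data are all assumptions.

```python
import csv
import io

# Casts for a small subset of Table Schema types (assumption: only these three).
CASTS = {"integer": int, "number": float, "string": str}


def validate(text, schema):
    """Return a list of error dicts, one per problematic row or cell."""
    errors = []
    fields = schema["fields"]
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    # Row numbers start at 2 because row 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(fields):
            errors.append({"row": row_number, "field": None,
                           "note": "wrong number of cells"})
            continue
        for field, cell in zip(fields, row):
            if cell == "":
                if field.get("constraints", {}).get("required"):
                    errors.append({"row": row_number, "field": field["name"],
                                   "note": "required value is missing"})
                continue
            try:
                CASTS[field["type"]](cell)
            except ValueError:
                errors.append({"row": row_number, "field": field["name"],
                               "note": f"'{cell}' is not of type {field['type']}"})
    return errors


SCHEMA = {"fields": [
    {"name": "id", "type": "integer"},
    {"name": "price", "type": "number", "constraints": {"required": True}},
]}
DATA = "id,price\n1,0.5\nx,1.0\n3,\n"

errors = validate(DATA, SCHEMA)
for error in errors:
    print(error)
```

Each error carries a row number and a field name, which is the information a table view would need to highlight the offending cell.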
Okay, so the goal for our beta release is also to support creating charts from data tables. And here we use Vega notation, or Vega-Lite.
The idea is that it will be possible to create a lot of Vega-supported charts using data from the tables,
so you'll be able to set which column to use for the x-axis, what the transform will be, what the type of this axis is, et cetera, et cetera. Currently it's under development, but the goal is to have it.
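As an illustration of what choosing a column, a type, and a transform for an axis amounts to, here is a small sketch that builds a Vega-Lite bar chart specification as a plain Python dictionary. The helper name `bar_chart_spec`, its parameters, and the sample rows are assumptions for this example; how the application builds its specifications internally is not stated in the talk.

```python
import json


def bar_chart_spec(rows, x_field, y_field, x_type="nominal", aggregate="sum"):
    """Build a minimal Vega-Lite bar chart spec (hypothetical helper)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            # Which column goes on the x-axis and how it should be interpreted.
            "x": {"field": x_field, "type": x_type},
            # The y-axis aggregates a numeric column per x value.
            "y": {"field": y_field, "type": "quantitative", "aggregate": aggregate},
        },
    }


ROWS = [
    {"city": "Paris", "visitors": 10},
    {"city": "Paris", "visitors": 5},
    {"city": "Rome", "visitors": 7},
]

spec = bar_chart_spec(ROWS, x_field="city", y_field="visitors")
print(json.dumps(spec, indent=2))
```

The resulting JSON can be pasted into any Vega-Lite renderer; a visual editor only needs to expose the `field`, `type`, and `aggregate` choices as form controls.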
Of course, it would be an incomplete data-related application if it weren't for SQL features. So yes, in Frictionless Application it's possible to query your table, your data, whatever format it had.
Whether it was CSV or Excel or whatever, it will be indexed in a SQLite database, so you will be able to query your data.
And not only individual tables, but also more complex queries that join different data tables. So basically it will be just an SQL interface to the data files on your computer.
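The idea of indexing files into SQLite and then joining them can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the approach, not the application's actual indexing code; the `load_csv` helper, the table names, and the sample data are assumptions.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, text):
    """Load CSV text into a new SQLite table named after the file (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Identifiers are interpolated directly, so this sketch assumes trusted names.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)


CITIES = "id,name\n1,Paris\n2,Rome\n"
VISITS = "city_id,visitors\n1,10\n1,5\n2,7\n"

conn = sqlite3.connect(":memory:")
load_csv(conn, "cities", CITIES)
load_csv(conn, "visits", VISITS)

# Join two data files as if they were ordinary database tables.
result = conn.execute(
    "SELECT c.name, SUM(CAST(v.visitors AS INTEGER)) AS total "
    "FROM cities c JOIN visits v ON v.city_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(result)
```

Values are stored as text here, which is why the query casts `visitors` before summing; an indexer that knows the table schema could create typed columns instead.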
Okay, let's say we're done updating, editing, and analyzing our data tables. What we can do in Frictionless Application is pack everything as a data package.
Data Package is an open standard for describing a collection of files, and it is a cornerstone of the Frictionless Data project. So in Frictionless Application you can create a data package and fill it with your files.
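To show roughly what such a package looks like on disk, here is a minimal sketch that assembles a Data Package descriptor (the content of a `datapackage.json` file) as a Python dictionary. The `make_package` helper, the package name, and the file paths are assumptions for illustration; the descriptor uses only the `name`, `resources`, `path`, `format`, and `schema` properties.

```python
import json


def make_package(name, resources):
    """Build a minimal Data Package descriptor from (name, path, schema) tuples."""
    return {
        "name": name,
        "resources": [
            {"name": res_name, "path": path, "format": "csv", "schema": schema}
            for res_name, path, schema in resources
        ],
    }


CITIES_SCHEMA = {"fields": [
    {"name": "id", "type": "integer"},
    {"name": "name", "type": "string"},
]}
VISITS_SCHEMA = {"fields": [
    {"name": "city_id", "type": "integer"},
    {"name": "visitors", "type": "integer"},
]}

package = make_package("city-visits", [
    ("cities", "cities.csv", CITIES_SCHEMA),
    ("visits", "visits.csv", VISITS_SCHEMA),
])

# This JSON is what would be saved next to the data files as datapackage.json.
descriptor_json = json.dumps(package, indent=2)
print(descriptor_json)
```

Because the descriptor is plain JSON that travels with the files, the same package can later be handed to a publishing target without re-describing the data.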
And then we're working on a feature for publishing the data. We're currently working on GitHub, Zenodo, and CKAN targets for publishing a data package as a dataset.
For example, a data package is a set of files, a collection of files, and it will become a Zenodo dataset. These features are already implemented in Frictionless Framework,
and here we're just working on the visual interface for them. And I think it's very important to provide this visual way to publish data for non-coding people.
Okay, I think that's it for my presentation. As I mentioned, Frictionless Application is approaching a beta that we will release in a few months, so please stay tuned. I think we will also be presenting it at csvconf in Argentina this year.
And when it's released, any feedback, ideas, and suggestions will be really amazing for us and really helpful. Also, just a side thing: don't forget to help the people of Ukraine.
I'm just putting the link here. And that's it. Thanks for your time, for your attention, and have a great FOSDEM day.
Thank you. Thank you, Evgeny, for this presentation.
Now it's time for the Q&A, and it's live. We had some questions. I know that a lot of people were waiting for the new app and the presentation. And the first question is from Paul. Can you describe the design process for the user interface, user experience of the application, please?
First of all, thanks for your words. And I'll probably disable my camera because the sound is sometimes weird, so sorry for that.
So, yes, let me put it this way. We are Frictionless Data, and we are kind of a project with a history already. But we have always been kind of limited regarding resources, like many open source projects
and projects funded by grants and helped by the community. So our goal is just to keep things really simple.
For the application, we use a really standard layout and standard ways to show the information, as I mentioned: a file manager, Excel-type table views. And we also use a really high-level JavaScript library, Material UI,
to eliminate all the layout design decisions. It just looks like, I don't know, Gmail or Google Drive, with everything based on the Material UI style.
Sorry, you're muted, I think. Thank you for your answer, Evgeny. Another question, from Jo, that we discussed before the conference. She really likes this new version and finds it really useful.
When, for you, would you use Frictionless, and when might you choose, for example, OpenRefine? Thanks for the question. I think you would use Frictionless when your work is, first of all, tied to the Frictionless standards.
And I hope, and at least partially I know, that more and more people are starting to use the Frictionless standards
in their work: Data Package, Table Schema. And Frictionless Application is just kind of the most natural way to work with these standards. Regarding OpenRefine, of course, if you're looking for an already established and well-supported solution
with a lot of plugins, et cetera, for now, of course, you will choose OpenRefine. But I'd just suggest trying Frictionless Application at some point to see if it's more modern,
or provides more features, maybe unique features. It's not fair to ask me this because, yeah, I'm just, yeah. And another, really precise question from Paul: how many rows is the table view planned to scale to?
Regarding the visual interface, it will be just a data frame onto the database. So there is pagination, and it can work at whatever scale the local database created by Frictionless Application can reach.
We're currently working on the web version of the application; the next step will be publishing it as a desktop application.
From then on, the only limit will be local memory. So it can work. The Frictionless Data goal is kind of like... I'd say that Frictionless Data works best for small and medium-sized datasets.
And for medium-sized datasets, like a million rows, it will be good, because it's just a data frame. So it's not like, for example, Excel, where you are limited to a smaller number of rows.
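The pagination idea, where the table view only ever loads one window of rows from the local database, can be sketched like this. It is an assumption-based illustration using `sqlite3` with `LIMIT` and `OFFSET`; the `fetch_page` helper and the `numbers` table are hypothetical, and the talk does not say how the application pages through data internally.

```python
import sqlite3


def fetch_page(conn, table, page, page_size):
    """Return one page of rows (1-based page number) from a SQLite table."""
    offset = (page - 1) * page_size
    # The table name is interpolated directly, so this sketch assumes a trusted name.
    query = f'SELECT * FROM "{table}" ORDER BY rowid LIMIT ? OFFSET ?'
    return conn.execute(query, (page_size, offset)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "numbers" (n INTEGER)')
conn.executemany('INSERT INTO "numbers" VALUES (?)', [(i,) for i in range(1, 1001)])

# Only 100 rows are materialized in memory, regardless of the table size.
page = fetch_page(conn, "numbers", page=3, page_size=100)
print(page[0], page[-1], len(page))
```

With this pattern the cost of showing a screen of rows depends on the page size rather than on the total number of rows in the file.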
Okay, perfect. Another question, from Oleg: the publication to Zenodo and GitHub looks great. Any thoughts about plugging in other targets? Thanks for the question and for the kudos.
Yes, as the application just creates a visual interface on top of Frictionless Framework.
We have feature requests for other data targets, and we have done some research on adding others. Try it and you can, etc, etc. So we will be happy to add new targets, and it's only a question of resourcing.
So when we have an established version of the application and... Yeah, so currently we have three of these: CKAN, GitHub, and Zenodo. This gives us a chance to check everything and use it as a proof of concept, and once it's tested and it works,
it will be relatively easy to add others. Perfect. From Paul again.
Can you describe the software stack for the Frictionless Application? Thanks, Paul. Yes, so again, as mentioned, like any open source project, we have limited resources.
The Frictionless Application is basically a wrapper around the Frictionless Framework. I'm not sure what exactly the question is, because for software this deep,
from the visual interface to all the low-level stuff, we use a lot. Regarding the big parts, Frictionless Application is a wrapper around Frictionless Framework, kind of like a client, where Frictionless Framework is the server. We will also publish a Frictionless API at some point, so people will be able to use in Python the server we use in the application,
which may be useful for creating in-house solutions for data validation. And aside from the Frictionless parts, it's React, Material UI, and Zustand for state management.
Sorry, I'm not sure if this is relevant information or if it's too low-level. I think that people can also watch your live stream later and may be happy to have this kind of precise answer.
That's why it's perfect. But I have another question. It's about the table editor check. Does it check all errors from data package validation, including the foreign key constraints?
Currently, honestly, it doesn't, because it's just incomplete. But yeah, the goal is to be 100% aligned with the standards. If the Data Package standard says that foreign keys should be validated,
they will be validated in Frictionless Application. So the goal for the beta release is definitely to support this. Okay. I'm checking if there is any other question. No, I have some more questions about the development of Frictionless and its story.
You have been working on this project for a long time. Can you maybe tell us about the challenges that you faced at the different steps of the development? And, since this devroom specializes in open source research software,
what is for you the advantage of open source, but also maybe the pros and cons of doing this open source development? Sorry, I didn't hear the first part.
Yes, it's about your experience. You have been on this project for a while. What are the challenges that you faced? Yes, it's a good question. For the six minutes we have, or maybe for the next few hours.
So I'll try to say something maybe less usual, something that will be more interesting than saying that
resources are limited, et cetera, et cetera. So maybe this is interesting about us, from the beginning of the project, although it's probably also kind of a usual thing.
I think the initial idea, the great idea of these Frictionless standards created by Rufus Pollock, of course it was, you know, too general, because it was an idea. And during the life of the project, I would say, at least for the technical part,
because initially I was leading only the software and now I'm leading the project in general, I see it like this: we're trying to pick the good parts and remove the bad parts.
So we're just trying to figure out what the really useful things from Frictionless are that we can provide, and what is just this, you know, 80-20 thing. So yeah, I would maybe suggest that in your open source project you start as early as possible
with this elimination of things that are not critical to being useful. Because yeah, I could say a lot about the usual things like documentation,
contributions, pull requests not being synchronized among contributors, et cetera. But it's the same for every open source project, so. Maybe I have a question related to this topic, about open source.
We are discussing the sustainability of projects a lot. So for you, what are the main elements for making your project sustainable, and maybe the next steps for that?
Frictionless Data has been supported for a long time by the Sloan Foundation, for which we're really thankful, and currently it's a core Open Knowledge project
supported by the Open Knowledge Foundation. But still, yeah, sustainability is an ongoing discussion, and a project like this is not as widely used as,
for example, Webpack, which is now fully funded by, I think, Open Collective and outside contributors. Only a few projects like that can live just on donations.
So in our domain, I think a project like this still needs to build collaborations with, hopefully, NGOs or high-level data projects
and start providing some customizations and tailored versions of the software
to provide resources for core development. That's just what I think. Perfect. Checking if there is any other question. For now, we are at the end of a full day of conference in Brussels.
That's why I think fewer people are connected right now. But do you have a last comment about the next steps for the project that you explained in your presentation?
A take-home message, some last words to conclude our live Q&A. Yeah, thank you. So the next step will be publishing a beta release, and we're planning to do so in a few months, in March or April,
and we're targeting CSVConf in Argentina. It would be great if some of our listeners could join us there in Argentina. And I hope we'll be doing a more live version of the presentation
with all the features already working. Great. So that's a short, short, like one. We wish you that at the beginning. Thank you so much for your time and your presentation.
Yeah, thanks a lot. Thank you.