Elephants, ibises and a more Pythonic way to work with databases

Formal Metadata

Title
Elephants, ibises and a more Pythonic way to work with databases
Title of Series
Number of Parts
112
Author
License
CC Attribution - NonCommercial - ShareAlike 4.0 International:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Abstract
A few weeks ago I was working on setting up a relational database to explore records from DataSF’s Civic Art Collection. Whenever I attend a tech conference I try to spend a day or two in the city to check out its cultural scene, so this seemed like useful information! I decided to use MySQL as my database engine. Coming from a Pandas background I was surprised by how unproductive and restricted I felt writing raw SQL queries. I also spent a significant amount of time resolving errors in queries that worked with one flavor of SQL but failed with MySQL. Throughout the process, I kept thinking to myself if only there was a more Pythonic way!!! A few weeks later I was introduced to Ibis.

I live in Zimbabwe and the first thing that pops into my mind when I think of the word ibis is a safari. One of my favorite things to do when I'm not working is to go on a game drive. Whenever I've been adventuring on safari I usually see ibises perched on top of an elephant. The contrast between the creatures is stark! The African Sacred Ibis is a small, elegant creature that's named after the ancient Egyptian god Thoth. While as many of us know, an elephant is a very big and complex animal. This image serves as a great metaphor for the Python package and how it interacts with big database engines.

Ibis allows you to write intuitive Python code and have that code be translated into SQL. Whether you’re wanting to interact with SQL databases or wanting to use distributed DBMSs, Ibis lets you do this in Python. You can think of the python code as the less complex elegant layer sitting on top of any big data engine of your choice. At the moment, Ibis supports quite a few backends including:
Traditional DBMSs: PostgreSQL, MySQL, SQLite
Analytical DBMSs: OmniSciDB, ClickHouse, Datafusion
Distributed DBMSs: Impala, PySpark, BigQuery
In memory analytics: pandas and Dask.
Anything you can write in an SQL select statement you can write in Ibis.
You can carry out joins, filters, and other operations on your data in a familiar, Pandas-like syntax. In this talk, we'll go through several examples of these and compare what the SQL code would look like versus writing to the database with Ibis. Overall, using Ibis simplifies your workflows, makes you more productive, and keeps your code readable.
Transcript: English(auto-generated)
So, before I get started, I thought I would share a little bit more about myself. I am, like was mentioned, I'm from Harare, Zimbabwe, and Zimbabwe is a country in the southern part of Africa.
It's just above South Africa, and I've lived there for most of my life. It's a very interesting country to live in, but actually few here are from Zimbabwe. Another thing about me is I'm a developer advocate. I work for Voltron Data, and I started developer advocacy just a few months ago.
I started last year in November. Before developer advocacy, I was working as a software engineer at NVIDIA, and I was working on a GPU-based data frame library called cuDF, so very focused on performance and speed with Python. I really enjoyed engineering work, but I was also really interested
in dev advocacy and decided to kind of switch, and so far, it's been good. This is actually my first talk as a developer advocate, so hopefully it goes well. And then a third and final thing about me is that I am
the outgoing vice chair at the Python Software Foundation, and an outgoing director. If you've never heard of the PSF, it is the nonprofit organization behind Python, and it does a lot of stewarding of the language and of the community, making sure the community is inclusive and safe for people in different parts of the world.
And throughout this talk, if you're enjoying the talk or want to ask me a question or something, feel free to tweet at me. My handle is @marlene_zw. I'm pretty active on Twitter, so feel free to DM me or something if you have a question, and if you want to visit my website, it's just marlenemhangami.com.
So when I was thinking about, like I mentioned before, that I'm a developer advocate, and something that I enjoy doing, even before I started Dev Advocacy, is traveling and speaking at different conferences in different parts of the world, and usually,
because I come from Zimbabwe, it's quite a long flight, so I try to add some days, either before or after the conference, to explore around the city. And when I was thinking about coming to Dublin, I thought a cool thing to do could be to explore the public art that's around the city, and when I was thinking about this,
because this is a Python conference, and this is a talk about databases, I decided to go online and Google open data sets about art in Dublin, and there was a very cool website called Dublinked that I found, and the website kind of shares
different data sets, open data sets, about information about the city of Dublin. So I found this really cool project on the website called Dublin Canvas, and what it does is it's actually an initiative by the city that invites artists from around Dublin to,
I guess it's actually more than just Dublin, it seems, but they invite artists to create art or apply to create different canvases where they paint old traffic control boxes that were gray, and they paint them with different colors, so that's an example of one on the screen, it's not super clear, and it's been kind of, has anyone noticed these across,
like there's one right outside, so it's been kind of cool to see those now that I'm actually here. I decided to work with this data set; they have CSV files and a GeoJSON file as well, so I thought this could be kind of cool to explore with Ibis for my talk as well,
and learn about art in Dublin. So when I was doing this, I decided to create a MySQL database and try and create like a relational database, and I faced like several issues when I was trying to do this, and for me some of these issues are, one of the first things that I came across
is that I wasn't as familiar with using MySQL, because I had come from a software engineering background, so I had done a lot of work mainly on the back end of things. I was very used to bug fixes and feature requests and things like that, but now, trying to write SQL code, I felt super unproductive because I was writing
SQL strings and Googling all the time, and I was feeling like this is not Python, so it wasn't feeling good. Another thing is that I had to consistently be in my terminal, so when I was taking a look at the art, I wanted to visualize the data
I was seeing, and for that I'm used to using like a Jupyter notebook, and it's easier to share the information as well, but when I was using MySQL, it felt very restrictive in terms of using the terminal only, and something else is that there was geographical data that I wanted
to work with, and one of the issues with using some of these databases is that data types are not really preserved, or with MySQL you can't really do certain things with data types like geographical data types, and so that was an issue. And then finally I also
thought, as someone who's used to using Python, that SQL strings were not very maintainable in terms of writing tests and things like that. I don't really have an issue with SQL, but I felt like this was not very Pythonic from my perspective, so I kept thinking to myself, there's got to be a more Pythonic way to work with
databases, and so I was introduced to Ibis, so what exactly is an Ibis? In real life, not in Python land, an Ibis is a bird, and like I mentioned before, I live in Zimbabwe, and like maybe once
or twice a year, I will like go outside of my city, and I will go to places where there are animals, and one of my favorite animals is an elephant, and elephants are these you know really big creatures, and sometimes I see these tiny birds sitting on top of an elephant,
and sometimes those birds are Ibises, and so this really striking image of this really big elephant with a tiny sort of elegant bird sitting on it is a good metaphor for us to think about when we are trying to understand what Ibis is in the Pythonic sense. So in Python, Ibis
is a Python package that provides us with a more Pythonic way to work with databases, and the purpose of Ibis, it was created mainly for analytical SQL, so if you're someone who wants to do computations on larger data sets or things like that, and you want to do this
with Python rather than using SQL strings, this is a really good option for you to choose as well. Another thing as well is if you're using, particularly if you're using large data sets that are able to, maybe your data is able to fit on drive but is not able to like fit in RAM,
oftentimes like a library like Pandas is not actually able to let you load in your data and like edit things or run computations on your data, Pandas will just like crash, and so Ibis also solves this issue, and I'll kind of talk about how a bit later, and it was also inspired by dplyr, so if you've used R and you've worked with data in R,
you've probably used dplyr before, so Ibis was very heavily influenced by dplyr. So how does this relate to the metaphor I talked about earlier on? Well, we can think about it like we can imagine maybe the Lion King or something like that, and there's a tiny bird
on top of this big elephant, and the bird is telling the elephant where to go and what to do, so it's like maybe saying go left or go right and take me to the watering hole, and the elephant is just listening to the bird and like doing all the work, maybe moving stuff out of
the way and things like that, and we can kind of think of Ibis in this way doing that, so the idea behind Ibis is that we want lightweight Python code to be able to tell a larger backend or a database engine what to do and to do the heavy lifting for us,
so usually, if you're using a data frame library like Pandas (and it's not just Pandas, actually; there are lots of other great data frame libraries in Python as well), you would pull all your data into memory, for example,
and then Pandas would be the one that would do the heavy lifting if you're doing things on your data like filtering or transforming that data, and so what we want to do instead is we want to allow users with Ibis to write Python code, and that Python code is then translated into
an SQL string or SQLAlchemy expressions, and those expressions and strings are what then tell the database engine what to do. So the data actually isn't loaded into memory; the database engine just carries out the operations, and that's a lot more efficient
than loading your data into memory like with another data frame library. It takes a lot less time, oftentimes can be done with a lot less effort, and can maintain data types as well, so that's also a really positive thing. So let's actually look at some Ibis code and see what this looks like in practice. Here on the screen,
I hope you can see it, I actually feel like it's kind of small now, but sorry, but this, I will try and have a library or something available, a notebook available after this talk, so you could always take a look at the code, but basically what we have is an IPython
terminal that's up there, and this code is very like Pandas-like, so the idea with Ibis is to make it as familiar as possible so that you are very used to being able to use this syntax, so the first thing we're doing is we're importing Ibis like you would with any other library,
then after that you're setting interactive to true, and all that does is make sure that when you run an expression you get the results printed to your console, whatever you're using. So we're doing that, and then after that we're connecting to the Crunchbase
database, and as you can hopefully see, it says ibis.sqlite. You can choose another database engine; you don't have to use SQLite, you could use MySQL or something like that. Whichever database backend you want to choose, you can do
that, and we're saving that database connection to a variable called con. This then allows us to take a look: you can do things like list out the tables that are available in your database, and you can also choose a table and do different operations on
your data set, so here in the example we're taking the name column and we are finding the number of unique values in that column, and the next thing we're also you know finding the number of unique values in the region column, and you can do things like group bys and things like that as well, and so you can do like usually a lot of the intuitive things that you
can do with the pandas data frame you're able to do with ibis as well, but in a way so it's really cool in terms of you are kind of treating an SQL table in a similar way that you would use
a pandas data frame, but it's not loading that data into memory, so it's not taking as much energy and you're also not having to write SQL strings, which I don't know, depending on who you're talking to is like a plus and some people find SQL strings fine, so it just depends who you're talking to I guess. So let's actually talk about what this looks like under the hood.
So when we start out, like I mentioned before, you're creating your Ibis expression, and so your user is writing that pandas-like syntax, and then what happens from there is that Ibis takes that expression and type checks the expression,
the data, and then it also optimizes it, and it converts that into either an SQLAlchemy expression or SQL strings, and then that string or that expression is
passed to a database engine, whichever database engine you want to use (there are lots of options with Ibis for the back end that you can use), and so then that's executed, and the database is the one that does whatever the operation is that you would like to do.
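To make that pipeline concrete, here is a toy sketch using only the standard library. It is not how Ibis is implemented; the `Query` class is a hypothetical stand-in that shows the general idea of building a lazy expression, validating it against the table's schema, compiling it to an SQL string, and letting the engine (SQLite here) do the work. The sample rows are invented.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Toy lazy query: nothing touches the database until execute()."""

    table: str
    filters: Tuple[Tuple[str, object], ...] = ()
    row_limit: Optional[int] = None

    def filter_eq(self, column, value):
        # Return a new expression; the original is left unchanged.
        return replace(self, filters=self.filters + ((column, value),))

    def head(self, n=5):
        # Recorded as a row limit, compiled to a LIMIT clause later.
        return replace(self, row_limit=n)

    def compile(self, conn):
        # "Type check" step: make sure the table and columns exist.
        info = conn.execute(f'PRAGMA table_info("{self.table}")')
        known = {row[1] for row in info}
        if not known:
            raise ValueError(f"unknown table: {self.table}")
        sql = f'SELECT * FROM "{self.table}"'
        params = []
        for i, (column, value) in enumerate(self.filters):
            if column not in known:
                raise ValueError(f"unknown column: {column}")
            sql += (" WHERE " if i == 0 else " AND ") + f'"{column}" = ?'
            params.append(value)
        if self.row_limit is not None:
            sql += " LIMIT ?"
            params.append(self.row_limit)
        return sql, params

    def execute(self, conn):
        # The engine does the filtering and limiting, not Python.
        sql, params = self.compile(conn)
        return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artworks (title TEXT, council TEXT)")
conn.executemany(
    "INSERT INTO artworks VALUES (?, ?)",
    [
        ("Rabbits", "Dublin City"),
        ("Beckett", "Dublin City"),
        ("Heron", "Fingal"),
        ("Tides", "Dublin City"),
    ],
)

expr = Query("artworks").filter_eq("council", "Dublin City").head(2)
sql, params = expr.compile(conn)
rows = expr.execute(conn)
print(sql, params)
print(rows)
```

Because the row limit ends up inside the SQL, the engine only hands back the requested rows instead of Python fetching everything and slicing afterwards.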
So one of the things that we can keep in mind as well: for example, if you're looking at this current example that's there on the screen, you can see the .head method is being called, and so usually if you are using something like a pandas data frame,
what would happen is that pandas calling .head would kind of load all of your data into memory, and then it would carry out the operations on all of the rows in your data set, and then it would return just that subset of data from the head. But if you're using Ibis, what happens instead
is that, because a lot of database engines are optimized to perform as efficiently as possible, when there's a .head call, Ibis actually
creates an SQL expression from your first Ibis expression there, and it adds on a LIMIT clause. So on the back end, that LIMIT clause is passed to your database engine, and the database engine only retrieves the rows that the user has asked for
in that LIMIT clause, and then it carries out the operations on that small subset of rows and returns whatever the result is. So as you can see, again, it's just emphasizing that it's using the most efficient path instead of having to constantly load huge amounts of data into your memory. So going back to our example of elephants,
I'm actually, you know, well this is me in Zimbabwe, and this is me by an elephant, and just so you know elephants are really big, and they're really awesome
creatures actually, if you ever meet an elephant in person, but that's me beside an elephant for scale, to show you just how big they are. In a similar way that elephants are really big, your data can be really big as well, and so sometimes when you have very large data sets,
you don't want to use the same database engine, so you might start off using SQLite, but you want to switch and use something bigger, maybe you want to move your data into the cloud and use something like BigQuery or something like that, and so because you want to be able to
switch your data, sometimes you also don't want to change your code a whole lot; maybe you do the same operations on the data, but you want to go from a smaller data set to a larger data set or something like that. One of the cool things that Ibis allows you to do is to scale up or down based on your data with very few syntax changes. So this is
kind of what it looks like in practice. So the first cell there that you can hopefully see on screen is if someone is using an SQLite database, and so there you're just connecting to that SQLite database and you're running the same code there, you're trying to like
group using a groupby and you're mutating your data as well, and if you then decide I want to try and use something like Postgres, all you have to do is change the connection. You don't actually have to change the code that's following, but you just have to change the
database engine that you're selecting or the database specifically that you're using, and you can do that again scaling that even higher to something like BigQuery, and so all of these options, something that is a cool use case for this is if you're using maybe a sample of small data and then you carry out your operations on that sample to kind of test out and see
do I like what it's returning, does this look good, and things like that in a quick way, and then after that you can then decide to take maybe that same exact code and then run it on your full data set using a larger engine. So really helpful I think for people who
often need to scale between different sets of data of different sizes. Okay, so going back to the beginning of our talk and we talked about art around the city of Dublin. So we want to go ahead, I think this is a great example to look at a practical way that we can use Ibis by exploring
the art around Dublin. So the first thing we're going to do is install Ibis. In order to install Ibis, you use pip install ibis-framework. This is not really intuitive, honestly; it should be
pip install ibis, but it's not, because the name ibis was already taken by another project, apparently. I've done this by mistake and installed plain ibis instead of ibis-framework, and it won't work because that one is a templating engine, so you're going to have some issues.
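If you are unsure which of the two packages ended up in your environment, a small standard-library check like the one below can tell them apart. This is just a sketch: `installed_version` is a hypothetical helper name, and it only reports what is installed, it does not install or remove anything.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "ibis-framework" is the package the talk is about; "ibis" is the
# unrelated project that already owned the shorter name.
for dist in ("ibis-framework", "ibis"):
    print(dist, "->", installed_version(dist))
```

If the second line prints a version and the first prints None, the wrong package was installed.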
So just as a reminder to keep in mind: pip install ibis-framework. Once you've installed that, you import ibis in the same way that you can see on the screen there. And then what I did to get
started was I set interactive to true. What I had done is I had gone onto that Dublinked website, downloaded the CSV data, stored it in an SQLite database, and named that database dublin-art. And so from there I'm saving that database to the db variable there.
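The talk does not show how the CSV was stored in SQLite, so the following is only one possible way to do it with the standard library. The column names and rows are invented stand-ins for the real Dublin Canvas file, `load_csv` is a hypothetical helper, and every column is stored as text for simplicity.

```python
import csv
import io
import sqlite3

# Invented sample standing in for the downloaded CSV file.
SAMPLE_CSV = """council,artist,title,location
Dublin City,A. Artist,Rabbits,Example Street 1
Dublin City,B. Artist,Beckett,Example Street 2
Fingal,A. Artist,Heron,Example Road 3
"""


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and insert every data row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


# Use a file path instead of ":memory:" to keep the database on disk.
conn = sqlite3.connect(":memory:")
n_loaded = load_csv(conn, "dublin_art_table", io.StringIO(SAMPLE_CSV))
print(n_loaded, "rows loaded")
```

With a real file you would pass `open(path, newline="")` instead of the `io.StringIO` object, and then point the Ibis SQLite backend at the resulting database file.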
And then after that we can kind of like look through a table. So the table that I saved all of this in was called dublin-art-table. And if you can look at the data that's available to us from that CSV file we can look at the councils that the art is contained in. We get information
about the artists themselves, their names. We get some information about what the artwork is titled, and we also get the specific location of where the art is and, something really cool as well, a preview image of what the art looks like in the city as
well. So as you can see, again, if you're used to using pandas this is very intuitive when you're writing the code. As an example of querying the data, a couple of things that I decided to do: I used the distinct and
count methods just to find out how many artists were involved in this project, and there were 420 unique artists in this project, which is amazing, because that's a lot of artists in one city; I feel like that's awesome. And then there were also five councils, so it wasn't just Dublin City Council that was involved in this project, but several other
councils were involved in like getting art out there. And there are several methods that are available for you to use in ibis and if you want to just take a look at those methods one of the ways that you can do it is if you have your table already you can hit the period and
then the tab button and just in the same way that you would in a normal Jupyter notebook the methods would like pop up on the screen for you. But another thing you can do is you can also take a look at the docs there over at ibis-project.org. Ideally it would be ibis.org
but I don't know who's naming all of these things because I don't know this is not intuitive but it's fine as well but you can visit ibis-project.org and get the docs there. So what I ended up doing the method that I used here was I used the filter method to be able to
select the locations of all of the art specifically in Dublin City, and these were the locations that I found. I wanted to have a preview of the art that was available at New Street South, Dublin 8, which was the one that I just decided to start with, and this was really
pretty straightforward to be able to display the art there in a Jupyter notebook so all I did was I then you know chose the location column and looked for that specific street or that
specific location I executed the command to tell the db to like get that data for me and then I specifically used like the iloc function to be able to get the first art that was located in that location or in that column and so here on the screen I was able to pull up the
profile image that was based off of the location where I found that art and you can see it on the screen it's like in between I'm also not sure why they didn't paint the other boxes but like
they painted the middle one and so it's I'm not sure what it says but that was like the first art that I saw and then after that I decided as well that I wanted to specifically try and find the art that was surrounding us here at the Dublin Convention Center and I think it could
be kind of cool I don't know if well I'll try and make the notebook available but even if the notebook even if you don't use the ibis thing itself and if you see one of the pictures around and want to take a picture of it and tweet it or something like that that could be kind of cool if you see an art canvas or something around so I wanted to find the specific locations
closest to the convention center that had the art so I could spot it once I was here. To do this I decided to change to PostgreSQL, and Postgres actually has an extension called PostGIS, I guess is how you pronounce it, and that allows you to use
geographical or geospatial geometry data, so that you can visualize geometry data or use a GeoJSON file and get data from that file a lot more easily than you would with other databases, for example. So switching over to Postgres was fairly easy; like I also demonstrated
earlier, all I had to do was change the back end: instead of ibis.sqlite I was now typing out ibis.postgres and then connecting to that Postgres database. I had previously created the Postgres database and loaded in the GeoJSON file that I got from
the website, and all I'm doing on the screen there is saving the specific table that I created, which I named Dublin canvas,
to a variable called canvas locations, and then I'm listing out the columns that are available in that table. The main column that we are interested in is that WKB geometry column, and that just contains WKB (well-known binary) data, which is a way to store things like longitudes and latitudes. So in terms of working with the actual geographical data, I decided to use
GeoPandas to help work with this data. Using Ibis, I was able to get the WKB geometries from my Postgres database, and then I was also able to get the
respective titles based on their geometries as well, in those first two lines of code. From there I'm using the GeoPandas library to convert that WKB data into latitude and longitude and save that into a series called geo series, and then I created
a GeoPandas data frame that contained the titles of the artwork with the geometries as well. I'm sure there could have been an easier way to do this, but
this is like I was just experimenting and this was like the easiest way that I could find to do this so then after that I was able to also find the geographical location of the convention center itself the longitude and latitudes of the convention center and I added those into that data frame as well so I could like kind of see where the art was in respect to
the convention center. Then finally, in a Jupyter notebook (one of the reasons that I appreciate using Ibis is that you can do almost everything within one Jupyter
notebook), I took all of this data and I was able to visualize it using folium and GeoPandas. Using the explore method from GeoPandas, I was able to show the geographic locations of the art that was closest to the convention center
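The talk did this step with GeoPandas and folium. As a dependency-free illustration of the same idea, the sketch below decodes plain 2D WKB points with `struct` and ranks artworks by great-circle distance from the venue. All coordinates are invented or approximate (including the one used for the convention center), the helper names are hypothetical, and real PostGIS output may carry extra flags such as an SRID that this simple decoder does not handle.

```python
import math
import struct


def encode_point_wkb(lon, lat):
    """Encode a 2D point as little-endian WKB (byte order, type 1, x, y)."""
    return struct.pack("<BIdd", 1, 1, lon, lat)


def decode_point_wkb(blob):
    """Decode a plain 2D WKB point into (lon, lat)."""
    byte_order = "<" if blob[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(byte_order + "I", blob, 1)
    if geom_type != 1:
        raise ValueError("this sketch only handles plain 2D WKB points")
    lon, lat = struct.unpack_from(byte_order + "dd", blob, 5)
    return lon, lat


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Approximate venue position and made-up artwork positions (lon, lat).
venue = (-6.2397, 53.3478)
artworks_wkb = {
    "Rabbits": encode_point_wkb(-6.2420, 53.3480),
    "Beckett": encode_point_wkb(-6.2370, 53.3475),
    "Far away piece": encode_point_wkb(-6.3000, 53.3300),
}

# Decode each geometry, then sort titles by distance from the venue.
distances = {
    title: haversine_km(*venue, *decode_point_wkb(blob))
    for title, blob in artworks_wkb.items()
}
nearest = sorted(distances, key=distances.get)
print(nearest[:2])
```

In a notebook, GeoPandas can do the decoding and plotting in fewer lines; the point here is only what the WKB column contains and how "closest to the venue" can be computed.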
and actually if I was hovering over this is just a screenshot but if I was to hover over the other icons the names or titles of the art pieces would then show up and so this was really cool that I could actually see which art pieces were closest and there were two specific
ones that were very close, the one on the left and the right of the Dublin convention center; those two were the closest. The first one is one called Rabbits. Has anyone seen this one around? No? Now look for it; now you have a good excuse to look for it. I've actually seen it, but there's a little bit of graffiti on it now. I'm like,
great, thanks to the people in the city, but it has a little bit of red graffiti on it at the moment. It's called Rabbits, so hopefully you can spot it. Another one as well, the one on the other side closest to the convention center, was an art piece called Beckett. Apparently Beckett is an Irish writer and also a director of plays or something like that, so I
haven't actually seen this one I haven't spotted it yet but if you do feel free to take a picture of it so all of this I did using ibis I did this all in one Jupyter notebook I was also able
to do this in the most efficient way as possible because python I was writing python code but that python code was then being converted directly into sql strings that then told the David database what to do and so this was like one of the most efficient ways to be able to work with databases in my opinions with with ibis so that's actually all the time I have for my talk
today. Well, that's the end of my talk. If you would like to contact me, feel free to email me; my email is marlene at voltrondata.com. You can also tweet at me, it's marlene underscore zw. I will release a blog post and a notebook that contains all of the code that
I shared in this session. If you'd like to take a look at the blog post, I haven't released it yet, but when I do it'll be at marlenemhangami.com. So yeah, thanks everyone for listening.
Okay, thanks Marlene for the very interesting talk,
cool package. We have plenty of time for questions, almost 15 minutes. We also have remote sessions, so we'll start with one in the room and then switch to remote if there's a question. If you want to ask a question, please come to the mic out front.
Hello,
so thanks for the talk. You showed us lots of ways to use Ibis, and I was checking a little bit online on the library. It doesn't seem to be as good in terms of insertion and creation
of data from Python to your database as it is in analytics. Have you tried that a little bit? Do you have any feedback?
Yeah, I think that's a great question, and I do think that's an issue that I came across, in terms of creation of data, or if I have a CSV file and want to
actually create a database using that CSV file in Python using Ibis. I actually asked the devs, the people who are actively working on it, what the plan for that was, and they said that they're working on it at the moment. I think
there is a DuckDB backend for it, and if you use DuckDB as the backend I think there's an option to load in a CSV file and create a database that way. But I do feel like it would be very useful to be able to just load in your CSV data and create, say, a MySQL database right there
with Ibis. So I agree with you, I think that's a pain point, but I don't know when they're going to fix it. So that's a good question.
Yeah, thank you for the great talk and introduction
to Ibis. I think it was on slide 10 that you mentioned that when you have an Ibis expression you can either compile that to a SQL query string or compile that to a SQLAlchemy expression, and I didn't quite get in which case you would want one or the other. I mean,
I guess compiling to the SQL string is the performant and obvious choice, so when would you want the SQLAlchemy option?
Right, that's a great question as well. From what I've seen, it just depends on the backend that you're using. If you're using SQLite, I think for sure it
compiles to SQL strings, but if you're using something like, for example, the DuckDB backend, I'm pretty sure that compiles to SQLAlchemy. I'm not really sure what the rationale is for each one. I think they try to find which one is the most optimized
to get you the fastest result, so I think that's probably how they do it, based off of the backend and maybe also based off of the type of query that you're doing. So I think that's probably how they decide which one, but it
kind of varies. And I will say as well, it also doesn't always compile to SQL strings or SQL expressions. Sometimes you can use pandas as a backend, or PySpark or Dask, so if you want to
write the Python code in Ibis that you're used to and have that be translated into Dask or PySpark, you can do that as well with Ibis. So thanks for the question.
Yeah, maybe quickly check, do we have a remote question? Maybe not, then please go ahead.
Hi Marlene, and thanks for that great talk. I also really enjoyed your recent YouTube talk on the introduction to Arrow, which I'd recommend for anyone interested in that. Kind of looking at the connection, so Voltron is obviously the company associated
with Apache Arrow. Are there any plans to integrate Arrow further into this, to move away from a NumPy basis and bring Ibis more into that Arrow ecosystem?
Yes, thank you for watching that talk, by the way. And yes, I definitely know that there are plans to have Arrow as an option for the
backend, so to have an Arrow backend that you can use with Ibis. That's something that they are actively developing right now; it isn't currently available, but I know that they're developing it. I know as well there are
some ideas to do this in collaboration with the DuckDB backend, so I'm not sure if it's just that you're going to be using DuckDB and that will convert it to Arrow. But yes, that's a good question, and it is something the developers are working on right now.
Okay,
great, looking forward to it, thanks.
Okay, if there are no more questions, our next session will begin at 11:35, so we have about 10 minutes until the next talk. Please join me in thanking Marlene again for her great talk. Thank you.
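
A note on the CSV question from the Q&A: the speaker describes creating a database from a CSV file as a pain point in Ibis at the time of the talk. One possible workaround, sketched below, is to load the CSV with Python's standard library and only then point an analytics tool at the result. This is not Ibis's API and not the speaker's code; the helper name `load_csv_into_sqlite`, the table name `art`, and the sample rows are all hypothetical, and SQLite is used here only because it ships with Python.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows standing in for a CSV export; not real data.
CSV_TEXT = """title,artist
Example Piece A,Artist One
Example Piece B,Artist Two
"""


def quote_identifier(name):
    """Quote a table or column name for SQLite, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_sqlite(csv_file, con, table):
    """Create `table` from the CSV header row and insert every data row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)  # the first row supplies the column names
    columns = ", ".join(quote_identifier(name) + " TEXT" for name in header)
    placeholders = ", ".join("?" for _ in header)
    con.execute("CREATE TABLE {} ({})".format(quote_identifier(table), columns))
    # Values go through placeholders, so cell contents are never spliced into SQL.
    con.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_identifier(table), placeholders),
        reader,
    )
    con.commit()
    return header


# An in-memory database keeps the sketch self-contained; a file path works too.
con = sqlite3.connect(":memory:")
columns = load_csv_into_sqlite(io.StringIO(CSV_TEXT), con, "art")
rows = con.execute('SELECT title, artist FROM "art" ORDER BY title').fetchall()
print(columns, rows)
```

If the database is written to a file instead of `:memory:`, that file could then be opened through Ibis's SQLite backend, which the talk abstract lists among the supported backends. Every column is stored as text here, so numeric columns would need explicit casting afterwards.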