The Continuum Platform: Advanced Analytics and Web-based Interactive Visualization for Enterprises


Formal Metadata

Title
The Continuum Platform: Advanced Analytics and Web-based Interactive Visualization for Enterprises
Part
19
Number of Parts
119
Author
Travis Oliphant
License
CC Attribution 3.0 Unported:
You may use, change, and reproduce the work or its contents in changed or unchanged form for any legal purpose, and distribute and make it publicly accessible, provided you credit the author/rights holder in the manner they have specified.
Production Place
Berlin

Content Metadata

Abstract
Travis Oliphant - The Continuum Platform: Advanced Analytics and Web-based Interactive Visualization for Enterprises The people at Continuum have been involved in the Python community for decades. As a company our mission is to empower domain experts inside enterprises with the best tools for producing software solutions that deal with large and quickly-changing data. The Continuum Platform brings the world of open source together into one complete, easy-to-manage analytics and visualization platform. In this talk, Dr. Oliphant will review the open source libraries that Continuum is building and contributing to the community as part of this effort, including Numba, Bokeh, Blaze, conda, llvmpy, PyParallel, and DyND, as well as describe the freely available components of the Continuum Platform that anyone can benefit from today: Anaconda, wakari.io, and binstar.org.
Transcript: English (automatically generated)
I need to talk about Continuum platform. It's a sponsored talk, so hopefully I won't sound too sponsored-ish.
My intent is actually just to talk about some of the technologies we're working on that are open source. I'll give you a little brief insight into what we do as a company, but mostly I'm gonna talk about the open source tools we're doing that really drive, from my experiences, NumPy and SciPy communities. We are basically a team of scientists, engineers, and data scientists trying to build tools
for others, scientists, engineers, data scientists. We feel like in the wider ecosystem of computer science and computer technology, that category of people, the domain experts, the scientists, the domain scientists, tend to get left behind as people focus on developer tools only. So we tend to be developers that focus on the scientific tools. And there's a lot of need for this,
and the real essence of the big data movement is really getting insight from those data, and that insight requires models, scientific models, typically. I'm Travis Oliphant. My background is in NumPy, SciPy. I'm actually on the PSF. I'm a PSF director currently as of June. I started the NumFocus Foundation. We'll talk a little about that beyond.
I've been a professor at BYU. I've been a scientist myself. My roots are as a scientist, but we created a company really to allow other people to build open source software. We love open source software. Peter Wang is my co-founder. Two and a half years ago, we built Continuum. Our whole purpose is really to allow other people to help us build open source and deliver it to the enterprise
and really make it a part of everybody's enterprise experience. So that's what we're about. We love open source. It's part of our DNA. I've been contributing to open source since 1998 when I first found Python, and I've been a Linux user. A lot of us do a lot with open source. Now we've got 50 people worldwide. We have remote developers.
Depending on the project, remote developers work really well. Sometimes it can be difficult. So we try to find those projects where remote developers can work really well, but they are available. We have major contributors to NumPy, SciPy, Pandas, SymPy, IPython, and we love more. We love new open source products as well. We think that open source can be more than just a hobby.
Our desire is to grow the community. That's why we started the NumFocus Foundation two and a half years ago as well. This foundation, its whole purpose is to promote accessible computing in the sciences and to back NumPy, SciPy, Pandas, SymPy. A lot of these projects, they are emergent open source projects with just kind of a loosely affiliated community and not much money to help them.
And so NumFocus' purpose is to gather money from enterprises and drive it towards sprint development, towards open source scholarships, towards diversity training, diversity events. NumFocus also sponsors and promotes and actually receives any residual income from the PyData ecosystem, the PyData conference series.
We're having one as an affiliated event to this event, so please come to the PyData conference. You'll hear all about the great scientific tools, the great data analysis tools that are emerging. Now as a company, what we sell is enterprise consulting and solutions, from optimizing performance, to managing DevOps in a big data pipeline, to building native applications on the web or on the desktop.
We also provide training, Python for science, Python for finance, as well as practical Python through our partners, David Beazley and Raven Henninger. And then we are building the Continuum platform, which is a product for kind of taking the desktop to the data center and back that allows people to deploy data analysis applications and dashboards.
So our products are all centered around that platform. They kind of take the appearance of Anaconda add-ons and Anaconda Server, Wakari Enterprise. I'll show you briefly just those. The key behind these products is to really give experts and scientists what they really are asking for. I've been spending a lot of time myself as a scientist. I kind of understand what the workflows they desire are
and we're trying to bring that to large organizations, large companies. So this is a picture I show of the Continuum platform. You can see that it rests on an open source base and an open source base that we contribute to greatly. We continue to contribute to it. The IPython, SymPy, SciPy, NumPy, Pandas, that basic baseline. And we have additional open source products
that we're writing and growing. Numba, Bokeh, Blaze, DyND, Conda, llvmpy, PyParallel, all these things are trying to bring high level scientific applications, make them easier to write, make them faster, make them take advantage of the hardware that's changing today, GPUs, multicore. I wrote NumPy six years ago. I still know all the bad places
where it's not optimized. There's many, many places and it's not optimized because it can't take advantage of multiple cores or can't take advantage of multiple GPUs. On top of that, we deliver Anaconda and then above that are some of the proprietary applications that we provide, all about creating applications that can deploy in the enterprise very, very quickly and really empower the domain experts
that exist in every organization. Why Python? We love it because it provides a spectrum. What you'll see here in the Python community is kind of different categories of people. You have some people that are web developers and they love that. Some people are DevOps folks and system administrators and they love that. And then I'm kind of in the camp of data scientists, scientists, and sometimes it can be challenging
because we don't all speak the same language and so we kind of talk and use different words and different terms, different libraries. But one thing that's great about the Python community is it is a community and people for the most part listen to each other, try to work forward on solutions that help everybody. And in particular, some of those people that are in the Python community aren't even developers. They're what I call an occasional developer.
They're the cut and paste programmers. I have an idea. I kind of want to put a few things together and Python, it fits my brain. It's partially leveraging my English language center so I can kind of understand what it's saying and I don't have to be a developer to use it. And I can build things very quickly. Python does that. It's very unique actually among all programming languages. Now NumPy, it plays a central role
in the kinds of tools that we build. It's at the center of a large stack of data analytics libraries. There's a lot of users of NumPy actually. I think about three and a half million. It's hard to tell because they don't ever tell me. They don't write home and send me a postcard. Sometimes it'd be nice, you could actually get a sense of who did and who used it. As a company, so that's kind of what we build on,
but as a company, we ship Anaconda. Anaconda is a free, easy to install distribution of Python plus 100 libraries. One thing that's challenging about the NumPy stack is it uses extension modules, it uses C, it uses sometimes Fortran for SciPy. How do you get that installed? It's not enough to just have a source install solution. We have to have a binary install solution.
So we invented Conda and Conda is, and we work with the Python packaging authority to try to promote Conda, help understand how it fits in to the overall packaging story in Python. But essentially it's like Yum and apt-get for Linux, except it's for all platforms, Linux, Mac, and Windows. It's a fantastic distribution that people rave about and they love it when they use it.
Why do they love it? I think Conda is a big reason. Conda is a cross-platform package manager. It helps you manage a package and all its binary dependencies. It's an easy-to-install distribution that supports both Python 2.7 and Python 3.3. You can actually install Anaconda for 2.7, then create environments. I just heard a talk by Red Hat; they call these software collections in the Linux space.
We call them environments. They're system-level environments that lets you, they're more than just Python, they support anything. So you can run Python 3.3 in a separate environment on a Python 2.7 base. You can also do the reverse, get a Python 3.3 base and run Python 2.7 as a compatibility test development environment separately. It's a fantastic solution for bridging the gap
between Python 3 and Python 2. Then there's over 200 packages available. scikit-learn, scikit-image, ipython-notebook, just at your fingertips; conda install gets them and you're off and running. No more compiling dependencies and trying to figure out how to install them. And this is all for free, completely free. You can even redistribute the binaries we make. So that's Anaconda.
Its purpose is to make Python ubiquitous in data science; there should be no excuse for anybody in the world using Python to solve their data analysis needs. And that's why we made Anaconda. Get it at continuum.io downloads. It's free to download and free to distribute as well. And we do sell some things on top of that. As a company, we have to stay in business. We have to sell something. And part of that is Anaconda Server.
It's a commercially supported Anaconda; it provides support, indemnification, and licensing. It also provides a package mirror and kind of a management tool. And if you're interested in that, I can talk more about that to others. Come see me later. Binstar.org gives you a sense of what Anaconda Server might look like in an on-premise installation. By going to binstar.org and signing up, you get a free account, and you can upload there any package you like. There's a three gigabyte limit, so don't just upload all your movies as Anaconda packages. But you can put any binary package you like and share that with somebody else so that they can easily install your solution. And as long as it's public, as long as anybody can download it, it's completely free. Wakari is our hosted analytics environment solution.
It's a fantastic way to quickly and easily get running with the IPython notebook. You can sign up and instantly you're in an IPython notebook running code. Now, the free version gives you a node with only a little bit of memory and only a tiny bit of computational power. But it's great for teaching, for showing, for demonstrating. If you want more power, you can easily upgrade to get as powerful a node as you like.
Then Wakari Enterprise is the on-premise version of that cloud story. The UI has been adapted to allow LDAP integration, it installs to internal servers, it has the notion of projects and teams, and lets people instantly collaborate on a large-scale project, and then share the results of their workflows with others very easily.
So from desktop to data center is kind of our platform story. It gives you Anaconda on the desktop, Wakari in the data center, and a seamless connection between the two so you can go from writing code on your desktop to deployed applications that are on the cloud or in the data center on-premise. So that's our solution. That's the thing we are building together as a company that helps all enterprises everywhere.
But the part I like the best is the open-source tools that we're actually building as a part of this. We feel it's critically important to continue to build open-source technology. So we have key open-source technology that builds on top of NumPy, SciPy, Pandas, and the rest. So Blaze, Bokeh, Numba, Conda. I don't really have time to explain all of these in the brief time I have.
Tomorrow, my keynote, I'll be talking not about all these technologies, a little bit, I'll mention Numba, probably mostly about Blaze and kind of how I see it as part of the story for the future of big data analytics. We do have some add-ons. I've talked about those before. So I'm gonna briefly talk about kind of these technologies, get you excited about it.
We're looking for help. We're looking for developers who can help us with each of these. These are paid positions. So one is Numba. Numba is really a technology about taking the CPython stack and providing compiled technology to it. So PyPy is a fantastic project, but it doesn't integrate with the NumPy stack very well: NumPy, Matplotlib, SciPy, SymPy. So we took the LLVM technology stack, and with decorators, we can take a function, compile it to machine code, and integrate it with the rest of the NumPy stack very easily. It takes advantage of the LLVM tool stack. The kind of work that we're doing is to basically translate a function that looks like this with a JIT decorator, generate a kind of generic assembly code via the translator,
and then LLVM takes that code and runs it on your platform. It can do amazing things. I think it changes the game. It lets Python essentially be like a compiled language. It's a subset of Python, and we can go into the details later if you like. But with a subset of Python, you can now write in the Python syntax and get compiled performance
just as if you'd written C or C++. And we have numerous examples of that. It's a very, very easy way to get optimized performance out of your Python code. Here's a simple example of Mandelbrot generation. You've gotta have your Mandelbrot generation example. It illustrates the ability to call functions and have that bypass the Python runtime and essentially be low-level machine code. So this is one way to bypass the GIL.
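As a hedged sketch of the @jit pattern described here (assuming Numba is available; the example falls back to an identity decorator if it isn't, so it only approximates the compiled behavior):

```python
# Sketch of the Numba @jit pattern from the talk; if Numba is not
# installed we fall back to a no-op decorator so the code still runs.
try:
    from numba import jit
except ImportError:
    def jit(**kwargs):
        def wrap(func):
            return func
        return wrap

@jit(nopython=True)
def mandel(x, y, max_iters):
    """Count iterations before the Mandelbrot sequence escapes."""
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4.0:
            return i
    return max_iters

print(mandel(0.0, 0.0, 50))  # -> 50: the origin never escapes
```

With Numba present, the same function body runs as LLVM-compiled machine code; Numba's `nogil=True` option is what allows such compiled functions to release the GIL.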
Use Numba to add a JIT, and now it's not in the Python runtime anymore. It's actually compiled code, and you can release the GIL and execute that. So that's Numba. Blaze is about connecting data to code seamlessly. The fundamental problem Blaze tries to solve is when you have data in,
let's say it's an HDFS, or somebody else in your team says, well, I think we should have it in Postgres with Greenplum, or maybe we should have it in Netezza. Maybe we should have just a bunch of HDF5 files. That decision of how you store your data ends up determining how you write code, how you write your queries, how you write your solution in Python. It shouldn't be that way.
There ought to be a way to write expressive table-oriented code where you just plug in whatever data you have, and even cross different tables and have the same expression work across all those tables. So Blaze is a foundation for large-scale array-oriented computing that leverages the technologies that are out there already. So with data,
this is describing some of the pain involved in data. There's many, many kinds of formats. The big data pipelines are constantly changing. It can be difficult to reuse code in that environment. The Blaze architecture has an API. It has some fundamental pieces, a deferred expression and a pluggable compute infrastructure and a pluggable data infrastructure. So it's a flexible architecture
that it can scale across multiple use cases. So data, for example, it can be stored as CSV files, or a collection of JSON files, or HDFS, or HDF5, or just in SQL. You can add your own custom data type. So a simple API lets you add it, but then your Python-level expression is common. It's more numpy-like. You can slice it, you can dice it, you can grab pieces of it.
And then you can write a compute graph that refers to part of that data. So this is a compute abstraction that basically can sit on top of multiple backend libraries, things like Pandas. DyND is a next-generation NumPy equivalent. It's a C++ library that does the same things as NumPy but is more general: it allows things like variable-length strings, ragged arrays, and categorical data types, which are missing from NumPy. It can also sit on top of Spark, which is part of the Hadoop ecosystem, and PyTables, from our friend Francesc, who's sitting in the back. Then with this Blaze expression graph, you can write a single expression and have it attached to multiple data sources and pull it all together
in a single application. Here's a simple example. We have a generalized data declaration format called DataShape, which generalizes numpy's dtype. And this DataShape allows you to describe data universally in a way that can sit on top of multiple data formats. So here I'm creating a symbolic table. And this symbolic table, then I can write an expression
involving that symbolic table, including joins, group-bys, and aggregations. Now that creates a deferred expression. And then for load data, there are different implementations depending on whether my data's in SQL or in Spark. And then I simply map the elements of what I've loaded to basically a dictionary representation of the namespace that the compute is going to evaluate in.
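The names below are hypothetical, not Blaze's real API; this is just a minimal sketch of the deferred-expression idea described here, where an expression is built symbolically and only later evaluated against whichever backend holds the data:

```python
# Hypothetical mini deferred-expression system (illustrative only,
# not Blaze's actual API).
class Col:
    """A symbolic column; comparisons build expression tuples."""
    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return ("filter_gt", self.name, value)

def compute(expr, rows):
    """Evaluate a deferred expression against a list-of-dicts backend."""
    op, col, value = expr
    if op == "filter_gt":
        return [r for r in rows if r[col] > value]
    raise ValueError(op)

amount = Col("amount")
expr = amount > 100                      # deferred: nothing computed yet
rows = [{"amount": 50}, {"amount": 150}]
print(compute(expr, rows))               # -> [{'amount': 150}]
```

A second `compute` implementation could walk the same expression tuple and emit SQL or a Spark job instead; that separation between expression graph and backend is the point.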
And then the compute maps the expression graph to the actual backend calculations that are needed. So whether it be pandas in memory or Spark on a 100 node cluster, the same code can be executed. So this is the load data showing the difference between a Spark and pandas. I'll talk more about this tomorrow
because I think it really sets the stage for reusable computing and reasonable expressions and helping people make sense of the diverse and changing world of big data and large scale array-oriented computing. So the last technology, I didn't tell you a lot about Conda. I've got a lot of videos out there.
If you want to hear about Conda, there's actually some jokes about me constantly talking about Conda because I love it so much. You can find videos about Conda on the web. I'm going to talk about Bokeh, which is our visualization library. I'm really excited about the visualization library. A lot of people are as well. It basically allows you to do interactive plotting in the web without writing JavaScript. So as a Python developer,
you can write interactive visualizations in the same spirit as D3, but using Python. Now it's still in development, but quite a bit can be done already. You can have novel graphics. Actually, the violin plot came from a Seaborn library using the matplotlib compatibility of Bokeh. So you could have a matplotlib plot and then essentially render it with Bokeh
to provide the interactivity and the JavaScript rendering. Lots of different kinds of graphics can be built. There's even streaming and dynamic data that can be built. I have a simple demo here I'd like to show. Basically, it's running in the background. So if I go to, this is just my basic computer, and it's been running for a while, and it's the microphone.
What I'm doing is using the NumPy stack to do a Fourier transform on the audio coming from the microphone and show that spectrogram in a couple of different ways. So I can see the time series. I can see the frequency. Here's the time series. Here's the frequency spectrum, and then here's a image map of the frequency spectrum. Take this line and rotate it and stick it in an image, and then it kind of moves across. So I get a spectrogram image over time.
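The FFT side of this demo can be sketched with plain NumPy (audio capture and the Bokeh rendering are omitted; the frame and hop sizes are illustrative choices, not the demo's actual parameters):

```python
import numpy as np

def spectrogram(signal, nfft=256, hop=128):
    """Rolling windowed FFT: each row is one frame's magnitude spectrum."""
    frames = []
    for start in range(0, len(signal) - nfft + 1, hop):
        frame = signal[start:start + nfft] * np.hanning(nfft)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)

# A 440 Hz tone sampled at 8 kHz stands in for the microphone input.
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = int(spec.mean(axis=0).argmax())  # bin width is fs/nfft = 31.25 Hz
```

Stacking rows like this over time is exactly the "take this line, rotate it, and stick it in an image" spectrogram the demo shows.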
And then here's just a radial plot, just for fun. So you can see that this is sampling the microphone. Can't whistle that high. Anyway, and there's things to do with the game.
So this is a JavaScript library, and you can actually drive it from Python. This demo is currently written taking advantage of the Bokeh.js backend, but it's being rewritten in Python, to show how to do this sort of thing in Python and create these kinds of visual apps.
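A hedged sketch of the Python-side workflow (this uses a current Bokeh API; the 0.5-era calls at the time of the talk differed, and the block degrades gracefully if Bokeh isn't installed):

```python
import math

# Data computed in Python; no hand-written JavaScript anywhere.
xs = [i * 0.05 for i in range(200)]
ys = [math.sin(x) for x in xs]

try:
    from bokeh.plotting import figure, output_file, save
    p = figure(title="Interactive sine wave")
    p.line(xs, ys, line_width=2)
    output_file("sine.html")  # standalone HTML page rendered by BokehJS
    save(p)
except ImportError:
    print("Bokeh not installed; skipping HTML output")
```

The saved page ships with pan/zoom tools by default, which is the "interactive plotting in the web without writing JavaScript" idea.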
It illustrates many things about the platform that I think is the new platform for visualization, which is the web browser. So this is what we're doing. It's what we're about. It's the kind of thing you can do if you come work with us. Let's go back to my presentation, not the Twitter feed, although maybe some of you tweeted. The other aspect of dynamic interaction is that, because there's a web socket communication and an object model, Bokeh creates a scene graph in the web browser and an object model that can be reflected on the Python side. So you can write an object model in Python that gets reflected to the browser, and you can have server-side control. You can also just have all that logic in the browser
and have kind of a static web page that has all the interactive logic in the browser. So this is just an example of essentially the web service updating the plot, and then the backend server updating the plot in Python and having the web display change. So it's a great way to handle streaming data and all kinds of different interactions.
You can also do big data kinds of analysis pretty easily with this kind of setup. I can go to, this is running actually in the US, so these are time series that are stored on a server, and I have just a ability to zoom in. So you can see it actually updates. It zooms in initially with the data it has, then it goes back to the server and updates a higher resolution version,
and these different plots are all linked. So it's just a simple example of resampling. Then I can reset the view, and then it expands out after it grabs the data. This is actually back in the US, so there's a little bit of latency. Here I have an example where I'm actually looking at the whole world. You can see I've zoomed into a particular slice. This is a worldview. It's a three-dimensional time series, about four gigabytes of data
we got from the JPL from NASA, and it shows the ocean view, and it's in time. So I'm seeing a 2D projection of the world, but this slider changes the time view, and it takes a little bit to bring back all that data, but if I zoom in to a particularly interesting area of the world, I can see that it updates from the server.
It gives me back this higher-resolution view, and then I have projections that show the period through time, and I can change which slice it shows. You can see it updating down here. So that's just an example of an application built with the visualization, and the kinds of things you can do very quickly and then deploy in a website, in a web browser,
across your organization. There are also little widgets you can provide. This is just an example of a simple widget and some dummy data about downloads, and it adjusts as I slide through it. These are the kinds of things you can do from Python, without writing JavaScript, using Bokeh and its application technology.
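A widget like the one shown can be defined entirely from Python, with its interactive logic shipped to the browser. This sketch uses the modern `bokeh` API (the 0.5-era widget names differed in detail); the JavaScript body here is an assumption for illustration:

```python
# Sketch of a Python-defined widget whose logic runs in the browser:
# a Slider wired to a CustomJS callback (modern bokeh API).
from bokeh.models import ColumnDataSource, CustomJS, Slider

source = ColumnDataSource(data=dict(x=[1, 2, 3], y=[1, 2, 3]))

slider = Slider(start=1, end=10, value=1, step=1, title="Scale")

# The JavaScript body is serialized into the page, so no server
# round-trip is needed once the page is rendered.
callback = CustomJS(args=dict(source=source), code="""
    const scale = cb_obj.value;
    source.data = {x: source.data.x,
                   y: source.data.x.map((v) => v * scale)};
""")
slider.js_on_change("value", callback)
```

Because the callback lives in the browser, the resulting page can be served as a static HTML file and still respond to the slider.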
So that's the gist of what we're trying to do with the platform: basically from data to visualization and beyond, making it easy for people to work at a high level so they don't have to be expert developers, and don't have to know everything about SQL, JavaScript, and development operations in order to get solutions that take advantage
of multiple kinds of hardware, multiple kinds of data sets, and high-level ideas. So, no JavaScript, and that's just a little more of an example of the kinds of plots you can do with Bokeh. There's actually going to be a tutorial at PyData; I invite you all to come to PyData, and the tutorial will be given by the principal author
of Bokeh, Bryan Van de Ven, who will be here. There's also a great website that explains Bokeh, bokeh.pydata.org; it has a gallery, and you can go in and look at the code. It's still a work in progress: version 0.5 just came out, the widgets just came out, and it's making rapid progress, but it's usable today. If you find something that you want
and it's not there, let us know; I'm sure it's either on the roadmap or will be added if you tell us about your particular needs. Okay, so that's a quick run-through of the technologies we build and the kinds of things we do, and I'll end by talking about the openings that we have.
There are many openings for the Numba team, the Blaze team, the Bokeh team, and embedded consultants. If you want to live in New York, come talk to me; I have great opportunities for you in New York City. These are opportunities not only to work with a client, but to work with the rest of our team in helping us build this platform based on open source technology that can benefit large and small organizations
around the world. We're really excited about what we're doing. We think we have ideas that can really help transform the way people write code for high-level data analysis, and we'd love to have you join us. So with that, I'll ask for questions or anything else you want to know.
Do we have any questions? Thanks for the talk. I have two questions regarding the Python part of Bokeh.
First of all, I remember that at the beginning Bokeh was trying to implement the grammar of graphics for Python, but recently I saw that there's no mention of the grammar of graphics in the documentation. Are you still using the same kind of interface, or not?
I would say it's not exactly the grammar of graphics. I know that some of the developers see the grammar of graphics as a good direction, but not necessarily complete, and Bokeh.js itself uses concepts from the grammar of graphics in its architecture. The interface is something that can be added on top. So for example ggplot, which currently has a matplotlib backend, could easily be retargeted to Bokeh.js; in fact, we have examples of doing that using matplotlib's interface. So I'd say the grammar of graphics discussion is at a higher level than Bokeh and Bokeh.js. And the second question is regarding the widgets and interactivity of the plots.
So as I understand it, widgets are something you can play with to, for example, select some data points and get some information about particular data. Is that something you have to implement in JavaScript, or can you just use Python code to define the widgets? For which part? For example, if I want to select some data points
and maybe print some of them. Right, selecting data points and printing them. I believe that's on the roadmap to be done from Python. Currently, to do that, there's a nascent Python interface; if it works for you, it might be enough, but it's possible that that API is still not quite complete. So the idea is that you won't have to use JavaScript.
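For reference, in today's Bokeh the Python side of selection looks like this: selected point indices are exposed on the data source, so Python code can read or set them without JavaScript. This is a sketch against the modern API, not the nascent 0.5-era interface discussed in the answer:

```python
# Sketch of reading a selection from Python (modern bokeh API): the
# selected indices live on the data source and, under a Bokeh server,
# are synchronized with the browser in both directions.
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data=dict(x=[0, 1, 2, 3], y=[5, 6, 7, 8]))

# Pretend the user tapped points 1 and 3 in the browser; a server app
# would see this attribute update and could react in a Python callback.
source.selected.indices = [1, 3]

picked = [(source.data["x"][i], source.data["y"][i])
          for i in source.selected.indices]
```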
I'm not sure we're completely finished with that API on the point-selection side. Any more questions? Bryan will be here later today, and he can give you a lot more explanation of Bokeh. Anyone else? No? Okay, then thank you very much, Travis. Right, thank you.
Thank you.