Using Ruby to Build a Modern Memex
Formal Metadata
Number of parts | 66
License | CC Attribution 3.0 Unported: You may use, modify, reproduce, distribute, and make publicly available the work or content, in unchanged or modified form, for any legal purpose, provided you credit the author/rights holder in the manner they specify.
Identifiers | 10.5446/46574 (DOI)
Ruby Conference 2018, part 62 of 66
Transcript: English (automatically generated)
00:19
I hope you enjoyed your lunch.
00:21
So I'm Andrew Louis. I'm gonna be talking about a personal project I've been working on for the last little while. It's called Memex, and I'm gonna explain what that is and go over the historical version and my version. So I'm gonna start with some history. Usually we don't have too much history at tech conferences, and if it's history, it's about how the last technique from a year ago is outdated.
00:40
That's history. I'm gonna talk about the kind with black and white photos, and it's really old. We're gonna start in the 1930s, and the character we're talking about is Vannevar Bush. So this is a picture of him here. He was an inventor and engineer at MIT. He built some of the first analog computers. So these were really cool, large mechanical devices, and they would do math problems, more or less.
01:01
Usually calculus, that was the hard thing to do otherwise. And he built some of the largest early computers, and they were really cool. He worked out of MIT, as I said. This is an example of one of the machines. So what it's doing, the operator on the right is actually tracing the curve of a graph, and then the integral is being calculated.
01:21
So really cool devices. So in the 1940s, everything shifted in American science because World War II happened. So everything reoriented. People were pressed into the war effort. And on the home front, there was also an explosion of information, and this was being used to support the war effort. So new bureaucracy sprang up, new processes,
01:42
new information was being produced. So just like going through the photos from this time, it's pretty cool just to visualize how much new information and reports were coming out, just rooms of people just creating information. And Vannevar Bush had a problem. His job during the war was to interpret reports, scientific reports, and make recommendations to the president.
02:01
So every day, his desk would be flooded with things to read, things to understand, and he was just buried under information. So he had a phrase. He says, we're being buried under our own product. So technology was allowing us to create information more than ever before, but we didn't have tools to make sense of it or understand it. So when the war wrapped up, he put his engineering hat back on,
02:21
and he published an essay describing a device that would solve all these problems. It was called "As We May Think." It was from 1945. And he came up with this device called the Memex. So here's an illustration of it, a pretty simple desk-sized device, screens on the top and microfilm inside.
02:40
And the Memex, he said, is a device in which an individual stores all their books, records, and communications, and then you can search through it, navigate it, and this was really cool. It would just be the one place where your personal information would be stored. It would have this cool stylus for adding notes and drawings into it, this voice recorder for adding voice memos into your Memex, even this clip-on camera
03:03
that you can add photos into your Memex. It's a little bit like science fiction, but this was the idea he had. But the reason that we're talking about the Memex still to this day is this idea that he had. So Vannevar Bush identified that when we're looking for information in our brain, we don't go through an alphabetical index, we don't go through a categorical index.
03:21
We navigate almost like a graph, so we think about who we were with or where we were or what we were doing at the same time. And he said that the Memex could do this mechanically. So if you imagine these are nodes in the Memex's data set, as the user would navigate through their data, the trails between the nodes would be recorded. And then later on, as the user wants to search for something, they can use these trails
03:41
to find something by association again. So the unfortunate thing about the Memex was that it was never built. It was a conceptual device, it was just an essay. And yeah, it was never built. This was very unfortunate. So it made me sad. I'll just talk a little bit about myself.
04:00
I'm an information pack rat. If there's a piece of information that I've generated or seen or created about myself, I like to store it. So this is my journal from grade five. I started doing that and haven't really stopped since then. My report cards that I've kept, this is from kindergarten. These are my movie stubs; I've saved a record of each one of them.
04:22
This is a map I tried to do, recording my walks through the city. This was before Google Maps. I just wanted to have a sense of where I was walking in the city. Even my chat logs from high school I saved, and if you read this carefully, they're really not worth preserving, but I thought it might be fun to save them. Not the deepest conversation. But in this kind of new era, this new digital era,
04:42
we have more information than ever before, and I'm being inundated with personal history. Like, it's too much to archive. And it's just an overwhelming problem for me. Like, I'm generating more than ever, and it's harder and harder to track. So a while ago I was thinking about this. I think the most obvious solution would be maybe to talk to a therapist.
05:02
This is not what I did. Instead, I used Ruby. So I decided to build my own Memex, and this was a way to solve this kind of personal archiving problem. So the first thing I did was just gathering data. So first of all, it's a lot of my reading and browsing history.
05:20
So for my RSS reader or my browser or my ebook reader, I save records of what I read or consume. My digital consumption, like the things I like on Twitter or the videos I watch, music, podcasts, I have records of these and I have importers. My location history from the GPS device in my pocket is always recording me. And then my messaging and social interactions
05:41
like email, Slack, and then a lot of stuff like qualitative data like journaling that I mentioned before, just annotations I take or notes that I make. So I have this large dataset and I've kind of put it into a single place. I'm gonna do a live demo now. Last night I had a nightmare that my demo didn't work and then everybody was filing out. So this is my nightmare.
06:00
If this happens, maybe just check Twitter instead of walking out and I'll make it better. So I'm gonna do the live demo now. Cool, so this is, is that, people can read, is that okay? Yeah, cool. Okay, so this is the basic screen. What we have at the top is a query, an active query. Then we have the results being displayed on the timeline
06:21
and we have an idea of results over time on the left. So what we're seeing now is a query for everything I've done on GitHub. And you can see it's a range of information, like it's a ticket I created. Or what I can do now is add another verb, so verb like. So this is everything on GitHub that I've liked. So these are the repositories I've liked.
06:43
Just to give an idea of the graph structure that I'm using for my system, we can turn on this visualizer here. So everything's from the perspective of myself. So I'm in the middle. This is the Andrew Louis node. This is a repository that I've liked and these are the tags that are associated with it. So in this case, the Electron tag is shared by a few repositories
07:01
that I've liked. So I can use this graph structure to navigate through my personal history. So if, for instance, I wanna find everything Electron-based that I've liked on GitHub, I can do something like this. So I'm doing a traversal, finding Electron, then finding repositories that it's about and then times I've liked that.
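The traversal in the demo (a tag, then the repositories about it, then the times I liked them) can be pictured with a small Ruby sketch. It is not the system's actual query code: the talk says nodes and relationships live in Postgres tables, but it doesn't show their schema, so the arrays, field names, verbs (`:about`, `:liked`), and repository names below are all hypothetical stand-ins.

```ruby
# HYPOTHETICAL in-memory stand-ins for a node table and a relationship table.
Node = Struct.new(:id, :type, :name)
Relationship = Struct.new(:from_id, :verb, :to_id, :occurred_at)

NODES = [
  Node.new(1, :person, "me"),
  Node.new(2, :tag, "Electron"),
  Node.new(3, :repository, "example/desktop-editor"),
  Node.new(4, :repository, "example/terminal"),
  Node.new(5, :repository, "example/web-framework")
].freeze

RELATIONSHIPS = [
  Relationship.new(3, :about, 2, nil),                 # editor is about Electron
  Relationship.new(4, :about, 2, nil),                 # terminal is about Electron
  Relationship.new(1, :liked, 3, "2018-03-02T10:00Z"), # I liked the editor
  Relationship.new(1, :liked, 5, "2018-04-11T09:30Z")  # I liked an unrelated repo
].freeze

# Relationships with +verb+ whose target is one of +target_ids+.
def incoming(verb, target_ids)
  RELATIONSHIPS.select { |r| r.verb == verb && target_ids.include?(r.to_id) }
end

# Tag -> repositories about it -> times +person_id+ liked one of them.
def likes_about_tag(tag_name, person_id)
  tag_ids = NODES.select { |n| n.type == :tag && n.name == tag_name }.map(&:id)
  repo_ids = incoming(:about, tag_ids).map(&:from_id)
  incoming(:liked, repo_ids).select { |r| r.from_id == person_id }
end

likes_about_tag("Electron", 1).map(&:occurred_at) # => ["2018-03-02T10:00Z"]
```

Each hop narrows a set of node IDs and feeds it into the next hop, which is why a plain node table plus a relationship table can answer multi-step questions like this one.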
07:21
So I execute this query. And yeah, so these are the repositories about Electron that I've liked on GitHub. And I can use this technique for all kinds of things, like I can find my listening history and I'm doing a traversal now. I wanna find songs where the creators are Aretha Franklin.
07:42
So now this is gonna be a query for every song that I've listened to that was created by Aretha Franklin. And then I can add other conditions. So for instance, love. So now we're adding a full text search. So this is every song that matches love by Aretha Franklin that I've listened to. So you can kind of get a sense of how I can use this information to navigate through my personal history.
08:02
I can also do pretty open-ended queries. So something pretty simple, like just searching for Ruby. Like what's everything in my system matching Ruby? So it's a combination of this is like a tweet that I've liked. This is some browser history. If you're wondering about why the photos are there, it's because I have OCR on every photo that I have.
08:20
So if I add the photograph tag, you can get an idea of the photos where the OCR matched Ruby. So I can always search my personal history, even just through imagery. This is pretty useful for even like tweets that have a lot of text. I can search all these. So to do something similar, I can do verb messaged.
08:40
So this is every message that I've sent or received that matches Ruby. Right, so it's from a range of sources, this email. It's fun to go back in my personal history and go back, this is 2005 now, and see like some of the early thoughts, the early discoveries about Ruby on Rails.
09:01
And it just kind of gives a sense of like where I was at at the time. I forgot like how difficult I found Rails at the time. And it's something you forget, but like this lets me almost like zoom back in my time and understand it. This is the first mention of Ruby. I don't think it's about Ruby the language. I think it was a high school chemistry test. But it's fun to be able to navigate my personal history that way.
09:22
Perhaps more usefully for developers, I have my bash commands. So this is like everything I've done on the command line, which I think a lot of people have through bash history. But the cool thing I can do, like beyond searching, I mean, this is still, you can do this with your bash history. Like I can do a search for a tool.
09:41
But the cool thing about my system is that I have a lot of different information and it's all collated by time. So there's a command I use every now and then to create animated GIFs, and I always forget the order of the flags. So what I can do is go back in time, find the time I used it, and then kinda see what else I was doing at the time. So in this case, I was looking at some Stack Overflow history.
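Because every record is collated by time, "what else was I doing then?" is essentially a window query around one timestamp. A minimal Ruby sketch, assuming a hypothetical `TimelineEvent` struct and an arbitrary ±30-minute window; the events are made up for illustration.

```ruby
require "time"

# HYPOTHETICAL timeline record: which importer it came from, a summary, and when.
TimelineEvent = Struct.new(:source, :summary, :at)

# Everything within +window+ seconds either side of +moment+, oldest first.
def context_around(events, moment, window: 30 * 60)
  range = (moment - window)..(moment + window)
  events.select { |e| range.cover?(e.at) }.sort_by(&:at)
end

events = [
  TimelineEvent.new(:shell, "ran the GIF-making command", Time.parse("2018-05-01T14:05:00Z")),
  TimelineEvent.new(:browser, "read a Stack Overflow answer about its flags", Time.parse("2018-05-01T13:50:00Z")),
  TimelineEvent.new(:music, "played a song", Time.parse("2018-05-01T18:00:00Z"))
]

context_around(events, events.first.at).map(&:source) # => [:browser, :shell]
```

The window size is a judgment call: too narrow and the related browsing falls outside it, too wide and unrelated activity crowds in.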
10:01
And just being able to see different things on the same timeline in the context is really powerful for understanding where I was at, understanding how I came to an idea or how I learned something. So a lot of the original Memex was about reading. So I have my full reading history. So just to do the broad query, this is all the books I've read.
10:22
So a lot of people see the amount of data I have and they wanna do kind of quantified self stuff. So I mean, I have that kind of data. So this is a graph of my reading history over time. I can do silly things like my reading time by hours. So this is like the hours of the day that I read. I can also do silly things like correlation.
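A question like "does reading increase my mood?" usually comes down to a correlation coefficient over per-day values. A minimal Ruby sketch of Pearson's r, assuming the two series (minutes read per day and a 1 to 5 mood score per day) have already been aligned by date; the numbers are invented for illustration, not taken from the speaker's data.

```ruby
# Pearson correlation coefficient of two aligned numeric series.
# Returns nil when either series has no variance (r is undefined then).
def pearson(xs, ys)
  raise ArgumentError, "series must be the same length" unless xs.size == ys.size
  raise ArgumentError, "need at least two points" if xs.size < 2

  mean_x = xs.sum.to_f / xs.size
  mean_y = ys.sum.to_f / ys.size
  covariance = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  return nil if var_x.zero? || var_y.zero?

  covariance / Math.sqrt(var_x * var_y)
end

minutes_read = [0, 15, 30, 45, 60] # hypothetical daily totals
mood         = [2, 3, 3, 4, 5]     # hypothetical daily mood scores
pearson(minutes_read, mood)        # about 0.97
```

A value near 1 only says the two series rise together; it cannot say whether reading lifts mood or a good mood leads to more reading.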
10:43
So does reading increase my mood? I can kinda maybe derive some sort of answer about this. Maybe if I read more, my mood goes up. But honestly, the more I do this sort of query, the less interested I am in this data. And I think the really powerful thing for me is just the context and being able to see
11:00
how I came to something. So in this case, I mean, let's see this reading session. I can go back and see the context of that day. So I was at home, I was with my partner. It was a four-degree day, Celsius. And then I can also see what I was doing in that reading session. And so I can just see the quotes that I saved. I can see the browser history I did
11:21
while reading that book. And if I ever come across this link, for instance, I can always go back and see where I found it. So in this case, I found it while reading a book, and that's pretty powerful for understanding personal history. So that's just a rough idea of the system I've put together. I'm gonna do some more queries at the end, but I'm gonna go back to the slides for now.
11:41
So it worked, I'm glad about that. Sorry, something's freezing here. Okay, cool. Yeah, so I'm just gonna give a rough overview. So I wasn't showing it here,
12:00
but the whole thing runs as an Electron app, and that holds both the interface that you saw and the importers that run and collect data. And then those talk to an API. There are basically two endpoints. There's one read endpoint that handles all the queries that I do, the subgraph queries. And then there's a separate endpoint for the writing. And generally, the importers have write-only access, and they use a single endpoint to just hammer the API
12:22
with personal history that is being imported. And this talks to a really fancy graph database that I discovered. It's called Postgres. So I've kind of explored other ways of doing this, but honestly, Postgres is rock solid, and there's a node table and a relationship table, and all the graphs are kind of generated
12:40
from these very simple queries, and Postgres is amazing at doing this. It's amazing for full text search. It's amazing for geolocation. So I've just settled on this, and it really works well. So the API is in Rails. The interface is in Ember.js. The desktop app is in Electron, and then the importers are all in Ruby. So I'm gonna focus on this for a bit.
13:00
So the importers are set up as an ETL system. So ETL stands for extract, transform, load. So first we extract from a source. This could be a scraper. This could be an API importer. It could be a big zip file of data. Then we, oh, yeah, I don't know what translation. So we transform it.
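The extract, transform, load split can be sketched as a small Ruby base class. Everything here is hypothetical: the `Importer` and `ReadingImporter` classes, the record schema (`verb`, `object`, `occurred_at`), and the `writer` lambda, which stands in for whatever actually posts batches to a write-only API endpoint.

```ruby
require "time"

# HYPOTHETICAL base class for an ETL importer. Subclasses implement #extract
# (pull raw records from a source) and #transform (map one raw record into the
# common schema, or return nil to skip it). The injected +writer+ performs the
# load step; here it is just a callable that receives each batch.
class Importer
  def initialize(writer:, batch_size: 100)
    @writer = writer
    @batch_size = batch_size
  end

  # Runs extract -> transform -> load and returns the number of records loaded.
  def run
    records = extract.map { |raw| transform(raw) }.compact
    records.each_slice(@batch_size) { |batch| @writer.call(batch) }
    records.size
  end
end

# Example importer for a made-up export of reading sessions.
class ReadingImporter < Importer
  def initialize(rows, **options)
    super(**options)
    @rows = rows
  end

  def extract
    @rows
  end

  def transform(row)
    return nil if row[:title].to_s.empty? # skip rows we cannot map

    {
      verb: "read",
      object: { type: "book", title: row[:title] },
      occurred_at: Time.parse(row[:started]).utc.iso8601
    }
  end
end

loaded = []
importer = ReadingImporter.new(
  [{ title: "Example Book", started: "2018-11-01T21:30:00-04:00" }, { title: "" }],
  writer: ->(batch) { loaded.concat(batch) },
  batch_size: 50
)
importer.run               # => 1
loaded.first[:occurred_at] # => "2018-11-02T01:30:00Z"
```

Keeping the transform step per-source and the load step shared means each new data source only needs its own extract and transform methods, and every record reaches the API in the same schema with timestamps normalized to UTC.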
13:21
I don't know what's going on. Okay, so we extract it, we transform it, which is changing that piece of data into the schema that I control, and then we load it into the API. So I wanna focus on this, the data problem that I have. I think you look at this system,
13:40
and you think the interface was the hard part or the API was the hard part. Really, nothing compares to how much difficulty I've had in just trying to acquire my personal data, and every time I'm finished, something will break, or something will not work, and I have to go back and spend more time on it. So there's a bunch of problems.
14:01
Something is, yeah, so I mean, the main problem is that there might not be an API for something I'm trying to do, and this is often the case. For example, Kindle, you know, you read on your Kindle device, and you save quotes, or you have a reading session, and there's no API for Kindle.
14:21
So what people typically do and what I had to do was scrape this page, and it's pretty awful stuff. That's really the only way to get this data. Even if there's an API, it's not one that might be designed for me to use. So a lot of these things are private APIs that I have to somehow reverse engineer or understand. This is an example. This is my iMessage history.
14:41
So iMessage syncs to the hard drive, and then I can read it off an SQLite database, but I have to kind of make sense of this horrible schema and make sense of where my message is and how to get the timestamp, and it's a lot of reverse engineering. It's quite difficult, and it changes a lot. This is another example. Even if there's a public API, it might not be a good one.
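One piece of the iMessage reverse engineering is turning the database's raw timestamps into real times. The sketch below rests on an assumption drawn from commonly reported behaviour, not from the talk or from Apple documentation: that the `date` values in the Messages SQLite database count from Apple's reference date of 2001-01-01 UTC, in seconds on older macOS releases and in nanoseconds on newer ones. The threshold used to tell the two apart is a heuristic; check it against your own data.

```ruby
# Apple's reference date (2001-01-01 00:00:00 UTC) expressed as a Unix timestamp.
APPLE_EPOCH_UNIX = 978_307_200

# Nanosecond values are vastly larger than any plausible seconds value,
# so anything above this threshold is treated as nanoseconds (a heuristic).
NANOSECOND_THRESHOLD = 10_000_000_000

# Convert a raw Messages "date" value into a UTC Time.
# ASSUMPTION: seconds since 2001-01-01 UTC on older macOS releases,
# nanoseconds since the same date on newer ones.
def apple_message_time(raw)
  seconds = raw > NANOSECOND_THRESHOLD ? Rational(raw, 1_000_000_000) : raw
  Time.at(APPLE_EPOCH_UNIX + seconds).utc
end

apple_message_time(536_457_600)             # seconds form
apple_message_time(536_457_600_000_000_000) # nanoseconds form
# both return 2018-01-01 00:00:00 UTC
```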
15:00
Does anybody work for Twitter here in the room? Shoo, oh, okay. Some hands half up. Okay, so the Twitter API is really challenging, I think, to integrate with. There's a lot of inconsistencies. I'm gonna focus on one in particular, the favorite. So when I favorite something on Twitter, I want a record of it because it's pretty nice to be able to search it. The favorites endpoint is sorted by the publish time.
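Because the favorites come back ordered by publish time and, as described next, carry no "liked at" timestamp, the best available approximation of when something was favorited is the first poll in which it appears. A minimal Ruby sketch of that bookkeeping, assuming a hypothetical `FavoriteTracker` class that is handed every favorite ID fetched in a poll; the Twitter API calls and the paging through old favorites are not shown.

```ruby
# HYPOTHETICAL helper: approximates when each favorite was made by recording
# the first poll in which its ID shows up. Fetching and paging through the
# favorites themselves is assumed to happen elsewhere.
class FavoriteTracker
  attr_reader :first_seen

  def initialize
    @first_seen = {} # favorite ID => Time first observed (nil if unknown)
    @baselined = false
  end

  # +ids+: every favorite ID returned by this poll, across all pages.
  # Returns the IDs that are new since the previous poll.
  def observe(ids, polled_at)
    new_ids = ids.reject { |id| @first_seen.key?(id) }

    unless @baselined
      # Favorites present on the very first poll have unknown like times.
      new_ids.each { |id| @first_seen[id] = nil }
      @baselined = true
      return []
    end

    new_ids.each { |id| @first_seen[id] = polled_at }
    new_ids
  end
end

tracker = FavoriteTracker.new
tracker.observe([101, 102], Time.utc(2018, 11, 1, 12, 0))    # => [] (baseline)
tracker.observe([101, 102, 7], Time.utc(2018, 11, 1, 12, 4)) # => [7]
tracker.first_seen[7] == Time.utc(2018, 11, 1, 12, 4)        # => true
```

The accuracy of every recorded time is bounded by the polling interval, which is why the interval has to be short.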
15:22
So if I favorite something from three years ago, that's gonna be like hundreds of pages back in history, and there's pretty much no way that I can notice that new favorite unless I page back every single time. And there are no timestamps for when I've liked something, so to build an accurate timeline of the things I've liked, I really have to poll this endpoint every three or four minutes
15:40
and then page back 100 pages. So it's quite challenging to actually get my data in a good format this way. This is just like a forum post where people vent about this. Does anybody work for Fitbit in this room? Okay, no. I mean Fitbit's great. I have a Fitbit. It works really nicely. It takes my heartbeats. It records my sleep,
16:00
which is really important for my timelines. If you squint closely, you might notice something that's missing from these timestamps. Does anybody see what's missing? Let me yell it out. So it's missing a time zone. So the Fitbit time is actually local to the device. So I came from Eastern time, and as I crossed into Pacific time, now all my timestamps don't have a record of,
16:22
they're now based on Pacific time, and I have no record of when I switch time zones. So this data set is almost worse than useless because if I'm gonna build a timeline, I'm gonna put that sleep record in the middle of my browsing history. So it's almost useless in a sense. So yeah, this is another forum post
16:40
where people are venting about the lack of time zones. Even if there's a good API for what I'm trying to do, it might not last forever, and this is something I've been noticing more and more over the time I've been working on this project. So here's an example. Instagram used to have a pretty good API, and you could go in and look at your photos you published or the photos that you've liked, but then suddenly with the Cambridge Analytica scandal,
17:02
they kind of freaked out, and if you look closely, they deprecated their API immediately. So there was almost no advance warning on this, and it really caught a lot of developers off guard, and as of now, there's really no way that I can get my Instagram history into my system. Another example is YouTube. So YouTube has a pretty good API.
17:20
I used to be able to pull my watch history, the videos I've watched on YouTube, and it's a really nice source of data for searching later on. They removed their history endpoint, so I can no longer see this, and they never really explained why. They just sort of silently removed it. It caught a lot of people off guard as well, and here's an example of a forum post where people are venting about it. Someone kind of calls them out on what their mission is
17:41
and asks them if they forgot about this, and Google hasn't really said why they removed the viewing history, and this is a big problem for me because I really want this data. Okay, so if you're thinking GDPR has solved this, GDPR is the new privacy regulation coming out of Europe, and one of the stipulations is that you need to have a user,
18:00
users should be able to export their data in a machine-readable format. So in theory, this is great for me. A lot of services have added export features like Facebook, Instagram now has an export, but the problem is that these export formats are not really great. They're not good for me as a user. They're not really good for, so they're not good for me as a human just to read the export, and they're not really good for machines to parse. So here's an example.
18:22
This is my Facebook messaging history. So in May, I would download that big zip file that you get, and you get a big JSON file of the messages. So this is how they represented the participants in a messaging thread. In September, this is what they did. They just kind of changed the structure into a different sort of object, and there was no documentation or warning
18:41
about this export change. Like, I mean, it's really not designed for the kind of thing I'm trying to do, so I mean, they're not gonna notify the change, but it really makes my importers fragile because I have to always watch out for errors and then try to correct them really quickly. And another thing they did is they added millisecond-precision timestamps, and again, this just broke my importers one day,
19:01
and I had no idea why, and I had to kind of figure it out and then add that change. So I ask myself this all the time. Am I doing something that just shouldn't be done, like sort of an uphill battle? I'm trying to collect my personal information and bring it into one place, but I really feel like the tech world has not made this easy for me, and this is not something I should be doing.
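One defensive pattern for exports whose structure shifts without notice is to normalize every known shape explicitly and fail loudly on anything else, so a format change surfaces as an error instead of silently bad data. The Ruby sketch below handles two hypothetical message-thread shapes, loosely modeled on the participant-structure and timestamp-precision changes described above; the field names (`participants`, `name`, `sender_name`, `timestamp`, `timestamp_ms`) are illustrative assumptions, not Facebook's documented export format.

```ruby
require "json"

# Normalizes two HYPOTHETICAL shapes of an exported message thread:
#   older: "participants": ["Alice", "Bob"],         "timestamp": 1525132800 (seconds)
#   newer: "participants": [{"name": "Alice"}, ...], "timestamp_ms": 1525132800000
def normalize_thread(json)
  data = JSON.parse(json)

  participants = Array(data["participants"]).map do |p|
    case p
    when String then p
    when Hash then p.fetch("name")
    else raise ArgumentError, "unrecognized participant: #{p.inspect}"
    end
  end

  messages = Array(data["messages"]).map do |m|
    { sender: m["sender_name"], at: message_time(m) }
  end

  { participants: participants, messages: messages }
end

# Accept either timestamp field; anything else is a format change to investigate.
def message_time(message)
  if message.key?("timestamp_ms")
    Time.at(Rational(message["timestamp_ms"], 1000)).utc
  elsif message.key?("timestamp")
    Time.at(message["timestamp"]).utc
  else
    raise ArgumentError, "message without a timestamp: #{message.inspect}"
  end
end
```

Both shapes normalize to the same participants and the same instant, so the rest of the pipeline never sees which export version a record came from.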
19:22
It seems like it's not really encouraged, and I ask myself this all the time, but I think I take a little bit of comfort from the first Memex because I think it did a lot of things wrong itself back in 1945. So the first thing that it did wrong was computers were supposed to be big, so this is a computer here you're looking at.
19:40
If you think the little things that people are using are the computers, that's not the computer. The computer is the room of people. They're computers. If you look at the label, it's the computing division. So computers were large. They were often literally humans doing calculations. Even later on, this is a hard drive. Computers were large. The Memex, on the other hand, was like this desk-sized device.
20:01
This is like a mock-up of what it might have looked like. It's not a real Memex, but someone put together a version of what it might have looked like. So it's a desk-sized single-user system. So it was smaller than computers were supposed to be in the 1940s. Another thing was the Memex was solving,
20:20
it had the wrong audience. Computers were for institutions, for corporations, for big groups that had money. So this is an example of one of Vannevar Bush's early analog computers, and it was being used by MIT. This was MIT's analog computer. Or corporations had large mainframes. This is what computers were supposed to be for. And then I think this is the most important.
20:41
Computers were used for hard, important problems, world-changing problems for institutions. The Memex, on the other hand, was just solving a pretty simple personal problem. It's like, how do we stay on top of information overload? So computers, this is Bletchley Park, and it was being used for cracking Nazi codes,
21:00
hard, important problem. This is a computer being used for doing ballistic calculations. It was called the ENIAC, one of the first digital computers. This is the UNIVAC, recording the census. So computers were used for these big, hard problems. Yeah, the Memex was this single-user device that was just there to help people understand their personal history: search it, navigate it, and make sense of it.
21:22
And I think the phrase that rolls around in my head is a machine for the mind. So this was not a computer for doing payroll calculations and stuff like that. It was a machine for understanding our own minds. And this was the really unique thing about the Memex, and the reason we're still talking about it now, it created this thread of almost yearning
21:40
for personal devices that would help us and be able to use technology to help and empower individuals. And I think as Rubyists, a lot of the jobs that are available to us are working for advertising, are working for creating surveillance systems. There's a lot of things that we can use our power for. I'd like to challenge us all to maybe think about how we can use Ruby
22:01
to just help us to maybe expand on some of the dreams that we had 50 years ago, or really make our brains better, make sense of the world. So I'm gonna do another demo now. So was anybody at Keep Ruby Weird on Friday? Cool, a few hands. So that was last Friday in Austin.
22:21
This started on Tuesday in LA. So a few friends and I were going to be at both, and we got to thinking, wouldn't it be a bad idea to drive in between? So we looked it up on Google, 20-hour drive, that's a lot of driving. I always love when Google suggests that you're doing something wrong by telling you to fly instead. But yeah, we decided, we had a little bit of time,
22:43
so we'd drive in between. So I'll just go over some of the queries I can do based on the road trip and how I use it in my own life and kind of what it's enabled me to do. So I'm gonna switch back to my demo now. So Keep Ruby Weird was a conference in Austin.
23:00
It took place in the Alamo Drafthouse. So as I'm sitting there listening to talks, I might look up stuff, and I might remember it in two years and wanna find that thing I looked up during a talk. So I can do something like this. So verb browsed and it occurred within the Alamo Drafthouse.
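A query like "browsed, and it occurred within this place" needs each browser-history entry to be matched against location history. A minimal Ruby sketch, assuming hypothetical `Fix` and `Browse` structs, a place modeled as a circle with a radius, a 15-minute staleness limit on GPS fixes, and the haversine great-circle distance. The talk says Postgres handles the geolocation, but it doesn't describe how this match is actually done, and the coordinates below are placeholders, not a real venue.

```ruby
EARTH_RADIUS_M = 6_371_000.0

Fix = Struct.new(:at, :lat, :lon) # one GPS sample from location history
Browse = Struct.new(:at, :url)    # one browser-history entry

# Great-circle distance in metres (haversine formula).
def haversine_m(lat1, lon1, lat2, lon2)
  rad = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Latest fix at or before +time+, or nil if none is fresher than +max_age+ seconds.
def location_at(fixes, time, max_age: 15 * 60)
  fix = fixes.select { |f| f.at <= time }.max_by(&:at)
  fix if fix && time - fix.at <= max_age
end

# Entries whose inferred location lies inside a circular +place+.
def occurred_within(entries, fixes, place)
  entries.select do |entry|
    fix = location_at(fixes, entry.at)
    fix && haversine_m(fix.lat, fix.lon, place[:lat], place[:lon]) <= place[:radius_m]
  end
end

venue = { lat: 30.0, lon: -97.0, radius_m: 100 } # placeholder coordinates
fixes = [
  Fix.new(Time.utc(2018, 11, 9, 10, 0), 30.0, -97.0), # at the venue
  Fix.new(Time.utc(2018, 11, 9, 12, 0), 30.1, -97.0)  # about 11 km away
]
history = [
  Browse.new(Time.utc(2018, 11, 9, 10, 5), "https://example.com/looked-up-during-a-talk"),
  Browse.new(Time.utc(2018, 11, 9, 12, 5), "https://example.com/later")
]
occurred_within(history, fixes, venue).map(&:url)
# => ["https://example.com/looked-up-during-a-talk"]
```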
23:21
So this is all my browser history that I did while sitting in that room. I can make it more specific. I might have remembered that I looked up a Wikipedia article based on a talk. I might not remember anything else about what that talk was about or what the context was, but I remember that I looked up a Wikipedia article in the Alamo Drafthouse. So because of the way I had my data set,
23:40
I have a way to find this. So in this case, it was a really cool talk that you should all watch the video of, but I looked up what a quine was. So the graph system I have based on my personal history allows me to navigate my own personal history in all kinds of different ways. So as I said, we decided to drive between Austin and LA. So let me just pull it up
24:01
kind of like the overall record. So the verb I use is traveled. The instrument is automobile. There's a lot of clumsy language, but bear with it. And I went with my friends Phil and Max. So I can do occurred with Max, Phil. So now this is all the automobile trips I did with Phil and Max, and I can plot it on a map
24:20
and just to kind of load the full trip. Yeah, so this is the route we took. So I have this history available, this timeline of our drive, but I can use it to do all kinds of things like look up the photos that we took along the way. So verb photographed, and it occurred during an automobile ride.
24:41
So this is every photo I took while driving with my friends Max and Phil, and it should load. Yeah, so this is like the photos I took. And again, because of the OCR, as we were crossing into California, the time zones changed. Programmers like time zones, and I might make a joke about it, and I want to look up that photo I took of the sign where the time zone changed. So I can use the OCR to just do
25:01
kind of like a shot in the dark here. I'm looking for photographs that involve "time zone" while driving with my friends Phil and Max. And this should load. Yeah, so this is the photo of the sign where you cross time zones, and it lets me find this kind of stuff pretty easily. Another thing: I arrive here and people ask, oh, did you eat anything cool along the way?
25:21
So I can do a search for that. I want to stay with my friends Phil and Max, since I worked with them the whole weekend. And then I'm gonna find verb visited. So these are the places I visited. But I want to scope it down to activities during. So what we're doing here is finding places where, inside that visit, I ate something.
25:43
So this pulls up all the restaurants I ate at while with Max and Phil, and it gives a sense of where I was. And for each one of these, I can always load more context and see what that day was like. So again here, Phil and Max come up. It was a 22 °C day. And then I can look within that and see
26:01
what I actually ate at the ski. I drank water, I had a burger. Here are the photos we took. Here's what I looked up: we were directly on top of the San Andreas Fault, so I have the record of me searching for that and being a bit worried about it. It gives a sense of what I was doing, and it's a fun way to navigate my personal history.
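Scoping visits down to "activities during" is essentially an interval-containment check: keep a visit if some activity with the wanted verb has a timestamp inside the visit's time window. A minimal sketch, assuming each visit has a start and end time and each activity a single timestamp; `Visit`, `Activity`, `visits_with_activity`, and the sample data are hypothetical, not the speaker's implementation.

```ruby
# Hypothetical shapes: a visit spans [from, to]; an activity has one timestamp.
Visit = Struct.new(:place, :from, :to)
Activity = Struct.new(:verb, :object, :at)

# Keep visits containing at least one activity with the given verb
# whose timestamp falls inside the visit's window ("activities during").
def visits_with_activity(visits, activities, verb)
  visits.select do |visit|
    window = visit.from..visit.to
    activities.any? { |a| a.verb == verb && window.cover?(a.at) }
  end
end

# Illustrative sample data: a meal during the first visit only.
at = ->(hour, min) { Time.utc(2018, 1, 1, hour, min) }
visits = [
  Visit.new("Roadside diner", at.(12, 0), at.(13, 0)),
  Visit.new("Gas station", at.(15, 0), at.(15, 20))
]
activities = [
  Activity.new("ate", "burger", at.(12, 30)),
  Activity.new("drank", "water", at.(15, 5))
]

p visits_with_activity(visits, activities, "ate").map(&:place) # => ["Roadside diner"]
```

Because the window is an inclusive `..` range, an activity logged exactly at a visit's end time still counts as "during" that visit.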
26:22
And while we were in there, the restaurant had this wall of dollar bills where people leave messages. And let's say I couldn't remember the restaurant name and I couldn't remember what trip it was on. I can find photos of the dollar bills we left in a totally different way. I can just do something like verb photographed.
26:42
I'm still with my friends Phil and Max, and then I want it to involve money. I have image classification running on my photos, so I can do something like this. And it should pull up the three photos. I'm not sure what I typed wrong, but yeah, it would pull up the photos of us leaving the money on the wall.
27:04
So we arrived in LA. Let me put it on the map again. So this is our drive again. And as we're driving into LA, I'm like, oh, this looks familiar. I'm pretty sure I've driven on this stretch of LA before. So I can look at the trail that we took into the city
27:21
and just use it to search: when was I around this location before? What I've done is search my travel history, and if I zoom out, I can see other trips I took in 2016 when I was here. I did drive on that stretch of highway before, so I can confirm that I've seen it. It's really fun for checking that intuition of whether you've seen something before.
27:41
Another thing, similarly: on Tuesday, did anybody go to the Broad Gallery? It's up the street. It's really cool. It's a new contemporary art gallery. So I popped over there. You should all go if you have a bit of time. And as I was walking through the art, I noticed an artist whose name seemed familiar. So one of the things I can do, and I do this a lot, is just search someone's name or a topic
28:02
and just see what the broad history is. As I suspected, there are the notes I took while I was in there, but I can also see that her name was mentioned in an email newsletter I got. It was also in my RSS reader. I looked up her Wikipedia article when I liked a tweet. It gives this incredible context.
28:21
As I'm walking through the gallery, I can really pull up what I've seen about a topic before. It helps me understand my own history and enjoy the present a lot more, because I can fit it into the narrative I have in my head about my personal history. Yeah, so that's what I can do with my personal history.
28:44
One phrase that rolls around in my head is also from Vannevar Bush. He said about his machine: as the human molds the machine, so the machine remolds the human mind. And I find this happening all the time, because I have all this personal history in it now and I use this system constantly. And because I use it, I have a desire to add new features
29:01
or different visualizations, or a new way to interact with the data. And that molds the machine. But vice versa, as I add features, the machine molds me as well. I understand myself better. I navigate the world differently. It's a really nice symbiotic relationship, a really good feedback loop that I find myself in.
29:23
So would anybody use something like this? Show of hands. Okay, that's cool. I think the people who didn't put up their hands are probably worried about privacy and security. That's also cool. But yeah, if you are interested, I do plan on launching some version of this in the future. You can put your email address into this. I'd love to talk about it
29:40
or send me an email if you have thoughts or ideas about this. But yeah, I do hope to launch this, mostly as an open source project. I just want to figure out how I can receive some revenue for working on this. But yeah, thanks for listening and I'd love to take questions if you have any.
30:02
Yeah, so the question was, how does the system know what I ate? In general, everything you saw in the demo was automated; it came from sources like my phone or browser history being pulled in automatically. There are a few categories that I still tag manually: the friends I see, the eating that I do, and a few other things. I have a really simple system for tagging.
30:20
I'll record that I ate a burrito or I ate pizza or something like that. So that's manual. Honestly, the eating and drinking records and that kind of stuff are less useful. I think you could get a lot of value from the system without them. But yeah, I do that manually. Yeah, cool. Okay, so the first question was how often I use it. I use it all the time.
30:41
I'm in an art gallery and I'm searching all the time to see the context. Or I meet someone and I'll just check: maybe they're a friend of a friend, or I've seen their name on Twitter or something like that. So I use it all the time; that's the main reason. There's also introspection, like seeing how what I'm reading differs from a year ago. So I use it for introspection.
31:01
And then I guess the follow-up question is how has it changed me? I think it's hard to point to one single thing, but I think of this almost like a journal. Journals are really good for understanding where you were in the past, how you've changed, what you were stressed about that you're no longer stressed about. And I think everybody should have a journal or some version of it. It's really powerful. So I think what I have is kind of a supercharged version
31:20
of that journal. And the benefits I get are all the benefits a journal would give you. Like being able to read my history from my first few years writing Ruby: I didn't get it. I struggled with a lot of stuff. I was excited by it. But it's cool to read that now and see the things I'm no longer stressed about, or don't find hard, or am not anxious about. You really understand yourself that way.
31:42
There are a lot of practical things, like being able to look up commands I ran or things I looked up. That's really powerful and I use that a lot. Yeah, yeah, totally. So the question is whether there are negative habits that come out of the system. I think most of my friends would say that they don't really notice the fact
32:01
that I'm tagging things or searching too much. So I think that's good. I try to keep it to myself; I don't automatically tell people what I'm working on. So I think I lead a pretty normal life, except that I have this system available. There are definitely things I'm always thinking about. When I go from one restaurant to the bar next door or something like that, that's often not enough distance for my GPS to record
32:21
that it's a new venue. So in the past I have walked around the block to trigger the timestamps of when I shifted venues. There are silly little things like that. I'm pretty obsessive about having really high-quality data, so I will do things like that. Or when you click on a link in Twitter, it doesn't necessarily open in Safari.
32:41
And therefore I won't have a browser history entry. So often I'll open it in Safari so that I manually trigger a history entry. I have a lot of habits like this just to force the history to be recorded. But I hope it hasn't really changed me too much. I've always been an introspective and nostalgic person, so this is feeding into that. I don't know if it's for better or worse, but that's just who I am.
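The venue problem comes down to distance: if the next place is only a few tens of meters away, consecutive GPS fixes can stay within whatever radius the tracker treats as "the same place". The talk doesn't say how the GPS app decides this, so the following is a minimal sketch, assuming a simple rule (start a new venue when a fix is more than a hypothetical 100 m from the current venue's first fix), with great-circle distance from the haversine formula; `haversine_m`, `venue_changes`, and the threshold are illustrative.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius in meters

# Great-circle distance in meters between two [lat, lon] points in degrees,
# using the haversine formula.
def haversine_m(a, b)
  lat1, lon1 = a.map { |deg| deg * Math::PI / 180 }
  lat2, lon2 = b.map { |deg| deg * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h))
end

# Hypothetical rule: a fix starts a new venue only if it is farther than
# threshold_m from the anchor (first fix) of the current venue.
# Returns the indices in `fixes` where a new venue begins.
def venue_changes(fixes, threshold_m: 100)
  return [] if fixes.empty?

  anchor = fixes.first
  changes = []
  fixes.each_with_index.drop(1).each do |fix, i|
    next if haversine_m(anchor, fix) <= threshold_m

    changes << i
    anchor = fix
  end
  changes
end

# Illustrative fixes: a restaurant, a bar ~33 m north, then a spot ~1.1 km away.
fixes = [[34.0, -118.0], [34.0003, -118.0], [34.01, -118.0]]
p venue_changes(fixes) # => [2]
```

With this rule, moving to the bar next door (about 33 m) never crosses the threshold, which is the kind of gap the speaker works around by walking around the block.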
33:01
Yeah, totally. So the question is about habits and identifying them. Yeah, I think there's a lot of really simple things. I think one of the techniques around habit forming is being able to do something for 30 days or whatever the number is. And for me, I can set up a dashboard and say do I read every day? And I can mark that green square for every day
33:23
and then build a habit around it. So it gives me a powerful tool for creating habits, and also for understanding what habits I might need to create. If I don't wanna change myself, this thing isn't necessarily gonna force me to. But if I wanna change myself, or I identify something I wanna change, it lets me do that.
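A "do I read every day?" dashboard only needs the set of days on which a matching record exists. A minimal sketch, assuming those dates have already been pulled out of the history; `habit_row`, `current_streak`, the `#`/`.` squares standing in for green and empty squares, and the sample dates are all hypothetical.

```ruby
require "date"
require "set"

# One dashboard row: "#" (a green square) for each day in from..to that
# has a habit record, "." for each day without one.
def habit_row(days_done, from, to)
  done = days_done.to_set
  (from..to).map { |day| done.include?(day) ? "#" : "." }.join
end

# Number of consecutive days with a record, counting back from `today`
# (0 if there is no record for today).
def current_streak(days_done, today)
  done = days_done.to_set
  streak = 0
  day = today
  while done.include?(day)
    streak += 1
    day -= 1 # Date minus an Integer steps back one day
  end
  streak
end

# Illustrative data: reading logged on Jan 1, 3, 4 and 5.
reads = [1, 3, 4, 5].map { |d| Date.new(2018, 1, d) }
puts habit_row(reads, Date.new(2018, 1, 1), Date.new(2018, 1, 5)) # => #.###
puts current_streak(reads, Date.new(2018, 1, 5))                   # => 3
```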
33:42
It's pretty easy to set up a dashboard or set up a query to look at that habit and then maybe figure out an action item out of it. So the question is where does the code run? So what you saw in the demo was all running off my local development version on my laptop. What I'm playing around with is an Electron app that houses the interface and houses the importers.
34:03
And then underneath it, the API can run in Docker or your own cloud instance. And I think that would be the model I would wanna go towards. What I absolutely don't wanna do is have a system which aggregates personal information from everybody at the same time. I don't want a centralized version of this. It's unethical, I think. So yeah, what I think will make sense
34:22
is just an Electron desktop app with the database either on the hard drive or in a personal cloud. And the data's just kind of like... I haven't set it up to be multi-instance. I don't even have a user ID in the database. It's just designed to be a single user app. Yeah, so the question is about data growth over time.
34:41
There are a lot of data sets that are just streaming thousands and thousands of points. I think Fitbit data is at minute granularity, so that's 1,440 readings per day for each metric. My GPS trails are also thousands of points per day. So there's a lot of that kind of statistical data being generated. But yeah, overall, I think about tens of thousands of data points are being added per day.
35:02
Browser history is always a few thousand. The photos I take will be maybe 50 a day. So yeah, tens of thousands a day. So the next question: I showed my Bash history, and obviously that could be a really bad idea. So the question is, do I remove keys or things like that? I think it's forced a really good habit,
35:22
which is not putting keys in your command line history. There is a way for me to delete that if I accidentally put it in. The other thing I was thinking about doing, I haven't implemented it yet, but I think I could build some sort of entropy checker, and if it's a high entropy string, that could be a password, and I would just maybe prompt the user or do something about that. But yeah, in general, I think because it is
35:41
a single-user app with an audience of one, myself, I'm a little more tolerant of putting secret data in it. I have all my text messages, which are full of gossip. There's a lot of very personal stuff that I trust the system with, and I know that I could delete it if I have to. But yeah, I think something like a password
36:00
is a use case that I probably should specifically try to address with maybe an entropy checker or some sort of watch, yeah, because that's pretty dangerous. Yeah, so the question is if I open-source it, someone could come along and build a service that aggregates information and mines it from users. I mean, maybe. I think the only reason someone would be comfortable
36:23
installing this is if they could look at the source and understand that I'm not a bad person, that there's an open-source community around it, and that it's not being centrally hosted. I think if someone asks you to sign up for a centrally hosted version of this, most people would rightfully not want to do it. Like, it's just a stupid idea, I think.
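The entropy checker mentioned above is something the speaker says he hasn't implemented yet. One common way to approximate it is Shannon entropy per character, computed from each token's character frequencies. A minimal sketch, assuming whitespace-separated tokens and hypothetical thresholds (at least 16 characters and 3.5 bits per character); `shannon_entropy`, `suspicious_tokens`, and the sample key are made up for illustration.

```ruby
# Shannon entropy in bits per character: H = -sum(p * log2(p)) over the
# frequency p of each distinct character.
def shannon_entropy(str)
  return 0.0 if str.empty?

  counts = Hash.new(0)
  str.each_char { |ch| counts[ch] += 1 }
  n = str.length.to_f
  counts.values.sum { |c| prob = c / n; -prob * Math.log2(prob) }
end

# Hypothetical heuristic: flag tokens in a shell command that are long and
# high-entropy, so the user can be prompted before they are stored.
def suspicious_tokens(command, min_length: 16, min_bits: 3.5)
  command.split.select do |token|
    token.length >= min_length && shannon_entropy(token) >= min_bits
  end
end

p suspicious_tokens("bundle exec rails server")             # => []
p suspicious_tokens("export API_KEY=Zx9Qp2LmA7vR4tKw8Ne3") # => ["API_KEY=Zx9Qp2LmA7vR4tKw8Ne3"]
```

Hashes, UUIDs, and long random file names can also score high, which is why prompting the user, as suggested in the talk, is safer than silently deleting entries.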
36:41
So I mean, that's my hope. My hope is that users will be smart enough not to sign up with just any old person who wants to centrally host this, but I guess that's not really a good enough answer. I don't know. I guess that's the reality of the internet system we've created, so yeah. How often do I import the data? Yeah, okay, so the question is, yeah, how often do I import?
37:00
Generally, things are running every few minutes if I can, because I really want accurate timestamps. But some things have good timestamps in the actual endpoint itself, so I don't have to run it that frequently. But yeah, the importers are constantly running. Even while I've been standing here, my system now has the records of me presenting in it.
37:21
So generally, things are always flowing in. Yeah, and it's all just Bash and Ruby scripts running on cron. I don't think I see any other hands. So thanks for listening. Again, if you're interested in using this or trying it, or have ideas or horrors about this, I'd love to talk, and you can sign up if you're interested. Thanks a lot for listening.
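The importers are described only as Ruby scripts running on cron every few minutes. One way to keep such a script cheap to run that often is a high-water mark: remember the newest timestamp already imported and skip anything at or before it. A minimal sketch under that assumption; the `Importer` class, the JSON state file, the fetch lambda, and the crontab path are hypothetical, not the speaker's code.

```ruby
require "json"
require "time"
require "tmpdir"

# Hypothetical incremental importer. It stores the newest imported timestamp
# (a "high-water mark") in a small JSON file, so each cron run only returns
# records newer than the previous run's newest record.
#
# Example crontab entry (hypothetical path), running every five minutes:
#   */5 * * * * ruby /path/to/import_browser_history.rb
class Importer
  # fetch: a callable taking the last-seen Time (or nil) and returning
  # an array of hashes with at least :id and :at (a Time).
  def initialize(state_path, fetch)
    @state_path = state_path
    @fetch = fetch
  end

  def run
    since = last_seen
    fresh = @fetch.call(since).select { |r| since.nil? || r[:at] > since }
    save_last_seen(fresh.map { |r| r[:at] }.max) unless fresh.empty?
    fresh
  end

  private

  def last_seen
    return nil unless File.exist?(@state_path)

    Time.iso8601(JSON.parse(File.read(@state_path)).fetch("last_seen"))
  end

  def save_last_seen(time)
    # getutc returns a copy, so the caller's Time object is not mutated.
    File.write(@state_path, JSON.generate({ "last_seen" => time.getutc.iso8601(6) }))
  end
end

# Illustrative run against a fixed in-memory source.
source = [
  { id: 1, at: Time.utc(2018, 1, 1, 12, 0) },
  { id: 2, at: Time.utc(2018, 1, 1, 12, 5) }
]
importer = Importer.new(File.join(Dir.mktmpdir, "state.json"), ->(_since) { source })
first_run = importer.run  # both records are new
second_run = importer.run # nothing is newer than 12:05
p first_run.map { |r| r[:id] } # => [1, 2]
p second_run                   # => []
```

Sources that carry accurate timestamps of their own can run less often without losing ordering, which matches the point in the talk that only some importers need to run every few minutes.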