Mailpile
Formal Metadata
Title |
| |
Title of Series | ||
Number of Parts | 199 | |
Author | ||
License | CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. | |
Identifiers | 10.5446/32570 (DOI) | |
Publisher | ||
Release Date | ||
Language |
Content Metadata
Subject Area | ||
Genre | ||
Abstract |
|
Transcript: English (auto-generated)
00:00
And welcome with me Bjarni Rúnar Einarsson. He will present Mailpile, and let's go, I think. Can you guys hear me? This is a brief introduction to who I am, what my qualifications are.
00:21
I'm just a bachelor's in computer science, not a lot of education, but I've been doing Linux and stuff since the early days. And free software has been my passion for a very long time. As a result, I decided to dedicate myself to it full-time in 2010, and I've managed to pull that off one way or another.
00:42
My latest project, a project that I'm working on with Brennan and Smári and a nice community of people, is called Mailpile. And in brief, Mailpile is an email client. There's been some confusion. We're using web-based technology for the user interface, so some people think maybe it's a server.
01:03
But no, it's not a mail server, it is a mail client. It's built on web-based technology, so we can use modern web design to make it look nice and make it accessible. It has a powerful search engine, which is actually how the project got started. And we're hoping to make it really easy for people to use PGP, both to sign and encrypt email.
01:27
Obviously, free software. We're currently in a weird dual licensing phase where the code is available under two licenses. We plan to drop one of those when our community has voted on what they prefer. And it's written in popular languages, Python and JavaScript.
01:44
I say it's a project to take email back. And that's sort of what I'm going to be talking about in the first half of this talk, is the motivation and history of the project. Not a long history, so it won't be a long talk. And then in the second half, I'm going to go into some of the technical details because I know you guys are all techies and that's what you're interested in.
02:03
So why another email client? I don't mean to offend anyone, but I feel like the state of free software email has been pretty crap for a while. There's not been a lot of development and the way we're doing things is kind of stuck in the 90s.
02:22
And I feel like we need to catch up because if the free software community is going to compete with Gmail and compete with Microsoft, we need to innovate and we need to provide a better experience to our users. And we just haven't been doing a good job at that. And if we don't, then we will never have mass encryption of email. Email will always be written on the back of a postcard while we're trusting the cloud and while we're using proprietary software.
02:46
So this is important. Email has been becoming increasingly centralized, in my opinion. People these days, when they think of email, they just sort of assume that it's Gmail or Hotmail. They just assume that it's someone running on someone else's hardware.
03:04
A lot of companies don't bother running mail servers anymore. And a lot of schools don't bother running mail servers anymore. Government organizations don't bother running mail servers anymore. And I think this is a bad trend. But it's going to continue if we don't provide better software. I don't like having email in the cloud filtering my spam for me because I consider that to be a form of censorship.
03:26
I would like to have a nice spam filter running on my machine. But that means I have all the data and I can find it if it gets lost. If Google filters something away, I have no idea why. I don't even know what happened. But they provide good service.
03:40
It's very cheap. All you have to do is pay for it with your privacy. And Edward Snowden has explained one facet of why that's a bad thing. Eben Moglen has talked about this and Richard Stallman has been talking about this for a long time. But it's not really cool to listen to him anymore, is it? But he's right.
04:02
Stuff in the cloud, when other people are running things for us, that's worse for our freedoms than closed source. So I like to piss people off by saying that now Microsoft is our ally and Google is our enemy. Because honestly, at least Microsoft is letting you run the software yourself. When you put the stuff in the cloud, then they have your data.
04:21
They will very easily lock you in. It's very hard to move. You have this sort of natural monopoly effect that we've seen. The biggest provider always seems to win. Everyone gravitates there. So you have these massive centralized silos of data and it's terrible for privacy and it's bad for a lot of things.
04:42
There's also a risk of something which some people in the room are old enough to remember. Do you guys remember Microsoft's strategy of embrace, extend, extinguish? Show of hands. Oh, great. That's a good crowd. So for the rest of you, this is what Microsoft tried to do to the open web.
05:01
So they said, hey, this web thing is kind of neat. Let's give everyone a free browser. And Internet Explorer, it wasn't a bad browser. It was nice. But it came with this thing called ActiveX. And they encouraged developers in the Microsoft ecosystem to use ActiveX to make their websites more appealing, make them sort of cool and fancy and do neat things.
05:24
So they extended the web using proprietary technology. And they were hoping that by doing so, enough websites would move to using their proprietary version of the web that they would kill the open web because nobody else had ActiveX and it was a closed technology.
05:41
There's a risk of this happening with things like email if we let the cloud run it for us. And we've already seen sort of very small hints of things moving in this direction. And I'm sorry to pick on Google because I used to work there and they're nice guys, but the example I have does come from Gmail. Gmail has started to add features where if you send messages that are formatted in a particular way
06:05
and you're a Google partner, then buttons might show up in the email. You might be able to buy something within your email client and you'll be able to do stuff that you won't be able to do with any other mail client. So they started to extend email and make it into something that it never was before
06:21
and make it into something that is definitely not an open standard and is not interoperable with the rest of the ecosystem. So this is happening and it's something that we should be aware of. And then of course, email in the cloud is fundamentally incompatible with encryption. And the reason for that is these cloud providers, they all base their businesses on advertising
06:43
and they need to be able to target those ads and they can't target those ads if they can't read the content of your email. So if they don't know what you're talking about, they can't show you relevant advertisements, they won't make any money. So they're never going to encrypt your mail. I mean, aside from the fact that giving them all the keys is a bad idea,
07:01
they're just not going to do it. So I was worried about this, a bunch of other people worried about this, so we came up with this idea. Let's build Mailpile and let's do it a little bit strategically. We want to achieve five things. We want to make software that the free software community enjoys hacking on,
07:21
that you guys like to play with, something that's accessible to the community. We want to build software that regular people want to use, so it has to be attractive and user-friendly and fast and all of those things that we expect from good software. We want to make email encryption understandable, so people will do it without thinking about it.
07:40
It can't be something that just the techies do. It has to be something everyone can do. And we want to make it easy to decentralize. We want to make it easy for people to take their data back, make it easy for people to run their own infrastructure and move out of the cloud as much as possible. And even if not everyone does this, this will have a noticeable benefit to the overall tech community,
08:04
because it will keep the cloud guys honest. If people can leave, then they will be kept honest. And finally, we need to find better business models, because one of the Achilles heels of the free software community is so much work is done by volunteers,
08:20
and we have very limited resources. We need to figure out ways to actually do this as a full-time job. So this is the sort of stuff that we're thinking about in Mailpile. Timeline of the project so far. In 2011, I wrote an experimental search engine. I was able to search all my email in milliseconds.
08:41
It was great. And then I went back to my regular job, kept doing what I was doing. Nothing happened in 2012, so there was no slide. Frigate's there. But in 2013, things started to happen. My friend Smári, he'd been harassing me and saying, hey, let's do something with this Mailpile thing. It's kind of cool.
09:01
I'm using Thunderbird, and I don't like it. I want something better. And I kept saying, go away. I'm doing my job here. Leave me alone. But we bumped into Brennan, who is a user interface designer. We bumped into him in the hot tub at one of the pools in downtown Reykjavik. Had a nice chat. Didn't think any more of it. Bumped into him again at a cafe.
09:21
I think he was following us. You were, weren't you? We had some coffee. We had some beer. He looked over my shoulder and was like, what are you working on? And I was, I'm playing with this email thing. And 10 minutes later, he made a logo for the project, and we're still using it. So we decided to join forces.
09:41
We came up with this plan, and we decided to do a fundraising campaign. So we did an Indiegogo fundraising campaign that launched in August. We launched at the OHM festival in the Netherlands, which was fantastic. And we succeeded. We raised enough money that we can work on this full time for a year, and actually a little bit extra.
10:01
I think we're going to, the current budget suggests we'll get 16, 17, 18 months out of that. Depends a bit how Bitcoin develops. If Bitcoin keeps going up, then we'll get a few months that way. But since September, we've just had our heads down.
10:22
We've been writing code. We've gone to a couple conferences to talk, but for the most part, we've just been working. And 2014, we realized, well, we have to keep some of those promises we made back in August. So we've mailed out most of the things that we promised to people that backed us. We mailed out shirts and stickers and some rocks as well.
10:43
People that paid enough money got Icelandic rocks. And we're announcing our first milestone here at FOSDEM, which is the alpha version of it.
11:01
And we're one day late. Because, you know, it's a real software project. They're never on time. So what's in the alpha? The highlight of the alpha is that this is not vaporware. We actually are writing software. You can download it and you can play with it. And a big part of what we're doing is trying to prove that you can actually crowdfund stuff like this.
11:25
And it's not a risk. So keeping our promises is really important. We have a nice HTML5-based user interface. We have a fast search engine, which is not based on that much. And we'll discuss why later.
11:41
We have basic support for PGP encryption and signatures. It's a bit clunky. It's new, but it's there. We have spam filtering based on the SpamBayes Bayesian filter system. And we have over 30 volunteers translating Mailpile into their local languages, which we think is just fantastic.
12:03
Low lights. Have to have some low lights to bounce off the highlights. This is an alpha. Don't expect it to work. Or, you know, don't expect it to work well. It's tricky to install and configure. At the moment, it is still very much for developers. We don't have IMAP and POP3 support integrated yet.
12:23
And the reason for that is I haven't... Well, we haven't sort of figured out how we're going to store all that data. Because one of the things we want to do is make sure things are stored encrypted. And we need to think about how we do that. We don't have support for S/MIME. Don't know if anyone cares, but it was on the list of things we talked about.
12:41
And we don't have a compelling story about how we're going to do PGP key management yet. So handling keys expiring and keys being revoked and all that stuff. We need to put a UI on that that doesn't kill the normal user. And that's going to be tricky. There's lots of stuff. There's lots of work to do still.
13:01
You can get the source code. You can clone the alpha release. And from this point on, we're going to try and keep that branch sort of stable for people who just want to look at it. And we'll break shit and make things messy on the main branch. We have live demos. Because one of the things that we face here is we're backed by over 3,000 people.
13:24
Most of which are not particularly technical. So they're not going to be able to download the source code from GitHub. So we put some effort into making sure demos were up so they could go and see it and play with it. And they know what we're working on. And of course, I encourage all of you to check them out as well.
13:41
Oh, maybe I'll do it myself. So let's see if this... Do we have a new website? I'm refreshing. Hard refresh. Oh, hard refresh.
14:01
Well, I don't know if I even have internet actually. So this is the intermission of my talk. I'm going to do a little demo, show you what it looks like. And then I'm going to go through talking about some of the technical implementations. And I may use up all of my time and then some.
14:23
So I'm going to count on them to stop me. And I will stop so we can take some questions. Oh, look! Mailpile is now in alpha. This is our demo page.
14:41
And we can try it. So for the sysadmins in the room, we are running this on four VPSs that... Aww, why?
15:00
Try the dual stack network? Okay. How's the dual stack? Well, if it doesn't work, I'll just run it off localhost. So that's how we do it most of the time anyway.
15:21
This is what the demo is supposed to look like. It's an email client. It's not going to do anything if I click on it because that's the live one. This is my personal mail, which is using a different... I think I've got it zoomed out. There we go. We can make it bigger.
15:44
Oh no, did that break too? The demo effect is teasing us today. If I read an email, you can read about MediaGoblin, which is kind of cool. So you can read messages. You can see my cursor there.
16:02
Here at the top we have some icons. These are going to signify the state of encryption. So whether the message is signed or encrypted. And it pops up a tooltip to explain what's going on. This is the first draft. We're going to be improving this interface over time, but we have some things.
16:22
I can click on reply here. I can try and encrypt it, but it says there's an error, and the reason being I don't have the key for the recipient. Let's see if details work. Here I can try removing Chris because he doesn't have a key.
16:42
I'll just send my reply to Smári. Oh, it still can't encrypt. That's a bug. Oh no, look, there's this other guy here. Now it's encrypted. It bounced and everything.
17:01
So this is what we're trying to do. We're trying to make it really easy. One of the ideas is that if we know that we have a key for the person that we're communicating with, it will suggest that the mail be encrypted by default. We know that's going to cause problems because a lot of us are using multiple devices and maybe only one of them can actually decrypt mail.
17:22
We're also looking at ways to communicate preferences so that if we see that we always get encrypted mail from this person or always get signed mail, then it's safe to encrypt, but if sometimes we get a message that's not, maybe our default policy for that recipient is to just sign. That's stuff that we need to work on.
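The per-recipient default described here can be sketched roughly as follows. This is only an illustration of the idea from the talk, not Mailpile's code: the `History` record, the `suggest_policy` function and the three policy names are all hypothetical, and returning "none" when a key is missing is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class History:
    """Counts of mail previously received from one correspondent (hypothetical)."""
    encrypted: int = 0
    signed: int = 0
    plain: int = 0


def suggest_policy(recipients, keys, history):
    """Suggest 'encrypt', 'sign' or 'none' for an outgoing message.

    recipients: list of e-mail addresses
    keys:       set of addresses we hold a public key for
    history:    dict mapping address -> History
    """
    # Encryption is only possible if we have a key for every recipient.
    if not recipients or any(r not in keys for r in recipients):
        return "none"
    for r in recipients:
        h = history.get(r)
        # No track record, or some unprotected mail from this person:
        # they may not always be able to decrypt, so only sign.
        if h is None or h.plain > 0 or (h.encrypted + h.signed) == 0:
            return "sign"
    # Everyone has a key and has only ever sent us signed or encrypted mail.
    return "encrypt"
```

For example, a recipient who has only ever sent encrypted mail yields "encrypt", while adding a second recipient who sometimes sends plain mail drops the suggestion to "sign".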
17:43
This is a search engine. I could search for Linux Iceland conference. Let's see if I have any email about that. My mailbox has about 160,000 messages in it.
18:02
This is running on a relatively crappy laptop with a spinning disk. That was reasonably fast, right? This is what we're doing. It shows tags here. These are all tagged with inbox. I can browse the tag. It's trying to show the conversations.
18:23
I think if I do that, it will filter it and only show me the ones I haven't bothered reading yet. It's an email client. It's making progress. We have a view of contacts. Right here, this is pulling contacts out of my GPG keychain.
18:41
One of the core ideas we have is we want to integrate key management with contact management. There shouldn't really be a difference. You shouldn't have to go to some separate application to manage your keys. You should be able to manage them along with everything else to do with the contacts. We're pulling pictures of people down from Gravatar.
19:03
That's where those images come from. We'd like to pull pictures from other sources. We'd like to grab them from Facebook if we can, but I don't know if that's going to work. That would be cool. We have various plug-ins for importing. I'm going to repeat the question.
19:20
He asked, how about CardDAV? We have various plug-ins for pulling in contacts from other sources as well. We can pull in contacts from the Mork database, which is how Thunderbird stores its contacts. We have a CardDAV plug-in. I haven't tried it myself. These things are all under development, but they're definitely going to happen.
19:41
You can help. Here should be a list of my tags. It gives me some stats. Here's where we beg for money so we can keep working on this. Please, please send us all your money. It'll be great.
20:01
That's the web interface in a nutshell. I know I have a room full of techies here. Keyboard shortcuts? Not yet. We'll definitely have them soon. It's just a matter of time. We do one thing at a time. The backend to this is written as a search engine.
20:22
Every time you hit an API... It's written as a REST API. Every time you hit an endpoint, you'll get JSON back. You can actually just request that instead of looking at the inbox with HTML. This is the same data. A lot of data, actually. This is all the information about those search results
20:42
that were visible. It's got these big blobs here of that there. That's actually someone's thumbnail, an image. It's pulling back a lot of stuff. You can interact with this programmatically. We're hoping that will lead to some interesting sort of plugin type things.
21:01
Question there? Did you say that code reads the Mork database format? Smári did that. I think he pulled code from somewhere. I'm going to repeat that for posterity. You based it on a Python translation of a Perl script.
21:24
Apparently, it's ugly. I showed you guys the full composer. I showed you the reply one. This is what the full composer looks like. It has the same attributes as the other. I can choose profiles. I can say I don't want to sign this message
21:42
and search for people. This is searching in my personal list of addresses. Any address that I see in incoming mail shows up as a suggestion there.
22:01
The ones that actually have keys, as I said, they get a little lock. Hopefully, that will encourage people to use those ones instead of the others. A slightly geekier demo. If I go for this one here. This is the command line interface to Mailpile.
22:22
How? No. It's got a bunch of commands. You can search for things. Some scary messages.
22:52
There's various things that you can do. It's a relatively powerful search engine. Could you repeat the question?
23:08
Yes. I'm actually going to get into that in a moment. He's asking if all these mails were offline, if they were stored locally. Yes, we are encouraging people to store things locally. We want to take things out of the cloud and decentralize.
23:21
That's one of the goals. Yeah. I know people want to sync with IMAP. We just have to go through how we do that in the same way. I'm going to go back to my slides and tell you all about the technology behind this. Or some. How does Mailpile work?
23:40
This is your chance to run away and escape. The overall architecture of the project. The guiding principles are, this is free software, we're going to use open standards, and we are proponents of decentralization. We believe users should own their own data, and we should help them do so.
24:00
Searching is a critical feature. It's not enough that people have to organize all their mail into folders because even a regular user has thousands of messages. Someone like me who's been on the internet for well over a decade has a couple hundred thousand. Do that in a bit.
24:21
Our main user interface is the web, but we're going to allow for alternate ones as well. We want to encrypt by default whenever possible, and we have the expectation that our end users are not the people in this room, they're non-technical people. People sometimes ask me who I'm writing this software for, and I would like this software to be the answer to the question,
24:44
when someone comes to me and says, I'm worried about my privacy, what can I do? Because today, we don't have a good answer to that. Someone comes and says, I want my email to be private, we're going to say, install Thunderbird, learn to use Enigmail, and that's kind of hard to do.
25:01
And Thunderbird isn't even being maintained anymore. Actually, it is being maintained, but it's not being actively developed. Getting into the code, we have a Python core, which is where we handle things like configuration, search engine is part of the core,
25:20
reading and writing and sending email is part of the core, crypto, the HTTP server, and we have a plugin architecture. It's not stable yet, but we do want to be able to develop those things as plugins. And we're already doing so. We support multiple mailbox formats, and that's starting to become pluggable.
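A plugin architecture of this kind is often built around a registry of hooks that the core calls at fixed points. The sketch below is a generic illustration using search keywords as the extension point. The registry, the decorator and both example plugins are hypothetical names invented here, not Mailpile's actual plugin API, and messages are represented as plain dicts for simplicity.

```python
import re

# Hypothetical registry: each plugin is a function that takes a parsed
# message (here a plain dict) and returns a set of extra keywords to index.
KEYWORD_PLUGINS = []


def keyword_plugin(func):
    """Decorator that registers a keyword-extractor plugin."""
    KEYWORD_PLUGINS.append(func)
    return func


@keyword_plugin
def has_attachment(msg):
    # Adds a synthetic keyword so attachments become searchable.
    return {"has:attachment"} if msg.get("attachments") else set()


@keyword_plugin
def invoice_numbers(msg):
    # Turns strings like "Invoice #1234" into the keyword "invoice:1234".
    found = re.findall(r"[Ii]nvoice\s*#(\d+)", msg.get("body", ""))
    return {"invoice:" + number for number in found}


def extract_keywords(msg):
    """Combine the plain body words with whatever the plugins contribute."""
    words = set(re.findall(r"\w+", msg.get("body", "").lower()))
    for plugin in KEYWORD_PLUGINS:
        words |= plugin(msg)
    return words
```

The core only needs to call `extract_keywords` when indexing. New keyword types can then be added without touching the indexing code itself.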
25:40
Importing contacts, as I said, that was written as a plugin from day one. Setup and systems integration, so when you install Mailpile on a new machine, and it goes and it finds your Mail.app, and it finds your settings, and it figures out what to do, that would be plugins that people write for different operating systems. And we have plugins that allow you to tweak the search engine itself,
26:02
so you can teach the search engine to read your mail and create new keywords, or to create complex queries based on something that makes sense to you. We have a web API. I touched on that briefly in the demo, where you can get things back as JSON. Most of the URLs that people see in their web browser,
26:20
they map directly to a Python method. If you hit that method as an API endpoint, it gives you JSON; if you hit it using the regular interface, it'll give you HTML. We also provide HTML embedded in JSON, which is useful for doing AJAX-y type things, and there's plain text, and we might do XML or whatever,
26:42
if there's a demand for it. The HTML that we do render is rendered using the Jinja2 templating engine. For the interface itself, we're using HTML5 APIs, jQuery, Bootstrap, Less, all the modern tools.
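The URL-to-method mapping described above can be sketched roughly like this. It is only an illustration: the route table, the `handle` function, the format names (`json`, `html`, `jhtml`, `text`) and the canned result IDs are all hypothetical, and plain string formatting stands in for the Jinja2 templates.

```python
import html
import json


def search_command(terms):
    """Hypothetical command method; a real one would query the search index."""
    return {"command": "search", "terms": list(terms), "results": ["m1", "m3"]}


# Hypothetical routing table: URL path -> Python callable.
ROUTES = {"/search/": search_command}


def render_html(data):
    # Stand-in for a template engine: escape each value and wrap it in a list.
    items = "".join(f"<li>{html.escape(mid)}</li>" for mid in data["results"])
    return f"<ul>{items}</ul>"


def render_text(data):
    return "\n".join(data["results"])


def handle(path, fmt, terms):
    """Run the method mapped to `path` and render its result as `fmt`."""
    data = ROUTES[path](terms)
    if fmt == "json":
        return json.dumps(data)
    if fmt == "html":
        return render_html(data)
    if fmt == "jhtml":
        # HTML embedded in JSON, handy for AJAX-style partial page updates.
        return json.dumps({"html": render_html(data), "data": data})
    if fmt == "text":
        return render_text(data)
    raise ValueError(f"unsupported format: {fmt}")
```

The point of the design is that one method produces one data structure, and the output format is chosen at the very end.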
27:02
We want to do progressive enhancement. We would like the site to work without JavaScript. It doesn't quite do that yet, but you can use it in a read-only mode. And we want to have responsive design so it scales to different screen sizes and mobile devices and so forth, because honestly, it's going to be a while before someone develops a Mailpile app for your phone,
27:23
but we want the web interface to at least look nice on a phone if you've exposed your Mailpile to the public web. And it will be themeable and skinnable for people that want to do that. There are some alternate interfaces to the app. I showed you the command line very briefly. There's also a Python interface
27:41
where you can say import mailpile and instantiate a Mailpile instance and interact with it programmatically. We use that for testing, and it could be used for various other weird things. We might support XML-RPC just because that's really easy to do in Python. And I would love it if someone ported the user interface of Mutt
28:00
so I could use it on Mailpile. That would be cool. And details. How does it work? So I get the question, why are we not using X, Y, or Z all the time? Why did we write our own search engine? And the honest answer is because that's how the project started. I was just curious, can I write a search engine?
28:21
So I went and I did. And it works, and it's fast, and it's under 1,600 lines of code at the moment. So I don't really feel any need to change. The main benefit I see to this is that it's very, very simple. I can explain it to you now, how it works, and you'll understand it. And I consider that to be a huge benefit.
28:42
And I don't have that visibility into other tools. We want to make sure that this gives people a good experience that's customized for email. We want to be sure that we're storing all of the data encrypted on disk. We want to have a lot of control over how this behaves. So having a simple, small code base
29:00
that we wrote ourselves is very appealing. And it also means we don't have another dependency when we start packaging this and shipping to users. So how does it work? Mailpile reads your mail. Out of the message, it will generate an ID for it. It will extract some metadata, things like the subject, the recipients, who sent it, size of the message, that kind of thing.
29:23
And it will extract a bunch of keywords, which are the things that you can actually search for. It will then create posting lists. These are basically maps that map a keyword to a set of message IDs. These are just pretty simple files. They just have a list of keywords,
29:43
and then there's a map to some IDs. Then there's a metadata index, which maps these same IDs to the metadata, which tells you something about that message. The metadata index is something that I store in RAM. Realizing that I could do this, realizing that email was now small enough
30:02
and our computers are now powerful enough that I can put all that data in RAM and nobody's going to notice. That's what started this project, because when you can do that, all of a sudden everything becomes really fast, because most search queries can be answered by opening one small file, reading it, and then looking up all the other data in RAM.
30:20
That means I can answer any search query under 200 milliseconds on crappy hardware. If you have good hardware, it just gets faster. This is how you would implement this if you were writing code yourself. In my implementation, you get the message IDs for a given keyword. You generate a file name by hashing the keyword.
30:42
You open it up, you parse it, and you return a set. In reality, I'm grouping some of the keywords together so that I don't have an infinite number of files. There are some problems with this. Adding things is easy. Deleting them means I have to go and open all the files and remove stuff, so that's a bit messy.
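The posting-list lookup just described can be sketched as follows. This is not Mailpile's actual code: the tab-separated file format, the SHA-1 hash, the three-character prefix used to group keywords into shared files, and all function names are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def posting_list_path(index_dir, keyword):
    """Hash the keyword to a file name; a short prefix groups many keywords per file."""
    digest = hashlib.sha1(keyword.encode("utf-8")).hexdigest()
    return os.path.join(index_dir, digest[:3])


def add_to_index(index_dir, keyword, msg_id):
    """Adding is easy: append one 'keyword<TAB>msg_id' line to the right file."""
    with open(posting_list_path(index_dir, keyword), "a", encoding="utf-8") as f:
        f.write(f"{keyword}\t{msg_id}\n")


def message_ids_for(index_dir, keyword):
    """Open one small file, parse it, and return the set of matching message IDs."""
    path = posting_list_path(index_dir, keyword)
    hits = set()
    if not os.path.exists(path):
        return hits
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, _, msg_id = line.rstrip("\n").partition("\t")
            if word == keyword:  # the file may be shared with other keywords
                hits.add(msg_id)
    return hits


def build_demo_index():
    """Create a throwaway index with a few postings and return its directory."""
    index_dir = tempfile.mkdtemp()
    add_to_index(index_dir, "potato", "m1")
    add_to_index(index_dir, "potato", "m2")
    add_to_index(index_dir, "carrot", "m2")
    return index_dir
```

Deleting a message is the awkward case in this layout, since its ID may appear in many files and each one has to be rewritten.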
31:01
The metadata index, it looks like that. It's just a dictionary. You have an ID, and it maps to a bunch of attributes. What we put in the metadata index is all of the data that we need to generate search results. When you search for potato, I get back a list of messages, and I can find the subject lines, senders,
31:22
all the details that I need to show you the list. I can show you the list immediately. That's how we get a good experience when you're searching. The metadata we're currently storing encrypted on disk, GPG-encrypted. Posting lists are not yet encrypted, but they will be.
31:42
Putting those two things together, that's a five-line search engine. Start with a set of all of the message IDs that exist. For each keyword in the query, you narrow down the set. You just look up the message IDs that match that keyword, and you do a set intersection.
32:02
Then when you've done that for all of the terms that people are searching for, you go through the results, and you print them out. That's the core of the search engine right there. Building on top of this, we added tags, which are similar to the labels that are in Gmail.
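Here is a minimal sketch of that five-line search engine, combining an in-memory metadata index with posting lists. The sample data and field names are invented, and the posting lists are a plain dictionary rather than files on disk.

```python
# In-memory metadata index: message ID -> what is needed to draw a result row.
METADATA = {
    "m1": {"subject": "Potato recipes", "from": "alice@example.org", "size": 2048},
    "m2": {"subject": "FOSDEM schedule", "from": "bob@example.org", "size": 4096},
    "m3": {"subject": "Potato harvest", "from": "carol@example.org", "size": 1024},
}

# Posting lists: keyword -> set of message IDs containing it.
POSTINGS = {
    "potato": {"m1", "m3"},
    "recipes": {"m1"},
    "fosdem": {"m2"},
}


def search(query, postings, metadata):
    """Intersect the posting lists of every query term, then look up metadata."""
    results = set(metadata)                       # start with all message IDs
    for keyword in query.lower().split():
        results &= postings.get(keyword, set())   # narrow down per keyword
    return [(mid, metadata[mid]["subject"]) for mid in sorted(results)]
```

Calling `search("potato recipes", POSTINGS, METADATA)` narrows the set twice and then reads the subject line straight out of RAM, which is where the speed comes from.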
32:22
These are basically search terms where you can edit the result set. You can add and remove message IDs to this keyword, and that allows you to do all sorts of things. That means you can create an inbox, and you can mark certain messages unread or spam or whatever. Now we have things that do that automatically.
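A tag, in this description, is just a posting list that can be edited. A rough sketch, with a hypothetical `TagStore` class and made-up tag names:

```python
class TagStore:
    """Tags as editable posting lists: tag name -> set of message IDs."""

    def __init__(self):
        self._tags = {}

    def add(self, tag, msg_ids):
        self._tags.setdefault(tag, set()).update(msg_ids)

    def remove(self, tag, msg_ids):
        # Removing from an unknown tag is a harmless no-op.
        self._tags.get(tag, set()).difference_update(msg_ids)

    def ids_for(self, tag):
        # Return a copy so callers cannot modify the stored set by accident.
        return set(self._tags.get(tag, set()))


tags = TagStore()
tags.add("inbox", {"m1", "m2"})
tags.add("unread", {"m1", "m2"})
tags.remove("unread", {"m1"})   # the user opened m1
tags.add("spam", {"m2"})        # the user marked m2 as spam
```

Because `ids_for` returns an ordinary set, a tag can be intersected with keyword results like any other search term.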
32:42
You have filters. This allows you to start organizing your email in a structured way. You have plugins, as I mentioned before. This allows you to make new rules for how to generate keywords. I'm really hoping someone will write a plugin to parse PDF files and extract the text so I can search for those as well.
33:02
Someone could write a plugin that understands the grammar of their language or understands their particular field of specialization, something that parses. One thing that would be cool is to parse messages that come from Ryanair and come from EasyJet and extract the information about the flight and make that into a search keyword.
33:24
There's a lot of things that can be done there. Then there's the other side. Instead of generating keywords, you can create magic keywords that are dynamic. The way you search for dates is implemented as one of these plugins. You can search for a date range, 2010 to 2014.
33:40
That's not a keyword, but I can create keywords out of that, and that's what this plugin does. It will actually tell you to search for date 2010 or date 2011 or date 2012, that kind of thing. Of course, plugins can also just use the search engine to generate interesting views, figure out things about your email.
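The date-range trick can be sketched like this. The `dates:2010..2014` query syntax and the `year:` keyword prefix are invented for the example and are not necessarily what the plugin uses.

```python
def expand_date_range(term):
    """Turn a magic term like 'dates:2010..2014' into one real keyword per year."""
    _, _, span = term.partition(":")
    start, _, end = span.partition("..")
    return [f"year:{year}" for year in range(int(start), int(end) + 1)]


def search_any(keywords, postings):
    """OR the keywords together: union of their posting lists."""
    hits = set()
    for keyword in keywords:
        hits |= postings.get(keyword, set())
    return hits


# Hypothetical posting lists with one keyword per year.
YEAR_POSTINGS = {"year:2010": {"m1"}, "year:2012": {"m2"}, "year:2015": {"m3"}}
```

The range itself is never stored in the index; only the per-year keywords are, and the plugin rewrites the query before the lookup happens.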
34:03
The tricky bits of building a search engine like this are mostly to do with actually reading the mail and generating useful terms, so which keywords we put in the index and which keywords we don't. Then, of course, there's the constant optimization battle to make it fast and make it not too big and so forth and so on.
34:23
Works in progress. We are not able to delete things from the search index right now. That's a bit of a limitation. We'll fix it. The encryption needs work. My security people tell me that it's not good enough as it is, and it would be nice to have a better query language.
34:42
The current query language is pretty stupid. I'm going to pause, actually. Do people have questions about the search engine before I move on? You have a question. Oh, yeah, that's in the encrypted part.
35:03
Those are decrypted. The question is, what do we do about encrypted mail? The answer is, if you tell Mailpile it's allowed to, and this has to be a setting because not everyone is comfortable, Mailpile will just decrypt your email when it sees it. It will index it just like anything else.
35:22
That's one of the reasons that the search index needs to be kept secure because I believe that encrypted email will not be usable unless you can search in it, so this feature needs to work. But in order to do so, we need to be able to encrypt the search index, we need to be able to encrypt the metadata, and we need to do so in a way so that we're not leaking
35:42
the contents of these private messages. Anybody over there? I'm about to talk about that. But yes. Up there? Actually, if it's not about search, let's do it afterwards.
36:08
I'll try and leave some time. How am I for time? Oh, okay, well, that means that we don't talk about crypto. But let's talk about spam filtering because it's cool.
36:22
So how does it work? Mailpile's spam filters are based on statistical analysis of the mail. So we want to have engines that read the mail and give us some sort of result. By default, we're using SpamBayes. Messages that match are auto-tagged with a spam tag, and then messages that have that tag are by default hidden from search results.
36:44
So the spam isn't in your face. We train the filter, or we train the Bayesian engine by looking at the user's behavior. So going in a little deeper, statistical analysis is trying to answer the question,
37:01
what is spam to you? Because not everyone has the same spam, and not everyone has the same idea about what is spam. We do that by feeding the same keywords that go into the search engine into a Bayesian filter, and it will then hopefully be able to classify the mail as: it's spam, it might be spam, or it's definitely not spam.
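To make the idea concrete, here is a small naive Bayes sketch that takes search keywords as input and sorts mail into three buckets. It is a simplified stand-in written from scratch, not SpamBayes and not its scoring method; the class name, the smoothing, and the 0.2 and 0.9 thresholds are all assumptions.

```python
import math
from collections import Counter


class KeywordBayes:
    """Tiny naive Bayes classifier over keyword presence (illustrative only)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}  # number of messages trained per class

    def train(self, keywords, label):
        self.counts[label].update(set(keywords))
        self.totals[label] += 1

    def spam_probability(self, keywords):
        # Work in log-odds, with add-one smoothing so unseen words are harmless.
        log_odds = math.log((self.totals["spam"] + 1) / (self.totals["ham"] + 1))
        for kw in set(keywords):
            p_spam = (self.counts["spam"][kw] + 1) / (self.totals["spam"] + 2)
            p_ham = (self.counts["ham"][kw] + 1) / (self.totals["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-log_odds))

    def classify(self, keywords, low=0.2, high=0.9):
        p = self.spam_probability(keywords)
        if p >= high:
            return "spam"
        if p <= low:
            return "ham"
        return "unsure"  # the "it might be spam" bucket


bayes = KeywordBayes()
for _ in range(3):
    bayes.train(["cheap", "pills", "offer"], "spam")
    bayes.train(["fosdem", "talk", "slides"], "ham")
```

With this training data, a message containing only spam-side words lands in "spam", one with only ham-side words in "ham", and a mix of the two in "unsure".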
37:23
And by default, we use SpamBayes. The training is the important part. If we get this wrong, then the spam filter will perform very poorly, and this is where we need to work a bit more. But what we're trying to do is we're looking at how the user interacts with their mail. So if you take a message and you drag it to the spam folder,
37:43
that's a pretty clear signal that that's spam. And if you read a message and then you reply to it, that's a pretty clear message that that message was not spam. So we're using things like that to assemble a corpus of messages for training. And my vision for doing this is I want to train relatively frequently,
38:03
because I believe that both your communication patterns and the patterns of spam are going to change over time. So I do not expect your spam filter to match your spam from two years ago, but I would like it to match the spam you're getting this week. So Mailpile is tracking these actions.
38:21
When you click on a message and read it, that message gets tagged as having been read. When you reply to a message, it gets tagged as this message was replied to. And when you manually organize things, you move them from one tag to another, it tracks that as well. And that allows us to choose messages for training.
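Choosing a training corpus from those tracked actions could look something like the sketch below. The tag names (`spam`, `replied`), the `age_days` field, and the 30-day window are hypothetical; the only ideas taken from the talk are that moving to spam and replying are strong signals, and that training should favour recent mail.

```python
def pick_training_set(messages, max_age_days=30):
    """Label recent messages as spam or ham based on what the user did with them."""
    labels = {}
    for msg in messages:
        if msg["age_days"] > max_age_days:
            continue                      # prefer this week's spam over old spam
        tags = msg["tags"]
        if "spam" in tags:
            labels[msg["id"]] = "spam"    # the user moved it to spam
        elif "replied" in tags:
            labels[msg["id"]] = "ham"     # the user answered it
        # Messages that were only read carry no clear signal and are skipped.
    return labels


SAMPLE_MESSAGES = [
    {"id": "m1", "tags": {"read", "replied"}, "age_days": 2},
    {"id": "m2", "tags": {"spam"}, "age_days": 5},
    {"id": "m3", "tags": {"read"}, "age_days": 1},
    {"id": "m4", "tags": {"spam"}, "age_days": 400},
    {"id": "m5", "tags": {"replied", "spam"}, "age_days": 3},
]
```

In this sketch the spam tag wins when a message carries both signals, which is one possible policy among several.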
38:40
We also have bacon filters. Turns out nothing that I just said has anything to do with spam. It's just how you organize your mail. So what we're interested in doing is using the exact same plugin and the exact same code and using SpamBayes as a general-purpose mail classifier.
39:02
So you can go to some tag that you've just created, say this is the FOSDEM tag, and you can just click AutoTag. And then over time, you've put some messages in that tag, you haven't put other messages in, Mailpile should be able to learn which messages belong under the FOSDEM tag and do that for you.
39:22
I hope it works. I think it's cool. Yes, I think I am... Should I go through one more? This one's kind of long. I think people want that one. Okay.
39:40
Did anyone want to ask about the spam stuff? Yes. Basically, the baseline expectation from the user is going to be the Google experience in spam filtering. But of course, because you're only using one mailbox, your corpus is way smaller than whatever Google is using,
40:01
and so you don't have the cross-benefits. So I'm wondering, is Bayes actually still working against modern spam? So the question is whether Bayes is actually good enough on its own. And you raised the question of whether we'll be able to compete with people like Google that have access to lots of other people's mail and can cross-correlate.
40:23
Time will tell. I think Smári has been using Thunderbird and its Bayesian spam filter for a while, and he's reasonably happy with it. But this is an area where I do expect that we're going to need to iterate and develop, and if necessary, it might be interesting to look into ways that users can collaborate
40:41
and share details in a decentralized way. That's a big, hand-wavy project, so I'm not going to promise that we're going to do that. But yeah, this is a valid question, and I don't really have a clear answer for it. Yeah? What if a spammer starts GPG-encrypting the spam email they send to you,
41:04
and then you won't be able to do the spam filtering until the user opens the email message, because then he decrypts it? So I think what you're asking is what happens when spammers start encrypting their mail? At that point we win, because at that point Google can't filter it, but we can, because we're doing the spam processing after the mail has been decrypted.
41:23
So that's one of the reasons that in an encrypted world you need to move the spam filter out to the edges to where the users are. Yes? As an administrator, I sometimes receive mail forwarded by users who get frightened by spam
41:41
and wonder, oh, how should I react to this? So forwarding is not always actually an indication of ham. Sometimes yes. What is this? Yeah, that's true. What that actually means, though, is that that's an interesting message. The odds are the user is going to forward that to you, but he's also going to put it in a spam folder.
42:04
And that means I have a very strong signal that that was interesting spam. And you know, I'm perfectly happy with that. It works both ways. So I'm going to move along, talk about crypto. There's one up there.
42:22
You're asking how can a friend of yours benefit from your spam training? I'm not sure. I'm not sure that he can. It depends on whether your spam is similar enough to his. And one of the things that we need to look into before we ship this is whether there's value to building a sort of generic pre-trained filter and shipping that
42:43
and then having that learn over time and how we would do that. But that's something that we don't even start looking at until next summer or even later. But it's a good question. I don't know. So, crypto. Where should we do crypto? I'm not going to say we do, because it's not all implemented yet.
43:02
But this is what we're talking about. We want to encrypt the data at rest. So all the data that Mailpile generates should be stored in a safe way. I know some people have full disk encryption, but not everyone does. I would like people to be able to store their Mailpiles on USB sticks and just carry them around.
43:22
There's many reasons for doing this. We're using crypto when we're reading, writing, and sending mail. We're using HTTPS, obviously, wherever appropriate. We would like to anonymize communication with the external network. So when Mailpile goes off and downloads those images from Gravatar,
43:42
it shouldn't really be telling Gravatar that I'm talking to these people. That needs to be anonymized in some way. We might want to implement some of the proof-of-work hashcash things, which relates to one of the other crypto ideas that comes up soon.
44:01
Data at rest. Our strategy there is to just shell out to the existing tools. We're going to use the GPG binary or the OpenSSL binary or wrapper libraries and use that to encrypt basically everything. Application settings, contacts, search index, etc. Using GPG is relatively slow, so we're probably only going to use that for the config file.
44:27
That will then contain a key that is used for symmetric encryption via OpenSSL, because OpenSSL AES encryption is quite fast. That's our strategy there.
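The two-tier strategy (slow GPG for the config file, a fast symmetric cipher for everything else) might be sketched as below. The functions only build command lines and do not run anything. The helper names, the AES-256-CBC cipher choice, the output file suffixes, and feeding the key to OpenSSL on standard input are all assumptions rather than the project's actual setup.

```python
import secrets


def new_symmetric_key():
    """Random 256-bit key, hex encoded; it would live inside the GPG-encrypted config."""
    return secrets.token_hex(32)


def gpg_encrypt_config_cmd(recipient, config_path):
    """Command line to encrypt the (small) config file to the user's own GPG key."""
    return ["gpg", "--encrypt", "--recipient", recipient,
            "--output", config_path + ".gpg", config_path]


def openssl_encrypt_cmd(in_path):
    """Command line for fast symmetric encryption of bulk data such as the index.

    The passphrase is read from standard input, so the caller is expected to
    write the symmetric key to the process's stdin instead of storing it in a file.
    """
    return ["openssl", "enc", "-aes-256-cbc", "-salt",
            "-pass", "stdin",
            "-in", in_path, "-out", in_path + ".enc"]
```

A caller could hand these lists to `subprocess.run`, passing the key as `input=` for the OpenSSL command, so that only the GPG step pays the public-key cost.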
44:41
Reading email. There's the basics. You parse the PGP/MIME. You call out to GPG to check whether the signature checks out and decrypt things as necessary. But we would like to be able to plug in other things. We would like to be able to plug in other crypto systems, because the crypto community is developing all these things,
45:02
and I'm not sure they have a good platform where they can actually play with them. I would like to be able to plug in something from a friend, Steph, who wrote something called PBP, which is Pretty Bad Privacy. It's based on elliptic curves, but does similar things to GPG, and it would be cool if we could just plug that in and let people play with it.
45:20
As I said earlier in answer to the question, decrypting can be done during the indexing phase, so encrypted messages can be searched. When we do that, that means that we bubble up information about the state of the messages into the metadata and into the search index. You can search for messages that have bad signatures, or the UI can give visible feedback,
45:42
because everything that the UI shows you is based on what's in the metadata index. That's important plumbing, and that is mostly working as of this month. Writing email, again: generate PGP/MIME, use GPG to actually do the signing and the encrypting.
46:04
We want to do best effort security. We would like to encrypt messages whenever we can. How we figure out whether we can is an open question. We'd like to sign by default. There's really no reason not to. We need to give the user the power to change these settings,
46:22
but for the most part we want to do our best. We would like to use trust on first use, so TOFU, for key verification. That's like SSH, where the first time you see the key, you just say, okay, this is probably the right key, and then you only complain if it changes in the future.
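Trust on first use is simple enough to sketch in a few lines. The class name, the return values, and the fingerprint strings below are hypothetical; a real implementation would also persist the store and decide what the user sees when a key changes.

```python
class TofuKeyStore:
    """Remember the first key fingerprint seen per address; flag later changes."""

    def __init__(self):
        self.known = {}  # email address -> fingerprint seen first

    def check(self, email, fingerprint):
        seen = self.known.get(email)
        if seen is None:
            self.known[email] = fingerprint   # first contact: trust and remember
            return "first-use"
        if seen == fingerprint:
            return "ok"                       # same key as before
        return "changed"                      # only now do we complain


store = TofuKeyStore()
first = store.check("alice@example.org", "AAAA1111")
again = store.check("alice@example.org", "AAAA1111")
swapped = store.check("alice@example.org", "BBBB2222")
```

Note that a changed key does not overwrite the stored one in this sketch, so the warning keeps firing until something explicitly accepts the new key.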
46:42
This is because the Web of Trust is very complicated. I'm not sure we can put a user interface on that, to be honest. The Web of Trust is also leaking private information. The Web of Trust is revealing to anyone who cares to look who's talking to whom and who knows whom,
47:02
how we're connected. A lot of people consider that to be such a massive privacy problem that it just should not be used. This is how we're heading right now. You're welcome to talk us out of it. We listen to feedback, but this is what we're talking about at the moment. We're probably going to distribute keys in an ad hoc way,
47:22
including just attaching them to outgoing mail. Sending mail, that's a fun one. We should use TLS when possible. That's been in SMTP forever, and obviously we should use it. But we also have this idea, SMTorP, or SMTP over Tor.
47:40
This is when Mailpile starts becoming a mail server. We built a very simple SMTP server into Mailpile. We ship Mailpile and Tor together, so when an end user installs Mailpile, they also get Tor. People know what Tor is, right? It's an anonymity tool.
48:01
It anonymizes communications, and it's also a darknet. You can register what's called a hidden service on Tor, and people can connect to that, and the anonymity of that connection is strongly guaranteed. What this means is you get an email address, which looks really ugly. It's your name at some long gibberish string.
48:23
I think that they very helpfully made those strings longer recently, like 80 characters or something. It ends in .onion. But what that means is that when your Mailpile is connected to Tor, one Mailpile can deliver email directly to another,
48:41
and it doesn't have to go through any outside relays. It never leaves the encrypted Tor network, and we have suddenly closed the metadata leak that people are complaining about with SMTP, where the NSA is just listening to who is sending who messages all the time, because a lot of the time that happens in the clear.
49:01
This is a different problem from the lack of encryption of message headers in PGP. This is the actual SMTP dialogue, where you say MAIL FROM somebody, RCPT TO somebody else. That's always in the clear unless people are using TLS, and TLS is not widely deployed, or not widely deployed enough.
49:21
This closes that leak completely, and it allows emails to suddenly become a peer-to-peer messaging protocol without inventing any new protocols. I love that. We don't invent anything. We just assemble things. By building on SMTP, instead of inventing some new protocol, that means we can use the existing infrastructure.
49:41
We could use something like Postfix or Exim or something. Put that on a hidden service instead, and do relaying if your Mailpile is not online often enough for the peer-to-peer solution to work for you. Question about that. What are Dark Mail doing, and is it relevant? I have no idea. The question is, what are Dark Mail doing, and is it relevant?
50:03
From what I've heard, Dark Mail are doing something clever with XMPP, which is Jabber. In my mind, that's not really email. I don't know if they plan to be backwards compatible with the world of email. They'll just have to answer those questions. I don't know. But if they publish something, we might well implement it.
50:24
If it's good, we'll implement it, but we just have to wait and see. I think this is my last crypto slide. HTTPS, that's a no-brainer. We should be using it all the time. I would like to anonymize the traffic by shipping with Tor.
50:41
We can start downloading from Gravatar. We can download from Facebook and here and there without revealing who you're talking to. In some cases, we will also be able to anonymize the outgoing SMTP connections. There are providers like Gmail, which is one of the things they do really well. They explicitly allow incoming connections from Tor.
51:01
They do this because they want activists to be able to communicate. So there are some providers that are known to behave well and we should be able to connect to them over Tor instead of going through the clear text internet and revealing data. So that's it for me. Yay, we have an alpha.
51:28
Can we take some more questions? Do we have some more questions? General type. Please, hands up so I can have the microphone maybe. Looks like people are running away. Maybe we should just gather and talk.