
Seeks Project: Let's take back Websearch


Formal Metadata

Title
Seeks Project: Let's take back Websearch
Number of Parts
64
License
CC Attribution 2.0 Belgium:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.

Content Metadata

Abstract
Conventional websearch is a solitary and locked place. Searchers are forced to interact with remote proprietary black boxes with no possible feedback, and are isolated from each other. An estimated billion queries are issued every day by several hundred million users. Those queries are grounded in real-life events and activities common to many searchers. However, users cannot share their experience and build the results together. Seeks develops a decentralized architecture for enabling collaborative websearch. This short talk outlines the Seeks architecture and shows in simple words how Seeks allows its users to issue queries and get answers from other users without revealing the true query itself. Seeks is an open peer-to-peer architecture and a free software application for enabling collaborative websearch. Its specific purpose is to regroup users whose queries are similar so they can share both the query results and their experience with these results. On this basis, Seeks allows truly real-time, decentralized websearch to emerge. In the long term, there is no need for web crawlers and third-party web indexes, as users can push content directly to the groups of seekers. By hashing queries, Seeks offers a unique solution that allows users to issue queries and get answers without revealing the true query itself. In its present state, Seeks is a free software meta-search engine that can be deployed on local or remote machines. Seeks supports a fair set of options, from thumbnails to more advanced and experimental features such as content similarity analysis, automated clustering, and personalization of results. While still in its early phase, Seeks aims to demonstrate that there is a path toward collaborative, efficient, free-software-powered websearch on the Internet.
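The abstract says Seeks matches similar queries by hashing them, so peers can compare queries without exchanging the plain text. The talk does not give the exact scheme, so what follows is only a minimal Python sketch under our own assumptions: a query is lowercased and split into terms, every combination of one or two terms is hashed with SHA-256, and two queries count as similar when their sets of digests overlap. The names `normalize`, `query_fragments`, and `overlap` are hypothetical.

```python
import hashlib
from itertools import combinations


def normalize(query: str) -> list[str]:
    """Lowercase a query and return its distinct terms in sorted order."""
    return sorted(set(query.lower().split()))


def query_fragments(query: str, max_terms: int = 2) -> set[str]:
    """Hash every combination of 1..max_terms terms with SHA-256.

    Peers would exchange these hex digests instead of the plain-text query.
    Sorting the terms first makes the digests independent of word order.
    """
    terms = normalize(query)
    digests = set()
    for size in range(1, min(max_terms, len(terms)) + 1):
        for combo in combinations(terms, size):
            fragment = " ".join(combo)
            digests.add(hashlib.sha256(fragment.encode("utf-8")).hexdigest())
    return digests


def overlap(q1: str, q2: str) -> float:
    """Jaccard similarity of two queries' digest sets (0.0 disjoint, 1.0 identical)."""
    a, b = query_fragments(q1), query_fragments(q2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(overlap("cheap flights paris", "Paris cheap flights"))  # 1.0
    print(overlap("cheap flights paris", "flights to rome"))      # 1/11, about 0.09
```

Hashing term combinations rather than the whole string lets queries that share only some terms still meet. Note that an unsalted hash of a short fragment can be recovered by hashing guessed words, so this sketch illustrates the matching step, not a privacy guarantee.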
Transcript: English(auto-generated)
I'm going to give you a very short overview of the Seeks project, and what you'll get is one possible path to taking back web search, the way, you know, Firefox took back the web.
Taking it back is long, long work. It's hard work. It will be difficult, but that's the path that I know, and the path I'm working on with a bunch of people. So this is what I'm going to show you, and maybe you'll be convinced that the free software community, us, hackers, developers, blah, blah, blah, can do something about it.
Okay, so as you know, we have something fantastic at our fingertips: the most connected human-built network in recorded history. It's the internet, and on this network we basically have a massive flow of data, which we can sort into two categories: human-generated, human-targeted data, and machine-to-machine data.
So in this context, what is search? Search is the basic operation by which humans look up human-targeted content in this massive amount of data. Search has a fascinating characteristic: we all depend on it. Most of the time, we ask Google or Bing or whatever search engine before we even ask our friends, so we're really dependent on these engines.
Also, the level of skill required to use these engines has low variance. What that means is that whether you're my mother or a so-called power searcher, the tools you're using, the way you're using Google or any other engine, are almost the same. The power searcher is maybe a little bit better, but it's almost the same.
So we all depend on search, we all use it, and, in case you haven't noticed, as users we're locked up, in several ways. First, we're interacting with black boxes, proprietary black boxes. You've all experienced this: you type in some keywords, get some results back, you're not satisfied with the results, you change the keywords, you issue another query, and so on, and you iterate until you're satisfied.
And sometimes you just get garbage, right? And the machine will never tell you that you got garbage, because the machine will never tell you that it doesn't know. But actually, it doesn't know. So you can't give feedback. That's the first lockup. The second lockup is very important to me: we're all doing the same queries over and over again, each in our own corner.
We're isolated. If I look for something and read a forum, and find at the end of the thread that the answer to my question is not there, the hundred or thousand people after me who do the same query will do the same thing: they'll go through the forum and read it again. So we're isolated.
The third lockup is that we're all experts in something, yet we pour our queries into these search engines and keep nothing on our machines, right? So we don't build the profiles of expertise that should be ours.
So the point here is that web search is truly lagging behind, and there are reasons for this; I'll try to tell you why. But here is how I can convince you that it's lagging behind: we're millions, even billions of users, and every day an estimated one billion queries are issued to the search engines.
And of course these queries are rooted in reality, right? We're looking in the infosphere for things that are real: recipes, medicines, whatever. And since we're looking for something real, many of us are looking for similar things, right?
But we cannot interact. There is no feedback. We cannot share our queries the way we share other things. So what is Seeks? Seeks is an open architecture and a free software application that tries to fill this gap in web search by making it collaborative, making it social.
And the main idea is that social web search is inevitable. It will happen. Whether Seeks succeeds or some other people come up with better ideas doesn't matter. In the end, we'll do search together, okay? The way we do other things together. The main idea behind Seeks is very, very simple. It's very natural, too.
It is to regroup users who issue similar or identical queries so they can share and collaborate on the results. That's all. This simple idea has many interesting consequences. I'll try to describe Seeks very briefly with just this slide. Basically, Seeks regroups people in real time into so-called search groups around similar queries, okay?
So people who type similar queries get regrouped into search groups, and the people in a search group can work together. We give them collaborative filtering tools for this. These tools let them exchange results, either automatically or actively by clicking, recommend results, toss results away, and do a lot of other things. So basically it gives these people the ability to process data.
The third point is very important: we allow some users to push more data to the search groups for them to process.
So let me give you an example. If you have a blog and you write a new entry, you devise a few queries, at least one, that describe your new entry, and you type that query into Seeks. You're regrouped with people who issued similar queries. But instead of looking at the results you get from the search engine, you push your content.
You say, hey, this is a potential new result for you guys. If you've been lying, that is, if your query doesn't match your content, you'll get tossed. People will say, okay, that's garbage, we don't want it. Put these three things together, the groups, the filtering, and publication to the groups, and you can build a real-time, collaborative, and fair search engine.
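The search-group mechanics described here (members share results, recommend or toss them, and publishers push new content straight into the group) can be sketched as a small Python class. This is a hypothetical illustration, not Seeks code: the `SearchGroup` class, the unit-weight votes, and the `min_score` cut-off are assumptions chosen to keep the example short.

```python
from collections import defaultdict


class SearchGroup:
    """Hypothetical search group whose members share and vote on results."""

    def __init__(self, group_key: str) -> None:
        self.group_key = group_key         # e.g. a hashed query fragment
        self.scores = defaultdict(float)   # url -> collaborative score

    def add_result(self, url: str) -> None:
        """Register a URL found by a member's engine or pushed by a publisher."""
        self.scores.setdefault(url, 0.0)

    def recommend(self, url: str, weight: float = 1.0) -> None:
        """A member vouches for a result."""
        self.scores[url] += weight

    def toss(self, url: str, weight: float = 1.0) -> None:
        """A member marks a result as garbage."""
        self.scores[url] -= weight

    def ranked(self, min_score: float = -1.0) -> list[str]:
        """URLs by descending score; anything below min_score is filtered out."""
        kept = [url for url, score in self.scores.items() if score >= min_score]
        return sorted(kept, key=lambda url: (-self.scores[url], url))


if __name__ == "__main__":
    group = SearchGroup("demo-key")
    group.add_result("https://engine.example/a")        # from a member's engine
    group.add_result("https://blog.example/new-entry")  # pushed by a blogger
    group.recommend("https://blog.example/new-entry")
    print(group.ranked())
```

With these defaults, a pushed entry that gets recommended rises above plain engine results, while a result tossed by two members falls below `min_score` and drops out of `ranked()`.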
Fair, because anyone can push data. Okay, here's the important thing. Why do we need to do it? If social web search is such a simple idea, why doesn't it exist? All these big companies could do it. There's one reason.
The business model of these companies, right? The existing engines feed on users' personal data in exchange for the service they give you, and they use this data to target advertising and generate revenue. If you let users share their queries and be clustered into groups of people with similar interests, and if you let people push content and results to those groups, advertisers included, then advertisers don't need to go through the existing paywalls like AdWords and others. They don't need to go to these centralized entities and pay a fee to push their results.
So these existing search engines have business models that work, but because of these business models, they will never do it. And the proof is that today, even players like Facebook and others, whether you like them or not, are still lagging behind. They haven't done it. And that's the lockup: they simply cannot do it.
Seeks builds on a few principles, which I will outline very quickly. The first one is that if you want to take back web search, you've got to give users more control over the search algorithms.
What Seeks does is regroup people so they're together. Once they're together, they can work together, and they have more control over their results and the algorithms. The second principle is transparency. Transparency means that you're free to modify the equations, the way the ranking is done in the engine.
You don't have to rely on proprietary code and equations and let them rank your results. Seeks achieves this by being free software; that's one of the reasons it's free software.
It's free software, so it lets you change the code as you want, and this does not impact the other users. The third principle is fairness in content deployment. Letting people push results to the search groups, that is, to people searching for content similar to theirs, removes the crawlers, so we don't need to copy the web every five minutes again and again.
We're pushing more data to the groups, and it's also fairer, because anyone can push data. We're not dependent on SEO, where if you have more money than others, you can get your website better recognized by the existing crawlers.
Finally, one of the most important points is that Seeks takes user privacy very seriously, and for this reason it builds on a decentralized architecture.
Seeks relies on a peer-to-peer network. The way it works is that you install Seeks on your machine. It can be a desktop, a laptop, or a SheevaPlug; it works on all these devices. Once installed, it learns from your behavior and navigation, and also from the queries you type in every day, and it builds a profile for you, a filter through which you look at the results from other search engines or whatever feeds you're using.
Over the peer-to-peer network, these profiles exchange fragments of information through the search groups to help each other improve results, get recommendations, and so on. So it looks just like another search engine.
You can use it every day, with no loss in the quality of results. It uses existing engines to get results, and you can add whatever other feeds you want. See the little stars and the recommendations on the right? That's the information shared by users.
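The talk says each node builds a local profile from your queries and browsing, and that a peer's influence on your results is proportional to how similar its profile is to yours. One minimal way to sketch that in Python, assuming profiles are plain term-frequency vectors built from past queries and similarity is cosine similarity (both are assumptions; the talk names no measure), is to weight each peer's recommended URLs by that similarity. `build_profile`, `cosine`, and `weighted_recommendations` are hypothetical names.

```python
import math
from collections import Counter


def build_profile(queries: list[str]) -> Counter:
    """Hypothetical local profile: term frequencies over a user's past queries."""
    return Counter(term for q in queries for term in q.lower().split())


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity of two term-frequency profiles (0.0 if either is empty)."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def weighted_recommendations(
    me: Counter, peers: list[tuple[Counter, list[str]]]
) -> dict[str, float]:
    """Score each URL a peer recommends by that peer's similarity to `me`."""
    scores: dict[str, float] = {}
    for profile, urls in peers:
        weight = cosine(me, profile)  # influence proportional to similarity
        for url in urls:
            scores[url] = scores.get(url, 0.0) + weight
    return scores


if __name__ == "__main__":
    me = build_profile(["linux kernel", "linux drivers"])
    close_peer = build_profile(["linux kernel modules"])
    distant_peer = build_profile(["chocolate cake recipe"])
    print(weighted_recommendations(
        me, [(close_peer, ["https://a.example"]), (distant_peer, ["https://b.example"])]
    ))
```

A peer whose profile shares no terms with yours contributes a weight of zero, so its recommendations leave your ranking unchanged.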
Here's the roadmap. The first step was to build what is, to my knowledge, the only free software meta-search engine. This is done. We can search for text, images, videos, and tweets.
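The first roadmap step is a meta-search engine that queries existing engines and merges their answers. The talk does not say how Seeks merges and ranks them, so this Python sketch swaps in reciprocal rank fusion (RRF), a standard and simple way to combine ranked lists: each URL earns 1/(k + rank) from every engine that returns it, so results found by several engines rise. The engine names and the constant `k = 60` are illustrative assumptions.

```python
def merge_results(engine_results: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-engine ranked URL lists with reciprocal rank fusion (RRF).

    A URL at 1-based position `rank` in one engine's list earns 1 / (k + rank);
    its final score is the sum over all engines that returned it.
    """
    scores: dict[str, float] = {}
    for urls in engine_results.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:  # ignore duplicates within a single engine's list
                continue
            seen.add(url)
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    print(merge_results({"engine_a": ["x", "y"], "engine_b": ["y", "z"]}))
    # ['y', 'x', 'z']
```

RRF only needs ranks, not each engine's internal scores, which suits a meta-search engine that sees nothing but the result order returned by black-box engines.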
There are some nice tools built in, and we're done with this step. This is something you can install on your own laptop or desktop. The second step, which is coming up soon, is the rollout of a peer-to-peer network that connects all these Seeks nodes together. So whatever you're doing on your machine, if you allow it, will have an impact on other people searching for things similar to yours. This impact will be proportional to the similarity between your profile and the other user's profile.
The third step is to let people on this network push more content to the search groups, freeing ourselves from the existing engines and becoming more real-time.
The fourth step is a sort of end-of-time step that we'll probably never see. It's interesting, though, to keep it in mind: when you wake up in the morning, you know where you're going. If you let web servers index their own content, build queries that describe that content, hook them up to the search groups, and push this information to the search groups, then you free yourself completely from the crawlers. So that's for the end of time. We can move gradually along this long path toward that end. But this end should work.
There's no technical reason why it wouldn't work. In the end, we would no longer need to re-index the entire web; we would just be more active, in a push-receive mode. If you're interested in the project, you can test it. You can try some public nodes, such as seeks.fr; there are some others. Go to the website, seeks-project.info.
It lists a bunch of Seeks nodes that are maintained by volunteers, and you can use them on a day-to-day basis. I use it on a day-to-day basis, I'm a hardcore searcher, and I have no problem. The quality of the results is, in my opinion, even better than with other engines. You can get in touch with us on IRC. We have a bunch of interesting tasks that need to be addressed.
So I'll give you an overview here. If you like peer-to-peer networks, distributed hash tables, or network security, there are a lot of interesting issues we need to address. Databases too, including replicated databases. We're moving slowly but surely, and we need a lot of different skills.
The second area is machine learning. If you like machine learning, want to learn more about it, and are willing to participate, please come in. There are a lot of interesting tasks to be addressed.
Also, since Seeks is free software, you can build the user interface of your dreams. You don't need to rely on a user interface that the existing engines give you and change whenever they want. You can build whatever you want. I'm a C++ developer; I can't really design UIs, so I consider this a job for the community and users: do whatever they want, try whatever they want. If it fails, it fails. If it's good, it's good.
Okay, so that's all. The point being: come and help us if you like the project, or try it out and see if it's any good for you. Thank you.
Thank you very much. Oh, wow. Do I have time for questions or not? Not really.