Your search doesn’t work

Great that all of you made it out of bed after a nice evening yesterday; it looks like some other people didn't. Today we're going to learn how to make search work and, in particular, how to find out if your search box doesn't work. Why can I tell you something about that? I happen to
be working for Elasticsearch. Apart from that, I am a member of the Apache Software Foundation, co-founder of Apache Mahout, which does machine learning, and co-founder of Berlin Buzzwords, a conference on search, scale and store in Berlin. I have also spent the past couple of years in companies that do search: first four years in consultancy, building search applications for German publishers, then two years at Nokia Maps, and now with Elasticsearch. One recurring question was always how to find out whether the search box is actually working, and how to convince customers and management what the real problems are, as opposed to what they think the problem is. So this is
usually when I go around with my microphone and ask people who they are and to give a brief introduction, because I have a background in doing that. I won't do that with the entire group this morning, but I would still like to have a show of hands. How many of you have ever cursed a search engine that you've used because it just didn't work? And how many of you have integrated search into some website? Almost everyone.
How many of you have built your own search application, integrating Lucene, Elasticsearch, or whatever? How many of you know a lot about using one of these? Anyone who knows Oracle? Just a few here. And nearly everyone knows Elasticsearch. So first I briefly want to show
you that search isn't just about searching web pages and isn't just about searching text; it can be quite a bit more diverse. I'll quickly skim over the use cases, because many of you are very familiar with building search applications and what that involves. So the first
non-text application that comes to my mind is, of course, mapping search. The use case here is that you have a user with a mobile device that sends you a latitude and longitude and maybe a category: say, the user is searching for a restaurant, or sending you the name of a restaurant, and you want to return the particular POI that is interesting to the user. Or the user is sending you some
complete address including street name, city name and house number, or just a fraction of that address, and you then have to return the correct location. This is something that can easily be implemented with a
search engine; in that case it was mostly implemented on top of Lucene. Another typical use case, and
also not quite such an obvious one, is market research: you've got some brand names that you want to monitor, you take all the mentions of that brand name in social networks, via Twitter or Facebook or what have you, put them in your search engine, and then do analytics on them. In this case I happen to have a baby and I happen to have a Twitter account, so that is the example I picked.
Another use case which is not quite so obvious, but which is particularly popular with Elasticsearch, is log files: say, standard Apache logs or standard application log files. You feed them into your search application and again do analysis on them. Elasticsearch comes with Logstash to put the data into the search engine and Kibana for real-time visualization on top. You can easily monitor applications with that, and it can easily scale. And as a
slightly more typical use case: you've got a product database, in this case a database of apartments, and you want to filter them not only by location and distance, but you may also want to enter some
kind of free-form search; you may want to filter down using particular attributes that you may not have been aware of when you created the database. These searches tend to be covered by the usual search engines, like Lucene, as well. So my focus
today will be the end-user-facing search applications: not so much the logging use case, but more the product database, mapping search, or typical text search. Now there are three dimensions that make up search engine quality. The first tends to be the user interface: if the user interface sucks and you don't find the search box, it's not helping anything. Also, if the query language isn't what the user expects it to be, it doesn't help either. The second dimension tends to be the responsiveness of the site. Usually, if a user
enters a search term and gets the response in way more than 100 ms, the web page is perceived to be very slow. It's like having a lag that can be noticed by the user, and that is something that turns people away. The third dimension is search result quality. What
I mean by that is: are the search results that the user is searching for actually in my index? Is the document that the user is looking for in the index? That's particularly important for the mapping use case: if the restaurant that the user is searching for isn't in the index, you can't do anything with ranking; you just can't return it. The second
one is: are the query terms that users use in order to find the document actually associated with the document? Think of synonyms, for example: some person may search for a laptop computer but really mean an Apple MacBook, or the other way around. And the third one, which usually comes
to mind, is ranking quality: is the document that the user is looking for actually ranked on top, or is it somewhere at the lower end of the ranking? The presentation will focus only on the ranking quality and on how to find out whether documents actually are in the index; it won't focus on user interface or on speed. If we think about how to determine whether search ranking quality is correct, we first have to remind ourselves how search engines are typically integrated and what
typical search problems are. Let's think of a mapping application, and let's
think our user is searching for Toronto. The obvious answer for that query clearly is the city in Canada; that should be first, right? However,
this particular user is somewhere in the middle of Berlin and has zoomed the viewport down to this particular area. If you're very familiar with Berlin, and with that area in particular, you know that there's a restaurant called Toronto. So our searcher may actually not mean the city of Toronto in Canada, but that tasty restaurant. So result ranking quality isn't actually that
obvious, and it's not as objective as it may seem. A lot of ranking
depends on context: where am I, and am I really looking for something else? That has an influence on my perceived quality. Think of typical spelling errors: if you just use the predefined queries that you use for your unit tests, you may not think of all the ways people misspell search terms, but you do see them in your logs. Another thing is personal preference versus global popularity: just because your
favorite children's book matches on top in a book search doesn't mean that this children's book is actually the one that everyone is looking for. So again you may want to have a look at what users really are looking for. Another aspect
that you can think of is seasonality: just because search works in summer doesn't mean that it's great in winter; vacation destinations may vary a lot depending on the season. There's also a great difference when checking whether your search works if you know how the system works, as opposed to being a newbie user. Your typical user doesn't know how your indexing works; for them it's just magic. They enter some query
terms and get back relevant documents; that's what they expect. They don't know how that index is built. If you craft test queries yourself, you always know how the system works, and you know how to craft the test queries. Then we
have the problem of subjectivity. An example for that: take a random set of people from different areas and ask them for the best ice cream in town. You don't want to know how many different answers you get. Another case is different meanings for the same term in different countries. Restaurant in Germany
typically means a nice place where you go to have dinner with your girlfriend. In the US, it may be any place where you get food and can sit down. So what is known in Germany as McDonald's is quite a different kind of restaurant. Americans do expect you to return McDonald's for a restaurant search, and in Germany it depends on who you ask whether that's embarrassing or not. Also, there may be
users including additional information in the query in order to find the result, which you don't have in the index. Think of a music search engine where you don't put the publication date into the index: if the publication date is part of a search query, you can't use that information, of course. Now, a first step
towards a better search engine is metrics that I typically call finding the elephant in the room: stuff that is very outstanding, very obvious once you take a closer look. What do I mean by that?
You go to your log files (that should make it very clear: you should log your search usage) and look for zero-response queries. So in each situation, note down which queries got no response, or, if you didn't do that at implementation time, at least take these queries
and rerun them on your system, on a test instance that has all the data, and look for queries that return no result. You will find stuff like spelling mistakes; you should in particular be looking for common spelling mistakes and include these different terms as synonyms. You will
find searches for non-indexed information, and again you should be looking for repeated occurrences of this. If it's something that occurs very often, that should point you towards a feature you should be implementing or something you should be putting into the index. It could also point towards information that is lost in preprocessing. A
popular movie in Germany is called "23". If people just search for 23 and you normalize all numbers to a specific term like NUMBER, but don't keep the actual number, you won't be able to return the movie for that particular search. So again, if you look
for zero-response queries, that already points you towards problems in your implementation. The next step would be to log not only query plus response, but also which results the
user clicked on. What you get from that, as a very first step, is the ability to search for searches that don't have any interaction: you are now explicitly looking for queries that did get results, but no one ever clicked on them, so obviously these results are crap. And again, you can look for common query
reformulations: you group queries into sessions (say, within a certain time frame a returning user is one session), and then you look for common reformulations. Again, that may point you towards missing synonyms, for example. All of these approaches are great for finding bugs in your current implementation, and they're great for prioritizing features, because it doesn't pay very well if you just pick a
random feature to add; you want to work on the ones that are targeted at what your users actually do.
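The three log checks described here (zero-response queries, queries whose results nobody clicked, and reformulations within a session) can be sketched in a few lines of Python. This is only an illustration under assumptions: the log record layout, the sample data and the 30-minute session gap are all hypothetical and not taken from the talk.

```python
from collections import Counter

# Hypothetical log records: (user, timestamp in seconds, query, result count, clicked ids)
LOG = [
    ("u1", 0, "toronto", 12, ["doc7"]),
    ("u2", 5, "torronto", 0, []),
    ("u2", 20, "toronto", 12, ["doc7"]),
    ("u3", 30, "23", 0, []),
    ("u4", 40, "cheap flat", 8, []),
    ("u5", 50, "torronto", 0, []),
    ("u5", 4000, "toronto", 12, []),
]

def zero_result_queries(log):
    """Count queries that returned nothing at all."""
    return Counter(q for _, _, q, n, _ in log if n == 0)

def unclicked_queries(log):
    """Count queries that returned results nobody clicked on."""
    return Counter(q for _, _, q, n, clicks in log if n > 0 and not clicks)

def reformulations(log, session_gap=1800):
    """Count (earlier query, later query) pairs by the same user within one session.

    A session is assumed to end after `session_gap` seconds without a query.
    """
    pairs = Counter()
    last = {}  # user -> (timestamp, query) of that user's previous request
    for user, ts, query, _, _ in sorted(log, key=lambda r: (r[0], r[1])):
        if user in last:
            prev_ts, prev_query = last[user]
            if ts - prev_ts <= session_gap and prev_query != query:
                pairs[(prev_query, query)] += 1
        last[user] = (ts, query)
    return pairs

if __name__ == "__main__":
    print(zero_result_queries(LOG).most_common())
    print(unclicked_queries(LOG).most_common())
    print(reformulations(LOG).most_common())
```

Sorting each counter by frequency gives a rough priority list: a misspelling that shows up many times is a better candidate for a synonym than a one-off.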
However, it's still a very coarse assessment. If you want to go a little deeper and more fine-grained, we will first take a
look at search in a nutshell, to understand how search engines are typically integrated. We have our
search box; this is where you type in your query. Of course, if you look at the search application, that's not all; you
can get a little bit more information. You may get the user's
location, because that's maybe what the browser sends through the geolocation API; it could also simply be derived from the IP address used, so the search application knows approximately where you are. Your browser will also send a preferred user language. Typically, if you have a mapping application, you will send some kind of viewport: you may have a zoomed-out one, but you may also have zoomed down to a very specific area of interest already. You will send information about the operating system, so if you have Linux and you send the query "apple" to a text search engine, that may mean something totally different than if you have an Apple MacBook and are searching for Apple. The search may have information on previous searches if you're a returning user and they are tracked. Something else that's very
obvious and that you do not even need to send is time and date: the places that are interesting Monday to Friday may not be the same as those that are interesting on Saturday and Sunday. The same goes for working hours as opposed to free time. On the
other hand, for each result there is also some additional information attached. There is not just the name Toronto attached to the city of Toronto; there's also something like city size. If you have to decide whether to return the restaurant or the city, it may be interesting to know whether the city is particularly large, whether it's far away from the user, or whether it is a popular vacation destination. You may have rating information: if that's a very popular restaurant, it may be a good idea to return that restaurant as opposed to the city. You probably have information, if you do this kind of tracking, on how often people
click on the result. So in addition to just the query string, you have a lot of additional information that's
being sent by the browser: that's metadata. The query arrives in your search application, and now something interesting happens: the search application has to formulate, in our case, an Elasticsearch query. So what usually happens is that the query encodes the information that was sent (it's from Berlin, it was sent through some kind of mobile device, through the mobile application) plus the user query, plus maybe time and date. That query of course goes to your search backend, and you get back documents. The interesting part happens here, because this is where you decide how to weight all the different ranking signals and how to combine them. So this is where you decide how to take
these boxes, and probably many, many more of these, and how to combine them into a query that works. Questions that
you may want to answer are: I found a query that works; are there query transformations that work as well? Is a different query formulation much faster on my backend while still giving the same results? A question that's also useful is: how should each of these signals be weighted? Is it more important that my
city is large, or is it more important that my restaurant has a lot of positive ratings? And of course: which function to use in order to combine them, which signals should increase the importance of a search result, and which signals should decrease it.
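To make the weighting question concrete, here is a minimal sketch of one possible way to combine signals: a text score multiplied by a weighted sum of normalized signals. The combining function, the signal names, the weights and the sample numbers are all assumptions made for illustration; the talk does not prescribe any particular formula.

```python
import math

# Hypothetical candidates for the query "toronto", sent by a user in Berlin.
CANDIDATES = {
    "city of Toronto": {
        "text_score": 1.0, "population": 2_800_000, "rating": 0.0, "distance_km": 6500.0,
    },
    "restaurant Toronto": {
        "text_score": 1.0, "population": 0, "rating": 4.5, "distance_km": 0.3,
    },
}

def combined_score(doc, weights):
    """Boost the text score by a weighted sum of signals scaled to roughly 0..1."""
    size_signal = math.log10(1 + doc["population"]) / 7.0  # about 1.0 for 10 million
    rating_signal = doc["rating"] / 5.0                     # ratings assumed on a 0..5 scale
    proximity_signal = 1.0 / (1.0 + doc["distance_km"])     # 1.0 right next to the user
    boost = (
        1.0
        + weights["size"] * size_signal
        + weights["rating"] * rating_signal
        + weights["proximity"] * proximity_signal
    )
    return doc["text_score"] * boost

def rank(candidates, weights):
    """Return candidate names ordered by descending combined score."""
    return sorted(
        candidates,
        key=lambda name: combined_score(candidates[name], weights),
        reverse=True,
    )

# Two hypothetical weight configurations one might want to compare.
FAVOUR_SIZE = {"size": 2.0, "rating": 0.5, "proximity": 0.5}
FAVOUR_LOCAL = {"size": 0.5, "rating": 1.0, "proximity": 2.0}

if __name__ == "__main__":
    print(rank(CANDIDATES, FAVOUR_SIZE))
    print(rank(CANDIDATES, FAVOUR_LOCAL))
```

The two configurations put different results on top for the same query, which is exactly why the choice of weights needs an evaluation rather than a guess.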
The obvious solution: take a few queries, take several test configurations that look like they make sense, and just try them.
A typical solution for that, which I often see in practice, is to ask a developer. The
developer decides: I'm going to test these five configurations, I'm going to fire a handful of queries and see if it works. That's fine, except that the developer clearly knows how the system works, so they know which queries work perfectly well and which queries definitely do not work. So your resulting ranking
function is perfectly tuned to what's working for your developer, and maybe not so good for your users. Another
typical solution: you've got management making the decision. You've got a product owner sitting next to the developer, and they look at the ranking function. Again, they choose a set of usually a handful of
queries, maybe 10, maybe 20, but as management doesn't have a lot of time, it's definitely not 100 queries. Their choice is usually targeted towards their own interests, so that typically doesn't work well either. So
selecting just a handful of queries and firing these manually is usually not a good idea. A better solution, a great solution actually, would be sitting next to the
user, looking over their shoulder, and using real queries. Now, you can't go out in the wild, sit in a cafe and look over the shoulder of users. However, what you
can do, if you have this kind of logging, is indirectly look over the shoulder of the users: take queries out of your logs, sample a number of them, say 100, and use these to do the evaluation. When you do that, first of all don't take queries only from 5 a.m. on Monday morning; take them from all of the times
that you want to cover. Second, don't take only five queries, but take many. Also, don't take queries just from certain locations, but take them from the whole space where you want your search to perform well, except when your target is to improve search for a specific set of queries.
So that's what I mean by using diverse enough samples: not just a small fraction, but really a representative sample. So the solution is:
take a lot of queries, take diverse enough samples. Yes, there is a question.
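One way to get a sample that covers all times of day rather than only Monday morning is to sample per bucket. The following is a rough sketch, assuming hypothetical log records with an `hour` field; the bucket definition (four six-hour blocks) is an arbitrary choice for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_bucket, seed=0):
    """Draw up to `per_bucket` records from every bucket defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    buckets = defaultdict(list)
    for record in records:
        buckets[key(record)].append(record)
    sample = []
    for name in sorted(buckets):
        items = buckets[name]
        sample.extend(rng.sample(items, min(per_bucket, len(items))))
    return sample

def part_of_day(record):
    """Bucket 0 is 00:00-05:59, bucket 1 is 06:00-11:59, and so on."""
    return record["hour"] // 6

# Hypothetical logged queries, spread evenly over the day.
RECORDS = [{"query": f"q{i}", "hour": i % 24} for i in range(48)]

if __name__ == "__main__":
    for record in stratified_sample(RECORDS, part_of_day, per_bucket=3):
        print(record)
```

The same function can bucket by country, device or query category simply by passing a different `key`, which matches the idea of defining the target group before drawing the sample.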
So the question was what the target function is when sampling. That is something you have to define before taking the sample. If your business goal is to improve search for searches coming out of Germany, you should take queries coming out of Germany. If your goal is, as a first step, either to improve search for all users, or at least to evaluate how well your search works for all users, then you take a sample from all incoming queries. In particular for the mapping application it does make sense to narrow down the set of queries you sample from a bit: you may have the objective to improve search in a certain country, for certain cities, or for certain content. So always keep in mind what your target user group is.
The second question was whether to improve the best, the average or the worst case. That's up to you to define. What you typically do is look at the average: you look at the performance of, say, 100 queries, average it, and then you know how you perform. If, however, you say, OK, I want to see what my worst queries are, or I want to look at a specific type of query and improve that specific type, then you would look at the worst-performing queries, of course. But the typical starting point is to look at the average. And again, you could even say: I do not want to sample across the board, but I want to sample certain types of queries. Say, I only want to sample queries that have a certain device attached, or only queries that have something to do with eating, drinking and going out. That again depends on your business objectives. Typically the first step is to improve the overall experience, and at a certain point
you will realize that you can't easily improve the overall experience, but there are a few types of queries and a few types of users which are low-hanging fruit. In that case, take a sample just from that group, compute the quality there, and improve that. Once you are done with that, you should rerun your experiment on the whole set to make sure that you didn't decrease quality for the remaining users.
If you think about this translation of ranking signals into Elasticsearch queries, what it really means when you run these kinds of experiments is that you define a sort of template to translate from the user-supplied query data and metadata to a real Elasticsearch query. That's where Elasticsearch actually helps by providing query templates: there you can define, OK, I want this Elasticsearch query executed, and in these
spots I want to have the variables filled with the actual data, and that data will be what comes out of the log files as supplied by the users. So you can easily define multiple templates, one for each signal configuration that you want to test, and rerun your experiment against the index without having to rewrite any code.
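The template idea can be illustrated without a running cluster. The sketch below uses plain Python to fill `{{placeholder}}` values in query bodies; it only mimics the concept and does not call Elasticsearch's own template feature. The template names and the field names `name` and `location` are hypothetical; `match`, `function_score` and the `gauss` decay function are regular Elasticsearch query DSL constructs.

```python
# One template per signal configuration to be tested. Placeholders are
# whole-string values of the form "{{name}}" (an assumption of this sketch).
TEMPLATES = {
    "text_only": {
        "query": {"match": {"name": "{{query}}"}},
    },
    "text_plus_distance": {
        "query": {
            "function_score": {
                "query": {"match": {"name": "{{query}}"}},
                "functions": [
                    {
                        "gauss": {
                            "location": {
                                "origin": {"lat": "{{lat}}", "lon": "{{lon}}"},
                                "scale": "2km",
                            }
                        }
                    }
                ],
            }
        }
    },
}

def fill(template, params):
    """Return a copy of `template` with every "{{key}}" replaced by params[key]."""
    if isinstance(template, dict):
        return {k: fill(v, params) for k, v in template.items()}
    if isinstance(template, list):
        return [fill(v, params) for v in template]
    if isinstance(template, str) and template.startswith("{{") and template.endswith("}}"):
        return params[template[2:-2]]
    return template

if __name__ == "__main__":
    import json

    logged = {"query": "toronto", "lat": 52.52, "lon": 13.40}  # one logged request
    for name, template in TEMPLATES.items():
        print(name, json.dumps(fill(template, logged)))
```

Each filled body could then be sent to a test index, so that every logged query is replayed once per configuration.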
There are three ways of evaluating how good you are: manual labeling and tagging efforts; one approach that works in offline mode, after having put your search online and having collected logs; and one online metric. We will first look at
manual tagging. The goal here is to still do a manual quality evaluation, but to use real user queries and real
returned results, and to decide which of multiple ranking functions is better. So you take
your logs. If you are in good shape, you have logged for each query which results were returned; if you didn't do that, remember that you can always rerun these queries against a system that looks like the one these queries were originally run on. You sample some and send them off to annotators: these could be Mechanical Turk people, or people you hire yourself. One thing to keep in mind is that you
need to train these people: you need to tell them what kind of results you expect and what kind of result is merely acceptable. That typically involves going through a few iterations of queries that you tag together with them. The second thing to keep in mind is that you should have at least two to three people look at each query independently. What you get from that is that you can take a look at those queries where the opinions of people were largely different. If everyone agrees, it's very obvious; if people disagree, that is where you should take a closer look.
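Finding the queries on which annotators disagree can be done with a simple majority-agreement ratio. This is a minimal sketch with hypothetical labels; more robust agreement statistics exist, but this ratio is enough to pick out the queries that deserve a closer look.

```python
from collections import Counter

def agreement(labels):
    """Fraction of annotators who chose the most common label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def needs_review(judgments, threshold=1.0):
    """Return the queries whose agreement is below `threshold`, sorted by name."""
    return sorted(q for q, labels in judgments.items() if agreement(labels) < threshold)

# Hypothetical judgments: three annotators per query.
JUDGMENTS = {
    "toronto": ["good", "good", "good"],
    "apple": ["good", "bad", "good"],
    "23": ["bad", "good", "ok"],
}

if __name__ == "__main__":
    for query in needs_review(JUDGMENTS):
        print(query, round(agreement(JUDGMENTS[query]), 2))
```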
Now there are two things that you can
do. The first is like the Coca-Cola/Pepsi experiment: you do a side-by-side comparison. You don't tell the annotators which side is new and which is old, and you ask them: tell me which one is better. From that you can deduce which ranking function usually works better, for whatever
reason. This is slightly tricky to evaluate, because annotators have to judge complete result lists. Usually you do it for the top 5 or top 10 results, typically the first search result page, because that's what people tend to look at; people tend not to page. What's also tricky is to say, after the fact, once you have the result of which one is better, why it was better, if you did not track additional information, that is, if you did not ask people to tell you why one side won. A next step
could be to tell annotators to grade individual results: tell me which of the results you saw is a good one and which one is a bad one. What I get from that
is, for each query, a set of annotations: this result is relevant, this one is irrelevant, this one is relevant as well, and so on. So you get multiple sets
of judgments. What you can compute from that is
precision. Precision means: of all the results that I returned, which portion was relevant? It's great as a first overview. Typically, if you hear precision and you have a background in machine learning, you may tell me that the second part is missing, and that is recall. Recall means: of all the relevant results that I should have returned, how many did I
return? If you think of web search: tell me how to annotate all of the Internet in order to find out how many relevant results there were. So when you look at search, especially web search, what is usually used is only precision. You can imagine that in order to find all relevant restaurants in Berlin you have to be very knowledgeable about Berlin in order to give me some kind of recall. However, precision does have a problem, and that is that the two result lists that you see on this slide
are treated as equally good, although they clearly are not: on the right-hand side you have the irrelevant one on top, on the left-hand side you have a relevant one on top, so personally I would prefer the left one. Also, all documents are counted as either relevant or irrelevant. In some cases these are not the only options: a result may be OK but not great, and if I have a lot of better results I should rather leave it out. So the first
fix is to introduce grades, something like
"it's embarrassing to show that", "it's kind of OK", "it's essential to show that", and so on. You grade the results of your ranking function and you get a better evaluation. The second fix is to discount lower ranks: if you have a very relevant result that is ranked very low, its contribution to the overall sum should be lower than if it had been on top. There are two ways to compute that, depending on how much discounting you want. Last but not least, you can normalize. Imagine you have only one relevant result: you don't want to show five results just in order to push the number up; what you want to do is show this one result. So you normalize by the maximally achievable discounted cumulative gain; you compute that by simply ordering your results by relevance grade and computing the discounted cumulative gain for
that. Even with the normalization there are still a few issues that you could fix, like the observation that people tend not to look beyond the first really relevant result. There are other metrics that address this; I'm not going to go into details here. There are publications out there that explain it much better.
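The metrics from this part of the talk are small enough to write down directly. The sketch below implements precision at k, discounted cumulative gain (DCG) in its two commonly used forms (gain equal to the grade, or gain equal to 2^grade - 1), and normalized DCG. As described above, the ideal ordering is taken over the judged results of the list itself; the grade scale is an assumption of the example.

```python
import math

def precision_at_k(relevant_flags, k):
    """Share of relevant results among the first k returned results."""
    top = relevant_flags[:k]
    return sum(1 for flag in top if flag) / len(top) if top else 0.0

def dcg(grades, exponential=False):
    """Discounted cumulative gain: each grade is divided by log2(rank + 1).

    With exponential=True the gain is 2**grade - 1, which emphasizes
    highly graded results more strongly than the linear variant.
    """
    total = 0.0
    for rank, grade in enumerate(grades, start=1):
        gain = (2 ** grade - 1) if exponential else grade
        total += gain / math.log2(rank + 1)
    return total

def ndcg(grades, exponential=False):
    """DCG divided by the DCG of the same results in ideal (descending) order."""
    ideal = dcg(sorted(grades, reverse=True), exponential)
    return dcg(grades, exponential) / ideal if ideal > 0 else 0.0

if __name__ == "__main__":
    left = [1, 0, 0]   # relevant result on top
    right = [0, 0, 1]  # relevant result at the bottom
    print(precision_at_k(left, 3), precision_at_k(right, 3))
    print(ndcg(left), ndcg(right))
```

Precision cannot tell the two lists apart, while the rank discount does: that is the problem with plain precision described above.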
Now, if you do have manual taggers, and if they go over results anyway, there are two things that you can tell them to do that are cheap to accomplish but may help your implementation a great deal. One thing: tell them to
look for embarrassing stuff; give them a little bit of time to look for the funny stuff. If you search for a baseball cap, it's OK to show baseball caps, but some of the matching results may not be what people should be shown, even if they really are for sale on Amazon. The other thing that
taggers can look for is: given a query, could a good result have been returned for this query that is missing? Say the query happens to be the name of a kindergarten; if it's not in the index, it can't be returned. If your taggers see
some portion of an address, or something that looks like a restaurant or a kindergarten that they know, they can easily tell you: hey, there's something missing. You shouldn't just be fixing these individual issues, but you should be looking for patterns here: is there a certain category of POI missing, is there an issue in a certain country, is there an issue in a certain city? So don't fix just this one instance, but use it as a kind of pointer towards a bigger problem that you should be looking for. This kind of
manual tagging is great for assessing your current ranking quality. It's great for comparing two ranking functions, because you can always do side-by-side testing, and you can always compute a precision value for both ranking functions. If the second ranking function is
slightly similar to the first one, it is even cheaper, because most likely you're not returning completely different results, so you can reuse the annotations. However, it is still expensive because of the manual work. We can fix that by going
for a slightly more automated approach. Going for a slightly more automated approach usually means getting quite a bit more noise into your measurements as well: these are not quite as precise as manual tagging, but cheaper. What you need for
it in your log files is queries you need the results that city return and you need the exact results of a user clicked on so in your search result page you need some kind of feedback loop to what's the application that tells you which link user clicked on if something sometimes missing in application ones I have these
3 of these 3 input data that's what should do want of demand function like should I
simply so I to increase engagement surface or that's what's typically done for e-commerce websites on Microsoft experiment gone wrong they had messed up to use its function completely that resulted in a increase revenue from the positive result advertisement clicks and in an increase in search result clicks why was so because ranking was so completely broken people had to do a lot
more clicking in order to find the well and we solve so i in the short term that means an increase in in revenue an increase and clicks In the long term however that doesn't mean it means that people there is a Fitch so this is not what you wanted 0
publication out already for nearly 10 years now there are people that nitrogen experiment how do people interact search people tend which is obvious to scan search results top to bottom so that means I can use clicks as well and the feedback however what I can do is to use a click on the search result as an absolute and relevancy feedback point what I can do however is to say OK somebody signals that result of this 1st and the 2nd 1 the probably 1st and the 2nd 1 went quite as
interesting. So you can take it as relevance feedback and incorporate that into your ranking function. Something that is a little bit more noisy, but which you can also use in order to increase your data, is to treat the click as being more important than whatever was clicked before. The reasoning behind that is that the user probably came back because she wasn't happy with the first result. If you think about your own search behavior, that may be true for quite a few cases. In some cases it may just mean: OK, I want a second opinion, I want to read more on the same topic. So that's where the noise is coming from.
Another thing to look at could be that you clicked on something but didn't click the one just below it, because people tend to scan what's around whatever they click. You cannot say that what got clicked is more important than what was on position 10, let alone on position 100, because you cannot say that the user actually saw this result. But you can be reasonably certain that they saw what was around the click position. So this is quite noisy data, but it should be very cheap to get if you have users.
There's something called presentation bias. Users tend to be very trusting in your search results, so they tend to trust that the first result is actually the relevant one. Even if you turn the ranking completely around, there is still a preference for clicking on the first result, so you somehow have to account for that. Also, users focus on their task and look at a limited number of results. So what you cannot find out this way is whether your ranking is completely broken: if, for a certain use case, your interesting stuff is on page 100, you're not going to learn that through this kind of metric.
And again, it's only possible for live implementations. This means that if your ranking is completely broken, it's going to scare users, and this may not be the best way to find out, because the users may have run away by the time you know that it's really broken. It's great for assessing current ranking functionality and it's great for comparing two ranking functions. It's slightly risky if you introduce a dramatic change and you are not absolutely certain that this change leads to an improvement, or at least to a comparable user experience, because it's live only.
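The two click heuristics described above can be sketched in a few lines. This is only an illustration under assumptions: the function names, the document IDs and the shape of the click log are made up here and are not taken from the talk or from any library.

```python
from typing import List, Set, Tuple

# (preferred_doc, less_preferred_doc); a hypothetical representation
Preference = Tuple[str, str]


def click_over_earlier_click(clicks_in_order: List[str]) -> List[Preference]:
    """Noisy heuristic: a later click is preferred over every earlier click."""
    prefs = []
    for i, later in enumerate(clicks_in_order):
        for earlier in clicks_in_order[:i]:
            if earlier != later:
                prefs.append((later, earlier))
    return prefs


def click_over_next_unclicked(ranking: List[str], clicked: Set[str]) -> List[Preference]:
    """A clicked result is preferred over the unclicked result directly below it.

    Nothing is inferred about results further down, because we cannot
    know whether the user ever saw them.
    """
    prefs = []
    for pos, doc in enumerate(ranking[:-1]):
        below = ranking[pos + 1]
        if doc in clicked and below not in clicked:
            prefs.append((doc, below))
    return prefs


# Hypothetical session: four results shown, user clicked "b" and then "d".
ranking = ["a", "b", "c", "d"]
clicks = ["b", "d"]

earlier_prefs = click_over_earlier_click(clicks)               # [("d", "b")]
below_prefs = click_over_next_unclicked(ranking, set(clicks))  # [("b", "c")]
```

Pairs like these could be fed into a ranking function as training or evaluation data. They say nothing about results far below the click, which is the limitation mentioned above.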
So the last approach, which is now really live only: if you're reasonably certain that your new ranking function works, but you still want to have a comparison against the live system and want to show that it works, there is typical A/B testing. For evaluation you again need the results shown, the clicks, and both ranking functions. There are two ways to set it up.
One is that you take a small portion of users and show them just the new ranking function, and you take a larger portion and show them the old ranking function. Then you just compare where users click: do they tend to click on the top results more, do they just click randomly, or whatever. As an approach it tends to work very well.
The other, for a subset of users, is a kind of simple approach: you take both ranking functions, the old one and the new one, and interleave their results to see whether users click more on the old results or more on the new results. In order to account for presentation bias you switch the interleaving, so that once the old one and once the new one is on top. From that you deduce which one is better. It's slightly less risky because you always have the old results in between, and it's also easy to interpret because you know where the results are coming from.
So this is great for comparing two ranking functions. However, it's online only, so you don't want to scare users by putting something online which clearly messes up the user experience completely.
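The interleaving idea can be sketched as follows. This is a deliberately simple alternating scheme (similar in spirit to team-draft interleaving, but without its randomisation), not necessarily the exact method the speaker has in mind. All names and document IDs are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def interleave(old: List[str], new: List[str], new_first: bool) -> List[Tuple[str, str]]:
    """Alternate between two rankings, skipping documents already shown.

    Returns (doc, source) pairs where source is "old" or "new", so that
    clicks can later be credited to the ranking that contributed the doc.
    """
    rankings = {"old": old, "new": new}
    order = ["new", "old"] if new_first else ["old", "new"]
    idx = {"old": 0, "new": 0}
    seen: Set[str] = set()
    merged: List[Tuple[str, str]] = []
    while idx["old"] < len(old) or idx["new"] < len(new):
        for source in order:
            ranking = rankings[source]
            # Skip documents the other ranking has already contributed.
            while idx[source] < len(ranking) and ranking[idx[source]] in seen:
                idx[source] += 1
            if idx[source] < len(ranking):
                doc = ranking[idx[source]]
                seen.add(doc)
                merged.append((doc, source))
                idx[source] += 1
    return merged


def credit_clicks(merged: List[Tuple[str, str]], clicked: Set[str]) -> Dict[str, int]:
    """Count clicks per contributing ranking function."""
    wins = {"old": 0, "new": 0}
    for doc, source in merged:
        if doc in clicked:
            wins[source] += 1
    return wins


old_ranking = ["a", "b", "c"]
new_ranking = ["b", "d", "a"]

# Show the new ranking on top for one impression and the old one for the next,
# to account for presentation bias.
shown_new_first = interleave(old_ranking, new_ranking, new_first=True)
shown_old_first = interleave(old_ranking, new_ranking, new_first=False)

totals = {"old": 0, "new": 0}
for shown, clicked in [(shown_new_first, {"b"}), (shown_old_first, {"d"})]:
    for source, count in credit_clicks(shown, clicked).items():
        totals[source] += count
```

With enough impressions, the ranking function that collects clearly more credited clicks would be the candidate to prefer. How many impressions count as enough is a statistical question this sketch does not address.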
So in real-life experience, if you want to put a concept together, you won't be using just one of these approaches. You will typically use a chain of filters in order to evaluate your ranking signals and in order to evaluate their combination. So your product people, your developers, management or whoever comes to you with a new ranking idea: "I want this signal used, I want to change the ranking." They typically come with just two or three queries where it really improves the experience.
So you will go and first maybe do an A/B test offline: show your annotators both rankings and have them tell you which one is better. Or you will rerun it against a pre-annotated set of search results and see: is my ranking still kind of OK, is it better, or is it worse, given the annotations that I already have, maybe re-annotating whatever came up new. If your new ranking function succeeds in these annotation experiments and is reasonably good, by which I mean good enough to justify putting the ranking function online, you will then go and potentially run a live A/B test, or you will put it online and track the clicks being made.
So you will always have this filtering. You will first probably do some precision computation, you will then potentially do an offline A/B test like the side-by-side comparison that we had before, and then you will potentially do an online A/B test. So you will always have a multi-step process. And what you will see is that many of the great ideas that people come up with are not actually that great, or work only for a very small subset of queries. Then it's up to you to prioritize and to think about whether it is worth putting that online or whether to go with the current solution.
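The first filter in that chain, rerunning a candidate ranking against existing annotations, could look roughly like this. It is a sketch under assumptions: the data layout (sets of `(query, doc)` pairs), the function name and the sample queries are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

QueryDoc = Tuple[str, str]


def compare_on_annotations(
    old_runs: Dict[str, List[str]],
    new_runs: Dict[str, List[str]],
    relevant: Set[QueryDoc],
    judged: Set[QueryDoc],
    k: int,
) -> Dict[str, list]:
    """Compare two rankings per query using judgments that already exist.

    Also lists (query, doc) pairs from the new top k that nobody has
    annotated yet, so they can be sent to annotators.
    """
    report: Dict[str, list] = {"better": [], "worse": [], "same": [], "to_annotate": []}
    for query, new_ranking in new_runs.items():
        old_top = old_runs[query][:k]
        new_top = new_ranking[:k]
        old_hits = sum(1 for doc in old_top if (query, doc) in relevant)
        new_hits = sum(1 for doc in new_top if (query, doc) in relevant)
        if new_hits > old_hits:
            report["better"].append(query)
        elif new_hits < old_hits:
            report["worse"].append(query)
        else:
            report["same"].append(query)
        report["to_annotate"].extend(
            (query, doc) for doc in new_top if (query, doc) not in judged
        )
    return report


# Hypothetical annotated sample.
old_runs = {"berlin hotel": ["h1", "h2", "h3"], "nokia maps": ["m1", "m2", "m3"]}
new_runs = {"berlin hotel": ["h3", "h4", "h2"], "nokia maps": ["m3", "m1", "m2"]}
judged = {("berlin hotel", d) for d in ("h1", "h2", "h3")} | {
    ("nokia maps", d) for d in ("m1", "m2", "m3")
}
relevant = {
    ("berlin hotel", "h2"),
    ("berlin hotel", "h3"),
    ("nokia maps", "m1"),
    ("nokia maps", "m3"),
}

report = compare_on_annotations(old_runs, new_runs, relevant, judged, k=2)
```

In this made-up sample the change helps one query and leaves the other undecided until `h4` is annotated, which mirrors the observation that many ideas only pay off for a small subset of queries.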
If you want to read more on the topic, there's lots of information online. The first references you should look at are the publications on the right-hand side. The first one is by the amazing guys at Microsoft, who did A/B tests and compared, for instance, which kinds of metrics influence how your ranking function changes, and what will happen if you use the wrong target function, for instance improving clicks and user engagement with your search results. The other interesting publications are one by Joachims on query chains, one which tells you how to do search result interleaving, and one that tells you how to interpret the click data.
Then there is a free, as in free beer, book called Information Retrieval. You can read and learn a lot about the evaluation functions in there. They also talk about computing recall for a search. This works great if you have academic document collections, like TREC for instance: there you know which documents are in the collection, so you can compute recall. Keep in mind that in your real-life experience this may not be as easily computable as you think.
One book I particularly like is Search Patterns. It's not about ranking evaluation and more about search quality, but it talks a lot about the features a search engine should have in terms of user interface, features that make the experience a lot better, and there are some sample pages online. There's a talk from Berlin Buzzwords that talked about ranking quality, by Yandex. And there is a series of publications by Google which talk about how they do search quality, where it again boils down to a multi-step filtering process: first you go for the annotated documents, which you can easily use in order to compute your performance, then you go for A/B or side-by-side comparisons, and then you go and put your signal ideas online.
So usually you come up with a signal and make a change, you go to your annotators to see if the change is worth anything, and you re-annotate only the changed results. Use multiple metrics: potentially precision, potentially NDCG, potentially whatever else. Also keep in mind that precision is typically used for the top n results, and you should target the size of n to the user interface: if you are showing only 5 results on the first result page, it doesn't make a lot of sense to compute precision at 10. And then only test those ranking changes on real users where you've seen before that it does make a difference, and a positive one. And with that I'm happy to take questions.
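As a small sketch of the two metrics named above, with n tied to the number of results the interface actually shows: the grade scale, the linear gain and all names are assumptions chosen for illustration (other NDCG variants use an exponential gain instead).

```python
import math
from typing import Dict, List

RESULTS_PER_PAGE = 5  # assumption: the first result page shows five hits


def precision_at_n(ranking: List[str], grades: Dict[str, int], n: int) -> float:
    """Fraction of the top n results with a relevance grade above zero."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for doc in ranking[:n] if grades.get(doc, 0) > 0) / n


def dcg_at_n(ranking: List[str], grades: Dict[str, int], n: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(
        grades.get(doc, 0) / math.log2(pos + 2)
        for pos, doc in enumerate(ranking[:n])
    )


def ndcg_at_n(ranking: List[str], grades: Dict[str, int], n: int) -> float:
    """DCG normalised by the DCG of the best possible ordering of judged docs."""
    ideal = sorted(grades.values(), reverse=True)[:n]
    ideal_dcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg_at_n(ranking, grades, n) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical annotations for one query: 0 = bad, 1 = OK, 2 = great.
grades = {"a": 2, "b": 0, "c": 1, "d": 2}
ranking = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

p_at_page = precision_at_n(ranking, grades, RESULTS_PER_PAGE)  # 3 of 5 -> 0.6
ndcg_at_page = ndcg_at_n(ranking, grades, RESULTS_PER_PAGE)
```

Unjudged documents count as not relevant here, which is a simplification. In practice they are the ones to send back to annotators.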
And I have been given these tiny little elks to hand out to people who now ask me questions. I got one question during the talk already. Are there further questions?
So the question was whether Elasticsearch comes with any features that help with quality improvement. That's a great question, because that's actually on our roadmap, and I'm involved in implementing support for that. The first step will be supporting users with logs. A second step will be supporting users with doing A/B testing. And a way ahead of that: if you think about it, you have annotations, you have logs, you have clicks, so the next logical thing is to learn a ranking function and not think manually about which signals you want to integrate and, in particular, how to weight them. You could easily learn that weighting automatically. So learning to rank is like the ultimate goal. It's not something that's going to be released next week, of course, but it is on the roadmap. The reasoning behind that is what I have seen and what many other developers have seen: developers tend to know at least some of this, but usually in their companies they don't get the time to implement something. They don't have the time to implement an annotation framework. They may get time and money to pay for Mechanical Turk people, but there's still quite a lot of overhead in order to come up with a decent sampling approach and annotation approach and to feed that back into the application that's running search. So there's lots of infrastructure code involved, and that's something that Elasticsearch will have in the future.
The next question was what information you have to give the annotators. First of all, it depends on your use case which information you have to give them. It also depends on what you want to learn from these annotations. If you have local people doing the annotations and you want to know their personal preference, give them as little information as possible. Typically, however, you will give them a fixed set of rules. If you want sample guidelines, you can search Google for the Google search annotator guidelines. There should be a PDF online, like 3 to 4 pages, which explains which kinds of information Google annotators get in order to annotate pages. In this room everyone typically knows what a search engine spam page looks like and what a link farm looks like. Your typical Mechanical Turk person probably doesn't have that knowledge, so you need to tell them what to expect.
The question was: if your site is very specific, a very technical site, how do I get that information to the people? Ideally you will get annotators who have some background. When I think back to the mapping use case that we had at Nokia, we always tried to find people from different geographies, at least so that they understand the language that's being spoken, because evaluating a Russian restaurant if you don't even know how to read Cyrillic is kind of hard. It can be done, and developers have done it, but it is extremely hard. Now, it also depends on how much noise you can deal with. If you can deal with noisy annotations and you cannot afford domain experts to do the annotation, it's probably better to get anything annotated than to have nothing. But usually you are trying to find domain experts.
So the question was: if you have lots of products and you have hundreds of signals in mind for ranking, what would be the standard approach? First of all, I would go with whatever your search engine comes with, try that with the default configuration, see how that works, and go from there. Also, I would probably try not to use too many signals to begin with. In text search, people usually start tweaking which ranking functions they use. If they have standard text search, what I tell them is to just do whatever is shipped by default, which is TF-IDF, and that's something that typically works extremely well. With product search I would probably start the same way: start with an equal weighting, or with whatever your gut feeling is. If you have no data, there's little else you can do.
What I was talking about was natural search. It's a whole different topic about ad placement, paid placement: deciding which advertisement to show depends on the budget and on how much a customer is paying in order to end up on top. Apart from that, you shouldn't be waiting for users to send e-mail. You should see it much, much earlier, and you typically see it in your logs if you look closely enough. And you shouldn't be looking just on request, when your manager is standing behind you asking "what is this thing, is that all OK?" You should have some kind of dashboard which shows you the typical values that you should be looking at.
Any further questions? In an ideal case you should be able to trust whichever preferred language the user's browser sends and target the search for that. Typically you can't do that. So typically you might look at which location the user is in and what the preferred language there is. It again depends on where I want to improve search: do I want to improve search for German users? Then maybe look for queries coming out of Germany. That's not only German users, but it's probably the majority. I know the problem that you cannot trust the language that the browser sends. But again, maybe your application has a means of finding out what the preferred user language is, maybe it's a combination, or maybe you can guess the user's language from the queries that these people send. In that case I would always default to the translated content if it is available in the user's language, and otherwise try to find the best guess of what people speaking this primary language typically have as a second language. I can't think of a better way to do that, because it always ends up being guessing.
Thank you. I'm being told that I'm out of time, so if you have any further questions, come to our booth and get the remaining elks.

Metadata

Formal metadata

Title Your search doesn’t work
Series title FrOSCon 2014
Part 01
Number of parts 59
Author Drost-Fromm, Isabel
License CC Attribution - NonCommercial 2.0 Germany:
You may use and modify the work or its content for any legal and non-commercial purpose, and reproduce, distribute and make it publicly available in unchanged or modified form, provided that you credit the author/rights holder in the manner specified by them.
DOI 10.5446/19646
Publisher Free and Open Source software Conference (FrOSCon) e.V.
Year of publication 2014
Language English
Production place Sankt Augustin

Content metadata

Subject area Computer science
Abstract Your search doesn’t work: how to find out whether or not the search box you offer users is helpful at all. This talk will walk you through the options of determining search quality, from purely offline metrics that work even before deploying version 1.0 to production, to online A/B testing to check continuous improvement. I will highlight some Lucene and Elasticsearch features that can tremendously help you deploy your own search quality checks. Web sites without search functionality are unimaginable today: you search for comments and code on GitHub, you look for books in your favourite webshop, you use the search box of your favourite blog to find articles. When offering search for your own application, how do you know that your search actually provides a benefit to the user instead of causing lots of frustration over results not found? Only checking that your child's favourite book about witches is ranked top of all children's books clearly doesn’t help. Speaker: Isabel Drost-Fromm. Event: FrOSCon 2014 by the Free and Open Source Software Conference (FrOSCon) e.V.
Keywords Free and Open Source Software Conference
FrOSCon14
