Learning to Judge
Berlin Buzzwords 2021
Transcript: English (automatically generated)
00:07
And welcome, everyone, to our talk, Learning to Judge, or Judgments as Ranking Gold Standard. As Nate already said, we are Arne and Andrea from the search team at Otto.
00:21
Arne is our business designer, and I'm a data scientist in the team. Since Otto might not be a familiar name to you, aside from being shown as a sponsor in this presentation room, a few words about our company. Otto is a large online shop in Germany
00:41
with around 2.5 million visits per day and up to 10 orders per second. Our company is a full-range retailer, and we are currently expanding our business model to become a platform, selling not only Otto products but also products from other sellers on our web page.
01:04
About our product search, some numbers from 2020: we have around 1.5 million search queries per day, and on peak days we get up to almost 5 million search queries. In 2020, those search queries were
01:21
divided into around 40 million unique search terms. Now that you know a bit about Otto, we can start with the topic of our talk. In the last weeks and months, we have been learning to judge in our team. And why do we need the judgments? We want to use them for a learning-to-rank model.
01:42
So we will start the talk with learning to rank in a nutshell and then describe to you how we define our judgments. The second part of the talk will then be about the experiments we did for the judgments and all the learnings we generated from those. So I'm going to start with learning
02:01
to rank in a nutshell. I want to start off with the motivation: why do we want to use a model for ranking? As I already said, Otto is becoming a marketplace with an increasing number of products, assortments and sellers. And the current search management involves a lot of manual work.
02:22
That is not really maintainable and scalable with a strongly growing assortment. Also, the current search configuration was fitted to a certain context and to manually defined search result groups. With an LTR model, we hope to be more query- and result-set-specific
02:42
and optimize on finer and especially automatically determined segments. Last but not least, the current ranking focuses on business relevance, implying user relevance, because if I buy something, it obviously is relevant for me, but we start on the business side.
03:01
And if we switch to LTR, we try to focus on the user relevance, implying business relevance with it, but the user comes first. So how can we use a model in general for ranking products? We need, first of all, the training data that contains the perfect product ordering and the features.
03:23
So the training data consists of the judgments and the features for those judgments. The features describe the product, the query, and everything else we think is relevant for ranking products. All this data is then fed into the LTR model.
03:42
During training, the model tries to find patterns in the data and to understand relationships between the features and the relevance of the products. Once we have a trained model, we can use it to rank any given list of products, and the patterns identified in the training data
04:02
can be applied to any product in any query. So how do we do the training concretely? We have the training data you see on the slide: a query, and for each product that is returned for the query,
04:20
we have a number of features, shown as examples in the gray columns here. So we have features describing the query, describing the textual relevance of the product, describing the matching between the query and the product, and so on. And the last column, the pink one, is the judgment.
04:40
So the perfect ordering of the products, i.e. the training data, is then put into our Lambda model. It is trained, and then we have a model that we can use for re-ranking. But the important question is: how do we get this gold-standard, perfect ordering of products?
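To make this setup a bit more concrete, here is a minimal sketch of what such training data and a LambdaMART-style training step could look like. The feature names, the toy values, and the choice of LightGBM's LGBMRanker are illustrative assumptions, not the actual Otto pipeline.

```python
# Minimal sketch of LTR training data and a LambdaMART-style model.
# Feature names, toy values, and the use of LightGBM are assumptions for illustration.
import pandas as pd
import lightgbm as lgb

# One row per (query, product): feature columns plus the judgment label.
train = pd.DataFrame({
    "query_id":       [1, 1, 1, 2, 2],
    "text_match":     [0.9, 0.4, 0.7, 0.8, 0.2],  # query/product matching feature
    "product_rating": [4.5, 3.0, 4.0, 4.2, 3.8],  # product feature
    "judgment":       [3, 0, 2, 3, 1],             # graded relevance (the "pink column")
})

features = ["text_match", "product_rating"]
# LambdaRank needs to know how many rows belong to each query.
group_sizes = train.groupby("query_id").size().to_list()

model = lgb.LGBMRanker(objective="lambdarank", n_estimators=100)
model.fit(train[features], train["judgment"], group=group_sizes)

# The trained model can then re-rank any candidate list for a query.
scores = model.predict(train[features])
```

The toy frame is only there to show the shape of the data; a real training set would contain many queries and far more feature columns.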
05:00
Can we calculate it? Can we guess it? Can we ask our customers about it? How do we know what the perfect ordering is, and how do we calculate it? What is relevance for the customer? This is where the judgments come into play. So, on to our judgment definition.
05:22
How do we estimate the relevance for the customer? The common approach in data science is crowdsourcing, asking experts, or asking the whole team to do a data labeling day each week. They are then shown products or rankings and asked about the relevance of those.
05:42
That is a good approach to cover topicality. So you know that a shoe is a shoe and that it is relevant for the query, but in an e-commerce context we need something more. We have to include things like trends, seasonality, and even personal preference.
06:02
And due to these trend and seasonality effects, the evaluated products from handcrafted labeling also become outdated very quickly. Those are the reasons why we decided to use the big data that we have and to use implicit feedback from our customer logs
06:22
to model our judgments. But which of the big data that we have do we use now to define the relevance? I think many of you will know the customer funnel that is defined for an e-commerce journey. So many people visit the page,
06:41
they search for something and they view a lot of products. Then they click on maybe a couple of them, maybe they even add one to their cart, and if we are lucky, they order it. But the most reliable signal for relevance is if the customer then even keeps the product and is happy with it.
07:02
So as you see, the deeper we go in this funnel, the more the reliability of the signal grows. On the other hand, if we stay higher up in the funnel, we have a much, much larger amount of data. And another positive side of being higher in the funnel is that the proximity to the search event grows.
07:21
So we are closer to the query that was issued, and we have a better connection between the query and the signal that was given by the user. All those were reasons for us to focus on clicks as a first approach to measuring relevance. And our assumption was that if we increase
07:41
the number of clicks, it will carry through the whole funnel and increase add-to-carts, orders, and happy customers as well. And now in more detail: how do we calculate the judgments? We calculate a click probability for each product for a given query by dividing the clicks by the number of views.
08:03
But as you can see in these examples, this is very dependent on the number of observations we have for each data point. So the product with zero clicks and one view has a zero click probability; the product with one click
08:20
and one view has a 100% click probability. And that doesn't seem very reasonable. This is why we leverage a Bayesian probability approach to generate more reliable click probabilities. For this, we assume a common base click probability for all products for a given query.
08:40
With each observation we collect for a product, we move the click probability for that product away from this mean click probability: it becomes higher if we observe many clicks, or lower if we observe fewer clicks.
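A minimal sketch of what such a Bayesian smoothing could look like, assuming a Beta prior centered on the per-query base click probability; the prior strength and the example numbers are illustrative assumptions.

```python
# Sketch of Bayesian smoothing of click probabilities with a Beta prior.
# The prior strength and example numbers are assumptions for illustration.

def smoothed_click_probability(clicks, views, base_ctr, prior_strength=20.0):
    """Posterior mean click probability under a Beta prior.

    The prior is centered on base_ctr (the common base click probability assumed
    for all products of the query) and is worth `prior_strength` pseudo-views.
    With few observations the estimate stays close to base_ctr; with many views
    it moves towards the raw clicks/views ratio.
    """
    alpha = base_ctr * prior_strength          # pseudo-clicks
    beta = (1.0 - base_ctr) * prior_strength   # pseudo-non-clicks
    return (clicks + alpha) / (views + alpha + beta)

base = 0.05  # assumed base click probability for this query
print(smoothed_click_probability(0, 1, base))     # ~0.048: barely moves on one view
print(smoothed_click_probability(1, 1, base))     # ~0.095: far from the naive 100%
print(smoothed_click_probability(50, 200, base))  # ~0.232: dominated by the data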
09:00
And last but not least, I think all of you are aware of the position bias that we are also observing in our data. Independent of the relevance of a product, the probability of people clicking on it is higher if it is high up in the list than if it is lower in the list. To compensate for the position bias, we chose a very simple approach
09:22
of inverse probability weighting. So we simply apply a de-biasing weight to each click probability observed on a position larger than position one.
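A minimal sketch of that de-biasing idea, assuming made-up examination propensities per position; in practice these weights would be estimated from the click data rather than hard-coded.

```python
# Sketch of position de-biasing via inverse propensity weighting.
# The propensity estimates per position are made-up assumptions.

# Estimated probability that a user even examines a result at each position
# (position 1 is the reference with propensity 1.0).
examine_propensity = {1: 1.00, 2: 0.75, 3: 0.60, 4: 0.50, 5: 0.40}

def debiased_clicks(clicks, position):
    """Up-weight clicks observed on lower positions to compensate for the
    fact that results further down the list are examined less often."""
    propensity = examine_propensity.get(position, 0.30)  # fallback for deep positions
    return clicks / propensity

# A click on position 4 counts as 2 "de-biased" clicks, because only half of
# the users are assumed to have looked that far down the list.
print(debiased_clicks(1, 1))  # 1.0
print(debiased_clicks(1, 4))  # 2.0
```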
09:42
And now you know how we implemented our judgments. Arne will continue with what experiments we did based on our assumptions and what we learned from them. Thank you. Hi everybody. So at the beginning, we were pretty confident about our judgments and the way we calculate them. We originally wanted to get started with a learning-to-rank model right away and start doing an A/B test and everything.
10:03
But then we thought that, just for the sake of safety, it wouldn't hurt to run a little test to validate that the quality of our judgments is good enough before we feed them into the model. So we thought of some kind of experiment to test the judgments.
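The next part mentions using Solr's elevation component to rank the top products of about 5,000 sample queries directly by judgment score. As a rough sketch of how such a judgment test could be wired up, assuming the QueryElevationComponent's elevateIds request parameter; the Solr URL, collection name, and product IDs below are hypothetical.

```python
# Hypothetical sketch: pin products to the top of a Solr result list in
# judgment-score order via the QueryElevationComponent's elevateIds parameter.
# The Solr URL, collection, and product IDs are made up for illustration.
import requests

def rank_by_judgment(query, judged_products):
    """judged_products: list of (product_id, judgment_score) for this query."""
    # Highest judgment score first; elevateIds keeps the given order at the top.
    ordered_ids = [pid for pid, _ in sorted(judged_products, key=lambda p: -p[1])]
    params = {
        "q": query,
        "enableElevation": "true",
        "forceElevation": "true",
        "elevateIds": ",".join(ordered_ids),
        "rows": 100,
    }
    return requests.get("http://localhost:8983/solr/products/select", params=params).json()

results = rank_by_judgment("t-shirt", [("prod42", 0.31), ("prod17", 0.54), ("prod99", 0.12)])
```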
10:20
Basically, we used the elevation component as a technical integration in our Solr. We then ranked the top hundred products of a random sample of about 5,000 queries directly by judgment score, so the product with the highest judgment score would be number one, and so on. And we figured the gold standard,
10:41
being the best ranking we have, should beat our status quo easily. Next slide, please. Before we started the A/B test, we thought about how to define success, and we came up with three hypotheses. First, we expected users to click products on higher positions, and we would measure that
11:02
with a shift in the click distribution towards the top positions. Second, we expected users to click on more products, which we would measure with the click-through rate. And last but not least, the conversion rate, because we thought, of course, when there are more relevant products and people go to the detail pages of these products,
11:23
they will eventually buy them as well. So that was our plan. Then we come to the experiment results, and this is what happened. First of all, our first hypothesis checks out: we shifted the click distribution towards the top positions, which looks good.
11:41
The second one is the click-through rate. We saw an uplift there as well, around plus 4% on the click-through rate, which compared to our other ranking experiments is quite a good improvement. And then to the third one, which came as a big surprise to us,
12:00
because the conversion rate didn't go up. It didn't even stay the same. It actually went down: minus 3%. And minus 3% is really a lot of money we would lose if this went live. So that was the biggest surprise we had. Can you click one more?
12:21
Yeah. So in the beginning, we expected the click-through rate to go up, and we thought the worst-case scenario would be that we bring more people to the detail page, but none of them make any additional purchase. We never expected that it's possible to push more people into the purchasing funnel and, in the end, have less money coming out on the other side.
12:41
So, a big surprise for us. That's how our judgment journey started. It was planned as one experiment; it turned out to be a longer journey of at least three months. We are still working on it now, and so far we have done five iterations on the judgments. Can you click one more?
13:01
Yeah, spoiler: we still haven't beaten the status quo. So it was quite tricky. Today we want to talk to you about the two major findings we had. The first one is: availability actually matters a lot. We started to look at our data
13:21
to understand what the heck was going on there. Here on the right side, you see the top 14 positions of our list and the average change in availability per position. As you can see, the availability was worse in every single position, and we see a decrease between 20 and 40%
13:42
for most of them. We also know that people don't buy delayed products that much, because who wants to wait a week once they have found a product they like? So that's the basic explanation for why the conversion rate went down. What happened here? We pushed the relevant products
14:02
to the top of the list as planned, but many of these products were delayed at the same time. And the users didn't know that, because on the search result page we don't show the availability status. So they have to click through to the detail page, where they find out it's delayed, and then they are frustrated and don't buy it.
14:22
So they made a relevance decision with the click while lacking information that is relevant for them. That's the explanation for why the first two metrics went up, because they only concern the search result page, while the last metric, which is more related to the basket, went down.
14:42
And that actually brings us to a very interesting topic: the perceived relevance depends on the given information, and the given information changes across the different steps of the customer journey. On the search result page, we give a lot of information, like the picture and the name of the product.
15:01
We have the customer reviews, we have the price, and several other pieces of the most crucial information. When the user goes to the detail page, they will get some additional information, like the availability, the seller, and the shipment cost. And then at the last step of the purchasing funnel,
15:22
in the basket, they will get even more information, for example about the payment options. And if they have products from different sellers in the basket, they might even find out that they will have to pay several shipment costs. And that actually changes,
15:41
or at least can change, the perception of relevance for certain products. And that is very important for us, because we are trying to model relevance, and relevance is a multi-dimensional concept. At the first step, the search result page, we actually have enough information when it comes to topicality.
16:01
Users can decide whether the product is relevant for the query, and there is probably enough information to decide whether it matches their personal preferences. But there are other dimensions or aspects of relevance, like: is it relevant for me to buy? And the information for this is, for example, the availability.
16:22
And if we don't capture this information in the click signal, then we can't model the relevance, especially if we want to improve the conversion rate, which sits at the end of this funnel. Next one, please.
16:42
So how do we include the availability in the judgments or in learning to rank? What we did first is we filtered out all the delayed products from our judgment lists. That means we only elevate or boost relevant products that are available. Then we ran another test,
17:01
and it turned out the conversion rate looked much better than in the previous iterations. We still didn't beat the status quo, but it was a step in the right direction. So did that solve our problem with availability and learning to rank? Definitely not, because basically all we did was filter out relevant products
17:21
from our training data, which will certainly not solve the problem of availability. So we thought about how we can handle it in the end, and we came up with a couple of things we have to do. First, we have two options to strengthen the relevance signals. The first option is to change the given information
17:40
on the search result page. If we include availability, which is crucial for relevance in the sense of "I want to buy this product", on the search result page, then the user will have this information when they click, and the signal strength of the click will rise. The second option would be to get rid of clicks altogether
18:02
and move on to add-to-baskets or orders as the relevance signal, because these signals happen at a later step, where the users already had all this information. Actually, we are planning to proceed with the second one. So one of our next iterations is to use add-to-basket and add-to-wishlist instead of clicks as the relevance signal.
18:24
We already discussed that we will probably mix clicks, add-to-baskets, and orders at some point; that is something we want to figure out later on. But even if we include or strengthen the signal, that won't be enough,
18:41
because availability in particular is a very tricky thing when it comes to relevance: it's highly dynamic. At the time we gather the signal for the training data, the availability status of a certain product might be different from the time when we actually rank the product. For example, we can have a delayed product
19:02
at the time we gather the training data, so that product will gather fewer clicks and orders because it seems less relevant due to its availability. Then, when we rank it and it is available, we will probably underestimate the relevance of this product. So it's not enough just to include it in the signals.
19:24
What we think is the right way to handle it is to connect the availability with the relevance signal. So at the moment we gather the click signal, we should also gather the availability status at that time, to get the connection between the availability
19:41
and the relevance signal, and then include availability as an LTR feature. We haven't tested that yet; that is one of the first steps we will take when we really start working on learning to rank.
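A minimal sketch of that idea: record the availability status together with the click event at the moment the signal is gathered, so it can later be joined into the training data as a feature. The event fields and status values are assumptions.

```python
# Sketch: capture the availability status at the moment the click signal is
# gathered, so it can become an LTR feature later. Field names are assumptions.
from datetime import datetime, timezone

def log_click_event(query, product_id, position, availability_status):
    """availability_status as valid for the user at click time,
    e.g. "deliverable" or "delayed"."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "product_id": product_id,
        "position": position,
        # Stored with the signal, not looked up later, because availability
        # may have changed by the time the model is trained or applied.
        "available_at_click": availability_status == "deliverable",
    }

event = log_click_event("sneaker", "prod42", 3, "delayed")
```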
20:00
But at least from this experiment, we learned that this is a very crucial topic for the whole ranking thing. With that, we can go on to the next key learning we found, which concerns the judgment quality. We did some query-level analysis, because we found some queries that not only had a worse conversion rate,
20:21
but also a worse click-through rate. And that didn't quite match the pattern we had seen with the availability. So we looked at the data in more detail, and what you see on the right side is basically, for the query "guys t-shirt", the products ranked by judgment score.
20:40
So that is basically what the first page would look like. As you can see, there are many products with a rather good judgment score but very few absolute clicks and views. That's due to the fact that we just use the ratio of clicks to views as the score. And what happened here in the end is:
21:03
we put these products on the first page of the search results, and being on the first page, they get more traffic. So these products got more views but generated fewer clicks, so the ratio declined. And then, on the next day,
21:20
all these products would be gone, because we recalculated the judgments on a daily basis. So that shows that these products were not super relevant. And the other thing that happened is that all these products were gone; if you look at it, about 80% of the whole list is basically affected.
21:42
On the next day, new products would come up that had also generated just one click and one view the day before, and they filled these spots. So we had a highly dynamic ranking, and users who came back and wanted to pick up a session from a couple of days ago probably had a very hard time finding
22:02
what they were looking for, because the first page looked completely different. Next one, please. So this basically raises the question: how many interactions do we really need to generate reliable relevance judgments?
22:23
There are different options to solve this problem. We could work on the prior, because the prior is basically built to solve this particular problem.
22:42
But we thought about it, figured out another approach, and went with that one. The approach we took is filtering out products with fewer than a hundred views. So we built in a threshold in addition to the prior.
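A minimal sketch of that threshold, assuming clicks and views per query/product pair live in a DataFrame; the 100-view cut-off is from the talk, while the column names and example numbers are illustrative assumptions.

```python
# Sketch: drop query/product pairs with too few views before building judgments.
# Column names and example data are assumptions; the 100-view cut-off is from the talk.
import pandas as pd

judgments = pd.DataFrame({
    "query":      ["t-shirt", "t-shirt", "t-shirt"],
    "product_id": ["prod17",  "prod42",  "prod99"],
    "clicks":     [1,         80,        3],
    "views":      [1,         900,       40],
})

MIN_VIEWS = 100
reliable = judgments[judgments["views"] >= MIN_VIEWS]
# Only prod42 survives here; prod17 (1 click / 1 view) and prod99 are dropped
# because their click ratios are based on too few observations.
```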
23:02
Then we ran another experiment, and it showed a real uplift in click-through rate and conversion rate. So the approach seems to work, at least for the judgment tests. What we still have to figure out is that, in the end, this is a quality versus quantity decision: we cut off an amount of our training data to raise the quality.
23:21
Is it enough to train a learning-to-rank model? Or did we maybe even introduce some kind of bias into the training data by applying these filters? We don't know yet, but as soon as we work on the learning-to-rank models, we will know. If it turns out that this approach doesn't work, we still have to tweak the filter,
23:42
or maybe go for the prior, or think of other methods. Next one, please. So those were the two key learnings, and then we have two general learnings. The first one is that we followed a continuous test-and-learn approach.
24:00
This is very important for us, and that's why we mention it: it gave us a very high learning speed. If you think of a build-measure-learn cycle, we really managed, while one experiment was running and we were measuring, to already build the next iteration, to lose less time.
24:22
And once the experiment had been running for about one week, we started to analyze the data to generate more insights, so we would have more experiment ideas for our experiment backlog. By that, we managed to chain one experiment after the other with very little time in between. In the end, we managed to do five full iterations in three months.
24:46
And the second thing is basically: test your judgments, because it's really worth it. At the beginning, we didn't think it would be a big thing to do, but in the end we are so glad that we did it.
25:00
If I imagine that we had started with the learning-to-rank model directly, we would probably be sitting here now doing feature engineering or working on the model, while maybe the real problem was in the judgments. So I see two main reasons why it's very good to test your judgments.
25:21
The first one is that it will enable you to make data-driven decisions about your judgments. As Andrea mentioned, the calculation of these judgments is quite complicated: you have the position bias, you have the prior, you have to choose which signal to use. There are many decisions to make, and testing will enable you to validate your decisions. And the second thing is that you will get the ability
25:43
to distinguish between the quality of your judgments and the quality of the model. I think that is very important if you want to improve, because if you just mix them together and your test results aren't that good, it's difficult to find the right spot to tune.
26:01
Okay, that's basically the end of our talk. We have, of course, one mandatory slide about hiring. So we are scaling our search team at Otto constantly. Search is a very big thing at Otto. We really, really love search. So if you do love search as much as we do, and if you're looking for a new challenge, come to us.
26:21
Your job is just one step ahead. We are working on query intent understanding, ranking, core retrieval, you name it. Go to otto.de/jobs and find out more. Thank you. Thank you all so much. Any questions? Too many questions, I think.
26:41
So, you guys, there are a lot of questions to answer, and I'm just going to pluck off the top-voted ones and we'll work our way through them as we have time. The top question is: what are the approaches taken to handle new products, as they would be missing CTR or other user interaction signals?
27:04
Since for the judgment test we didn't really implement a solution, our idea is that once we have the learning-to-rank model, the relevant information about new kinds of products
27:20
is already covered in the features. If we get a completely new assortment, that will be something we still need to figure out. But for anything similar to what we already have in the shop, we will cover it with the relevance that we have for those similar products that were in the shop and have already collected information.
27:44
Great. The second highest-voted question is, I think you guys touched on that a little bit, but maybe they just want a refresher. How to handle presentation bias, meaning not so much in the positional discount that you did, but the fact that something good might've been buried and just not shown to the users.
28:04
How do you measure improvements in recall? So I think, for the further-off future, the idea would be to present some random new products in the top positions of the ranking,
28:20
or products not previously seen, so they get the chance to be clicked or ordered by the customers. Then, as we have seen in the example that Andrea showed, if they are high up in the list, they collect data very quickly and then we know whether they are relevant or not.
28:41
So this is the idea, but I think until we are there, it takes some time. So you can join our team and help us get there actually. Yeah, that is a great idea. Okay, and we have some time, so let's keep going. You guys are doing a great job answering them.
29:01
The next question is: what is the difference between a view and a click? How do you track views? We track the views in our clickstream. So we track the whole result page that is shown to the user, and then we obviously know
29:21
when the person clicked on something, but we track the whole list that was shown to them. Then we say, okay, everything above the clicked product was certainly seen by the user, and for everything below the clicked product, we have to assume for now that it wasn't seen.
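A minimal sketch of that counting rule, assuming a tracked impression list and the clicked positions: everything down to the deepest clicked position counts as viewed. The data structures are assumptions.

```python
# Sketch: derive product views from an impression list and click positions,
# following the rule "everything above (and including) the deepest clicked
# product counts as seen". Data structures are assumptions for illustration.

def viewed_products(impressions, clicked_positions):
    """impressions: product IDs in displayed order (position 1 first).
    clicked_positions: 1-based positions the user clicked on."""
    if not clicked_positions:
        return []
    deepest_click = max(clicked_positions)
    return impressions[:deepest_click]

shown = ["prodA", "prodB", "prodC", "prodD", "prodE"]
print(viewed_products(shown, [1, 3]))  # ['prodA', 'prodB', 'prodC']
```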
29:40
But we are currently implementing some tracking so that we even know how deep the user scrolled on the page, and then we know exactly what was seen and what wasn't. Cool. Yeah, that's great. Another question that is really interesting is: do you use the same model across all shops at Otto?
30:01
Being a large retailer like you guys are, how do you deal with situations where you have significantly fewer observations in a smaller shop? Okay, actually, Otto.de is only one shop. We don't have seven shops,
30:21
so that is not a problem for us. And they had a follow-up question about languages: how do you guys deal with languages, is it mostly German, or is it a mix? It's actually only German, so it's easy for us in that respect. No multi-language models. Yeah, that is super great.
30:42
Okay, cool. Then another question here is: is the status quo you compare your experiments with based on the smaller catalog,
31:01
or does it already include all the additional sellers, expanding into where you want to go in terms of the marketplace idea? It already contains everything we have in the shop. I don't know how many other sellers we currently have, but it has been growing a lot over the past half year at least.
31:22
And I think there are already a lot of other sellers in our database. Awesome. Then the last question I have, well, there are two. What do you do in situations where there are no click interactions? This is a little different from the first question we asked; this is about when the information need is satisfied
31:42
just by looking at the search page. And this might not matter from the business side and what y'all are trying to optimize for with your models, but I'll leave that to y'all. Yeah, so actually we were thinking about one more possibility to measure the relevance of a product.
32:03
And that was interaction with the search result page, and also with the product page, which would be after a click. But yeah, even if you see that a person interacts by scrolling and filtering and looking at products for longer times, that is something we thought
32:20
could be a good relevance signal, but we haven't tested how to implement that, or whether it is even measurable, or whether there is too much noise in the data, which could also be the case. Yeah, that's a fair point about noise in the data. I think that's something that gets glossed over a lot in this game.
32:42
Some more questions came in while you were answering that one. Have you begun to model features relating to seasonality with your LTR models? Do you think this is a good fit for LTR? And I'll piggyback on that and ask: when do y'all retrain? How frequently do you retrain a model for your LTR product?
33:00
I know we're still testing, but... Yeah, so actually we have only just started to implement the very first model for testing, because as long as we are not confident in our judgments, it makes no sense to build a model based on them. We are planning on doing regular retraining, but I actually can't tell you
33:22
which frequency would work, because that would be simply guessing. Yeah, I want to quote you on that: "we have no confidence in our judgments, therefore it makes no sense to model." There should be a big poster made of that, so I could point to it. And then finally,
33:41
do you have one model for all product categories, one that covers everything? I know you showed product as a feature in that table in one of the slides. So currently, because it's only the beginning, we will start with one model for everything. We are quite certain that it might be a good idea
34:02
to split it up, since we are a full-range retailer and furniture probably behaves differently than clothes. So we will probably start splitting up and training different models, but we want to start very simply first, see how that goes, and then learn from that.