Metrics and models for Web performance evaluation
Formal Metadata
Title: Metrics and models for Web performance evaluation
Series: FOSDEM 2020 / 490
License: CC Attribution 2.0 Belgium: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifier: 10.5446/47051 (DOI)
Transcript: English (auto-generated)
00:05
a new way of looking at things that we in the performance team love. So, Dario, thank you. Thanks very much. Thanks for having me here. I'm really thrilled. So what I'm going to do is I'm going to bring basically two viewpoints.
00:21
I've been working in academia for the last 15 years or so, and then last year I moved to Huawei. So I basically have an industrial viewpoint with a university mindset. And so today, what I'm going to do is I'm going to talk about metrics and models for the web. There's a little longer subtitle that I'm not going to comment on here; we're going to see it throughout the talk.
00:41
So, of course, this work would not have been possible without a number of people; they are listed in alphabetical order. Two are actually in the room. One is Gilles Dubuc from the Wikimedia Foundation; the other is Flavia from Telecom ParisTech. So thanks to them, we can also discuss more interesting things now. So just to set up what we are focusing on, I mean,
01:02
I'm not a web developer, so I'm going to have a completely different focus. And right now I'm working at an equipment vendor, so we have a much lower-layer focus. So no matter what you're working on, if you're a browser maker, if you're a CSP, a content service provider, if you're an internet service provider or an equipment vendor,
01:20
what you care about is that the user is happy, right? So offering good quality of experience is a common goal. And of course, if something goes bad, you want to be able to detect it fast. If possible, you want to be able to forecast before things go bad. And if you are good at forecasting, you could also try to prevent things
01:41
from going bad in order for your users not to churn. So detecting quality of experience degradation is important. Now, how do you detect quality of experience, and how do you define it? Well, typically we need to have a good idea of whether the users are happy or not, and then try to correlate that with some of the telemetry. So, for instance, Boomerang is collecting a lot of telemetry
02:02
and will try to correlate that with the user quality of experience. Now, if you're taking the point of view of an equipment vendor or an internet service provider, well, you're going to have a little bit of a harder time, because you're not in the browser, so you don't have all the rich telemetry. And encryption is now really going to be painful, because you're only going to see a stream of encrypted traffic.
02:22
Still, we want to do something, because otherwise your users will churn. If the users churn, the equipment vendor will not be able to sell equipment, and so there's a loss of money as well. So it's important to get a handle on the quality of experience. And user quality of experience is basically affected by a lot of things,
02:41
including, for instance, the context. So where is the user, at work or somewhere else? If he's a pessimistic guy or if it's an old lady, they probably don't have exactly the same perception of delay. And there are, of course, system influence factors. If you are down in the basement of a building, at ground level or two floors below,
03:01
probably your signal is not very good, so you have slow performance. So in order to factor in all this stuff, being an engineer, of course you could ask the user, but you're going to try to infer these things by looking from the system perspective. The system perspective starts from the lower layer, the network. Over there, you will be able to measure some quality of service indications.
03:21
These will in turn affect application performance, application QoS metrics like the ones that Boomerang is reporting, or that others like WebPageTest are reporting as telemetry. And from that, you will have an influence on the way in which the users are experiencing their browsing. And so what you're going to do is that you're going to be able
03:43
to measure some of these metrics: from an end-to-end viewpoint, what is the latency, what is the bandwidth, what is the packet loss? Or point-to-point, what is the Wi-Fi quality? Of course, it doesn't make sense to only look at the throughput of a single connection, because you want to put them all together in order to be able to tell a meaningful metric from a session viewpoint.
04:01
Session means, for instance, if you're looking at a web application, it's going to be page load time or speed index, which we're going to see later. There are also metrics that correlate measures across multiple sessions, for instance engagement: measuring that you're staying on a website for long typically means that you are happy with the quality of experience they're serving.
04:22
And of course, you can go and directly ask the user how he feels about the service you're giving him. You can ask many users to rate it, say from one to five stars, and then you do the averages and you get a mean opinion score; you can also ask different things. And of course, if you know about the device type, if you have a cheap phone or if you have a high-end phone, maybe your expectations are different,
04:41
maybe the phones are also rendering differently. So all of that, of course, is very complex. So today, we're going to focus on a subset of it. In particular, we're going to look at the web, since this is the web dev room. So we're going to look into performance metrics like page load time and speed index, and try to see how they correlate with the mean opinion score and other user feedback.
05:01
And of course, we're also going to adopt the viewpoint of the lower-layer carriers, who are only going to be able to measure some weak signals. They don't see anything about the middle layer because of QUIC, HTTPS, or whatever other kind of encryption. And so they either want to try to learn something about the application QoS from the network QoS,
05:21
or make a big step and go to the quality of experience of the user. So that's basically the agenda for today. We're going to delve into four different aspects: data collection, the modeling part, the metrics part, and then again some methods that allow you to go from the raw signals all the way up.
05:40
So we have a path. If you are coming from the network, you need to start with your method and learn the metrics that the browser can easily measure; you need to learn which metrics are useful. For doing that, you need to couple two things: measurements involving the user, asking the users whether they're happy or not, and building models that, based on your metrics,
06:02
are hopefully able to extract this information from automatically collected data. So in the agenda today, we're going to work this top-down. We're going to start with the data collection. For data collection, typically what you do is build up some crowdsourcing campaign; these have a huge cost, and there are no perfect campaigns.
06:22
In the last years, we have been doing three different types of things. We've been asking the user: what is the mean opinion score? So rate your experience from one to five. We've also been asking the user: when do you think that the page was finished, i.e., what is the user-perceived page load time? Or, seeing two pages at the same time, which page do you think finished first?
06:41
So, to get a little bit of an idea of how the user perceives the web. And finally, in an ongoing collaboration with Wikipedia, we started asking the users whether they are satisfied with the experience they have while browsing Wikipedia. So of course, there's no perfect solution. In the first data set, we were doing lab experiments.
07:02
This means that we had a small panel of people, typically volunteers, close to 150 people, recruited at universities. So you have a very specific population; it will definitely not fit grandma's behavior. The good side is that we were using real servers, real protocols.
07:20
We were able to control the conditions. But the number of web pages, of course, is not completely representative of the internet. So then you can do something else, stepping up by moving to crowdsourcing. You have, for instance, Amazon Mechanical Turk, so you can leverage a large pool of people. But over there, you cannot let them access a web server, so you will typically show videos of the web page rendering process.
07:43
So this is not really exactly like browsing. You reach a larger audience, but this audience is also interested in getting paid for the task. So you need to filter out a lot of people that are just there to make money. So the last thing that we did with Wikipedia is very interesting because we are polling the user.
08:02
So there are one billion page visits monthly, roughly. And a tiny fraction of that is going to be polled for performance metrics, and a tiny fraction of that is also going to be polled for binary feedback; it's actually slightly more than binary feedback, about whether they were happy or not. So this is good, because you're going to poll users that are on the real service, a service they like,
08:23
the service they typically use. The downside is that you have a huge heterogeneity. Off the top of my head, we were polling 65,000 people. They were looking at 42,000 different Wikipedia pages, from 3,000 networks, 250 devices and 45 browsers.
08:41
So there's a lot of heterogeneity, and building a single model is not necessarily trivial. When I put the icon there, it means that the data sets are available. So if there are people that are doing research on that, like we were saying before: sharing tools is important, sharing performance evaluations is important.
09:01
Sharing the data is even more important, because it allows you to replicate and see whether the performance figures that are reported are true or not. So now that you've got the data, OK, cool, what do we do? Well, basically, we're going to have to go from the data, our Y, to find some function that, based on some of the things that we are able to measure,
09:20
our X, plugged into a formula f, is going to be able to tell us, magically if you want, the user perception. So here, as X, typically people use a single scalar metric, generally the page load time. The function f is predetermined by an expert. And there are typically two approaches that are being used.
09:41
One is the IQX hypothesis, using an exponential model. The other is a logarithmic model, which is tied to the Weber-Fechner law, a psycho-behavioral model that says the human response to a stimulus is logarithmically related to its magnitude. And this is, for instance, used by standards. So what you do is a lot of measurements; all the points here are different answers
10:01
from the different users. And then you do a fitting, and here the fitting, we can be happy with that; a minimal sketch of such a fit follows. Now, there are limits, because typically there are a lot of metrics, a lot of telemetry, that browsers produce, and here we are only using a single metric.
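As a minimal sketch of this kind of single-metric fit, assuming page load time as the X metric and 1-to-5 opinion scores as Y; the data points and starting coefficients here are illustrative, not the talk's actual measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative measurements: page load time (s) vs. mean opinion score (1..5).
plt_s = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
mos = np.array([4.6, 4.2, 3.7, 3.0, 2.2, 1.5])

def iqx(x, alpha, beta, gamma):
    # IQX hypothesis: QoE decays exponentially as the QoS metric worsens.
    return alpha * np.exp(-beta * x) + gamma

def weber_fechner(x, alpha, beta):
    # Weber-Fechner law: response is logarithmic in the stimulus.
    return alpha - beta * np.log(x)

p_iqx, _ = curve_fit(iqx, plt_s, mos, p0=(4.0, 0.3, 1.0))
p_wf, _ = curve_fit(weber_fechner, plt_s, mos, p0=(4.0, 1.0))
print("IQX params:", p_iqx)
print("Weber-Fechner params:", p_wf)
```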
10:20
So you can go one step further. Instead of picking a single metric that you like and a single function that you like, although the fitting seems nice, you could do something which is machine-learning driven: basically, having a vector of input features and an automated way to select the optimal fit of the function by minimizing some error. The trick here is that whenever you select a very specific machine learning algorithm, you're implicitly selecting the types of functions
10:42
that it will be able to learn. And here, you see that you have a slight gain with respect to the typical models by considering more metrics. Of course, there are different models available; we're not going to delve into the details of that. Just to say that, for me, there's still some room for improvement in going from the features
11:02
that we have to the user experience. But still, you get a good, quite high correlation. So this brings us to the metrics. What are the metrics that we can work on? In order to be quite clear about everything, we're going to have a very small animation of how the web page loading process works after you go and click on a link.
11:21
So you start downloading something, and at some point an event is going to be fired by the browser: the Document Object Model. At this point, you know the structure of the page, and you can start putting things around, and so you have a visual progress of the page that increases from zero upward. Then you keep downloading more things until, at some point, which is typically called
11:42
above the fold, all the visible portion of the page has been downloaded and shown on the screen. That's called the ATF, and your visual progress keeps increasing. And you can represent this visual progress as a function x(t) that grows from zero to one, where one means that basically everything that needed to be rendered for the page
12:00
to be visually complete has finished. With x(t), of course, you can also do something a little bit more fancy. The integral of the residual of this function, ∫(1 - x(t)) dt, is the gray shaded area above the curve, and this gray shaded area above the curve is what Google defined as the speed index. We're going to come back to that in a moment.
12:20
And then, of course, you can keep downloading more content that is not immediately visible but is going to be available when you scroll. And when all the content is loaded, that's typically the page load time. So now we have two types of metrics. One is the time-instant metrics: you have, for instance, time to first byte, DOM, above the fold, page load time. These are very specific times, which are important to somebody.
12:42
And then you have something else, which is the integral form of it, which is basically looking at all the area above the core. So why this thing intuitively is important? Imagine that you have two realizations of two pages that have exactly the same page load time. So they finish exactly the same time, but this one shows half of the content very fast,
13:02
and this one shows half of the content almost much more later. So in which of the two you would be happier in this one? So whenever the area above the core is smaller, then it's better and it's faster. So one additional comment is that, given that you are integrating something that is a dimensional, I mean, integrating over time,
13:20
the area above the curve also has the dimension of a time. So physically, if you are an engineer, you would think of it as a time: a virtual time that explicitly states how fast the rendering process was. Now, you can define a family of metrics like this, and depending on what you put as x(t), you're going to have the speed index or its siblings; a sketch of this family follows below.
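A sketch of this family of integral metrics, assuming we have sampled a completion curve x(t) at discrete times; with visual completeness you get the speed index, with byte completeness the byte index mentioned shortly (the curves here are made up for illustration):

```python
import numpy as np

def integral_metric(t, x):
    """Area above a completion curve x(t) in [0, 1]: integral of (1 - x(t)) dt.
    The result has the dimension of time (a 'virtual' load time), and a
    smaller area means faster rendering."""
    t = np.asarray(t, dtype=float)
    r = 1.0 - np.asarray(x, dtype=float)       # residual above the curve
    return float(np.sum((r[:-1] + r[1:]) / 2.0 * np.diff(t)))  # trapezoids

t = np.linspace(0.0, 3.0, 7)                            # seconds
visual_progress = [0.0, 0.3, 0.6, 0.8, 0.9, 1.0, 1.0]   # -> speed index
byte_progress = [0.0, 0.2, 0.5, 0.7, 0.9, 0.95, 1.0]    # -> byte index

print("speed index:", integral_metric(t, visual_progress))
print("byte index: ", integral_metric(t, byte_progress))
```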
13:40
If you're looking, for instance, at the difference in the histograms that were shown, so the colors on the page, you have the RUM speed index, which measures the areas that each of the different objects drawn on the page cover, and compares them with the rectangles that should have been drawn at the end. You can look at SSIM, or PSI, the Perceptual Speed Index,
14:01
using SSIM metrics, which are much more advanced. All of that is very good because it's visual progress, but there are downsides. For instance, you can only measure them in browsers, and some of them are actually process-intensive: if you need to compute SSIM, there's a lot of computation to do. So some years ago, we were proposing, as proxies of these more advanced metrics, ones that have very simple inputs, like the object index or the byte index.
14:23
Just looking at the bytes that are coming, you get a pretty decent idea of what is coming to your browser, whether it's coming fast or not. You're going to see a little bit later whether it works or not. The good side is that you can do it at layer three, in the network, and it's correlated with the speed index. It's not necessarily a good proxy for quality of experience,
14:41
so that's a question that you need to address. And I'm not going to go into these kinds of details, but you can also act, for instance, on the cutoff of the integral in order to optimize some of those metrics. But I'm not going to go into these details. So now, if you are in the browser, or if you are a content service provider, you have a pretty good picture of everything
15:01
that is happening. You have, per domain, the vision of all the different objects, and also their type, whether they are images, CSS, whatever. And you can reconstruct this picture quite accurately. Now, if you are in the dark, if you are an ISP or an equipment vendor, what you will see is basically a series of packets
15:20
coming from different flows. And the only thing that you're going to read is: OK, this is a packet; this one is a full packet size, MTU, and this is a smaller one. So what do you make out of it, in order to extrapolate from this? Again, I'm not going to go into a lot of details; rather, I'm going to show you why this thing could work. But basically, the idea is, if you
15:42
are familiar with machine learning, that you need to perform some really simple amount of signal processing in order to make your input homogeneous. We're using supervised techniques, and supervised techniques mean that we need to have exactly the same input shape. And then the different models that you're using, like extreme gradient boosting, which
16:02
is an ensemble method based on trees, or a 1D convolutional neural network, what we do is that we present them with a lot of samples. We tell them: look, this sample had this above-the-fold value. We build another model by providing the same examples and the page load time, or the speed index, or any metric that you are interested in; a sketch of this setup follows below.
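A minimal sketch of this supervised setup, with scikit-learn's gradient boosting standing in for the extreme gradient boosting model mentioned; the features (which in the talk would be binned encrypted-traffic counts) and the target are random placeholders:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 100))        # one row per page load: traffic features
y = 2.0 * X[:, :10].sum(axis=1)   # placeholder target, e.g. ATF time (s)

# Hold out previously unseen page loads for testing, as in the talk.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = GradientBoostingRegressor().fit(X_tr, y_tr)
abs_err = np.abs(model.predict(X_te) - y_te)

# Report the error like the accuracy figure: median plus 25th/75th percentiles.
print("median:", np.median(abs_err), "quartiles:", np.percentile(abs_err, [25, 75]))
```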
16:21
And we provide many samples to train a model, and we test it on previously unseen cases. To give an intuition of why this should work, here we have the web page rendering. This is basically what the user sees. Here is what we see in the browser, where every burst is going to be one object, and we have one color per domain. Actually, we're presenting only the top three domains,
16:43
and for the others, we are using the same color; otherwise, the picture would really be too colored. And here is what you see from the network. So we're going to have one packet, or a little more; we are aggregating packets in 10-millisecond bins. And then you're going to see one color per IP server. So when I start, if I click on the right place, you see that, OK, this is a Chinese web page.
17:02
So it starts late. At some point, you see those things are progressing. Here, there was a big object, and this big object has been loaded in multiple packets. Same thing here: the green packets correspond to this big object. And you see that these curves are slightly different, but there is some similarity; they're not completely different.
17:21
And indeed, if you systematically perform this experiment (this was just one example to show you how these things look for real), you can make an experiment where you're monitoring the network, so you're taking the real encrypted traffic, and you're monitoring the browser, so you have the ground truth: the above-the-fold time, or whatever metric you're interested in. And you can repeat this process and try
17:44
to extrapolate some accuracy numbers. So here is the only accuracy picture that I'm going to show. It reports the absolute error in milliseconds. This is the median, this is the 25th percentile, and this is the 75th percentile. So basically, in 75% of the cases, your error is going to be lower than this,
18:01
and in the median case, it's going to be this one. You can see here that we have two different approaches. The first works even without machine learning; I'm not going to explain why, but the colors before in the picture had a mathematical interpretation that I didn't want to bring up today, it's not the point. With an algorithm based on that, we can already have something that is going to learn only a single function, which
18:22
is the byte index: we can approximate the byte index learned from the network with the application byte index that we learn in the browser, and that one has a 6% error. And this is without machine learning; it's a very simple online algorithm. On top of that, you can add machine learning
18:40
and you can compensate for these errors, so you can reach a lower error. And then you can generalize to any metric. So we're learning the page load time, the object index, the speed index or RUM speed index, the DOM, if you're interested in learning the DOM, with these kinds of errors. So we did tests with Orange on a number of pages that we had never seen before,
19:01
and a number of networks we had not seen before, and these are the accuracies estimated in those settings. So it's pretty portable. And not to make an advertisement, but given that the algorithm works, we are supporting it in our products. Now, there is one catch that I didn't cover in this talk, due to lack of time, which is that we are also able to handle multi-session traffic.
19:22
So if I go back here, we see that there are a lot of packets coming from a lot of different flows. But before, you need to be able to isolate the flows that belong to the same session. This is something that you need in order to be able to apply your machine learning technique. And this is also something that is done, but we just didn't talk about it for lack of time.
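The talk doesn't detail how flows are grouped into sessions; as a purely illustrative sketch (an assumption here, not the talk's actual method), one crude heuristic is to split a client's flows into sessions whenever the gap between consecutive flow starts exceeds an idle threshold:

```python
def group_flows_into_sessions(flow_start_times, idle_gap=1.0):
    """Group flow start times (seconds, one client) into sessions: a new
    session begins when the gap since the previous flow exceeds idle_gap.
    A toy heuristic for illustration only."""
    sessions, current, last = [], [], None
    for t in sorted(flow_start_times):
        if last is not None and t - last > idle_gap:
            sessions.append(current)
            current = []
        current.append(t)
        last = t
    if current:
        sessions.append(current)
    return sessions

print(group_flows_into_sessions([0.0, 0.2, 0.4, 5.0, 5.1, 9.7]))
# -> [[0.0, 0.2, 0.4], [5.0, 5.1], [9.7]]
```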
19:44
So basically, OK, now this is where we stand. Where could we go to push further? I'm going to just talk about a few ideas. For people that are familiar with machine learning: unfortunately, in the web QoE domain, we are still doing expert-driven feature engineering.
20:01
Basically, we have somebody that is defining the speed index. And why should it be the speed index? It seems a very natural and very bright idea, but we have no clue whether this is really a proxy for quality of experience. So a better approach (I'm not saying a more explainable one; it's actually less intuitive) would take raw input, raw sensory data from the user, and try to do what?
20:21
To learn the features through the learning process. The learning process, in the neural network, through backpropagation, is going to create the features that are the most relevant in order to find and explain why the user voted a given score. So that's definitely not interpretable, but it's more versatile.
20:40
The downside is that it requires a lot of samples. So here, what we did was taking in packets and learning any of these functions; similarly, we could use these inputs and try to learn functions which represent user happiness. Of course, getting the data is difficult, because you would want to be as unintrusive as possible. If you need to put sensors like this, maybe you're affecting the user experience.
21:02
And the other thing is what you can leverage. I know people that are working on happiness recognition through facial recognition. But over there, if you're happy, it may be because of the content of the message that you received or the page that you're visiting, and not because the experience of loading that page was good. So it's quite difficult to get the sensory part working.
21:24
The second thing is that I was speaking about a single model. And actually, we did single models because they are easy to deploy. But of course, the web is really, really large. For instance, Wikipedia is not image-intensive, and we have other websites that are mostly made of images or video.
21:41
So how can a single model fit all? Of course, to increase accuracy, you should go per page. Here is just an example picture where the black line is one average model, and these are all the points that you're getting. And of course, if you have many per-page models, they're going to have a better fit. Now, the problem is that this process is inherently not scalable.
22:01
So how to make it scalable? Well, by prioritizing things that are more important. For instance, for the top 100 web pages, you can build a reliable model for the pages that are most frequently visited by people. Then you can have a second approach, in which we cluster the top 1 million web pages. For instance, here you see a number of clusters, out of which 24 were extracted,
22:21
out of which 24 pages were extracted. And inside each of these clusters, there are thousands of pages. These clusters have similarity in terms of the number of domains, the number of objects, and the size of the page. So there are higher chances that if you build models that are accurate for pages in this class, then you're going to also be able to cover more accurately the top 1 million.
22:41
And then of course, OK, for the rest, the top 1 billion pages, you're going to use a single average model and pray it will work. But at least you're going to already have a better operational point in the accuracy versus scalability trade-off. Then final comment, which is a community comment. If you are working in this space, the first thing you need is data.
23:00
So keeping on collecting and sharing data is very important. I'm very happy that finally, working with Wikimedia, we were able to release a data set in a properly anonymized form that successfully protects the privacy of people, while also letting people do research in order to build models better than the ones we built. So you need to take into account
23:21
that when you go to the supermarket, you already find these machines, right? They're asking you: are you happy or not? And you click on it, and you don't even think about it. When you're calling over Skype or Facebook, at the end of the call, there's something asking you to rate your call. Also, my phone started asking me whether I found a suggestion useful, so as to have binary feedback from you. And this is from Wikipedia.
23:42
So what you would gain from keeping up this steady data collection is two things. One, until your model is good enough, you still have some information from the users. You already know if something happens, because it is the users of your service that are telling you directly, and you don't need to go over Twitter and try to understand whether the users are complaining
24:02
about your service through other channels. Second, this continuous stream of data is also going to be able to make your model better. Or, if there's anything that changes, like the next protocol: we had HTTP/2, and it's going to be HTTP/3 soon, over QUIC. Maybe your model needs to be retrained, so you will need to have this kind of data.
24:21
And if the user population is large enough, the annoyance is also limited; there is only a risk of annoying users if you leverage small panels. So this talk is basically based on these resources here. I put the different papers, and I also put the icons for the different data sets,
24:42
some of the implementations that we released, and everything is accessible from this page here. There are things that are not out yet, so more will come. So with all this, I figure I'm done. I would like to thank you for listening so far, and if you have any questions, please go ahead.
25:18
If you shout, I can also repeat the question.
25:39
Yeah, OK. So there are studies also.
25:42
So the question is: what if I'm able to break the key? So, what about the encrypted stuff when you can break the encryption, like government guys do? There are two answers. First, if you're able to decrypt, probably you're not interested in web performance; you're breaking this in order
26:00
to look at different information. Second, there was a study telling you what fraction of data goes through proxies, for instance. In some institutions, you have a proxy: you're delegating, and you're accepting the key on a PC that is managed by your organization. Indeed, you have a proxy for which
26:21
it is not necessarily useful. Now, with GDPR, this is pretty serious. So definitely, if you are Huawei, there is going to be twice as much concern as if you were a regular vendor right now. And so, of course, the fact that your devices are totally not interested in looking at the payload, because they don't need it, is much more important.
26:41
So here, basically, what we're doing is leveraging very weak signals that are intrinsically in the timing information that comes with packets, much like Debussy was saying that music is the silence between the notes. Here, somehow, we are exploiting the information that we see, even without listening to the notes, without looking at the content, to try
27:01
to reconstruct the signals. The thing that I was showing today was not for the government; it was more for the internet service providers and equipment vendors. But if you go up to the Chrome browser, for instance, the missing link is between layer 7 and layer 8, the user. For instance,
27:23
there will be a talk on the normalization of timing APIs. So you want to normalize something that is relevant for the user, right? And this is the part where it indeed helps to normalize things from a layer-7 point of view; and if it's relevant, we can learn it also from layer 3, without breaking any encryption key.
27:45
I don't know if that clarifies it, yeah? OK, so that's a very good question. So the question is actually about seasonality, basically things that are non-stationary over time,
28:01
and in particular, seasonality means that there's periodicity. It's something that we extensively looked for in the data sets. For instance, with Wikipedia we have months' worth of measurements, so we were expecting to find a day/night effect and a weekend/weekday effect. We didn't find any in the happiness of the users; it was amazingly stationary during the period.
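A sketch of this kind of seasonality check, assuming a table of timestamped satisfaction answers; pandas and the column names are assumptions for illustration:

```python
import pandas as pd

# Illustrative survey data: timestamp plus a binary happiness answer.
df = pd.DataFrame({
    "ts": pd.to_datetime(["2019-06-03 09:00", "2019-06-03 21:00",
                          "2019-06-08 10:00", "2019-06-09 23:00"]),
    "happy": [1, 1, 0, 1],
})

# Day/night and weekday/weekend breakdowns of the happiness rate:
# similar rates across groups suggest no seasonality.
daytime = df["ts"].dt.hour.between(8, 19)
weekend = df["ts"].dt.dayofweek >= 5  # 5, 6 = Saturday, Sunday
print(df.groupby(daytime)["happy"].mean())
print(df.groupby(weekend)["happy"].mean())
```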
28:23
So this is documented in the WWW paper, and we also extended that. I don't know why there's no seasonality; we were expecting it. The data is now available, so yeah, I think there is more to dig into.
28:44
So you're learning from the encrypted packets: is it mostly the size difference or the difference in timing between packets? OK, so, I went very, very fast there. Can you repeat the question? I'm going to repeat the question when I get there. So basically the question is: OK, what is the magic?
29:02
How can you learn from the packets? Actually, we're not learning directly from the packets, because every web page is a different number of packets, and our supervised method, our regression method, needs to have a fixed-size input. So what we are doing is basically chopping time into regular intervals.
29:20
And what happens is that you are basically sampling a signal periodically. You're sampling this signal: every so often, every delta t, you look at the packets that came, put an integral there, and you're basically sampling this curve here. So this is the way in which we get the input, which
29:41
is by just accumulating, over small fixed periods of time, the arrivals of packets that belong to the same session. And that's what makes the input. So there's a basic, signal-processing-level amount of feature engineering to normalize your input, to be able to feed it to a neural network.
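A sketch of that binning step: sum the packet sizes of one session into fixed delta-t windows so that every page load yields a fixed-length vector. The 10 ms bin width matches the aggregation mentioned earlier; the packet trace is made up:

```python
import numpy as np

def bin_packets(arrival_times, sizes, dt=0.01, n_bins=100):
    """Accumulate packet sizes into fixed dt-second bins from the session
    start, producing a constant-length input vector for a supervised model."""
    t = np.asarray(arrival_times) - min(arrival_times)
    edges = np.linspace(0.0, n_bins * dt, n_bins + 1)
    hist, _ = np.histogram(t, bins=edges, weights=np.asarray(sizes))
    return hist  # bytes observed in each 10 ms window

x = bin_packets([0.001, 0.004, 0.052, 0.053], [1500, 1500, 600, 1500])
print(x[:8])  # [3000. 0. 0. 0. 0. 2100. 0. 0.]
```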
30:00
And OK, five minutes left, so I was too fast. So you can... sorry guys, you have a question? Cool, you can ask directly into the microphone. Do you have an estimate of how many data points we collected on Wikipedia in a typical week during the study? So: how many data points did we collect in a typical week during the study on Wikipedia?
30:20
During a typical week; so, I know that basically things have changed a bit since then. I think I have some backup slides. Oh, too many backup slides. OK, here it is. About the seasonality, you get your picture there, which is here. So I know that we were collecting the 62,000 data
30:42
points during the first period, which was basically the first test case, in which, if I remember correctly, web performance timings are triggered once every 10,000 page visits in Wikipedia. And out of those, we were sampling one out of 1,000 at the beginning.
31:00
At the end of that, we stepped up the sampling a little bit. But this is basically over this period of time. Hidden here is the fact that we actually issued the survey query to 1.4 million people, and only 62K replied, because people can willingly accept to click on those or not.
31:21
So the numbers per week, I don't have them in mind, because we were mostly focusing on: can we get a breakdown of how happy the users are? And in this case, on Wikipedia, 85% of the users are consistently happy, with no seasonality and no obvious correlations.
31:41
OK.
32:02
OK. So we can go back; this is slide one. So, you're here: you want to know if things break or not, right? If you're measuring from the browser, it's because you are in the browser or because you're a service provider. Now, what Huawei does as a business is basically selling boxes to operators.
32:23
And operators, what they do is sell pipe capacity to their customers, which are the users. And from time to time, they have problems because the service doesn't work, and the people will complain to the ISP. But actually, it's not necessarily the ISP's problem: maybe it's the content service provider, maybe the DNS, maybe BGP. So over there, basically, there's a need for troubleshooting tools
32:43
in order to be able to tell, oh, yeah, it's our problem. So it's our network that is down. So we're going to fix it. Or, look, guys, everything that we have on our side is OK. But there are a lot of problems on that website everywhere. In order to be able to say so, you need to know what is the typical visual time
33:01
of your users, or detect whether it is changing. So this is why. Before, I was working more on, if you want, the layer-7 aspects. And there the question was: OK, we have the speed index, we have the above-the-fold time, but nobody had tried to check whether these were really relevant for the user. So this is where we started involving users.
33:20
And now this bit: OK, now I'm working for an equipment vendor, so am I able to do the same things, but from a more challenging viewpoint, which is starting from completely encrypted traffic? Partly because, I mean, this is research, so it's fun. But then, given that I'm no longer at a university and I'm at Huawei, there's also a business model behind it.
33:42
Because basically, if you are able to detect whether there is a problem, then you can fix it, and then you will not have user churn, and so you're not losing money, right? Same thing for the content providers. Why are they optimizing? Because there are ads; except on Wikipedia, where there are donations. But if you are Google, if you are Bing from Microsoft,
34:01
or Facebook, you're showing ads, and this is the way you get money. There were studies by Google and by Bing showing that for every 100 milliseconds you add to your web page load, you lose some of the people that would go to the server and click on the ads, and so you have losses of revenue.
34:21
And if you multiply a 2% loss by 1.2 billion people visiting, that's big numbers. So, same thing, but from the encrypted pipe, from the network guys' viewpoint. OK, so thanks a lot. Thank you very much.