Building a movement data and analytics platform
Formal Metadata
Number of Parts | 52
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/44709 (DOI)
FOSS4G SotM Oceania 2019 | 21 / 52
Transcript: English (auto-generated)
00:01
Thank you everyone. Yes, today I'd like to talk to you about an automated crash detection app called Safer Journeys that we've been working on for the last year or so, and also the data and analytics platform that sits behind it. Two years ago a man disappeared whilst driving home near Newcastle in New
00:23
South Wales. It was completely out of character for him to disappear and the police had no leads. The next day his father on a hunch turned up at a local airport with $1,000 to hire a helicopter to find his son. He was convinced he'd crashed, gone down the side of an embankment and his car was upside down in bushland.
00:43
They found him 30 hours after he'd crashed. He was alive, thankfully, but not well as you'd imagine. A few months ago on the south coast of New South Wales a woman again crashed off the side of a highway.
01:01
No one saw her car leave the road. There were no skid marks. She ended up in a creek down an embankment. She had serious injuries but miraculously woke up after being unconscious for 14 hours. She was able to then get help. In both those cases we would have found them and detected their crashes in under two minutes and been able to dispatch an ambulance to those locations.
01:27
So quick clarification. Today we're talking about an Australian app that just happens to have the same name as a New Zealand road safety program. So apologies to the Department of Transport, I will not be talking about Safer Journeys
01:44
New Zealand today. We should have thought that one through a bit more. We are coming to New Zealand at some point. We are going to be talking about this app and as I mentioned the data and analytics platform that goes behind it. But first some context.
02:01
So who's IAG? So we're the largest general insurer in Australia and New Zealand. So locally you'd know us through AMI and State and NZI and a few other brands. In Australia you'd know us through significant brands like NRMA Insurance and CGU. Now like most large corporates we have a purpose. Our purpose is to make your world a safer place.
02:23
Now it's not just a hokey statement to sell more policies. It actually does go to the heart of how we're trying to change the way we do insurance. And one of the key things we're trying to change is how quickly we can respond to people when something bad happens, using technology and better human-centred design processes.
02:42
And this is where Safer Journeys comes in. So before I talk about what Safer Journeys is, I'll talk about the process of coming up with Safer Journeys. So what are we trying to solve, right? We think of this like a startup, which we effectively are internally within our organization. What problem are we trying to solve?
03:01
And then what solution are we going to create off that? So on one hand, the problem was quite straightforward. There have been numerous crashes where people have gone missing, but they've actually had a vehicle crash and they are seriously injured. We know from statistics that someone's chance of survival drops by 20% per hour that
03:22
they are in that vehicle after a crash. That's quite a serious statistic, as you can imagine, but that's not all we can help with. We also know from research that the moments after a car crash are some of the most stressful moments a person will experience in their life. They could be confused.
03:41
They may not know what to do. They might not get the right information from the other driver. The car could be in the middle of the road in an unsafe position and they might not know what to do there. And of course, they could be injured as well. So that was the problem. We wanted to come up with a solution that would help with that stressful experience as well as the life-threatening issue.
04:05
So we had a simple problem to solve: help people when they have a crash. Very good problem to solve, but how do we go about it? Were people actually interested in this? How much would it cost to build? Was it technically possible? All those sorts of fundamental questions.
04:20
So before we started writing a lot of code, we got our design team to start looking at the problem and the solution. And the way we did that was using our innovation framework. If anyone here is in design, you've probably heard of the double diamond design process; we've kind of augmented it and added a bit to it. How does this process go? So you start with a problem.
04:41
You then validate that the problem exists and you look at a solution to it. And that's your concept. You then go into discovery mode where you research the problem deeper, work out how bad it really is, how widespread it is, that sort of thing. And you also do customer surveys to find out whether anyone actually thinks this problem needs to be solved and whether it's of interest to them.
05:02
Now you've validated the problem. Next step is the technology one. Can you actually build something that will solve this problem? How much will it cost to build and maintain? Would people pay for it, for example, if you wanted to charge for it? That's where a product owner comes in and grabs all the subject matter experts and also
05:21
some of the design team to come up with a technical solution. If you pass all those gates, then you've validated your opportunity and you're ready to build. So luckily we did and we moved on from there. If you've ever heard the term pivoting in startup speak and never really worked out what it is, it basically means going back through this process with a slightly different solution
05:40
because it didn't pass through these gates. So we're ready to build. We started with a set of build principles. So we use agile principles and Scrum methodology. So things like monthly retros, two-weekly sprints, daily stand-ups, that sort of thing
06:01
to keep the application and the system behind it being developed at a good pace. Everything was going to be cloud native. We needed reliability, scalability and repeatability, the ability to stand up things very quickly when they crash and burn. So we had to be a hundred percent in the cloud. Open source first: we basically needed open source to lower the risk, to have full
06:27
control over the stack of tools that we were using, and also to avoid lock-in to any particular platform. We developed iOS first and then Android, so we're still actually developing the Android
06:41
app. The reason for that was simple. We didn't want two developers starting from scratch at the same time because that would be inefficient. So we did iOS first and then Android. The catch, of course, is that the trial we've been running for the last five months with our colleagues at work only supports iOS users. We wanted to avoid packaged cloud services. So if at some point in the future we need to jump from AWS to Azure or Google
07:03
Cloud Platform or back, we try to avoid the packages which are specific to one platform only. And lastly, we baked security into everything we did as we went along. You hear about it every other week: a lot of startups put in security at the last moment or after they've had a data breach.
07:23
That's not how we could operate. We're actually an insurer, which means we're liable for very large sums of money if we have a data breach and we are working with some very sensitive data here. Okay, so what is Safer Journeys? Safer Journeys is an app, just to prove it's real.
07:44
It's an app and an IoT device, otherwise known as a tag. It's this little guy. You stick it on your windscreen. It has a 3D accelerometer and a 3D gyroscope. It talks to your phone via low-energy Bluetooth, so this tag will last about two or three years.
08:00
It uses GPS from your phone and your phone does all the comms. Basically, the phone app collects accelerometer and gyro data 15 times per second unless it's detected a crash, in which case it collects it 100 times per second, and it sends that data to the system servers. Now if there's a crash detected, the app will send a message to the servers.
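The sampling behaviour just described can be sketched in a few lines of Python. This is only an illustration under assumptions: the two rates come from the talk, but the function and constant names are hypothetical and are not taken from the Safer Journeys app.

```python
# Rates quoted in the talk; the names below are illustrative assumptions.
NORMAL_HZ = 15   # samples per second during ordinary driving
CRASH_HZ = 100   # samples per second once a crash has been detected


def sampling_rate_hz(crash_detected: bool) -> int:
    """Return how many accelerometer/gyro samples to collect per second."""
    return CRASH_HZ if crash_detected else NORMAL_HZ


def sample_interval_ms(crash_detected: bool) -> float:
    """Return the time between consecutive samples, in milliseconds."""
    return 1000.0 / sampling_rate_hz(crash_detected)
```

At the crash rate, this works out to one reading every 10 milliseconds, against roughly one every 67 milliseconds in normal driving.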
08:23
The servers will then validate it using some more advanced algorithms than a poor little iPhone can handle, and then that will trigger a push request to our emergency call center in Auckland. So the notification comes through. It tells the call center operator whether the vehicle is still moving, what the forces
08:42
involved in the crash were. If the person has stopped moving, the call center operator will try and contact the person and offer support. Now that support could be anything from how to manage the scene of the accident, what information to collect from the other driver, would they like to start the claims process, is the car drivable, do they need to organize a tow truck or transport from
09:05
the scene. It could also mean calling an ambulance on their behalf if they're unresponsive and they've had a serious accident, so if we register something with a significant force. So what infrastructure have we got behind this?
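As a rough sketch, the call center flow described above could be written as a small decision function. Everything here is an assumption made for illustration: the action names, the rule that a still-moving vehicle is simply monitored, and the 30 G default for "significant force" (a figure mentioned later in the Q&A) are not the operator's actual procedure.

```python
from enum import Enum


class Action(Enum):
    KEEP_MONITORING = "keep monitoring"
    OFFER_SUPPORT = "offer support"
    RETRY_CONTACT = "retry contact"
    CALL_AMBULANCE = "call ambulance"


def triage(vehicle_moving: bool, driver_responded: bool, peak_force_g: float,
           significant_force_g: float = 30.0) -> Action:
    """Pick a next step for a crash alert (hypothetical rules, see lead-in)."""
    if vehicle_moving:
        # Assumption: the operator waits until the vehicle has stopped.
        return Action.KEEP_MONITORING
    if driver_responded:
        # Scene management, details to collect, claims, tow truck and so on.
        return Action.OFFER_SUPPORT
    if peak_force_g >= significant_force_g:
        # Unresponsive driver after a significant impact.
        return Action.CALL_AMBULANCE
    return Action.RETRY_CONTACT
```

For example, `triage(False, False, 35.0)` returns `Action.CALL_AMBULANCE`, while the same alert with a responsive driver returns `Action.OFFER_SUPPORT`.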
09:21
So we've got two sets of infrastructure actually, we've got the customer and alerting side and we've got the data and analytics side. So I'm going to focus mostly on the data analytics side, which is where I spend my day to day. So we have two sets of data that come from the technology partner that has the servers crunching this data in the background. We have a bunch of APIs which obviously support the real-time things such as crash
09:42
detection and then we also have files that get pushed into S3 in Amazon at the end of each trip. As for how we ingest that data, we bring it in from S3 through Kafka. Kafka is a real-time distributed streaming platform. It is very powerful, very reliable and very scalable.
10:02
We run it inside Docker containers managed by Kubernetes in AWS and this gives us all the scale and reliability we need as well as the security. We're also bringing in weather observations. We're currently getting observations from every weather station in Australia, and also Transport for New South Wales data for train, bus and ferry
10:25
schedules and also real-time running. The reason for that is we want to answer some more interesting questions in the future once we build up a wealth of data. Do crash rates increase during the rain, for example? Or do they only increase in the rain after a period of drought?
10:41
Those are the sorts of interesting questions for which we need to start collecting the data now. Once that data is in Kafka, where it's currently stored, we have a bunch of Java apps that also use Kafka which then allow you to transform the data
11:00
using GeoTools. So we use these Java GeoTools Kafka apps to do aggregation and spatial enablement. That then gets put into Postgres with PostGIS and then we fire that off to our reporting tool, which is an open-source tool called Metabase, which, I'll be upfront, is a tactical reporting tool, otherwise known as we don't have enough developers
11:21
right now to come up with the actual reporting we'd like to do. It's a good out-of-the-box, here-have-50-reports-and-wade-through-them-yourself type of platform. And we use GeoServer for visualization off the back of PostGIS as well. We also have a bunch of data scientists in a company called Ambiata. So we have a data science company off to one side of the main insurance company
11:41
and they're currently having a look at the data using TensorFlow. Okay, we haven't yet settled on the actual long-term data analytics platform. So we need a big data solution here. I'll talk about the data, and the scale of it, shortly.
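To give a feel for what the aggregation and spatial enablement step might involve, here is a minimal plain-Python sketch that bins GPS points into a regular grid and counts points per cell. It is a stand-in only: the pipeline described in the talk uses Java, GeoTools and PostGIS, and the 0.01 degree cell size and function names are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
Cell = Tuple[int, int]       # integer grid indices


def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> Cell:
    """Snap a coordinate to the index of the grid cell that contains it."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def count_points_per_cell(points: Iterable[Point],
                          cell_deg: float = 0.01) -> Counter:
    """Aggregate raw GPS points into a count of points per grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
```

Two readings a few hundred metres apart in central Sydney fall into the same cell, so they are counted together, while a reading in Auckland lands in a cell of its own.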
12:02
So we've been looking at platforms such as GeoMesa running on Spark, both open source. GeoMesa is excellent for analyzing geotemporal data and very well suited to movement data analysis at scale. We've also looked at PrestoDB, which is again an open-source tool spun out from Facebook.
12:23
It uses the open-source Esri geometry API as opposed to GeoMesa, which uses GeoTools. And what we call the screw you guys, I'm going home option is actually just to let someone host the whole thing for us. We're currently looking at Snowflake, who are building out their geospatial capability.
12:41
Although for the bright-eyed people in the audience you would realize that goes against our build principles in terms of open source and not using packaged cloud services. But it's an option, it's a current option. Okay, the data. So as you can see we've collected a bit. We have one user in Auckland. Yay!
13:03
No, he hasn't crashed yet. Okay, so what are we collecting? We've got 17 fields of data being captured 15 times per second per trip per user plus 7 or 8 supporting files. So there's a reasonable amount of data. We've been in trial mode as I mentioned, 150 staff for 5 months.
13:23
We're up to 330,000 kilometers and about 30,000 trips. So what on earth are we doing with all that data? So first of all, that 100 hertz crash data that I mentioned before, that 100 times per second crash data allows us to do some extremely granular analysis.
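A quick back-of-the-envelope calculation shows what the trial figures quoted above imply. The inputs (17 fields, 15 samples per second, 150 staff, 330,000 kilometers, about 30,000 trips) are from the talk; the function names are illustrative, and the results ignore the supporting files and the 100 hertz crash data.

```python
FIELDS = 17          # fields captured per sample
SAMPLE_HZ = 15       # samples per second while driving
TRIAL_USERS = 150    # staff in the trial
TOTAL_KM = 330_000   # distance recorded so far
TOTAL_TRIPS = 30_000 # approximate number of trips so far


def rows_per_driving_hour() -> int:
    """Samples recorded for one user during one hour of driving."""
    return SAMPLE_HZ * 3600


def values_per_driving_hour() -> int:
    """Individual field values recorded per user per hour of driving."""
    return rows_per_driving_hour() * FIELDS


def average_trip_km() -> float:
    """Mean trip length implied by the trial totals."""
    return TOTAL_KM / TOTAL_TRIPS


def trips_per_user() -> float:
    """Mean number of trips per trial participant."""
    return TOTAL_TRIPS / TRIAL_USERS
```

That is 54,000 rows and 918,000 field values per hour of driving per user, with an average trip of about 11 kilometers and roughly 200 trips per participant.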
13:41
So this is a historical chart. This was our first ever crash detected, which fortunately was a 5 km an hour rear-ender. But as you can see there, if you're looking at the blue line, which is the longitudinal force, in other words the forward-backwards force, you'll see there's a sudden spike and a sudden drop and then a return to normal, which is basically typical of someone
14:01
just going oomph into the back of you. Now thanks to our technology partner and some clever tech they've got, that actually allows us to automatically generate this page. So all of that is 100% generated by a machine which is a full crash report as to what happened and then giving percentage likelihood of which part of the vehicle was struck
14:20
during the accident. Now, when we go live with this, we've got 50,000 tags, that's kind of the first phase, sitting in a warehouse somewhere in a fulfillment center. At some point in the future we're going to have, unfortunately, hundreds and thousands of crashes analyzed at this scale.
14:40
So that, we hope, will provide granular insights into what happens when vehicles crash, which could lead to an improvement in road safety. Okay, challenge, sorry, data products. We're also looking at pothole detection, for example. I'm going to have to speed up a bit because I've been babbling too much.
15:02
Macro analysis of traffic flow, we've got 50,000 vehicles on the road from Sydney with this data coming through that allows us to do a lot of granular analysis. We also want to look at people's choice of transport with things like the weather and the transport data as context as to why they choose to drive their vehicle. And also risk, noting that this will not affect your premium.
15:20
We have a very clear statement in bold in our T's and C's: this will not affect your premium, no matter how badly you drive. Okay, just quickly, pothole detection. So this is me going over speed humps; I really didn't want to drive through potholes all day and damage my car. If you look closely there, the colors show the vertical force. So you can see with a bit of a GPS offset there on the map matching,
15:44
you can see a rise and a drop in the vertical G as you go over a speed hump slash pothole. Okay, this is what it looks like when you look at the variance in the data, really obvious signal for any data scientists in the room, that's a really obvious signal. Okay, challenges, we've had a few.
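The variance signal mentioned for speed humps and potholes can be illustrated with a rolling variance over the vertical G readings. This is a minimal sketch under assumptions, not the method used by the team or its technology partner: the window length, the threshold and the function names are all hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def rolling_variance(values: Sequence[float], window: int) -> List[float]:
    """Population variance of each consecutive window of vertical G readings."""
    return [pvariance(values[i:i + window])
            for i in range(len(values) - window + 1)]


def bump_windows(values: Sequence[float], window: int,
                 threshold: float) -> List[int]:
    """Start indices of windows whose variance exceeds the threshold."""
    return [i for i, var in enumerate(rolling_variance(values, window))
            if var > threshold]
```

On a flat road the vertical reading sits near 1 G and the variance stays close to zero; a rise followed by a drop, as over a speed hump, pushes the variance of the windows covering it well above any small threshold.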
16:01
The original Kafka system was built in Scala. We don't have any Scala developers, so the rest of it's been built in Java. So design debt out of the box, brilliant job. Anyway, we're managing design debt. We really wanted to use Python. The reality, the unfortunate reality is that Python is not a first class citizen in the big data world.
16:21
It is for machine learning, but for big data management, it's not. That's where you need Java based or JVM based languages like Java and Scala. Kafka is very powerful. Kafka is also very hard even with a very experienced Java dev. There's a steep learning curve. Okay, so don't underestimate the learning curve on Kafka.
16:42
Okay, testing, we didn't want to crash cars. So we were lucky enough to have RMS in New South Wales allow us to stick a tag in a crash test, an ANCAP crash test, which gave some good insights. I will just quickly show you this as well. We've been doing some very rigorous and scientific testing.
17:01
And why aren't we playing? Ah, come back. That vehicle should actually be moving. But anyway, if anyone wants to see the video later on, you can, since we're running out of time. Okay, it makes a really good noise when it hits the deck. Okay, just quickly, some final learnings.
17:21
I was going to call this outtakes. These are things we just could not have predicted. Don't allow scooter drivers to use your tag. The wheelbase is so short and the wheels are so small, every little dent in the road, bang crash alert, bang crash alert, bang crash alert. Okay, a well positioned strike by a large insect on the windscreen where the tag is
17:42
will trigger a crash alert. We've had one confirmed and one possible. In both cases, they were 3 G events. So that's how hard a cicada hits your windscreen. Alright, last but not least, last slide. We had someone who kept giving us detected crashes
18:00
and we're like, okay, where have you put it? Because normally people put it in a bad spot. On the windscreen, he said, we're good. Two days later, lots more crashes. Where have you actually put it on the car? He went, oh, the rear windscreen. Okay. Yes, so every time he closed his boot, we're getting a crash alert.
18:21
Thank you. What an interesting talk. I'm sure there are lots of questions. Thank you. Really interesting. I'm wondering from your perspective of an insurance company,
18:41
how do you fine tune your precision and recall? As soon as you do something with a machine learning model, you will have to either over detect or under detect or strike the balance right. I guess in your case, if you don't detect and something happened, people will die or not get help. If you start over detecting, services will not be happy either or something.
19:04
So just some comments on that. Not in production yet. So we've deliberately asked our technology partner to lower the threshold. So we are getting like 2 G warnings. Just for perspective, a 15 k an hour rear-end crash is about 15 to 25 G.
19:21
Right. So it's a massive force. People don't realize, of course, the car absorbs most of it. A significant accident will be 30 G plus. That ANCAP test, a full frontal test, was 35 G at 50 k's an hour into a solid object across the entire width of the car. The lowered threshold has caused problems for us because we're getting tons and tons of false positives.
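The trade-off raised in the question, over-detecting versus under-detecting, can be made concrete by computing precision and recall for a given force threshold. The sketch below is an illustration only: it treats detection as a single peak-force cut-off, which is an assumption, and any event data passed to it would need to be labelled by hand.

```python
from typing import Iterable, Tuple

Event = Tuple[float, bool]  # (peak force in G, was it a real crash?)


def precision_recall(events: Iterable[Event],
                     threshold_g: float) -> Tuple[float, float]:
    """Precision and recall when every event at or above threshold_g alerts."""
    events = list(events)
    tp = sum(1 for g, real in events if g >= threshold_g and real)
    fp = sum(1 for g, real in events if g >= threshold_g and not real)
    fn = sum(1 for g, real in events if g < threshold_g and real)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall
```

As a usage example with invented numbers (not trial data), take three false alarms at 2.2, 2.8 and 3.0 G and three real crashes at 6, 18 and 35 G. A 2 G threshold catches every crash but half the alerts are false; a 10 G threshold removes the false alarms but misses the 6 G crash.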
19:43
What we're doing at our end of the system is applying slightly smarter filtering around that, so we don't get so many false positives. From a financial point of view, we need to find the right balance because, as you can imagine, we're not dealing with a standard call centre. We're dealing with an emergency call centre
20:01
and so their per-call costs are quite significant. So we need to make sure there's a certain level of validation without missing out on an obvious crash. Good. We have time for one more question. I like you, Hugh.
20:20
Thank you. But you work for a giant insurance company. Yep. What are you going to do with all this data? When do you delete it? Whoa. OK. So, look, I didn't get to explain the data products very well because I was running behind time, but when we look at driver behaviour and risk,
20:41
that's not a sellable item. We're not going to the market saying, here's the IAG driver risk, and there's Alex's profile, and he drives terribly so don't insure him. We're certainly not doing that. That's mostly for internal purposes, for us to improve our understanding of customers. I was going to say, I mean, we're a highly regulated company.
21:06
Unless we spin out Safer Journeys to a separate company, and even then if we're owned by IAG, we're a highly regulated company, which means there are $1.3 million fines for misuse of data per breach. And so we generally have a pretty conservative agenda
21:22
when it comes to using this data. The key things for us are around crash insights, risk insights, and behaviour and choice of transport, because at some point we might like to price people differently if they catch the bus five days a week. Right now that's not data we have access to
21:40
and we don't ask our customers. Cool. Thank you, Hugh. Forgive my skepticism.