
Using location to inform predictive analytics


Formal Metadata

Title: Using location to inform predictive analytics
Number of Parts: 14
License: CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Production Year: 2014
Production Place: Washington, DC

Content Metadata

Abstract
Predictive modeling is used throughout organizations to predict behavior and outcomes; organizations use those predictions to efficiently allocate resources. This talk will cite examples from social organizing and healthcare to show how geographical data can be used to enhance predictive analytics work and drive more efficient and effective programs.
Transcript: English (auto-generated)
I'm with Blue Labs. Blue Labs is an analytics and technology company that spun out primarily from the Obama 2012 campaign's data science team. And we apply data science techniques to improve social good in the areas of civic engagement, health care, and education primarily.
The examples today will mostly be from politics, since that's what we know best. But then we'll try to draw those threads into other areas as well. So basically, we're all here because ultimately, we'd like us, or our constituents, or our customers
to be able to make smarter decisions. And really, people are actually pretty good at making smart decisions already. So what we really want to do is scale the decisions that are too small for a person, and provide people with the data that they need to make
decisions that are appropriately sized for a person to make. So the example is that you can't run a political campaign with a human being looking at each voter's profile and saying, yes, I think this person is persuadable. No, this person lives with three Republicans in a Republican district, voted in every Republican primary.
This person is not persuadable. You need a machine to do that for you. So we looked at what decisions can we influence with geospatial data. And what we got to is, at the top, do we keep playing in Arizona? The answer is no. And then all the way down at the bottom, which doors
do we knock on? And everything in between. What I find interesting is that the top of that chart is all analyst-supported decisions: a human being sits at a SQL prompt or a data prompt and crunches a bunch of data
and says, the data says this or the data says that. Down at the bottom, it's self-service tool supported: either computers making the decisions, or non-technical users making decisions based on self-service data tools. So, shifting gears a little bit
specifically to predictive analytics. So identify a granular convenient unit of analysis. Within politics, that's a voter. In health care, it's a patient. Lots of places, it's a customer, a prospective customer, a student.
Find all the data that you can about people at that unit, and build a coherent view, usually of a person, though it can be of a building, a neighborhood, or a road. And then build models that predict behavior at that atom of analysis based on those things.
based on those things. So this is what it looked like for us, where you have general analytics databases, which is something like, in our case, Vertica, but Redshift, Teradata, Oracle, Postgres.
And PostGIS is our geospatial database, supporting conducting the right people, custom analysis, data exploration tools. So what does propensity modeling look like? You call up a bunch of people, you ask them what they think. You figure out what the characteristics are
that predict someone's support. And then you go ahead and apply that to your entire file of people. However, we can do a bit better than that. That tells you how likely someone is to support your candidate. What it doesn't tell you is how likely someone
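As a rough illustration, here is a minimal propensity-scoring sketch in Python; the feature and column names are hypothetical, not the campaign's actual variables:

```python
# A minimal propensity-model sketch (all column names are hypothetical).
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["age", "turnout_history", "party_score"]  # assumed numeric features

def score_file(survey: pd.DataFrame, voter_file: pd.DataFrame) -> pd.Series:
    """Fit support propensity on survey respondents, then score the whole file."""
    model = LogisticRegression(max_iter=1000)
    model.fit(survey[FEATURES], survey["supports_candidate"])  # 0/1 survey answer
    # Apply the fitted model to every person in the voter file.
    return pd.Series(model.predict_proba(voter_file[FEATURES])[:, 1],
                     index=voter_file.index, name="support_score")
```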
However, we can do a bit better than that. A propensity score tells you how likely someone is to support your candidate; what it doesn't tell you is how likely someone is to be persuaded by a message. To get at that, we do persuasion modeling: we take a set of constituents, split them in half, and call up both halves. One half gets the message, our candidate is great for the economy; the other half just gets a support question, do you support this person or that person, without any message for our candidate. The purpose of that second call is to make sure we're not just modeling on people who are reachable by phone: we have to talk to a real live person on the other end to count someone as either a treatment or a control. Then, a while later, we call everybody up again and ask what they think. Hi, I'm calling with ABC Research; if the election were held today, would you vote for this person or that person? And then we predict who moves based on the message they received versus who moves with no message at all. What's really interesting (and this is true in retail, it's true in education, it's true across the board) is that for some folks, reaching out actually has a negative persuasion effect.
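A minimal two-model uplift sketch of that idea; the talk doesn't specify the estimator, so gradient boosting and all column names here are assumptions:

```python
# Two-model persuasion (uplift) sketch: score = treated response minus
# placebo response. Estimator choice and column names are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def persuasion_scores(df: pd.DataFrame, features: list) -> pd.Series:
    treated = df[df["got_message"] == 1]   # heard the persuasion message
    control = df[df["got_message"] == 0]   # placebo support question only
    m_t = GradientBoostingClassifier().fit(treated[features], treated["supports_after"])
    m_c = GradientBoostingClassifier().fit(control[features], control["supports_after"])
    # Positive = persuadable; negative = the "sleeping dogs" described below.
    return pd.Series(m_t.predict_proba(df[features])[:, 1]
                     - m_c.predict_proba(df[features])[:, 1],
                     index=df.index, name="persuasion_score")
```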
We tend to call those folks sleeping dogs: people you're better off just leaving alone. So, again, we're at a geospatial conference, and what's interesting is that a whole bunch of those models are geospatial.
One example is drive time to a polling location. A person is more likely to vote in an election if they don't have to drive very far to get to the polls. In one area we did that with a straight radius, just the number of miles to a polling location, and then spent some time looking at how that compares to drive time. What you'll see is that not all of these look like circles: this one is very much not a circle, and even this one clusters up along freeways. This was built using OpenStreetMap data and graphs in a really naive way: we assume people drive 60 miles an hour on a freeway, 15 miles an hour on one type of road, and 20 miles an hour on another. I think the routing algorithm we used would even allow someone to drive up an off-ramp. So it's not an absolutely perfect model; it's a naive, spend-a-few-days effort at building routing polygons from the data we had, validated by eyeballing maps: yeah, we think this one takes longer to get to than that one.
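A toy version of that naive routing, assuming an edge list already extracted from OpenStreetMap; the undirected graph even reproduces the drive-up-the-off-ramp flaw just mentioned:

```python
# Naive drive-time sketch: constant speed per road class on an OSM-derived
# graph. Using an undirected graph even reproduces the "drive up the
# off-ramp" flaw mentioned above.
import networkx as nx

MPH = {"motorway": 60, "residential": 15, "secondary": 20}  # speeds from the talk

def drive_time_minutes(edges, source, target):
    """edges: iterable of (u, v, length_miles, road_class) from OpenStreetMap."""
    G = nx.Graph()
    for u, v, miles, road_class in edges:
        G.add_edge(u, v, minutes=60.0 * miles / MPH.get(road_class, 20))
    return nx.shortest_path_length(G, source, target, weight="minutes")
```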
So in the end, what you end up with is a list of people with a model score. Contact these people, not these people. So we're done, we'll call all those folks, end of the campaign. So not every strategy is at an individual level.
Some things happen at a group level: TV ads, locations of offices, that sort of thing. So once we have the density of our supporters, we go back and ask, where should we place offices so they're near the most supporters? This was a fairly simple greedy algorithm: take the location that can reach the most people and put an office there, then take the location that reaches the next most people and put one there, and so on and so forth. I'm sure there are retail analytics folks who could walk through a bunch of fancier systems, but really the goal here was: how fast can we support as many decisions as possible with a reasonably small staff and a modest budget? (A greedy sketch follows.)
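A sketch of that greedy placement, with hypothetical inputs and naive Euclidean coverage:

```python
# Greedy office placement: repeatedly open the office that covers the most
# not-yet-covered supporters. Inputs and the coverage radius are hypothetical.
def place_offices(candidates, supporters, reach, n_offices):
    """candidates, supporters: lists of (x, y) in projected coordinates."""
    def within(c, s):
        return (s[0] - c[0]) ** 2 + (s[1] - c[1]) ** 2 <= reach ** 2
    covered, offices = set(), []
    for _ in range(n_offices):
        # Pick the candidate reaching the most still-uncovered supporters.
        best = max(candidates, key=lambda c: sum(
            1 for i, s in enumerate(supporters) if i not in covered and within(c, s)))
        offices.append(best)
        covered |= {i for i, s in enumerate(supporters) if within(best, s)}
    return offices
```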
Going back to the persuasion model: here is the list of all the people we modeled as being persuadable near Richmond. It turns out that a whole bunch of those folks weren't actually worthwhile to reach; just walking down a really long driveway isn't worth your time. The way we did this was an iterative algorithm: within each volunteer's area, we took a few of the densest spots, grabbed everybody within a certain distance of each one, and kept walking that outward. And somewhere around here it becomes apparent (it's hard to see in this specific area) that these outliers are actually huge apartment buildings, not errors: if 200 people live in one place, it's worthwhile to just drive out to that building. (A sketch of that turf cut follows.)
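One way that iterative densest-area-first turf cut could look; this is a guess at the shape of the algorithm, not the campaign's actual code:

```python
# A guess at the iterative "densest area first" turf cut: grab everyone
# within a radius of the densest remaining spot, then repeat further out.
import numpy as np

def cut_turfs(points: np.ndarray, radius: float, n_turfs: int):
    """points: (N, 2) projected coordinates of persuadable voters."""
    remaining = np.arange(len(points))
    turfs = []
    for _ in range(n_turfs):
        if remaining.size == 0:
            break
        pts = points[remaining]
        # Pairwise squared distances among the people not yet assigned.
        d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
        center = (d2 <= radius ** 2).sum(axis=1).argmax()  # densest spot
        members = d2[center] <= radius ** 2                # everyone within reach
        turfs.append(remaining[members])
        remaining = remaining[~members]
    return turfs
```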
Finally, TV advertising. Using set-top box data and location data, find the programs that the most persuadable people are watching. What you'll notice is that there's almost no primetime here. The naive approach says people who are interested in politics must watch the news, so we should buy primetime news. It turns out everybody advertises on primetime news; you can get real efficiency by advertising in places where the aggregate buying power is smaller but the aggregate persuasion effect is still large. (A sketch of the ranking follows.)
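A sketch of that program ranking, assuming set-top box viewing records already joined to persuasion scores; the spot_cost column and all names are hypothetical:

```python
# Rank programs by aggregate persuadability per dollar; assumes viewing
# records already joined to persuasion scores. spot_cost is hypothetical.
import pandas as pd

def rank_programs(viewing: pd.DataFrame) -> pd.DataFrame:
    """viewing: one row per (viewer, program) with persuasion_score, spot_cost."""
    by_program = viewing.groupby("program").agg(
        total_persuasion=("persuasion_score", "sum"),
        spot_cost=("spot_cost", "first"))
    # Persuadable eyeballs per dollar, not raw ratings: primetime news loses.
    by_program["persuasion_per_dollar"] = (
        by_program["total_persuasion"] / by_program["spot_cost"])
    return by_program.sort_values("persuasion_per_dollar", ascending=False)
```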
The next part is that most of the slides I showed were screenshots of an application that basically everybody who worked for the campaign had access to. This is from the Virginia Terry McAuliffe race; it's white-labeled as a Democratic tool, but it's the same tool that's available in a few different situations.
So sort of just walking through a little bit, let's see if we can actually just get the browser up.
So basically, what the tool lets you do (while the projector comes back up over the screen) is this: we wanted to keep it as simple as possible, so we let folks specify a choropleth layer, which we called a shading layer, and a dot layer, which we called a point layer. It might not have liked switching screens. From here, folks could select what dots they wanted to see and correlate dots with demographic variables; in this case, we're looking at density of GOTV targets against where our field offices were. Really, the approach was: how simple can we make a GIS tool and still allow folks to find their own correlations that we hadn't thought of?
We also added a tool so anybody who knew SQL could push layers into this; they didn't need to be a GIS guru. If they wrote a query that returned a latitude and longitude, it showed up as a dot layer. If they wrote a query that returned a number from zero to seven and some sort of geographical identifier, usually a FIPS code, it would detect what type of FIPS it was and spit it out as a choropleth layer. (A sketch of that detection follows.)
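A sketch of that column-sniffing step; the heuristics here are assumptions about how such detection might work, not the tool's actual rules:

```python
# Sketch of sniffing a query's columns to pick a layer type: lat/lon means
# a point layer; a FIPS-like code means a choropleth. Heuristics assumed.
def detect_layer_type(columns):
    cols = {c.lower() for c in columns}
    if {"latitude", "longitude"} <= cols or {"lat", "lon"} <= cols:
        return "point"
    if any("fips" in c for c in cols):
        # FIPS length tells you the geography: 2 = state, 5 = county, 11 = tract.
        return "choropleth"
    raise ValueError("query must return lat/lon or a FIPS identifier")
```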
What we're looking at now is where our volunteers had contacted voters. Green means they reached someone, yellow means nobody was home, and red means they didn't contact the person. This updated twice a day, so folks could say, hey, there's a big red blob here, what's going on? We thought we canvassed this. Either somebody lost a walk packet, or this is a rough neighborhood and we need to send our best volunteers. So finally: there weren't any BI tools or advanced analytics platforms or really much of any sort of large-scale enterprise software.
It was lots of software bolted together to do specific tasks. So we had a SQL training once a week. Most of our interns knew SQL. They could log into the database and answer questions using SQL. A good chunk of the staff knew how to connect to our PostGIS server using QGIS.
They could log in and make their own custom maps. So the role of software engineering becomes the role of high-level systems engineering: it's much more about coordination. Get data synced into a database, document that data, build open data portals, provide APIs, provide a pre-installed stack (we used GeoServer pretty extensively, with layers for lots of information people could want), and build tools for pushing data between systems. I found a group of volunteers that I really want to contact: how do I turn that SQL query into a list in our CRM so those volunteers get contacted? (A plumbing sketch follows.) And tools for pushing data out to field staff: Explorer, the data exploration tool I showed, is based on the idea of, I have a SQL query that shows geospatial data, and I want all my staff to be able to access it.
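A sketch of that SQL-to-CRM plumbing; the connection string and CRM endpoint are placeholders, not real services:

```python
# Hypothetical plumbing: run a PostGIS query, push the result into a CRM
# list. The connection string and CRM endpoint are placeholders.
import psycopg2
import requests

def query_to_crm_list(sql: str, list_name: str):
    with psycopg2.connect("dbname=campaign") as conn, conn.cursor() as cur:
        cur.execute(sql)                       # e.g. volunteers near an office
        person_ids = [row[0] for row in cur.fetchall()]
    resp = requests.post("https://crm.example.org/lists",  # placeholder endpoint
                         json={"name": list_name, "person_ids": person_ids})
    resp.raise_for_status()
```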
What all those tools allowed: in 2012 (it doesn't look like it up here, but) there were 150 people on the analytics team; in 2013, there were eight of us. Getting there means folks need to spend their time analyzing data. We tend to think that means you need a BI tool. What it actually means is that you need all your data up to date and accessible via SQL, you need a way of spitting out a SAS or Stata or R file or whatever your statistical analysis tool needs, and once you build a model, you need somewhere you can push that model right away.
So I think I'm pretty darn close to out of time. Very briefly, we're applying all of that to some other industries, primarily healthcare and education, building the same types of agile data infrastructure to support them. Cool.