One Engineer, an API, and an MVP: Or how I spent one hour improving hiring data at my company.
Formal Metadata

Title: One Engineer, an API, and an MVP: Or how I spent one hour improving hiring data at my company
Series: DjangoCon US 2018, part 14 of 50
License: CC Attribution - ShareAlike 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose, as long as the work is attributed to the author in the manner specified by the author or licensor, and the work or content, also in adapted form, is shared only under the conditions of this license.
Identifier: 10.5446/44072 (DOI)
Transcript: English (auto-generated)
00:00
Nicole Zuckerman, that's me on the left.
00:23
Best wedding picture ever, right? I'm a senior software engineer at Clover Health. Yes, we are absolutely hiring, let's talk later. I've been interested in issues of representation and diversity for around 16 years now. And I love that my technical skills allow me to affect how we go about doing things at Clover Health and improve diversity and inclusion.
00:42
It's like infinite cosmic power. So I'm glad to be here talking about it with y'all today. And if you want, you can message me on Twitter. That's my Twitter handle. I never post, though I do check it.
01:01
Anyway, we know diversity is valuable. What stops us from collecting data to help us improve? The fact that tech has struggled to hire and retain employees from diverse backgrounds has been written about and discussed extensively, particularly in the last few years. And the benefits are also really well documented. So why is it hard for well-intentioned organizations to shift their demographics?
01:22
There are a number of reasons, but one I don't think has been discussed as thoroughly is the challenge of actually gathering and responding to data about diversity within a company's hiring pool and existing employees. If you wanna talk about getting that information from employees, let's talk later. For today, I'm just gonna focus on getting that data about diversity in your candidate pipeline.
01:43
I know words. I have a master's in English literature, of course I know how to speak. So one hack day, I was involved in too many projects and had only like a token amount of time to devote to the one that I was most interested in, which is seeing if we could determine whether we had sufficient diversity for any given role to start interviewing candidates,
02:01
or if we needed to spend more time sourcing diverse candidates for the pool. I accomplished an MVP in approximately an hour once I had the API key and permissions. Honestly, getting the right permissions from HR was like the hardest part, sorry, from recruiting. That took the longest amount of time. So if there's anything you take away from today,
02:22
aside from the picture of my dog, which I'll show you at the end, I want it to be that it doesn't actually take a huge effort to make a big difference, and everyone should give it a go. All right, so here's what you're in for. I'm gonna walk through the MVP that I did during hack day. I'll talk a bit about the V2.
02:43
I'll talk about some gotchas or limitations that I uncovered, and then we'll talk about what happens now. You in? All right, let's do it. All right, so this is the web app I put together for hack day. It was around a year ago. This is the MVP. The front end literally has no styles,
03:02
is as bare bones as can be, but it works. I am not a front end developer by any stretch, and you'll be able to see that in a couple slides. All right, so it's a simple Django web app with one view. There's a bunch of real-time API calls to Greenhouse to fetch data.
03:20
Greenhouse just happens to be the recruiting funnel tool that we use. There are other ones out there that also have APIs, but this just happens to be the one we were using. And this is pretty hacky code. In production, this would never fly with a nested for loop, and the performance of the code is terrible,
03:40
but as a proof of concept, it was really effective to give people a chance to see exactly how homogenous or diverse our candidate pool really was for any given role. So this is the one Django view. At the high level, it gets a list of jobs. It gets all the applications for each job, and then for each application,
04:01
it gets the EEOC data associated with each application, whether you say that you're a man or a woman, what your race is, whether you're a veteran, whether you have a disability. It does a little number smooshing at the end. It renders a list of jobs with data about how diverse the candidate pool is for each.
04:21
So this is the thing at a high level. Here are the API calls I make. They're not fancy. I hard-coded the URLs, because hack day. I hit Greenhouse's API and load the data for each of these three endpoints. There's one where I get the list of jobs that are open right now and were created
04:41
after a certain point, using the created_after parameter, because when I load this in a web browser, it literally takes 10 minutes. Hack day. And then for each of those jobs, I get the applications, and then there should be one or zero EEOC chunks
05:01
for each application. And so here's where I add an applicant status to the blob for this job. I anonymize and tally number of applicants, self-attesting as belonging to a protected group, and then just display aggregates for each role on one page.
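The loop she describes — jobs, then applications, then the EEOC chunk per application, smooshed into per-role counts — can be sketched roughly like this. The endpoint paths and payload field names here are my best guess at Greenhouse's Harvest API, not the talk's actual code; treat them as assumptions:

```python
import base64
import json
import urllib.parse
import urllib.request
from collections import Counter

BASE = "https://harvest.greenhouse.io/v1"  # assumed Harvest API root

def fetch_json(path, api_key, **params):
    """One hack-day-grade synchronous GET. Harvest uses Basic auth with
    the API key as the username and an empty password."""
    url = f"{BASE}/{path}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def tally_eeoc(eeoc_blobs):
    """Turn raw EEOC answers into anonymous counts: no names, no IDs,
    just (question, answer) -> how many applicants. Field names follow
    an assumed payload shape like {"gender": {"description": ...}}."""
    counts = Counter()
    for blob in eeoc_blobs:
        if not blob:
            continue  # zero or one EEOC chunk per application
        for field in ("gender", "race", "veteran_status", "disability_status"):
            answer = (blob.get(field) or {}).get("description")
            if answer:
                counts[(field, answer)] += 1
    return counts

def pool_report(api_key, created_after):
    """The MVP's nested loop: jobs -> applications -> EEOC, aggregated
    per job. Synchronous and slow on purpose -- this is the ten-minute
    page load the talk describes."""
    report = {}
    for job in fetch_json("jobs", api_key, status="open",
                          created_after=created_after):
        blobs = [fetch_json(f"applications/{app['id']}/eeoc", api_key)
                 for app in fetch_json("applications", api_key,
                                       job_id=job["id"])]
        report[job["name"]] = tally_eeoc(blobs)
    return report
```

A Django view would then just render `pool_report(...)` into a template. The part that matters is `tally_eeoc`: only aggregates ever leave it, so nothing personally identifiable reaches the page.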
05:23
So this will say there are blank women in the pool for this role, or there are blank veterans. Front end developer, I am not. So this is what the thing looks like for any role.
05:40
It says whether there is sufficient diversity. We'll talk about that later, to move forward with interviewing candidates for the role. So now that we've gone through the hack day version, what happens then? There were a lot of things that probably needed to change
06:00
to make this ready for production. The one that was killing me was those nested for loops with real-time API calls that took 10 minutes. So this is me hitting the view once, and then the output goes on for days, and that's just like hitting it one time.
06:23
And you also don't want to have people waiting 10 minutes to see their results. So because of the way that Greenhouse's API works, I couldn't batch it together any more than I already did. But this is a ton of API calls. It'd be much better to collect the data regularly,
06:42
store it in a database where we can make fewer queries by using table relationships. So our first order of business was to sync the new data from Greenhouse to our database, and have no synchronous API calls to a third party, and then our page can load in a snap. Now, the next hurdle is how do we know what data is new?
07:00
How do we upsert instead of getting everything fresh each load? But that's a problem for future Nicole. (Those are not mine, I wish.) Another thing we decided to do for V2 was to only show diversity stats for candidates who have exited the pipeline already: people who have either accepted an offer
07:21
or left the pipeline because they withdrew, or we decided not to move forward with their candidacy. Otherwise, there's potential for a hiring manager to look at this page, see the candidates we're evaluating and go, oh, we need more veterans to increase our EEOC data for this year.
07:40
We should totally hire this woman. And that's illegal: you can't make hiring decisions based on someone's protected class, like gender, veteran status, disability status, or race. We also added a list view of all roles
08:02
where you can click through for details for a particular role or department. The other things that we wanted to accomplish with V2 were showing diversity at every step of the pipeline from recruiter screen through offer acceptance and being able to slice data to see the story for a particular group or from a particular source
08:22
or from a particular recruiter. So you can say like, oh, we get really great candidates from this source, but this source not so much, or we're really, we're super represented with people of color, but we're not really great at reaching out to people with disabilities.
08:44
All right, so here were the things that totally tripped me up. Fortunately, I figured stuff out, or had people help me figure stuff out, before any sensitive data went anywhere. So one, be careful with people's sensitive information.
09:03
I suggest you host the data where not everyone can access it, like this poor puppy. But you know, it's probably made of chocolate and he shouldn't have it. So one thing that we did was we anonymized our data. When pulling information about candidates using an API, for example, we didn't store any information
09:21
that's personally identifiable. No names, no nothing. I also recommend that your database not be in a place where all of engineering can access it, because these are like potentially their coworkers or themselves, and it's not really cool to surface people's personal data for other people to access.
09:42
So I would also say put your API keys in a private place that not even all of engineering can access. Otherwise, if you have your database stored in a place that is private, but everyone can access the API key, they can get the data for themselves. And then where are you? And I would also argue for different levels of permission.
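On the API-key point: keeping the key out of the codebase and out of engineering-wide reach can be as simple as reading it from the environment, or better, a secrets manager with its own access controls. `GREENHOUSE_API_KEY` is an assumed variable name in this sketch:

```python
import os

def greenhouse_credentials():
    """Fetch the Harvest API key from the environment rather than from
    source control or a shared config, so having database access does
    not also imply having the key."""
    key = os.environ.get("GREENHOUSE_API_KEY")
    if not key:
        raise RuntimeError(
            "GREENHOUSE_API_KEY is not set; ask whoever owns recruiting data."
        )
    return (key, "")  # Harvest Basic auth: key as username, empty password
```

Deployed for real, you would point this at a secrets manager and grant read access only to the service, not to every engineer.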
10:02
Be sensitive about who can see what. Maybe everyone should get the 10,000-foot view at an all-hands; that's the cookie version. But being able to see per role, slicing on different tags, should maybe be limited access. And maybe only, like, your diversity and inclusion manager, if you have one, gets to see all the granular stuff.
10:21
That's the ingredients version of that metaphor, let's be real. So another decision that we ended up making was to show results only for groups of minimum pool size or greater, even though that means that we get data on a slower cadence than real time,
10:41
in order to give the pool size a chance to get large enough for real anonymity. Be aware of combinations. Veteran, disability, gender, and race combinations are identifiable even with anonymity. So for example, if I'm like, hey, who were the musician Muppets? There's like five of them. And if I say, who are the blue Muppets?
11:02
Yeah, there's like five of them. But if I say, which are the blue musicians? There's only one. He's very easily identifiable and you don't want to have that.
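One guard that covers both problems, in sketch form: tally on the combined answer (the whole tuple, not each attribute separately) and suppress any aggregate under a minimum pool size. The threshold below is an arbitrary example, not a recommendation:

```python
MINIMUM_POOL_SIZE = 5  # assumed threshold; pick one defensible for your org

def suppress_small_groups(counts, k=MINIMUM_POOL_SIZE):
    """Drop aggregates below k so a 'blue musician' can't be singled
    out. `counts` maps a group key -- which can be a combination tuple
    like ("veteran", "woman") -- to how many people fall in it."""
    return {group: n for group, n in counts.items() if n >= k}
```

Counting the combinations first and then suppressing is the important order: suppressing each attribute separately would still let the lone blue musician show through.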
11:21
So let me just stress for you one more time the importance of anonymizing data, of keeping your minimum pool size, and of considering not including candidates who are currently in the pipeline. Make sure that no one can make decisions
11:43
about a candidacy based on the information that is in your web app. Like I said before, that's super illegal. Right, so I was telling you not to do anything illegal, right? Yeah, that's totally it.
12:01
And making any decisions about a candidate, whether you move forward or not, based on their protected class is totally illegal. Do not do that. And so the best way that I found to not do that was to not give people the opportunity to do that, and to not show information for people who are currently viable candidates. Cool?
12:20
Puppy in a pipeline. Okay, so another challenge that we have is data sparsity. Not everyone fills out information. It's not available necessarily for referrals or candidates who are sourced internally. And even on the website, not everyone chooses to fill out this information. Even though in my head I was like,
12:42
yeah, I'm gonna get this data and it's gonna be glorious. We're gonna have so much information to make decisions on, and then you get to reality and one person has filled it out. So understanding why the data is sparse will help you figure out how to mitigate it. And understanding why the data is sparse has a lot to do with humans. Solutions regarding humans are usually specific
13:02
to a group and what they have in common. So what works for employees is not the same as what works to get more data from candidates, for example. So sometimes there's no opportunity to collect data from non-system inputs, for example, sourced candidates. So what you can do about it is instrument human systems,
13:22
like have your recruiting folks make sure that they ask for this information with all candidates. Also, folks are often suspicious about what you're gonna do with their information and are worried that it's gonna be used against them in some way. And since you don't have a rapport built with them
13:41
very much by the time they're applying on the website, there's not necessarily a ton you can do about that, except maybe put a line saying, we respect your privacy and your data. This information is gonna be used for these purposes, et cetera, just to give them a little confidence that you are not a jerk, and you're not.
14:04
So then another problem is that the data that we need to collect by law doesn't necessarily match up with what we want to collect or what people might be willing to divulge. We're also, we're just limited by what is legal to ask. You can't ask anything about someone's age. And you also shouldn't ask anything that's gonna lead to you figuring out someone's age,
14:22
because intent matters. And you also need to decide what labels and groups to include, like for gender questions. What are your options? The federal government collects only male or female, or I think men and women or something like that, but I know that there's non-binary people out there.
14:41
How can I reconcile the legal requirement with my understanding of the world and wanting to be inclusive? Making a survey is an art unto itself, and it's possible to ask questions and elicit a response that you did not intend. So if you have researchers who work at your company,
15:03
maybe ask them for their help. Federal contractors have an EEOC requirement, that's the Equal Employment Opportunity Commission, that requires certain questions and potential answers that certainly don't line up with what I would like to ask, and so that could lead you to have multiple surveys,
15:21
which causes survey fatigue, and again, suspicion. All right, so now let's get to this elephant in the room. What is sufficient diversity anyway? There was this article in the Harvard Business Review about the relationship between
15:40
the makeup of your finalist pool for a role and the hiring decision. So if you have zero diverse candidates in a pool, you're obviously not gonna hire anyone from an underrepresented group, but according to the handful of studies that were in this Harvard Business Review article, even with one candidate from a minority group, that person has like zero chance of getting hired
16:01
because of our preference for the status quo. As the article says, deviating from the norm can be risky for decision makers, as people tend to ostracize those who are different from the group. For women and minorities, having your differences made salient can also lead to inferences of incompetence. Basically, being the only woman in a pool of finalists
16:23
highlights how different you are from the norm and then makes you riskier. So this also works for racial minorities. The odds of hiring a minority were 193 times greater if there were at least two minority candidates in the finalist pool, controlling for the number
16:41
of other minorities and white finalists. And the effect held no matter what the pool size was, like six finalists, eight finalists. So even having two people from an underrepresented group makes a big difference. Pretty low bar, right? But what they saw was for a limited pool of finalists.
17:02
So if you've got an applicant pool for a role like software engineer, you're gonna have way more candidates to choose from. So even if you have two people from a minority group represented, if there are a thousand people from the in group, even putting aside unconscious bias, which totally exists and will whittle the diversity of your pool at every stage of the pipeline, and our status quo fix of having two candidates,
17:22
you're statistically still very unlikely to hire someone who's not from the status quo. All right, so what do we do now? Why not try going for parity with the eligible labor force in the US? Then at least, if the top of your funnel is balanced, you stand a greater chance of having those people make it through your hiring process. Not that it's smooth sailing from there,
17:41
we still need to institute systems to limit the effect of unconscious bias in the rest of the hiring pipeline, but it's a start. So this is a table of gender and race in the potential labor force from 2017. The numbers are in thousands, so the total number of black women who were in the workforce was 10.5 million.
18:01
Out of 17.5 million black women who could have been in the labor force, approximately. I round. Not all those people are qualified for every job or even interested, but if we support programs that give training and opportunities to underrepresented groups, while we also try to get as close as possible to having our pipelines represent that population pool,
18:22
we might actually get there, but if we don't set our sights that high, we're never going to. All right, so what now? Well, it turns out, we found out when we were midway through our V2 that there's a company that does a lot of the hiring pool stuff out of the box,
18:42
so we ultimately decided to use them for the future, so engineering hours don't need to support the web app indefinitely. Could we have done our research to discover this company first and saved ourselves the work? Absolutely. Would anyone have thought to Google it before I made my MVP? Unlikely. We found out about the company through our diversity and inclusion manager
19:00
who joined right after I made my MVP, of course. If you have D&I people, talk to them early. If you don't have the budget to outsource the way we ultimately did, the web app we discussed is perfectly usable. It probably requires some further iteration to handle the sparsity of data, and you need to figure out what is sufficiently diverse for you,
19:22
but even if your time is limited, the MVP is better than nothing at all. If you go this route, make sure to include any D&I people at your company, and recruiting, and legal. If you're willing to do the work, most of the time people are happy to give you their opinion and send you on your way, but don't try to do this without knowing what's legal.
19:45
Someday, it'd be really great to integrate data from candidates with data from current employees, but that's pretty challenging because those are almost never in the same systems, so maybe next hack day. Now when you identify different departments
20:00
or roles where there isn't representation across different groups, you can go back and source those missing groups. If there's a big drop off at one step of the recruiting funnel, investigate why people are leaving there. So this is fake data, but this is a format of a report I look at quarterly. For one role, software engineer, you see the counts and pass-through rate for men versus women for each stage of a hiring pipeline.
20:22
At the beginning, you start with 100 men and 50 women in the pipeline, which is not representative of the population, but not awful, but in the application or resume review stage, 50% of the men make it through, but only 20% of the women. Looks like you need to see what's going on in that resume review stage. At the phone screen stage, you still have more men than women,
20:41
50 and 10 respectively, but they make it through at the same rate, 30%, that seems okay. Then for the onsite, you have 80% of men making it through the stage, but only 33% of women. Uh oh, maybe take a look at your onsite and see what's happening there. And because I work at a really great place, everyone accepts their offer, but wow, we lost a chance to hire women at the resume review and onsite stages.
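The report she walks through boils down to dividing each stage's counts by the previous stage's, per group. A sketch over her fake numbers, with made-up data shapes:

```python
def pass_through_rates(stage_counts):
    """stage_counts: ordered list of (stage name, {group: count}), where
    each count is how many people from that group survived to that
    stage. Returns, per stage after the first, the fraction of each
    group's previous-stage pool that made it through -- the number to
    compare across groups to find where a funnel leaks."""
    rates = []
    prev = None
    for stage, counts in stage_counts:
        if prev is not None:
            rates.append((stage, {g: counts[g] / prev[g]
                                  for g in counts if prev.get(g)}))
        prev = counts
    return rates

# The talk's fake data: counts remaining after each stage.
pipeline = [
    ("applied",       {"men": 100, "women": 50}),
    ("resume review", {"men": 50,  "women": 10}),
    ("phone screen",  {"men": 15,  "women": 3}),
    ("onsite",        {"men": 12,  "women": 1}),
]
```

With real data you'd build `pipeline` from the synced Greenhouse stages rather than hard-coding it; the comparison logic stays the same.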
21:03
So this is hard. Do it anyway. Since the problem is one of people and systems, it's inherently challenging, especially if you're a person who's at the apex of privilege in our society. Do what you can. From my little perfectionist heart to yours, admit you won't have it perfect. Try to build processes around data collection
21:21
wherever you can. Don't guess about someone's gender or race unless you're required to by law. Or, if you're going to, which I'd prefer you didn't, outsource it to a company that does it professionally, like with machine learning. They're honestly probably slightly less biased than humans. And if you are at the apex of privilege,
21:42
cishet white men, you should ask people from underrepresented groups to be involved in the process. Listen to them. They know how things come across, but don't make them do the lion's share of the work. My recommendation is to solicit their opinion early and often. Do some work. Show them what you did and ask for feedback, but also deal with it with grace
22:00
if they don't want to be involved because they've been carrying this burden the whole time up till now when you're coming in the last mile like, come on. So those are just some examples I happen to have. There's a ton of other things you can do to instrument your hiring pipeline or measure changes in inclusion at your company. Share out what worked, what didn't work for you so we as a community can learn from each other and develop best practices.
22:22
There's no reason to keep best practices about diversity and inclusion secret. I want the special sauce all over tech. So that's me and my Twitter handle. That's my dog, Chloe. If you follow her on Instagram, at corgi in the front, I promise you will not be disappointed. Her front legs are a little shorter than her back legs.
22:41
And I work at Clover Health. It's really great. Come work with me. Chloe says you should work with me. Thanks.