Managing Identities: LDAP, Google Directory, and Django
Formal Metadata
Part Number | 42
Number of Parts | 46
License | CC Attribution 3.0 Unported: You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Identifiers | 10.5446/32781 (DOI)
DjangoCon US 2015, 42 / 46
Transcript: English (auto-generated)
00:03
Great to see you guys. It's been fun at the coffees and lunches and dinners discovering
00:22
how many of you work with Django in academia. It's, you know, it's something that I sort of came into by default in academia but in other roles have had to champion and push for. So, seeing how many of you have had success getting Django into the enterprise is pretty exciting. So, I work at the California College of the Arts, currently working
00:42
on a big portal system but before I could build that, I had to build a big identity management system and I'll get to what that means exactly in a minute. The slides are at this URL if you want to track them down later. So, California, oh yeah, so quick note first. So, when I
01:00
gave this talk at SF Python Meetup, one of the pieces of feedback I got was that there was too much code walkthrough and so, you know, I was going to eliminate a lot of code and then other people said, well, yeah, but when you find the slides on Google six months from now, you want the code in there, good point. So, what I've done is moved most of the code samples into the end of the slideshow so they're there for future Google searches and I'll just sort of touch on those briefly here. So, California
01:25
College of the Arts, two campuses, Oakland and San Francisco, relatively small campus, about 2200 students, hundreds of faculty and staff, and, as a lot of you are probably familiar with, IT systems spread across decades, and dozens and dozens of systems that all need
01:45
to talk to each other, some of them modern, some of them legacy, some of them with APIs, some with no APIs, but somehow you've got to glue it all together. And, you know, like most campuses, we have lots of external and internally hosted web systems, everything from Moodle learning management system to VoiceThread for collaborating on art projects
02:05
in real time, Symplicity for selecting housing, PaperCut for the art students to print their work, MediaCore for sharing large media, Raiser's Edge, Vault, WebAdvisor for course selection. And one way or another, everybody needs to get into these systems, needs to
02:22
be able to find them and needs to have a central and unified identity. So, you know, we have to get this right. There's a lot of mission-critical weight leaning on this project. So, you know, it all comes down to LDAP in the end. We had a traditional
02:42
LDAP server which we recently migrated to Fedora 389. We're also a Google Apps school, so everybody has, you know, Google Mail and Docs and Calendar. And then there's the SIS, the Student Information System, which is currently Colleague from Datatel, which is a very old and cranky legacy system which I found a really elegant way to talk
03:02
to. And then recently the introduction of Workday, which is a human resources system and will later replace our student information system. So, when people log in to all of these external systems, you never want to go through the process of making them register or sign up for an account. Instead, we hook up to something called CAS, Central
03:22
Authentication Service, common on campuses. CAS in turn talks to LDAP. So, you know, as soon as you try to log in to Moodle or a WordPress site or whatever, you're taken straight off to CAS, CAS checks LDAP, LDAP says you are who you say you are, sends a token back. So, you know, these systems are set up like, whoever LDAP says is okay,
03:42
go ahead and create an internal account for them here on WordPress or Moodle or whatever it is. So, the system that we needed to build needed to, by the way, is that familiar to anybody? You know, Daniel Johnston, just around the block is this mural on the corner of a restaurant called Thai, How Are You. So, this system we needed to build would
04:04
do things like activate new student accounts. So, you've been accepted by the campus, you're given an ID, now you need to create this account that's going to follow you throughout your campus experience. We also have, you know, newly hired faculty and staff are coming in through Workday, they need to do the same thing. Staff, people need
04:22
to change their own passwords, staffers need to be able to change passwords for people, contractors need accounts, we need to set LDAP entitlements so that so-and-so can use the big fancy printer. We have to set Google organizational units because we're using the Google admin API as well. The super users need to be able to edit raw LDAP
04:42
fields, enabling and disabling accounts, email aliases, delegated accounts in Google, LDAP groups, all kinds of crazy stuff, and it all had to be done through this one central place. So, the experience of this for a student is that they can change the password or they can activate an account, and the experience of it for a logged in user is they can
05:01
change a known password but the experience for help desk is a whole bunch of powerful tools and utilities and for super users even more of them. Now, because when you authenticate through CAS, it's going to create a shadow account in that system, that's kind of the CAS standard, you need to be mindful that whatever usernames that are
05:23
going to be created in LDAP need to conform to the lowest common denominator of systems. So, while Google may allow a 48 character username, the old Datatel system is not going to allow a username that long, so you need to shrink it down, and the same with diacritical characters and foreign characters. So, you need to sort of survey all of your campus
05:43
systems and say what are the lowest common denominators, because that's what we're going to allow into LDAP. Passwords are not an issue because those are all handled in LDAP, you're still logging in through CAS and we're not going to store a usable password in that system anyway. So, just a quick workflow of the process of activation,
06:05
so this is the Django-based system over here. So they're hired in Workday, when they come in they're going to verify their account against Workday, it says you're verified, and that step will, well, they select a username and password, create an LDAP account, create a Google account, and then meanwhile we need to get their
06:22
newly chosen email back into Workday or back into Colleague depending on the type of user that it is. So, a lot of steps there we need to keep track of and then we have various permission levels, so you know that's a little bit of what the system ends up looking like. So, yes, so the activation paths for students versus staff and faculty, they
06:47
start differently, so students are validated against the student information system, the legacy system, staff and faculty against Workday, but they both do the same username and password selection, so two different forms funneling into one shared form that ends up doing all of this stuff. So, username, so previously we went with the old first initial
07:08
last name thing, but the namespace is running out, we've been around for 80 years and people want more flexibility, but we don't want to give them infinite flexibility because people can find innumerable ways to create offensive words. So, what we wanted to do
07:24
was, you know, if your name is Django Reinhardt, we would provide you with a prefab list of usernames that are guaranteed not to already exist in LDAP. So, I've written some Python code to come up with these variants and if you have a nickname, we allow
07:41
you to put that in there as well. So, I've been working on Django projects ever since 0.96, you know, I've worked on a dozen major ones, and the one thing they all have in common and the thing that most of us love about Django is how amazing it is at managing data and, you know, sort of this philosophy that you start with your data
08:03
modeling, get it right and everything flows nicely from there. The really big difference with this system is that it didn't store any data internally. It's all about talking to external systems, and so this is the first big Django project I've worked on that really wasn't about internal data management at all, or just minimally. We use Django's
08:21
auth system, and then I put dotted lines around these: TMI is a semi-exception, I'll talk about it in a minute, and the same with logs. But, you know, the systems that we're talking to, we're talking to APIs, we're talking XML, we're talking SOAP, doing the CSV shuffle into and out of legacy systems, and because we're getting rid of a lot of the things
08:45
that Django traditionally does, the ORM namely, you've got a lot less boilerplate, a lot less stuff you're going to find in the Django docs and you're writing a lot more raw Python. And then we've got these permission tiers, you know, your typical anonymous user and a logged in user and a super user and then, you know, people who are in the group
09:01
help desk. So, this is not the, these permissions become interesting for these help desk users because it's not the usual, you know, if user can edit books or if user can create, you know, ISBNs or whatever. Suddenly, it's just a, we needed permissions that were based on your group membership and I can't believe I never came, you know,
09:23
hit upon this before but believe it or not, there is no native Django decorator or template tag to determine if a user's in a group. It seems like basic, I would like to contribute this to Django. I'm going to hopefully talk to some of the committers and see if there's a use case for it because I certainly think there is. So, I ended up rolling my own
09:43
template tags and decorators to determine group membership and handle it accordingly. There are code samples for that in the appendix of this presentation if anybody wants to check them out. They're not that complicated or difficult but it just seems like the kind of thing you would get by default. So, then the question is, well, if you're not using
10:01
the ORM, you know, which is sort of the core, why use Django at all? What's left over? Well, there's actually lots, you know, there's still the enforcement of clean structures and styles to system architecture. Form validation was huge. I use a lot of form validation and, you know, usually you're working with model forms but in this case,
10:22
they were just raw forms with really crazy validation methods that were calling out to other systems and verifying your birthday and, you know, doing the same stuff you always do in form validation except more complicated. And then Django provides, you know, nice URL routing and the templating system and the session framework I used quite a bit
10:41
because we've got multiple forms and we're trying to save state between these two forms. And then there's the whole batteries-included aspect of Python and the Django ecosystem and being able to pull in common libraries like django-cas-ng, which allows a Django app to talk to these CAS servers. So, SIS, student information systems, there's lots of them out there
11:03
and most of them are really old and janky and cranky and impenetrable proprietary systems. Ours is Datatel. It has more than 800 tables and they're mostly poorly designed. No enforcement of schemas, no API, which is really different from, you know, when you create
11:22
your Django models, it creates a beautifully designed database. So, I really, really didn't want to, you know, do everything with CSV shuffles. I wanted to find a way to interact with this legacy system through the ORM. I had one ace in the hole, which is a system called TMI, which stood up a MySQL layer in front of the student information system.
11:43
It was read-only. So, I couldn't write to it, but I could at least get data out of it. So, what we ended up doing is, you know, your typical multiple database approach. You can see I've got my local Postgres and another one that's defined on some other host as a MySQL server elsewhere. And then there's this wonderful command, inspectdb.
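A minimal sketch of the two-database setup described here, assuming a settings fragment with a local Postgres "default" and a read-only MySQL alias. The alias name sis, the host names, the credentials, the dotted router path, and the SISRouter class are all hypothetical; the talk's real configuration is on the slides and is not reproduced here. A router is only one way to direct reads; calling .using("sis") on a queryset is another.

```python
# settings.py fragment. Hosts, names and credentials are placeholders.
DATABASES = {
    "default": {
        # Django 1.9+ spelling; in 2015 this was "django.db.backends.postgresql_psycopg2".
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "identity",
        "HOST": "localhost",
    },
    "sis": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "sis_mirror",
        "HOST": "sis-mirror.example.edu",
        "USER": "readonly",
        "PASSWORD": "change-me",
    },
}

# Hypothetical dotted path to the router class below.
DATABASE_ROUTERS = ["myproject.routers.SISRouter"]


class SISRouter:
    """Send reads for models in a (hypothetical) 'sis' app to the read-only alias."""

    sis_app_label = "sis"

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.sis_app_label:
            return "sis"
        return None  # no opinion; Django falls back to "default"

    def db_for_write(self, model, **hints):
        # No opinion on writes: the mirror is read-only, so SIS models
        # should simply never be saved.
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Never run migrations against the read-only alias.
        if db == "sis":
            return False
        return None
```

With something like this in place, a query on a model in the sis app would be read from the MySQL mirror while everything else stays on the local Postgres database.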
12:07
A lot of you may have seen it, but it lets you point your Django instance at an external database, and it does its best to introspect it and figure out what its Django models would be. It's not perfect. But it works only on a default database, so you have to temporarily
12:21
switch your default database, run that management command. And in my case, it was huge. I mean, literally go get some coffee while this thing runs. And I just got this, you know, multi-megabyte text dump. And a lot of that is secure, sensitive information, which I do not want to risk exposing in my app. So, you know, you don't have to take the whole thing. I was able
12:40
to just copy and paste bits of that out from the dump file and bring it into my app, not putting the dump file in version control because we just never want to risk that information getting out. And, you know, once I've just copied and pasted just the basics that I need back in, I'm able to, you know, bring that into my app. A couple
13:01
of things to note about this. When you're writing your models, you can actually specify the column that it maps to. So, you know, these are its internal names, but I can use a nice friendly lowercase name. And then in the Meta, managed = False. So this tells Django, that's a read-only database I'm talking to. I'm not going to try
13:21
and migrate it. I'm not going to try and do anything fancy with it. I'm just aware of it. And then, you know, what table it maps to. You can write your own model methods and your own string methods on that, et cetera. So, you know, this query doesn't look like very much but holy crap, I'm able to do ORM queries against this legacy student
13:42
information system and that was pretty exciting and that opened up all kinds of doors for us. So, progress, that's great except next time I want to run my tests, they crash and burn. Because this is a little bit of an outside use case for Django and it doesn't
14:00
really know what to do when you're talking to read-only databases because when you run tests, it wants to create a copy of each database on the same database server and it doesn't have write access to do that. So, it turned out to be a number of steps to get around this problem. So, first of all, you want to create a separate settings file for it that'll be just invoked when you're running your test. So when I run my
14:24
tests, it's specifying the special test settings file. And in that, I created a separate list of test databases which were similar to the original ones except the remote one was specified as a local Postgres database. And I had to also tell Django and the test
14:40
runner that where I'd previously defined the models for that external database as unmanaged, they're now treated as managed. And so the result is that when I run the tests, instead of looking at that remote read-only MySQL database, it tests against a local Postgres copy. I thought that was way too hard and I wish that Django had helped me more with that.
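The steps just described can be sketched as a small test settings module. This is an illustration under stated assumptions, not the settings file from the talk's appendix: the sis alias, the database names, and the UNDER_TEST flag are hypothetical, and the production DATABASES dict is inlined as a stand-in for what `from myproject.settings import *` would normally provide.

```python
import copy

# Stand-in for the production settings that a real test settings module
# would import; aliases and hosts are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "identity",
        "HOST": "localhost",
    },
    "sis": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "sis_mirror",
        "HOST": "sis-mirror.example.edu",
    },
}


def make_test_databases(databases, remote_alias="sis"):
    """Return a copy of the databases with the remote alias pointed at local Postgres."""
    test_dbs = copy.deepcopy(databases)
    test_dbs[remote_alias] = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "sis_local_copy",
        "HOST": "localhost",
    }
    return test_dbs


PRODUCTION_DATABASES = DATABASES
DATABASES = make_test_databases(PRODUCTION_DATABASES)

# Hypothetical flag. One common way to flip unmanaged models during tests is
# to write, in each model's Meta:
#     managed = getattr(settings, "UNDER_TEST", False)
UNDER_TEST = True
```

The tests would then be run with Django's standard --settings option, for example `python manage.py test --settings=myproject.test_settings`, where the module path is again a placeholder.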
15:03
But in the appendix, you'll find an example of that sample test settings file that I used to get that all working. All right, LDAP. I'm not going to go too deep into, you know, learning LDAP. It's a pretty arcane language. We're spoiled by working
15:24
with the Django ORM about how easy it is to read and write data and deal with relational data. LDAP is not a relational database and it's got a lot of fancy terminology. There is a library called python-ldap, which I'll come back to in a second, which simplifies querying LDAP, but it's definitely not the ORM. So it turns out that those calls required
15:46
quite a bit of code. And I didn't want to litter my views with that code because they weren't really view code. And what I really wanted to focus on was like my views handled the request response lifecycle for that user's web interaction and the interaction
16:01
with LDAP would happen through an external library. So I ended up writing a pip installable library with all of my code for creating users and managing groups and managing aliases, et cetera. And then with just one call in the Django view, I could get a true or false from that. So, and that became interesting actually with those functions
16:22
like create user, should that return true or false? Or should it return the new object or false? Or should it raise an exception or return true? And it was kind of case by case. I made different decisions in different parts of that. But this is not finished. It's a work in progress. It also includes similar functions for interacting with the Google
16:43
APIs. And I was thinking there may be other campuses out there who could use something similar. And I'd like to invite other campuses to collaborate with us on this. And either, you know, fork it and do pull requests. Or you may determine that, you know, there's stuff in there that's too specific to your campus. Everybody needs something different
17:03
so, you know, feel free to fork it. But there's the URL. So if you want to join me in this project, please do. It's called CCA utils. So, yeah, so just an example of the kinds of functions that are in that external library. This is, you know, Python, speaking
17:22
python-ldap to take a dictionary of properties and create an LDAP user out of it. And then once that's defined and imported from within my Django view, you know, once a form is valid, I can just do it as a one-liner: if LDAP create user, with a dictionary of arguments, and then just display it and log it. So it really cleaned up my view code. Yeah, so Python
17:46
LDAP, Lightweight Directory Access Protocol, you know, I sort of had to forget everything I knew about relational databases. You know, I kept hitting these really frustrating points. And this is a great example of its inability to do reverse lookups. So, you know, given
18:01
an LDAP group, I could really easily get a list of its users. But given an LDAP user, there's no way to get a list of the groups that they're in without going through all the groups and iterating over them. There is apparently an LDAP plug-in that the sysadmin could configure but, you know, it doesn't come out of the box and our sysadmin wasn't
18:20
on that for this project. So python-ldap provides, you know, some basic CRUD operations which, you know, this looks very clean and easy but, you know, building these modlists in particular, how to construct those took a lot of trial and error, and you'll see examples of those in the CCA utils. So we had this talk yesterday from Russell
18:42
Keith-Magee about the new meta-model interfaces. And wouldn't it be awesome if somebody took that new capability to wrap, did I miss a slide? Whatever. If somebody took that capability and wrapped it so that we could use some ORM-like syntax or the ORM itself to speak
19:06
to an external LDAP system, because this is seriously how I felt compared to how I was used to feeling. So, I went into the project committed to Python 3, you know, a
19:22
greenfield project should be on Python 3. It turned out that Google's Python client library for the admin SDK wasn't yet Python 3 ready and I was really dependent on that, so I was stuck with Python 2. So, there is a totally separate LDAP library for Python
19:44
3. It's called ldap3. The syntax is different. It's not a drop-in replacement. Fortunately, Google did update their Python API client library to work with Python 3 right towards the end of my project, but I was out of time and couldn't switch, so I was frustrated, especially for reasons like this, because LDAP expects UTF-8 encodings everywhere and so
20:05
my code is just littered with these stupid .encode() things. Python 3, one of its, you know, big deals is that everything is Unicode and so that would not have been necessary, but maybe next summer, we'll see. So, you know, just an example of conducting a simple
20:22
LDAP search. You know, the syntax is not too horrible. So, one of the interesting things that came up was that, you know, there are a lot of stakeholders in this project, a lot of people who are involved who want to keep track of what's going on. They want
20:41
to know every action that's taken place in the system that modifies data in any other system in any way, shape or form and they want to know who committed that action and at what time and whether it was a success or a failure. And, you know, rather than just doing the typical logging, I thought, well, this is kind of a perfect opportunity to use the Django admin. So, I built a simple logging utility and a function to
21:06
call it. So now, you know, any stakeholder on the campus can filter in the admin by LDAP or Google or Workday and by success or failure, and they can search for usernames and things. And then, there's also a lot of anonymous use of the system for people
21:20
creating accounts for the first time. So, in those cases, the user was anon but we just trapped the username and logged it as well so we could still see what was going on. I loved how easy it was to write this code. It was just a really, a really simple model with a few fields and, you know, and then this function and then I can just
21:41
call, you know, log action with a series of attributes, the username and the action message and the success or failure status, et cetera. And so, all of that just happens automatically. I also wrote a management command run by cron that would query for all log entries marked as failures in the past 24 hours, and I emailed that out
22:05
to a bunch of stakeholders every day, or I don't, the system does. And we also have import scripts, so we're doing certain actions that are, you know, we're getting CSV imports and processing those on cron jobs, and they can also write to the same logging utility.
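The logging utility and the daily failure digest can be sketched without any framework. In the talk the log is a Django model browsed through the admin; the version below is only a stand-in that uses a dataclass and an in-memory list so it runs on its own. The field names, the actor default of "anon", and the recent_failures helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class LogEntry:
    actor: str      # who performed the action; "anon" for not-yet-activated users
    username: str   # the account being acted on
    system: str     # e.g. "ldap", "google", "workday"
    message: str
    success: bool
    created: datetime = field(default_factory=_utcnow)


# In-memory stand-in for the database table behind the admin.
LOG: List[LogEntry] = []


def log_action(username, system, message, success, actor="anon", created=None):
    """Record one action against an external system and return the entry."""
    entry = LogEntry(
        actor=actor,
        username=username,
        system=system,
        message=message,
        success=success,
        created=created or _utcnow(),
    )
    LOG.append(entry)
    return entry


def recent_failures(entries, now=None, window=timedelta(hours=24)):
    """Return failed entries created within `window` before `now`."""
    now = now or _utcnow()
    return [
        e for e in entries
        if not e.success and timedelta(0) <= now - e.created <= window
    ]
```

A cron-driven command would call something like recent_failures once a day and mail the result to the stakeholders; in a Django version the same filter would be an ORM query on the log model.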
22:21
Just a random note, if you need to create passwords, do not try and do it yourself and deal with all the nasty encryption library stuff. Import passlib, it's a pip install, and it's, you know, really as simple as picking your hashing algorithm, and then when you send it in to LDAP, the password fields are prefaced by the algorithm that's in use.
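To show what "prefaced by the algorithm" means in practice, here is a standard-library-only sketch of the salted SHA-1 scheme that LDAP directories label {SSHA}. It is not the code from the talk, and a maintained password library remains the better choice in production; the function names are made up for this example, and SHA-1 appears only because {SSHA} is a widely supported directory scheme.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional

PREFIX = "{SSHA}"


def ldap_ssha(password: str, salt: Optional[bytes] = None) -> str:
    """Return an LDAP userPassword value: prefix + base64(sha1(password + salt) + salt)."""
    if salt is None:
        salt = os.urandom(8)
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return PREFIX + base64.b64encode(digest + salt).decode("ascii")


def ldap_ssha_verify(password: str, stored: str) -> bool:
    """Check a candidate password against a stored {SSHA} value."""
    if not stored.startswith(PREFIX):
        return False
    raw = base64.b64decode(stored[len(PREFIX):])
    digest, salt = raw[:20], raw[20:]  # a SHA-1 digest is 20 bytes long
    candidate = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)
```

The directory reads the prefix to decide which scheme to apply when a user later binds with their password, which is why the stored field carries the algorithm name in front of the hash.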
22:42
Makes it so easy and you don't have to worry about it. So, all of that seems pretty clean, but that's not my desktop by the way. There are certain aspects of the system that no matter how much I tried to clean them up and simplify the views and, you know, like anyone, I don't want long, crazy functions, but in the case of activate user, you know,
23:03
once that form is valid, a whole bunch of things have to happen in sequence and I need to log all those things. So, you've got different types of users starting the activation differently, we're saving state between the forms, some users have nicknames, we're displaying success and error messages differently to different types of users, you know, sysadmins versus end users, trying to build the data objects correctly so that LDAP will accept
23:25
them, hashing the passwords correctly, adding the right users to the right groups and entitlements and then going out and creating a Google user in the same step. So, you know, I'm ashamed to say that I've got now a 200 line function in there, activate step two but I can't
23:42
figure out any way to make it shorter because it all has to happen at once. Okay, talking to Google. So, the other piece of this is that for every LDAP account, there needs to be a matching Google account in their directory. So, for this, you use Google's admin SDK, aka the directory API. They supply a Python client library. Every action has
24:05
to be done by what's known as a service user, it's a special user you create in Google and give access to. And it also has to be referenced by a sub-user which is a human on your team who is ultimately responsible for that change. So, this would be like a sysadmin's email address or something. So, these calls have to actually invoke both of them. And
24:24
I got to learn about two-legged OAuth versus three-legged OAuth. So, three-legged is the kind you're used to seeing where, you know, Facebook would like to do such and such on your behalf and is this okay with you, and it throws up a little dialog on the user's screen. In this case, we want our system to interact with Google transparently
24:40
to the user, they never need to know about it. So, that's two-legged OAuth, it's just between the two, Google and our system and the user doesn't know anything about it. So, you know, basically the way that's set up and I won't go into all the detail here but you need to, you know, look in their APIs and every API call has what's
25:01
called a scope. And the scope is a URL that refers to some capability. So, if I want to get a group, I discover that it's this scope and I capture that. And then step two is you go into the Google Apps admin console, dig way deep into Security, Advanced, API client access, and you associate the service account with that scope, and that gives that service
25:24
account permission to execute on that scope. And then I'm able to write these reusable functions and, just like the LDAP ones, these live off in our reusable pip-installable CCA utils library. So, I've got a, you know, reusable get auth, which involves opening
25:41
the key file they provided and signing it with the client email and the private key and that scope and then that sub-user I mentioned. And then finally returning that auth handle, which then can be utilized by secondary functions such as, you know, building a service. So, here's updating the user record at Google. After you go through this step of, you know,
26:05
getting the auth, you build a service. So, this is the alternative to, you know, working with like a RESTful URL. So, you build a service in the Admin SDK, the API version directory_v1, and once I've got a handle on that service, service.users.update with the key, that'd
26:26
be the email, so that's the email of the user we're operating on, and then the body would be a dictionary of data, and you can execute it all at once. So, whether that's easier than using the REST API, I'm not sure, but once I got it down, it was pretty
26:44
straightforward. And then, you know, once all that exists, then it becomes a one-liner in my view. So, I'm not littering all my view code with all that stuff and all I have to do is, you know, build a message string and display it to screen and log it to the logging system. So, then we had this need to set up email delegates.
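As a rough illustration of the Directory API flow just described (get an auth handle for a service account acting on behalf of a sub-user, build a service, then call users.update), here is a minimal sketch. It assumes the current google-auth and google-api-python-client packages, which may differ from the library calls used in the talk. The function names `get_directory_service` and `update_google_user` are hypothetical and are not the speaker's CCA utils code.

```python
# Scope URL for managing users through the Admin SDK Directory API.
DIRECTORY_USER_SCOPE = "https://www.googleapis.com/auth/admin.directory.user"


def get_directory_service(key_file, sub_user, scopes=(DIRECTORY_USER_SCOPE,)):
    """Build a Directory API client for a service account (hypothetical helper).

    key_file: path to the service account's JSON key file.
    sub_user: email of the human admin the service account acts on behalf of.
    """
    # Imported lazily so this module still loads where the Google
    # client libraries are not installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        key_file, scopes=list(scopes), subject=sub_user
    )
    return build("admin", "directory_v1", credentials=credentials)


def update_google_user(service, user_email, fields):
    """Update one Google user record; `fields` is a dict of changed attributes."""
    return service.users().update(userKey=user_email, body=fields).execute()
```

With helpers along these lines, a view only needs one call such as `update_google_user(service, "mary@example.edu", {"suspended": False})` (the address is a placeholder), followed by its own messaging and logging.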
27:06
So, Google has this cool system where people can access another account without entering the password for it. So, I can say, you know, Joe is a delegate for Mary and now Joe can access Mary's email with his own password. Unfortunately, that's not part of the admin
27:21
SDK, it's part of their email SDK, which hasn't been updated to these newer systems, and so it was a whole other API exploration. Hopefully, you know, it's not service-based. I have some sample code for that if anybody needs it. It returns XML instead of JSON, so now we're in Beautiful Soup land. This stuff just got hairy fast. All right, more
27:43
fun. So, rethinking passwords. We got really frustrated with the morass of password advice that's out there because people, end users, are getting contradictory password advice from every sysadmin. One sysadmin is saying, random, random, random. It has to be random
28:01
and long and others are saying correct horse battery staple. You know, you can use plain English words as long as it's long and unguessable and others saying, you know, well, if it's short and random, that's better and, you know, you've got systems saying, you know, you have to have an uppercase and a lowercase and a punctuation
28:20
mark and what we care to, oh, and people need to be able to type them into a mobile device, right? So, people have different desires in terms of creating a strong password. We felt the rules were in the way of good passphrases, but you know what? We didn't care. We're just like the honey badger. We don't care how you got there, right?
28:41
So, if you want to do an eight-character fully random password that is considered strong, that's great, or if you want to do, you know, correct goat battery staple, that's cool too. The problem is that now you're outside of rules, you know, and so now how do you do your form validation? And it turned out to be pretty difficult with different sets of
29:04
rules, and I started looking into this problem and I discovered that Dropbox actually created and open sourced a solution to just this. It's called zxcvbn for the, you know, the lower left row of your keyboard, and this is actually what they use in production at Dropbox,
29:23
but it measures password strength as a matter of what they call entropy, but essentially you just set a strength threshold and it takes a whole bunch of things into account. It does crazy dictionary lookups, it lets you penalize certain strings, so I was able to penalize the name of the college, penalize your own username, penalize examples that are given on the
29:47
and let people create passwords however they want. So the issue with that is that it's JavaScript-based and if it's JavaScript-based that means dictionary lookups are hard because you can't be transmitting dictionaries over the wire. But we wanted the really rich
30:03
interactions, we wanted a whizzy, you know, strength bar that would go up and down as you typed in real time. So somebody ported this to Python, that was excellent. Now I could do it on the backend, but then I would lose the rich JavaScript goodness. So what I ended up doing was
30:22
using it on the backend, making a simple JSON endpoint that you could call with any given password string and it would return zxcvbn's whole dictionary of attributes and penalties and, you know, ultimate strength. And then I ran it through jQuery debounce because we want to
30:41
have, you know, pretty real-time interactions, but I don't want to call that API every time you touch a key on the keyboard. So I use jQuery debounce to wait 250 milliseconds after you stop typing, then it would call this endpoint and this thing would go up and down. I was going to do a live demo but we've got some screen sharing issues here. But this does exactly what
31:01
I was saying, you can have eight characters of totally random or you can have a nice long memorable Sgt. Pepper's Mr. Kite type passwords, whatever you like. I ended up doing a big blog post, so all the sample code for this system is at that URL which you can get out of the slides later if you're interested in a system like that. So Workday. Workday
31:25
is this modern, currently mostly human resources and, you know, payment and scheduling system. They are also introducing a student information system to it for the future and other pieces will come along. We are one of three pilot schools out there who are going to be experimenting
31:46
with their student information systems. But we did move all of our hiring and payment stuff to them. So because I have to validate a user against Workday, I had to talk to their APIs and, you know, quickly hit some roadblocks. I got frustrated because, you
32:03
know, in my world, in our Python, Django, modern web application development world, it's all about REST. Now, APIs are all about REST. But they're very much in the Java, Windows, heavyweight system world and most of their stuff is SOAP-based. They claim to have a REST API but it doesn't do everything. So after trying and failing
32:25
to get what I needed out of their system with REST, I was thrown back on SOAP only to find that it's, you know, considered so deprecated in our modern world that, you know, even suds, which was the most famous SOAP library, has been deprecated
32:40
and replaced by suds-jurko. And I couldn't find a single person online who was interacting with Workday and SOAP. So it took a fair bit of experimentation to get that working, but I did, and I wrote it up in a gist. If anybody is out there and needs to do this, there's a link to how we ended up solving that. So that's what I've got for
33:01
today. And, yeah, then the code samples are here if anybody wants to see them or we can open it up for questions. Any questions?
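The server-side password check described earlier in the talk (a JSON endpoint that scores a password, penalizes certain strings, and compares the result to a threshold) might be sketched roughly as follows. This is only an illustration under stated assumptions: `strength_payload` is a hypothetical name, the 0 to 4 `score` scale and `min_score` threshold are assumptions, and `toy_estimate` is a stand-in for a real zxcvbn port, not a real strength measure. In a Django project the returned string would be sent back from a view, and the browser would call that view through a debounced handler.

```python
import json


def strength_payload(password, estimate, penalized=(), min_score=3):
    """Return a JSON string describing whether `password` is strong enough.

    estimate: callable(password, user_inputs) -> dict with a "score" key.
              In practice this would wrap a zxcvbn port (assumption).
    penalized: strings that should count against the password, such as
               the username or the name of the institution.
    """
    result = estimate(password, list(penalized))
    score = result["score"]
    return json.dumps({"score": score, "ok": score >= min_score}, sort_keys=True)


def toy_estimate(password, user_inputs):
    """Stand-in estimator for demonstration only, NOT a real strength measure."""
    lowered = password.lower()
    # Any penalized string appearing in the password drops the score to zero.
    if any(word.lower() in lowered for word in user_inputs if word):
        return {"score": 0}
    # Otherwise, crudely reward length, capped at 4.
    return {"score": min(4, len(password) // 6)}
```

Keeping the estimator as a parameter means the threshold logic can be unit tested without the dictionary lookups, and the real library can be swapped in without touching the view.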
33:22
I have a question. Knowing, there's a lot of universities out there and knowing what you know now, like what would you, if somebody asked you to do this again, what would you do differently having gone through this huge project? Right. Differently, well, I mean I have the advantage of time so I would have been able
33:44
to start with Python 3 so that would have saved a lot of frustration. But, you know, if I had to do it back when I did it, then I still would not have had that option. I mean I learned a lot in the process. It would have gone a lot quicker. I would have had a lot more sample code to work with. But, yeah, I think my biggest mistake was
34:06
not communicating enough with some of the legacy data teams about, you know, certain ways they wanted to interact with the student information system. Yeah, so communication is probably the biggest thing.
34:20
Hi. You talked about the least common denominator for when you're trying to design it. But what if that changes in the middle? Like, you need to interact with another system too, which seems to have a max length of five for username. Yeah. Contrived example but I mean. Right. I mean it's a case by case basis. The one issue that we hit there that
34:41
we weren't able to anticipate was these Canon copiers and their fax capability. So, you know, somebody could go and fax from a copier and it turns out that they didn't like dots in usernames. Are you kidding me? Or your email address is right. So we were able to update firmware on those to fix that. Another system that, you know, suddenly had
35:04
a much lower, I mean we try to fix that system, right? We're trying to move into the next era here and hopefully not be dragged back by the legacy of the past. But if you're forced to, you know, you might not have, we might have to limit usernames to 12 characters for everybody or whatever if it really, so it depends on the system and whether it could
35:21
be upgraded. Thank you. You mentioned that you stored some information in the session and I was wondering if you ever ran into conflicts with like where your Django session was getting out of parity with another like systems session or anything like that. Nothing there. The only few issues that I had had to do with like the help desk
35:45
users and the super users who were doing things like helping users reset passwords over and over again. And so you'd have like a session variable leftover that was interfering. So you know, those were bugs that I needed to clear out just that session variable. If I understand it right, something changed in Django, I think in 1.8, you know, my first
36:02
approach on that was just to kill all session variables. It turns out that now (it didn't used to) that will actually log you out of the Django app you logged into. So that turned out to be a not viable solution. I actually had to delete session variables one key at a time, but I haven't had any conflicts with other systems
36:21
on campus now. I've used python-ldap a lot and I've gone back and forth in my apps. Like I've tried to make it so that we're just using their Active Directory groups for all of our auth, like authorization and permissions. I guess, yeah, authentication
36:42
and authorization. And then I've ended up having to make some things really quick where I actually end up using the Django permissions, you know, and actually putting that in the database. Do you have opinions on that? I mean, have you had to do that? And do you have a preference or is there? I haven't had to and I would strongly resist it. You know, the system that I'm building is not a canonical
37:02
data store. It's a canonical place to make these kinds of changes, but it needs to be making these changes out in external systems, you know, not within itself. So no other system is going to look to my system saying, what permissions does such and such user have? We eventually will bring in Grouper into LDAP and that will make it even more,
37:23
you know, in LDAP land and not in my space. So yeah, I'm trying to store as little data as possible. Really just the logging data is the only thing. Yeah. Okay. I've literally
37:43
never done that. This is more of a comment than a question, but I'm really excited because I presented yesterday on Project Callisto, which is a system for sexual assault reporting on college campuses, and we're using your implementation of zxcvbn and it was super helpful. I love the walkthrough and so yeah, it's really cool. That's
38:01
awesome. Thank you. I'm very happy to hear that. Thank you. Yay. Yes, I had another comment as well. It looked like in the example where you were building an LDAP filter string, you're using Python string substitution, but actually there's an LDAP string substitution
38:21
that will sort of in the same way that database string substitution will prevent code execution. So what's the advantage of doing it that way? So if you have characters that you're trying to substitute in, you won't be able to sort of break out of your LDAP filter and
38:45
it'll escape characters. Yes. It would have a security advantage. Thanks for that tip. I didn't know that. I was writing Python and so I wrote it Pythonically, but I'll have a look at that. Yeah. You talked a little bit about testing. I was wondering what you had in place
39:05
for not only unit tests, but mostly in terms of functional unit tests. Did they actually talk to the legacy systems? Oh yeah, good question. So I mean there's sort of several levels of that. There's the CCA utils which has the modules for talking to LDAP and Google
39:24
and those have very basic unit tests and I would mock data and I have a test user that runs for a lot of that stuff. And then there's the Django side and that's where most of the functional tests are. So you know I'm testing entire views and things,
39:44
but is your question about the risk of modifying data in a live LDAP system? Right. So in my case I'm restricting it very carefully to a single designated what I call direct test user. It would be great to have a secondary LDAP system that I could test against
40:03
just like you have a test database. I'd feel better about that but it's not feasible in our case. Any other questions? So we also have something at work. This is slightly not related to LDAP exactly, but when
40:30
you talked about testing you guys spin up your own like an additional DB to do the testing against. And so what we do at work is a slight work around right now and
40:40
it's something we were going to solve probably next week, I don't know. But Django 1.8 comes with the --keepdb flag and if we want to run tests locally like on my system then we run it against our local DB, we clear it out first. But how, like can you talk a little bit about how, what you had to do exactly, like you had a custom manager to work around
41:03
to pull up your own DB. Is this something that is, like just if you could just talk about what exactly you do there. Yeah I mean I do it because I have to, there'd be no other way to run tests. As long as you're defining two databases it's going to try and create a test version of that database on that same host by default. So I basically, I literally
41:22
could not run tests unless I'd gone through this process. As for --keepdb, I think the point of that is really to speed up your tests, right? I mean there's, you know your situation well enough to know that there's no downside to keeping the same test database between test runs. Right. Yeah. I think what I, what we do is we actually
41:43
just, at least on my local dev machine, not CI, but we use an instance of the DB to develop with and I just run my tests on that with --keepdb. This way I don't have to pull up a separate DB for testing which is far from ideal. Please don't ever
42:00
do that. The advantage is we have a lot of custom SQL that we need to get those tables up and running, which we use in production and we use a mirror of that. So this is the advantage that we get because we also use the managed = False flag and we're hooking onto a system that we don't define the models for. So I don't know if this is like, it seems like a forest strategy because any test data which conflicts with your dev data
42:23
can cause issues. But yeah, I don't know. So that's an alternative to constructing a copy of a remote MySQL database as a local Postgres database. You can just use it with --keepdb all the time. So yeah, I guess I'd be concerned about the integrity of the test because now you don't have a 100% guarantee that the test database data
42:46
that you think you're testing is actually the data you're testing or do you? I mean, I'm not sure. Maybe this is, this seems like it's going to be a longer conversation if I'm allowed to. Okay. Maybe you can show me some code. Yeah, sure. Yeah. Yeah.
43:01
I did come across --keepdb as a tip for speeding things up. I came across it by writing it, but I missed it. And then I just realized that Django had one, so I'm. There you go. All right. Awesome. Thank you. Thank you, Scott. All right. Thanks.
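The LDAP filter tip raised in the Q&A (escape values rather than using plain Python string substitution) can be illustrated with a small sketch. python-ldap ships `ldap.filter.escape_filter_chars` and `ldap.filter.filter_format` for this purpose. The standard-library-only version below reimplements the RFC 4515 escaping just to show what such helpers do. The names `escape_filter_value` and `build_uid_filter`, and the choice of the `uid` attribute, are assumptions for illustration.

```python
# Characters that RFC 4515 requires to be escaped inside a filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value):
    """Escape a user-supplied string for safe use inside an LDAP filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_uid_filter(username):
    """Build a (uid=...) filter without letting input break out of it."""
    return "(uid={})".format(escape_filter_value(username))
```

With plain substitution, an input such as `*)(uid=*` would change the structure of the filter. After escaping, it is treated as a literal value that simply matches nothing.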