Liberation and modernization of government legacy data using Django

How the government of Puerto Rico is making the release of government data and interagency electronic communication a reality using Django and a stack of Django and Python tools and libraries. This effort resulted in the creation of the LIBRE API engine.
Thank you, thank you. I know this is the last talk and you're tired and want to go, so I'm going to do my best to keep you awake.
I work for a custom application company. We do software development and we love Django. My company and I have been
backers of many high-profile Django projects, like the schema migrations project and High Performance Django, and we are co-sponsors of the Django REST framework Kickstarter.
I have been doing software development for years. I started out in games, so if you ever got fired for playing sloppily made, pixelated games on your cell phone, that was me. I have also worked on a few other
projects, and one of the most recent, which has been getting a lot of visibility, is Mayan EDMS, a document management system built entirely on Django.
So how did I get into this mess? In 2013 I was appointed director of development for the government of Puerto Rico, and I had to oversee the creation and use of software in the government. One of the hot potatoes I was handed was that the governor of Puerto Rico had just signed an executive order ordering all government agencies to start sharing data electronically, but we had no infrastructure to do that.
This was the problem, the scenario I was given: we had at the time 142 government agencies, each of them creating data in completely incompatible formats and with no way to share it. Some of the most forward-looking agencies
tried to fix the problem themselves, but because there was no policy and no guidance, the end result was the same: everybody just kept wasting money on completely non-interoperable interfaces and exports of the data.
This was my reaction: what is going on here? I realized we did not understand
the problem. So the first thing we did was make a checklist: what do we need to make this happen? We need an export tool, a universally compatible export tool, with which we can take any government data and export it into newer formats like JSON and XML, regardless of what the original file format was. Then we realized that this did not exist in the universe, so basically we just had to create it.
So we set up a new experimental fast software development department for the government and started developing. This way we came
up with LIBRE, an engine to free up government data. "Libre" means "free" in English, so it was also kind of a political statement.
This is the idea: with the platform we can take completely heterogeneous data sources, regardless of format and of the place where the data originates, and just by writing a simple description of how the data is structured we can import government data. We can do versioning of the government data, and we can start hosting open government data from the same product, because infrastructure is one of the big problems in the government: very few government agencies have good infrastructure, and most of them will collapse as soon as they get 100 concurrent users.
We also had to create a unified query language, because our users are now more technical; the public is more technical. Some people want you to do something for them, but most of our users now just want access to the data so they can do the analysis, the mathematical analysis, themselves. So we had to come up with a unified way in which our new clientele, a developer clientele, could filter and select what they want. And we had to support as many output formats as we could, not just one: we have to support JSON for JavaScript development, and we have to support XML. Django REST framework was crucial on this point.
Now we can take completely outdated government data and use it in a lot of different scenarios. This is where and how LIBRE fits into the whole ecosystem: all government agencies keep producing data the way they know how to produce it, we drop in this tool, and they don't have to change anything internally about how they keep producing this data. The data can now be shared and used by other government agencies or by the general public. We can start turning stuff like this, spreadsheet files that have no kind of validation and don't tell you anything, into
beautiful stuff like this, which a developer can use without having to worry about importing the file. We can take stuff like this, a shapefile, which is a very bad name for a file format because it is not a single file but a collection of files (you can blame ESRI for that), and turn it into this.
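The descriptor idea described above can be illustrated with a minimal sketch. Everything named here (the `DESCRIPTOR` layout, `import_rows`, the sample rows and their numbers) is hypothetical and is not LIBRE's actual descriptor format; it only shows how a short description of the columns can be enough to turn an untyped spreadsheet export into typed JSON.

```python
import csv
import io
import json

# Hypothetical descriptor: column name in the source file -> (output key, type).
DESCRIPTOR = {
    "Municipio": ("municipality", str),
    "Poblacion": ("population", int),
}

# Stand-in for a spreadsheet exported by an agency (made-up numbers).
RAW = "Municipio,Poblacion\nAguada,1000\nCulebra,200\n"


def import_rows(text, descriptor):
    """Read CSV text and return typed dictionaries following the descriptor."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            {key: cast(record[column]) for column, (key, cast) in descriptor.items()}
        )
    return rows


rows = import_rows(RAW, DESCRIPTOR)
print(json.dumps(rows, indent=2))
```

The agency keeps producing the same file; only the descriptor has to be written, once, by whoever publishes the data.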
We also have tools that support geospatial capabilities, which is another big topic in the government. The government produces a lot of geospatial data, but it is being produced in outdated formats. We had data in the State Plane projection in NAD 27 format, a format that was standardized in 1927, and even then it was not interoperable, because it was a state-centric format. With this tool we can convert all that data, completely transparently, to WGS 84, which is a geocentric, world-centric projection. So now Puerto Rico data can be plotted in software that is designed to work around the world. Basically, we are modernizing our government data for developers. For example,
they can take a legacy shapefile from the government and, using the LIBRE platform, we can render a map that a developer can just capture in an iframe. You are now incorporating geospatial capabilities into your software without having to write code, just by capturing the rendered map from a query, using features of the platform. And you can start seeing stuff like
this: a map that originally came from shapefiles that were just accumulating on a government server, and from Excel spreadsheets.
This is the kind of reaction we got from the developers. But the administrators hated it, because they
already have a huge amount of work. They are very stressed people; they carry the weight of the infrastructure on their shoulders, and having them create descriptor files for the files they were going to import into the platform was becoming another obstacle to adopting the platform.
That's where the Django admin came to the rescue. On top of it we were able to create this web interface, with which a person who doesn't know how to write a descriptor file can describe the file format they are going to export. So now the administrator, who is sometimes the only technical person on a government agency's staff, can use this tool to start exporting the agency's data without having to be a data scientist or a developer. And this was the reaction.
That's how successful the conversion was.
Then the platform got very popular, so we had a company whose name starts with "M", means "very small" and has "soft" at the end, and they said: no, we already fixed this problem, you are reinventing the wheel; we have tools that create web services from all your databases. Now, which of you have worked with web services? And which of you like working with web services? No one, as I thought. The problem with web services is that if you don't have sufficient documentation, you are lost. Has any one of you tried to reverse engineer a complex type from a web service under those conditions? It's not possible. So you are overly dependent on documentation, and web services have become a way to promote vendor lock-in. Another problem is standardization: the specification jumped through several versions, and a 1.3 draft never made it to the public because the versions were not
interoperable. The tools create WSDL (Web Services Description Language) files, description files that are not even interoperable between one vendor and another. It is like trying to assemble an IKEA table with the wrong instructions: bad things tend to
happen.
So web services were out. The tool had to be REST-centric, even if people didn't like it. We had
to do it, because there is a beautiful thing about REST: REST and JSON are self-documenting. Even if you don't have documentation, you know it is a dictionary, key-value pairs, and however cryptic the keys are, you get a rough idea of what this is and how to operate on it. And because we are using Django REST framework, from the same solution we can export to different formats using Django REST framework renderers. And I love this, this is why our company co-sponsors Django REST framework: the browsable API allows developers to play with the data, to start exploring it and get used to it, even if there is no documentation for it. Next, we wanted a unified query language to be
able to access all these completely different datasets.
This is what we actually got from the company whose name starts with "M": oh, we already solved that, it's called SQL, that's what is used for accessing data; why do you want to recreate the wheel? Because of stuff like this.
This was code from an actual government website (I removed the name of the agency), and they are actually concatenating strings to create an SQL statement, blindly trusting user input and not doing any kind of sanitization or checks. So I talked to the developer who wrote this, from that company, and asked him if he knew about SQL injection and sanitization. He said no. Then I asked him a more interesting question: do you know who Bobby Tables is?
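The code on the slide is not reproduced in this transcript, so the following is only a minimal sketch of the same mistake and its usual fix, using Python's standard `sqlite3` module. The table, its rows and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agencies (name TEXT)")
conn.executemany("INSERT INTO agencies VALUES (?)", [("health",), ("police",)])

user_input = "x' OR '1'='1"  # hostile value typed into a search box

# Unsafe: the input becomes part of the SQL text, so the OR clause matches every row.
unsafe_sql = "SELECT name FROM agencies WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is passed separately from the SQL text and compared as a plain string.
safe_rows = conn.execute(
    "SELECT name FROM agencies WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

With concatenation the attacker decides what the statement means; with a placeholder the input can only ever be data.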
So SQL was also out of the question, because SQL is not as much of a
standard as people think. Whoever told you that SQL stands for "standard query language" was playing a cruel joke on you. This is an actual question on Stack Overflow, one I asked when I was creating the platform, because I wanted to know how to limit the number of results in a universally supported way, and it turns out that not even a simple thing like that is standardized across databases.
So we ended up creating our own language. It is called LQL, the LIBRE Query Language. Another problem with the existing export tools is that they require specific server products, so there is still that element of vendor lock-in. So what we did was create a RESTful query language.
Basically, the URL is the query, and it gives you filtering, selection and slicing of the data. Here we have an example: this is a polygon outline from the municipalities of Puerto Rico. The URL is small, but if you look, I'm using predicates and telling it: give me only the polygons whose properties, the name of the municipality, contain this fragment, ignoring case. And instead of getting just data, I'm telling it to render the result into a Leaflet map. This is a very nice feature of Django REST framework: renderers can also give you maps or charts or tables; they don't necessarily have to produce numbers or serialized data.
This is an example of a simple rendering: these are the crime points from the Police Department, and with a few changes we filter by type of crime, say, aggravated assault. This is the kind of thing we can now filter and start analyzing just by rewriting a simple URL.
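The URL shown in the talk is not captured in this transcript, so the following is only a sketch of the idea that the URL is the query. The `path__predicate=value` syntax, the `/api/municipalities/` endpoint and the `run_query` helper are assumptions made for illustration and should not be read as actual LQL syntax.

```python
from urllib.parse import parse_qsl, urlsplit

# Stand-in dataset: features with a nested "properties" dictionary.
FEATURES = [
    {"properties": {"name": "Aguada"}},
    {"properties": {"name": "Aguadilla"}},
    {"properties": {"name": "Ponce"}},
]

# Hypothetical predicates; "icontains" means "contains, ignoring case".
PREDICATES = {
    "icontains": lambda value, arg: arg.lower() in str(value).lower(),
    "equals": lambda value, arg: str(value) == arg,
}


def resolve(item, dotted_path):
    """Follow a dotted path such as 'properties.name' into nested dictionaries."""
    for part in dotted_path.split("."):
        item = item[part]
    return item


def run_query(url, rows):
    """Apply every `path__predicate=argument` pair in the URL as a filter."""
    for key, arg in parse_qsl(urlsplit(url).query):
        path, _, predicate = key.partition("__")
        test = PREDICATES[predicate or "equals"]
        rows = [row for row in rows if test(resolve(row, path), arg)]
    return rows


matches = run_query("/api/municipalities/?properties.name__icontains=AGUA", FEATURES)
print([m["properties"]["name"] for m in matches])
```

Because the whole query lives in the URL, changing the analysis is a matter of editing a link, which is the property the talk relies on.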
Because we're using Django REST framework, we are incorporating Django's templating system into Leaflet's popup and marker markup, so from Django we can start creating customized popups and markers. We also created a map builder, and we can start doing stuff like this: I can take a file from one government agency, combine it with data from another, and get
stuff like this. This is the whole universe of crimes in Puerto Rico being filtered by the result of a query for the polygon of a municipality from the planning board. This is basically a join between two completely different datasets coming from two completely different government agencies. This is a municipality-centric query, and this is
the URL that produces it. There is no code, it is just one URL.
It looks complicated, but as you will see in a moment, it is actually just four elements, and even with the first two you can produce the map; the last two are just cosmetic markup. The first element tells the engine which data I want to work with, in this case the crime data. Then I'm telling it: filter all those crimes where the geometry of the crime, in this case a point, falls within a geometry (the bracket is a special marker), where the encompassing geometry is the result set of a simple query to the planning board asking just for the polygon of one municipality. That inner query is actually slicing the properties of the geospatial feature, giving me just the map points, and then passing that in as the geometry to do the filtering. So this is basically typecasting going on at runtime from the URL. Then I'm telling the engine to render a map instead of giving me the points, and, to be able to see the
outline, because dots in a void wouldn't tell you anything, I'm also passing context to the renderer: please paint me the outline so I know what I'm filtering by. Knowledge of the language was another barrier, so we also created a query builder tool. This is where you can start experimenting: you have a preview at the bottom, and you can do stuff like producing the results as a list of dictionaries, so the data is already processed and ready to be applied to a chart. You don't have to do any post-processing, for example in JavaScript; you can take it as it is, load it into something like D3.js and start plotting charts. After you have that, all you have to do is copy and paste the query string.
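The join described above, points from one dataset filtered by a polygon from another, can be sketched in plain Python. This is not how the engine implements it (the transcript does not say); it is a standard ray-casting test applied to made-up coordinates, and all names and values are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: toggle `inside` each time a ray going right from the point crosses an edge."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y coordinate
            crossing_x = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < crossing_x:
                inside = not inside
    return inside


# Dataset 1 (hypothetical): one municipality outline, as (x, y) vertices.
municipality = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

# Dataset 2 (hypothetical): crime records with a point geometry.
crimes = [
    {"type": "theft", "geometry": (1.0, 1.0)},
    {"type": "assault", "geometry": (5.0, 1.0)},
    {"type": "theft", "geometry": (3.5, 2.5)},
]

inside = [c for c in crimes if point_in_polygon(c["geometry"], municipality)]
print(inside)
```

In the talk the polygon itself is the result of a second query, so the two datasets never have to be merged ahead of time.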
And we can start doing stuff like this. This is an egocentric, self-centric result from the same crime data: I'm asking the engine to show me all the crimes in a radius from where I'm standing. The query is even simpler. I have the same police crime data, but instead of filtering by the result set of another query, I'm filtering by just a point. Because points don't have area, I'm adding a buffer, which in this projection and at this zoom level is just a small number that correlates to roughly 10 miles. So I'm basically telling the engine: give me all the crimes
that have happened within ten miles of where I'm standing, to see whether I'm going to be mugged, assaulted or killed. And this is a good place, for example, to park, because the only thing that happens here is aggravated assault. So if I want to park my car, I know it is safe, because car theft has not happened there in the time frame in which the dataset was created.
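The talk describes buffering a point in the map projection; the sketch below swaps in a different and simpler technique, great-circle (haversine) distance, to show the same "everything within ten miles of me" filter. The coordinates and records are invented for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical crime records (made-up coordinates).
crimes = [
    {"type": "aggravated assault", "lat": 18.45, "lon": -66.10},
    {"type": "car theft", "lat": 18.00, "lon": -66.60},
]

here = (18.40, -66.10)  # where the user is standing (made up)
nearby = [
    c for c in crimes
    if haversine_miles(here[0], here[1], c["lat"], c["lon"]) <= 10
]
print([c["type"] for c in nearby])
```

A distance test like this avoids having to pick a buffer value that depends on the projection, at the cost of not reusing the polygon filter.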
We can do this as well. This is called feature analysis: I can see how crime behaves with regard to a geographical feature. In this case it is Highway 22, Puerto Rico's busiest highway, and the police have been criticized for how they route their preventive patrolling. With this simple analysis we can now see that to the south of the highway there is no crime happening in the time span of the dataset, which is 2 years. So there is something happening to the north of the highway that is causing a bigger crime rate. I cannot tell you what it is, but now I can give you the observation so you can ask the right questions. This is the start of the scientific method: I give you the observation, you explain why this is happening. Before this, we had no idea it was even happening.
The query to do this is basically the same, but instead of a circle we are creating a polygon at runtime just from the points we pass.
Now, how the tool works; this is the more technical part, the nitty-gritty details. I cannot process the data for users at the time they request it, because it is a very heavy operation. So the approach is write once, read many times: all the processing was moved to the import phase. The first thing we do is fetch the data, because I cannot trust the government agencies to give it to me; basically I have to go and get it forcefully. The next step is to use drivers, which tell the engine how to read the data once it has it. The drivers are layered and hold the details of how to understand and process the source, whether it is an Excel file, a REST API, or a shapefile. Then we serialize the data to be able to store it somewhere else, because it is binary data. For this specific implementation we chose base64-encoded pickles: the data is stored as base64-encoded pickles in a plain SQL database. I know, it is not glamorous, but for this particular implementation we just wanted to get the data out: no Mongo, no fancy infrastructure, just plan for one order of magnitude beyond the worst-case scenario. You never thought you were going to get a business lesson from a Django presentation.
This is now the read part, where we process the request for the data. It is just cookie-cutter Django REST framework, which does most of it. We make sure that the user who is accessing the data has access to the dataset: maybe we want to restrict a specific dataset to government employees only, or maybe it is public data. Then the request passes through our custom middleware, where the query is split into its parts (filtering, grouping, aggregation), the data is
fetched from the database and rendered in whatever format the user is asking for, and then we pass that to the response object provided by Django REST framework.
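The storage step described above, serializing each record as a base64-encoded pickle and keeping it in an ordinary SQL table, can be sketched as follows. The table layout, the function names and the use of an in-memory SQLite database are assumptions for illustration; the transcript does not show the real schema.

```python
import base64
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (source TEXT, payload TEXT)")


def store(source, record):
    """Import phase: serialize once and store the result as plain text."""
    payload = base64.b64encode(pickle.dumps(record)).decode("ascii")
    conn.execute("INSERT INTO records VALUES (?, ?)", (source, payload))


def load(source):
    """Read phase: decode the stored text back into Python objects."""
    cursor = conn.execute("SELECT payload FROM records WHERE source = ?", (source,))
    return [pickle.loads(base64.b64decode(payload)) for (payload,) in cursor]


# Made-up record; tuples and nested values survive the round trip unchanged.
store("police", {"type": "theft", "geometry": (18.4, -66.1)})
print(load("police"))
```

Base64 keeps the binary pickle safe inside a text column, which is why the approach works on any database. Note that unpickling is only safe for data the service wrote itself, never for payloads supplied by users.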
This is a break point in the presentation. That's where the project died: after 12 years I became tired of the hate. If you are a software developer in a place where nobody is technical, you get a lot of it; if you are in the government, you get even more. You get hate explicitly, implicitly and secretly. Your boss hates you secretly, because you are showing that he is not prepared for the job. Your co-workers hate you implicitly, because you are the software developer, you know technology, and yet you refuse to fix the coffee machine. And the public in general sees you as a government employee, and at that point they are going to hate you explicitly. So after 12 years of a lot of hate, I decided to move forward to another company, where we work with the open source community, and we are actually hosting our own copy of LIBRE and hosting public government agency data on our infrastructure. So basically we are doing it ourselves.
With that hosted data we can now start doing cool stuff like this, for example, using completely unrelated datasets. This is a section
of the datasets of the power company. It has very interesting data: how many clients they have, how much energy they sold. You can see it in a table, but when you put it into a chart you start
seeing patterns, you start seeing correlations and behaviors that should not be happening. For example, in this chart you see that the number of industrial clients of the power company is decreasing (the chart reads from right to left; at that time we hadn't even implemented ordering in the engine, which has since been fixed). The power company has lost a large share of its industrial clients, and yet its revenue from industrial clients never decreased. That's not supposed to happen. And stuff like this: this is, for example, the data from the
Health Department. Puerto Rico is a tropical island; we have a lot of mosquito-borne diseases. Sadly, some people do die, but these deaths are preventable; it is just about making sure that people get the help they need at the right time. So there is money allocated to awareness. But when we plotted this and added the data for asthma, the problem of asthma on the island completely overshadows the problem of mosquito-borne diseases. And when you look at the amount of budget that has been allocated to asthma research and awareness, it is a fraction of what mosquito-borne disease awareness programs are getting. And when you plot something like diabetes, it completely crushes the problem of asthma, even though both are chronic diseases. Diabetes is a really big problem on the island. And when we plot hypertension, something very interesting happens: the behavior of hypertension on the island almost directly correlates with the behavior of diabetes. A statistician will tell you that correlation doesn't imply causation, but you cannot deny that there is something happening. We also have a
very big problem in the town centers: people are leaving the town centers because of technology; you know, they have Netflix and so on. The central government wanted to start giving free Wi-Fi in public spaces, and they were going to allocate a few million dollars, until we produced this map. This is the map of all the municipalities which were, by their own initiative, already giving free Wi-Fi in the town centers. So they were already fixing the problem and getting people back into the public spaces. Just this map saved a few million dollars of budget. And this is the same
crime data. Like I said, this was created using just an iframe and just three filters: municipality, time and type of crime. We started seeing time-based crime maps, and you see that crime is more organic than you think: it behaves very differently depending on the time of year and on the time of day. We have seen that most crimes peak at 2 AM, and yet house theft peaks at 9 AM and 12 PM, usually the times when working people are outside their homes. So even doing something as simple as sending employees home for lunch in different time brackets would have reduced the problem of house theft. This
this was 1 of the interesting data this is the Department of solid waste barrier and they give it to me they were very nice it was 1 of the few government assistance that we cooperate the effort this really was worse was that I'm from just put it out there people are going to find a way to use this and they did this is a product
from happening actually 1 there's more what I that it was created by tree only uses the students in less than 24 hours is quality of tires and is this scenario for for the application is you're just doing internal tourism and the islands and suddenly you can't you got a flat tire in a place you have no idea I don't know anything about the place so the
application will give you an not using our technology and give you all the places where you can go fix your car before you become stranded with the metadata so you can call
negotiate places a view plate will give you the route to
get there some spots all this from 8 data from a government agency that actually disappear because that's how on imported the government think about it from a data that even the government the was producing thought must worthless now we can have a commercial or a product that can resolve our real social problems and this is a snippet of the code and
you can see our server, with the name of the company in the middle, and you can see that they are actually feeding the application from the public instance we are hosting. They get the latitude and longitude of the user via JavaScript and they just filter. These efforts
got noticed by some government agencies, among them the Institute of Statistics, and they contracted us. They have a massive amount of information, a massive amount of data, and very few tools to get to it. So we got a contract with them, and the stuff that has been happening with that data is amazing. I'm going to try not to get you
all killed I The certifies mindless copy of Microsoft Office to the costs of yeah but the the the these are the map
I just showed you. This is using
all open source software: the open source engine and an open source dashboard application I created with Django 1.6. You can see the behavior of the electric company; you can see the behavior of the Puerto Rico grid for the last 10 years, with the peaks and valleys of how usage behaves. In particular, we can start predicting in which month of the year the grid is most likely to collapse. What the power company did was send out the brigades in June and July, and we realized that September and October are usually the worst months. So they were paying overtime in two months in which nothing was happening, and they didn't have enough brigades in the months of the year in which the grid was under stress. We also have this interesting curve. The electric company likes to make everybody uncomfortable and blame the problems of the electric grid on the people: you're wasting too much electricity, turn off your lights. But this chart shows a difference of 1.3 million kilowatts from 10 years ago: we are actually consuming less electricity now than ten years ago. So why does the grid continue to collapse? It is not because of the users; it is because of a lack of maintenance. No more shifting blame; we can actually point fingers now. This is just the open source version, and now this is the commercial version that we
created from scratch. It doesn't use base64 anymore; it uses a different storage solution. The kind of
projects we're doing with it are much more interesting. For example, this one
looks at people leaving the island and coming back. This is all Bureau of Transportation data combined with census data. This is the comparison chart of how many people are leaving the island and the destinations they report. And the final aggregation: 5.2 million people left the island in 2013 and fewer came back, so I have a negative difference of thousands of residents on an island of some 4 million people. You can see the problem. With stuff like this I can also start predicting the tourist peaks, at what times of the year the government needs to prepare to receive tourists and make enough accommodations, to see if we can fix the problem.
And we can start doing things
like this. There is a
big problem, and it is a political issue: the argument is that there are more toxic emissions in places where there is lower economic income. This is a great project to start experimenting with that. It is a map that correlates the amount of toxic emissions, which companies are emitting them, whether they are emitting the chemicals they are registered for, and the income level of the area compared to the mean of the island. You can start doing experiments like this to see whether this theory holds on the island, and you can see that some companies are throwing into the atmosphere these nasty
chemicals just a few miles behind your back yard. Nobody knew this until we did this. So that pretty much is my presentation.
If you have questions or
comments, please be kind.