So, my talk is about our PostgreSQL experience in the Russian parliament, and in particular in the Russian State Duma.
The agenda: first I will talk a little about the State Duma and the specifics of its databases. Then I will talk about data synchronization, because the State Duma has a lot of internal systems, as well as the website and a number of open data systems. Then I will talk about the Open Data system and its implementation, and about full-text search and some changes we made in it. So, a little bit about the
Russian parliament. The Russian parliament is called the Federal Assembly, and it consists of an upper house and a lower house. The upper house is the Federation Council, but it is not much of a newsmaker [unclear]. The State Duma, the lower house, is the one with a lot of debating and sometimes fighting [unclear]. The name comes from the Russian word for "to think", so the Duma is meant to be the place where people think [unclear]. So what
was there when my company started working with the State Duma? It was in 2009. There was a very old website, in the style of, you know, the 1990s. It provided not much information: no open data, no data in machine-readable form. And it was running on an Oracle database. So that is where we started from.
Four internal databases were needed for the site functionality. The first was the so-called classifiers, some kind of dictionaries: lists of things like sessions and so on. Then personalities, with detailed information about the deputies. Then drafts, which contains all the needed data on law drafts except the documents themselves. And also transcripts, about 2 gigabytes, which stores every word spoken in the State Duma. That one is the largest database; the others are really small. So, the
website: we implemented a new website. It looks nice, but we need to get a little deeper into how it works. We implemented a
rather complex data synchronization, because our work covered only the public services; we could not work inside the internal databases of the State Duma, which run on Oracle. We used ora2pg to move the data to PostgreSQL. There are several internal databases, and the staff of the State Duma approve what is published on the website, because it is politically important not to publish anything incorrect: any incorrect information on the State Duma website is a big problem. For instance, on the website we have, for each deputy, information about when he started working in the State Duma and whether he is working there now. The Oracle database holds these dates, and one day something incorrect appeared on the website [unclear], and the staff responsible for the website faced consequences for it. So only when the information is approved is it synchronized to the external PostgreSQL. We also have a separate server for the open data systems; I will talk about that later.
The ora2pg synchronization was, at first, a full synchronization: once someone decides that the information is safe to publish, the synchronization is started [unclear]. For small databases we still use full synchronization, but the large ones take too long to synchronize in full each time. That is why we implemented incremental synchronization with ora2pg, and it is also quite easy. I will show
you that the ora2pg config file lets you specify a WHERE condition to extract only a part of a table. We then wrote a simple SQL script which generates such a condition. It takes the last session number which is already in the database and builds a condition that selects only the records whose session number is greater than that last session number. This condition is then inserted into the ora2pg configuration, and that is how we get incremental synchronization: only the sessions after the last one we already have are loaded. We were fortunate that the database has a session number field in each table. In fact the field has different names in different tables, so this example is simplified, but in principle it works like this. So it was quite easy to implement incremental data loading.
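The condition-generating step described here can be sketched in a few lines. This is only an illustration, not the script from the talk: it uses Python rather than SQL, the table and column names are hypothetical, and the `TABLE[condition]` form of the `WHERE` line follows ora2pg's per-table WHERE directive as I understand it, so check it against the ora2pg documentation before relying on it.

```python
def build_where_directive(session_columns, last_session):
    """Build one ora2pg WHERE line limiting each table to newer sessions.

    session_columns maps a table name to the name of its session-number
    column (the talk notes these column names differ between tables).
    """
    last = int(last_session)  # refuse anything that is not a plain number
    conditions = [
        f"{table}[{column} > {last}]"
        for table, column in sorted(session_columns.items())
    ]
    return "WHERE\t" + " ".join(conditions)


# Hypothetical tables and columns, for illustration only.
tables = {"TRANSCRIPT": "SESSION_NUM", "VOTE": "SESS_NO"}
print(build_where_directive(tables, 41))
```

In a real setup the last session number would first be read from the target PostgreSQL database, and the generated line would be written into the ora2pg configuration before each run.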
Based on the data loaded into PostgreSQL, the Open Data API was implemented. It gives access to the data imported from the Oracle databases and to some databases which we implemented from scratch. It provides different kinds of information [unclear], gives data in XML and JSON, and supports really sophisticated kinds of searches, which I will show. This is how
the Open Data API documentation looks. This screenshot shows the list of methods for getting different kinds of information, and for each of them the query URL parameters and an example of the results. So it is all fully documented. And here is
an example of how the results can look, in this particular case a draft in XML format.
Law search was implemented as well. The former website had several different search forms, each with a different set of parameters, so it was a little weird: if you wanted one kind of search you used one form, and for another kind, another form. That search was replaced by a new one implementing a superset of all those possibilities, and it was built on top of the Open Data API. So the API was designed to support all of the possibilities offered by these kinds of search.
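To make the "one search, a superset of parameters, XML or JSON" idea concrete, here is a minimal sketch of how a client might build such a request. Everything in it is an assumption for illustration: the base URL, the parameter names and the format-as-extension convention are hypothetical and are not taken from the real Open Data API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real API's URLs are not given in the talk.
BASE_URL = "https://api.example.org/search"


def build_search_url(fmt="json", **filters):
    """Build a search URL from whichever optional filters were supplied.

    Filters left as None are dropped, so one function covers every
    combination that the separate legacy forms used to handle.
    """
    params = {key: value for key, value in sorted(filters.items())
              if value is not None}
    query = urlencode(params)
    url = f"{BASE_URL}.{fmt}"
    return f"{url}?{query}" if query else url


# Hypothetical filter names, for illustration only.
print(build_search_url("xml", name="budget", deputy=None, status=3))
```

The design point is that the form and the API share one parameter set, so anything the search form can do is also available to API users.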
Everybody can use this, but it is mostly journalists who actually follow the law-making process in Russia. This is the extended version of the law search form. It is quite complicated: when you select certain options, more fields appear. There is also a shorter version of the form [unclear].
This is how the result looks: names, descriptions and so on, a lot of meta-information about laws.
About the voting system: it was a new system implemented from scratch. It gets raw XML files from the voting itself: there is another system, which directly serves the voting, and it produces the XML files [unclear]. It is politically important data: it stores every vote of each deputy. With this system, for example, if you hear on TV that some deputy or some fraction voted for or against something, you can go to the website and check whether it is true. It is about four gigabytes of data currently. It is interesting that the amount of voting data is larger than the transcripts. The transcripts contain what people said, one at a time, but when voting, all 450 deputies vote together, and that produces more information than the speech does [unclear].
You can do a search and get a list of votes, but you can also get some statistics: you can see how the voting went in a session [unclear]. It is also interesting that you can see it as a
distribution of votes, and you can see the votes of a particular deputy or fraction.
The transcripts system stores every single word spoken in the State Duma [unclear]; sometimes it is hard to understand [unclear]. It offers many kinds of sophisticated search: for instance, you can query everything a particular deputy said in the discussion of a particular draft. And it is about 2 gigabytes of data. So, for
example, you can search for a particular deputy and
get his speeches. After
that, we were asked to implement search not only over the metadata of laws but over the full text as well. At the first stage we faced a legacy system which we were not able to change, yet we had to search over its data somehow, so we had to crawl all the documents in that system. It was some 80 to 85 [unit unclear] of documents, and when we extracted the text it turned out to be [unclear]. The specific of this search is that you do not search for the documents themselves, you search for laws: you look for all laws which contain at least one matching document, and the relevance of a law is the maximal relevance of its documents. So the query
was like this, in a simplified form. You query the table which contains documents, selecting those which match your query; you group by the law ID [unclear]; and then you order by the maximum rank.
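The query itself is not shown in the transcript, so the following is only a sketch of the shape it describes. The table and column names (`documents`, `law_id`, `doc_tsv`) and the `russian` text search configuration are assumptions; `to_tsquery`, `@@` and `ts_rank` are standard PostgreSQL full-text search functions and operators. The small Python function mirrors the same "relevance of a law is the maximum over its documents" rule on in-memory data.

```python
# Assumed shape of the simplified query; names are hypothetical.
LAW_SEARCH_SQL = """
SELECT law_id, max(ts_rank(doc_tsv, query)) AS relevance
FROM documents, to_tsquery('russian', %(query)s) AS query
WHERE doc_tsv @@ query
GROUP BY law_id
ORDER BY relevance DESC
LIMIT 10
"""


def rank_laws(matches, limit=10):
    """Rank laws by the best rank among their matching documents.

    matches is an iterable of (law_id, rank) pairs, one per matching
    document; a law appears in the result if it has at least one match.
    """
    best = {}
    for law_id, rank in matches:
        if rank > best.get(law_id, float("-inf")):
            best[law_id] = rank
    ordered = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ordered[:limit]


# Two documents match law 1; only its best rank counts.
print(rank_laws([(1, 0.2), (2, 0.5), (1, 0.9), (3, 0.1)]))
```

Written this way, every matching document has to be ranked before the grouping and sorting can happen, which is the cost the next part of the talk addresses.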
In the query plan, first there is a bitmap scan on the GIN index, then the later steps [unclear], and the query takes a bit over 100 ms, so it is actually not very fast. What you can note here
is that we do a lot of work in GIN, and some part of this work is not needed [unclear]. So the idea was to use my GIN patches [unclear] and implement an extension to support this particular use case. I stored the documents of each draft, with all of its history, in one field. Then I implemented an operator [unclear] for KNN search [unclear]. There is a separate talk about the GIN improvements where you can hear about this in more detail; the point is that the patch does part of the ranking inside the index. It is not currently in PostgreSQL core, so for this system we use our own patches.
After that the query became much simpler: it just uses an ORDER BY with the operator [unclear], because I implemented this operator in an operator class which has index support. And from the
query plan you can see that everything is taken from the index, and it takes only a few milliseconds. So the search became, you know, almost instant.
In conclusion, this is the final picture [unclear] of the Russian State Duma. We implemented integration with Oracle using an open-source tool, ora2pg, and it works well. PostgreSQL gives good performance for all the implemented systems, and even excellent full-text search performance thanks to database patches and an extension. That is
all. Thank you for your attention. Any questions? [question unclear] My company won the tender for this project, the construction of the website, and it later grew into more projects. The problem was that this infrastructure could not be built around Oracle, because of how Oracle was licensed there [unclear]. When we proposed PostgreSQL, they decided not to buy a new Oracle license, because the license would cost more than the whole project. That was the first project, the construction of the website; then there were other projects, and we decided to implement them on the same architecture [unclear]. [question unclear]
Actually, I wrote this extension only for this particular use case [unclear]. If I need to, I can implement more of them [unclear].
