TIB AV-Portal

hale studio: Effective Data Analysis and Transformation for Open Standards


Formal Metadata

hale studio: Effective Data Analysis and Transformation for Open Standards
Title of Series
Part Number
Number of Parts
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
Release Date

Content Metadata

Subject Area
hale studio is an open source environment for the analysis, transformation and publication of complex, structured data. We have been developing hale studio since 2009 and have reached more than 5,000 downloads per year. Most of our users employ it to easily create INSPIRE data or CityGML models, or to fulfill e-Reporting duties. Some use it with BIM data, health data or even e-commerce information. In the last year, hale studio has gained a number of headline features and improvements, such as integration with GeoServer app-schema and deegree's transactional WFS. We have also added support for more open formats, such as SQLite and SpatiaLite, but also for enterprise formats such as Oracle Spatial and Esri Geodatabases. In this talk, we will provide a quick introduction to the declarative real-time transformation workflow that hale studio affords, highlight the latest developments and provide an outlook on the roadmap for 2016 and 2017. We will also highlight some of the most interesting projects our users are doing. Thorsten Reitz (wetransform GmbH)
Keywords wetransform GmbH

Related Material

So, let's continue with the next session. I'm really happy to welcome Thorsten Reitz from wetransform. I met Thorsten at a conference years ago, and since then, whenever I have questions about complex models, I try to call him, and he usually knows an answer. Thorsten, the floor is yours.

Thank you very much for the introduction,
and thank you all for staying for one of the last talks of today. There is one of those tools that maybe, in this community in particular, is not so well known, but it has been around for a while, and that's what I'd like to present today.
So maybe I should give you a little bit of context on why we are building something like this. Our idea, quite a few years back, was that we wanted to build tools that really help make open standards work. In open standards we often have things like really rich, object-oriented models, for example in INSPIRE and the others I mentioned here; the encodings also tend to have their pitfalls and tricks; and the standards are often built for extensibility and flexibility, which doesn't necessarily work very well with all the existing tools. So we said, OK, one thing we need to do is provide something that helps people analyse, transform and validate the data sets they are working with, so that they can deliver high-quality open-standards datasets.
So almost ten years ago we started working on something called hale. The core idea behind it was that we wanted to enable people, first of all, to understand all these rich and complex models, and to provide them with a way to explore them, as well as the data they have available. Then we wanted to make it really easy, compared to the other solutions at the time, to actually do a transformation: there is always a source and a target, and I just want to specify how to get from one to the other. The more procedural approaches typically mean writing a script with many, many lines, or creating a pipes-and-filters graph that can become really complex, and we wanted to improve on that user experience.

The other thing, as many of you probably know, is that people learn much better when they get real-time feedback. You test something, you get feedback immediately, like when you touch a hot surface and know the result at once. So it's usually a good idea to give somebody who works on a complex topic immediate feedback. That was one of the design goals: change something in the transformation and immediately see what the result is in these views. We also paid attention to making this an open source project from the beginning; it was originally a research project, and it was pretty clear that it would otherwise probably not be accessible at all anymore. And we wanted to make it an open platform, so from the beginning we documented which extension points there are and how you can actually integrate it into your own applications.

A bit of history: originally, the major funding for hale came from an FP6 project called HUMBOLDT, which ran from 2006 to 2011 and was basically the foundation where most of the original concepts were developed. The software started to get a couple of users at about that point, but as you can see, it continued to be developed: there were something like 15 to 20 projects in that period in which it was used, in research as well as in different types of deployments. Then, in 2014, we decided that we didn't want it to be a science project anymore, dependent only on research funding, so we founded a company, wetransform. Since then we have convinced quite a few partners and users to use the software and to work with us on how to improve it. You can see this steep ramp-up here towards release 3.0, which I'm going to talk a bit about later.
If you wonder what people actually do with it, I've got three examples for you. One of the larger projects we are currently doing is in Germany, where we have a standard called AAA, which covers ALKIS and ATKIS, among others. The special thing about it is that if you think INSPIRE is complex, you probably haven't seen this one yet: it's significantly larger and has far more relations, and relations can be of many different types, expressed in lots of interesting forms. So there were some challenges in this project, but by now we're almost done. The really interesting part is that this complete mapping could be delivered not just in the form of executable transformations, but also as really well-readable documentation. If you had used a normal programming language for something like that, producing readable documentation would be pretty hard. We'll have a look at that later on, time permitting.
Another project we worked on was with a group of about 96 municipalities in the state of Hessen, who also wanted a common solution for implementing INSPIRE. For that, they had created local harmonized data models, but they needed a transformation from those local harmonized models to the INSPIRE models as well. The nice thing here was that we didn't have to create 96 mappings: because they had already agreed on a shared structure with only minor variations, the input was one basic mapping plus a couple of adaptations for the specific needs of individual organizations. And maybe one more: a project with a special challenge was with the European Environment Agency. They have a couple of pan-European datasets, like the protected sites dataset, and one requirement was, for example, to aggregate all protected sites of one type across Europe into one object. We managed to do that in the end, though there were of course some challenges involved.
So what's the principle behind the software, and why can we actually make it a bit easier than other software? We decided, ten years back, to take a look at something that was being used in the Semantic Web community at the time, called declarative mappings. We have two data structures, and we basically just declare: I want to pick an element from the source schema and relate it, via a function, to an element of the target schema. For example, we say that the Tree type comes from the Plant type, and we apply a couple of functions, so that in the end we have all these individual cells of mappings that we can then work with. The nice thing about such an approach is not just that it's easier to use; it also offers some additional advantages. One is that the user no longer decides what should happen in which order; we can decide that. We can analyse the data, the schema and the mapping, and then determine an optimal execution plan. The other is that the mapping is independent of the concrete data format; we really work on the level of the conceptual model, so you can apply the same mapping that you used for a Shapefile to a database table from a PostGIS database.
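The declarative-mapping idea described above can be sketched in a few lines. This is not hale studio's actual API; the cell structure, function names and `transform` helper below are hypothetical, purely to illustrate that a mapping is a set of declared cells, and that the engine, not the user, decides how and in which order to execute them.

```python
def rename(value):
    """Identity transfer; the engine would handle structural conversion."""
    return value

def format_number(value):
    """Render a numeric value as text with one decimal place."""
    return f"{value:.1f}"

# Each cell declares *what* relates to *what*, never *in which order*.
cells = [
    {"source": "PlantType.name",   "function": rename,        "target": "Tree.name"},
    {"source": "PlantType.height", "function": format_number, "target": "Tree.heightValue"},
]

def transform(instance, cells):
    """Apply every cell to one source instance; execution order is the engine's choice."""
    target = {}
    for cell in cells:
        source_prop = cell["source"].split(".", 1)[1]
        if source_prop in instance:
            target_prop = cell["target"].split(".", 1)[1]
            target[target_prop] = cell["function"](instance[source_prop])
    return target

result = transform({"name": "Oak 12", "height": 14.0}, cells)
```

Because the cells only reference conceptual-model elements, the same list could be evaluated against a Shapefile reader or a database cursor, which is the format independence the talk mentions.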
Performance is a major differentiator, especially when working with complex schemas. This is one of those cases; you can guess what the other software might be. This comparison was actually done as a customer evaluation: they had workbench projects built for them with that software by a vendor in Germany, and tried the mappings we created for them against it. The difference is really significant, especially when you need to build complex structures; hale is substantially faster because that's exactly what it was built for. We kept in mind from the beginning that we would need to create these ten-level-deep nestings, and for everything else it works quite well too. In some cases we have a performance difference of a factor of 200. So, I thought it would be a good idea not just to tell, but also to show a bit, so let me switch over.

This is principally what the interface looks like. We have the schema explorer, and the idea behind it is that whatever structures you have, however deep they may be in some cases, are always broken down into a tree. Even when there are loops and things like that, in the end it's a tree, and yes, there are places where you can go to very, very deep levels in this tree. The actual approach is always to just pick an element, or multiple ones if you need multiple inputs from the source side, and to pick an element on the target side, like this local name that's missing here and that the validation is complaining about, and then to apply a function. One of the things hale does is that it will not always offer all of its functions, but rather tell you which functions it thinks are probably the ones that might work. A very generic one is Rename: Rename is the function that tries to do everything automatically, like structure matching and conversion of data formats. So it's always a good idea to try that one first, and only if something doesn't work to look into the more complicated functions.

The transformation was executed, the validation was executed, and we directly get feedback here on what the data looks like; the dataset is valid now. But that's not enough: I can also have a look at the data directly. Here we have the source data set and the transformed target data, and we can see the resulting structures. This river object, for example, didn't have a name, but this one did, and that is directly represented in the GeographicalName structure of INSPIRE; you can see how the text and the spelling of the name actually appear. And of course it's geographical data, so it makes sense to also look at it on a map. This is a perspective that is especially useful if you have distinct styling for both the source and the target data set, so that you can directly see, for example, whether all the classifications are picked up correctly.
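The kind of automatic structure matching that the Rename function performs can be sketched roughly like this. The matching rule below (pair up properties whose names are equal ignoring case) is a simplified assumption for illustration, not hale's actual matching algorithm, which also handles nested structures and data-format conversion.

```python
def match_by_name(source_props, target_props):
    """Return {source: target} pairs whose names match, ignoring case.

    A toy stand-in for automatic structure matching: real matchers
    also walk nested trees and convert between data types.
    """
    targets = {t.lower(): t for t in target_props}
    return {s: targets[s.lower()] for s in source_props if s.lower() in targets}

# "NAME" pairs with "name" and "width" with "Width";
# "geom" finds no case-insensitive twin among the targets.
mapping = match_by_name(["NAME", "geom", "width"], ["name", "geometry", "Width"])
```

Anything left unmatched, like `geom` here, is exactly the case where the talk suggests falling back to the more specific functions.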
You may wonder where the background map is coming from. If you have your own maps, you can also say: I want to use a custom tile or web map service in the background; that is entirely left to your requirements, and it works with OpenStreetMap tiles as well, for example. Let's have a look at the alignment itself. In the default perspective that we had here, the view is normally that you work on the schema, but sometimes you also want to know what you actually have in terms of the alignment. You can see here, for example, that this is the function that connects the width to the geometry, and we can look at that parameter and ask what it actually does. We see it's a mathematical expression that tells the system how much to buffer, and I notice right away that I accidentally did the foot conversion the wrong way around; so let's change that and run it again.

With small to medium datasets, as we see, this is much faster than before. The nice thing about it, I think, is that you make a change somewhere in the mapping and you get direct feedback on various levels: in this table, in the map view, in the validation. Usually, if you pick your sample data somewhat sensibly, you can keep the response time below one or two seconds and really make good progress.
One of the things I mentioned before is the HTML documentation, which I've got here. This alignment that you create in hale is something that you can export in many different ways. One thing we heard about yesterday is that you can export it as an app-schema configuration for GeoServer; there is also a matching table in Excel, an interactive document, and an XSLT export. So if you don't trust the transformation engine we've built and would rather run XSLT, please go ahead, but be aware that some of the spatial functions are not available there. The interactive documentation is something that people can use, for example, to review the mapping: they can go through it, decide whether it makes sense, and where it is contested, get information they can really step through. In some cases we also have to use scripts, since not everything can always be done with the built-in functions, and one of the things hale affords is the ability to define your own custom functions that you can then use everywhere. And that is something you can also review entirely using this kind of documentation.
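The custom-function idea is essentially a registry of small, named, reviewable functions that any mapping cell can reference. hale studio actually uses Groovy for its custom functions; the Python registry below is only a conceptual sketch, and the function and cell names are made up for illustration.

```python
CUSTOM_FUNCTIONS = {}

def custom_function(name):
    """Decorator registering a reusable mapping function under a name.

    Because functions are stored by name, a documentation export could
    list each one's source next to the cells that use it, for review.
    """
    def register(fn):
        CUSTOM_FUNCTIONS[name] = fn
        return fn
    return register

@custom_function("feet_to_metres")
def feet_to_metres(value):
    """Unit conversion: international foot to metres."""
    return value * 0.3048

# A cell references the function by name rather than embedding code.
cell = {"source": "river.widthFeet", "function": "feet_to_metres", "target": "River.width"}
width_m = CUSTOM_FUNCTIONS[cell["function"]](100.0)  # roughly 30.48 metres
```

Storing the function once and referencing it by name is also what makes the "use it everywhere" reuse from the talk possible.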
After this very quick tour of what hale can do, let me come back to what's up next, since the abstract also promised that I would explain a little about our plans. The current release is 3.0; we published a release candidate last week, obviously because it would have been very nice to have the final release before this conference, but that's now going to be next week. The main thing is really improvements to custom functions: if you decide that the 70 or so built-in functions are not enough for you and you would rather define something in addition, we have made that much easier now, and also better reusable. You've seen the interactive mapping documentation, but there's more. For example, we used to have one reference map, which was OpenStreetMap, but we kept running into the heavy-usage limits of the tile servers; then we used another one, which was later discontinued; and now you can basically pick your own. We also added one or two formats; this time it was MS Access, sponsored by a customer. And something that might be interesting for developers: hale was originally, and has always been, an Eclipse-based desktop application, but now all its components are available as normal Java libraries, so you can really take whichever parts you like: if you need to work with schema-related things, take the schema libraries; if you need some part of the transformation engine, just take that and use it for whatever you need. We also added quite a lot of generators, for example for generating hale projects based on a set of parameters; this is used, for example, in the JRC interactive data specifications toolkit, but we also use it internally. And there are all kinds of APIs, so if you think, OK, it's a desktop application, it doesn't end there: you can use it in a server environment as well, through REST interfaces, command-line interfaces, and so on.
We've also written quite a few scripts that make it easier to run it in such contexts. The rest is more or less what you would expect. At the end of November there's going to be the next release, and for that I currently assume that MS SQL Server support is going to get in. The bigger functional change is going to be aspect mappings. You saw earlier that I picked an element of the schema on the source and on the target side; however, sometimes we have many, many tables, for example from a database, that don't have an inheritance structure, so I can't pick something higher up in the hierarchy, and I basically have to repeat the same mapping many times. An aspect mapping allows me to do that just once: match by name, match by property type, match by namespace, or any of these, so that I don't have to do the same mapping two times just because I have two tables. The other thing we're doing is that we want to make it easier for our users to share both the transformation projects they create and their custom functions, so in addition to the desktop environment we're offering a cloud environment that allows you to do exactly that. There will be a couple of additional functionalities, like modelling tools. One of the continuous requests we've had for hale was: I would so much like to just click on a type in the schema explorer and create a subtype, for example. We've in the end decided to build that, but because it's not connected just to transformations and is more of an independent concern, we decided to put it on the online platform as well.
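The aspect-mapping idea can be illustrated as a mapping declared once together with a match condition, which the engine then applies to every table that satisfies the condition. The table names, the `gml_id` property and the dictionary layout below are invented for this sketch; they are not hale studio's real configuration format.

```python
tables = {
    "roads_2015": {"gml_id": "r1", "label": "A3"},
    "roads_2016": {"gml_id": "r2", "label": "A5"},
    "buildings":  {"osm_id": "b1", "label": "Town hall"},
}

aspect = {
    # Match condition: apply to every table that has a 'gml_id' property,
    # standing in for match-by-name / by-property-type / by-namespace.
    "match": lambda props: "gml_id" in props,
    "cells": [("gml_id", "inspireId"), ("label", "name")],
}

def apply_aspect(tables, aspect):
    """Run the once-declared mapping against every matching table."""
    out = {}
    for name, row in tables.items():
        if aspect["match"](row):
            out[name] = {target: row[source] for source, target in aspect["cells"]}
    return out

# Both roads tables match the condition; 'buildings' is skipped,
# so the same mapping never has to be written twice.
result = apply_aspect(tables, aspect)
```

The benefit is exactly the one described in the talk: two tables without a shared supertype still get one mapping definition instead of two copies.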
Other than that, I can just say that we're always looking for smart, motivated people to join the team.
And if you need any information about this, the URL is up here. There is also going to be a hosted version, but unfortunately we didn't get that finished before this week. OK, thank you very much.
Do we have questions? Yes.

Hi, maybe a simple question: how does this compare to FME?

Well, I think there are three main differences. FME is more general purpose: it has something like 400 formats by now, and hale has fewer, closer to 40. But honestly, you might not need all of those formats, and if you do, well, we've been adding them on request when necessary. The other thing is really a usability issue: if I go from one complex schema to another in FME, I'm going to generate a gigantic workbench with lots of feature mergers and joiners and whatnot, and that doesn't necessarily work well anymore; you will have a really hard time debugging it, and a really hard time running it. I have personally seen many cases where it was unable to process even a couple of hundred megabytes of data once the workbench became sufficiently complex. What we have done is build software that is specifically targeted at working with complex data models, so the technological approach is quite a different one, and I think the usability too. And that lines up with the performance comparison: especially for these cases of relatively large datasets and complex data models, the performance difference is usually between a factor of 10 and 200.

I have a question concerning the app-schema export: how complete is it, and what are its limitations?

Well, obviously it depends on two things. The two limitations are, on the one hand, what app-schema can do, because by far not everything I can do in hale can also be supported in an app-schema mapping. We have been working on the current status, getting a couple of these fixes in, and more are coming, so I'm optimistic that it really gets there, at least for the concrete requirements we have right now in INSPIRE and a couple of other areas; it's not that many things that are missing. The second thing is that people may have created a mapping here with the expectation that everything they did will automatically also work there, which unfortunately is not, and will not always be, the case. For that, there is one thing in here called the compatibility mode; it's indicated somewhere... I can't see it right now, that's funny... here, compatibility mode. You can choose between hale, app-schema and XSLT here, and each of those supports a slightly different, in some cases quite different, set of things. If I switch it, it will most likely give me a warning about functions that are not supported, and the marker in red actually tells you what is not working anymore, so people understand what the consequences are.

This is a major improvement over what we had previously. A colleague was writing these conversion scripts by hand to automate INSPIRE mappings, and now we can do it in the visual editor with INSPIRE support, in minutes or hours instead of weeks. Can you also pull your existing app-schema mappings back into hale studio?

At this point it's one way only, sorry. It's an interesting idea, though: importing an existing app-schema configuration so that it helps you get started would be possible.

And what about doing the same for deegree?

That's actually one of the things that has been on the wish list for at least a year. Some of the people who did the app-schema integration came up with the idea to also do it for deegree, but I have to say that in concrete terms it's mostly a lack of funding for that particular development. We briefly evaluated whether we could do it around this conference, together with the deegree folks, but we thought it was probably going to be too much to be realistically done in a couple of days. So yes, it's possible, and it's something I would like to do, but it's not at the top of the priority list; let's call it a noted task.

OK, if there are no more questions: thank you for this great work, and thank you all for listening.