
Case Study on Redlining application using JATS XML at the International Organization for Standardization


Speech transcript
[Session chair's introduction of the speaker, mostly inaudible: "...Pereira."]
And I'm going to be talking about ISO redlining workflows. Most of you probably know who ISO is: the International Organization for Standardization, the world's largest developer and publisher of international standards. They're based in Geneva, Switzerland, and they have 164 members, which are national bodies. The national bodies do most of the lifting in the standards process. Expert groups and committees put the standard together, it goes through a fairly large and comprehensive process, and eventually it gets published. In terms of numbers, they have about a hundred thousand experts around the world who collaborate with them, and they have published close to twenty thousand standards, probably more than that now. So that's a fairly large publication volume.

Traditionally, the tool the committees used was Microsoft Word. The committees worked mainly in Word, and then staff in Geneva produced the PDF from the Word file. About three or four years ago they changed the process to an XML flow. There are a number of components to that. One is a set of Word templates and styles, which [inaudible] the committee process significantly and also create ISOSTS XML. For people coming from JATS, ISOSTS is very similar, except that the metadata is very different, and there are a few other changes that you'll see bits and pieces of as I go through. The XML then goes through typesetting, which produces pretty much all the outputs: PDF, DAISY and so on. So ISO has been doing this for a while.

We met with the program director at ISO around the middle of last year, and one of the things they were still doing by hand was the redline. The redline is where you take two editions of a standard, give them to somebody with a red pen, and they mark up the differences. The product manager said, "Cool, that's interesting. Can you automate it?" We thought it might be too difficult; we had done a differencing project before, but we decided we should give it a try. The cost of producing redlines is quite high, because it's a technical expert sitting there comparing by hand. If you think of ISO 9001 as one document, somebody is sitting there comparing it by hand, and that's a lot of work. And standards professionals, the people who use standards, love the redline products. If you are complying with ISO 9001 right now and the new standard comes out, you only want to know what's changed; you don't want to wade through two and a half dozen pages again to find out what you need to do. You want to know that this assessment changed, that these levels have changed, that this measurement has changed. So they loved it, and the redline version of the standard became a kind of flagship product for ISO, so ISO supported it as well.
To do this, however, there are two problems: the first is finding the changes, and the second is rendering them. The first thing that came up was: why don't we use Word's Track Changes? Just use Word. [Slide: the track-changes version of my paper, coming back from India.]

There can be some problems with that. First, the benefits. Committee members are familiar with Word, which is great, because most of them use it. You can add comments ("I changed this measurement for these reasons"), and everybody can see them. And it's a format used by most of the people in that domain.

But it has a few drawbacks. One is that you need to make sure people have Track Changes enabled in the Word document. Quite often, because of the [inaudible] we have, people forget to press the button, which means the changes are not captured, so you need to make sure of that. Also, if they're willing to, people can hide changes by turning Track Changes off, making an edit, and turning it back on, and then you really can't tell. I'm sure there's at least one bad apple out there as well, but not many. What you'll also see is that some standards go through intensive periods of editing, so one paragraph might get 200 changes because everybody edits it, and if you look at the changes, it just becomes overwhelming. This slide shows one paragraph from a standing committee of 40. It's a lot to deal with, and it clutters the document, as you saw in this example. Track Changes is really good between two or three people, but with forty people in a committee, it's changes in forty different colors, and it becomes very, very difficult to work with.

One of the biggest challenges, and one that Word has no solution for, is redlining across multiple editions. Let's imagine you have the 1998 edition of a standard, it's revised in 2002, and then again in 2006. What is the difference between 1998 and 2006? Word won't give you an easy answer to that. And, believe it or not, not all of those 100,000 people use Word; a few of them use other tools, funnily enough. Some of them might even get a paragraph by e-mail ("I'm not sure about this paragraph"), make some changes, send it back, and you copy and paste the paragraph in, and now the whole paragraph has changed. So the alternative is to difference the XML.
There is a difference between plain text differencing and XML differencing. With traditional text differencing, two XML files that are not identical as text but are equivalent as XML come back as different: the attributes may be switched around, or an element may now use a self-closing tag rather than a start and end tag, but really that's the same thing. A configurable XML differencing tool won't give you any problems there; it knows it is the same thing. So we went with XML differencing, and again it has pros and cons. The big one is that it doesn't depend on user behavior, and one of the things we all know is that you cannot control behavior. So this is a big, big plus. You're working much further down the line, and we don't need to retrain the contributors, because you're looking at XML that's generated after the fact, and you're only comparing the final documents. There's no real-time element; we don't worry about that. We come in right at the end. The committee does all of its work, the standard comes to ISO, and ISO publishes it. It's then no longer in flux; it's fixed, even if only for a short time. So we're comparing final documents, documents that are not going to change, and the comparison itself is essentially just taking two different XML files and differencing them. There are some drawbacks: it's possibly complicated to implement, and although it is actually quite a small workflow change, it requires quite a lot of additional software.
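The point about textual versus XML-level differences can be illustrated with nothing more than Python's standard library. This is not DeltaXML: C14N 2.0 canonicalization is a swapped-in technique, used here only as a minimal sketch showing that two serializations a line-based diff would flag as different become identical once attribute order and empty-element syntax are normalized.

```python
from xml.etree.ElementTree import canonicalize

def same_xml(a: str, b: str) -> bool:
    """True if two XML strings are equal after C14N 2.0 canonicalization."""
    return canonicalize(xml_data=a) == canonicalize(xml_data=b)

# Textually different: attribute order differs, and one side uses a self-closing tag.
old = '<fig id="f1" position="float"><graphic href="a.png"/></fig>'
new = '<fig position="float" id="f1"><graphic href="a.png"></graphic></fig>'

print(old == new)          # False: a plain text diff reports a change
print(same_xml(old, new))  # True: the canonical forms are identical
```

Canonicalization alone leaves whitespace-only text differences in place (`canonicalize` has a `strip_text` option for that), and it does nothing about aligning moved or edited content, which is the job of a real XML comparison tool.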
So this is how it works. It's XML differencing; the previous presentation touched on this, but I'd like to go through it. We're using a product called DeltaXML on this one [inaudible]. I've got two small chunks of XML here, because I tried to find the smallest and simplest example I could show on the screen and still have it be readable. DeltaXML compares them and adds change attributes, so if you look at the output XML, it actually contains both sets of information: document A and document B. It says: this is the same, this is new, this was deleted. For example, if I look at this element, it's the same for A and B; but if you look at this one, it's different between A and B, and so on. So that gives us the basis for determining what we need to render.

To dig a bit further into that, the key thing is the deltaV2 attribute that DeltaXML adds, in its own namespace. If it says A=B, the element is the same in both documents. If it says A, the element is unique to A; if it says B, it is unique to B. So if an element is marked B, it was added in the new document, because it only exists there. And if it says A!=B, something inside that element has changed. It's a very simple approach, and using it we can render the redline.
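A minimal sketch of reading the deltaV2 attribute described above, using Python's ElementTree. The namespace URI and the sample document are assumptions for illustration, not taken from ISO's files; the four values follow the A=B / A / B / A!=B convention explained in the talk.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for DeltaXML's delta attributes; check it against your DeltaXML output.
DX = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA = f"{{{DX}}}deltaV2"  # ElementTree's {namespace}localname notation

MEANING = {
    "A=B": "unchanged",   # identical in both documents
    "A!=B": "modified",   # present in both, but something inside differs
    "A": "deleted",       # only in document A (the old edition)
    "B": "added",         # only in document B (the new edition)
}

def classify(delta_xml: str) -> list[tuple[str, str]]:
    """Return (tag, change) for every element that carries a deltaV2 attribute."""
    root = ET.fromstring(delta_xml)
    return [
        (el.tag, MEANING.get(el.get(DELTA), "unknown"))
        for el in root.iter()
        if el.get(DELTA) is not None
    ]

sample = f"""<sec xmlns:deltaxml="{DX}" deltaxml:deltaV2="A!=B">
  <title deltaxml:deltaV2="A=B">Scope</title>
  <p deltaxml:deltaV2="A">Old requirement.</p>
  <p deltaxml:deltaV2="B">New requirement.</p>
</sec>"""

for tag, change in classify(sample):
    print(tag, change)
```

Real DeltaXML output also wraps text-level changes in its own grouping elements, so a production renderer has to handle those as well as element-level marks.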
Now, rendering is a bit more of a problem in itself. For creating the redline, ISO has its own specification: everything that is deleted is struck through, and everything that is added gets a green underline. The challenge was that if you are color-blind, the underline and the strike-through are hard to see, so they decided to add a highlight color as well, so you can still see the change clearly. Again, it's a very simple rule: if the deltaV2 value is B, the text is new, and you see it underlined and highlighted; if it's A, the text is struck out. And this also defines which was the old document and which was the new one, because if you switch the two documents around, you get a different result. So this is the redline.

That work has been implemented [inaudible], and in the process of implementing it, there are a few bits that turned out slightly different. What the process takes is two XML files of the same standard. One of the interesting things is that, because ISO moved to the XML workflow only three years ago, they don't have many standards published natively in XML yet. But one of the things they did was convert their back catalog, the backfile, so there's a whole back catalog full of XML, and that throws up some interesting problems that I'll show you in a minute. So we have DeltaXML, which takes both files, the old XML and the new XML, and outputs one file with the change attributes, and then we take that and render it.
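The rendering rule (A struck through, B underlined and highlighted) can be sketched as a small recursive function that turns a delta tree into HTML `del`/`ins` markup. This is an illustration, not ISO's implementation; the namespace URI is assumed, and the `deltaxml:textGroup`/`deltaxml:text` wrappers in the sample are modelled on how DeltaXML groups text changes as far as I know, so check them against real output.

```python
import html
import xml.etree.ElementTree as ET

DX = "http://www.deltaxml.com/ns/well-formed-delta-v1"  # assumed DeltaXML namespace URI
DELTA = f"{{{DX}}}deltaV2"

def render(el: ET.Element) -> str:
    """Render a delta element tree as HTML: deletions struck out, additions underlined."""
    ns, _, local = el.tag.rpartition("}")
    inner = html.escape(el.text or "")
    for child in el:
        inner += render(child) + html.escape(child.tail or "")
    # Elements in the DeltaXML namespace only group changes; content elements become spans.
    body = inner if ns == "{" + DX else f'<span class="{local}">{inner}</span>'
    state = el.get(DELTA)
    if state == "A":      # only in the old document: strike through
        return f"<del>{body}</del>"
    if state == "B":      # only in the new document: underline (and highlight via CSS)
        return f'<ins class="added">{body}</ins>'
    return body           # "A=B" or "A!=B": no mark on the container itself

sample = (
    f'<p xmlns:deltaxml="{DX}" deltaxml:deltaV2="A!=B">Limit: '
    '<deltaxml:textGroup deltaxml:deltaV2="A!=B">'
    '<deltaxml:text deltaxml:deltaV2="A">5 mm</deltaxml:text>'
    '<deltaxml:text deltaxml:deltaV2="B">3 mm</deltaxml:text>'
    "</deltaxml:textGroup></p>"
)
print(render(ET.fromstring(sample)))
```

A stylesheet rule on `ins.added` can then supply the green underline plus a background highlight, so the change does not rely on color alone.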
There are some interesting rendering issues here. Text, images, tables and math: those are the four major categories we've worked with. That doesn't mean that's everything; we're only a few months into this project, and we keep coming across more complex cases, but these are the relevant ones. Text is really simple. For example, here we have a text node with two text changes: one at the start of the paragraph, and one further along in the text. DeltaXML compares them and tells us what has changed: the old text is struck through, the new text gets a green underline, and it's relatively simple. [inaudible] I don't think we've actually run into the issue Charles pointed out.
Now, rendering images. This is how ISO decided it would work: if a new image replaces the old one, the old one is crossed out and the new one is shown. This is actually an old slide, because they have since changed it. Imagine the labels sitting right here on the image: you lose the labels. In the original specification, the old and new images sat side by side rather than one after the other, so both were scaled to about 33% of their original size. It looks beautiful when you do it that way, but you lose information, because you can no longer read everything. So now it's going to look a little uglier in the redline, but you don't lose information.

There's also the challenge of detecting that an image has changed, and this is one of the most problematic areas. We could say the file name has changed, therefore the image may have changed; but they might just have renamed the files for the new standard. So that was the first approach. Standards that originated in ISO's XML workflow have very strong file naming conventions, but the backfile didn't, so the first time we ran the redlines we realized that things had been marked as changed only because the file names had changed at some point. The second approach was to look at image file signatures. If the files have the same signature, there's a good chance the image is the same. But if they don't, that doesn't necessarily mean the image is different, because somebody could have resized the image or changed its properties, which changes the signature. As soon as you do that, you get false positives, and if you get too many false positives, people don't trust the redline. The whole point is that people use the document and trust the highlights; otherwise they're not interested in it.

So in the end we came up with the idea of establishing this simply through metadata. When somebody places the file [inaudible], they say what has changed. If nothing has changed, they select that option, and the image is marked as unchanged. If they say it's a different file, there are three or four different options, and if they choose one that says the file has changed, we mark it as changed. It does add more of a human element, but we really couldn't think of any other way to compare images and say whether something has changed.
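The decision logic described above, with editor metadata first and file signatures only as a weak hint, might look like the following sketch. `EditorFlag` and its three values are hypothetical stand-ins for the options ISO's editors choose from, and the `review` outcome for an unflagged image with a different signature is an assumption reflecting the false-positive problem, not something stated as ISO practice.

```python
import hashlib
from enum import Enum

class EditorFlag(Enum):
    """Hypothetical metadata an editor sets when placing or replacing a figure."""
    CHANGED = "changed"
    UNCHANGED = "unchanged"
    NOT_SET = "not-set"

def signature(data: bytes) -> str:
    """File signature, used only as a hint: any resize or re-export alters it."""
    return hashlib.sha256(data).hexdigest()

def image_status(old: bytes, new: bytes, flag: EditorFlag) -> str:
    # The editor's explicit decision always wins (the human element).
    if flag is EditorFlag.CHANGED:
        return "changed"
    if flag is EditorFlag.UNCHANGED:
        return "unchanged"
    # No decision recorded: identical signatures are safe to treat as unchanged...
    if signature(old) == signature(new):
        return "unchanged"
    # ...but a differing signature may be a false positive, so route it to a person.
    return "review"

print(image_status(b"figure-v1", b"figure-v1-resized", EditorFlag.NOT_SET))  # review
```

In practice the signatures would be stored with each edition rather than recomputed from both files every time.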
Now, tables. Tables were a huge pain to render for change tracking. There are two things you can do to a table: you can change its content, or you can change its structure. Somebody editing a new edition of the standard might look at a table and say, "These columns are all wrong: merge these columns, merge these rows, restructure the table," while keeping much of the same content. Here are some examples, and you can see both versions. You can see added content and deleted content in a cell, and that's fine. But one of the things I did here was merge this cell with that one, and the content moved, so I deleted it. Now we suddenly have a ragged table, because there's an extra cell hanging off by itself. It looks odd, because the rightmost column only exists so that we can show it being deleted. Now imagine this is a four- or five-page table with lots of merging and moving going on: it's going to look like a mess. I wanted to show an example like that; we had a whole lot of them during the winter, but for some reason I couldn't find one. But you can imagine what happens. Identifying the changes isn't the problem; rendering them in a meaningful way becomes very, very difficult, because of all these other cells.

So what we decided to do was take two different approaches. The first is to check whether the table structure has changed, meaning there are more cells or more columns, or cells have been merged that weren't before. If it has, we cross out the whole table and put in a brand-new one. That is just much more readable. Yes, it's not optimal, but it's more readable, and it signals that something in the table has changed. Most of the time, this works.
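A sketch of that first check: compute a structural fingerprint of each version of a table and fall back to whole-table replacement if the fingerprints differ. It assumes the XHTML-style table model used in JATS-based schemas (`tr`, `td`, `th` with `colspan`/`rowspan`, no namespace, no nested tables); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def table_shape(table: ET.Element) -> tuple:
    """Structural fingerprint: for each row, the (colspan, rowspan) of each cell."""
    return tuple(
        tuple(
            (int(cell.get("colspan", "1")), int(cell.get("rowspan", "1")))
            for cell in row
            if cell.tag in ("td", "th")
        )
        for row in table.iter("tr")
    )

def redline_strategy(old_xml: str, new_xml: str) -> str:
    """Pick how to redline a table: whole-table replacement if the grid changed."""
    if table_shape(ET.fromstring(old_xml)) != table_shape(ET.fromstring(new_xml)):
        return "replace-whole-table"   # strike out the old table, show the new one in full
    return "cell-level-diff"           # same grid: mark changes inside individual cells

old = "<table><tr><td>a</td><td>b</td></tr><tr><td>1</td><td>2</td></tr></table>"
edited = "<table><tr><td>a</td><td>c</td></tr><tr><td>1</td><td>2</td></tr></table>"
merged = '<table><tr><td colspan="2">a b</td></tr><tr><td>1</td><td>2</td></tr></table>'

print(redline_strategy(old, edited))   # cell-level-diff
print(redline_strategy(old, merged))   # replace-whole-table
```

Comparing only the grid keeps the check cheap and predictable; anything subtler (detecting which merge happened) runs straight into the rendering problems described above.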
Now, sometimes tables are also used improperly. This example is from ISO 11137, a health care standard. You can see this is a wrapped table: the data points start at 1, 1.5, 2 and continue on, and then the sequence picks up again at 28, 30 and so on. So this is really one table that has been wrapped into two tables side by side, with a column in the middle. That's fine; no problem yet.

Then the next edition made a change. If I go back to the second column set: in the previous edition it started at 28, and in this one it starts at 19, so 28 has moved down here. What has really happened is that four data points have been added. That's all that has changed in this document. But when you look at the redline, you don't get "four data points have been added". So this is a problem of misusing tables, and there is no way we can pick that up and render it properly [inaudible]. What we did was address it in the conversion, so those tables were reprocessed and are no longer wrapped like that, and also accept that we can't detect everything. People like beautiful columns, and wrapping a table saves space on the page, but in XML you want the content rather than the presentational structure. There is also a directive from ISO to the committees: change table structure only when necessary. Don't change it just because you think you can do a better job; have a real reason to change the structure, because of the effect it has on the redline.
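To see why the wrapped table in the ISO 11137 example looks almost entirely changed, here is a small Python sketch using `difflib` on made-up data points (the values are illustrative, not those of the standard). Compared row by row in wrapped form, adding four points shifts nearly every row; unwrapped into one logical sequence, the same edit is just four insertions.

```python
from difflib import SequenceMatcher

def wrap(points: list[str], groups: int = 2) -> list[tuple[str, ...]]:
    """Lay one column of data points out side by side, as the misused tables did."""
    n = -(-len(points) // groups)  # rows per group, rounded up
    return [
        tuple(points[g * n + r] if g * n + r < len(points) else "" for g in range(groups))
        for r in range(n)
    ]

def unwrap(rows: list[tuple[str, ...]]) -> list[str]:
    """Read the side-by-side groups back, group by group, into one logical list."""
    return [row[g] for g in range(len(rows[0])) for row in rows if row[g]]

def summarize(a: list, b: list) -> dict[str, int]:
    """Count items per difflib opcode (equal/insert/delete/replace)."""
    counts = {"equal": 0, "insert": 0, "delete": 0, "replace": 0}
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        counts[tag] += max(i2 - i1, j2 - j1)
    return counts

old = ["1", "1.5", "2", "3", "5", "10", "20", "28", "30", "40"]
new = ["1", "1.5", "2", "2.5", "3", "4", "5", "7", "10", "15", "20", "28", "30", "40"]  # four added

print(summarize(wrap(old), wrap(new)))                  # wrapped rows: almost all "replace"
print(summarize(unwrap(wrap(old)), unwrap(wrap(new))))  # logical order: four inserts
```

This is the argument for fixing the tagging rather than the renderer: once the data is one logical table, an ordinary diff reports exactly what the committee changed.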
Now for the math. As you'll remember from the workflow earlier, math comes through [inaudible] as images. Again, the reason is that rendering math is a hard problem [inaudible]. Diffing those images is always trouble, so what we do is compare the MathML, and if the MathML shows that the content has changed, then the math has changed. There are two problems here, and this relates to the earlier talk about MathML [inaudible]. One is that some of the archival content has the same math, the same content, but expressed differently. Comparing the MathML between versions then shows a difference even though the meaning is the same, because there's more than one way to express things in MathML, and different versions of different tools over a three- or four-year span tend to generate it differently. We have a problem with that right now.
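One way to reduce this MathML noise is to normalize both fragments before comparing them. This sketch handles only two well-known MathML equivalences (a redundant `mrow` directly inside an element with an inferred mrow, and an `mrow` with a single child) plus insignificant whitespace. It is an assumption about where one might start, not what ISO does; tool-generated variants in a real archive will need more rules.

```python
import xml.etree.ElementTree as ET

# MathML elements whose content is an inferred mrow, so an mrow directly inside them is redundant.
INFERRED_MROW = {"math", "msqrt", "mstyle", "mtd", "merror", "mpadded", "mphantom", "menclose"}

def local(tag: str) -> str:
    """Strip any namespace from a tag name."""
    return tag.rpartition("}")[2]

def normalize(el: ET.Element) -> ET.Element:
    """Rewrite a MathML tree in place into one simple normal form, bottom-up."""
    kept = []
    for child in list(el):
        child = normalize(child)
        if local(child.tag) == "mrow" and (len(child) == 1 or local(el.tag) in INFERRED_MROW):
            kept.extend(list(child))      # splice out a redundant mrow wrapper
        else:
            kept.append(child)
    for child in list(el):
        el.remove(child)
    el.extend(kept)
    # Whitespace between MathML elements is insignificant; token text is trimmed.
    el.text = (el.text or "").strip() or None
    el.tail = None
    return el

def same_math(a: str, b: str) -> bool:
    """Compare two MathML fragments after normalization and C14N canonicalization."""
    def canon(s: str) -> str:
        return ET.canonicalize(ET.tostring(normalize(ET.fromstring(s)), encoding="unicode"))
    return canon(a) == canon(b)

a = "<math><mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow></math>"
b = "<math>\n  <mi> x </mi>\n  <mo>+</mo>\n  <mrow><mn>1</mn></mrow>\n</math>"
print(same_math(a, b))  # True: same expression, different markup
```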
One of the things I still want to show is the first redline we produced. It looks like this, and it's from one of the first standards that went through. They had changed their editorial style from "Equation" to "Formula", so everywhere it used to say "Equation" it now says "Formula", and that shows up as a lot of highlighted changes that carry no actual change of meaning. So we started thinking about a set of rules to implement during rendering, for example to ignore certain insertions and deletions: changes involving hyphens and non-breaking spaces, en rules to hyphens, hyphens to en rules, and so on. There's a whole series of them, and this list is from my paper: changes in whitespace, non-breaking spaces, apostrophes, and going from plain text to hyperlinked text, because in XML, turning text into a hyperlink would otherwise count as a change. And finally, the change from "Equation" to "Formula". The initial list I took to a working session was about three times this size, but some of the rules on it were not really safe to apply.
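A rule set like this can be applied as a normalization step: if the old and new text differ but normalize to the same string, the change is treated as noise and not highlighted. The exact character mappings below are assumptions illustrating the listed categories (whitespace, non-breaking spaces, dashes, apostrophes, "Equation" to "Formula"); the hyperlink rule works on markup rather than text and is not covered here.

```python
import re

# Character equivalences standing in for the listed categories; the exact mapping is an assumption.
EQUIVALENT_CHARS = str.maketrans({
    "\u00a0": " ",  # non-breaking space -> ordinary space
    "\u2011": "-",  # non-breaking hyphen -> hyphen
    "\u2013": "-",  # en rule -> hyphen
    "\u2019": "'",  # typographic apostrophe -> straight apostrophe
})

def normalize_for_redline(text: str) -> str:
    """Map text to a form in which the whitelisted differences disappear."""
    text = text.translate(EQUIVALENT_CHARS)
    text = re.sub(r"\s+", " ", text).strip()         # whitespace changes are ignored
    text = re.sub(r"\bEquation\b", "Formula", text)   # house-style change "Equation" -> "Formula"
    text = re.sub(r"\bequation\b", "formula", text)
    return text

def is_noise(old: str, new: str) -> bool:
    """True when the texts differ, but only in ways the rules say to ignore."""
    return old != new and normalize_for_redline(old) == normalize_for_redline(new)

print(is_noise("see Equation (1)", "see Formula\u00a0(1)"))  # True: not highlighted
print(is_noise("limit 5 mm", "limit 3 mm"))                  # False: a real change
```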
You have to be very careful with removing punctuation, for example with ignoring comma changes. The committee did point out that they had moved from one house style to another, which adds and removes commas, but those rules are not going in. In Europe, especially continental Europe, the comma can also be the decimal sign: ignore that comma and bad things happen. So thankfully that's not in there, but we are looking at doing more intelligent processing around it: if the comma is between numbers, don't ignore it; if it's not, ignore it. Capitalization changes are another one. Again, that's not something we implemented; ignoring them sounds like a good thing, but so does ignoring a change from italic to roman, and in standards that distinguishes constants from variables. So there are a lot of things that people think are trivial, but it's not quite as simple as it seems, and capitalization is one I'm still thinking about.
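The decimal-comma caveat suggests a narrower rule than "ignore commas": drop a comma only when it is not between two digits. This is a minimal regex-based sketch of that idea, which the talk describes as being looked at rather than implemented; the names are hypothetical.

```python
import re

# Remove a comma unless it sits between two digits, where it may be a decimal sign
# (as in "2,5 mm"). Note that this also keeps thousands separators such as "1,000".
IGNORABLE_COMMA = re.compile(r"(?<!\d),|,(?!\d)")

def strip_ignorable_commas(text: str) -> str:
    return IGNORABLE_COMMA.sub("", text)

def comma_change_is_noise(old: str, new: str) -> bool:
    """True when old and new differ only in commas that are not between digits."""
    return old != new and strip_ignorable_commas(old) == strip_ignorable_commas(new)

print(comma_change_is_noise("However, the sample", "However the sample"))  # True
print(comma_change_is_noise("2,5 mm", "25 mm"))                           # False
```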
And this is something that came up earlier: one of the great things about XML is that it lets you do things in many ways, and one of the awful things is that it lets you do things in too many ways. We ran across this with things being marked up differently. In this case, the first paragraph shows up as struck out and re-added. What was happening was that in the old standard it was marked up as a paragraph in bold, and in the new version of the standard it became a title, and the title was printed in bold. So visually there was no difference, but structurally there's a lot of difference. The best way to fix this is better markup practice and clearer instructions, so people don't mark things up that way in the first place. But to ISO's credit, they said they want to highlight things like that: if markup is being applied incorrectly by editors or by external contractors, you want to catch it as part of the redline process and go back and fix it. [inaudible] Until we come up with a cleverer way of handling it, it stays in the redline.

There are a few other things we ran across. One is that the XML is generated from the Word file every single time. Sometimes the Word file goes back to the committee for a new edition and they rewrite the whole thing; sometimes they modify the existing one; there's a whole range of processes. Because the XML is regenerated every time, we don't have consistent ID attributes, which is a bit of a pain. If somebody moves a list, or moves a section or part of a section somewhere else, the IDs change, and we can't see that it used to be section 1; it shows up as deleted and added. So that is a big problem. Again, I'm not really sure how to solve it, other than forcing the committees to use a very controlled authoring platform, which is not in scope.
So, concluding. It is definitely possible to create an automated redline; that's what this project shows. When ISO asked, we said we would give it a try, and probably three or four standards bodies have told us that this is impossible, that it can't be automated. But it can be, and it is part of the publications ISO is putting out right now. Also, if you cannot control the authoring process, which is the situation for a lot of groups [inaudible], then you have to use XML differencing rather than track changes, because you're not controlling the change process itself; you're coming in much after the fact. ISOSTS provides reasonable support for XML redlining. I've had some philosophical arguments about it, because it doesn't have tags for capturing changes or for recording who changed what and why. I don't think many of those are really necessary, especially for ISO. The people who use these standards don't really care why a value changed from 1.9 to 2.0; they just need to know that it's now 2.0, and what they need to do to conform. If you need to find out why 1.9 changed to 2.0, you can go and read the committee deliberations, if they're online. The general public quite often doesn't care why a change was made, but they do care that a change was made, because that matters for compliance.

And very importantly: consistency of tagging between versions. This is really, really important. I know organizations that change their tagging schemes, and if you do that, your redline is going to be in trouble, because it starts flagging things that involve no change in the meaning of the content, just a change in tagging. The same goes for things like changing "Equation" to "Formula", adding commas, or changing your spelling convention to Oxford spelling: all of those cause problems of this type and create a lot of clutter. So be careful doing that.

And finally, something that's in my paper but that I didn't mention here: accessibility. I think it is very difficult to make a redline document accessible right now, because the redline is a visual treatment that highlights change. You can make that visual treatment machine-readable, but making it understandable when it's read aloud is very, very difficult, because the speech synthesis breaks everything down into edits and deletions coming in and out. I have tried that, and quite often it's just meaningless. So that's something that needs more attention; it was peripheral to this project, but there is some interesting work going on. So that's my presentation.
[Session chair] Thank you. [inaudible] Don't forget to turn your microphone on. No questions? OK, then I have one. ISO is made up of many different national bodies. Obviously ISO is going to be using this themselves; what kind of uptake are you seeing in moving it down into the national bodies?

[Answer] OK, so national standards bodies. ISOSTS is currently adopted by at least ten national bodies [inaudible], and there are national bodies working on implementations, and not only national bodies but other standards bodies that are also implementing ISOSTS as a standard schema. So I think it is definitely getting wider acceptance, and I wouldn't be surprised [inaudible]. This has come at the right time, when the bodies need to change their processes, and it's fitting in perfectly [inaudible].

[Audience question] [Name inaudible], independent consultant. I thought this and the previous talk were really very exciting, partly because we can see how the goal lines are moving around. One thing I thought was particularly interesting and problematic, which you mentioned at the end, is that as organizations move their markup practice forward, it destabilizes a system like the one you've developed. That creates noise in the redline, which then makes you say, "Oh no, that's a bad thing, don't improve your markup." Of course, nobody would go that far, but it does illustrate the tension between depending on a system like this being stable and the need for the organization to develop its central processes, improve its markup practices and so forth. So my question has to do with that problem, and what you see yourself doing with the ISO standards with respect to that issue.
Because it seems to me that you must have some sort of cycle through the redline process itself, whereby you can look at the redline output and say, "Oh no, this isn't good enough, there's a lot of noise, we have to do something about that." And I assume that, as a regular way of doing business, this lets you feed back into the process itself, so that you can, for example, whitelist changes like "Equation" to "Formula". So can you tell us more about that general problem?

[Answer] What we did at ISO is recommend that they have a change management process that captures markup changes, and not just markup but editorial style changes, and those changes then get whitelisted in the redline process. Most organizations don't have a formal process for capturing markup changes, and definitely no formal process for capturing editorial style changes. They might have a drafting manual; they might say, "We made a decision to change from this to this," but that is not a rigorous change management process. I think if you have a rigorous change management process for markup and style changes, you can capture them, whitelist them, and get rid of the noise. What's fascinating right now is that, by doing this, we can almost trace the evolution of ISO style over the last 20 years. This is something I would like to do when I have some free time, because you can see things change, and then flip back and forth a few times between two different conventions. But if we capture that, I think we can whitelist them, so that it no longer becomes an issue.

[Audience comment] [inaudible] One is in the paper, and the other is that there was a paper last year, I think, about canonical form. As was said, from the markup perspective, if you create inconsistencies consistently, something can be done about it: if you know how the markup changed, you can account for it when comparing, which gives you much cleaner differences.

[Audience question] [Name inaudible], American Society of Civil Engineers. You talked about some of the problems of identifying whether an image has changed. Did you try automated image forensics tools to help you detect when changes are made, beyond just relying on the editor remembering to flag that it has changed?

[Answer] We had a look, and almost every tool we tried was giving us a lot of false positives, flagging an image as changed because of a very tiny change that is not obvious to the reader. If I go back to that slide, you can see the leader lines in the redlined image.
often between editions that will deliver lines depending on how the images are clustered together and a change of label line is not really a change in how the item to be built all the items should be organized in the same thing would happen would flow charts quite often somebody in open a flow chart and it will redraw the flow chart just to make it flow better because
they've added 1 would expect some and then you kind of thinking OK it really that hasn't changed the and the and then the content that hasn't changed and the major malls tools we looked at it would have to tell you something's changed that something has changed and what was happening was you running across like the of a thousand thousand 400 standards and really across 2004 understands it is coming back like nine-tenths of the images that changed and they had all this is some had changed but 90 % and had been just very slightly modified so the only way to understand this is useful and this is not the could come up with was the because it's not it's not good department and actually change images so we kind of put the onus on them to to say you know you need to identify the magnitude unchanged find here the questions the method and then he
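The talk does not show how ISO applies its whitelist. As a minimal sketch, assuming the approved markup and style changes can be expressed as simple tag renames and text substitutions (the names `TAG_RENAMES`, `TEXT_RENAMES`, `normalize` and `redline`, and the example entries, are all hypothetical), the whitelisted changes can be applied to both versions before diffing, so only unapproved changes reach the redline:

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical whitelist produced by a change-management process:
# each entry maps an old, approved-to-change form to its new form.
TAG_RENAMES = {"emph": "italic"}        # assumed markup change
TEXT_RENAMES = {"e.g.": "for example"}  # assumed house-style change


def normalize(xml_text: str) -> list[str]:
    """Flatten XML to one line per element, applying whitelisted changes."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        tag = TAG_RENAMES.get(elem.tag, elem.tag)
        # An element's own text plus the tails of its children, so each
        # piece of mixed content is counted exactly once.
        own = (elem.text or "") + "".join(child.tail or "" for child in elem)
        text = " ".join(own.split())  # collapse whitespace differences
        for old, new in TEXT_RENAMES.items():
            text = text.replace(old, new)
        lines.append(f"<{tag}> {text}")
    return lines


def redline(old_xml: str, new_xml: str) -> list[str]:
    """Return the added/removed lines that survive whitelisting."""
    diff = difflib.unified_diff(
        normalize(old_xml), normalize(new_xml), lineterm="", n=0
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]
```

With these entries, comparing `<p><emph>Note</emph> e.g. a test</p>` against `<p><italic>Note</italic> for example a test</p>` yields no changes at all, while an actual wording change still shows up as a removed/added pair. A real JATS workflow would also need to handle attributes and much richer structure.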


Formal Metadata

Title Case Study on Redlining application using JATS XML at the International Organization for Standardization
Title of Series JATS-Con 2013
Part Number 5
Number of Parts 16
Author Perera, Chandi
License CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/21798
Publisher River Valley TV
Release Date 2016
Language English
Production Place Washington, D.C.

Content Metadata

Subject Area Information technology
Abstract Redlining is the process of comparing two datasets and displaying the changes in a meaningful and human-readable way. Comparing XML files and rendering the results is more complex than just identifying the differences between two files. Using the experiences of the International Organization for Standardization (ISO) as a case study, this paper will describe the process of comparing two versions of a JATS XML file, filtering out changes that have no meaningful impact (e.g. changes in the order of article-id tags) and ignoring changes that the business requirements deem trivial. The paper will go on to identify and render changes to content ranging from simple paragraphs to tables, equations, figures and lists. The case study will cover how differences are rendered so that the reader can easily understand and follow the changes. The paper will describe the easy wins, the difficulties and the impossibilities of a JATS XML redlining workflow. The paper will conclude with the changes that can be made to process and content structure to make redlining more effective.
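The abstract mentions filtering out changes in the order of `article-id` tags. One way to sketch this, assuming Python's standard `xml.etree.ElementTree` and a hypothetical `canonical` helper (not ISO's actual code), is to sort sibling `article-id` elements into a fixed order and serialize with C14N 2.0 before comparing, so a mere reordering produces identical output:

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """Hypothetical helper: sort article-id siblings, then C14N-serialize."""
    root = ET.fromstring(xml_text)
    # Materialize the parents first so the tree is not mutated mid-iteration.
    for parent in list(root.iter()):
        positions = [i for i, c in enumerate(parent) if c.tag == "article-id"]
        if len(positions) < 2:
            continue
        ids = sorted(
            (parent[i] for i in positions),
            key=lambda e: (e.get("pub-id-type", ""), e.text or ""),
        )
        # Put the sorted elements back into the slots article-ids occupied,
        # leaving every other child where it was.
        for pos, elem in zip(positions, ids):
            parent[pos] = elem
    # C14N 2.0 also fixes attribute order; strip_text drops layout whitespace.
    return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)
```

Two files that differ only in the order of their `article-id` elements then canonicalize to the same string, while a changed identifier value still produces a difference.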




