And I think I'll be less tormented than usual, I hope, so the masochists among you may be disappointed. I have in the past argued that a document cannot be edited, that there are no documents, that you can't add a record to a database, that nothing changes, all that stuff. [unclear] So I'm not quite that [unclear] this time.

This has been a fantastic, absolutely fantastic conference. I don't know, I've had too good a time. It was just wonderful and energetic, and I thought, you know, how did this happen? I thought it was about two years ago that some of my ne'er-do-well friends, Tommie, Deb, Jeff, Bruce, started this work with NCBI. 2002? Thank you. There's some strange time dilation going on in my head. It seems like yesterday; 2002, 2003, something like that. It seems to have happened very fast, but I know it wasn't easy, and thank you for doing all this wonderful stuff, because it's important. Everything that we do to improve the efficiency and the usefulness of scientific communication has tremendous potential for social impact. So it's a great thing and a great responsibility. Don't let up. I'm really happy to be, even in a small way, part of it.

So I got to thinking about predicting revolutions and noticing revolutions, about things that tend to happen fast and things that happen slowly. That is kind of a subtext of my presentation. Along those lines, I realized that I am excitable, at least when it comes to markup languages. This first part is a little about myself; it'll be over in about two minutes.

I first encountered markup languages in 1978 or '79. I was learning a text processing system called FRESS at Brown University. It was actually developed by Ted Nelson, Andy van Dam, and a number of other people in the sixties. It had hypertext, it had view filters, and it had descriptive markup. But what really fascinated me was that the version I was using seemed to understand my discipline, which was logic-oriented philosophy. It had elements for proofs, definitions, all the things that were the intellectual molecules and atoms of what we did. This system knew about them, and it was just totally enchanting to me how this could be so.

At about the same time, the system was scheduled to be taken down. It had been ported to an IBM mainframe from graphics workstations, and I was asked to convert all of the files to a new text processing system. By 1979 that was IBM Script/DCF; I'm sure many of you know that. I had to look at these constituents (proofs, definitions, principles, counterexamples, extracts) and recreate them in a new environment. Every step of the way I just thought: this is what our intellectual lives are composed of, these objects, right?

My next project, the next year, was a big programming project developing a system for typesetting an academic journal. Same thing. I remember going to the library, and I would take down journal issues, hold them at arm's length, squint my eyes, and on the page I could make out those things that were part of the intellectual world of the fields I was part of. [unclear] I thought I had discovered the key to the universe in one of these [unclear] languages. You know, this is going to change everything.

[unclear] At about that time (I was one of the three people who raised their hands [unclear]) I also started attending, as an observer, X3V1.8. Now I'm going to pull a Tommie here and ask for a show of hands from those of you who know what X3V1.8 is. Just two. How about 8879? A couple more. OK, so X3V1.8 is the ANSI, ISO-affiliated organization under whose auspices ISO 8879, SGML, was produced. In the early eighties, getting to watch that happen was just an amazing, really formative experience for me.

So I went on to do things with the TEI, and, I can't resist saying now, because I know a number of you work with EPUB, I was the first chair of the OEB committee that created what I'm sure you think is a very flawed version of OEBPS, which later became the core of EPUB. I'm glad to have contributed, but I thought the e-book revolution was going to happen the next day, and it didn't, and I lost my patience. I guess I should have hung in there a bit longer.

Throughout all this time I was always expecting an imminent revolution, and I never quite saw it happen. I know I'm not the first person to say this; it's been said many different ways, in [unclear] hype graphs and in a number of other interesting slogans about how we perceive change. But in fact it always seemed to me that some radical change would happen within a month, or maybe next year. Ten years later, nothing. And five years after that, after I'd moved on to something else, somebody pointed out: oh yeah, it did happen eventually. Not quite the way you thought, but eventually. These changes never happen as soon as we think, and they never happen like we think they're going to. We still don't have jetpacks, right? I've been waiting for those forever.

I once argued with a fellow at one of those Oxford Union debates, where you go out through two different doors, who claimed that the paperless office was a total fraud: hasn't happened, never will happen. I tried to argue that the paperless office is here; it's just not paperless. [unclear] Not exactly the reaction I was going for in Oxford, England.

So that's the idea, and I'm not going to worry any more about over-predicting. I'm probably still going to be as excitable and enthusiastic as I've been, and I don't think it's a bad thing to get the schedule wrong. I think it's more important to work hard at moving things forward. [unclear: remarks while advancing the slides]

I've also been breathless in my titles. You know, dramatic titles go over nicely. Here are a couple of mine; you can see a formula there. [unclear] It's a formula, but it works.
So, imagine Lisa, born January 1st, 1997. That means that today she's 17 and she's just starting college. When she was one year old, Google indexed 60 million pages in its first year of life. She's been using computers all her life, and she cannot remember a time without the web, Google, Facebook, smartphones, etc. Lisa is 17 today. Now let's look forward about eight years, to 2022, if I have my arithmetic right. Lisa has just finished her doctoral coursework in molecular biology and she's about to start her research. So she walks up to the science reference desk.

Why would Lisa, in 2022, walk up to a science reference desk? [unclear] She wants to find some resources, and she needs to know what resource to use. She starts by saying, "I'm interested in the role of p53 in Huntington's disease." And the reference librarian (there still will be one) says, "So you'd like to find some articles to read on p53?" And they both break out in gales of laughter. Why did they laugh? What's so funny about reading articles about tumor protein 53? The librarian is making a joke. Librarians make jokes all the time.

In 2022 no one is looking for articles to read, at least not in science. And I told you I was going to hold back from wild predictions. In 2022, engaging with the scientific literature is finally going to be, in the words of Alan Kay, like flying a jet plane through information space. It won't be at all like looking for, finding, and reading articles.

Here's a KEGG diagram; p53, tumor protein 53, is circled in red over there. And here's the problem, shown another way: there are nearly 64,000 PubMed citations with p53 in the title or abstract today. [unclear] p53 is not just any protein, of course, but nevertheless: 64,000 with p53 in either the title or the abstract. In Google Scholar, if you do a full-text search, it's like 1.5 million or something. That's a lot of prior work to get on top of.
And for years now (I'm not a biologist) biologists have been saying things like this: nowadays, sets of relevant papers are identified that surpass human capability for reading, interpretation, and synthesis. We can argue forever about whether that's really true, or in what sense it's true, and about the role of specialization; there are always questions. But at some point the circumstantial evidence is strong: there is a real challenge in the immense quantity of information that researchers have to engage with.

Someone here should tell me how many; I don't know how many MEDLINE references there are in the database now, but it's an enormous number and it's growing very, very fast. The blue line there is total abstracts and, on another scale, the red line is abstracts on the cell cycle, a specialized area. It's just really, really hard to know what to do against the avalanche of information.

So what do we do about the problem? I think something is going on now. I'm not saying that the trend is new, but I would argue that there is urgency, that there is a detectable acceleration of a particular trajectory: away from finding and then reading an article, which is kind of the exemplary scenario in libraries [unclear], and towards alternative ways of exploiting the scientific literature.

One response to this problem is the whole family of data mining, text mining, literature mining, information extraction, undiscovered public knowledge, that kind of thing. This is, I would say, emblematic of the flight from reading. Another approach, and it's the response I want to talk about, since data mining already gets a lot of attention these days, is support for strategic reading.

So this is the background for strategic reading. When you look at what researchers actually do during the process of searching bibliographic databases and things like that, whether it's Scopus, Google Scholar, Web of Science, or the astronomy database, it's fascinating, and you can see almost immediately that nobody is looking for an article to read. If anything, to me (and this part is a little impressionistic) the researchers seem to be engaging with the literature as if they're playing a video game. They rapidly, almost unconsciously, develop queries, track references, and make very rapid but subconscious relevance judgments. They locate and compare terms, strings, equations, definitions, articles, findings, depending on their discipline. The whole process looks cognitive, kinesthetic, and trance-like.

If you were to wire the researcher up with, you know, galvanic skin response sensors, or do a PET scan or something, I think you would detect, I'm inclined to say, some kind of trance-like behavior. After the session, when the user is done, their breathing returns to normal, the alpha rhythms return to normal, the heartbeat falls. You ask them, "How was it?" [unclear] And they say, "It was good. It was good, yeah. OK. It was productive." And yet [unclear], and they're quite happy, and they go. I think more and more this is what we see in certain disciplines within science.
So the goal of that engagement is not to find an article to read. It's not. In fact, it looks to me like the goal is to avoid reading. And this is nothing new, I'll be quick to say; I like to put things in historical context. To be sure, this was not something ushered in by digital technology. Indexing and citation analysis, for instance, have always helped us decide whether articles are relevant or not without reading them. Abstracts and literature reviews, again, help us take advantage of articles without reading them. The articles we do read, in their analyses and their summaries, help us take advantage of other articles without reading them. And our friends, our colleagues, and, best of all, our graduate students can be exploited to help us take advantage of articles without reading them.
At Illinois in the nineties there was a study of engineers' behavior with paper articles, and we see a lot of the very same behavior that we see now with digital technology. I'm going to read this; it's wordy, but it's, I think, pretty fascinating what engineers do. [unclear] "Engineers describe a common pattern for utilizing document components by zooming and filtering information" (this is on paper; this is using paper articles). "First they read the abstract, then scan sections; next, lists, summary statements, definitions, illustrations. They disaggregate and reaggregate article components for use in their own work, perhaps by using a marker to highlight, perhaps by creating a mental register."
At the same time that Bruce was leading an effort to study engineers reading articles on paper, [unclear], in the same project actually, was studying engineers reading online. And this is one informant they recorded: "I use the sections of the papers for the equations. I wouldn't have even read all the other parts of the article. I look for specific surface tension experimental measurements. Sometimes I look specifically at other methods and theory."

So the point is that this is a kind of entrenched behavior. It's not new; it's an entrenched, familiar way of dealing with vast quantities of information. I believe it's becoming more important, more urgent, and, most importantly, we now have new tools and new possibilities for supporting it.

Before going on, though, I want to read this account from some researchers at University College London. It's a comparison of research behavior with the channel surfing of a child. It's long, but it's very cool. This is actually David Nicholas talking; he's the father, with his daughter. "One is reminded of the father watching his young daughter, who was using the remote to flick from one television channel to another. He asks why she cannot make up her mind, and she answers that she is not attempting to make up her mind but is watching all the channels. She, like our bouncers, is gathering information horizontally, not vertically. What we see with the migration from traditional to electronic sources is that, in information-seeking terms, we are all bouncers and flickers, and the success of Google is a testament to that, with its marvelous ability to enhance and amplify this flicking and bouncing, like a really good remote. The analysis of the searching behavior of digital consumers tells us much more than that. It also shows us how people develop knowledge." And most of these studies, by the way, were based on log analysis and quantitative measurements of actual behavior at the interface. [unclear]
So I've been saying that these entrenched behaviors are, while not new, newly important, and it's urgent that we attend to them. And here's why.
Throughout much of the last decade, Carol Tenopir and Don King did a number of studies, some of which present a fascinating view of changes in how scientists are engaging with the literature. To summarize them: the time we spend searching and browsing rose rapidly from 1984 to 2000. Up until the mid-1990s the number of articles read (and I don't think they are being read, but that's the self-report from the informants) was more or less steady, but since then the number read has been climbing. We're reading more and more articles. Really? I don't think so. And reading time per article is dropping; in some fields it's dropping very fast, down to like 24 minutes. These are long, hard articles. It's just not plausible that they're being read in 24 minutes.

Carole Palmer at Illinois has also shown how researchers are using increasingly sophisticated techniques to find and use the information that they need from bibliographic and literature databases. She has also shown, through [unclear] studies, how researchers in different disciplines behave quite differently at the interface. And she also has evidence that the rate of innovation in strategies for engaging with the literature is increasing.

You want a hockey stick? There's a hockey stick right there. That red line is the average number of articles read per year. I'm not saying this is the last word; we haven't really factored out possible changes in article length in different fields. But there's just a lot of evidence that more and more articles are being engaged with while no more time is being spent on them, so the time per article [unclear].
And here's where the prediction comes in. I'm a great fan of logic-based ontologies for scientific publishing and for science itself. As scientific ontologies (I use the phrase really broadly, to include structured terminologies, controlled vocabularies, ER diagrams, whatever) become integrated into the publishing workflow, with standard terminologies for proteins, metabolic pathways, genes, and so on, many new enhancements to scientific communication are going to be possible, and they're all going to be sitting on top of things like JATS. These enhancements will include support for strategic reading.

Now, semantically enriched texts are also valuable for text mining, data mining, searching, and so on; there are a lot of advantages. But I'm looking forward to the advantages for strategic reading. I don't think we're going to get away from reading, but I think that the granularity of our reading is going to change. If we move away from reading and just rely on data mining and machine learning and such, I think we're going to find that we won't have the flexibility and the subtlety (and I'm not the first person to say this) that are going to be important for innovation. So that's the breathless [unclear].

Eventually you get, and this is the grand old dream of Alan Kay, flying a jet plane through information space. [unclear] The whole catastrophe includes things like:

computationally available data items that are accessible to discipline-specific tools;
advanced navigation and viewing, and many of those things, to me, are always going to be domain-specific, discipline-specific, or even workgroup- or research-group-specific;
typed hypertext linking, finally, [unclear], with the links stored separately as first-class objects;
interactive diagrams and graphics, data-driven.

Of course these things all exist now in some form or other, but we're looking forward to seeing them at scale and really working in an integrated way. [unclear] Computable equations; there's a lot of progress there. Supported ontological inferencing, and so on.

Some of you may know Textpresso for neuroscience, which implements some of these things [unclear: "in PubMed"]. Textpresso works article by article. It takes advantage of controlled vocabularies and ontologies and helps you navigate through an article without actually reading it left to right, top to bottom. iHOP is another example, this time working with multiple articles at once, which can be kind of unnerving at first, because you see sentence after sentence after sentence, perhaps each sentence from a different article and a different author, but aggregated, selected, and filtered according to what they're talking about: what kinds of diseases, what kinds of proteins, what kinds of genes, and so on.

Of course, being a perpetual interim dean, it's my job to say that more research is needed. Jeff assured me that there would be program officers in the audience, so if you want to see me afterward, that would be great. We really need to know a lot more about how scientists search and read. The old approaches to understanding (precision, recall, satisfaction) don't really deliver the subtlety that we need in order to develop strategic reading tools. We need a much better understanding of how we can use ontologies and Semantic Web languages like RDF and OWL in ways that aren't going to cost us an enormous amount of money, and we need to start managing issues with heterogeneity and different levels of specificity. I'm a true believer, remember; I love logic-based ontologies. [unclear]

Obviously natural language processing and data analytics can help us read articles in a wide variety of ways; there's a long history of research there. And recently there has been a lot of interesting work on integrating data and articles. There's a lot of simple, important work to do just making sure we know how to reference datasets, but we really need some kind of new integrating, blending approach, the equivalent of literate programming or something like that. Right now there is a very unnatural and archaic separation of data and discussion.

So, a small plug for my sponsor: we're working on these things at the University of Illinois Graduate School of Library and Information Science. In particular, most of the topics I talked about today have research projects located at the Center for Informatics Research in Science and Scholarship, or CIRSS, where there is more about these things.
And finally, having reflected over the last day or so on how hard it is to detect the contours of a revolution, to figure out whether it will happen, discovering that it has happened, discovering that it hasn't happened at all like one expected, I remembered a truly wonderful story, which is absolutely true (it happened to a friend of mine) and which embodies a deep insight into this. The setting is very important. A mother and her young daughter are both sitting in the kitchen. The mother is working on her computer on the kitchen table, a laptop computer, and the little girl says, "Mommy, when you were a little girl, did they have computers?" And the mother said, "Well, actually, yes," in, you know, that tone of voice, that sort of paternalistic parent-to-child voice. "Yes, actually, when I was a little girl they did have computers. But when I was a little girl, a computer was as big as this [room]." And the little girl's eyes [unclear] and her face became quizzical, and she looked around the room and looked back to her mother and said, "Then where did you eat?" And I know that sometimes, when I'm trying to figure out what's going on, I ask myself that question: where did you eat?

[unclear: invitation for questions]

[Audience member] So I'm looking at your 1987 paper, which somebody mercifully tweeted during the [unclear] sessions. Looking at it, aside from the fact that you used parentheses instead of angle brackets and the names are slightly different, h1 and h2 instead of sec, one could argue we haven't come that much farther from where you were in 1987. In order to get us to where scholars need to be to find the information, how much of this is something we're going to be handling within the scope of JATS, how much has to be more intelligent search engines, and how much is sort of halfway in the middle, or something we haven't figured out yet?

[Speaker] Yeah, so that's the kind of question where I really don't know what to say, because those are the things that we never get right. I think we never get those right. We were right about descriptive markup. Descriptive markup embodies fundamental principles of information organization, abstraction, indirection; it conforms to formal grammars, and on and on and on. [unclear] It was a great thing. But the question you're asking now, I don't know. I mean, I thought in 1984 that there was going to be a revolution in 1985, and it didn't happen. The mil specs came in; I thought, oh, this'll do it. No. DoD mil specs, [unclear]. Then Microsoft embraced it, embrace and extend, [unclear]; it didn't happen for a long time. It's so unpredictable.

As most of you know, and you've probably thought about this, think about what happened in the nineties with PDF, PostScript, and so on. The electronic publishing revolution was predicted by others, not markup-based, just using the network, probably by Turoff and Hiltz in 1979. And then in 1981 we had the online journal of clinical trials, and none of that really was important. Nothing was important until PostScript, until every publisher was using PostScript, which could be converted to PDF with relatively little expense. Others here know more about this than I do. So those things, [unclear], no one expects. We can't figure out those things in advance, it seems to me. They happen, and you look back and say, wow, PostScript became the way to create page images, and then it could be easily converted to PDF, the network was there, and it kind of happened. I don't know if that answers your question.

[Audience member] There is a practical side to my question, which is that I see a fair number of people now trying to go into semantic tagging within the JATS space, for example using things like named-content to get additional enrichment, and I'm wondering if that's the right thing or the wrong thing. I don't know if we have the answer at this point. But that's very hard, time-consuming, and labor-intensive, and the trick is to get a specialist in there who actually understands the content to do that degree of markup. Or are the search engines basically going to blow right past that in terms of what you really need to get information out of the content?

[Speaker] Well, that last, I think, remains to be seen. So, maybe. On the other hand, as some biologist said, [unclear], you wouldn't have to mine the data if you didn't bury it in the first place. [unclear] I have always been on that side, thinking that if we didn't bury it in the first place we wouldn't have to mine it, and I've been wrong as often as I've been right about the relative efficiencies. It's an amazing thing that these search algorithms are able to do. I take some comfort, I guess, in thinking that for support of strategic reading, rather than just recommender systems or something, it may be that semantic markup is going to be [unclear]. But more broadly, in searching and finding, we may end up putting most of that semantic markup in with 80/20 accuracy using machine learning techniques. So it may just come in through the front or the back door in the end, [unclear] expensive. I try not to let my intellectual temperament interfere with my assessments. I have a lot more fun with logic, and statistics cannot be trusted. And maybe it's layers all the way down. There's a tradition of computer science and information science research where you combine these things for better results.

[Audience member] About your comment that you've been waiting a long time for things to actually happen: back in the seventies and eighties I think things were happening more than you maybe give credit for. I worked on IETMs back in the eighties, and the aerospace industry was all over systems based on SGML. The content provided phenomenal resources for the consumers of the content. The thing that held it back, maybe, was authorship for those systems, which was incredibly difficult. It took so much effort to build the content in the ideal form that, you know, an aircraft mechanic could consume. Most of what you presented talked about the readership changing in how it goes about its business, but perhaps it's the authorship that needs to change, in how it creates content.

[Speaker] It's certainly true that in the early eighties it was rare that anybody was using a syntax-directed editor to create marked-up text. Then there were [unclear], and then SoftQuad and so on, and now most everybody uses a pretty sophisticated syntax editor. So that helps. It may not solve the problem, but there are certain kinds of mistakes you can't make, and it's recognition rather than recall, so it's easier. Yeah, I know. I was delighted when the mil specs came out and [unclear] large-body aircraft [unclear] manufacturers were using SGML-based technical documentation and tools. It was great; I was glad to see that. There is the residual difficulty: after you make the syntax editors as good as possible, there may still be a barrier. There are partisans of [unclear]; you sometimes hear people say that the scientists should be doing their own [unclear]. But even with the best syntax editor in the world, is that sensible in terms of efficiency? These are empirical questions.

[Audience member] It was interesting: where I worked, [unclear] the engineers were the ones who understood the [unclear], but it was tech writers who wrote the actual marked-up content.

[Speaker] Right, right. And I know a lot of [unclear]; I'm actually married to one.

[Audience member] So I think one aspect of this problem, [unclear], even with layers all the way down, as you said: one of the things that's so challenging is that we have systems evolving at many different levels of scale, with inertia, interaction, and interoperability, all at the same time. We saw this problem in microcosm when [unclear] was telling us the other day about the redlining of the [unclear] standards, where in order to do the redlining he had to have a snapshot of the way in which the markup was being used, which of course introduces a problem if the markup is going to evolve. In other words, we need a system with sufficient integration up and down the stack so that we can have these [unclear] functionalities, and at the same time, if an author has a new thing that he wants a higher layer to display in some way, he should not be prevented from doing that because the system can't tolerate that level of change at the micro level. So how do you design and build that? How do you allow a system like this to evolve [unclear]?

[Speaker] It's a great problem. I mean, it's a great, great problem, and I'm familiar with seeing it solved in small ways in descriptive markup projects [unclear], but I don't know what general solutions look like. It's one of the great longstanding challenging problems: enabling that kind of evolution.

[Audience member] Just one more. [unclear] If you're a publisher, if you're a librarian, if you're a bookseller, if you're a robot in the warehouse, [unclear], you think of the book, the journal, and so on as an indivisible unit. It's an object. If you're a researcher you have a different view, of course. But in the end I think the future of the book, the journal, the article, the magazine, the future of the document, is the end of the document as an indivisible unit. We have to break it apart and think of it differently, so that we can fly through it, in your analogy. [unclear]

[Speaker] Yes. That's an attractive notion, and actually for at least 100 years it has cropped up now and then. Historically, [unclear] in particular has been interested in decomposing, disassembling, documents into their units of meaning. And recently [unclear: several names] talk about nanopublishing, one triple at a time. There are lots of puzzles there too, because when you get very small, well, we have this rhetoric of creation for text, but the smaller you get, the less applicable "creation" seems. I mean, short sentences: has anyone created them? Arguably they're just there, [unclear]. But I told you at the beginning I wouldn't be doing this sort of stuff. So that's among the problems with disassembling documents into the atomic units of communication. Of course you lose some narrative structure, or not; I mean, you can have as much as you want. But it's a new world. It will be a strange world to live in, I think, publishing one triple at a time. One of the advantages, of course, is that you can see really quickly whether or not anybody has said the same thing: you search, and it's not a discovery, somebody else already said, you know, x, y, z. But it'll be a strange world no matter what. [unclear]
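[Editorial illustration] The closing remark about publishing "one triple at a time," and being able to see quickly whether anybody has already said the same thing, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not how any real nanopublication system works: the names `Triple`, `TripleStore`, `publish`, and `who_said` are hypothetical, the normalization is deliberately naive (trim and lowercase), and the example claim simply reuses Lisa's p53 topic from the talk as placeholder data, not as a scientific assertion.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object assertion (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str


def normalize(triple: Triple) -> Triple:
    """Trim and lowercase each part so trivially different spellings match."""
    return Triple(*(part.strip().lower() for part in triple))


class TripleStore:
    """Toy in-memory store mapping each normalized triple to its sources."""

    def __init__(self) -> None:
        self._sources: Dict[Triple, List[str]] = {}

    def publish(self, triple: Triple, source: str) -> bool:
        """Record the triple; return True only if nobody had said it before."""
        key = normalize(triple)
        is_new = key not in self._sources
        self._sources.setdefault(key, []).append(source)
        return is_new

    def who_said(self, triple: Triple) -> List[str]:
        """Return every source that has published this triple."""
        return list(self._sources.get(normalize(triple), []))


# Placeholder data only: the claim and the source names are made up.
store = TripleStore()
claim = Triple("p53", "plays_role_in", "Huntington's disease")
first = store.publish(claim, "article-A")
second = store.publish(
    Triple(" P53", "plays_role_in", "huntington's disease"), "article-B"
)
print(first, second, store.who_said(claim))
```

The design choice worth noticing is that the "is this a discovery?" check is just a dictionary lookup on a normalized key. In practice the hard part is the normalization itself, which is where the controlled vocabularies and ontologies discussed in the talk would come in.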
Strategic Reading, the Future of Scientific Publishing
Subtitle: Something for everyone
JATS-Con 2013
Part 16
Number of parts: 16
Renear, Allen
License: CC Attribution 3.0 Unported:
You may use and modify the work or its content for any legal purpose, and reproduce, distribute, and make it publicly available in unmodified or modified form, provided that you credit the author/rights holder in the manner they have specified.
DOI 10.5446/21803
Publisher: River Valley TV
2016
Language: English
Production location: Washington, D.C.

