This lecture provides an introduction to the fields of information retrieval and web search. We will discuss how relevant information can be found in very large and mostly unstructured data collections; this is particularly interesting in cases where users cannot provide a clear formulation of their current information need. Web search engines like Google are a typical application of the techniques covered by this course.
Hello and welcome to the lecture "Information Retrieval and Web Search". We are teaching this course in English because we have a new international Master's programme, and in any case it's good for you to know some English for your future business life: companies are getting more global, and you will have to work in interdisciplinary teams. So speaking English shouldn't be a problem; we should be more concerned with the topics we are going to cover in this lecture. What I want to do today is basically give an introduction: what this course will be about, which topics will be covered, and how these topics compare to what you have learned about databases. We will use the term information retrieval throughout this course, so let's first look at what information retrieval actually is. There is no single definition; we have collected a couple of definitions from different sources. For example, here is this excellent book by Christopher Manning and others, which is one of the textbooks for this course. It's not the only textbook we will use, and I will show you a couple of books that might be helpful for one topic or the other, but it is a very good introduction. Christopher Manning says that IR is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). So we find a lot of terms here. Finding: we look for something. Unstructured: it's not like in databases, where we have a table and say SELECT ... FROM ... WHERE; it's text documents or something like that, and we want to find information within the pages, within the texts. And the result should satisfy an information need.
An information need is a question we have when we look for something, or a task we have to fulfil. Sometimes it is something very abstract and we don't know how to solve it yet; sometimes it is something very clear: I want to know the birthday of some person. Or it may be more open: I want to know about global warming, what it means, what the definition of global warming is, or I want all documents that are interesting with respect to global warming, with the different opinions from both sides. That is what characterises an information need: it can be very open. The last point is large collections. "Large" in this case means that you don't just search a couple of pages, but terabytes or petabytes of documents. It is always good to look at more than one definition. Another one says that IR is the science of searching for documents, for information within documents, and for metadata about documents, as well as of searching relational databases and the Web. So there is a distinction to make between searching for documents and searching for information within documents. I can search for a whole document, for example a book: I want the book that helps me. Or I don't want any particular book, but I want to know how a certain thing works, some information inside the documents. So we have documents as a whole, information within documents, and metadata about documents. Metadata is exactly information about information: who the author is, which topics are covered. This might help me a bit more, for example to judge whether the author is an expert in the field or just wanted to get a book published. So metadata tells me something about the information contained in the book, like its quality.
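To make the three kinds of search in the second definition concrete, here is a minimal Python sketch over a tiny, hypothetical collection. The records, field names and function names are invented for illustration, and plain substring matching stands in for real retrieval: metadata search matches a structured field exactly, document search returns whole documents, and search within documents returns the matching passages themselves.

```python
# Hypothetical toy collection: structured metadata plus an unstructured text body.
collection = [
    {"id": 1, "author": "A. Author", "year": 2008,
     "title": "Notes on Global Warming",
     "text": "Global warming describes rising average temperatures. "
             "Different authors hold different opinions."},
    {"id": 2, "author": "B. Writer", "year": 2009,
     "title": "A Calendar of Birthdays",
     "text": "This list gives the birthdays of many people. It is sorted by date."},
]


def search_metadata(docs, field, value):
    """Search metadata about documents: exact match on a structured field."""
    return [d["id"] for d in docs if d[field] == value]


def search_documents(docs, term):
    """Search for documents: ids of whole documents whose text mentions the term."""
    term = term.lower()
    return [d["id"] for d in docs if term in d["text"].lower()]


def search_within(docs, term):
    """Search for information within documents: return the matching sentences."""
    term = term.lower()
    hits = []
    for d in docs:
        # Naive sentence split on ". "; real systems use proper segmentation.
        for sentence in d["text"].split(". "):
            if term in sentence.lower():
                hits.append((d["id"], sentence.strip().rstrip(".")))
    return hits


print(search_metadata(collection, "author", "B. Writer"))  # [2]
print(search_documents(collection, "global warming"))      # [1]
print(search_within(collection, "opinions"))
# [(1, 'Different authors hold different opinions')]
```

The same collection answers all three kinds of question; what differs is whether the answer is a field match, a whole document, or a piece of text inside a document.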
The definition also mentions searching relational databases, because metadata can easily be expressed in relational databases, whereas unstructured documents usually cannot. By the way, this definition comes from Wikipedia, which is one of the largest document collections currently in existence. Merriam-Webster is also very nice for definitions: the techniques of storing and recovering and often disseminating recorded data, especially through the use of a computerized system. This seems to be totally different from the other definitions: one wants to find, the other wants to search, and this one wants to store and recover and, often, disseminate. "Disseminating" sounds like a European Community proposal; it simply means spreading the information. When searching for something, you use indexes; indexes need to be stored, and on the other hand they need to be organised. So which of these definitions is correct? There is no one definition of information retrieval that satisfies everybody, but they all raise important points. The interesting point of this one is the storage aspect: storing, recovering and disseminating, which we will cover, sometimes more and sometimes less, during this lecture. Another definition is: IR is the part of computer science which studies the retrieval of information (not data) from a collection of written documents; the retrieved documents aim at satisfying a user information need, usually expressed in natural language. The first part is somewhat circular, but the rest makes it much clearer: a collection of written documents, and retrieved documents that aim at satisfying a user's information need. That need is usually expressed in natural language, not in a structured form such as numbers or addresses. You tell me something, and I have to derive the information need from what you tell me.
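The remark that searching relies on indexes, which in turn must be stored and organised, can be illustrated with an inverted index, a standard data structure in IR. The following is a minimal sketch under simplifying assumptions (lower-cased word tokens, AND semantics for multi-word queries, hypothetical toy documents and function names); it is not necessarily the indexing scheme the course will present.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def lookup(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical toy documents, keyed by document id.
docs = {
    1: "Storing and recovering recorded data",
    2: "Disseminating recorded data with a computerized system",
    3: "A computerized library catalogue",
}
index = build_inverted_index(docs)
print(index["recorded"])                   # [1, 2]
print(lookup(index, "computerized data"))  # [2]
```

The index is built once, when documents are stored; each query then only touches the posting lists of its terms instead of scanning every document.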
That natural-language aspect is what I find most interesting about this definition. So we can come to a conclusion: information retrieval is about documents, about unstructured data, about text, and about large collections. Common to all the definitions is the information need of users that has to be addressed; IR is about storing documents, searching for documents and finding documents, and it has a lot to do with the World Wide Web. Now let's look at relational databases, which many of you have heard about. A relational database query looks like this: you have an SQL statement, something like SELECT * FROM document WHERE title LIKE '%semantic web%', and you get an answer showing you the document with ID 45, which is about the Semantic Web because its title matches the pattern containing the words "semantic web". What does the database do? The database is a structured store of information: the title is an attribute, stating exactly "this is the title of a book or document". If I look for books about the Semantic Web with a pattern that requires "semantic web" somewhere in the title, I get a well-defined set of books where "semantic web" occurs in the title, whether at the beginning or in some other part. But a book can be about the Semantic Web without the term occurring in its title. In that case I don't retrieve this book, although it might still be relevant to me.
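The exact-match behaviour of the SQL query can be reproduced with Python's built-in sqlite3 module. This is a sketch with hypothetical rows (only the ID 45 echoes the lecture's example): document 46 is plausibly about the Semantic Web, but because the phrase does not occur in its title, the LIKE query never returns it.

```python
import sqlite3

# In-memory database with a hypothetical document table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)")
conn.executemany(
    "INSERT INTO document (id, title, abstract) VALUES (?, ?, ?)",
    [
        (45, "Foundations of the Semantic Web", "RDF, ontologies and reasoning."),
        (46, "Linked Data on the Web", "How semantic annotations make data machine-readable."),
        (47, "Cooking for Beginners", "Simple recipes."),
    ],
)


def title_matches(connection, pattern):
    """Return ids of documents whose title matches the SQL LIKE pattern."""
    rows = connection.execute(
        "SELECT id FROM document WHERE title LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


# SQLite's LIKE is case-insensitive for ASCII letters by default.
print(title_matches(conn, "%semantic web%"))  # [45]
```

The answer is a set defined purely by the pattern: a document either matches or it does not, and nothing in the query can express that document 46 might be relevant too.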
So what we do in information retrieval is retrieve all information that is relevant, rather than data that exactly matches a query. The notion of relevance is one of the notions that will stay with us for the rest of the course: what is relevant and what is not. It's not like in relational databases, where we say the ID has to be 45 and, with a well-defined query and a well-defined description of the document, we find exactly that. We have nothing like that; we just have this rather strange notion of relevance: this is more relevant, this is less relevant, this is somehow relevant, that is not, and we have to take all of this into account. This is also reflected in the query: it's not SELECT FROM WHERE, but a query which says that the result should be relevant with respect to the Semantic Web. The term may occur in the title or in the abstract, or the book may simply offer some insights on the topic. And what we get as a result is a ranked list, which we know, for example, from Google: I type in "semantic web" and get a result list ordered by relevance, where relevance is determined by a lot of funky algorithms. We will revisit some of them during this lecture to see what is inside a Web search engine, which is very exciting. In this example, we get the Wikipedia entry for the Semantic Web and the W3C, which has a Web site about current Semantic Web activities. The second part of our lecture will be Web search. Information retrieval and Web search deal with very similar problems, because from the early days on, the Web was seen as a large repository of documents distributed over multiple servers.
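The ranked result list described for the "semantic web" query can be contrasted with exact matching in a small sketch that scores every document and sorts by score. The scoring function is a deliberately naive assumption (weighted counts of query terms in title and abstract, with an arbitrary title weight of 2), not how Google or any real engine computes relevance; the documents are hypothetical. It only shows how a graded, ordered answer replaces a yes/no one: a document that never mentions "semantic web" in its title still appears, just lower.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(doc, query_terms, title_weight=2.0):
    """Toy relevance score: query-term occurrences, title hits weighted higher."""
    title = tokenize(doc["title"])
    abstract = tokenize(doc["abstract"])
    return sum(title_weight * title.count(t) + abstract.count(t) for t in query_terms)


def rank(docs, query):
    """Return ids of documents with a positive score, best first (ties by id)."""
    terms = tokenize(query)
    scored = [(score(d, terms), d["id"]) for d in docs]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for value, doc_id in scored if value > 0]


# Hypothetical documents.
docs = [
    {"id": 45, "title": "Foundations of the Semantic Web",
     "abstract": "RDF, ontologies and reasoning."},
    {"id": 46, "title": "Linked Data on the Web",
     "abstract": "How semantic annotations make data machine-readable."},
    {"id": 47, "title": "Cooking for Beginners", "abstract": "Simple recipes."},
]

print(rank(docs, "semantic web"))  # [45, 46]
```

Document 45 scores 4 (both terms in the title), document 46 scores 3 ("web" in the title, "semantic" in the abstract), and document 47 scores 0 and is left out.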
Basically, the problem of getting relevant documents from any of these servers is an information retrieval problem; you could call it distributed information retrieval, and that is how the Web was seen for quite a long time. Recently there have been more approaches that change this paradigm, and we will go into them as well. But what it comes down to is that this kind of distributed information retrieval has some differences from classical information retrieval. One very important difference is the link structure. We have the hyperlink structure, which tells us that whenever an item has something to do with some other item, there may be a link between the two: if you want to read something about that, please click here. Such a link connects two documents. It may lead to a higher level of detail, for example "I don't want to go into this here; if you want to know more about it, click here". Or it may lead to a related topic, to a document that is somehow related, probably not giving more details but going a little bit off the topic. What would I do in a book? There is a glossary, a pointer saying "if you don't know this term, see here for a definition". On the Web, this kind of linking is used everywhere, much more than glossaries or footnotes in books. It is a very powerful way of connecting ideas, and this linking structure should be exploited when looking for relevant documents. So the Web is slightly different from just being distributed information. The same goes for the way the content is stored. A library collects documents in a very strict way: everything is indexed, and you know where it is. On the Web this is more difficult, because the content is spread over multiple servers and nobody has an overall view. You take down Web pages, you put new ones up, you change them, you update them, and this introduces a large degree of flexibility and dynamics into what you do with a Web page.
You would not do that with a book, because changing a book and making a new edition is a lot of work, whereas changing something on your Web site can be done in no time. On the other hand, the Web is also used by far more people than a library: it is used all over the world for all kinds of purposes, and it is currently the one single information source that is pre-eminent throughout the world.
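The hyperlink structure described earlier, where a link connects a page to more detailed or related material, can be pictured as a directed graph. The sketch below uses a hypothetical four-page site and counts, for each page, how many other pages link to it. Counting incoming links is only a crude illustration of how link structure carries information that plain text does not; it is not the link-analysis method of any particular search engine.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "intro.html": ["details.html", "glossary.html"],
    "details.html": ["glossary.html"],
    "related.html": ["intro.html", "glossary.html"],
    "glossary.html": [],
}


def in_degree(graph):
    """Count how many distinct pages link to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in set(targets):  # several links from one page count once
            counts[target] = counts.get(target, 0) + 1
    return counts


def backlinks(graph, page):
    """List the pages that link to the given page, sorted by name."""
    return sorted(source for source, targets in graph.items() if page in targets)


print(in_degree(links))
print(backlinks(links, "glossary.html"))  # ['details.html', 'intro.html', 'related.html']
```

Here the glossary page is the target of three links, a signal that exists only because of the link structure, not because of anything written on the page itself.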
Actually, when I went to a university in India, one of my first questions was: "Where's the library for computer science?" And they told me: "The library? Do we have one? I really can't tell. It might be with the mathematics library or with the engineering library; it should be somewhere over there. I haven't visited it for some time." That is because they get everything from the Web, everything that is new, everything that is relevant. Libraries are totally different for other disciplines: in history or similar subjects, the library will still be the first point of entry. For the humanities the library means a lot; in computer science, well, not so much. Then there is a big difference in quality. Usually, before somebody's book gets published and catalogued by a library, a publisher makes sure that it is interesting for people to read; otherwise nobody would buy it and nobody would make a new edition. Putting up a Web site, on the other hand, doesn't require much: you can put up the most nonsensical Web sites you can imagine, or you can try to sell products, and this introduces a lot of noise. If you want people to visit your Web site, there are a lot of tricks to get them there, for instance pretending that your page has information they might want for satisfying their information need. We will review some of these techniques in this course as well. But the point is that dealing with unstructured information becomes more and more important in business. The Gartner Group, for example, says that 80 percent of business is conducted on unstructured information and that 85 percent of all data stored is held in an unstructured format.
Some 7 million Web pages are being added every day, and unstructured data doubles every three months. Imagine a typical case: you work for some insurance company. Insurance claims are very unstructured, even if there is a form: people write what actually happened, "I was not paying attention and then I ran into ...", things like that, to get their insurance claim paid.
Now imagine there were a structure behind it, as in a database: you would have columns like "item that was destroyed" and "value of the item that was destroyed". Extracting this information from the insurance claims manually is not going to happen, because there are just too many of them, so you are still left with files of claims that are basically written text. On the other hand, the data doubles all the time, and everybody is interested in exploiting the information in it. There is a discussion going on about whether it is the data, or the information contained in the data, that doubles. It is difficult to say, but genuinely new insights and new knowledge grow at a much slower rate than the actual number of publications and the actual amount of data produced. This has very often been compared to trying to drink from a fire hydrant: the water is coming at you, and you can't just drink from it; you need to get the parts that are interesting to you. Everything looks alike, it's all water, but when you look at newly published information, most of it is totally irrelevant to what you are interested in. Finding the small bits and pieces that might be interesting is very valuable. Some businesses say it is very valuable for companies to have people who detect exactly the right information, who detect where fraud is, who detect patterns in the information, and those are very highly paid professionals. So pay attention and learn all about it in this course. As for the top information retrieval companies: first of all there is Google, which has repeatedly been voted the best employer.
They do very interesting things, although they are also involved in many controversial topics, like Street View, that you need to know about. But still, Google is the player to be reckoned with. That goes for North America, but even more so for Europe, and especially for Germany, where Google has its highest market share, something like 93 percent. It is a little bit different in other countries. Other big players are Microsoft, with Live Search, and there are also smaller and more specialised companies, for example Ask.com for questions. And if you look at the smiling faces of these two men: does anybody know who they are? They are the guys who founded Google: Larry Page and Sergey Brin, exactly. They are young, intelligent and successful; you are young and intelligent too. Anyway, the technology is what you should be interested in, not so much the money. The last thing to talk about is organisational matters. As the study regulations show, this course has three hours per week, and we integrate the exercises into the lectures, so it is two hours of lecture plus one hour of exercise. The lecture will be on Wednesdays at 10 o'clock. At the end of the term we will have an oral exam, for Bachelor and Master students as well as for Diploma students who take the exam at some point. There will be homework exercises that we will publish, but they are no longer mandatory. It used to be the case that you had to get 50 percent of the homework points to be admitted to the exam.
This is still what we recommend, but because of the current examination guidelines from the Ministry we can't make it mandatory, so we do it in a voluntary fashion. If you do the homework, very good: we will show the solutions and discuss them with you at the beginning of each lecture. If you don't do it, I can't force you; it's your own responsibility. In any case, I have seen a very large degree of correlation between regularly doing the homework assignments and passing the oral exam. If you don't do the homework, you will have to read up on all the lectures at the end of the term, and that is simply too much; this is why quite a lot of people fail oral exams, and then I'm not pleased, and you will not be pleased afterwards either. Sometimes there will also be practical exercises where you program something or try out some algorithm from existing code. The idea is that by playing around with the parameters of an algorithm you get to know how it actually works and what it does, and I would recommend that as well. And the interesting thing, of course, is the exam. Preparing for an oral exam is somewhat different from preparing for a written one. In an oral exam you have to be able to talk about the topics, and how can you talk about a topic if you have never talked about it before? So doing the exercises and reading the slides alone does not prepare you for the exam; you should talk about the topics, discuss them and explain the algorithms. Only if you can explain something have you understood what it actually does.
So form small groups, two people, three people, four people, it doesn't really matter, for learning and discussing among each other; that will be the best preparation for the exam. The homework exercises are specifically designed to help you understand the topics, and the more you understand during the term, the less you have to learn at the end. If you didn't work during the term, you will have to catch up on every single lecture right before the exam. The main textbook for this course is "Introduction to Information Retrieval" by Christopher Manning, Prabhakar Raghavan and Hinrich Schütze. It should also be available online, so it is a very good starting point if you are curious about any of the algorithms, want more detailed knowledge, or want to see more. The second book is "Modern Information Retrieval" by Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Ricardo Baeza-Yates is actually the head of Yahoo!'s European research labs in Barcelona, so he knows what he's talking about, because he has first-hand experience in the industry; this book takes a rather practical perspective. Then there is Richard Belew, with a rather different view: "Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW". This is also a nice book, which goes over the different ways of tackling the problem, more from the user's perspective.
The questions of what relevance is and what an information need is are covered very well in this book. And there is of course the classical book by van Rijsbergen, "Information Retrieval", which dates from 1979. You may ask why I am proposing a book that is thirty years old: because many of the fundamental algorithms for information retrieval are still in place. There are some entirely new things, but most of the old things still hold. The main era for information retrieval was basically the sixties and seventies, when some of the big breakthroughs were made, and these are covered and very well explained in this book. It is also available online; just have a read of some of it, it's very interesting to compare with what we are doing today. As for the course outline: we start with some fundamental notions, and then we will head right into retrieval models. We will talk about vector-space-based and probabilistic retrieval models, indexing to speed up retrieval, latent semantic indexing, which is a matrix factorization technique, and language models. Then we will cover the evaluation of retrieval models, relevance feedback and how the user can help the retrieval process, and the classification of documents: relevant versus non-relevant is the simplest classification, one that every retrieval paradigm has to deal with, but there are other views that are interesting to consider. We will also talk about machine learning techniques, especially support vector machines. In the last couple of lectures we will move from the basic retrieval techniques to Web retrieval techniques, because we can't directly transfer the techniques used for distributed information retrieval to the Web, which has some special characteristics, for example Web crawling and link analysis.
Finally, we will have a lecture on miscellaneous topics, where we will do a little bit of spam detection and look at ways to improve quality. So this is the plan for the semester. That's it for my preamble; now let's start with the actual lecture. Any questions? You may also form groups to do the homework.
You don't have to read the books in full; we will point to the relevant parts along the way. This is a very interactive course: if you have any problem or don't understand something, just ask and interact with me directly. So let's start right into the topic with a brief history of libraries, information retrieval and Web search, then move on to some fundamental notions, and then we will actually introduce the first IR model, a very simple one. The history goes back to the land of Sumer, around 3000 to 2000 before Christ. This is the first recorded instance of a library, where people collected information: in this case about 25,000 clay tablets that were stored in the temple for quite a long time. That is quite a number of items. Interestingly, what was recorded there wasn't philosophy or art or anything like that; it came from the practical side: records of inventories like foodstuffs, commercial transactions, treaties, contracts, that kind of thing. It was also the first instance of written law, issued by one of the kings: one of the first written texts that everybody could point at and say, "this is what I want to have here".
Shortly afterwards, the same happened in Egypt: there, too, the temples collected information and knowledge. At some point in Ptolemaic times, around 300 before Christ, the rulers of Egypt were actually Greeks, and they had a wonderful idea: to put all the knowledge of the world together in one place. It was actually Alexander the Great's idea. He had conquered a large part of the known world and was surrounded by very clever people; he was educated by Aristotle, one of the great Greek philosophers, and he was very interested in science and technology. So the idea was: Egypt is the land of wisdom, so let's make Alexandria, a port city, the place where every piece of knowledge, every scroll that has ever been written, has a copy. At its height the library had some 700,000 scrolls, which was unimaginable at the time. It's interesting how the Library of Alexandria was destroyed in Roman times: there were several fires, each destroying some of the scrolls, and most of the texts collected there are lost. Some of the texts were rescued; for example, many works of the Greek philosophers survive because they had been copied again and again. Later, the monastic libraries were very selective: they didn't copy everything, but only the great and the good, those in line with Christian ideas, and so many of the ancient pagan texts are lost, because there was simply nobody who copied them. Still, the monasteries did very important and very strenuous work to save texts, because what they basically had to do was copy the books by hand in the scriptorium, which was the part of the monastery where the monks who could read and write worked.
Reading and writing were not common skills at the time; not many people were able to read and write. Copying a book was a lot of work, and the volumes were exchanged between the different libraries. This was basically the first time information became distributed: ideas were carried from one place to another, and other people could learn from them, so it was kind of like a network. During the Middle Ages, libraries also became central points, focal points of knowledge. For example, we have the Bibliotheca Palatina at the University of Heidelberg, the library of the Sorbonne in Paris, and, one of the biggest of all, the Vatican Library, founded in 1475 but in fact containing much older collections; for example, part of the Bibliotheca Palatina is in the Vatican Library today. Things really took off in the fifteenth century, when Johannes Gutenberg invented a wonderful thing. But it's not totally correct to say that he invented printing: printing with carved blocks was a technique known long before him. What you had to do was carve the whole page and print your books, and after a couple of prints the wooden block was worn out and you had to carve a new one. Gutenberg's actual invention was movable type. He cast single letters from lead and had a way of putting those types together in a certain order to build the printing form. When the types were worn out after a number of prints, you just melted down the lead, cast your letters again, put them together in the same order and started printing again. This was much more durable than carving wooden printing blocks.
You could make more copies, and print them more easily and at lower cost, so books became less expensive, and it became possible even for private collectors to own books. That had been a very expensive thing before, because you needed monks to copy the books by hand and paint beautiful pictures. If you want to see some of these handwritten books, it's a very good idea to go to Wolfenbüttel, which is about 10 kilometres from here, and see the old library there: they have some of the most beautiful handwritten volumes of the Middle Ages on display, and it is a great thing to see. Then came the national libraries. The idea was that collecting all this information is something every state should do: every country has a library system to educate people, because knowledge is power, and the more engineers, scientists and highly skilled workers you have, the more money you can make and the better the economy. So it was considered a governmental duty to collect knowledge and information and to make it available to users, and this is why many of the national libraries were founded. The German National Library is distributed between Frankfurt and Berlin, with the biggest part actually in Frankfurt.
Today it covers over 25 million items. The biggest library of that kind is the Library of Congress, the national library of the US, with about 150 million items. And it is not only the number of items, it is the growth: the collection keeps growing at an incredible rate, and handling that is only possible with some automatic means, with some computer science. According to the Guinness Book of Records, this is actually the world's largest library. To index all these items, it introduced a classification and indexing system, the Library of Congress Classification, which you will find in practically any book on its first pages. Down here there is a classification of the subjects covered by the book, together with a numerical code. For example, here you have "Text processing (Computer science)", "Information retrieval", "Document clustering" and "Semantic Web", then a number for the author, and some further numbers that only librarians can decipher, but they do have a meaning. This classification is just for finding, just for indexing the information. An interesting point is that the items are catalogued by metadata, and that is a very important term for us: metadata is information about information. It describes the information; it does not refer to the text itself, but to who the authors were, who edited the volume, who published it, and so on. Usually you have the ISBN number to identify books; more recently you have the Digital Object Identifier (DOI) system, where you register objects that can be not only books but also videos, experimental data sets, or anything else you might want to find. These identifiers and the other pieces of information are basically describing
the book in a way. You also have very interesting information about what the book is about, what the content of the document is. That is more complicated, because content can be seen from a certain point of view, so you have to classify what is actually meant. The subject area, for example information systems within computer science, further qualifies the term information retrieval: there might be a discipline in chemistry called information retrieval, which would probably be totally different from the information retrieval we use in computer science, and the subject area makes that clear. Then you have specialised classification systems, for example in a library catalogue, with what I have just shown you from the front pages of the book: the title, who wrote it, when it was released; it has 482 pages; its subject headings are Text processing (Computer science), Information retrieval, Document clustering and Semantic Web; the area covered is databases and information systems; and these are the numerical identifiers that can be resolved using the Library of Congress system, with some links. Then there is information about the library that holds the book, for example whether it can be borrowed and where it is kept on the shelves, so that you can actually find the physical copy of the book.
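A catalogue record like this is metadata in machine-readable form. The following minimal Python sketch models it; the field names and the `find_by_subject` helper are hypothetical, while the values are those of the example book (482 pages, four subject headings). It also illustrates a weakness of controlled subject headings: a lookup only succeeds if the query matches an assigned heading exactly.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    """A hypothetical, simplified catalogue record: metadata about a book."""
    title: str
    authors: list[str]
    pages: int
    subject_headings: list[str] = field(default_factory=list)


def find_by_subject(records: list[CatalogRecord], heading: str) -> list[CatalogRecord]:
    """Return the records that carry exactly this subject heading (case-insensitive).

    This is a controlled-vocabulary lookup: a synonym that was never
    assigned as a heading (e.g. "information searching") finds nothing.
    """
    wanted = heading.casefold()
    return [r for r in records
            if any(h.casefold() == wanted for h in r.subject_headings)]


catalog = [
    CatalogRecord(
        title="Introduction to Information Retrieval",
        authors=["Christopher D. Manning", "Prabhakar Raghavan", "Hinrich Schütze"],
        pages=482,
        subject_headings=[
            "Text processing (Computer science)",
            "Information retrieval",
            "Document clustering",
            "Semantic Web",
        ],
    ),
]

print([r.title for r in find_by_subject(catalog, "information retrieval")])
print(find_by_subject(catalog, "information searching"))  # []
```

Exact matching on headings is what makes the records reliable, and it is also why a badly chosen heading can make a book effectively unfindable.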
This catalogue information is made available not only to users but also to computer systems, so it is machine-readable and, to some degree, machine-understandable. That is a crucial point for handling the information, because you can hand it over to a computer to some degree. Of course you can also find someone's annotation saying this is a good book and you should read it, but that is a kind of personal first opinion. When you have to describe the book, and to find it, you need more information. And assume the metadata is of low quality: somebody has not said that the book is about information retrieval but about, say, information searching; then the whole thing takes on a different meaning. There is a proverbial saying among librarians: if you take a book out of its shelf and just randomly put it somewhere else, it will never be found again. This is no joke, because of the sheer volume of a library. Libraries actually go through all their bookshelves regularly and re-sort them, because otherwise they would not know where half of their books are: somebody just put them back somewhere, and there is no chance of finding them. So this information is very important for finding things. I want to show you an example from the field of medicine, some practical examples of the things we just discussed. This is MEDLINE, the Medical Literature Analysis and Retrieval System Online. The idea is that all literature and knowledge published in medicine, all scientific results, are collected in such a digital database, and each document is manually classified by experts according to a very complicated and sophisticated classification system, so that you can search systematically if you are doing
research in the life sciences or medicine, and you can find exactly the topic you are currently working on. The classification system used by MEDLINE is so complex, and it is updated so regularly, every year, that you can really rely on it: it gives you a computer-aided overview of the current state of knowledge in your particular discipline.
MEDLINE is funded by the government, through the National Library of Medicine, and currently has about 18 million records from about 5,000 publications in the field of medicine. So the journals and conference proceedings published in this field are delivered to the National Library of Medicine, and there are hundreds of experts, literally hundreds of experts, reading this material and classifying it according to the classification system. Usually, if you want to work at MEDLINE, you need at least a Master's degree in some life-science-related field, or a PhD, and the training in using the MEDLINE classification system usually takes at least half a year, because the system is so complex and highly specific, and documents have to be classified according to it. It is really important that the classification is consistent and done correctly by different experts, because this is the only way to find the information later. So each document is manually indexed, and usually several keyword terms are assigned to each document; not just "medicine", but very specific terms at a detailed level. The database is available online, so take a look at it, just to get an impression of how it works. MEDLINE is basically an approach to transfer classical library ideas, classification systems and metadata, into the digital age. Now let's search for something topical, swine flu. The problem is that this is a colloquial term that scientists would never use; the correct term is something like influenza A, subtype H1N1, which would be classified under viruses. The system gives you some hints on what possible query refinements could be, and these
hints are generated from the classification system, so the system knows that, at this time, swine flu is associated with terms like influenza A virus, H1N1 subtype. This gives us a lot of documents. Let's look at this one: you can see where this document was initially published, when it was published, the automatically recorded metadata such as the authors, the title and the abstract, and then the classification. There is nothing there yet, and this is what "in process" means: the document has not been indexed yet. Otherwise we would have seen some more detailed MeSH information.
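The difference between a colloquial entry term and the controlled heading that indexers actually assign can be made concrete with a small Python sketch. The vocabulary is a hand-written excerpt in the spirit of MeSH (not copied from it), the documents and their assigned headings are invented, and `normalize_query` and `search` are hypothetical helpers, not an NLM or PubMed API.

```python
# Hypothetical vocabulary excerpt: entry term (lower-case) -> preferred heading.
ENTRY_TERMS = {
    "swine flu": "Influenza A Virus, H1N1 Subtype",
    "swine-origin influenza a h1n1 virus": "Influenza A Virus, H1N1 Subtype",
    "h1n1 virus": "Influenza A Virus, H1N1 Subtype",
}

# Invented index: document ID -> headings assigned by human indexers.
INDEX = {
    "doc-1": {"Influenza A Virus, H1N1 Subtype", "Vaccination"},
    "doc-2": {"Influenza, Human"},
    "doc-3": {"Influenza A Virus, H1N1 Subtype"},
}


def normalize_query(term: str) -> str:
    """Map a colloquial entry term to its preferred heading, if one is known."""
    return ENTRY_TERMS.get(term.strip().lower(), term)


def search(term: str) -> list[str]:
    """Return the IDs of documents indexed with the (normalized) heading."""
    heading = normalize_query(term)
    return sorted(doc for doc, headings in INDEX.items() if heading in headings)


print(normalize_query("Swine flu"))  # Influenza A Virus, H1N1 Subtype
print(search("Swine flu"))           # ['doc-1', 'doc-3']
```

No indexer ever typed "swine flu" here; the documents are found only because the vocabulary links the everyday term to the heading that was assigned.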
If I go a bit further back, I can find documents that have already been indexed; this publication, for example, has already been processed. I could also try different search terms. The indexers really have a lot of work keeping up with the indexing, so let me find a way to search for older publications.
Let me do one more search. The interface is not very easy to use, but let's go to the advanced search and follow the link here.
Here we have the publication with its publisher and so on, and there should be some MeSH terms associated with each document. I sorted by date, and on the last page there is a document from 1952. Let's look for its MeSH terms. MeSH is the vocabulary used by MEDLINE, and currently two terms are associated with this document; it seems they assigned far fewer terms at indexing time back then than they do today. One of them is a diseases heading. Looking it up in MeSH, you see the description of the term, the related terms, and where the term is classified in the hierarchy of the MeSH classification system: it is in the diseases category, then in a subcategory of diseases, and it falls into the foot diseases class, and there can be different kinds of foot diseases that are distinguished in MeSH. It may also appear in some other categories: usually each keyword can appear in different branches of the tree. This is the top level of the tree, and you can go down, for example, into the humanities category, and then you see all the subtopics of the humanities, for example
here again subtopics, going down to quite specific things such as postal services or parts of the transportation system. So this is what MeSH looks like. MeSH means Medical Subject Headings: it is a controlled vocabulary, a thesaurus, with currently about 25 thousand individual keywords. You need a lot of training before you become an expert at classifying MEDLINE documents: you need to understand how the system works, what exactly is meant by each keyword, and how to assign keywords to documents in a consistent way. In addition to these 25 thousand headings, MeSH handles more than a hundred thousand additional entry terms, such as synonyms and names of substances. Swine flu, for example, would be an entry term in the database, associated with the correct medical subject heading, influenza A H1N1; and each subject heading comes with explanations and entry terms, like swine flu.
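The hierarchy can be sketched with tree numbers: in MeSH each heading carries one or more dotted tree numbers, so the same heading can sit at several places in the tree. The Python sketch below treats every heading whose tree number extends one of the query heading's tree numbers as narrower, and optionally includes those headings in the search (PubMed calls this "exploding" a term). The tree numbers, the heading "Swine-Origin Influenza" and the documents are invented for illustration and do not reproduce the real MeSH tree.

```python
# Made-up tree numbers in MeSH's dotted style; NOT the real MeSH tree.
TREE = {
    "Virus Diseases": ["X01"],
    "Respiratory Tract Infections": ["X02"],
    "Influenza, Human": ["X01.100", "X02.300"],  # one heading, two positions
    "Swine-Origin Influenza": ["X01.100.010", "X02.300.010"],  # hypothetical heading
}

# Invented index: document ID -> headings assigned by indexers.
DOCS = {
    "d1": {"Influenza, Human"},
    "d2": {"Swine-Origin Influenza"},
    "d3": {"Respiratory Tract Infections"},
}


def explode(heading: str) -> set[str]:
    """Return the heading plus every heading below any of its tree positions."""
    prefixes = TREE[heading]
    return {
        other
        for other, numbers in TREE.items()
        for num in numbers
        if any(num == p or num.startswith(p + ".") for p in prefixes)
    }


def search(heading: str, explode_terms: bool = True) -> list[str]:
    """Documents indexed with the heading and, optionally, its narrower headings."""
    wanted = explode(heading) if explode_terms else {heading}
    return sorted(doc for doc, headings in DOCS.items() if headings & wanted)


print(search("Influenza, Human", explode_terms=False))  # ['d1']
print(search("Influenza, Human"))                       # ['d1', 'd2']
print(search("Respiratory Tract Infections"))           # ['d1', 'd2', 'd3']
```

Matching on `p + "."` rather than on a bare string prefix keeps a number like `X01.1000` from being mistaken for a child of `X01.100`.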
MeSH gets updated every year, and when something like the bird flu comes up, they have to decide whether it gets its own entry in the thesaurus.
It is interesting to use the MeSH Browser.
You can search for a concept there. When I search for swine flu, it shows that the heading used in the index is "Influenza A Virus, H1N1 Subtype", with "swine-origin influenza A H1N1 virus" as an entry term. You see the locations in the tree (there are two positions in the tree where you can find this heading), the definition of what is really meant by it, and some additional description. So maintaining these MeSH headings alone is a lot of work, and they are updated every year. This heading was added to MeSH in 2005: until then there was only the general influenza A virus heading in the tree, and in 2005 they decided to split it up into subtypes, to distinguish between different kinds of influenza A viruses. So now this subtype has its own category.
So this is MEDLINE and MeSH, and how they work. This is where you can find what you need when you are looking for new findings in medical science; otherwise it would be hard to keep track, and usually, if you work in this field, you need a good grasp of MEDLINE and MeSH to find things. Another approach to using metadata is the Dublin Core Metadata Initiative, a rather generic framework for stating this kind of metadata for web pages, books, videos and so on. It is a general approach to describing resources, and it defines the structure of the metadata. Dublin Core is a standard, and the name comes from Dublin, not Dublin in Ireland but Dublin, Ohio, where the workshop was held in which the standard was designed. It has 15 metadata elements: who created the resource, what its subject is, and other things such as the rights that apply when using the resource, who contributed to it, and what area it covers; so it is rather general. Dublin Core metadata can be embedded in HTML: sometimes you find it at the beginning of web pages, in the head element, using meta elements. You describe the document, and each of these elements gets a value, for example the type, the creator, the description, and so on.
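Embedding Dublin Core in a page's head element can be sketched in Python. The element list is the 15-element Dublin Core Metadata Element Set; the `DC.` name prefix together with a `schema.DC` link is a common convention for HTML meta tags, while the helper `dublin_core_meta` and the record values are hypothetical examples.

```python
from html import escape

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def dublin_core_meta(record: dict[str, str]) -> str:
    """Render a metadata record as Dublin Core tags for an HTML <head>."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # Escape &, <, > and quotes so the value is safe inside an attribute.
        lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
    return "\n".join(lines)


print(dublin_core_meta({
    "title": "Information Retrieval & Web Search",
    "creator": "Example Lecturer",
    "type": "Text",
    "language": "en",
}))
```

Rejecting unknown element names keeps the output within the fixed 15-element vocabulary, which is what makes such records comparable across sites.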
This is the way metadata is currently used on the Web, but it is not what we will be doing for text search in this lecture. Let's have a five-minute break; we will continue at 11:20. This lecture is also recorded and will be available on the course home page for download and streaming, so you can see all the comments and slides if you want to revisit some part of it.

So one of the ideas that comes to mind when looking for information is metadata, the catalogue cards that libraries use. They are a kind of minimal representation of the book or document. Usually, if a book is about the Semantic Web, the term Semantic Web should occur somewhere in it; that is the kind of heuristic that gets you what you want. On the other hand, when you classify things, you have to use a controlled vocabulary: you have to decide which terms are useful and which are not. That is a very strenuous process. First, experts need to discuss what is a good term and what is not; then you need to train the indexers in how to use the vocabulary and distinguish its subtleties; and then you have to go through all the documents manually and find out whether some term applies or not. The problem is that this makes for a lot of manual work. With a computer at hand, it might be much easier not to use a classification at all and just look for the word in the text. As I showed you before, a book about the Semantic Web contains the term Semantic Web in the title, in the abstract, somewhere in the full text, and that is basically what full-text retrieval does. This idea is not entirely new either: the monks, at the latest, came to see the
benefit of that, and what they made were so-called concordances. A concordance is an alphabetical list of the words used in a book, with references to where each word occurs. There have been concordances for many of the interesting books; in the Middle Ages, that was usually the Bible. So these are books where you can look up where certain terms occur: for a given word, the concordance lists each passage of the Bible where it occurs, with a snippet of the sentence. It is a kind of cross-reference: you look up a word and find where it occurs in the Bible. Concordances do not only exist for the Bible but also for other important works. However, it is quite difficult to build such a concordance, because you have to read the whole book, note every occurrence of every term, and then sort it all alphabetically so that you can find things in it. The first concordance was made by Hugh of Saint-Cher, who used about 500 monks to do it for the Bible, recording every occurrence of every interesting word. The idea spread from there: not only indexing the Bible, but indexing everything. The first person to take that further was Vannevar Bush. He was the director of the Office of Scientific Research and Development in the US from the beginning of the 1940s and a direct adviser to the President, with a large influence on the science, education, engineering and industrial development of America as a whole. In the 1940s he had a vision of a personal information device. He didn't know about small PDA devices, he didn't know about
PalmPilots or BlackBerries, and he didn't know about hypertext, but he had the basic notions of libraries: concordances, glossaries, index terms. So in 1945 he wrote an essay, "As We May Think", published in The Atlantic Monthly, and his vision was a device he called the memex: a personal document repository, a device in which an individual stores all his books, records and communications, and which is mechanised so that it may be consulted with exceeding speed and flexibility. So it should be a machine that holds your document collection and records everything you do, everything you say, everything you communicate. And the selection should be by association rather than by indexing: you don't have an indexing scheme on top of it, you follow associations, for example through a conversation: who wrote what, at what time, and what was the answer to what. Association is a very modern concept, and he came up with it in exactly this way. This is the first drawing he made. It was all mechanical technology, since there were no computers at that time, and the storage was film: microfilm, microfiche. I don't know if any of you have seen it, but when you go to the library to research something very old, you might actually be sent to a microfiche reader, where you put these sheets of microfilm under the reader and see the pages enlarged for you. That was basically the idea: build it into a desk and project the documents, retrieved from microfiche, so you can read them. That was the basic idea of the memex. Building on that, the idea of searching through documents became very popular. For example, at the end of the 1950s, Hans Peter Luhn was the first to use words as
indexing units for documents, and he then went on to determine the similarity of documents by how many words they share. That was basically the birth of the information retrieval field. Then came Gerard Salton, who is also referred to as one of the first information retrieval specialists. He created the SMART system, a retrieval system that is still in use today; it is very old, but it can be extended to test the basic functionalities of information retrieval technologies. That was very early work, and it introduced the two models we will see in this course: the Boolean model was in the SMART system, and it also introduced a new model, which will come next week, the vector space model for documents, one of the most widely used models still in use today. Salton also considered the idea of relevance feedback: taking the user into the loop and finding out more about the user's information need by getting explicit feedback on items and how well they match. We will cover that in later lectures. From then on, the success of the field was basically unstoppable. The ACM, the Association for Computing Machinery, one of the big computer science societies of the US, founded a special interest group on information retrieval, SIGIR, which has held an annual conference since 1978. They even have a prize for the best contribution to the field, named after one of the founders of information retrieval, the Gerard Salton Award, and Salton was the first one to win it; that is one of the rare occasions where somebody gets a prize named after himself. An important problem in information retrieval is quality: how good is the result of a query? That became clear quite early. For algorithms in databases, you measure the time or the storage space you need, and these kinds of metrics
matter, but in information retrieval they are not enough: if you have a perfect runtime but produce bad results, that is a problem. That was new, because the result as such was not exact. In databases you have to retrieve a set, and if one item is missing, the result is wrong; here you have a ranked result list, and the ranking reflects the quality of the retrieval. That was a new concept: you have to know what is relevant rather than what is not. For this we have TREC, the annual Text REtrieval Conference. Beginning in 1992, people began to design benchmarks: you take a certain collection of texts, a corpus, you pose a number of queries on this corpus, and you determine which documents are the correct answers, in which ranking. You take a couple of experts from the field and let them judge the documents, which is a lot of labour, but in the end you get what we call a ground truth, the true result that should come out if you have a good retrieval system; the closer you get to it, the better. This was sponsored by the National Institute of Standards and Technology and the Department of Defense, and it just snowballed, because it was not only the problem of how to find relevant texts, but also, for example, how to find relevant blog entries, how to find things in genomics or in chemistry, how to detect spam: all different problems that face the same issue of being very hard to evaluate without a ground truth, without a benchmark collection. That is what TREC is all about. Turning to Web search, which we will look at from the information retrieval point of view in later lectures: around 1991, Tim Berners-Lee invented the World Wide Web, and it was by no means a planned invention; it was just a good idea that snowballed. And the first Web search engines,
such as Excite, came along very soon after the first couple of web pages emerged. At first, everybody knew where things were, because the Web was small and the servers were usually run by universities. But it soon became a widely used new technology, and search engines had to cover the whole ground: crawlers visit every server, fetch the pages, and build an index of the growing Web. You needed scalable techniques to crawl all the information and to index all the information on the Web. At first, people were quite clear about what they were doing: Web search was just information retrieval over indexes distributed all over the place, so you just apply information retrieval techniques. It's a scalability problem, yes; it's a distribution problem, yes; but that's all there is to it. That held until 1998. 1998 was a kind of break: it was the year Google came along and introduced something highly new, namely exploiting the link structure of the Web. That was such a novel idea that it made all the other search engine providers rethink their approach and realise that Web search is a different kind of information retrieval. Today, Web search is very complex, but it still has the core problems of information retrieval. Storage: a search engine holds a large document collection, so it should not take too much storage, and it has to scale, because the Web changes every day. Efficiency: it has to retrieve the interesting sites fast; we don't want to type a query and wait a quarter of an hour. And effectiveness: you should see the best pages first, and the first 10 or 20 results should be sufficient to satisfy your information need, because nobody
looks at the results beyond the first page. You click on the first couple of results, and if you don't see what you need, you reformulate your query. Even if there are billions of pages in your result list, people only look at the first ones, and the quality still has to be there.

Now let's turn to some fundamental notions of information retrieval, and to the first algorithms for dealing with them. The document is the central unit of information: a coherent passage of free text. It may be a document like this book, but every paragraph in it might also be considered a small document; any coherent sequence of words is a document. Coherent means that it is about one topic, not random words taken from some dictionary. Free text means that it has a certain structure, it is written in natural language, so it is understandable. Typical documents are, for example, newspaper articles, scientific articles, dictionary entries explaining the meaning of a word, and web pages. An email message is usually a document, although it might be split into smaller documents if it covers more than one topic. A document collection is a set of documents; in information retrieval it is usually called a corpus, plural corpora. A corpus also has a certain type: the documents in one corpus are usually of the same kind, and documents of a different kind would form a new corpus. Two big examples of corpora are MEDLINE, with its abstracts, and the Reuters corpus, with articles from the Reuters news agency. Then we have the notion of the information need: the information need is the topic about which the user desires to know more.
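The concordances built by hand in the Middle Ages are easy to mechanize today. Here is a minimal Python sketch over a tiny corpus in the sense just defined: it maps every word to the (document, position) pairs where it occurs, sorted alphabetically like a printed concordance. The regex tokenizer (lower-cased ASCII letters only) and the two example documents are simplifying assumptions for illustration.

```python
import re
from collections import defaultdict


def build_concordance(corpus: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each word to every (document ID, word position) where it occurs.

    Words are lower-cased runs of ASCII letters; positions count from 0.
    The result is sorted alphabetically, like a printed concordance.
    """
    entries: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for doc_id, text in corpus.items():
        for pos, word in enumerate(re.findall(r"[a-z]+", text.lower())):
            entries[word].append((doc_id, pos))
    return dict(sorted(entries.items()))


corpus = {
    "d1": "Swine flu spreads quickly.",
    "d2": "A vaccine against swine flu.",
}
conc = build_concordance(corpus)
print(list(conc)[:3])  # ['a', 'against', 'flu']
print(conc["swine"])   # [('d1', 0), ('d2', 3)]
```

What once needed hundreds of monks is a single pass over the text. Keeping word positions, not just document IDs, preserves enough information to check later whether two words occur next to each other.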
The information need is a very difficult part of information retrieval, because, unlike in databases, you don't really specify what you want. If you could specify it exactly, the problem would be easy, but you often don't know exactly what you are looking for; if you did, you would just look it up. Some of the queries posed to information retrieval engines are navigational, like "find the airline" or "find the flight number", where you just want to look something up. But most often the information need is: I want to know about some topic. When we looked at swine flu, we didn't really know what the exact type of virus was called; it was totally new. So information needs are often very unclear, and you want documents that explain something. You might want a first introduction at beginner's level. Or, if you are a scientist working on swine flu medication, you might need totally different documents: you don't want the dictionary entry for "what is swine flu", because you have been working on vaccines for the last 15 years and that is nothing new to you; you want the latest issue of some medical journal saying "we had wonderful success rates with this new medication for the treatment of swine flu". So the information need does not only reflect your query; it also reflects your background, your context, your prior knowledge and your interests. So we have different kinds of information needs, for example "What is the capital of Uganda?", "Is it really true that ... contain whale meat?", or "cloud computing". These information needs are all different. The first one is a factual question: what is the
capital of Uganda, which has a single named answer. "Is it really true that ..." is rather speculative; there might just be a bunch of assumptions and rumours about it. And "cloud computing" is a typical case: do you want a basic introduction to cloud computing, or do you want to know about the economic impact of cloud computing? What exactly are you aiming at? One of the problems of information retrieval is to determine the information need of the user. The query is the communication with the system: the user wants something about cloud computing, or the capital of Uganda, and usually expresses that in a query, in the easiest case just a bunch of keywords. You can also add some information about where a keyword should appear: that it should occur in the title of the document, or near some other term, and so on. For example, if you want something about Jaguar, not the animal but the car brand, the Jaguar car company, you would ask for documents combining both "jaguar" and "car". These operators can refine the search, but basically it comes down to getting retrieval engines to grasp the meaning of the words, and grasping meaning is a difficult thing to do. Why is it so difficult? Because of relevance: you want information that is relevant with respect to your information need. Getting the Wikipedia entry for the cloud computing question might be very relevant. Getting the ramblings of some professor about new developments and his greatest inventions in cloud computing is probably relevant in a way, because the topic is cloud computing, but it is probably not relevant for you, because you want to know some basics about the subject. So relevance is a strange concept: it is subjective, not a property of the object as such.
Something is relevant if you perceive it as relevant. Usually relevance is treated as a binary concept, something is relevant or not, but of course there can also be degrees of relevance: this document is very relevant, that one is not as relevant, and so on.
So it's very hard to define what relevance is; there are different notions of it, and we will go through them later in the course. Now that we have the fundamental notions, we want to go into the actual systems and retrieval models, and then to the Boolean retrieval model, which is the easiest retrieval model. Every IR system basically has a number of components. It has a query, and it has to represent the query somehow to make it machine-readable and machine-understandable. On the other hand, it has a document collection, usually a very big one, and it has to represent the documents in the collection as well, either by indexing or in some other way. Then the representation of the document collection and the representation of the query are compared; from the comparison you get the ranked results, which are delivered to the user. And maybe you then start feedback cycles: is that what you wanted, is it relevant for you? The user may say yes, or "now I know better what I'm looking for", and reformulate the query. This is basically the cycle that information retrieval uses all the time. An IR model therefore has certain components: the representation, that is, how to represent queries and how to represent documents, and the ranking function, which associates a score with a document: how relevant is a specific document for the query. That's basically what you need to build a retrieval model. On top of that you may want relevance feedback, the cycle in which the user judges the relevance of the results and the query is refined to get better results. The most famous representation, and the first one that comes to mind, is the bag-of-words representation. Bag of words basically means that the document is just a collection of words: we ignore the grammar and the structure of the document and focus on the individual terms.
That makes sense, because the query also consists of individual terms, the keywords the user enters. For a given document, say a book on information retrieval, I know that the term "information retrieval" will occur often in this book, and that might mean something. I know that the word "biology" will occur very rarely in this book, and another term, "rocket science" for example, may not occur in it at all. In some cases you just take the set of all words in the document. Sometimes you also count the words and take a multiset of the words, the bag. For example, take the document "one small step for a man and a giant leap for mankind": it contains "one", "small" and "step" once each, while "for" and "a" each occur two times. This multiset is the easiest representation, the representation of the bag-of-words model. But there are problems with this model. You totally ignore grammar, and there is a wonderful example of that: a small document saying "this document is not about rocket science". The term "rocket science" appears, so it must be relevant for rocket science; but it's not, because we ignore the "not". The document is not about rocket science but about something else. Documents that differ in grammar and express different things may end up with similar representations in the bag-of-words model. You also lose semantics, and you lose all the visual clues to what the document is about: the title, or keywords in bold type, might be much more important than a word occurring somewhere in the last paragraph of the document. But these and other related losses are usually negligible.
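To make the bag-of-words idea concrete, here is a minimal Python sketch. It assumes a deliberately crude tokenizer (lowercase the text and treat every run of letters as a term), and `bag_of_words` is a hypothetical helper name, not part of any IR library. It builds the multiset for the example sentence and shows the two losses mentioned above: the "not" does not change the counts of "rocket" and "science", and different word orders give the same bag.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Bag-of-words (multiset) representation of a text.

    Assumption: lowercase everything and treat each run of letters as a term.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))

bag = bag_of_words("One small step for a man and a giant leap for mankind")
print(bag["for"], bag["a"], bag["small"])  # 2 2 1

# The negation is just one more word in the bag; "rocket" and "science" still count.
negated = bag_of_words("This document is not about rocket science")
print(negated["rocket"], negated["science"])  # 1 1

# Word order is lost as well: both texts get exactly the same bag.
print(bag_of_words("the dog bites the man") == bag_of_words("the man bites the dog"))  # True
```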
On the other hand, you get a good mathematical representation, something you can build on and work with, and you can store individual terms and retrieve them very efficiently, because you just count the occurrences of each word. Most documents in English or German can be written using a vocabulary of about 100,000 to 200,000 terms, and you simply represent each document as a vector over that vocabulary. One of the biggest advantages is that this actually works quite well: this simple model alone already gives you very good retrieval results for a lot of things. So what you basically do is take all the terms that occur in any of the documents, which we call the vocabulary, and for each document, whenever a word occurs, you add it to the document's vector. For "one small step for a man and a giant leap for mankind", for example, "for" and "a" get a count of 2, "small", "step", "giant", "leap" and "mankind" get a count of 1, and so on. This vector describes the document in a very compressed form that is very easy to work with. A different document, say "a small step is a giant leap for China", gets a different vector: it contains "China", for example, which the first document does not.
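The vector representation can be sketched as follows, again in Python and under the same crude tokenization assumption. The vocabulary here is just the sorted set of terms from the two example documents rather than a real vocabulary of 100,000 terms or more, and the function names are illustrative only.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())

docs = [
    "One small step for a man and a giant leap for mankind",
    "A small step is a giant leap for China",
]
# Toy vocabulary: all terms of the collection in a fixed (alphabetical) order.
vocabulary = sorted({term for d in docs for term in tokenize(d)})

def to_vector(text):
    """Term-frequency vector of a text over the vocabulary."""
    terms = tokenize(text)
    return [terms.count(term) for term in vocabulary]

vectors = [to_vector(d) for d in docs]
print(vocabulary)
# ['a', 'and', 'china', 'for', 'giant', 'is', 'leap', 'man', 'mankind', 'one', 'small', 'step']
print(vectors[0])  # [2, 1, 0, 2, 1, 0, 1, 1, 1, 1, 1, 1]
print(vectors[1])  # [2, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
```

Most entries of a real vector would be 0, since a document uses only a tiny part of the vocabulary.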
Now we can compare the vectors instead of the documents. That's the trick of the bag-of-words model: comparing vectors is much easier and much more structured than comparing documents, and the vectors abstract from grammar, word order and the like. Consider, for example, the task of finding out which words occur in both documents. With the vectors this is easy: you just do a component-wise comparison. On the text itself it is not easy: you would have to take each word, say "a", and check whether it occurs in the other document, then do the same for "small", and so on, which is very hard to do for a large collection. This is why we use good representations. If you put all these vectors together as rows of a matrix, one row per document and one column per term, you get what is called a term-document matrix, or incidence matrix: on one side you have the terms, on the other side the documents. We will see that you can do a lot with matrices, so the term-document matrix will be one of our friends for the weeks to come. Now to the first retrieval model, which goes back to good old George Boole, one of the first to define mathematical logic, a calculus for the logical combination of propositions. The Boolean retrieval model is the simplest retrieval model there is. It takes the set of words as the document representation and even ignores the number of times each word occurs in the document: a word either occurs or it does not. Then you use only AND, OR and NOT to combine the vectors.
The ranking function just returns 0 or 1: for a query "A AND B", the ranking value is 1 if both terms occur, and 0 if one of them or both do not occur. Easy.
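Here is a minimal sketch of the incidence matrix and the 0/1 ranking function, assuming the same two toy documents and crude tokenizer as before; `rank_and` is a hypothetical name for a ranking function that only handles pure AND queries. Rows are documents and columns are terms, matching the description above, and finding the common words is a component-wise comparison of two rows.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())

docs = [
    "One small step for a man and a giant leap for mankind",
    "A small step is a giant leap for China",
]
doc_terms = [set(tokenize(d)) for d in docs]   # Boolean model: only presence counts
vocabulary = sorted(set().union(*doc_terms))

# Incidence matrix: one row per document, one column per vocabulary term.
matrix = [[1 if term in terms else 0 for term in vocabulary] for terms in doc_terms]

# Terms occurring in both documents: compare the two rows component by component.
common = [term for term, x, y in zip(vocabulary, matrix[0], matrix[1]) if x and y]
print(common)  # ['a', 'for', 'giant', 'leap', 'small', 'step']

def rank_and(row, query_terms):
    """Boolean ranking for a pure AND query: 1 if every query term occurs, else 0."""
    return int(all(t in vocabulary and row[vocabulary.index(t)] == 1 for t in query_terms))

print([rank_and(row, ["step", "mankind"]) for row in matrix])  # [1, 0]
```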
Retrieval is based on set membership. For example: find all documents indexed by the word "mankind". How do we do that? We just take the column corresponding to the word, and every document with a 1 in that column contains the term. Easy retrieval. Find all documents indexed by the word "man" OR "mankind": you pick the two columns, and whenever there is a 1 in either of them, you retrieve the document. Find all documents indexed by "man" AND "mankind": you look at both columns, and where there is a 1 in both at the same time, you retrieve the document; 1 AND 1 is 1, and everything else is 0. These are just the Boolean operators, and they are very efficient to evaluate: for a conjunction, both columns have to show a 1; for a disjunction, one of the columns has to show a 1; and for NOT, the negation, you just flip the bit. A small example with small documents: document 1 contains "step" and "mankind", and document 2 contains "step" and "China". For "step AND mankind" we pick the two columns and look at them: document 1 has a 1 and a 1, document 2 has a 1 and a 0, so only document 1 is returned. Very easy. The same for OR: for "step OR mankind", both documents contain "step", so 1 OR 1 is 1 and 1 OR 0 is also 1, and both documents are returned. The interesting part is negation, because negation has to be handled with care: if we want all documents that do not contain "mankind", that is every document where "mankind" is 0, which in a large collection is almost the whole collection. If the query says something like "I want all documents that contain Neil Armstrong but not mankind", it is better to first cut out all documents that contain Neil Armstrong and then, within those, remove the ones that contain "mankind".
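The column operations can be sketched in a few lines of Python, using the tiny two-document collection from the example. The helper names (`column`, `and_`, `or_`, `not_`, `hits`) are made up for this illustration. Note how NOT is only used inside an AND: in a large collection, a NOT on its own would select almost every document, as in the Neil Armstrong example.

```python
# The tiny collection from the lecture: document 1 contains "step" and "mankind",
# document 2 contains "step" and "China".
docs = {1: {"step", "mankind"}, 2: {"step", "china"}}
doc_ids = sorted(docs)

def column(term):
    """Column of the incidence matrix for one term: one bit per document."""
    return [1 if term in docs[d] else 0 for d in doc_ids]

def and_(a, b):
    return [x & y for x, y in zip(a, b)]   # 1 only where both columns show a 1

def or_(a, b):
    return [x | y for x, y in zip(a, b)]   # 1 where at least one column shows a 1

def not_(a):
    return [1 - x for x in a]              # flip every bit

def hits(bits):
    """Translate a result column back into document IDs."""
    return [d for d, bit in zip(doc_ids, bits) if bit]

print(hits(and_(column("step"), column("mankind"))))        # [1]
print(hits(or_(column("step"), column("mankind"))))         # [1, 2]
print(hits(and_(column("step"), not_(column("mankind")))))  # [2]
```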
This also matches natural language: you usually don't say "not B" on its own; you say "A and not B", for example "step, but not mankind". Natural language uses the same operators AND and NOT, and you can also use OR. Very often this is done in an abbreviated way: you say you want "step and mankind or China", which basically translates into "either step and mankind, or step and China". It's the same thing, just an abbreviated way of saying it. In any case, we can process these queries, and to speed things up we use an inverted index. This is a very central notion of information retrieval. If you want to know whether some term occurs in a document, it's much easier to look up the term first and then go to the documents. And for most words there are very many zeros in the column: ask for all documents that contain "Christmas pudding", and only very few documents will qualify. It doesn't help much to record all the documents in which a term does not occur. You just take the documents that have a 1 and put them in a list: for "Christmas", say, document 1, document 2 and document 5, and that's it. This is what is called an inverted index. For every word, like in a concordance, you list the documents in which the term occurs; the documents in which it does not occur are not recorded at all. The index can be precomputed, so you only have to go through the documents once to build it. And of course query processing is very fast, especially for AND queries: for "Christmas AND pudding", you go through the list of documents in which "Christmas" occurs and look up whether each document is also in the list for "pudding". If the documents are "step mankind" and "step China", the inverted index has an entry for each of the terms "step", "mankind" and "China", followed by the list of documents in which that term occurs.
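A minimal sketch of building such an inverted index in Python, assuming the documents are already tokenized and identified by integer IDs; `build_inverted_index` is a hypothetical name. Each postings list is kept sorted by document ID, which makes the list operations discussed next cheap.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map every term to the ascending list of IDs of the documents containing it.

    docs maps a document ID to its (already tokenized) terms. Only documents in
    which a term occurs are recorded; the zeros of the matrix are never stored.
    """
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {1: ["step", "mankind"], 2: ["step", "china"]}
index = build_inverted_index(docs)
print(index)  # {'step': [1, 2], 'mankind': [1], 'china': [2]}
```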
Everybody knows the idea of the inverted index; it's not difficult. Building such an index is easy, and queries of the type "show me all documents containing term X" are no longer tricky, because you just go to the term and return its list. Intersections and unions of these lists can also be computed very quickly, and that is most of what we compute for Boolean queries: AND and OR. For "step AND mankind", you take the result list of "mankind" and intersect it with the result list of "step"; for "step OR mankind", you take the result list of "mankind" and union it with the result list of "step". Every Boolean expression can be converted into conjunctive normal form or into disjunctive normal form, and this determines which parts of the expression are evaluated first: in conjunctive normal form the ANDs are evaluated last, in disjunctive normal form the ANDs are evaluated first. Conjunctive normal form, known from propositional logic, is a conjunction of clauses in which every clause contains only a disjunction of literals, of the type (A OR B) AND (C OR D): you first evaluate the disjunctions of literals and afterwards combine them with AND. Any formula can be converted into conjunctive normal form by shifting things around; rules such as De Morgan's laws tell you how to do it. It's just algebra, and it works for any formula. Disjunctive normal form is the other way around: brackets containing ANDs, linked by ORs, so first you evaluate the conjunctions and then the disjunctions. As with conjunctive normal form, any propositional formula, any Boolean expression, can be transformed into disjunctive normal form.
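Because postings lists are sorted, intersection and union can both be done in a single parallel pass over the two lists, in time proportional to their combined length. The sketch below uses this standard merge technique, assuming ascending integer document IDs; `intersect` and `union` are illustrative names, not a library API.

```python
def intersect(p1, p2):
    """Intersection of two ascending postings lists in one parallel pass."""
    i, j, result = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def union(p1, p2):
    """Union of two ascending postings lists, also in one parallel pass."""
    i, j, result = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            result.append(p1[i])
            i += 1
        else:
            result.append(p2[j])
            j += 1
    result.extend(p1[i:])  # at most one of these two tails is non-empty
    result.extend(p2[j:])
    return result

index = {"step": [1, 2], "mankind": [1], "china": [2]}
print(intersect(index["step"], index["mankind"]))  # [1]     step AND mankind
print(union(index["mankind"], index["china"]))     # [1, 2]  mankind OR china
```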
The interesting part is that rearranging the query determines what you evaluate first and how much the evaluation costs. Remember: a conjunction corresponds to an intersection and a disjunction to a union, and unions and intersections may have different costs. So sticking to one of the normal forms might be a good idea. Take, for example, the query step AND ((China AND tycoon) OR man). To get the conjunctive normal form we have a problem: there is a subexpression that has to be evaluated first and contains an AND inside an OR. The distributive law tells us that we can distribute the OR into it, which gives (China OR man) AND (tycoon OR man). That's no problem, because these clauses are no longer combined with an OR. So we get step AND (China OR man) AND (tycoon OR man): the ORs in the brackets are evaluated first, and afterwards everything is connected with AND. We can do the same for the disjunctive normal form, where the AND part has to come first: (step AND China AND tycoon) OR (step AND man), using the same distributive rule, this time applying the AND into the bracket. What does this mean? China, tycoon and step together is one possible solution, and man and step together is another. That's the simple idea of how you can transform a query using Boolean algebra. Now why is this so important? It has to do with efficiency. With the conjunctive normal form, you compute the unions first and then the intersections; with the disjunctive normal form, you compute the intersections first and then the unions. In terms of result sizes, a union will usually give you a lot of results and an intersection will usually give you few. So with the conjunctive normal form you blow up the set and then shrink it down again, whereas with the disjunctive normal form you shrink the set from the beginning and only blow it up at the end. Both forms are perfectly equivalent and return the same results, but the intermediate results of each operation tend to be large for the conjunctive normal form and small for the disjunctive normal form. So, in this respect, it is usually much better to have everything in disjunctive normal form rather than conjunctive normal form.
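The effect of the two normal forms can be illustrated with the example query step AND ((China AND tycoon) OR man). The postings below are made-up document IDs chosen for illustration only, and Python sets stand in for postings lists. The sketch evaluates both forms, checks that they return the same documents, and records the size of every intermediate result; the exact numbers depend entirely on the invented data and only illustrate the tendency described above.

```python
# Hypothetical postings (sets of document IDs), invented for this illustration.
postings = {
    "step":   {1, 2, 3, 4, 5, 6},
    "china":  {2, 3, 7, 8, 9},
    "tycoon": {3, 8, 10},
    "man":    {1, 4, 9, 10},
}

def evaluate_cnf(p, sizes):
    """step AND (china OR man) AND (tycoon OR man): unions first, intersections last."""
    clause1 = p["china"] | p["man"]
    clause2 = p["tycoon"] | p["man"]
    result = p["step"] & clause1
    sizes.extend([len(clause1), len(clause2), len(result)])
    result = result & clause2
    sizes.append(len(result))
    return result

def evaluate_dnf(p, sizes):
    """(step AND china AND tycoon) OR (step AND man): intersections first, union last."""
    term1 = p["step"] & p["china"]
    sizes.append(len(term1))
    term1 = term1 & p["tycoon"]
    sizes.append(len(term1))
    term2 = p["step"] & p["man"]
    sizes.append(len(term2))
    result = term1 | term2
    sizes.append(len(result))
    return result

cnf_sizes, dnf_sizes = [], []
print(sorted(evaluate_cnf(postings, cnf_sizes)), cnf_sizes)  # [1, 3, 4] [8, 6, 4, 3]
print(sorted(evaluate_dnf(postings, dnf_sizes)), dnf_sizes)  # [1, 3, 4] [2, 1, 2, 3]
```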
So much for how the Boolean retrieval model operates. Its advantage is that you can retrieve exactly the subset you want by specifying the terms you are interested in, and if two documents have different representations, you can distinguish between them: one of them contains the distinguishing terms and the other does not, so you get one and not the other. Of course, this is rather theoretical, because you would have to know the right query for distinguishing the documents, and that is very hard to find. On the other side, there are disadvantages. We get a set of results in which a term is either contained or not, but there is no ranking: no term is more or less important, terms are simply there or not. And you might even get empty results. If you put too many terms in your query, there might not be a single document containing all of them, and you kick out of the result set even those documents that match partially, for example those containing all the terms except one. There are no second-grade documents, no grading of documents at all; Boolean retrieval cannot deal with that. This is why controlling the result size is exceedingly difficult: if you put too many terms into the query, you may end up with no results at all; if you put in too few, you get a huge number of results.
The right number of terms for a good result depends on the collection, which you don't know, and there is no way of finding it out.
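The result-size problem can be shown with a small sketch. The postings below are hypothetical document IDs for the cloud computing example, invented for illustration, and `and_query` is an illustrative name. Each extra AND term shrinks the result, until the last one empties it, even though documents 4 and 6 match four of the five terms.

```python
from functools import reduce

# Hypothetical postings for the cloud computing example (made-up document IDs).
postings = {
    "cloud":     {1, 2, 3, 4, 5, 6, 7, 8},
    "computing": {1, 2, 3, 4, 5, 6},
    "economic":  {2, 4, 6},
    "impact":    {4, 6},
    "europe":    {5},
}

def and_query(terms):
    """Documents containing all terms (pure Boolean AND); unknown terms match nothing."""
    sets = [postings.get(t, set()) for t in terms]
    return reduce(lambda acc, s: acc & s, sets)

query = ["cloud", "computing", "economic", "impact", "europe"]
for k in range(1, len(query) + 1):
    print(query[:k], sorted(and_query(query[:k])))
# The last line prints an empty list: documents 4 and 6 miss only "europe",
# but the Boolean model has no way to return them as second-grade matches.
```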
As I said, relevance is not simply a matter of whether a document contains a term or not, and that is one of the most fundamental issues here. If you think about a document you are interested in, you can describe it with some terms, but usually you will not capture that kind of relevance with Boolean retrieval. We will look at document ranking later, which can also be combined with Boolean retrieval. And then we have a last slide for today: a quick example of an information service that actually relies heavily on Boolean queries. It's called Westlaw, and it is basically for law what Medline is for medicine: a really good research service that covers a lot of legal publications. Everything is indexed using a special indexing system, similar to the MeSH system, and until recently Boolean search was the default method for finding documents. That doesn't seem very user-friendly, but it can be very useful for professional users. I would really like to show you the service, but it costs money and we don't have a subscription, so here are some examples instead. Say a search engine expert from Google goes to Microsoft, and Google wants to find out how it can prevent its former employee from telling Microsoft its search engine secrets. The query could then start with the phrase "trade secret".
In the same sentence there should be a word starting with "disclos", so "disclosing", "disclosure", "discloses" or something like that; in the same sentence the word "prevent"; and a word starting with "employ", such as "employee" or "employer". Westlaw then returns all documents that satisfy these four conditions, with all four terms in the same sentence. So if you really know the field and its texts, you can construct queries like this and find exactly the documents you need. Another example information need: the requirements for disabled people to be able to access a workplace. The query could look for documents in which terms starting with "disab" occur in the same paragraph as a term starting with "access", which in turn occurs in the same sentence as "work-site" or "work-place". There are also conditions like "this term within three words of that term". So it's complicated to build these queries, trust me. If you are a lawyer and have learned how this works, you know what terms are used and what the sentences look like, and if you know what sentences to expect, you can build such queries and find exactly the documents describing what you are looking for. Note also that in these queries a space between words acts as an OR by default, not as an AND as one might expect.
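To show what "all conditions in the same sentence" and "words starting with..." mean operationally, here is a toy matcher. It is not Westlaw's implementation: it assumes a naive sentence splitter (a sentence ends at ".", "!" or "?"), and the conditions, documents and helper names are hypothetical. A trailing "!" marks a prefix (truncation) match, borrowing the notation that, as far as we know, Westlaw's terms-and-connectors syntax uses together with connectors such as /s for "same sentence".

```python
import re

def split_sentences(text):
    """Naive sentence splitter (assumption: a sentence ends at '.', '!' or '?')."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def matches(condition, tokens):
    """Check one condition against a list of lowercase tokens.

    A condition ending in '!' is a prefix (truncation) match; anything else is
    a word or phrase that must occur as consecutive tokens.
    """
    if condition.endswith("!"):
        prefix = condition[:-1]
        return any(token.startswith(prefix) for token in tokens)
    phrase = condition.split()
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def same_sentence(text, conditions):
    """True if at least one sentence of the text satisfies every condition."""
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        if all(matches(c, tokens) for c in conditions):
            return True
    return False

# Hypothetical query and documents, loosely following the trade secret example.
query = ["trade secret", "disclos!", "prevent", "employ!"]
doc_a = ("The court held that the employer may prevent the disclosure "
         "of a trade secret. Costs were awarded.")
doc_b = "The employee left. A trade secret was later disclosed to a competitor."
print(same_sentence(doc_a, query))  # True
print(same_sentence(doc_b, query))  # False
```

Document b mentions all the concepts, but never in one sentence together, so it is not returned.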
On average, a query to Westlaw contains ten words, so it's really complicated. It's not designed for lay people who know nothing about law; it's designed to be used by real experts in the field. The reason why they use Boolean search is that professionals really want to know what they are doing, what is happening and which documents are returned. If you type something into Google and get a result, you have no idea where this result comes from, and you don't know whether there is some really relevant document on page 1,000 of the results. Legal professionals don't want to look through all the pages with things that might be relevant in some way; they want to be sure that what they get really matches the query. So it's about control over what is retrieved and really knowing what the system is doing. This can be very important in case law: when a court that is quite high in the hierarchy makes a decision on something, that decision becomes law, and when the same situation occurs later, the judge in the second case should decide exactly as in the first case. So it's really important to find all the similar cases, and this can only be done if you can really control what you are looking for: you want to be sure that you found every document that matches the query, and you don't want documents that are only slightly similar without any idea why the engine considered them relevant. Still, there have been experiments comparing Westlaw queries with systems that take free-text queries, and the result was that free-text queries usually gave comparable result quality; it really depends on what you want to do. So Westlaw currently also supports free-text queries, and these can be helpful in some cases to get an overview.
But if you really want to go deep into the details of some legal problem, you usually still need Boolean search, and that's why it is still useful. In the next lecture we will take a closer look at some more modern models that come closer to what we really want; so far we have only looked at exact Boolean matching on a bag-of-words representation. Are there any questions? Then I'd like to say thank you, and see you next time.