Representation II: Automatic Indexing
This lecture gives an overview of Information Retrieval. It explains why documents are ranked the way they are. The lecture explains the most relevant approaches to content representation: automatic indexing and manual indexing. For automatic indexing, the frequency of words is of special relevance, and its influence on the weighting of terms is discussed. The most relevant models are introduced. The session on evaluation discusses new metrics like the Normalized Discounted Cumulative Gain. The session on information behavior provides a brief overview and explains the relation to IR. The session on optimization mainly introduces term expansion and fusion methods. The session on Web retrieval is concerned with quality aspects and gives a basic insight into the PageRank algorithm.
Keywords: Information Retrieval, Search Engine Technology, Information Behavior, Automatic Indexing, Academic Lecture, Retrieval Models, Web Retrieval, Evaluation of Retrieval Systems
so what they stop it is the 2nd Test and the 1st versions comedy indexing said automatic mixing number 1 that of are good for the 1st classroom mentioned the basic to what we do in hours we call words and were lost their last next summer for about how things were going to with a month to their all what all of the union would is over when you see 1 word 0 work and which occurred on public but also worked for the next time you going away the what happened before the whole of the British people for Cray so all of
[unintelligible] In manual indexing we select appropriate representation terms: we take some words and use them as a representation of the text, so that somebody interested in these words can find the text, whether it is in a library, a book, or on the Web. So far automatic indexing looks quite similar, but there are big differences. What could be the differences between manual and automatic indexing? One of the big differences we mentioned last time is cost, a huge difference: one uses human intelligence and the other just uses a machine. What are the other differences? For example, we had the book on developing Web applications or something like that, and we had some indexing terms which could be appropriate for it. [unintelligible] There is a quality and quantity trade-off: with automatic indexing we have many more terms. In manual indexing one book gets only a handful of terms [unintelligible]. In the automatic system a document is represented by a large number of terms. Speaking of selection is almost not correct here, because basically any term that is in the document becomes part of its representation.
If I search for documents containing a word, I also find a lot of other documents. So the big difference is quantity versus quality. There are many discussions and comparisons about this [unintelligible], but basically most large-scale systems index automatically anyway; even if it were worse, it would not matter. Despite this, manual indexing is still being done. Now back to our example of developing Web applications. One of the terms we discussed was "developing", and we said "developing" might not be such a good term: it is very general and ambiguous. We could use "programming" instead, because when the book talks about developing, that means programming. [unintelligible] In manual indexing we can assign the term "programming" even though it might not be in the book at all, because from the context "developing" means programming. This is something that can never happen in automatic indexing: it always selects what is in the document. The machine cannot come up with terms on its own. So these are the three main differences. [unintelligible] The index terms come from the text, or they can be [unintelligible]
[unintelligible] We can also talk about the search model. You will remember from the introduction class the two main families of models. [unintelligible] One is the exact match between the two representations, a full match. The other is the partial match: there can be approximate matches, something between 0 and 1, where 1 is the perfect match and 0 is no match. Remember, this should still be there from the introduction class. So we have to be clear: when we talk about manual indexing, what is the search model of a library? And what is the search model of the Web search engines that you use every day? [unintelligible] A Web search engine uses a partial match model. The results are presented as a ranking, where one result is better than the other. [unintelligible]
Library search systems: is there a ranking or not? [unintelligible] You could say that it is a ranking, but it is not a relevance ranking. It is based on formal criteria of the document: the results are sorted by date, so the most recent books are at the top positions. For the Web search engine that could be quite different. That is one other difference. [unintelligible]
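To make the contrast between the two families of search models concrete, here is a minimal sketch. The scoring function (the share of query terms found in the document) and the sample documents are assumptions chosen for illustration; the lecture does not specify a particular formula at this point.

```python
def exact_match(query_terms, doc_terms):
    """Exact match: the document either contains all query terms or it does not."""
    return set(query_terms) <= set(doc_terms)


def partial_match(query_terms, doc_terms):
    """Partial match: a score between 0 (no match) and 1 (perfect match).

    Assumed scoring for illustration: share of query terms present in the document.
    """
    query = set(query_terms)
    if not query:
        return 0.0
    return len(query & set(doc_terms)) / len(query)


# Hypothetical mini collection
docs = {
    "d1": ["developing", "web", "applications"],
    "d2": ["web", "search", "engines"],
    "d3": ["library", "catalogue"],
}
query = ["web", "applications"]

# Exact match returns an unordered set of hits
hits = [name for name, terms in docs.items() if exact_match(query, terms)]

# Partial match allows a ranking: one document is better than the other
ranking = sorted(docs, key=lambda name: partial_match(query, docs[name]), reverse=True)
```

With exact match only d1 is returned, while the partial match ranks d1 before d2 and puts d3 last.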
[unintelligible] The first thing is the so-called bag-of-words approach. What does it mean? We take all the words and put them in a bag. That means that the context is lost: the words are no longer in their original context, they are all thrown together. [unintelligible] Consider a sentence with a negation. [unintelligible] The document would seem to be a perfect fit for my query, although it says exactly the opposite, because negation is not taken into account. This is a good and simple example of how the context gets lost, so semantics like negations are not captured. [unintelligible] Wouldn't it be better to deal with the negations? The main reason we don't is technological: for massive amounts of data, finding negations is just too difficult.
[unintelligible] The analysis is limited: we don't look for negations or anything like that. It is limited to the morphological level of language. [unintelligible] It is just too difficult; the technology is not ready for this mass of text. This is the current state of the art in natural language processing. [unintelligible] For example, a parsing approach [unintelligible] from 2005 reported that the deep analysis only covered 50 percent of the sentences in the collection. That means it works for half the collection, and for the other half it does not, and the interpretation was also wrong in many cases. [unintelligible]
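The loss of context in the bag-of-words approach can be shown with a minimal sketch. The tokenizer and the example sentences are assumptions made for illustration, not taken from the lecture.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.

    Word order and sentence structure are discarded.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def matches_all(query, text):
    """True if every query word occurs somewhere in the text."""
    bag = bag_of_words(text)
    return all(term in bag for term in bag_of_words(query))


# Two sentences with opposite meaning end up with the same representation
same_bag = bag_of_words("The dog bites the man") == bag_of_words("The man bites the dog")

# A negated statement still matches the query perfectly
negated_hit = matches_all("treaty ratified", "The treaty was not ratified")
```

Both `same_bag` and `negated_hit` are true: the representation cannot tell the sentences apart, and the negation has no effect on matching.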
the next the shown reserve cause the nearest across a single on the protection reminder that period and we don't of violence some all of the low for the year before for all work now which wants to be yet when we was restricted stock in his life work is done the book by pictures of the 2 would lead might not the seller work of ideas several seconds users if uninterested in the south of was by some of the good as well but I'm I'd 1 something like the disease should be there for all to see what sounded like the sort of Road lost all the year over wrote the most interesting of guest point of all these to the st we were worried freeze Coventry's usually it would be easier with those of all the other injuries so that could be used to review by the work of the Germans and the leader of the 400 strong onwards but it was the 6th eye on the ball and the ball flew to the US or so the new from was a part of me was the 1st woman Britpop history of phrase of it this is what you get is a worry what via the world of the book is only overuse regions use but only if you have a 2nd for the sake of the mix up over a a move for fuel lygo Street where you would say that the the full provision to get by on 2 but may be not something like that you got a lot of the ball so we should only get compound the fees all you were using the head of the EU and that it would be wrong for music changes would examples prologue contrived should be gives the baked ahead book but Sisulu we are ready don't use the word is that we are used by all the morning by more than where was the ball but the question is spread across a you have to have a far bigger that was the thing that would make it is basically the decision about how all this the application risk of exile 1 of with the disaster He said called wrong as well take those may say this is a pretty only before up to date the news reserve rudely by wallowed on gs we have no single of all on the was also the but what the idea of being Boschi P all the talk 
a version of the space you qualify as the of the old the whole thing for 4 a lead of was about to end the year with a view of the that are is the related to the sale of the whole group is due to the birds were also is used to the pace of images of all were the facing the of what we do in ritually this week try to cut down we keep both parts a and I hope that in Germany civic each will improve 1 0 5 to 10 per cent what you don't ask we have talked about which accepted in 1 can of Nomad would decide interested but that was about all this and the other in the eye the with this case is that of a woman from a useful loss of life is thought like it is that it is because real it had but I agree with only find those nations even though the person will get some of the terms and a overall in the whole process on average most uses it turns out to be that the distance to the following a will find also off
the sole where about what this of word really and the ball was going by some of us think the way the view that the words are the right start kind of space we were part of the business so that uses base not 1 were going in the right place for about this so before you were separation was last few bottles of was the not to be the day said from the very not 1 off well they follow up and the rise of the far are Left and that there will be a lovely would also be a review of all the big on the outside the and still users with black rather than that of the assets that would sell in the bed in which the new carrier would not be sold as a way of used saying that with all of Britain on not overnight yet is work it would with and in the end he said some something of the the many reasons for the fact you can get some idea of the this part the worst of all nyse critical more so your that while there had been the site will also be on the role on a number of the role fees while all the while we were all saying the use of the replies as also medical application so looking for its what where time worse we have is already not as easy as it seems when you can see that she of between were sometimes and the United you don't have placed the by were caused by also even more but these they easy but still absolutely abbreviations booking so says the begun with off the words of cause
ideas he would is as houses step called off from all in for all the Senate not but means of them fine also to use the proceeds from the sale the for but they don't really change a meeting of the worth itself the worst the worst the sentence that make the make a big difference but they don't change so they all brought back to the same 2 before
that was to decide what this really worry the and compiled its phrases like a Red Cross is 1 where the 2 were not difficult and will talk about it later
and then comes a low
[unintelligible] Which words are so-called stop words, and what does that mean? It means they are deleted from the index. We take them out to make systems more efficient and more effective. How do we determine what belongs on a stop word list? [unintelligible] Nobody will search for them, because they are so frequent that they don't contain any information: they appear in almost every text. [unintelligible] So we could define stop words as words that are very frequent overall. [unintelligible]
[unintelligible] Think about collections with hundreds of millions of documents. [unintelligible] For example, take a collection from the programming community. In this community there are words that are not function words but are still very frequent: "programming", "program" or "code" could be frequent words. They are regular terms in this community. So there might be a need to treat "program" or "code" as stop words for a special domain. [unintelligible]
all approach we never stop
[unintelligible] What about adjectives? [unintelligible] In this case yes, because the word seems to be very frequent, but maybe not something like "colourful", because it is not so frequent. It depends on the frequency. Something like "should" or "could" can be a stop word because it is frequently used. But it is a decision we have to make when we design a system. Often we just accept a standard stop word list like this one [unintelligible], so we don't worry about it much. [unintelligible]
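The idea that stop words are simply the most frequent words of a collection, and that the list therefore depends on the domain, can be sketched as follows. The threshold `min_share`, the whitespace tokenizer, and the tiny collection are assumptions for illustration; a real system would more likely start from a standard list.

```python
from collections import Counter


def frequent_terms(docs, min_share=0.8):
    """Return the terms that occur in at least min_share of all documents."""
    document_frequency = Counter()
    for doc in docs:
        # Count each term once per document
        document_frequency.update(set(doc.lower().split()))
    return {
        term
        for term, count in document_frequency.items()
        if count / len(docs) >= min_share
    }


def remove_stop_words(text, stop_words):
    """Keep only the tokens that are not on the stop word list."""
    return [token for token in text.lower().split() if token not in stop_words]


# Hypothetical collection from a programming community
docs = [
    "the code of the parser",
    "the program reads the code",
    "a review of the code",
]
stop_words = frequent_terms(docs, min_share=1.0)
index_terms = remove_stop_words(docs[0], stop_words)
```

In this collection "code" occurs in every document, so it ends up on the list next to "the", although it is not a function word.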
that 6 tonight to be reading with these words were busy making difference for example of to a the of the failure or against the use of all the time you get far in Java German through can be viewed in the world and the world would be the result of the cost of a degree of the system of use these stop worrying that sort of thing which is why I was not against the law in favour of the fallout from the before race for the of the direction next difference what happens in search engines defy typing the stop were accused said Wednesday as it anyway S a wooden what in like for like a good Louise is full of the talk of the might violence for against what just time but they were not the use as they look to the area but the head a solid slow these days earlier in use all round the world where he was going to be and what the law by but it was the studies for result what is that today Over the act in the last the has been in other parts of the NHS which are at the scene of a murder in each of literature is over well as well as those who will not sell it is lower and this is a stop said the 1st of which was the real thing the some reason I was something about the the euro but in end in there is something about the but in the not as long as those offered by other clients and Jewish at the last what was more public Processing 1st is not as if the continent were named or is it a wonderful and the side of the road or not and if I look at
the reality it is quite interesting to look at this year at just furious stop with few being made in the polls every month right not only these case television Review was some of the side for this all this the this is exactly what the bingo where the of what sort of new ways to reduce the boot a suing the will of has not which so they don't do do what is written in every retrieval textbooks publicising in their just
don't do it for you by also find wrong in the book 2 the beauty a should find anything should only find something about contract should find sole why so
that so that they in before you make a decision he said of the city in the world so that would be interesting by US works by their away the other half of all the documents of the collections to use much less computing power of the storey of reduced the which uses only 2 of the worst for their journey to work with an obvious White who will not be part was 1 reason this we have discussed iterating Nady's
abbreviations for all in the same for would you in the for example the United side but were led by the United for the whole of the of the something called the best 1 reason not to take out just upwards walkability since with the rest of the world through to the wall with the of mesh that might think well if uses music that period but if somebody we wants to search for a stop what added that it will Ferdinand uses and of but it's something edition of some of this information for the public Institute of other reasons 1 of them for treason think of your degree it up all the works in several country's will be written the the while of the bill to the White different from that of all the at low grade at different from the job it's all over the city the jury saw stalwart of the vision of a new video of those via were for collection of its main aims to test to the buried at the extremely the shorter the tea in French something and but all that's quite risky so abbreviations while seeing well in the and the result was somewhat it wants to stop order between might be reasons why with other search engine has decided we do it different as it is called in the textbook Street include the this upwards the
reacted the news of the eyes also index has not only for the realisation of also the ball past his title the
doomed it has also been being the same period the same time you being recognises that the UK was also the place funds and offers fulfils it is also read the called the German about the see the few words pronounced in front of him to have not only been the also French place because thanks to the magnetite to all credit to and the of cost could also be being made in the German for example we are all the phrase as a whole is an 80 entity which contains the fight eliminate the into are to fund the drug 2nd later for to recognise as a name into 2 so we seemed that this allied booked typical text which we
find a lot of the issues of the day the lesson of this in the public services to the legal systems for judges of was fully the professionals that they could be include for against the dollar index we do not eliminate because all clients and I'd really be interested and take caught truly over rooting something over ring another case might be quite interesting to my be own from around the could be really crucial for me by the and the case of made in the case and there are plenty in in and you are all German of words in also German Optimizations the each medicine but it wants to search
all so interesting cases to see the character of a talk what to sing numerous and now we have the abject this in will be interesting to begin by the media widening see there are created the and widening and this was quite some time ago Google is perceived by the fight for the the year of way widening the given the Rover stationary said the order was made clear that should be of the uses of this case it invited me is different from what and see but it's only a single character who just upwards this run out you can find difference between 2 but smoke and quite
interesting book now for the rest of
us for a while what beaches about status on my production of which wants to stand down troops times for the 1st time annoying witches the words unchanged based on their functioning the sentence and the more I want to change the appearance changes you could say save and that is something we have to also be where he will become the world's will come all the ball home to get this is not such a big but from some East Asian Languages the some and all of a logical changed by the
German it's quite a big job but not as much as the which is not such a big issue for the media group that scene in which a German enough for selling which is bigger for example feature the area and that much more of all ages German-Soviet to to retrieve for these languages and moves has created a example with hotel interesting I'll supplying the true cost of the whole worse but nobody needs will tells for the same day on the singular also and given the rural and will be to keep sexy now what happens if I'd
typing cool will tell when you look at what should be the same height if we can see fit to truly right but basically Stanley should work that deadweight at all once brought to the same for example the German
cases of boos from marinating cases Israel's for cases but cost the words unchanged really much soul and we don't have to prove also cost with big change is not what was then called who is a single mum who were in the school in the of 1 of the is said to be reduced to the status of this and the and the found the fledgling means that he will be reduced to a few a blast of my life will be the sole saying if somebody to is why as if all these words of cost you get on behalf of computer to go to the publication of the restoration G that all maybe not but the cause of Tuesday's publication of vision the meeting a change of it may be the use of doesn't really want to see that
[unintelligible] There are two main ways to do stemming. [unintelligible] The first is rule-based. The problem with rules is the exceptions: in German there are a great many exceptions, and it is really difficult work to deal with a load of exceptions. [unintelligible] Even English has irregular forms [unintelligible], and often it is the frequent, old words of a language that are irregular. [unintelligible] The second way is the lexicon: we put the knowledge in a dictionary and the computer looks it up. For each word there is a lexicon entry that gives its stem, so you can find it even when the form is difficult. It is just a dictionary, a list of words with their stems. This is called table lookup. It also has disadvantages. The list is quite long: 100 thousand words [unintelligible]
[unintelligible] may be reduced to 10 thousand stems, so that is quite a lot of entries. With rules we may have 20 rules, and applying 20 rules is more time-efficient for the computer than searching thousands of entries each time, especially for collections with millions of words. [unintelligible] And language is not static. If we open the newspaper, we find new words all the time that are coming into the language. [unintelligible] Nobody knows what they mean until they are in the lexicon, so we have to update the table all the time. Table lookup is not a very efficient way. [unintelligible] So the disadvantages are that it is computationally expensive and that it is never complete, because language is dynamic. [unintelligible]
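The two approaches to stemming, table lookup and rules, can be combined in a minimal sketch. The lexicon entries are hypothetical, and the suffix rules are loosely modelled on the first step of the Porter stemmer, which is an assumption and not necessarily the rule set used in the lecture.

```python
# Hypothetical lexicon for irregular forms (table lookup)
LEXICON = {"mice": "mouse", "went": "go", "children": "child"}

# Ordered suffix rules; the first matching rule is applied.
# Loosely modelled on step 1a of the Porter stemmer (assumption).
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "i"),    # ponies  -> poni
    ("ss", "ss"),    # glass   -> glass (protects words ending in ss)
    ("s", ""),       # walks   -> walk
]


def rule_stem(word):
    """Apply the first matching suffix rule, or return the word unchanged."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def stem(word):
    """Look the word up in the lexicon first and fall back to the rules."""
    word = word.lower()
    return LEXICON.get(word, rule_stem(word))


stems = [stem(w) for w in ["walks", "classes", "glass", "mice", "bus"]]
```

The last example shows why rules need exceptions: "bus" is cut down to "bu" because the rule for a final s cannot know that the s belongs to the word. The lexicon handles "mice" correctly but only covers the words somebody has entered.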
appropriate for a some foes limits Ingrams the top ranked
resting examples also cause of reacting to her from the very best of several changes the of all the news has full figure that she would not was because it for example with although have is that they were which but where does it all was very ill and he does it all was change 1 of the names of the great views of the sea that should be reduced to to think that they are a very tricky for this in the middle of all the words and those Mosley should not be reduced because of the different from the region the burned of costs were
also have a lot of regular reduced the temperature moral sometimes regular which and and in India and nominally and that makes the club and that is easy full-system to make a rule that turns says the wrong the group's is that we have different ways to create now owns person but tricky found changes remember were but were on the wall or in the even tricky so lexicon on computer Christ storage and the based has disappointed use of those many many exceptions can from simple but forget look at the Royal this
could be a very simple rumours in any German for them on what does it do please pseudopodia 2 what was the rule book the row over the filling playing with the claims of the of what the word is not for them s there will be chaos this throughout the world meaning the care and the last show because there a cash that the last position might the 1st was last while he was being back in the best in the league Richard before being made we use the changes to 0 that needs in the word to the word is a read like take off the away that the so it thought He had this would a supplier but solo than for the 1st game and we will take away the last to the end the mice and who became the 1st fine hat and applies to take the last who carried does away my worth less than that would be the last to go 1 of them would be on now has a confusion potential contributed from also work interesting and checking that such and confused as and of cost now we have these of the time is on example when its the habits of a less than full words almond my almond judge cost more on if you want to do not apply the rule because it's for care of Logic for woman OPEC so this just 1 example found
from was although the recent times we have seen the reduction of the rules at the time was that the cause of all the news bulletins were used to rules were going on a cruise on the right with the law or the media you it is used jobs it was actually the last almost all the rest of Asia these reduced its rules further work that it resembles a name but it was that it by allowing about half an hour in the form of words that would act as a group the ISS in what does not mean that the shock of rural the result will added to use Edwards disease walk walks also result will be used to show it off as a a result of the move is all presto so we was out with 1 who already subsidies what I believe is the boy something like shocks of wise for something that is ready made being seen as a with to win the last by the recession a was the way that they can be from should be able to identify those who that potentially
arose on goal was standing and understanding reduced was reduced from what she was based on the dozens of rooms in the city but they said Rory last May said that some of the risk all this is the part of a new front in the 3 days of the Wild West or is there experience with living standing for much of beauty the Germans have been written about her in the 1st half what they were were doing his mind was because of David Wright understanding with with the not but they are not to be found during the 1st five year the the Academy and the tree was the only got 1 we believe that these days the dollar and the South Pole by the idea is clear we so that they just can't Booker depending on we get take a look
at standings before you want to know for example that the views of the rest of text and Jack was being broad it was the the man in the world while this is not the correct the also was not produced to us strauss also you goes to the but recognised for the people talk shows the was not cost to the text Susan examples book now that approach and and will be
[unintelligible] An alternative that works with little computation compared with stemming is n-grams. We have talked about n-grams before: trigrams, for example, are sequences of three characters in a word. [unintelligible] The easiest way to explain this is with an example: we split each word into its trigrams [unintelligible] and then look at which n-grams two words have in common.
[unintelligible] If two words share many n-grams, they must be related; they probably belong to the same stem. So this is a simple form of stemming. But it is also possible that two words have n-grams in common
and don't belong to the same stem. [unintelligible] We compare the sets of n-grams of the two words and get a similarity score for the two words [unintelligible]
based on the number of common n-grams. [unintelligible]
N-grams are also sometimes used as a representation instead of words. [unintelligible]
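Comparing words by their common character n-grams can be sketched as follows. The lecture only speaks of the number of common n-grams; normalising that number with the Dice coefficient is an assumption made here so that the score falls between 0 and 1, and the example words are hypothetical.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word (trigrams by default)."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_similarity(word_a, word_b, n=3):
    """Dice coefficient over the two n-gram sets: 0 = nothing shared, 1 = identical sets."""
    grams_a, grams_b = char_ngrams(word_a, n), char_ngrams(word_b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Words from the same stem share most of their trigrams
related = ngram_similarity("retrieval", "retrieve")

# Unrelated words can still share a trigram ("val") without sharing a stem
unrelated = ngram_similarity("retrieval", "arrival")
```

The first pair shares five trigrams and scores high; the second pair shares only one and scores low, which illustrates that a common n-gram alone does not prove that two words belong to the same stem.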
really to the rules and if August a
mobile ways use use 1 and based sauce with its available based but of
stones in the scope to
the work of the public to have
escaped the of problems like Skipton unfortunately we have to round multigroup rich but would expression likeable Guskiewicz of things like the of cost acting easily get lost
and we have been saying


