Models II
Formal Metadata
Title 
Models II

Title of Series  
Part Number 
7

Number of Parts 
12

Author 

License 
CC Attribution 3.0 Germany:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor. 
Identifiers 

Publisher 

Release Date 
2015

Language 
English

Content Metadata
Subject Area  
Abstract 
This lecture gives an overview of Information Retrieval. It explains why documents are ranked the way they are. The lecture explains the most relevant ways of representing content: automatic indexing and manual indexing. For automatic indexing, the frequency of words is of special relevance, and its influence on the weighting of terms is discussed. The most relevant models are introduced. The session on evaluation discusses new metrics like the Normalized Discounted Cumulative Gain. The session on information behavior provides a brief overview and explains its relation to IR. The session on optimization mainly introduces term expansion and fusion methods. The session on Web retrieval is concerned with quality aspects and gives a basic insight into the PageRank algorithm.

Keywords  Information Retrieval, Search Engine Technology, Information Behavior, Automatic Indexing, Academic Lecture, Retrieval Models, Web Retrieval, Evaluation of Retrieval Systems
00:01
the rest Wessels will be today with the 2 law to more new mobile said on after about it so far 1st became forest
00:20
by France crimes said it could apply to support your studies through deadlines approaching 1 4 were all along which are football University so they are really wanted are available someone I get some those who won the among those with selection American applying you cannot get selected there are plenty of other Stevenson grounds that could try to get them all in 1 expects and was happy to have excellent grades good grades sometimes sufficient to get some sold a good opportunity to finding her studies a brief from are
01:08
Coloma and that was pretty the stage for the review and Ingram work 1 thing is different degree during a 1 difference in the solution Sukenik come up with different numbers of this can be great is what Weisbrod with is going to part of the anger among not that's some as defined and you can work with the loss of cost there is no trigram of something by Graham that has only 1 carried on defined by some of the accused and also there is no definition 1st immunity among the objects of their while can be arrested and the something that tools to help with the of questions or not in Room aggression Jewish should have got ozone paper right after 2 and a half back to beat the 2 but there would be the should be frequently file vacancy on in which to foreign successfully is it or anyone in the top of the page to find the support to misses would also put that online than you can chew where it will be along with
02:39
debate which mobilisation was his now being worked on by the before think there will be wanted a language most of work for we wanted ventilation and the 7 1 of the on the road Leyland experimental learning system where you can work with origins and fresh different the combination of variations of the rumours that we all than over the last versions
03:15
of the same in examinations mentioned his pudding and high on the liver discussion forum reject the schedule and this nowold time repeatedly to sick petition in a week at the words of this work and then we have to wait for more time and since accost takes place again in we determined the but can take their exam if you go on for the win 2 terms broader something we have to wait for the next summer when there is a repetition exams for the media to see the king question on the results of the but the Coca
04:04
soul questions of strange it's joke the risk is that the right of the changes look as the to this looks strange looks differently of since the week that they were about as readable would open the way to be able to see most things with the worst websites in the morning but that it would talk about from losses though exact match which is in search of old style which left him in the library and special domain informationsystems women extension extended booleans we talked about fuzzy Serie fuzzy said also get an idea but what was said memory talk of public displays ahead of but the mesh space where it is equal to the number of times in the collection and the we had different from the tumult saying direct should not interested season in morning like the on and the and the Commission for we can think about this as into Matrix distance things like the walk distance and was also the probe is to qualify for services interpreted as the public painting of relevance bosses and the condition of was really down but that doesn't change in a lot in the circulation of and to they will briefly touch the language model which also all the Supreme completely new things but it's interesting that you expect from your things will be the and will talk in the event of fault that 1 of the network model that is quite different in the idea from the metaphorical approach research really thought for Step is to stand any
06:24
statistical language model atop a lot about the right about distribution of the frequency of words in the language the distribution number must such high and causes 1 difficulty words methanol from the city where the kerb the average of contribute to act as if I use the number in 1 document because of the effect on the ratings and we should re into the which politics said once the further says that the couple would be stick multiple but the mobile of language would is the ability of the sequence this to can have some of the ability of the sequence of worth to between by the end of the world to be to use a single words sequences and again independence the worst of all with troops and the latest Mobis
07:23
accuses book on the subject agreements with says basically
07:29
what is the ability of generation of the document given the previous to restrict the ability of of language model of certain words to distribution by just for the which command in what is believed that this year period which model needs to the documentary which were still distribution from but it is all a single word estimated based on the frequency of the text and worst waiting for would be to get again were
08:07
start off with all observation that language is the words of Distributed according to 2pc distribution that we had a closer look just last time and this is can basic it seems that function we have a number 1 most frequent were 2nd frequent work for steps on all the steepest should be worse but for the most treasured and would cannabis and from but the frequency of a word divided the number of all were the of duty the that this work occurred somewhere and and nothing you so far everybody wants to follow
08:52
following which moment IHOP's said this is what this is and for search judges like well if there is the ritual of the in what is being Road crawl abroad for this that this year more leads to to a document which Red would have a certain individuals distribution just of frequency
09:23
basically this is they language model which should be the frequency of the words in collection 1 document the order by ranked by the range in which the frequency of the progresses transferred to from abilities but it by the end of the ball or in 1 documentary so we can have different language model of was 1 document that talks about where walls that the example we would have the world at the height of position than on average that would make this document special would which is to be which is different from ever model with language mildest from the language of the 2 men were or collection no make interesting special for of the leader of the with pools of is more likely that
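As a rough illustration of the idea in this passage (a language model as the words of a document or collection, ordered by frequency, with the frequencies turned into probabilities), here is a minimal sketch. The function name `unigram_model` and the toy sentence are assumptions made for this example and do not come from the lecture.

```python
from collections import Counter


def unigram_model(text):
    """Estimate a unigram language model from raw text.

    The probability of a word is its frequency divided by the
    number of all words in the text (a relative frequency).
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    # most_common() orders the words by rank, most frequent first.
    return {word: count / total for word, count in counts.most_common()}


# Hypothetical one-sentence "document" about walls.
document_model = unigram_model("the wall is a wall and the wall stands")
for word, probability in document_model.items():
    print(f"{word:8s} {probability:.3f}")
```

In a document about walls, "wall" ends up at a higher rank than it would in a model estimated from general language, which is what makes the document's model distinctive.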
10:23
this leads to some sort way for me are the ones that get Israeli nearest called smoothing of the distribution language
10:35
model wants to transform the strange that function of the difficult to model with a regulatory can of this in the world last time also put this spreadsheet on Monday to see examples into which model wants to make again this move function of state beauty Cawsey overall could be can increase in interesting he wants to sign a certain amount 0 pro ability to all words the developer in this but the sole have even that means practise even the 1st search for a word of mouth like the river which is not in the document are still below probability of finding cases where none of the words is and the some Sol we take away but of the probability of reverse the high frequency words can give at 2 0 words so that they have some probability different from 0 small low cost and on Weaste move function for
11:56
example where a simple way to do that is to find a curved accounts will points told a point said the measure would not to teach the the observed and we can have the N 1 effort that as if the word doesn't as it abuse wants to what don't take the and from ability to word documented is the not really the kind see seaweed can be which is the founder of the Red plus 1 idea that would be the moment were on is divided through the book and the signs are actually to the length of document in general that gives me the beauty and moved from the word go wrong would move 5 times that indicated that the residents of the for of the ball this not say on his 5 times within 100 weren't so the probability of the current anywhere used the word also East 5 divided to 100 also destroyed fought so that will be be seen the ball out the way by a review of the way all the work of the house we have both the role of the size of its economy of what we said was want to all of us of boss of the what all the dead in the water exercise full back in order to hand over all the based on we also the and the and the division of made up for it and there were also some receive this year old while the by governments are words and by the files were found at the site why small number of these small of worship ability to get to the non open works for
14:56
example in a very small part example again 3 documents 3 3 times a week can now we could Calculate we have to the common the comes now you could Calculate weights according to 1 of the 4 million plans for the identity offence of for was widely known that not as a language multiple led the way we do smooth and we give them the moving needs to situation where we don't have 0 Elliott and unique if idea have would have up 0 values for words about her India we would have a look at the size of the we have different document length just to some of the words that he document is along with has 6 words into the mouths followed that we had this week at 1 stage we got division so we have 0 like the roof off a in 1 of his hero times with a one that makes it 1 divided by foreign calls for his the left the documentary which is why those criticises the was for example riff ONg for documentary may have come the 5 words 5 0 1 so rector for government 1 4 documentary all 1st smooth and we end up with 6 5 1 divided by 9 9 document late plus with results but the 6 weeks to 6 divided 9 and 1 divided from the which Tendulkar so we end up with Mobility's all add up to 1 which document and these are not the language malls and but small enough to fit to push of 3 steps now but they follows function and that set the stage for would lead to language and a way to to waiting to happen when it begins in the in the region function like the dust and smoke in sum the Fauquet questions understood straight for quite like nothing really difficult Louisianan ideas he says now there is a big
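The add-one smoothing walked through in this example can be sketched as follows. This is a sketch under assumptions: the three toy documents are invented here, and the denominator (document length plus vocabulary size) is the standard add-one (Laplace) form, assumed because the transcript says the smoothed probabilities of each document add up to 1.

```python
from collections import Counter


def add_one_model(doc_words, vocabulary):
    """Add-one smoothed unigram model of one document.

    Every word of the vocabulary gets one extra count, so words that
    do not occur in the document still receive a small probability.
    """
    counts = Counter(doc_words)
    # Adding 1 for every vocabulary word keeps the probabilities summing to 1.
    denominator = len(doc_words) + len(vocabulary)
    return {word: (counts[word] + 1) / denominator for word in sorted(vocabulary)}


# Hypothetical toy collection of three short documents.
docs = {
    "d1": "wall wall river".split(),
    "d2": "river bank bank".split(),
    "d3": "wall bank".split(),
}
vocabulary = {word for words in docs.values() for word in words}

models = {name: add_one_model(words, vocabulary) for name, words in docs.items()}
for name, model in models.items():
    print(name, model, "sum =", round(sum(model.values()), 6))
```

Documents of different length get different denominators, but within each document no word is left with a probability of zero.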
17:55
problem of costs that anyone only become something that is more than than the sum of the idea so following the all words equally so very frequently gets the same moving the same editions of 1 man the Graham but the and they are the and the dried the said paillettes consider the general frequency of a word in a general collection by in reference collection my collection renewed reference collection for the biggest job collection of something like that and we get it from the age of a first word in that General Baja principle said is representative for the language of talking so keen of a word that operating in reference Group which Medway now we can deal
18:58
with the smoothing different they believe it's a booking interpolation meeting means we add to that the fact Azia and there is a mixture parameters that it won't fall on the 1st was that 1 of his greatest plays in 2 Dubai and what caused by 1 of only the this along the road with suspected the and the 2nd is that this will of work in the 1st leg and the 1st part of this is the wrong side against to documents science and the size of this reference to the all on the grid in the year to the end of last walking around the world as well as all words not of all where he used to that the UK will be leaving where there the locals in the world were of the lower leg of their tie with a lot of the things I would do it although his programme a Shaun should still say for the exact based on that he had led the way as the rest Francisca reference
20:57
model from ability of the call from language model large reference collection that could be like British National caught cost of some free entry new issues that what is really going to work in general usage that can also be my collection affected special the main specific search for a budget of British breakfast collection for
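A minimal sketch of smoothing by interpolation with a reference model, as discussed here: the document estimate and the reference-collection estimate are mixed with a mixture parameter. This form is commonly known as linear interpolation or Jelinek-Mercer smoothing. The parameter name `lam`, its value 0.5, the choice of which side it multiplies, and the toy data are assumptions for illustration.

```python
from collections import Counter


def interpolated_prob(word, doc_words, ref_words, lam=0.5):
    """P(word | document) smoothed with a reference collection.

    (1 - lam) weights the document's own relative frequency,
    lam weights the relative frequency in the reference collection.
    """
    p_doc = Counter(doc_words)[word] / len(doc_words)
    p_ref = Counter(ref_words)[word] / len(ref_words)
    return (1 - lam) * p_doc + lam * p_ref


# Hypothetical reference collection and document.
reference = "the wall the river the bank a wall".split()
document = "wall river".split()

for word in ["wall", "the", "bank"]:
    print(word, interpolated_prob(word, document, reference))
```

Unlike add-one smoothing, a word that is missing from the document but frequent in the reference collection ("the") receives more probability mass than a missing word that is rare there ("bank").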
21:27
critics and he did so every edit only 1 where we not episode probability that we get from them give justice to frequent words of justice to be free of the sort quite easy is basically a very Scimeca the of the is a burst moving methods of language which more have come up with a lot of different things but
22:02
we don't really have to remove so many smoothing meshes have been applied to his functions as research is going on in the game where the new parameters became change rhymes so far we have a lot about different methods to do or where different matches different weightings with ideas to fight the move will worry idea of different basis fractionalized into 4th with 4 matches functions American based goes on in the distance away in between we have different Spain algorithms with different methods for spending different rules sets so not solve complicated parameters that make up 1 of the system works and how it seems accusing the it and we don't know why these scumbag too complicated and the only thing we can do is Evaluation at somewhere in the not next session but the section after words will talk about the book of the revolution and how this can be done in 1 of the problems associated language
23:26
whom interesting way to think about it sort by virtue random is basically idea using less in the way this is a page from his the both the by diverge of the beast in documented frequency from its frequency with the the collection meanwhile the emergence from randomness attorneys the more information is carried by the members in the public about the if the poorest frequent but also 3 the million in the language and the as saying that were doesn't care load information so following at the condition of the ball at his best on the land but was not and that is basically based on the idea that all the good work of the futility of the of the vote but the frequent they see the laws of England but that purchase from the that of the general random as the basic general was the more coverage of the with different dynamic which was disposals and for me it is not about to stop the ball information term carries the about how to and this nature of the for what it is a measure of just to east document that is also part of the city it as this term history for meticulous because the pews much more in the game and then that they were being made general language production that means that the city is also means the language well so it is division observed that this document is different from the general mutual observed for the distribution of these were different between documents the body the words of the ending of the public that they right about in the mud than which will like in different from that of the general language and somebody asking for 10 it's really like to be but if somebody Eswar we return then documents the dispute he is more less is often and it's supposed to be be the expected randomness then turn that does not really help that what interesting notions of for behind which
26:30
basically that some of the read behind to determine waste and interesting new is also distributions smooth that reassigned to public non 0 publicly all work and that is basically a case of that and that it's not really something from the match in New and was still of stimulus things at the the of the model of the year with the frequency of a word found of cost it is see
27:02
you all over the need as basic but he ever same thing right just felt it differently of the of the victims from the 18 mobilised and but to the moving
27:17
step if you are worse still it would be time to get is more questions and of which when we move to the next
27:40
ball to the last with a number of this is a game more or metaphorically we come to the party officials who wrote words and this is 1 of the network model said is to be used commission your networks or method of used in June learning and computer science from all weakness in learning a review the capable of learning time something and is now networks are part of also algorithms said the nature in spite Algorithms for example anybody of other examples that things in nature to cut more intelligent and it was all so great recusant were that works in Miller and this away would from also do of nice things but Computer Science classes and that the case for example genetic Programming that the horror of genetics is being you was tumult of general of solutions are created and only the best of life in the genetic algorithm that we lost solution space to exploit the the and no big some police spokeswoman telogen see the of piece of animals were just model in the morning the same but the of the Virgin in them and the British for the biggest prise intelligently forest Saviola's concerned is also examples of Barriopedro networks not them all of cause something if you were to make them all the functions of the way the human early in the match brain functions 1 over simple level and and they are also very call
30:03
for was of cost natural brains and now what I want to their early model simple sold to push through the that the brain is built up over very simple Fastest but where many of the so each year on by itself came think of this year on the brink as a whole can think and romantic exhibits small this sounds and you reach Nerone itself can only operating in the United where you can look at it in to receive signals in put on only would like this to maps can be active on on different became some committee will not as model of the itself into look the input and many of us at some of the with this of that the food is over so then it will start to send out a signal to other basically from a the simple level at the some cost rooms the signals to pick the amount of a strength like you along the way to bring the frequencies but stressed that such a big difference in the way to means created is only by a message in the play off the many many can only understand the bring known makes sense of the British to the US from the mean there is not 1 single among the practise in some on the hands of the UN's activity represents a Soviet per cent was in the process
32:06
This here is a very nice statement. It was published in German, and it puts it quite nicely: information processing is carried out with a large number of simple processors that are connected in a dense network. Each of these processors, also called a unit, works only locally, looks only at its own input, and communicates with the other units by sending signals along the connections. So we have simple things, but they are highly connected, and the simple things, the processors, can only send and receive signals.
33:01
your and send and receive tamarix giving the 10 now this is a non unit of all is the man for for the shorter memory something that we think of right now or COD activity in the brain but it may for getting a few seconds few seconds and other sales activated division goes all 6 of us in the game and the rooms are not 30 videos and on base in the means all different points and 9 different kinds of activation of her friends will be covered with the king off and that can be Irish polo shirt to memory and something else happens in the brain bring can learn anything that no matter what this means connexions change awaits said that the high much activity to they pass on something active news activating as a connexion between the of it depends on the level of the collection the weight of a collection of all much activity is that through everything just the and these weights off the head of the collection they changed Maricel it would and that corresponds to the amount of memory for users to learn what the rate of time the weights on the days when England feedback to receive this basically Woodward is small as the room 24 from network of the simulated computer that take a few aspects of soul modification of these ways is called learning and awaits under parameters allowances tomorrow never changeable to anything different than what they are supposed to keep Commwlth via and that after a certain level of activity in put is reached when the certified as another until sooner singers allows synapsis or connexions as we called them to say that's a basic give
35:33
access to a full of all the condition of the network on 1 side we have around the input the input of the nearest like a using put off the front of the current activation saying it is too late to change the fact that is what of all the love and all the while the change in the climate of the time out of the 4th as the only 1 there were 1 2 0 0 4 1 connexion with large network and that this had been a long time moves to function like some or all of the use of a new leader for the last 2 other leaders were on side new state of the teacher as special or not has also via and in the end it took most walls are unlikely to bring you great economic over of the day with General will despite during the smiles we have a levels of the by could face a
37:04
bill of more less estate function message this activism cars understand and but the but you don't want where the in the Far division networks we have the just activation functions of that poverty in with the elected but he and the that activation Rosie input into rose to go to the right all activation also rose with is steep function based service simulator this special function but cost and of and all so far understood for questions but the basic principle is really really very simple enough for some of the way up and eye uses of grim you such functions became grid networks and have been put through networks worst it and some information where we think about the way we want to walls nominating algorithm area of across a sing steps of educated somewhere near said they had committed to the the Hall of the People see anything I between is not regions but if we do not of learning we can June this new networks so that it into the nett interesting things like and desiccation of entry carried distinct and that is to begin with to use complex classification and but successful algorithms the
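To make the behaviour of a single unit concrete, the sketch below sums the weighted input and passes it through an activation function. A step function and a sigmoid are shown as two common choices. The threshold of 0.5 and the example weights are assumptions for illustration only.

```python
import math


def net_input(activations, weights):
    """Weighted sum of the signals arriving at one unit."""
    return sum(a * w for a, w in zip(activations, weights))


def step(net, threshold=0.5):
    """Step function: the unit fires only once the threshold is reached."""
    return 1.0 if net >= threshold else 0.0


def sigmoid(net):
    """Smooth, S-shaped alternative to the step function."""
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical unit with three incoming connections.
incoming = [1.0, 0.0, 1.0]   # activations of the connected units
weights = [0.3, 0.9, 0.4]    # strengths of the connections

net = net_input(incoming, weights)
print("net input:", net)
print("step:", step(net))
print("sigmoid:", sigmoid(net))
```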
38:48
next day hot day was not for example in European information the tree and the and the sum of eyes are situation but that's the way it used to be the most or a systems that use the works that we don't have the such as well of something is expressed by head of the mission he would have a new set me something which is not true range of his body was systems and sold a successful so each new loans units stands for the firm for a vote document later in the book and would could have several which is being built the means that you don't need anything but registration is so we have which document 1 of the NHS and the for all in order to the connexion the outside world and the user although smoking was stopped the uses is interested in so he acted year that means the new on representing his to work with such a sense he was not what is the point of the day changed another but some lasting about the standard of has said the relaunch long all ages in the collection and is based in the shooting the selection are waiting always looked donations to the man in public life that these ways you all the novel that the law does not the sort of conditions that uses a fact there were about 7 in the area to stop it is to be shown by from by the with the age when they should be smaller off cells in our common charges to introduce a new initiative to reduce of this as a collection that the loss of their giving the resist the whose because we need to be connected documents it's because he is not a lot of chances and we were all so the relaunch is in the collection is a change of learning about rooms the hips ruled needs to the nation's where the close Murdoch losses of those they are so far away somewhere this represents the colleges novelist said it who was the issues suitable so far too soon to see the light at the end this is not sense symbols who 6 of the 7 billion and all so far as fighting to say I've had a lot domestic think such a big deal but the those network has some nice features 
80 ideas for could Shirlaw saucepan and based on the the rule of law and had been brought in and the world by air of this the 1st of the 4 most dangerous side just the speculation we should all be inside size was use squeeze draw last basically size using and the integration of the new strings and and we are all very well him with yes this is each of these men and women all of the whom seem No to be a job because of the whole whose also realise might not as a very interesting thing which was mentioned each of bridges area we with to study every government for over and year these elections in the past all the balls but these days there lot of set of 1st in the following year reduced the game to them not to get bogged and so new Chancellor along the but best of all related to the use because of using the Parliament Act was that it all networks on their another strong with strong action to bowl and that with a lot of the Government's associated with a view to setting up free the Strand depending on which so this Processing selected for several times was the youth that was to be a also things activity but
45:21
in Jerusalem choosing the strongly that times of the day seemed to be some way off that for a long way to go in the name just something which is so much interest that activities in the United this resolutely all they said was that he had acted bodies of has also uses interesting time step was a good 1st day what it is be which is also a good way of using up its is spreading local variations in the spread of the divisions in the city as waitress in those including the network of Moody's representing some of the documents of the government of the sort that the at the end of the day was that the world will talk about this but is by that the walls of this method of some of the for work the 1st few will have to win something with just men in the work at the site where the interest rate is interested in the ideas of this that can be happy with it would be the last season after it was introduced by the Prime wiped off he had acted at night distinguish anxiety also said that is the documents the largest ready to choose to do this is by far the change anything about the work at the bottom end of the classic of up fighter and a go back the work and then at the very high at the documents and those are the sort of that worse for of Whitehall football mad forest might interest is spreading through the network of objects of elements and it's quite flexible never documentary term as being both the document in the room in which would have saved them the rules of the old get what this learning means that the weakness of the state Jewish moles to be only to 19 learning means the long term although is injected before just terms a my open race from Matrix easyJet into the connectivity Matrix into the strength of the strength of the collection period so far for the British whether your networks they include different for most of the last 1 is for example HIPS rules as a group of men synapsis between arrested was along the strength of their on the highway and that something we have also 
Neturei injured learning is 1 way learning Evans's at the next is that the user much on will be a more was visited a high with the activity flows but the we even by scoring closer together all by spending more time than the the called this month long when the electric should electric transmission changes to chemical transmission between the 2 movements paratransit distributed Copeland is can be influenced by chemicals and the chemical prices can be easy when the synapses of small of the distance between food sales rose smaller than as part of the learning of the natural brands and this could also be made by most bedridden and was the don't use the word that we can find the differences between shorter and ultimately that we talk about what we are seeking what to memories what we know about the relation between documents from way separate populated simply injected into the nett worth of the game change but those who change and what we think of write about what I was pointing to is church and that's what the basically the is interested not only its activated for what its had and that is what said that Britain had the worst of the case to sell any of the try to visualise the crisis and it was that the best that more things get wall and in off or
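Hebb's rule, mentioned above as one learning rule, strengthens a connection when the two units it links are active at the same time. A minimal sketch, assuming the simplest textbook form of the rule and a hypothetical learning rate of 0.1:

```python
def hebbian_update(weight, pre_activation, post_activation, learning_rate=0.1):
    """Return the connection weight after one Hebbian learning step.

    The weight grows in proportion to the joint activity of the
    two connected units; if either is inactive, nothing changes.
    """
    return weight + learning_rate * pre_activation * post_activation


weight = 0.5
weight = hebbian_update(weight, pre_activation=1.0, post_activation=1.0)
print(weight)  # both units active: the connection gets stronger
weight = hebbian_update(weight, pre_activation=1.0, post_activation=0.0)
print(weight)  # one unit inactive: the weight stays the same
```

In this picture the current activation pattern plays the role of short-term memory, while the slowly changing weights hold what has been learned.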
51:54
Celso of issues and in June it interesting things after Fingleton when does activation and after 2 3 4 5 steps tried to define something and 1 point it stops and then the most activated documents up presented to to use something else it comes quite natural we can have a different we can never be led the next for example between Simmons
52:25
distaste the connexions between documents and but if we know that this term network has something to do with the Disability loss of a plant that could inject easily into the robust integrating collection then transmission are exhibitionist transmitted automatically revealed that have from the with 2 2 other related to receive can use of to Tsarist for example and in the air connexions but where had at
53:02
different less that offer the situation later in the day for a week but also the latest when you don't need to have interpretation which is 1 of the strengths of cost of the work of it subset several intuitive expansion of the world we don't have to to be would relevance to the as it happens in the network full will have to work and we can have the from creamy types documents in terms which offers who be so we have to find flexible model that allows different ways to to retrieve so to
53:52
file model time expansion is during the huge feature also expulsion something that we have not talked about the need function not really that the 1st thing
54:10
dimensions once thought it this get these activated at the 2nd said that the network will be activated celebrities guide activated notes which represent interest in it and that means that turned the activated testing the rich people that I have to to write down into the fact that his called to mixed often users are oases Tiepolo's what goes the more the those about their interest in the fate of the results will be and since used often don't act many systems have as many trains as a nice Systems often expunged by adding that the shelter and singer of is interested in network also look searches with see and his time expunges also something becomes quite natural in the spring which has
55:21
been [...] by quite a few systems in the TREC campaigns [...] with nice results, but the systems have
55:36
limitations. Basically, since the term-document matrix corresponds fully to the connection matrix, we just take the weights and inject them into the network. So we basically have another realization, a different technical realization, of the vector space model; basically we have the same. Something like learning, as in a neural network, is possible but rarely implemented [...]. The weights can be changed based on [...], so that the next time [...] this would be supervised learning [...]. But this would need more learning and more flexibility, and as we have talked about, the systems usually use only one or at most two activation steps [...]. But the network model itself is quite interesting [...]. Learning [...] has become one of the classic topics in data mining [...].
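The correspondence between the term-document matrix and the connection matrix can be sketched in a few lines of Python. This is only an illustration under assumptions that are not from the lecture: the three terms, the three documents, the link weights, the scaling by the peak activation and the function names `spread` and `rank` are all made up for the example.

```python
def spread(weights, query, steps=1):
    """Spread activation between term nodes and document nodes.

    weights[t][d] is the link strength between term t and document d
    (here simply a term-document matrix reused as connection matrix).
    query[t] is the initial activation of term node t.
    Returns (term_activation, doc_activation) after `steps` rounds.
    """
    n_terms = len(weights)
    n_docs = len(weights[0])
    term_act = list(query)
    doc_act = [0.0] * n_docs
    for step in range(steps):
        # Terms pass their activation on to the documents they link to.
        doc_act = [
            sum(term_act[t] * weights[t][d] for t in range(n_terms))
            for d in range(n_docs)
        ]
        if step == steps - 1:
            break  # stop criterion: fixed number of steps
        # Documents pass activation back to terms (term expansion).
        term_act = [
            sum(doc_act[d] * weights[t][d] for d in range(n_docs))
            for t in range(n_terms)
        ]
        # Assumed scaling so activations do not grow without bound.
        peak = max(term_act) or 1.0
        term_act = [a / peak for a in term_act]
    return term_act, doc_act


def rank(doc_act):
    """Document indices ordered from most to least activated."""
    return sorted(range(len(doc_act)), key=lambda d: doc_act[d], reverse=True)


# Hypothetical toy collection: rows are terms, columns are documents D0..D2.
TERMS = ["network", "neural", "learning"]
W = [
    [1.0, 0.0, 0.0],  # "network" occurs in D0
    [1.0, 1.0, 0.0],  # "neural" occurs in D0 and D1
    [0.0, 1.0, 1.0],  # "learning" occurs in D1 and D2
]
QUERY = [1.0, 0.0, 0.0]  # the user only typed "network"

_, one_step = spread(W, QUERY, steps=1)
_, two_steps = spread(W, QUERY, steps=2)
print(one_step, rank(one_step))
print(two_steps, rank(two_steps))
```

With one step, the document activations are just the dot product of the query with each document's column, which is the unnormalised vector space score. With a second step, D1 becomes activated although it does not contain the query term, because "neural" was activated through D0; this is the term expansion effect described above.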
57:34
That closes our discussion of the network models. Are there any questions about the language models or the network models? [...] So, to close the discussion about models: [...] we have seen that [...] the systems are not so different [...]. There is not so much to be reached by making new models [...]. One of the latest inventions, the language model, is quite successful, but it is sometimes difficult to beat the performance of the BM25
59:01
weighting [...]. But what is more important is really the implementation details, given that there are so many parameters in a system that we can change [...]. The effectiveness also depends on other things: how well the heuristics work, and [...] the user interaction [...] of the retrieval systems.
59:40
[...] TREC is something to talk about when we come to the evaluation chapter, but [...] we see that no real progress has been made by just coming up with some new model [...], and we have to learn more about this in the evaluation chapter [...].
1:00:10
Now we can look at a small worked example for the language model [...]. There are two documents, and term 1 occurs [...] 4 times [...] and once in document two [...]. Remember the two factors controlled by [...] lambda [...] a reference collection [...]. To find out the frequency [...], we assume that the collection size is 1 million, and we assume that [...] the document length [...] the number [...] the result that can help you to control the