Formal Metadata

Title Extending Full Text Search Engine for Mathematical Content
Title of Series DML 2008 workshop - Towards Digital Mathematics Library.
Part Number 14
Number of Parts 14
Author Mišutka, Jozef
License CC Attribution 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal purpose as long as the work is attributed to the author in the manner specified by the author or licensor.
DOI 10.5446/21268
Publisher River Valley TV
Release Date 2012
Language English

Content Metadata

Subject Area Mathematics
OK. I will be talking about the new mathematical search engine that we have created; it is called EgoMath. First of all I would like to define the term "math search engine", since on that definition we built our search engine. Afterwards I will talk about the technique used, because we use a full-text search engine, and indexing mathematics in one is a problem because of the nature of mathematics. So I will talk about indexing, which is the most important part of the talk. Then I will give a very brief description of the other available solutions and try to compare them, and at the end I will give some information about the current state. OK, so the definition of the term "math search engine" consists of these properties. First of all, we want to search in current scientific documents as well as in historical
scientific documents, and this workshop is also about digitizing old scientific documents. The result of digitization for the next few years will be documents which are semantically poor, because the source document format does not contain enough semantic information, and the result will probably be in the most used format nowadays, which is not expected to contain any semantic information. Another very important property is that the mathematical search engine has to be able to search in a large collection of heterogeneous documents, for example the World Wide Web, because there will probably be many tools used for generating them, so they will also produce many different documents. Then it has to be able to search for text and for formulae. Searching for normal text is really important, and full-text search engines have proved that you can lose some information and still get reasonable results. So those are the desirable properties. The main goal of our math search engine is to be really applicable, today or within a few years; that is also why we made several decisions which I will mention later. As for the challenges, it is
quite clear what they are from the nature of mathematics and also from the mathematical formats used. OK, now I will review EgoMath, very briefly. It started in 2006 and then continued, and
the first implementation was available at the end of 2007. It is based on Egothor, which is a Java-based full-text search engine. There is no special reason why we chose this search engine; it could be another one. The main goal was applicability, and this resulted in the decision that it has to index also [unclear], and we use another project for converting TeX to MathML. This resulted in Presentation MathML, so we are specialized for Presentation MathML. We use an extensible augmentation algorithm, which I will describe later as the most important part. For the purposes of search we introduced two different fields; at least for now there are two, but there could be more: basically just a textual part and a mathematical part. The query language used now is Egothor-like. So this is the design. On the left side there is the design of a full-text search engine; on the right side you can
see the design of the extension. This was made on purpose, so that an arbitrary full-text search engine can easily be extended and we can then use its advantages, for example the ranking function or whatever you want, over existing solutions. So first the document is processed by the search engine as usual; then we also pass the document to the mathematical extension, which converts it to MathML, and then the augmentation algorithm is executed, which produces text. The text is tokenized into words, which are normally indexed by the indexer. So basically we just add some information, some words, that are indexed and later used in the search phase. In the searching phase, again, it is quite clear that we have to have a text section and a math section. Normally the query would have just the text section and it would just query the database. What we do with the math section is execute the augmentation algorithm, which produces up to n formula words, which are then appended to the text section; but the word for each representation is appended separately, so we can get up to n queries. I will also describe the three main components which you should know. First of all there is the formula recognizer; for MathML it is quite simple, because there is a math element present to say that this is a formula, so there is really nothing special. The formula tokenizer is the one that really influences the performance; we have an evaluation available that compares tokenizers. Basically the tokenizer is responsible for the atomic information unit of a formula. In a normal full-text search engine the atomic information unit is a word: you cannot search for a part of a word. But in mathematical formulae you have to be able to search for a part of a formula. So when the formula must
be represented as just one word, you cannot search for a part of it, and the formula tokenizer is responsible for providing the subformulae that you can search for. OK, so the augmentation algorithm. The idea is that we do not store the mathematical formula in only one textual representation, but we create multiple ones, so that
we are able to return reasonable results also for similar equations and mathematical formulae. The augmentation algorithm is made up of four phases: it starts with normalization, then come the transformation and generalization rules, ordering, and linearization. As I mentioned, the tokenizer passes the representation to the algorithm. Now this is very important, because you have to think of all the use cases, for example digitized documents where there is no semantic information. There are also opinions that even for documents which have semantic information available, the user does not want to input any semantic information, because he does not know it or wants to search for something else. So what we do, both in the indexing part and in the searching part, is this: when the correct meaning cannot be deduced from the formula, probably because some semantic information is missing, there is a definition which assigns a default meaning to a symbol. For example, a symbol can be a function or also a constant, so we just choose one meaning, and then we apply it both in the searching and in the indexing phase. This way we will probably get more results, but not fewer. As I said, it is designed as an extension, so normalization is performed at the start and linearization at the end. The first thing it does is normalization, because the various tools and also the
digitization software produce symbols in different ways, for example as different characters, and the different codes have to be mapped into one, so that the user can type either of them and still find the same formulae. The linearization has to linearize the structured mathematical formula, because you cannot save a structured formula in a full-text search engine database; in the database we have a linear form in which, for example, superscripts are written using special symbols. Then the augmentation algorithm is based on transformation and generalization rules. The following are some of the available ones; there are more which we use, but these are the most important. When you have a formula, there is no canonical form of a mathematical formula in the general case. So what we actually do is try to create several spaces in which a canonical form exists. We just forget some of the information which is not interesting for us, create these spaces, and within such a space, for all formulae that are equal there, we just take that form. This does not solve the problem in general, it just reduces it, but it can reduce it very significantly. The ordering procedure is used just to guarantee that mathematical formulae which differ only in the permutation of operands have the same representation. With an example I think it will be quite clear. So this is a formula, and this is the first representation created by the augmentation algorithm, which is then stored.
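The slide with the example formula is not part of the transcript. As a rough illustration of the ordering and linearization steps just described, here is a minimal Python sketch. The tree encoding, the set of commutative operators, the sort key, and the prefix output format are all assumptions made for illustration; none of them is taken from the speaker's implementation.

```python
# Hypothetical formula tree: a leaf is a string, an inner node is
# a pair (operator, [children]).
COMMUTATIVE = {"+", "*"}  # assumption: only these operands may be reordered


def linearize(node):
    """Flatten a formula tree into a single word in prefix notation."""
    if isinstance(node, str):
        return node
    op, children = node
    return op + "(" + ",".join(linearize(child) for child in children) + ")"


def order(node):
    """Sort the operands of commutative operators into a fixed order."""
    if isinstance(node, str):
        return node
    op, children = node
    children = [order(child) for child in children]
    if op in COMMUTATIVE:
        # Assumed sort key: plain symbols first, then compound
        # subformulae, each group sorted by its linear form.
        children.sort(key=lambda c: (not isinstance(c, str), linearize(c)))
    return (op, children)


first = ("+", [("*", ["b", "a"]), "c"])   # b*a + c
second = ("+", ["c", ("*", ["a", "b"])])  # c + a*b

# Both permutations end up as the same indexable word.
assert linearize(order(first)) == linearize(order(second))
print(linearize(order(first)))
```

In a real engine the output word would also have to survive the engine's own text tokenizer, so the brackets and commas used here would likely need to be replaced by characters that the tokenizer keeps together.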
As you can see here, the difference is the ordering of the operands. So the first representation is really indexed in the database. The second representation was created by using distributivity, so by using the transformation rules. Our aim is to increase the number of representations, because with more representations the chance is bigger that you really find the formula. And the last one demonstrates the generalization. What we do when we know
that this is a constant, either because of semantic information or because we have the definition of the symbol, is that we just use the meaning from the definition. Here the definition says it is a constant, so we just replace it, and basically this representation replaces constants by a const symbol, and likewise all variables. Another generalization rule replaces a function by a [unclear] function. I hope you got the idea. Now the searching, the query terms. [unclear]
[unclear] This is also similarity search. [unclear] All right. So now the available solutions. I do not have time for them, but I will just briefly say that the possible solutions are syntactic and semantic. The syntactic ones do not use the semantic information; they
just try somehow to improve the ranking function by introducing
heuristics, and the semantic ones introduce [unclear]. So the comparison: first of all, the other solutions were designed for semantically rich documents; that is the difference from our math search,
which is not mathematically precise but can be extended: you can always add more rules, more transformations, and then get more representations. OK, the current state: we have an implementation available, but the UI is still not finished and it needs more testing. If you are really
interested, contact me later. We actually use several tokenizer types; there are more of them, but the problem is that the index database is different for each tokenizer, and currently our implementation uses only one index database at a time. As for the evaluation, there is no time for it. So the conclusion: we introduced a method for indexing and searching structured data using a full-text search engine. This is a
more general approach, but we have specialized it
for mathematical notation. We also have a prototype implementation, and we really hope that it will be available publicly. It offers precise searching, similarity searching (actually there are several types of similarity search behind it), and generalization searching. As I mentioned, it is applicable to semantically poor mathematical documents. The interesting part is that the evaluation shows that fine-tuning, for example of the tokenizer, is really necessary, because the database overhead can be up to 50 per cent, so this is really something that has to be regarded. And it is designed as an extension, so any full-text search engine can use it. As for future work, we mainly need to make it publicly available, because then we can make a real evaluation with users. We also consider a structure which can be used for similarity searching, and maybe another layer for rewriting the rules, so the generalization and transformation rules, and then the representations. Further research and fine-tuning are necessary. Thank you.
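The query side of the method, as described in the talk, takes a text section and a math section, runs the augmentation algorithm on the math section to obtain up to n formula words, and appends each word separately to the text section, giving up to n ordinary full-text queries. A minimal Python sketch of that expansion follows. The function names, the list-of-terms query format, and the toy augmentation rule are hypothetical stand-ins, not the speaker's API or rules.

```python
from typing import Callable, List


def expand_query(text_terms: List[str],
                 formula: str,
                 augment: Callable[[str], List[str]]) -> List[List[str]]:
    """Build one full-text query per distinct formula representation."""
    queries = []
    seen = set()
    for word in augment(formula):
        if word in seen:
            continue  # identical representations would give identical queries
        seen.add(word)
        queries.append(list(text_terms) + [word])
    # Without any formula word the query is plain full-text search.
    return queries or [list(text_terms)]


def toy_augment(formula: str) -> List[str]:
    """Placeholder for the augmentation algorithm (assumption).

    Returns the formula without spaces plus a generalized form in
    which every letter is replaced by the symbol "id".
    """
    exact = formula.replace(" ", "")
    generalized = "".join("id" if ch.isalpha() else ch for ch in exact)
    return [exact, generalized]


queries = expand_query(["pythagorean", "theorem"], "a^2 + b^2", toy_augment)
for query in queries:
    print(query)
```

Because each expanded query is an ordinary list of terms, the underlying engine's own ranking function can be reused unchanged, which matches the stated goal of designing the method as an extension.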
[Audience question, mostly unclear in the recording. The questioner mentions something read or worked on some years ago in which, for example, the linearization was very similar, and asks how [unclear] are identified.]
It really depends on the tokenizer. [unclear] Inside these words you can use id1 and id2, [unclear], so it depends a little. Hopefully that answers it. For example, here is the formula, and a formula which would be found as a result would be [unclear], because it will end up in the same representation. [unclear] So basically what you can do is just search for the structure, and you do not care about the names of the variables. [unclear] It will not find this one, but it will find that one. For example, there is another transformation rule which replaces the variables by a symbol; here it is called id. This is just an example. [unclear] So basically, changing the variable names is not an issue. [unclear] Currently five different representations are indexed. [unclear]
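The generalization discussed in this answer, replacing variables by an id symbol and constants by a const symbol so that variable names no longer matter, can be sketched in a few lines of Python. The tree encoding, the table of default constants, and the decision to keep numbers unchanged are assumptions for illustration only; the rules in the speaker's system may differ.

```python
# Assumed default-meaning table: names treated as constants when no
# semantic information says otherwise (illustrative entries only).
DEFAULT_CONSTANTS = {"e", "pi"}


def generalize(node):
    """Replace known constants by 'const' and other names by 'id'."""
    if isinstance(node, str):
        if node in DEFAULT_CONSTANTS:
            return "const"
        if node.isalpha():
            return "id"
        return node  # numbers are kept as written (an assumption)
    op, children = node
    return (op, [generalize(child) for child in children])


def linearize(node):
    """Flatten a formula tree into a single word in prefix notation."""
    if isinstance(node, str):
        return node
    op, children = node
    return op + "(" + ",".join(linearize(child) for child in children) + ")"


def generalized_word(formula):
    """Index word for the generalized representation of a formula."""
    return linearize(generalize(formula))


first = ("+", [("^", ["x", "2"]), "y"])   # x^2 + y
second = ("+", [("^", ["a", "2"]), "b"])  # a^2 + b

# Different variable names, same generalized index word.
assert generalized_word(first) == generalized_word(second)
print(generalized_word(first))
```

Replacing every variable by the same symbol also matches formulae such as x^2 + x. Numbered placeholders (the id1 and id2 mentioned in the answer) would presumably keep track of which variables coincide, at the cost of fewer matches.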
